A method for detecting g-quadruplexes and i-motifs in the genome of wheat
By combining in situ cleavage with the BG4/iMab antibodies and the pA-Tn5 fusion protein in wheat cell nuclei with high-throughput sequencing and sequence-feature recognition, the method addresses the problem of detecting G4/iM structures in the wheat genome and achieves efficient, accurate genome-wide distribution mapping.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- NORTHEAST NORMAL UNIVERSITY
- Filing Date
- 2026-04-20
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies for studying the G4/iM structure in plant genomes, especially wheat, suffer from problems such as formaldehyde cross-linking potentially damaging the natural nucleic acid structure, cumbersome experimental procedures, large material requirements, and low signal-to-noise ratio, making it difficult to achieve efficient and accurate whole-genome distribution detection.
BG4/iMab antibodies are used to recognize G4/iM structures in intact nuclei of wheat samples, and the pA-Tn5 fusion protein cleaves the DNA in situ near the binding sites. Combined with high-throughput sequencing and regular-expression recognition based on sequence features, this simplifies the experimental steps and improves the signal-to-noise ratio.
This technology enables efficient and precise capture of the genome-wide G4/iM distribution in living plant cells, yielding more realistic structural maps, reducing sample requirements, and improving the signal-to-noise ratio, thereby overcoming the shortcomings of existing technologies.

Figure CN122279082A_ABST
Abstract
Description
[0001] This application claims priority to Chinese Patent Application No. 202512049391.7, filed on December 31, 2025, entitled "A Method for Detecting G-Quadruplexes and i-motifs in the Wheat Genome". The entire contents of that priority application are incorporated herein by reference.
Technical Field
[0002] This invention relates to the field of gene detection technology, and in particular to a method for detecting G-quadruplexes and i-motifs in the wheat genome.
Background Technology
[0003] Non-canonical DNA secondary structures, such as G-quadruplexes (G4) and i-motifs (iM), play important roles in regulating gene expression and have been studied extensively in medicine and basic biology. Current techniques for studying G4/iM structures in plants, especially crops with complex genomes such as wheat, rely mainly on in vitro prediction, chemical probes, or chromatin immunoprecipitation (ChIP)-based techniques.
[0004] Existing methods such as ChIP-seq can be used to study protein-DNA interactions, but they have clear drawbacks when applied to dynamic nucleic acid structures such as G4/iM: (1) formaldehyde cross-linking is required, which may destroy the native nucleic acid structure or introduce artifacts; (2) the procedure is complicated, involving multiple steps such as chromatin fragmentation and immunoprecipitation, and requires a large amount of starting material, which is unfavorable for plant materials that are difficult to obtain in large quantities, such as wheat; (3) the signal-to-noise ratio is low and the background is high, especially when studying nucleic acid structures that are not bound by proteins.
[0005] Therefore, there is an urgent need for a method that can capture the genome-wide G4/iM distribution in situ, efficiently, and with a high signal-to-noise ratio in living plant cells, so as to overcome the structural distortion, insufficient sensitivity, and large material consumption caused by cross-linking and complex operations in existing technologies, and to achieve accurate mapping of these dynamic nucleic acid structures in complex plant genomes.
Summary of the Invention
[0006] This invention provides a method for detecting G-quadruplexes and i-motifs in the wheat genome, thereby overcoming the shortcomings of existing technologies.
[0007] This invention provides a method for detecting G-quadruplexes and i-motifs in the wheat genome, comprising: obtaining raw sequencing data of specific DNA fragments from a wheat sample, wherein the specific DNA fragments include DNA fragments that are specifically bound by the BG4 or iMab antibody and subsequently cleaved by the pA-Tn5 fusion protein near the binding site; aligning the raw sequencing data to the wheat reference genome and detecting regions enriched for G-quadruplexes and/or i-motifs; and identifying, based on sequence features, sequence data of G-quadruplexes and/or i-motifs within those enriched regions.
[0008] In the method for detecting G-quadruplexes and i-motifs in the wheat genome provided by the present invention, after the raw sequencing data of the specific DNA fragments are obtained from the wheat sample, the method further includes: preprocessing the raw sequencing data.
[0009] In the method for detecting G-quadruplexes and i-motifs in the wheat genome provided by the present invention, the preprocessing of the raw sequencing data includes: removing sequencing adapters from the raw sequencing data; and/or performing quality filtering on the raw sequencing data.
[0010] In the method for detecting G-quadruplexes and i-motifs in the wheat genome provided by the present invention, identifying sequence data of G-quadruplexes and/or i-motifs in the enriched regions based on sequence features includes: identifying G-quadruplex sequence data in the G-quadruplex-enriched regions using a G4 regular expression, namely G3+N1-15G3+N1-15G3+N1-15G3+.
[0011] In the method for detecting G-quadruplexes and i-motifs in the wheat genome provided by the present invention, identifying sequence data of G-quadruplexes and/or i-motifs in the enriched regions based on sequence features includes: identifying i-motif sequence data in the i-motif-enriched regions using an iM regular expression, namely C3+N1-12C3+N1-12C3+N1-12C3+.
[0012] The method for detecting G-quadruplexes and i-motifs in the wheat genome according to the present invention further includes: calculating, based on the sequence data of G-quadruplexes and/or i-motifs, the proportion of G-quadruplex and/or i-motif sequences in the wheat sample.
[0013] The method for detecting G-quadruplexes and i-motifs in the wheat genome according to the present invention further includes: obtaining a genome-wide G4/iM structure map of the wheat sample based on the sequence data of the G-quadruplexes and/or i-motifs.
[0014] The present invention also provides a system for detecting G-quadruplexes and i-motifs in the wheat genome, comprising: a data acquisition module, configured to acquire raw sequencing data of specific DNA fragments from a wheat sample, wherein the specific DNA fragments include DNA fragments that are specifically bound by the BG4 or iMab antibody and subsequently cleaved by the pA-Tn5 fusion protein near the binding site; an enriched-region detection module, configured to align the raw sequencing data to the wheat reference genome and detect regions enriched for G-quadruplexes and/or i-motifs; and a sequence recognition module, configured to identify, based on sequence features, sequence data of G-quadruplexes and/or i-motifs within those enriched regions.
[0015] The present invention also provides an electronic device, including a processor and a memory storing a computer program, wherein the processor executes the computer program to implement any of the methods described above for detecting G-quadruplexes and i-motifs in the wheat genome.
[0016] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described above for detecting G-quadruplexes and i-motifs in the wheat genome.
[0017] The present invention also provides a computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is able to perform any of the methods described above for detecting G-quadruplexes and i-motifs in the wheat genome.
[0018] The present invention provides a method for detecting G-quadruplexes and i-motifs in the wheat genome, which can bring at least the following beneficial effects: This invention directly utilizes a specific antibody (BG4 / iMab) to recognize and target the G4 / iM structure within intact cell nuclei that have not undergone formaldehyde cross-linking, and then uses the pA-Tn5 fusion protein to perform in-situ cleavage near the binding site. This process preserves the native state of nucleic acid secondary structures to the greatest extent possible, overcoming the fundamental deficiency of existing ChIP-seq technologies that may damage or distort these dynamic and fragile structures due to chemical cross-linking, thereby obtaining more realistic and reliable information on the distribution of genome-wide structures.
[0019] This invention employs a CUT&Tag-based workflow, completing antibody binding and DNA tagging within the cell nucleus. The experimental steps are simplified, resulting in low non-specific background. Compared to traditional ChIP-seq, this invention eliminates the need for cumbersome chromatin fragmentation, immunoprecipitation, and multiple washing steps, reducing the required sample size from grams to below. Furthermore, the simplified procedures and high efficiency of in vivo labeling significantly improve the signal-to-noise ratio for detecting non-protein-binding structures like G4 / iM, enabling more sensitive capture of weak enriched signals in the genome.
[0020] This invention combines antibody-mediated in situ cleavage technology with wheat genomics research, and develops a customized bioinformatics analysis workflow (such as sequence identification based on specific regular expressions) tailored to the high GC content and sequence characteristics of the wheat genome. This invention overcomes the analytical challenges of the wheat genome's large size, numerous repetitive sequences, and high subgenomic homology, accurately locating the peaks of G4 / iM structures in the genome and identifying their core sequence motifs, thus successfully mapping the first whole-genome G4 / iM structure atlas of wheat, filling a technological and knowledge gap in this field.
[0021] This invention integrates the entire technology chain from sample processing, library construction, high-throughput sequencing to specific data analysis. By combining experimentally obtained peak calling information with sequence rule-based motif scanning (regular expression matching), it achieves mutual verification between experimental evidence and computational predictions, significantly improving the reliability and accuracy of the results. Simultaneously, this invention lays a solid data foundation for subsequent multi-omics association analyses (such as integration with transcriptome and open chromatin data) to further explore the biological functions of G4 / iM.
[0022] Based on the whole-genome G4 / iM structure map, the principles of artificial simplification and modular assembly of plant chromosomes can be further established, laying a theoretical foundation for analyzing the chromosome structure-function coupling mechanism and constructing plant artificial chromosomes.
[0023] In summary, this invention effectively overcomes a series of bottlenecks in existing methods for studying the G4 / iM structure of plants (especially wheat), such as crosslinking damage, cumbersome procedures, high background noise, and difficulty in applying to complex genomes. It provides a reliable and powerful tool for efficiently and accurately analyzing the secondary structure landscape of whole-genome DNA in crops such as wheat under in vivo and in situ conditions, and has significant application value.
[0024] In a further application of this invention, the genome-wide distribution map of G-quadruplex (G4) and/or i-motif (iM) structures in the wheat genome can serve as an important reference for plant genome structure design. Based on the enrichment characteristics of G4/iM structures in the genome, chromosomes can be divided into functional regions, thereby guiding the design of plant artificial chromosomes or optimized genome structures.
[0025] Specifically, genomic regions can be classified according to the distribution density of G4 / iM structures, including G4 / iM-rich regions and G4 / iM-low-density regions. The G4 / iM-rich regions can be preferentially retained as potential regulatory functional regions, while the G4 / iM-low-density regions can be used as candidate redundant regions for further analysis.
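The density-based classification in [0025] can be illustrated with a short Python sketch. The source gives no window size or cut-offs, so the quartile thresholds and the labels "rich", "low" and "intermediate" used here are hypothetical choices for illustration only.

```python
from statistics import quantiles


def classify_windows(densities: dict[str, float]) -> dict[str, str]:
    """Label genomic windows by G4/iM density.

    Hypothetical rule (not from the source): windows at or above the upper
    quartile are "rich" (candidate regulatory regions to retain), windows at
    or below the lower quartile are "low" (candidate redundant regions), and
    the rest are "intermediate".
    """
    q1, _median, q3 = quantiles(densities.values(), n=4)
    labels = {}
    for window, density in densities.items():
        if density >= q3:
            labels[window] = "rich"
        elif density <= q1:
            labels[window] = "low"
        else:
            labels[window] = "intermediate"
    return labels


# Hypothetical motif counts for ten 1 Mb windows on chromosome 1A.
example = {f"chr1A:{i}Mb": float(i) for i in range(10)}
print(classify_windows(example))
```

Quartiles adapt the cut-offs to each sample's overall motif density; fixed absolute thresholds would be an equally valid design if comparable across samples.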
[0026] Furthermore, the G4/iM sequence data identified by this invention can also serve as potential regulatory elements in research on gene expression regulation and in functional element design, thereby providing data support and a theoretical basis for the construction and modular design of plant artificial chromosomes.
Attached Figure Description
[0027] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0028] Figure 1 is a flowchart of the method for detecting G-quadruplexes and i-motifs in the wheat genome provided by the present invention.
[0029] Figure 2 is a schematic structural diagram of the system for detecting G-quadruplexes and i-motifs in the wheat genome provided by the present invention.
[0030] Figure 3 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Implementation
[0031] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, embodiments of this invention, and should not be construed as limiting the invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention. In the description of this invention, it should be understood that the terminology used is for descriptive purposes only and should not be construed as indicating or implying relative importance.
[0032] Figure 1 is a schematic flowchart of the method for detecting G-quadruplexes and i-motifs in the wheat genome provided by this invention. The method may be executed by any suitable terminal-side or network-side device, such as a device for detecting G-quadruplexes and i-motifs in the wheat genome.
[0033] Referring to Figure 1, the present invention provides a method for detecting G-quadruplexes and i-motifs in the wheat genome, which may include: S110. Obtain raw sequencing data of specific DNA fragments from a wheat sample, wherein the specific DNA fragments include DNA fragments that are specifically bound by the BG4 or iMab antibody and subsequently cleaved by the pA-Tn5 fusion protein near the binding site.
[0034] In this embodiment, the method for obtaining specific DNA fragments from wheat samples is as follows.
[0035] Chinese Spring seeds were disinfected with sodium hypochlorite for 15 minutes and then rinsed three times with distilled water. The disinfected seeds were spread in 9 cm Petri dishes on moistened filter paper and kept in the dark at room temperature. After germination, they were transferred to a dark incubator at 22°C and grown until the seedlings reached about 10 cm in height. The seedlings were then harvested and stored at -80°C for CUT&Tag library construction.
[0036] The CUT&Tag library was constructed as follows. 1 g of seedlings was rapidly frozen in liquid nitrogen and ground into a fine powder in a mortar. The powder was transferred to a 50 mL tube containing 12.5 mL Honda buffer and gently shaken on a shaker at 4 ℃ for 5 minutes. The suspension was filtered through gauze, and the gauze was rinsed with 2.5 mL of buffer. The filtrate was collected and centrifuged at 2000 × g and 4 ℃ for 10 minutes with slow acceleration and deceleration to pellet the nuclei, which were then washed at least twice with Honda buffer. The nuclei were resuspended in 1 mL Dig-wash buffer, incubated on ice for 10 minutes, and centrifuged at 500 × g and 4 ℃ for 5 minutes. Then, 100 μL of antibody buffer containing the BG4/iMab-specific antibody or an IgG control antibody (1:50 dilution) was added to each sample, and the mixture was incubated overnight on a rotary mixer at 4 ℃. An anti-FLAG antibody was then added at a 1:100 dilution, and the mixture was incubated on a rotary mixer at 4 ℃ for 1 h. After centrifugation to remove the liquid, the nuclei were resuspended in Dig-wash buffer containing the secondary antibody at a 1:100 dilution, incubated at 4 ℃ for 4 h, and then washed twice with Dig-wash buffer. The washed nuclei were resuspended in 100 μL Dig-300 buffer containing 2.5 μL of pA-Tn5 complex and incubated at room temperature for 1 h, washed twice with Dig-300 buffer, and resuspended in 300 μL of pre-chilled tagmentation buffer, followed by incubation on a shaker at 37 ℃ for 4 h. To stop the reaction, 10 μL of 0.5 M EDTA, 3 μL of 10% SDS, and 2.5 μL of 20 mg/mL proteinase K were added to each sample, and the samples were incubated at 58 ℃ for 1 h to digest proteins.
Then, 300 μL of phenol/chloroform/isoamyl alcohol (25:24:1) was added, and the mixture was vortexed for 5 seconds, transferred to a phase-lock gel tube, and centrifuged at 16000 × g for 5 minutes at room temperature. The upper aqueous phase was transferred, 300 μL of chloroform was added, and the tube was gently inverted about 10 times to mix and allowed to separate into layers. The upper phase (approximately 300 μL) was transferred to a new centrifuge tube, and a sequencing library was constructed using an NEB library preparation kit. The library was sequenced on the MGI (BGI) platform with 150 bp paired-end reads to obtain the raw sequencing data of the specific DNA fragments from the wheat sample.
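The stop-reaction additions and antibody dilutions above can be checked with simple C1V1 = C2V2 arithmetic. This sketch assumes the 300 μL of tagmentation buffer is the only liquid in the tube when the stop reagents are added (nuclear volume ignored) and reads "1:50" as 1 part antibody per 50 parts final volume; the resulting final concentrations are derived values, not figures stated in the source.

```python
def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """C2 = C1 * V1 / V2: concentration after adding added_ul of stock into total_ul."""
    return stock * added_ul / total_ul


def antibody_volume(buffer_ul: float, dilution: float) -> float:
    """Antibody volume for a 1:dilution working solution (1 part per `dilution` parts)."""
    return buffer_ul / dilution


reaction_ul = 300.0  # pre-chilled tagmentation buffer (assumed sole liquid)
# name: (stock concentration, volume added in uL), values from the protocol
stop_mix = {
    "EDTA (M)": (0.5, 10.0),
    "SDS (%)": (10.0, 3.0),
    "proteinase K (mg/mL)": (20.0, 2.5),
}
total_ul = reaction_ul + sum(vol for _, vol in stop_mix.values())

for name, (stock, vol) in stop_mix.items():
    print(f"{name}: {final_concentration(stock, vol, total_ul):.4f}")
print("BG4/iMab at 1:50 in 100 uL:", antibody_volume(100.0, 50.0), "uL")
```

With these volumes the total is 315.5 μL, giving roughly 15.8 mM EDTA, 0.095% SDS and 0.16 mg/mL proteinase K during the 58 ℃ digestion.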
[0037] S120. Preprocess the raw sequencing data, wherein the preprocessing includes removing sequencing adapters from the raw sequencing data and/or performing quality filtering on the raw sequencing data.
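Step S120 names two operations but no tool or thresholds. The following Python sketch shows what adapter removal and quality filtering do to a single read. The Tn5 (Nextera-style) adapter sequence, the 3-nt minimum partial overlap, the 20-nt minimum length and the mean Phred threshold of 20 are assumptions for illustration; dedicated trimmers such as cutadapt or fastp additionally tolerate mismatches and trim low-quality ends.

```python
# Assumption: Nextera/Tn5 adapter read-through sequence, typical for CUT&Tag data.
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"


def trim_adapter(seq: str, qual: str, adapter: str = NEXTERA_ADAPTER,
                 min_overlap: int = 3) -> tuple[str, str]:
    """Cut the read at a full adapter hit, or at a partial adapter at the 3' end."""
    idx = seq.find(adapter)
    if idx == -1:
        idx = len(seq)
        # Longest 3' suffix of the read that equals a prefix of the adapter.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                idx = len(seq) - k
                break
    return seq[:idx], qual[:idx]


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a Phred+33 encoded quality string."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def clean_read(seq: str, qual: str, min_len: int = 20, min_mean_q: float = 20):
    """Trim adapters, then drop reads that are too short or low quality."""
    seq, qual = trim_adapter(seq, qual)
    if len(seq) < min_len or mean_phred(qual) < min_mean_q:
        return None  # the length check short-circuits, so empty reads are safe
    return seq, qual


print(clean_read("ACGT" * 10 + NEXTERA_ADAPTER, "I" * 59))
```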
[0038] The data cleaning process provides clean and reliable input for subsequent analysis.
[0039] S130. Align the preprocessed sequencing data to the wheat reference genome and detect regions enriched for G-quadruplexes and/or i-motifs.
[0040] In this embodiment, the preprocessed sequencing data are aligned to the wheat reference genome using the Bowtie2 software; that is, millions of short reads of unknown origin are "anchored" to their physical positions on the wheat reference genome.
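The source names Bowtie2 but not its options. The following sketch builds a paired-end Bowtie2 command line in Python; the option set mirrors settings commonly used for CUT&Tag data (end-to-end, very sensitive, concordant pairs only, fragment size 10 to 700 bp) and is an assumption, as are the index prefix and file names.

```python
import shlex


def bowtie2_command(index_prefix: str, r1: str, r2: str, out_sam: str,
                    threads: int = 8) -> list[str]:
    """Build a Bowtie2 paired-end alignment command (assumed CUT&Tag-style options)."""
    return [
        "bowtie2", "--end-to-end", "--very-sensitive",
        "--no-mixed", "--no-discordant", "--phred33",
        "-I", "10", "-X", "700",          # min/max fragment length
        "-p", str(threads),
        "-x", index_prefix,               # hypothetical wheat index prefix
        "-1", r1, "-2", r2,
        "-S", out_sam,
    ]


cmd = bowtie2_command("CS_refseq_index", "sample_R1.clean.fq.gz",
                      "sample_R2.clean.fq.gz", "sample.sam")
print(shlex.join(cmd))
# To execute: import subprocess; subprocess.run(cmd, check=True)
```

Because the A, B and D subgenomes of hexaploid wheat are highly similar, reads that map equally well to several homeologs are common; a mapping-quality filter such as `samtools view -q 10` is often applied after alignment, although the source does not say whether one was used.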
[0041] S140. Based on sequence characteristics, identify sequence data of G-quadruplexes and / or i-motifs in enriched regions of G-quadruplexes and / or i-motifs.
[0042] The alignment output of step S130 (an alignment file) records the position of each read, whereas the signal from a G4/iM structure appears as a contiguous, enriched block of reads. Peak calling with SEACR within step S130 therefore uses a statistical model to identify, from the scattered reads, the genomic regions (peaks) with significantly enriched contiguous signal. These peaks are the genomic regions in which the antibody is likely to have captured a G4 or iM structure.
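SEACR is an external tool; the following Python sketch is only a simplified illustration of its core idea, not its actual implementation. Contiguous non-zero bedGraph intervals are merged into signal blocks, each block is scored by its total signal (area under the curve), and the top fraction of blocks is kept, analogous to running SEACR with a numeric threshold instead of the IgG control. The `Interval` type and the function names are hypothetical.

```python
import math
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open, as in bedGraph
    end: int
    value: float


def signal_blocks(bedgraph: list[Interval]) -> list[tuple[str, int, int, float]]:
    """Merge adjacent non-zero intervals into blocks scored by total signal."""
    blocks: list[tuple[str, int, int, float]] = []
    for iv in sorted(bedgraph):
        if iv.value <= 0:
            continue  # zero-signal intervals (or gaps) break a block
        auc = iv.value * (iv.end - iv.start)
        if blocks and blocks[-1][0] == iv.chrom and blocks[-1][2] == iv.start:
            chrom, start, _end, total = blocks[-1]
            blocks[-1] = (chrom, start, iv.end, total + auc)
        else:
            blocks.append((iv.chrom, iv.start, iv.end, auc))
    return blocks


def top_fraction_peaks(bedgraph: list[Interval], fraction: float = 0.01):
    """Keep the top `fraction` of blocks ranked by total signal."""
    blocks = signal_blocks(bedgraph)
    keep = max(1, math.ceil(len(blocks) * fraction))
    return sorted(blocks, key=lambda b: b[3], reverse=True)[:keep]


bg = [Interval("chr1A", 0, 10, 1.0), Interval("chr1A", 10, 20, 3.0),
      Interval("chr1A", 20, 30, 0.0), Interval("chr1A", 30, 40, 0.5)]
print(top_fraction_peaks(bg, fraction=0.5))
```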
[0043] In one embodiment, G-quadruplex sequence data can be identified in the G-quadruplex-enriched regions using a G4 regular expression, namely G3+N1-15G3+N1-15G3+N1-15G3+, which denotes four runs of three or more G nucleotides separated by loops of 1 to 15 arbitrary nucleotides (N).
[0044] In one embodiment, i-motif sequence data can be identified in the i-motif-enriched regions using an iM regular expression, namely C3+N1-12C3+N1-12C3+N1-12C3+, which denotes four runs of three or more C nucleotides separated by loops of 1 to 12 arbitrary nucleotides (N).
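Both regular expressions translate directly into Python's `re` syntax, reading "G3+" as three or more G and "N1-15" as 1 to 15 arbitrary nucleotides. This is a sketch: treating N as `[ACGTN]` and reporting non-overlapping matches with `re.finditer` are assumptions, since the source does not describe how its custom script handles ambiguous bases or overlapping motifs.

```python
import re

# G4: four runs of >=3 G separated by loops of 1-15 nt.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTN]{1,15}){3}G{3,}", re.IGNORECASE)
# iM: four runs of >=3 C separated by loops of 1-12 nt.
IM_PATTERN = re.compile(r"(?:C{3,}[ACGTN]{1,12}){3}C{3,}", re.IGNORECASE)


def find_motifs(seq: str, pattern: re.Pattern) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each non-overlapping match."""
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]


print(find_motifs("AAGGGTTAGGGTTAGGGTTAGGGAA", G4_PATTERN))
print(find_motifs("CCCTAACCCTAACCCTAACCC", IM_PATTERN))
```

A loop one nucleotide longer than the upper bound (16 nt for G4, 13 nt for iM) prevents a match, which is how the two patterns encode their different loop-length limits.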
[0045] Step S130 identifies "suspected" G4/iM regions, but further sequence evidence is needed. Therefore, in step S140 this embodiment uses a custom script to scan the DNA sequences within these peak regions for sequences that match the G4 or iM regular expression. This provides a sequence-structure basis (molecular evidence) for the experimental enrichment signals (functional evidence); the two corroborate each other and substantially increase the reliability of the results.
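Scanning the peaks amounts to slicing each peak's sequence out of the reference and translating match positions back to genome coordinates. In this sketch the genome is a hypothetical in-memory dict of chromosome sequences and peaks are 0-based, half-open BED-style tuples; for a genome the size of wheat's, one would instead read sequences from an indexed FASTA file.

```python
import re

G4_PATTERN = re.compile(r"(?:G{3,}[ACGTN]{1,15}){3}G{3,}", re.IGNORECASE)


def scan_peaks(genome: dict[str, str], peaks,
               pattern: re.Pattern = G4_PATTERN) -> list[tuple[str, int, int, str]]:
    """Return (chrom, start, end, motif) hits in chromosome coordinates.

    peaks: iterable of (chrom, start, end), 0-based half-open (BED convention).
    """
    hits = []
    for chrom, start, end in peaks:
        region = genome[chrom][start:end]
        for m in pattern.finditer(region):
            # Shift offsets from peak-relative to chromosome coordinates.
            hits.append((chrom, start + m.start(), start + m.end(), m.group()))
    return hits


genome = {"chr1A": "AAAA" + "GGGTTAGGGTTAGGGTTAGGG" + "AAAA"}
print(scan_peaks(genome, [("chr1A", 0, 29), ("chr1A", 2, 10)]))
```

Motifs that extend past a peak boundary are not reported by this simple slicing; padding each peak by a few dozen bases is one possible remedy. Note also that a G-rich motif on the reverse strand appears as a C-rich run on the forward reference sequence.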
[0046] S150. Based on the sequence data of G-quadruplexes and/or i-motifs, calculate the proportions of G-quadruplex and i-motif sequences in the wheat sample as the number of G-quadruplex sequences divided by the total number of sequences and the number of i-motif sequences divided by the total number of sequences, respectively.
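The S150 ratios can be computed as follows. The source does not define "total number of sequences"; this sketch assumes it is the number of peak sequences scanned and counts a sequence once if it contains at least one match. The patterns repeat the regular expressions of [0043] and [0044].

```python
import re

G4_PATTERN = re.compile(r"(?:G{3,}[ACGTN]{1,15}){3}G{3,}", re.IGNORECASE)
IM_PATTERN = re.compile(r"(?:C{3,}[ACGTN]{1,12}){3}C{3,}", re.IGNORECASE)


def motif_proportions(sequences: list[str]) -> tuple[float, float]:
    """Return (G4 proportion, iM proportion) over the scanned sequences."""
    if not sequences:
        raise ValueError("no sequences to evaluate")
    n_g4 = sum(1 for s in sequences if G4_PATTERN.search(s))
    n_im = sum(1 for s in sequences if IM_PATTERN.search(s))
    return n_g4 / len(sequences), n_im / len(sequences)


peaks = ["GGGTTAGGGTTAGGGTTAGGG", "CCCTAACCCTAACCCTAACCC",
         "ACGTACGTACGT", "ATATATATAT"]
print(motif_proportions(peaks))
```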
[0047] S160. Based on the sequence data of the G-quadruplexes and/or i-motifs, obtain the genome-wide G4/iM structure map of the wheat sample.
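One simple way to turn the motif coordinates from S140 into a genome-wide map is to count hits in fixed-size windows, producing a bedGraph-like table that can be plotted along each chromosome. The 1 Mb window size, the assignment of each hit by its start coordinate, and the chromosome size in the example are hypothetical choices.

```python
from collections import Counter


def motif_density(hits, chrom_sizes: dict[str, int], window: int = 1_000_000):
    """Count motif hits per window; each hit is assigned by its start coordinate.

    hits: iterable of (chrom, start, end).
    Returns a list of (chrom, window_start, window_end, count).
    """
    counts = Counter((chrom, start // window) for chrom, start, _end in hits)
    table = []
    for chrom, size in chrom_sizes.items():
        n_windows = (size + window - 1) // window  # ceiling division
        for w in range(n_windows):
            win_end = min((w + 1) * window, size)
            table.append((chrom, w * window, win_end, counts[(chrom, w)]))
    return table


hits = [("chr1A", 10, 30), ("chr1A", 999_999, 1_000_020), ("chr1A", 2_400_000, 2_400_030)]
print(motif_density(hits, {"chr1A": 2_500_000}))
```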
[0048] The output of step S140 is a set of coordinates and sequences. Correlating these data with other data layers (such as open chromatin regions from ATAC-seq, chromatin interaction information from CAP-C, and existing gene annotations) allows the functions of G4 and iM in gene regulation to be explored.
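A first step in such correlation analysis is often a simple interval-overlap statistic, for example the fraction of G4/iM peaks that overlap at least one ATAC-seq peak. The sketch below is an illustrative pure-Python implementation with hypothetical inputs in 0-based, half-open coordinates; in practice tools such as BEDTools `intersect` are commonly used for this.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def fraction_overlapping(query, reference) -> float:
    """Fraction of query intervals overlapping at least one reference interval.

    Intervals are (chrom, start, end), 0-based half-open; overlap means
    ref.start < q.end and ref.end > q.start on the same chromosome.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_ends = list(accumulate((e for _, e in ivs), max))  # running max of ends
        index[chrom] = (starts, max_ends)
    if not query:
        return 0.0
    hits = 0
    for chrom, start, end in query:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        i = bisect_left(starts, end)  # reference intervals [0, i) start before q.end
        if i > 0 and max_ends[i - 1] > start:
            hits += 1
    return hits / len(query)


g4_peaks = [("chr1A", 100, 200), ("chr1A", 500, 600), ("chr2B", 0, 10)]
atac_peaks = [("chr1A", 150, 300), ("chr1A", 1000, 1100)]
print(fraction_overlapping(g4_peaks, atac_peaks))
```

The prefix maximum of end coordinates lets each query be answered with one binary search instead of a scan over all reference intervals.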
[0049] The present invention provides a method for detecting G-quadruplexes and i-motifs in the wheat genome, which can bring at least the following beneficial effects: This invention directly utilizes a specific antibody (BG4 / iMab) to recognize and target the G4 / iM structure within intact cell nuclei that have not undergone formaldehyde cross-linking, and then uses the pA-Tn5 fusion protein to perform in-situ cleavage near the binding site. This process preserves the native state of nucleic acid secondary structures to the greatest extent possible, overcoming the fundamental deficiency of existing ChIP-seq technologies that may damage or distort these dynamic and fragile structures due to chemical cross-linking, thereby obtaining more realistic and reliable information on the distribution of genome-wide structures.
[0050] This invention employs a CUT&Tag-based workflow, completing antibody binding and DNA tagging within the cell nucleus. The experimental steps are simplified, resulting in low non-specific background. Compared to traditional ChIP-seq, this invention eliminates the need for cumbersome chromatin fragmentation, immunoprecipitation, and multiple washing steps, reducing the required sample size from grams to below. Furthermore, the simplified procedures and high efficiency of in vivo labeling significantly improve the signal-to-noise ratio for detecting non-protein-binding structures like G4 / iM, enabling more sensitive capture of weak enriched signals in the genome.
[0051] This invention combines antibody-mediated in situ cleavage technology with wheat genomics research, and develops a customized bioinformatics analysis workflow (such as sequence identification based on specific regular expressions) tailored to the high GC content and sequence characteristics of the wheat genome. This invention overcomes the analytical challenges of the wheat genome's large size, numerous repetitive sequences, and high subgenomic homology, accurately locating the peaks of G4 / iM structures in the genome and identifying their core sequence motifs, thus successfully mapping the first whole-genome G4 / iM structure atlas of wheat, filling a technological and knowledge gap in this field.
[0052] This invention integrates the entire technology chain from sample processing, library construction, high-throughput sequencing to specific data analysis. By combining experimentally obtained peak calling information with sequence rule-based motif scanning (regular expression matching), it achieves mutual verification between experimental evidence and computational predictions, significantly improving the reliability and accuracy of the results. Simultaneously, this invention lays a solid data foundation for subsequent multi-omics association analyses (such as integration with transcriptome and open chromatin data) to further explore the biological functions of G4 / iM.
[0053] In summary, this invention effectively overcomes a series of bottlenecks in existing methods for studying the G4 / iM structure of plants (especially wheat), such as crosslinking damage, cumbersome procedures, high background noise, and difficulty in applying to complex genomes. It provides a reliable and powerful tool for efficiently and accurately analyzing the secondary structure landscape of whole-genome DNA in crops such as wheat under in vivo and in situ conditions, and has significant application value.
[0054] The system for detecting G-quadruplexes and i-motifs in the wheat genome provided by this invention is described below; the system described below and the method described above may be cross-referenced correspondingly.
[0055] Referring to Figure 2, the present invention provides a system for detecting G-quadruplexes and i-motifs in the wheat genome, which may include: a data acquisition module, configured to acquire raw sequencing data of specific DNA fragments from a wheat sample, wherein the specific DNA fragments include DNA fragments that are specifically bound by the BG4 or iMab antibody and subsequently cleaved by the pA-Tn5 fusion protein near the binding site; an enriched-region detection module, configured to align the raw sequencing data to the wheat reference genome and detect regions enriched for G-quadruplexes and/or i-motifs; and a sequence recognition module, configured to identify, based on sequence features, sequence data of G-quadruplexes and/or i-motifs within those enriched regions.
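The three-module system can be sketched in Python as a small pipeline object whose stages are passed in as callables. The class name, type aliases and stub stages below are hypothetical; they only show how the data acquisition, enriched-region detection and sequence recognition modules hand data to one another.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Peak = tuple[str, int, int]        # (chrom, start, end)
Motif = tuple[str, int, int, str]  # (chrom, start, end, label or sequence)


@dataclass
class G4iMDetectionSystem:
    """Hypothetical composition of the three modules described in [0055]."""

    acquire: Callable[[str], Sequence[str]]                 # data acquisition module
    detect_enriched: Callable[[Sequence[str]], list[Peak]]  # alignment + peak calling
    recognize: Callable[[list[Peak]], list[Motif]]          # regex-based recognition

    def run(self, sample_id: str) -> list[Motif]:
        reads = self.acquire(sample_id)
        peaks = self.detect_enriched(reads)
        return self.recognize(peaks)


# Stub stages standing in for real sequencing, alignment and scanning steps.
system = G4iMDetectionSystem(
    acquire=lambda sample: ["read1", "read2"],
    detect_enriched=lambda reads: [("chr1A", 100, 300)] if reads else [],
    recognize=lambda peaks: [(c, s, e, "G4") for c, s, e in peaks],
)
print(system.run("ChineseSpring_BG4"))
```

Injecting each stage as a callable keeps the modules independent, in line with [0060], which notes that the modules may be located in one place or distributed across network units.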
[0056] Figure 3 illustrates the physical structure of an electronic device. As shown in Figure 3, the electronic device may include a processor 810, a communications interface 820, a memory 830, and a communication bus 840, wherein the processor 810, the communications interface 820, and the memory 830 communicate with one another via the communication bus 840. The processor 810 can call logic instructions in the memory 830 to execute the following steps: obtaining raw sequencing data of specific DNA fragments from a wheat sample, wherein the specific DNA fragments include DNA fragments that are specifically bound by the BG4 or iMab antibody and subsequently cleaved by the pA-Tn5 fusion protein near the binding site; aligning the raw sequencing data to the wheat reference genome and detecting regions enriched for G-quadruplexes and/or i-motifs; and identifying, based on sequence features, sequence data of G-quadruplexes and/or i-motifs within those enriched regions.
[0057] Furthermore, the logical instructions in the aforementioned memory 830 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0058] On the other hand, the present invention also provides a computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer performs the following steps: obtaining raw sequencing data of specific DNA fragments from a wheat sample, wherein the specific DNA fragments include DNA fragments that are specifically bound by the BG4 or iMab antibody and subsequently cleaved by the pA-Tn5 fusion protein near the binding site; aligning the raw sequencing data to the wheat reference genome and detecting regions enriched for G-quadruplexes and/or i-motifs; and identifying, based on sequence features, sequence data of G-quadruplexes and/or i-motifs within those enriched regions.
[0059] In another aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the following steps: obtaining raw sequencing data of specific DNA fragments from a wheat sample, wherein the specific DNA fragments include DNA fragments that are specifically bound by the BG4 or iMab antibody and subsequently cleaved by the pA-Tn5 fusion protein near the binding site; aligning the raw sequencing data to the wheat reference genome and detecting regions enriched for G-quadruplexes and/or i-motifs; and identifying, based on sequence features, sequence data of G-quadruplexes and/or i-motifs within those enriched regions.
[0060] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0061] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0062] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for detecting G-quadruplexes and i-motifs in the wheat genome, characterized by comprising: obtaining raw sequencing data of specific DNA fragments from a wheat sample, the specific DNA fragments including DNA fragments that have been specifically bound by a BG4 or iMab antibody and subsequently cleaved near the binding site by a pA-Tn5 fusion protein; aligning the raw sequencing data to a wheat reference genome and detecting regions enriched in G-quadruplexes and/or i-motifs; and identifying, based on sequence features, sequence data of G-quadruplexes and/or i-motifs within the regions enriched in G-quadruplexes and/or i-motifs.
2. The method for detecting G-quadruplexes and i-motifs in the wheat genome according to claim 1, further comprising, after obtaining the raw sequencing data of the specific DNA fragments from the wheat sample: preprocessing the raw sequencing data.
3. The method for detecting G-quadruplexes and i-motifs in the wheat genome according to claim 2, wherein the preprocessing of the raw sequencing data comprises: removing sequencing adapters from the raw sequencing data; and/or performing quality filtering on the raw sequencing data.
4. The method for detecting G-quadruplexes and i-motifs in the wheat genome according to any one of claims 1-3, wherein identifying, based on sequence features, the sequence data of G-quadruplexes and/or i-motifs within the regions enriched in G-quadruplexes and/or i-motifs comprises: identifying, based on sequence features, G-quadruplex sequence data within the regions enriched in G-quadruplexes using a G4 regular expression, the G4 regular expression being G3+N1-15G3+N1-15G3+N1-15G3+, that is, four runs of three or more guanines separated by loops of 1 to 15 arbitrary nucleotides.
5. The method for detecting G-quadruplexes and i-motifs in the wheat genome according to any one of claims 1-3, wherein identifying, based on sequence features, the sequence data of G-quadruplexes and/or i-motifs within the regions enriched in G-quadruplexes and/or i-motifs comprises: identifying, based on sequence features, i-motif sequence data within the regions enriched in i-motifs using an iM regular expression, the iM regular expression being C3+N1-12C3+N1-12C3+N1-12C3+, that is, four runs of three or more cytosines separated by loops of 1 to 12 arbitrary nucleotides.
6. The method for detecting G-quadruplexes and i-motifs in the wheat genome according to any one of claims 1-3, further comprising: statistically analyzing, based on the sequence data of the G-quadruplexes and/or i-motifs, the proportion of wheat samples containing G-quadruplexes and/or i-motifs.
7. The method for detecting G-quadruplexes and i-motifs in the wheat genome according to any one of claims 1-3, further comprising: obtaining, based on the sequence data of the G-quadruplexes and/or i-motifs, a genome-wide G4/iM structure map of the wheat sample.
8. Use of the method according to any one of claims 1-7 in plant genome structure design, characterized in that the genome-wide distribution information of G-quadruplex and/or i-motif structures obtained by the method is used to partition a plant genome into functional regions and to guide chromosome structure optimization or artificial chromosome design.
9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method for detecting G-quadruplexes and i-motifs in the wheat genome according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the method for detecting G-quadruplexes and i-motifs in the wheat genome according to any one of claims 1 to 7.
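As a minimal, non-authoritative sketch of the preprocessing recited in claims 2 and 3, the following Python code trims a 3' adapter from single-end FASTQ reads and then discards reads that are too short or whose mean Phred quality is too low. The adapter sequence (the Nextera/Tn5 read-through sequence CTGTCTCTTATACACATCT), Phred+33 quality encoding, the thresholds, and all function names are assumptions for illustration only; the application does not specify tools or parameters, and a production pipeline would typically rely on an established trimming tool.

```python
from typing import Iterable, Iterator, NamedTuple

# Illustrative sketch only: names and thresholds below are hypothetical.
class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str  # Phred+33 encoded (assumption)

# Assumption: Nextera/Tn5 read-through adapter, as used by Tn5-based libraries.
ADAPTER = "CTGTCTCTTATACACATCT"

def trim_adapter(rec: FastqRecord, adapter: str = ADAPTER,
                 min_overlap: int = 5) -> FastqRecord:
    """Cut the read at the first full adapter hit, else at a partial 3' hit."""
    idx = rec.seq.find(adapter)
    if idx == -1:
        # Check whether the read ends with a prefix of the adapter,
        # trying the longest prefix first, down to min_overlap bases.
        for k in range(min(len(adapter) - 1, len(rec.seq)), min_overlap - 1, -1):
            if rec.seq.endswith(adapter[:k]):
                idx = len(rec.seq) - k
                break
    if idx == -1:
        return rec
    return FastqRecord(rec.name, rec.seq[:idx], rec.qual[:idx])

def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def preprocess(records: Iterable[FastqRecord], min_len: int = 20,
               min_mean_q: float = 20.0) -> Iterator[FastqRecord]:
    """Adapter removal followed by length and mean-quality filtering."""
    for rec in records:
        rec = trim_adapter(rec)
        if len(rec.seq) < min_len:
            continue  # too short after trimming (also skips empty reads)
        if mean_phred(rec.qual) < min_mean_q:
            continue  # low average base quality
        yield rec
```

Trimming is done before quality filtering so that adapter bases, which may have high quality scores, do not inflate the mean quality of a read.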
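Claim 1 detects enriched regions after alignment but does not name a peak-calling method. The Python sketch below is a deliberately simplified stand-in, not the applicant's procedure: it counts aligned fragment positions in fixed-width bins along one chromosome, models the background as a uniform Poisson rate, flags bins whose upper-tail probability falls below a threshold, and merges adjacent flagged bins into regions. The uniform background, bin size, threshold, and function names are assumptions; dedicated peak callers typically model background more carefully, for example with control samples.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

# Illustrative sketch only: a toy binned enrichment test with hypothetical names.
def poisson_sf(k: int, lam: float) -> float:
    """Upper tail P(X >= k) for X ~ Poisson(lam), via 1 - CDF(k - 1)."""
    term = math.exp(-lam)  # P(X = 0); adequate for the modest rates used here
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)

def call_enriched_regions(positions: Iterable[int], chrom_len: int,
                          bin_size: int = 500,
                          alpha: float = 1e-5) -> List[Tuple[int, int]]:
    """Return merged (start, end) regions of bins with a significant read excess."""
    positions = list(positions)
    n_bins = math.ceil(chrom_len / bin_size)
    counts = Counter(p // bin_size for p in positions)
    lam = len(positions) / n_bins  # assumption: uniform background rate per bin
    regions: List[Tuple[int, int]] = []
    for b in sorted(counts):
        if poisson_sf(counts[b], lam) >= alpha:
            continue  # not significantly above background
        start, end = b * bin_size, min((b + 1) * bin_size, chrom_len)
        if regions and regions[-1][1] == start:
            regions[-1] = (regions[-1][0], end)  # extend the adjacent region
        else:
            regions.append((start, end))
    return regions
```

For example, 100 fragments piled into each of two neighbouring 500 bp bins on a 10 kb chromosome, with one fragment in every other bin, yield a single merged region spanning both bins.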
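Claims 4 and 5 express the sequence features in the conventional shorthand in which G3+ means three or more guanines and N1-15 means 1 to 15 arbitrary nucleotides. A minimal Python translation of the two patterns is sketched below, assuming the sequences of the enriched regions are available as plain strings keyed by region name. The helper names are hypothetical, matches are non-overlapping and reported on the given strand only, and reading the "proportion" of claim 6 as the fraction of enriched regions containing at least one motif is an interpretation rather than something the claims state.

```python
import re
from typing import Dict, List, Tuple

# Illustrative sketch only: helper names are hypothetical.
# G3+N1-15G3+N1-15G3+N1-15G3+ (claim 4): four G-runs, loops of 1-15 nt.
G4_RE = re.compile(r"G{3,}(?:[ACGTN]{1,15}G{3,}){3}", re.IGNORECASE)
# C3+N1-12C3+N1-12C3+N1-12C3+ (claim 5): four C-runs, loops of 1-12 nt.
IM_RE = re.compile(r"C{3,}(?:[ACGTN]{1,12}C{3,}){3}", re.IGNORECASE)

def find_motifs(seq: str, pattern: re.Pattern) -> List[Tuple[int, int]]:
    """0-based, half-open (start, end) spans of non-overlapping matches."""
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]

def fraction_with_motif(regions: Dict[str, str], pattern: re.Pattern) -> float:
    """Share of enriched regions whose sequence contains at least one match."""
    if not regions:
        return 0.0
    hits = sum(1 for seq in regions.values() if pattern.search(seq))
    return hits / len(regions)
```

Case-insensitive matching is used so that soft-masked (lowercase) reference sequence is still scanned; whether masked repeats should be included is a separate design decision.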
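Claim 7 does not define the form of the genome-wide G4/iM structure map. One simple representation, assumed here purely for illustration, is a per-window count of identified motifs written as bedGraph text (chromosome, 0-based start, end, value), which common genome browsers can display. The window size, the example chromosome names, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Illustrative sketch only: window size and names are hypothetical.
def motif_density(hits: Iterable[Tuple[str, int]], chrom_sizes: Dict[str, int],
                  window: int = 1_000_000) -> List[Tuple[str, int, int, int]]:
    """Count motif start positions in fixed windows covering every chromosome."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for chrom, pos in hits:
        counts[(chrom, pos // window)] += 1
    rows = []
    for chrom, size in chrom_sizes.items():  # hits on unlisted chromosomes are ignored
        n_windows = (size + window - 1) // window  # ceiling division
        for w in range(n_windows):
            start = w * window
            end = min(start + window, size)  # last window may be shorter
            rows.append((chrom, start, end, counts.get((chrom, w), 0)))
    return rows

def to_bedgraph(rows: List[Tuple[str, int, int, int]]) -> str:
    """Tab-separated bedGraph lines: chrom, start, end, value."""
    return "\n".join(f"{c}\t{s}\t{e}\t{n}" for c, s, e, n in rows)
```

Empty windows are emitted with a count of zero so that the track covers each chromosome continuously.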