Pan-genome construction method and corresponding structural variation mining method

A pan-genome construction and structural variation technology, applied in the fields of genomics, proteomics, instruments, etc. It addresses the difficulty of applying pan-genomes widely, their huge computing resource requirements, and the difficulty of processing and analyzing them directly. It achieves effective large-scale, accurate structural variation analysis, reduced computational resource requirements, and accurate structural variant analysis and identification.

Active Publication Date: 2021-10-29
RICE RES INST GUANGDONG ACADEMY OF AGRI SCI
Cites: 7; Cited by: 2

AI Technical Summary

Problems solved by technology

This scheme can preserve more of the genomic variation within the population, but the graphical pan-genome has major shortcomings in how these variations are presented and subsequently used.
First, the graphical pan-genome organizes all genomic variations in an extremely complex manner, forming a multi-dimensional variation information structure that is difficult for researchers to understand intuitively and even more difficult to process and analyze directly. This makes this type of pan-genome very hard to apply widely in research.
In addition, the graphical pan-genome requires huge computing resources in use, which limits its application in large-scale, extensive analyses.



Examples


Embodiment 1

[0034] Construction of the rice pan-genome:

[0035] The rice Nipponbare genome (IRGSP1.0, downloaded from https://rapdb.dna.affrc.go.jp/; genome sequence file Nipponbare.fasta) is used as the reference. L32 and P106 are complete genome assemblies of two rice varieties, with genome sequence files L32.fasta and P106.fasta, respectively.

[0036] The pan-genome is constructed as follows:

[0037] 1) Generate a file named location.lg with the following contents:

[0038] Mummer=/home/lfp/soft/mummer-4.0.0beta2/

[0039] Lastz=/home/lfp/soft/lastz/src/

[0040] svmu=/home/lfp/soft/svmu/

[0041] bowtie2=/home/lfp/miniconda3/bin/

[0042] Samtools=/home/lfp/miniconda3/bin/

[0043] ref=Nipponbare.fasta

[0044] query=L32.fasta, P106.fasta

[0045] This file sets the locations of the Mummer, Lastz, svmu, bowtie2 and Samtools executables, which are called during the run. ref (the reference genome) is set to Nipponbare.fasta, and query (comparison ge...
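The patent does not say how location.lg is read. As a minimal sketch, assuming one key=value pair per line and a comma-separated query list (both inferred from the listing above), a parser could look like this. The function names parse_location_file and tool_path are hypothetical.

```python
from pathlib import PurePosixPath


def parse_location_file(text: str) -> dict:
    """Parse a location.lg-style file into a dict.

    Assumed format (inferred, not specified in the patent): one key=value
    pair per line, blank lines ignored, and the query value is a
    comma-separated list of FASTA files.
    """
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    # Turn "L32.fasta, P106.fasta" into ["L32.fasta", "P106.fasta"].
    config["query"] = [q.strip() for q in config.get("query", "").split(",") if q.strip()]
    return config


def tool_path(config: dict, key: str, executable: str) -> str:
    """Join the directory stored under `key` with an executable name."""
    return str(PurePosixPath(config[key]) / executable)


example = """\
Mummer=/home/lfp/soft/mummer-4.0.0beta2/
bowtie2=/home/lfp/miniconda3/bin/
ref=Nipponbare.fasta
query=L32.fasta, P106.fasta
"""
cfg = parse_location_file(example)
print(cfg["ref"], cfg["query"])  # Nipponbare.fasta ['L32.fasta', 'P106.fasta']
print(tool_path(cfg, "bowtie2", "bowtie2"))  # /home/lfp/miniconda3/bin/bowtie2
```

Keeping tool directories separate from executable names mirrors the file's layout, where several keys (bowtie2, Samtools) point to the same bin directory.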

Embodiment 2

[0064] A mining method for genome structural variation based on Illumina next-generation sequencing data and the pan-genome:

[0065] a) Using the pan-genome sequence constructed in Embodiment 1 as the reference genome, genome structural variation in rice material R91 was analyzed and identified from its Illumina sequencing data. The alignment software Bowtie2 was used to align the R91 sequencing data to the pan-genome and generate an alignment file. The alignment rate of the R91 data against the original reference genome (Nipponbare) was only 82.52%, whereas the alignment rate against the pan-genome reached 93.25%. This shows that the pan-genome constructed in Embodiment 1 is more complete and representative than the original reference genome (Nipponbare), markedly improves the alignment efficiency of sequencing data, and provides an important data basis for capturing more structural variations;
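Step a) can be sketched as building the Bowtie2 indexing and alignment commands and then comparing the two reported alignment rates. This is a hypothetical illustration: the patent names Bowtie2 but not its options, so the paired-end layout, thread count, and file names (pan_genome.fasta, R91_1.fq.gz, etc.) are assumptions. The commands are only built, not executed.

```python
from pathlib import PurePosixPath


def build_alignment_commands(bin_dir, pan_fasta, index_prefix, read1, read2,
                             sam_out, threads=8):
    """Return argument lists for indexing the pan-genome and aligning
    paired-end reads with Bowtie2 (hypothetical helper; options assumed)."""
    bin_dir = PurePosixPath(bin_dir)
    # bowtie2-build <reference.fasta> <index_basename>
    index_cmd = [str(bin_dir / "bowtie2-build"), pan_fasta, index_prefix]
    # bowtie2 -p <threads> -x <index> -1 <reads_1> -2 <reads_2> -S <out.sam>
    align_cmd = [
        str(bin_dir / "bowtie2"),
        "-p", str(threads),
        "-x", index_prefix,
        "-1", read1,
        "-2", read2,
        "-S", sam_out,
    ]
    return index_cmd, align_cmd


def alignment_gain(rate_ref, rate_pan):
    """Absolute gain (percentage points) and relative gain in alignment rate."""
    return rate_pan - rate_ref, rate_pan / rate_ref - 1


index_cmd, align_cmd = build_alignment_commands(
    "/home/lfp/miniconda3/bin/", "pan_genome.fasta", "pan_idx",
    "R91_1.fq.gz", "R91_2.fq.gz", "R91.sam")
print(" ".join(align_cmd))

# Rates reported in the text: 82.52% (Nipponbare) vs 93.25% (pan-genome).
points, relative = alignment_gain(82.52, 93.25)
print(f"{points:.2f} percentage points, {relative:.1%} relative")
# prints "10.73 percentage points, 13.0% relative"
```

The arithmetic shows that the reported figures correspond to roughly 10.7 percentage points more reads aligned, about 13% more than against Nipponbare alone. Passing argument lists (rather than a single shell string) to `subprocess.run` avoids shell-quoting problems if the commands are later executed.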

[0066] b) According to the variation information ...



Abstract

The invention belongs to the technical field of genomic data analysis and relates in particular to a pan-genome construction method and a corresponding structural variation mining method. Structural variations obtained through genome comparison are placed back into a linear genome, and a structural variation site information file is added, yielding a pan-genome that is linear in form, accounts for multiple forms of structural variation, and can be analyzed efficiently. This pan-genome not only captures many new structural variations that are not found with the reference genome, but also, by combining linearization with the variation site information file, presents the captured structural variations more clearly. As a result, the pan-genome is easier to understand and analyze and more useful in subsequent applications. Complete program code was written for the method and workflow of analyzing genome structural variation from the pan-genome and next-generation sequencing data, enabling efficient and accurate mining of structural variations from relatively low-cost next-generation sequencing data.
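The abstract does not describe how structural variations are "placed back" into the linear genome or what the site information file contains. As one hypothetical illustration, the sketch below handles only insertions: it splices non-reference sequences into a single reference sequence and records where each inserted segment lands. The Insertion class, the linearize function, and the record fields (ID, type, reference position, new start, new end) are all assumptions, not the patent's format; deletions, inversions, and multiple chromosomes are not handled.

```python
from dataclasses import dataclass


@dataclass
class Insertion:
    sv_id: str
    ref_pos: int  # 0-based reference position before which seq is inserted
    seq: str


def linearize(ref_seq, insertions):
    """Splice insertion sequences into a linear reference.

    Returns the new linear sequence and one site record per insertion:
    (sv_id, "INS", ref_pos, new_start, new_end), where [new_start, new_end)
    are 0-based half-open coordinates on the new sequence.
    """
    pieces, sites = [], []
    cursor = 0   # next reference base not yet copied
    new_len = 0  # length of the linear sequence built so far
    for ins in sorted(insertions, key=lambda i: i.ref_pos):
        if not 0 <= ins.ref_pos <= len(ref_seq):
            raise ValueError(f"{ins.sv_id}: position out of range")
        chunk = ref_seq[cursor:ins.ref_pos]  # reference bases before the insertion
        pieces.append(chunk)
        new_len += len(chunk)
        pieces.append(ins.seq)
        sites.append((ins.sv_id, "INS", ins.ref_pos, new_len, new_len + len(ins.seq)))
        new_len += len(ins.seq)
        cursor = ins.ref_pos
    pieces.append(ref_seq[cursor:])  # remaining reference tail
    return "".join(pieces), sites


seq, sites = linearize("AAAACCCCGGGG",
                       [Insertion("sv2", 8, "TT"), Insertion("sv1", 4, "NNN")])
print(seq)  # AAAANNNCCCCTTGGGG
for record in sites:
    print("\t".join(map(str, record)))
```

Because the output is a plain sequence plus a small tab-separated table, standard linear-reference tools can still be used on it, which is the kind of benefit the abstract attributes to combining linearization with a site information file.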

Description

Technical field
[0001] The invention belongs to the technical field of genomic data analysis, and in particular relates to a pan-genome construction method and a corresponding structural variation mining method.
Background art
[0002] A pan-genome is the sum of all genomic variations in a population. By capturing and presenting all genomic variations in a population, a pan-genome provides functional genomics research with a complete reference genome that includes all of the population's genomic variation. Pan-genomes have important applications in genome variation analysis, especially in the analysis of genome structural variation.
[0003] At present, pan-genome construction strategies and technologies have major limitations. Common approaches include iterative assembly using next-generation sequencing data and the map-to-pan strategy, in which sequencing data are aligned to the reference genome. However, the pan-genome constructed by these technologies is...


Application Information

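Patent type & authority: Applications (China)
IPC (8): G16B20/20; G16B40/00
CPC: G16B20/20; G16B40/00
Inventors: 赵均良 (Zhao Junliang), 李方平 (Li Fangping), 王健 (Wang Jian), 杨武 (Yang Wu), 刘斌 (Liu Bin), 杨梯丰 (Yang Tifeng), 陈洛 (Chen Luo)
Owner: RICE RES INST GUANGDONG ACADEMY OF AGRI SCI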