Methods of detecting structural variations in genomic regions

A structural variation detection technology for genomic regions, applied in the field of bioinformatics. It addresses the shortage of reliable structural variation detection methods and the false positives produced by existing structural variation detection software, both of which limit the application of structural variation detection.

Pending Publication Date: 2021-02-09
GUANGZHOU BURNING ROCK DX CO LTD

AI Technical Summary

Problems solved by technology

At present, structural variation detection software using one or more of the above signals has significant false positive problems, which greatly affects the application of structural variation detection in the field of precisio...



Examples


Embodiment 1

[0133] Example 1. Structural variation detection method

[0134] In the examples of the present disclosure, the following method (markSV method) was used to detect structural variation in genomic regions.

[0135] 1. Sequence alignment file generation: After library preparation, the samples to be tested are sequenced on the Illumina sequencing platform to generate FASTQ files. After quality control of the FASTQ files, the alignment software BWA-MEM is used to align the reads to the human reference genome (hg19 / b37) and generate a SAM file. The SAM file is converted into a BAM file with samtools, and the BAM file is used as the input file for subsequent detection.
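The alignment step above can be sketched as a small Python helper that only builds the command lines. This is an illustrative sketch, not the patent's actual pipeline: the function name, file names, and thread count are hypothetical, paired-end input is assumed, and the sort and index steps are an added assumption (the text only states the SAM to BAM conversion).

```python
import subprocess


def alignment_commands(ref_fasta, fastq_r1, fastq_r2, out_prefix, threads=4):
    """Build argument lists for FASTQ -> SAM -> BAM (paired-end input assumed)."""
    sam = f"{out_prefix}.sam"
    bam = f"{out_prefix}.bam"
    sorted_bam = f"{out_prefix}.sorted.bam"
    return [
        # Align reads to the reference with BWA-MEM and write a SAM file.
        ["bwa", "mem", "-t", str(threads), "-o", sam, ref_fasta, fastq_r1, fastq_r2],
        # Convert SAM to BAM with samtools (the step named in the text).
        ["samtools", "view", "-b", "-o", bam, sam],
        # Assumption: coordinate-sort and index the BAM for downstream tools.
        ["samtools", "sort", "-o", sorted_bam, bam],
        ["samtools", "index", sorted_bam],
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(alignment_commands("hg19.fa", "sample_R1.fastq", "sample_R2.fastq", "sample"))` would execute the steps, provided `bwa` and `samtools` are installed and the reference has been indexed with `bwa index`.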

[0136] 2. Insert length outlier calculation: The reads in the BAM file are read to estimate the parameters of the insert length distribution and the outlier threshold. If the BAM file contains more than 1 million reads, 1 million reads are randomly select...
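Because the text is truncated here, the exact estimator is not known. The following is a minimal sketch of one common approach, assuming a median and median-absolute-deviation (MAD) rule. The function name, the `n_mads` cutoff, and the seed are hypothetical choices, not values from the patent. Input is a sequence of insert lengths, for example the signed TLEN values from a BAM file.

```python
import random
import statistics


def insert_size_thresholds(insert_sizes, max_reads=1_000_000, n_mads=5.0, seed=0):
    """Estimate insert length center, spread, and outlier bounds.

    Assumption: a median +/- n_mads * MAD rule; the patent's estimator is not shown.
    """
    # TLEN is signed and 0 when unavailable, so drop zeros and take absolute values.
    sizes = [abs(s) for s in insert_sizes if s != 0]
    if not sizes:
        raise ValueError("no usable insert lengths")
    # Randomly subsample when there are more reads than max_reads, as in the text.
    if len(sizes) > max_reads:
        sizes = random.Random(seed).sample(sizes, max_reads)
    median = statistics.median(sizes)
    mad = statistics.median([abs(s - median) for s in sizes])
    return {
        "n_used": len(sizes),
        "median": median,
        "mad": mad,
        "lower": max(0, median - n_mads * mad),
        "upper": median + n_mads * mad,
    }
```

Read pairs whose insert length falls outside `lower` and `upper` would then be treated as candidate discordant pairs under this assumed rule.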

Embodiment 2

[0159] Example 2. Simulation data performance confirmation

[0160] The simulated data covers 79 common fusion genes, with regions totaling 11.81M. Various SV types were simulated, for a total of 20,000 cases with SVs ranging in size from 50 bp to 1,000,000 bp. The sequencing depth was set at 200x, and the SV abundance was set at 50%. The method described in Example 1 (markSV method) was compared with three other mainstream SV analysis tools: Delly v0.7.9, Lumpy v0.2.13, and Manta v1.4.0. All analyses were performed with default parameters. Only results with FILTER set to PASS in the VCF file were kept; because Lumpy does not set FILTER, all of its results were kept. The analysis results are as follows:
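The filtering rule described above can be sketched in plain Python. This is an illustrative helper, not the authors' script: the function name and the `caller_sets_filter` flag are hypothetical. The only format detail relied on is that FILTER is the seventh tab-separated column of a VCF record.

```python
def filter_vcf_lines(lines, caller_sets_filter=True):
    """Keep header lines and records whose FILTER column is PASS.

    For a caller that does not set FILTER (Lumpy in the text), pass
    caller_sets_filter=False to keep every record.
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # always keep VCF header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            raise ValueError("malformed VCF record: fewer than 8 columns")
        # Column 7 (index 6) is FILTER.
        if not caller_sets_filter or fields[6] == "PASS":
            kept.append(line)
    return kept
```

For Delly and Manta output the default call applies; for Lumpy output, `filter_vcf_lines(lines, caller_sets_filter=False)` keeps all records.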

[0161] Table 1. Sensitivity and precision of markSV on 20,000 cases of broad-spectrum simulated data

[0162]

[0163] Table 2. Sensitivity and precision of four SV analysis software in 20,000 cases of broad-spectrum simulation data

[0164] SV...

Embodiment 3

[0166] Example 3. Reference standard data

[0167] The reference standard data were prepared by serial dilution of mixed samples of the HD-C670 cell line, which contains two fusions (EML4-ALK, CD74-ROS1); the diluted abundance was calibrated on the ddPCR platform. To ensure reproducibility, each gradient was repeated several times across different machine batches, reagent batches, and experimental operators, for a total of 58 samples. The results are as follows:

[0168] Table 3. Summary of Standards Data

[0169]

[0170] The method described in Example 1 (markSV method) was compared with three other mainstream SV analysis tools: Delly v0.7.9, Lumpy v0.2.13, and Manta v1.4.0. All analyses were performed with default parameters. Only results with FILTER set to PASS in the VCF file were kept; because Lumpy does not set FILTER, all of its results were kept. The analysis results are as follows:
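For reference, the two metrics reported in these comparisons follow the standard definitions: sensitivity = TP / (TP + FN) and precision = TP / (TP + FP). The sketch below assumes calls have already been matched against the known truth set. The function name is hypothetical and the counts in the usage note are made-up toy numbers, not results from the patent.

```python
def sensitivity_precision(tp, fp, fn):
    """Return (sensitivity, precision) from true/false positive and false negative counts."""
    # Sensitivity: fraction of true SVs that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Precision: fraction of reported SVs that are true.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, precision
```

For example, with toy counts of 90 true positives, 10 false positives, and 30 false negatives, `sensitivity_precision(90, 10, 30)` gives a sensitivity of 0.75 and a precision of 0.9.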

[0171] Table 4. Sensitivity and preci...


Abstract

The present disclosure relates to methods of detecting structural variations in genomic regions, including deletions, duplications, inversions, and translocations, using high-throughput sequencing data. The present disclosure also provides systems, devices, and computer-readable media for detecting structural variations in genomic regions.

Description

Technical field:

[0001] The present invention belongs to the field of bioinformatics, and in particular relates to methods and systems for detecting structural variation (SV) in genomic regions using high-throughput sequencing data.

[0002] Technical background:

[0003] Structural variants (SV) are one of the main forms of variation in the human genome. Structural variations include deletions, duplications, inversions, and translocations of chromosomal segments. Structural variants are present in both normal and tumor genomes, although they occur more frequently in tumor genomes. Structural variations in some genes may be associated with genetic risk and sensitivity to targeted therapy.

[0004] High-throughput sequencing technology, also known as next-generation sequencing technology (NGS), provides convenience for low-cost and large-scale detection of structural variations. In addition, detection methods based on next-generation sequencing technology can detect structu...


Application Information

IPC(8): G16B20/20, G16B20/30, G06K9/62
CPC: G16B20/20, G16B20/30, G06F18/23
Inventor: 魏从翀, 刘成林, 张周, 毕腾腾, 王洪明, 张之宏, 揣少坤, 汉雨生
Owner GUANGZHOU BURNING ROCK DX CO LTD