Whole-exome sequencing data analysis method

A sequencing data analysis method in the field of electrical digital data processing and special-purpose data processing applications. It addresses the failure of chip-based approaches to identify low-frequency and novel pathogenic variants.

Inactive Publication Date: 2016-09-07
WANKANGYUAN TIANJIN GENE TECH CO LTD

AI Technical Summary

Problems solved by technology

However, GWAS has two limitations. First, most of the identified association sites lie in intergenic regions, introns, and regulatory regions of the genome. Second, chip probes are designed from currently known variants (mostly common SNPs), so they fail to identify low-frequency and novel pathogenic variants.



Embodiment Construction

[0025] It should be noted that, in the case of no conflict, the embodiments of the present invention and the features in the embodiments can be combined with each other.

[0026] The present invention will be described in detail below with reference to the accompanying drawings and examples.

[0027] As shown in Figure 1, the whole-exome sequencing data analysis process of the present invention includes: sequencing data quality assessment and control, screening of high-quality reads, alignment of reads to the reference genome, detection of genomic variants, detection of somatic mutations in paired samples, copy number variation calculation, and functional annotation. Each analysis step is carried out in turn below using the integrated software modules.
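The ordering of the stages listed in [0027] can be illustrated with a minimal Python sketch. The stage names, the `make_stage` helper, and the `run_pipeline` function are all hypothetical and chosen only for illustration; the patent does not disclose its module interfaces, and each placeholder stage merely records that it ran rather than calling a real tool.

```python
from typing import Callable, Dict, List, Tuple

# A stage is a (name, function) pair; the function takes and returns a state dict.
Stage = Tuple[str, Callable[[Dict], Dict]]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage (hypothetical; a real stage would call a tool)."""
    def run(state: Dict) -> Dict:
        # Append this stage's name to the list of completed stages.
        return {**state, "completed": state.get("completed", []) + [name]}
    return name, run


# Stage order follows paragraph [0027]; the names themselves are assumptions.
STAGES: List[Stage] = [make_stage(n) for n in (
    "quality_assessment",
    "read_filtering",
    "alignment",
    "variant_calling",
    "paired_somatic_calling",
    "copy_number_analysis",
    "functional_annotation",
)]


def run_pipeline(sample_id: str, stages: List[Stage] = STAGES) -> Dict:
    """Run every stage in order, threading one state dict through all of them."""
    state: Dict = {"sample": sample_id}
    for _name, run in stages:
        state = run(state)
    return state


if __name__ == "__main__":
    print(run_pipeline("sample_01")["completed"])
```

Keeping the stages in a single ordered list makes the dependency between steps explicit: each stage only sees the state left by the stages before it.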

[0028] (1) Quality control of raw sequencing data. For the whole-exome data in this example, the FastQC module was used to evaluate sequencing quality. For example, the quality of the s...
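As a rough illustration of what read-level quality screening involves, the following Python sketch computes the mean Phred score of each read in FASTQ text and keeps reads above a cutoff. This is not FastQC and not the patent's module: the function names, the Phred+33 encoding, and the cutoff of 20 are assumptions made for the example.

```python
from typing import Iterator, List, Tuple


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read name, sequence, quality string) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.strip() for line in lines[i:i + 4])
        yield header[1:], seq, qual  # drop the leading '@' from the header


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(lines: List[str], min_mean_q: float = 20.0) -> List[Tuple[str, str]]:
    """Keep reads whose mean quality reaches the assumed cutoff."""
    return [
        (name, seq)
        for name, seq, qual in parse_fastq(lines)
        if qual and mean_phred(qual) >= min_mean_q
    ]


if __name__ == "__main__":
    demo = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "####"]
    print(filter_reads(demo))
```

In the demo input, `I` encodes Phred 40 and `#` encodes Phred 2, so only the first read passes the cutoff.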



Abstract

The invention provides a whole-exome sequencing data analysis method. The method comprises the following steps: 1) quality control of the sequencing data; 2) mapping of the sequencing data to the genome; 3) identification of high-confidence genomic variants from the sequencing data; and 4) annotation of the variant sites. With this method, large-scale data analysis is completed by submitting a few simple parameters, and covers quality assessment of the raw data, data denoising, and genome mapping of the sequencing reads. The upstream part takes in the raw data as it comes off the sequencer, the analysis is carried out by an automated parameter submission and analysis module, and candidate pathogenic variant sites and related genes are output, providing a basis for later experimental verification.

Description

Technical field

[0001] The invention belongs to the field of genetic information data processing, and in particular relates to a whole-exome sequencing data analysis method.

Background

[0002] With the completion of the Human Genome Project and the construction of the international human haplotype map, research that predicts and functionally characterizes disease susceptibility loci by analyzing genome information has advanced rapidly. This type of research relies mainly on biochip-based genotyping technology and uses genome-wide association study (GWAS) methods to find genetic factors associated with complex diseases. As probe density on biochips increases, especially with the design of tiling probes, the mining of disease risk sites is becoming more and more comprehensive. However, the limitations of GWAS are: most of the identified association sites are located in the intergenic regions, introns, and regulatory regions of the genome; secondly, the probes of t...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F19/20
CPC: G16B25/00
Inventor: 薛成海, 雷文婕, 刘婷婷
Owner: WANKANGYUAN TIANJIN GENE TECH CO LTD