Sequencing data mutation analysis system

A sequencing data analysis system, applied in the field of biomedicine, which addresses problems such as false-positive variants and the small number of variants that are actually related to disease

Active Publication Date: 2018-11-30
PEKING UNION MEDICAL COLLEGE HOSPITAL CHINESE ACAD OF MEDICAL SCI

AI Technical Summary

Problems solved by technology

Human diseases are related to genomic variants. Although sequencing can find millions of variants, a proportion of them are false positives, and the number of variants that are actually related to disease is very small.



Examples


Embodiment 1

[0157] Example 1: A sequencing data mutation analysis system

[0158] A sequencing data mutation analysis system, comprising a file renaming module, a quality control module, a sequence comparison module, a mutation detection module, a mutation annotation module, a scoring and rating module, a filtering module, and a mutation comment and remark module;
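As a rough illustration of how the eight modules could be chained, the following Python sketch runs placeholder stages in a fixed order. The order simply follows the listing above, which is an assumption; the stage bodies, the `state` dictionary and all names (`make_stage`, `run_pipeline`, `MODULE_ORDER`) are hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict, List, Tuple

# A stage is a (name, function) pair; each function takes and returns a state dict.
Stage = Tuple[str, Callable[[Dict], Dict]]

# Assumed order: the order in which the modules are listed in the text.
MODULE_ORDER: List[str] = [
    "file_renaming",
    "quality_control",
    "sequence_comparison",
    "mutation_detection",
    "mutation_annotation",
    "scoring_and_rating",
    "filtering",
    "mutation_comment_and_remark",
]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def run(state: Dict) -> Dict:
        new_state = dict(state)
        new_state["completed"] = state.get("completed", []) + [name]
        return new_state
    return (name, run)


def run_pipeline(stages: List[Stage], state: Dict) -> Dict:
    """Apply every stage in order, passing the state from one to the next."""
    for _name, fn in stages:
        state = fn(state)
    return state


PIPELINE = [make_stage(name) for name in MODULE_ORDER]
result = run_pipeline(PIPELINE, {"sample": "S1"})
```

In a real system each placeholder would be replaced by the work of the corresponding module; the sketch only shows the hand-off of state between modules.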

[0159] The file renaming module converts sequencing numbers into analysis numbers and merges multiple fastq files of the same sample. A correspondence table between sequencing numbers and sample numbers is established and named id.dic. A Python script then renames each fastq file named after a sequencing number to a fastq file named after the sample number (or another analysis number). If a sample was run on the sequencer multiple times or on multiple lanes, its files are merged automatically, as long as the sample numbers are consistent.
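The renaming and merging step described above might look like the following minimal Python sketch. The patent does not give the layout of id.dic or the file naming scheme, so both are assumptions here: id.dic is taken to be a two-column, tab-separated table (sequencing number, sample number), and input files are taken to be uncompressed and named `<sequencing number>_<read>.fastq`. The function names are hypothetical.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def read_id_table(table_path):
    """Parse the assumed two-column, tab-separated table:
    sequencing number <TAB> sample number."""
    mapping = {}
    for line in Path(table_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        seq_id, sample_id = line.split("\t")[:2]
        mapping[seq_id] = sample_id
    return mapping


def rename_and_merge(fastq_dir, table_path, out_dir):
    """Write one fastq per (sample number, read), concatenating every
    input file whose sequencing number maps to that sample number."""
    mapping = read_id_table(table_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group input files by target sample number and read label.
    groups = defaultdict(list)
    for path in sorted(Path(fastq_dir).glob("*.fastq")):
        # Assumed naming: <sequencing number>_<read>.fastq, e.g. A1_R1.fastq
        seq_id, _, read = path.stem.rpartition("_")
        if seq_id in mapping:
            groups[(mapping[seq_id], read)].append(path)

    outputs = {}
    for (sample_id, read), paths in groups.items():
        target = out_dir / f"{sample_id}_{read}.fastq"
        with open(target, "wb") as dst:
            for src_path in paths:
                with open(src_path, "rb") as src:
                    shutil.copyfileobj(src, dst)  # append this run or lane
        outputs[(sample_id, read)] = target
    return outputs
```

Because files are grouped by sample number rather than by sequencing number, runs or lanes that share a sample number end up in the same output file, which matches the merging condition stated in the text.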

[0160] The quality control module data pruning u...

Embodiment 2

[0219] Embodiment 2: A specific running example

[0220] 1. Data introduction

[0221] Data Type: Whole Exome Sequencing

[0222] Tissue source: DNA from blood of brain arteriovenous malformation (BAVM) patients and their parents

[0223] Experimental Design: Exon Capture Sequencing

[0224] Sequencing platform: Illumina HiSeq 4000

[0225] 2. System use

[0226] The whole-exome sequencing data analysis process is shown in Figure 2. It includes renaming of sequencing data, quality assessment and control of sequencing data, detection and annotation of mutations, scoring and commenting of mutations, etc. The function modules integrated in the software are then used to carry out each analysis step in turn:

[0227] 1) Use the file renaming module to rename the sequencing number to the analysis number, unify the naming format, and merge multiple fastq files of the same sample to establish a correspondence table between the sequencing number and the sample number;...


Abstract

The invention relates to a sequencing data analysis system. The analysis system comprises a file renaming module, a quality control module, a sequence comparison module, a mutation detection module, a mutation annotation module, a scoring and grading module, a filtering module and a mutation comment and remark module. Starting from the raw fastq-format data produced by the sequencer, the system detects, annotates and scores mutations for either a single sample or family samples. After quality control, it finally produces two files: one containing all mutations of the samples together with their annotation and scoring information, and one containing all rare mutations of the samples together with their annotation and scoring information. This makes it convenient to mine the information in sequencing data more quickly, accurately and comprehensively.

Description

Technical field

[0001] The invention belongs to the field of biomedicine and relates to a gene mutation analysis system.

Background technique

[0002] With the development of sequencing technology and the reduction of cost, in the field of human health, human genome sequencing will become the mainstream trend in the future, and precision medicine will be the ultimate goal of sequencing. Therefore, how to accurately discover the variations in the sequencing results and comprehensively annotate the discovered variations has become a necessary means to achieve precision medicine.

[0003] The discovery of variant sites refers to the search for different base types at the same positions in the human individual genome and the human reference genome. These variant sites may be pathogenic sites that affect human health or cause human diseases. Based on the next-generation sequencing technology, the sequence obtained by sequencing is compared with the genome, and the difference bas...


Application Information

IPC(8): G06F19/20
Inventor: 吴南, 吴志宏, 邱贵兴, 赵森, 吴勇, 闫子慧, 杨欣壮
Owner PEKING UNION MEDICAL COLLEGE HOSPITAL CHINESE ACAD OF MEDICAL SCI