Annotation method and annotation system of whole-genome variant data

Applied in the field of bioinformatics, this whole-genome variant annotation technology addresses shortcomings of existing tools, such as the large number of separate software packages, the absence of frequencies for specific populations, and the lack of guiding annotation suggestions, and thereby improves the accuracy and completeness of annotation.

Inactive Publication Date: 2016-11-23
天津诺禾医学检验所有限公司

AI Technical Summary

Problems solved by technology

[0008] Existing annotation tools, such as ANNOVAR, which was developed at the Children's Hospital of Philadelphia in the United States and is widely used, provide gene structure annotation and population variant frequency annotation based on the above points, but do not include frequencies for specific populations (such as Chinese populations). Many different software tools are provided for value annotation, and there are no guiding annotation suggestions.

Examples

Embodiment approach

[0035] According to a typical embodiment of the present invention, a method for annotating whole-genome variation data is provided. The method includes the following steps: S1, creating a variation data file: the variation data are stored in the international standard VCF format as the input file; S2, splitting multi-allelic genotypes: genotypes are first determined, with bases consistent with the reference genome represented by 0 and bases inconsistent with the reference genome represented by 1, 2, 3, ..., and then the multi-allelic SNP and InDel records are split so that alleles are represented only by 0 and 1; S3, normalizing InDel positions: InDel positions are normalized using the left-alignment and parsimony normalization method; and S4, annotation: gene structure annotation, allele frequency annotation, harmfulness prediction of variant sites, and pathogenicity annotation are performed.
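Step S3 can be illustrated with a minimal Python sketch of left alignment and parsimony for a single biallelic variant. This is not the patent's implementation: the function name `normalize_variant`, the 1-based `pos` convention, and passing the chromosome as an in-memory string are assumptions made for illustration.

```python
def normalize_variant(pos, ref, alt, chrom_seq):
    """Left-align and trim one biallelic variant.

    pos is 1-based; chrom_seq is the chromosome sequence as a string
    (hypothetical interface, chosen for this sketch).
    Returns the normalized (pos, ref, alt).
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to normalize")
    alleles = [ref, alt]

    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base (right trimming).
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if any(len(a) == 0 for a in alleles):
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            left_base = chrom_seq[pos - 1]
            alleles = [left_base + a for a in alleles]
            changed = True

    # Parsimony: drop shared leading bases while both alleles keep >= 1 base.
    while all(len(a) >= 2 for a in alleles) and alleles[0][0] == alleles[1][0]:
        alleles = [a[1:] for a in alleles]
        pos += 1

    return pos, alleles[0], alleles[1]


if __name__ == "__main__":
    genome = "GGGCACACACAGGG"
    # A CA deletion written in the middle of the repeat moves to its left end.
    print(normalize_variant(8, "CAC", "C", genome))
```

With this rule, different spellings of the same InDel inside a repeat collapse to one representation, which is what makes later database lookups by position and allele reliable.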

[0036] A situation where there are multiple genotyp...

Embodiment 1

[0057] This embodiment integrates modules and software such as bgzip (v1.0), tabix (v1.0), the norm module of BCFtools (v1.0), ANNOVAR (version 2015-03-22), and self-written programs, together with a variety of public and internal databases, and runs on Linux.
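The source does not state how these tools are invoked. As a rough sketch only, the compression, indexing, and normalization steps could be assembled as command lines like the ones below. The helper name `build_preprocess_commands` and the output file naming are hypothetical, and the flags follow current bcftools and htslib documentation, which may differ from the v1.0 releases cited above.

```python
def build_preprocess_commands(vcf_path, reference_fasta):
    """Return argv lists for compressing, indexing, and normalizing a VCF.

    Nothing is executed here; the caller may pass each list to
    subprocess.run(cmd, check=True) on a system with the tools installed.
    """
    gz_path = vcf_path + ".gz"
    # Hypothetical naming scheme for the normalized output.
    stem = vcf_path[:-4] if vcf_path.endswith(".vcf") else vcf_path
    out_path = stem + ".norm.vcf.gz"
    return [
        # bgzip writes <vcf_path>.gz in block-gzip format.
        ["bgzip", vcf_path],
        # tabix builds a position index using the VCF preset.
        ["tabix", "-p", "vcf", gz_path],
        # bcftools norm: -m -any splits multi-allelic records,
        # -f supplies the reference used for left alignment.
        ["bcftools", "norm", "-m", "-any", "-f", reference_fasta,
         "-O", "z", "-o", out_path, gz_path],
    ]


if __name__ == "__main__":
    for cmd in build_preprocess_commands("sample.vcf", "ref.fa"):
        print(" ".join(cmd))
```

Building the argument lists separately from running them keeps the sketch testable on machines without the tools installed.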

[0058] The annotation method of this embodiment is described in detail below (as shown in Figure 1):

[0059] 1) Variation data file: the data are stored in the international standard VCF 4.1 format as the input file; population, disease, and gender are optional input parameters.
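To make the input format concrete, the following minimal sketch reads the `##fileformat` header and the first five fixed columns of a VCF file. The function name `read_vcf` and the dictionary layout are assumptions for illustration; the optional population, disease, and gender parameters are not modeled.

```python
def read_vcf(lines):
    """Parse VCF text lines into (fileformat, records).

    Only CHROM, POS, ID, REF, and ALT are kept; ALT is split on commas
    so that multi-allelic sites are visible as lists with several entries.
    """
    fileformat = None
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("##fileformat="):
            fileformat = line.split("=", 1)[1]
        elif line.startswith("#"):
            continue  # other meta lines and the column header
        else:
            fields = line.split("\t")
            records.append({
                "chrom": fields[0],
                "pos": int(fields[1]),
                "id": fields[2],
                "ref": fields[3],
                "alt": fields[4].split(","),
            })
    return fileformat, records


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tC,T\t50\tPASS\t.",
    ]
    print(read_vcf(demo))
```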

[0060] 2) Multi-allelic (Multi-Allele) genotype splitting: one allele (Allele) can have multiple genotypes (Genotype); within the same population or across different populations, the genotype frequencies of an allele differ, which may lead to different phenotypes (Phenotype), different diseases, or different incidence rates, so multi-allelic sites need to be split. First, the genotype is determined, the bases con...
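The splitting idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's code: the function name `split_multiallelic` is hypothetical, and recoding every other alternate allele to 0 in each split record is a simplifying choice, since tools differ on this point.

```python
def split_multiallelic(chrom, pos, ref, alts, genotype):
    """Split one multi-allelic record into biallelic records.

    alts is a list of ALT alleles; genotype is a VCF GT string such as
    "1/2" or "0|2". In the i-th output record, allele index i becomes 1
    and every other called allele becomes 0 (assumption of this sketch).
    """
    separator = "|" if "|" in genotype else "/"
    calls = genotype.split(separator)
    records = []
    for index, alt in enumerate(alts, start=1):
        recoded = []
        for call in calls:
            if call == ".":
                recoded.append(".")  # keep missing calls as missing
            elif int(call) == index:
                recoded.append("1")
            else:
                recoded.append("0")
        records.append((chrom, pos, ref, alt, separator.join(recoded)))
    return records


if __name__ == "__main__":
    for record in split_multiallelic("chr1", 100, "A", ["C", "T"], "1/2"):
        print(record)
```

After splitting, each record carries exactly one alternate allele, so allele frequency lookups can be made per allele rather than per site.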

Abstract

The invention discloses an annotation method and an annotation system for whole-genome variant data. The method comprises the following steps: S1, creating a variant data file, wherein the variant data are stored in the international standard VCF format as the input file; S2, splitting multi-allelic genotypes, wherein genotypes are first determined, bases consistent with the reference genome are represented by 0, and bases inconsistent with the reference genome are represented by 1, 2, 3, ..., after which multi-allelic SNP and InDel records are split so that alleles are represented by 0 and 1; S3, normalizing InDel positions according to the left-alignment and parsimony normalization method; and S4, performing annotation, namely gene structure annotation, allele frequency annotation, harmfulness prediction of variant sites, and pathogenicity annotation. The annotation method and annotation system improve the completeness and accuracy of annotation information.

Description

Technical field

[0001] The present invention relates to the technical field of bioinformatics, in particular to an annotation method and annotation system for whole-genome variation data.

Background

[0002] With the development of sequencing technology and the reduction of its cost, human whole-genome sequencing is expected to become the mainstream trend in the field of human health, with precision medicine as the ultimate goal of sequencing. Accurately annotating variation in the human genome is a necessary step toward precision medicine.

[0003] The discovery of variant sites refers to the search for positions at which the base in an individual human genome differs from the base at the same position in the human reference genome. These variant sites may be pathogenic sites that affect human health or cause human diseases. Based on the next-generation sequencing technology, the sequence obtained by sequencing is compared with the genome, and the difference bases bet...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/18
CPC: G16B20/00
Inventor: 相深杨俊辉吴俊
Owner: 天津诺禾医学检验所有限公司