Perl-based automated analysis method for population-specific SNP loci

An automated analysis technology for population-specific SNPs, applied in fields such as genomics, instrumentation, and proteomics. It addresses the lack of a process for identifying population-specific SNPs and improves analysis convenience, data processing efficiency, and server usage efficiency.

Status: Inactive
Publication date: 2019-03-08
SHANGHAI PASSION BIOTECHNOLOGY CO LTD

AI Technical Summary

Problems solved by technology

[0005] Many software tools exist for identifying the SNP sites of a single sample, but there is no process for identifying population-specific SNPs directly from the SNP site information of single samples.



Examples


Embodiment 1

[0022] Referring to Figure 1, the Perl-based automatic analysis method for population-specific SNP sites shown there comprises the following steps:

[0023] (1) Prepare raw data for analysis;

[0024] The data is a VCF file, a common file format for storing variant site information in bioinformatics analysis. It is compatible with the result files of all mainstream software for identifying variant sites (samtools, GATK, etc.).
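
As an illustration of the input described in [0024], the following minimal sketch reads VCF text and collects the per-sample FORMAT fields of each record. It is written in Python for brevity rather than in the Perl the method itself uses, and it is not the patent's script: the function name `parse_vcf_lines`, the sample names `S1` and `S2`, and the example record are all hypothetical.

```python
def parse_vcf_lines(lines):
    """Yield (chrom, pos, ref, alt, calls) for each VCF data line.

    `calls` maps each sample name to a dict of its FORMAT fields,
    e.g. {"S1": {"GT": "0/1", "DP": "12", "GQ": "45"}}.
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        keys = fields[8].split(":")  # FORMAT column, e.g. GT:DP:GQ
        calls = {
            name: dict(zip(keys, raw.split(":")))
            for name, raw in zip(samples, fields[9:])
        }
        yield chrom, int(pos), ref, alt, calls


# Hypothetical two-sample example; S2 has a missing genotype.
EXAMPLE = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP:GQ\t0/1:12:45\t./.:.:.",
]
records = list(parse_vcf_lines(EXAMPLE))
```

Values are kept as strings here so that the VCF missing-value marker `.` survives parsing; converting them to numbers is left to the filtering step.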

[0025] (2) Set the criteria for SNP site filtering;

[0026] The main criteria are:

[0027] a. The accuracy of the SNP site, that is, the GQ value in the VCF file; the default threshold is 30.

[0028] b. The sequencing depth of the SNP site, that is, the DP value in the VCF file; the default threshold is 10.

[0029] c. Whether to consider missing calls, that is, cases where the genotype in the VCF file is ./.; by default they are not considered.
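
The three criteria above can be sketched as a single per-sample check. This is a minimal Python illustration, not the patent's Perl code, and it rests on two assumptions the text does not state: the thresholds are treated as inclusive minimums (GQ >= 30, DP >= 10), and "not considered" is read as "a missing genotype fails the filter unless explicitly allowed". The function name `passes_filter` and its parameter names are hypothetical.

```python
def passes_filter(call, min_gq=30, min_dp=10, keep_missing=False):
    """Return True if one sample's call at a site passes the filter.

    `call` is a dict of FORMAT fields as strings, e.g.
    {"GT": "0/1", "GQ": "45", "DP": "12"}.
    """
    gt = call.get("GT", "./.")
    alleles = gt.replace("|", "/").split("/")
    if all(a == "." for a in alleles):
        # Missing genotype (./.): kept only when the caller opts in.
        # Assumption: this is what "not considered by default" means.
        return keep_missing
    try:
        gq = float(call.get("GQ", "."))
        dp = float(call.get("DP", "."))
    except ValueError:
        return False  # GQ or DP absent or "." for a called genotype
    # Assumption: thresholds are inclusive lower bounds.
    return gq >= min_gq and dp >= min_dp
```

For example, `passes_filter({"GT": "0/1", "GQ": "45", "DP": "12"})` passes with the defaults, while the same call with a DP of 5 is rejected.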

[0030] (3) Define specific criteria;

[0031] That is, set the values of thresholds A and B...


Abstract

The invention discloses a Perl-based automated analysis method for population-specific SNP loci. The method comprises the following steps: (1) filtering the samples; (2) defining specificity; (3) defining the comparison populations; and (4) selecting the comparison mode. The beneficial effects are as follows: 1. the method is an automated analysis method based on a Perl script that analyzes VCF files fully automatically, improving data processing efficiency and server usage efficiency; 2. the input files are in VCF format and connect seamlessly with the variant result files generated by all existing mainstream software, which greatly improves the convenience of analysis; and 3. the workflow reserves different parameter settings, so that comparisons between two populations as well as among three or more populations can be performed, meeting the needs of different research purposes.

Description

Technical field

[0001] The invention relates to the field of high-throughput sequencing biological information analysis in the technical field of molecular biology, and specifically refers to an automatic analysis method for population-specific SNP sites based on the Perl language.

Background technique

[0002] The second-generation sequencing technology represented by Illumina, also known as high-throughput sequencing technology, has the advantages of high throughput, low cost, and short sequencing time, and is currently widely used in the field of molecular biology. With the help of high-throughput sequencing technology, scientists can quickly obtain the genome information of each research sample, which greatly promotes the development of molecular biology.

[0003] Population refers to a group of individuals with relatively similar genetic background in a species in a natural state or in artificial classification. Population-specific molecular markers refer to molecular mark...


Application Information

IPC(8): G16B20/30
Inventor: 刘坤艳
Owner: SHANGHAI PASSION BIOTECHNOLOGY CO LTD