A method for identifying intra-individual SNPs in Sanger sequencing of diploid PCR products

A recognition method for diploid sequencing data, applied in character and pattern recognition, biological neural network models, instruments, etc., which addresses problems such as the inability of existing tools to analyze a single sequencing file and their unsuitability for sequencing a small number of samples.

Inactive Publication Date: 2016-09-14
SOUTH CHINA AGRI UNIV +1
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, none of these programs can analyze a single sequencing file. For example, novoSNP and Mutation Surveyor require a reference sequence, which is not feasible when the measured sequence contains introns that are absent from the sequenced reference gene sequence. PolyPhred 5.0 requires a comparison of more than 8 sequencing files to interpret SNPs accurately, which makes it unsuitable for sequencing a single sample or a small number of samples.



Examples


Embodiment Construction

[0068] The present invention will be further described below in conjunction with specific examples.

[0069] The method for identifying intra-individual SNPs in Sanger sequencing of diploid PCR products described in this embodiment is as follows:

[0070] 1) Separate the fluorescence data of the four bases, adenine (A), guanine (G), cytosine (C) and thymine (T), from the chromatogram of the diploid PCR product Sanger sequencing. The raw data are sequencing chromatogram files generated by Applied Biosystems sequencers; they have the extension .ab1 and comply with the ABIF file format. Following the "Applied Biosystems Genetic Analysis Data File Format" document released by the company in September 2009, the directory that stores the file information is obtained. The directory contains the name, the data type of the elements, the number of elements and other related attributes, and the fluorescence data of A, G, T and C can be separated through information such as the element byte, the...
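The directory lookup described in step 1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the 28-byte big-endian directory entry layout from the ABIF document, and it assumes that the per-base traces are stored under the tags `DATA` 9 to 12 with the base order in `FWO_` 1, which is common for analyzed traces but is not stated in the text above. The function names `read_directory`, `item_bytes` and `base_traces` are hypothetical.

```python
import struct

# One 28-byte ABIF directory entry, big-endian:
# name(4s) number(i) elementtype(h) elementsize(h)
# numelements(i) datasize(i) dataoffset(i) datahandle(i)
ENTRY = struct.Struct(">4sihhiiii")


def read_directory(buf):
    """Map (name, number) -> (entry tuple, entry offset) for an ABIF buffer."""
    if buf[:4] != b"ABIF":
        raise ValueError("not an ABIF file")
    # The header entry at byte 6 describes the directory itself:
    # numelements is the entry count, dataoffset is where the directory starts.
    _, _, _, _, count, _, dir_offset, _ = ENTRY.unpack_from(buf, 6)
    entries = {}
    for i in range(count):
        off = dir_offset + i * ENTRY.size
        e = ENTRY.unpack_from(buf, off)
        entries[(e[0].decode("ascii"), e[1])] = (e, off)
    return entries


def item_bytes(buf, entry, entry_offset):
    """Raw payload of one item; payloads of 4 bytes or fewer sit in the offset field."""
    datasize, dataoffset = entry[5], entry[6]
    if datasize <= 4:
        start = entry_offset + 20  # byte position of the dataoffset field
        return buf[start:start + datasize]
    return buf[dataoffset:dataoffset + datasize]


def base_traces(buf):
    """Return {base letter: list of intensities}, assuming DATA 9-12 and FWO_ 1."""
    entries = read_directory(buf)
    e, off = entries[("FWO_", 1)]
    order = item_bytes(buf, e, off).decode("ascii")  # e.g. "GATC"
    traces = {}
    for k, base in enumerate(order):
        e, off = entries[("DATA", 9 + k)]
        raw = item_bytes(buf, e, off)
        # e[4] is numelements; the traces are assumed to be 16-bit signed integers.
        traces[base] = list(struct.unpack(">%dh" % e[4], raw))
    return traces
```

With a real file, `base_traces(open(path, "rb").read())` would return one intensity list per base, where `path` points to an `.ab1` file; files that lack the assumed tags raise `KeyError`.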


Abstract

The invention discloses a method for identifying intra-individual SNPs in Sanger sequencing of diploid PCR products. First, the fluorescence data of the four bases, adenine (A), guanine (G), cytosine (C) and thymine (T), are separated from the chromatogram, and a wavelet multi-scale analysis method is used to filter and denoise the separated fluorescence data of each base. The waveform characteristics of the four-base fluorescence data are then analyzed: the first peak and the second peak of the waveform are detected, and three waveform features (peak distance, height ratio and fluctuation ratio) are selected as the elements for SNP site discrimination. A BP neural network with a 3-10-1 structure is selected as the classifier for SNP site detection and is trained with the Levenberg-Marquardt algorithm. A piecewise linear transformation maps the network output to an SNP evaluation score of 0-100, the category of the SNP site is defined as 1-5 according to the evaluation score, and the SNP confidence of the site is judged accordingly. The invention can effectively detect intra-individual SNP sites in a sequencing file.
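The classification and scoring stages of the abstract can be illustrated with a short sketch. Everything specific in it is an assumption made for illustration: the abstract gives only the 3-10-1 layout, so the tanh and sigmoid activations, the breakpoints of the piecewise linear map, and the equal-width category bins below are hypothetical, and the weights are random rather than trained with Levenberg-Marquardt.

```python
import math
import random


def forward(features, w1, b1, w2, b2):
    """3-10-1 pass: 3 inputs, 10 tanh hidden units, 1 sigmoid output (assumed activations)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical breakpoints (network output, score); not taken from the patent.
BREAKPOINTS = [(0.0, 0.0), (0.5, 20.0), (0.9, 80.0), (1.0, 100.0)]


def score(y):
    """Piecewise linear map of a network output in [0, 1] onto a 0-100 score."""
    y = min(max(y, 0.0), 1.0)
    for (x0, s0), (x1, s1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if y <= x1:
            return s0 + (s1 - s0) * (y - x0) / (x1 - x0)
    return 100.0


def category(s):
    """Assumed equal-width bins: scores 0-20 give 1, and so on up to 80-100 giving 5."""
    return min(int(s // 20) + 1, 5)


# Usage with untrained random weights; the feature order
# (peak distance, height ratio, fluctuation ratio) is also an assumption.
random.seed(0)
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(10)]
b1 = [0.0] * 10
w2 = [random.uniform(-1, 1) for _ in range(10)]
y = forward([0.2, 0.6, 0.1], w1, b1, w2, 0.0)
print(score(y), category(score(y)))
```

The piecewise map lets the middle of the output range be stretched or compressed relative to the extremes, which is one plausible reason to prefer it over a single linear scaling.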

Description

Technical field

[0001] The invention belongs to the field of computer automatic recognition, relates to bioinformatics, pattern recognition, statistics, signal processing and computer software technology, and in particular relates to a method for identifying intra-individual single nucleotide polymorphisms (SNPs) in Sanger sequencing of diploid polymerase chain reaction (PCR) products when there is no reference sequence and only a few samples are available.

Background technique

[0002] SNP refers to the variation (or polymorphism) caused by the substitution of a single nucleotide at the level of the genetic material DNA. SNPs are universal, representative, heritable and stable, and they reflect rich genetic information. As the most common heritable variation, SNPs have also become widely used genetic markers. SNPs may lead to differences in individual phenotypes. For example, SNPs may be ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/00, G06K9/46, G06N3/02
Inventor: 邓继忠, 甘四明, 黄华盛, 李梅, 于晓丽, 袁之报, 金济
Owner: SOUTH CHINA AGRI UNIV