Genotype predicting method based on deep learning

A deep learning-based prediction method, applied in the fields of informatics, bioinformatics, instruments, etc., that addresses the heavy computing-resource consumption and long run time of existing genotype prediction techniques, reducing both computation time and computation volume.

Pending Publication Date: 2019-11-01
成都二十三魔方生物科技有限公司

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a genotype prediction method based on deep learning, which solves the

Examples

Embodiment 1

[0036] As shown in Figures 1 and 2, the deep learning-based genotype prediction method of the present invention comprises the following steps:

[0037] A: Construct a preliminary training set based on the collected gene fragments;

[0038] B: Perform gene phasing on the preliminary training set, and encode the two resulting haplotypes with the values 0, 1, and 0.5; split the encoded data into a training set and a test set at a ratio of 0.7:0.3;

[0039] C: Construct a neural network model according to the obtained adjacent SNP sites, and train the model on the training set;

[0040] D: After the test set is processed as in steps A and B, feed it into the trained neural network model to obtain the predicted values and the model credibility for the test set;

[0041] E: Process the gene sequence that actually needs to be predicted and feed it into the model, keep the sites whose credibility reaches a certain level as effective predictions,...
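The data-handling parts of steps B and E can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the meaning of the 0 / 1 / 0.5 codes (reference allele, alternate allele, missing call), the function names, the random seed, and the 0.9 credibility threshold are all hypothetical, since the text gives only the code values, the 0.7:0.3 ratio, and "a certain degree" of reliability.

```python
import random

# Assumed mapping (not stated in the source): 0 = reference allele,
# 1 = alternate allele, 0.5 = missing or uncertain call.
REF, ALT, UNKNOWN = 0.0, 1.0, 0.5


def encode_haplotype(alleles, ref_alleles):
    """Encode one phased haplotype as a list of 0 / 1 / 0.5 values."""
    encoded = []
    for allele, ref in zip(alleles, ref_alleles):
        if allele is None:
            encoded.append(UNKNOWN)
        elif allele == ref:
            encoded.append(REF)
        else:
            encoded.append(ALT)
    return encoded


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them at the 0.7:0.3 ratio from step B."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def keep_confident(predictions, threshold=0.9):
    """Step E sketch: keep only sites whose credibility reaches the threshold.

    `predictions` maps a site id to (predicted value, credibility).
    The 0.9 default is a placeholder, not a value from the source.
    """
    return {
        site: pred
        for site, (pred, cred) in predictions.items()
        if cred >= threshold
    }
```

For example, `encode_haplotype(["A", None, "G"], ["A", "C", "T"])` yields one code per site, and `keep_confident` drops any site whose credibility falls below the chosen cutoff.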

Embodiment 2

[0070] As shown in Figures 1 to 4, based on the method in Embodiment 1, randomly selected data from 2278 people (that is, 4556 haplotypes) were tested, and the results were calculated for comparison.

[0071] The following provides relevant terminology explanations and descriptions:

[0072] Haplotype: in genetics, a combination of alleles at multiple loci on the same chromosome that are inherited together

[0073] SNP: Single Nucleotide Polymorphism

[0074] Proceed as follows:

[0075] S1: From a genetic database of 48906 individuals, select 1086 SNP sites as centers, and take the 40 sites before and after each center as the data set. If the target site is null, the data are discarded. If an adjacent SNP site has a certainty rate of more than 10%, that site is removed and a new site is added as input. After cleaning, 896 effective target sites remain; the target site is taken out of each set as the Y value, and the remaining sites serve as the X value;
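The windowing in S1 can be sketched as below. This is an illustrative reading, not the patent's code: it assumes "40 sites before and after" means 40 sites on each side of the target (80 inputs in total), and it skips targets that lack a full window, a boundary rule the text does not specify. The function name `build_example` is hypothetical, and the adjacent-site replacement rule from S1 is left out because the text does not describe how the new site is chosen.

```python
def build_example(sites, target_index, flank=40):
    """Return (x, y) for one target site, or None if the example is discarded.

    `sites` is the ordered list of genotype values along the chromosome;
    None marks a null value.
    """
    y = sites[target_index]
    if y is None:
        # S1: a null target site means the data are discarded.
        return None
    start = target_index - flank
    stop = target_index + flank + 1
    if start < 0 or stop > len(sites):
        # Assumption: targets without a full window on both sides are skipped.
        return None
    # X is every site in the window except the target itself.
    x = sites[start:target_index] + sites[target_index + 1:stop]
    return x, y
```

With `flank=40`, each surviving example has 80 input values and one target value.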

[0076] S2: Perform gene phasing ...

Abstract

The invention discloses a genotype prediction method based on deep learning. The method comprises the following steps: A, constructing a preliminary training set from collected gene fragments; B, performing gene phasing on the preliminary training set, encoding the two resulting haplotypes with the values 0, 1, and 0.5, and splitting the encoded data into a training set and a test set; C, constructing a neural network model according to the obtained adjacent SNP sites, and training the model on the training set; D, processing the test set as in steps A and B, feeding it into the trained neural network model, and calculating the predicted values and model credibility for the test set; and E, processing the gene sequence that actually needs to be predicted and feeding it into the model, keeping the sites whose credibility reaches a certain level as effective predictions, and outputting the result. The method addresses the high computing-resource consumption and long run time of existing gene prediction technology.

Description

Technical field

[0001] The invention relates to the technical field of gene sequencing, in particular to a genotype prediction method based on deep learning.

Background

[0002] During gene sequencing, detection quality fluctuates because of external environmental factors such as pressure, temperature, and air, as well as internal factors such as chip treatment, dosage errors, and the machine's inherent error rate. The detected gene sequence therefore contains some randomly missing sites and a small number of sequencing errors. Missing sites and sequencing errors at key positions affect the accuracy of gene interpretation results, so these genes need to be repaired.

[0003] At present, IMPUTE imputation technology is usually used to repair genes, but the existing IMPUTE imputation technology is too slow to be widely used in large-scale gene prediction. The present technology makes up for this defect.

Contents of the inven...

Application Information

IPC(8): G16B5/00, G16B25/00
Inventor: 叶伟健, 杨武兵, 王勉
Owner: 成都二十三魔方生物科技有限公司