Ancestor source polymorphism prediction method based on big data artificial intelligence algorithm

A technology of artificial intelligence and predictive methods, applied in the field of big data research

Active Publication Date: 2019-07-09
成都二十三魔方生物科技有限公司

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide an ancestral polymorphism prediction method based on a big data artificial intelligence algorithm, in which ancestral polymorphism is predicted by a purpose-built artificial intelligence algorithm. The method overcomes a shortcoming of existing population polymorphism methods, which cannot effectively distinguish genetically very similar groups such as populations within China, and it can also refine the prediction to individual chromosome genome segments. It plays an important role in population genetics research and in locating the genome segments involved in certain population-associated diseases.

Examples

Embodiment 1

[0042] As shown in Figure 1, the ancestral polymorphism prediction method of the present invention, based on a big data artificial intelligence algorithm, comprises the following steps:

[0043] A: According to the population genome data, construct a reference population genetic data training set, and record the population samples of the training set;

[0044] B: Carry out gene orientation (that is, haplotype phasing) on the population samples of the training set, dividing the genomic data of each sample into two haplotypes; encode the two oriented haplotypes with 1 and -1; and, at the same time, divide the genome into windows according to the number of SNPs;

[0045] C: Select the best classifier through the voting strategy, form a window observation sequence, and use the result of the classifier as the input to the next step;

[0046] D: Construct the transition matrix and emission matrix of the window, establish a hidden Markov model, and use the hidden Markov model to correct the observation sequen...
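Steps B and C can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the mapping of allele 0 to -1 and allele 1 to 1, the window size, the sample data, and the plain majority vote over per-window classifier labels are all assumptions made for illustration. The source only states that the oriented haplotypes are encoded with 1 and -1, that windows are defined by SNP count, and that a voting strategy is used.

```python
from collections import Counter


def encode_haplotype(alleles):
    """Encode a phased haplotype given as 0/1 alleles into -1/1 values.

    Assumption: allele 0 maps to -1 and allele 1 maps to 1.
    """
    return [1 if allele == 1 else -1 for allele in alleles]


def split_windows(encoded, snps_per_window):
    """Split an encoded haplotype into consecutive windows by SNP count.

    The last window may be shorter when the SNP count is not a multiple
    of snps_per_window.
    """
    return [
        encoded[start:start + snps_per_window]
        for start in range(0, len(encoded), snps_per_window)
    ]


def majority_vote(labels):
    """Return the label predicted most often for one window.

    Assumption: plain majority voting; ties go to the label seen first.
    """
    return Counter(labels).most_common(1)[0][0]


# Hypothetical phased sample: two haplotypes over seven SNPs.
haplotype_a = [0, 1, 1, 0, 1, 0, 0]
haplotype_b = [1, 1, 0, 0, 0, 1, 0]

windows_a = split_windows(encode_haplotype(haplotype_a), snps_per_window=3)
windows_b = split_windows(encode_haplotype(haplotype_b), snps_per_window=3)

# Hypothetical labels from three classifiers for each window of haplotype_a.
votes_per_window = [
    ["CHB", "CHS", "CHB"],
    ["CHS", "CHS", "JPT"],
    ["CHB", "CHB", "CHB"],
]
observation_sequence = [majority_vote(votes) for votes in votes_per_window]
```

The resulting `observation_sequence` holds one population label per window; in the source, a sequence of this kind is what the hidden Markov model of step D then corrects.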

Embodiment 2

[0077] As shown in Figures 1 and 2, based on the method of Embodiment 1, the present invention is used to predict ancestral polymorphisms on the public 1000 Genomes Project database; the results are computed and their performance is compared.

[0078] The following provides relevant terminology explanations and descriptions:

[0079] 1000 Genomes Project: the Thousand Genomes Project

[0080] HGDP: Human Genome Diversity Project

[0081] Haplotype: In genetics, a combination of alleles at multiple loci on the same chromosome

[0082] SVM: Support Vector Machine, a machine learning method for artificial intelligence

[0083] HMM: Hidden Markov Model

[0084] SNP: Single Nucleotide Polymorphism

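[0085] CHB: Northern Han Chinese (the 1000 Genomes Project code for Han Chinese in Beijing)

[0086] CHS: Southern Han Chinese

[0087] JPT: Japanese (the 1000 Genomes Project code for Japanese in Tokyo)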

[0088] S1. According to the 1000 Genomes Project database, obtain the genome data of the public population, and obtain the population polymorphism label corresponding to each sample; select thr...

Abstract

The invention discloses an ancestral polymorphism prediction method based on a big data artificial intelligence algorithm, comprising the following steps: A, constructing a training set from population genome data; B, performing gene orientation on the population samples of the training set, encoding the two oriented haplotypes as 1 and -1, and dividing the genome into windows; C, selecting an optimal classifier through a voting strategy to form a window observation sequence, and taking the result of the classifier as the input to the next step; D, constructing a transition matrix and an emission matrix for the windows and establishing a hidden Markov model; and E, predicting the probability distribution of the hidden states with the hidden Markov model, solving for the optimal ancestral source label, and outputting that label as the final result. The method overcomes the shortcomings of existing population polymorphism methods and can finely predict the polymorphism of chromosome genome segments; it plays an important role in population genetics research and in locating the genome segments of certain population-associated diseases.
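Steps D and E can be sketched in Python as a small hidden Markov model decoder. The source does not name the decoding algorithm, so the use of Viterbi decoding here is an assumption, as are the function name `viterbi`, the two-state setup, and every probability value; all of them are illustrative only. The sketch takes a per-window observation sequence of classifier labels and returns the most probable sequence of hidden ancestry labels.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path (log-space Viterbi).

    All probabilities must be strictly positive, and observations must be
    non-empty, because the sketch takes logarithms without any smoothing.
    """
    # best[t][s]: log-probability of the best path that ends in state s at window t.
    best = [{
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }]
    back = [{}]  # back[t][s]: predecessor of s on that best path.
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Illustrative two-population model; every number below is made up.
states = ["CHB", "CHS"]
start_p = {"CHB": 0.5, "CHS": 0.5}
# Transition matrix: neighbouring windows usually share the same ancestry.
trans_p = {
    "CHB": {"CHB": 0.9, "CHS": 0.1},
    "CHS": {"CHB": 0.1, "CHS": 0.9},
}
# Emission matrix: the window classifier is right 80% of the time.
emit_p = {
    "CHB": {"CHB": 0.8, "CHS": 0.2},
    "CHS": {"CHB": 0.2, "CHS": 0.8},
}

observed = ["CHB", "CHB", "CHS", "CHB", "CHB"]
smoothed = viterbi(observed, states, start_p, trans_p, emit_p)
```

With these made-up parameters the isolated `CHS` call in the third window is overridden, because two ancestry switches cost more than a single classifier error. That is the sense in which a hidden Markov model can correct a window observation sequence.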

Description

Technical field

[0001] The invention relates to the technical field of big data research, and in particular to an ancestral polymorphism prediction method based on a big data artificial intelligence algorithm.

Background

[0002] Since humans migrated out of Africa, each stage of human evolution may have carried different gene mutations, and each change in genetic information may cause species diversity. Over thousands of generations of continuous reproduction, mutations have accumulated in the human genome, which may lead to differences among human groups. Predicting the genetic diversity of different populations, and accurately locating and assessing the genetic diversity of chromosome genome segments, benefits population research and analysis as well as the localization of disease genes on the genome.

[0003] Current population polymorphism prediction algorithms cannot effectively distinguish some genetically similar groups. For e...

Application Information

IPC(8): G06N3/12
CPC: G06N3/126
Inventors: 叶伟健, 杨武兵, 王勉
Owner: 成都二十三魔方生物科技有限公司