SNP marker combinations for inferring populations in different geographic regions of Asia
The technology, applied in the biological field, uses SNP marker combinations to infer geographic populations. It addresses problems such as the difficulty of applying genetic analysis to very large sample populations, the large amount of DNA sample required, and the high cost of genome-wide SNP analysis.
Examples
Embodiment 1
[0041] In this example, samples from six geographic populations (East Asia, West Asia, South Asia, North Asia, Central Asia, and the Southeast Asian islands) were selected from the 1000 Genomes Project (1000GP), EGDP, HGDP, SGDP, SSIP, and SSMP databases, giving genome-wide SNP data for a total of 2,276 samples. The populations and sample sizes included in each region are shown in Table 1. After the data from the different sources were integrated and merged, and after site quality control and individual screening, an original data set of 349,381 SNP sites from 2,128 unrelated individuals was formed for the subsequent construction of SNP combinations indicative of geographic ancestry.
[0042] 1000GP: the 1000 Genomes Project (Thousand Genomes Project), A global reference for human genetic variation, Nature 526(7571) (2015) 68-74.
[0043] EGDP: Estonian Biocentre Human Genome Diversity Panel, Genomic analyses inform on migration events during the peopling of Eurasia, Natur...
Embodiment 2
[0059] Example 1 extracts, from a total of 349,381 SNPs, SNP combinations that are informative for ancestry inference. The algorithm weighs each SNP's own ancestry-inference power against the information overlap between different SNPs, so as to obtain the best combined inference performance. The screened SNPs are added one by one and the average classification accuracy (AAC) is calculated at each step; the resulting curve is shown in Figure 1. The classification accuracy (AC) is defined as the ratio of the number of correctly classified samples to the total number of test samples.
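[0060] $$AC = \frac{\text{number of correctly classified test samples}}{\text{total number of test samples}}$$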
[0061] The average classification accuracy (AAC) is defined as the mean of the AC values obtained over 1,000 repetitions, each with a randomly selected test set.
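The AC and AAC definitions above can be illustrated with a minimal sketch. The source names neither the classifier nor the train/test split ratio, so the nearest-centroid classifier, the `test_fraction` parameter, the 0/1/2 genotype coding, and all function names below are assumptions made only for illustration, not the method of the patent.

```python
import random
from collections import Counter


def classification_accuracy(true_labels, predicted_labels):
    """AC: correctly classified test samples / total test samples."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def nearest_centroid_predict(train_x, train_y, test_x):
    """Placeholder classifier (an assumption; the source does not name one).

    Each sample is a list of genotype codes (e.g. 0/1/2 allele counts).
    A test sample is assigned to the population whose mean genotype
    vector is closest in squared Euclidean distance.
    """
    counts = Counter(train_y)
    sums = {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, g in enumerate(row):
            acc[j] += g
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def closest(row):
        return min(
            centroids,
            key=lambda lab: sum((g - c) ** 2 for g, c in zip(row, centroids[lab])),
        )

    return [closest(row) for row in test_x]


def average_classification_accuracy(genotypes, labels, test_fraction=0.2,
                                    repeats=1000, seed=0):
    """AAC: mean AC over `repeats` randomly selected test sets.

    `test_fraction` is a hypothetical parameter; the split ratio is not
    given in the source.
    """
    rng = random.Random(seed)
    n = len(labels)
    n_test = max(1, round(n * test_fraction))
    if n_test >= n:
        raise ValueError("not enough samples to form a training set")
    total = 0.0
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        predicted = nearest_centroid_predict(
            [genotypes[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [genotypes[i] for i in test_idx],
        )
        total += classification_accuracy([labels[i] for i in test_idx], predicted)
    return total / repeats
```

To trace a curve of AAC against the number of SNPs, as described for Figure 1, one could call `average_classification_accuracy` repeatedly on genotype rows restricted to the first k screened SNPs, for k = 1, 2, and so on.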
[0062] In this example, three methods are used to evaluate the performance of the SNP reference system obtained in Examples 1-3. The first way is to directly compare the real ancestry with the predicted ancestry; the second way is to calcula...