Population classification of genetic data set using tree based spatial data structure

A technique combining genetic data with a spatial data structure, applied in the fields of genetic analysis and medicine. It addresses the problem that population classification does not use the complete genetic data set, and it reduces computational complexity.

Inactive Publication Date: 2015-04-22
KONINKLIJKE PHILIPS NV

AI Technical Summary

Problems solved by technology

[0007] Furthermore, population classification methods that rely on observing discrete genetic markers (e.g., population-specific indicator alleles) in a genetic data set do not use the full genetic data set in the population classification process.




Embodiment Construction

[0024] Figure 1 diagrammatically shows a system for generating a population classifier for classifying a genetic data set. The system is suitably implemented by a computer or other electronic data processing apparatus 10 that is programmed to perform the disclosed processing operations and that receives as input a plurality of genetic data sets 12 for members of a reference population. Genetic data sets can include, for example, genetic sequencing data (nuclear DNA data, mitochondrial DNA data, RNA data, methylation data, etc.) and protein expression data generated using microarrays or other laboratory processing. In some embodiments, the genetic data sets 12 include whole genome sequence (WGS) data sets or other large volumes of genetic sequence generated by a next-generation sequencing device. A genetic data set 12 may optionally include more than one type of genetic data, for example both sequencing data and microarray data. The genetic data sets 12 are substantially overlapping (i.e., ...



Abstract

Reference feature vectors are constructed representing reference genetic data sets of a reference population. The reference feature vectors are transformed using a linear transformation to generate reduced dimensionality vector representations of the reference genetic data sets of the reference population. A tree-based spatial data structure is constructed to index the reference genetic data sets as data points defined by at least some dimensions of the reduced dimensionality vector representations of the reference genetic data sets of the reference population. The linear transform may be generated by performing feature reduction on the reference feature vectors. A feature vector representing a proband genetic data set is transformed using the linear transformation to generate a reduced-dimensionality vector representation that is located in the tree-based spatial data structure to perform population assignment for the proband genetic data set.
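The pipeline in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: genotypes are encoded as allele counts (0, 1, 2) per SNP, the linear transformation is a principal component projection found by power iteration, the tree-based spatial data structure is a k-d tree, and population assignment is a majority vote among the k nearest reference points. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def fit_linear_transform(rows, k, iters=200, seed=0):
    """Return (column means, k principal directions) via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            # Deflation: remove the directions that were already found.
            for c in comps:
                dot = sum(a * b for a, b in zip(v, c))
                v = [a - dot * b for a, b in zip(v, c)]
            # w = X^T (X v), i.e. the covariance times v up to a scale factor.
            xv = [sum(a * b for a, b in zip(row, v)) for row in centered]
            w = [sum(xv[i] * centered[i][j] for i in range(n)) for j in range(d)]
            norm = math.sqrt(sum(a * a for a in w)) or 1.0
            v = [a / norm for a in w]
        comps.append(v)
    return means, comps

def project(vec, means, comps):
    """Apply the linear transformation to one feature vector."""
    centered = [x - m for x, m in zip(vec, means)]
    return tuple(sum(a * b for a, b in zip(centered, c)) for c in comps)

def build_kdtree(points, depth=0):
    """points: list of (coords, label). Returns nested tuples, or None."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def k_nearest(node, target, k, best=None):
    """Collect up to k (squared distance, label) pairs, nearest first."""
    if best is None:
        best = []
    if node is None:
        return best
    (coords, label), axis, left, right = node
    best.append((sum((a - b) ** 2 for a, b in zip(coords, target)), label))
    best.sort(key=lambda t: t[0])
    del best[k:]
    diff = target[axis] - coords[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    k_nearest(near, target, k, best)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th best match (or fewer than k matches were found so far).
    if len(best) < k or diff * diff < best[-1][0]:
        k_nearest(far, target, k, best)
    return best

def assign_population(tree, means, comps, proband, k=3):
    """Majority label among the k nearest reference points."""
    hits = k_nearest(tree, project(proband, means, comps), k)
    return Counter(label for _, label in hits).most_common(1)[0][0]

# Hypothetical reference population: allele counts at six SNPs.
reference = [
    ([0, 0, 1, 2, 2, 2], "A"), ([0, 1, 0, 2, 2, 1], "A"),
    ([1, 0, 0, 2, 1, 2], "A"), ([0, 0, 0, 1, 2, 2], "A"),
    ([2, 2, 1, 0, 0, 0], "B"), ([2, 1, 2, 0, 0, 1], "B"),
    ([1, 2, 2, 0, 1, 0], "B"), ([2, 2, 2, 1, 0, 0], "B"),
]
means, comps = fit_linear_transform([vec for vec, _ in reference], k=2)
tree = build_kdtree([(project(vec, means, comps), lab) for vec, lab in reference])
print(assign_population(tree, means, comps, [0, 0, 0, 2, 2, 2]))
```

Because the proband is projected with the same means and directions as the reference set, it lands in the same reduced space that the tree indexes, so the lookup touches only a few branches rather than every reference vector. In practice a library implementation of the feature reduction and of the spatial index would likely replace these hand-written versions.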

Description

Technical field

[0001] The following relates generally to genetic analysis and medicine, and to applications in those fields, including for example oncology, veterinary medicine, and the like.

Background

[0002] Large genetic data sets for individuals can be acquired using technologies such as microarrays and "next-generation" sequencing systems. Microarrays can generate tens to hundreds of thousands of genetic data points, each corresponding, for example, to a protein of interest. "Next-generation" sequencing systems can output long sequences, up to entire genome sequences comprising millions of bases or more. From such data sets, various genetic markers can be identified, such as single nucleotide polymorphisms (SNPs) and copy number variations (CNVs), that are medically informative, e.g. indicative of a particular type of cancer.

[0003] Interpretation of such...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/24, G06F17/30, G06F19/28, G16B20/20, G16B20/10, G16B20/40, G16B40/00
CPC: G06F19/24, G16B40/00, G16B20/00, G16B20/10, G16B20/20, G16B20/40
Inventor: B·查克拉巴蒂, P·穆尼亚帕, S·库马尔, R·辛格, A·马特胡尔
Owner: KONINKLIJKE PHILIPS NV