A system and method for determining the exact pair of alleles corresponding to polymorphic genes from
sequencing data and for using the polymorphic
gene information in formulating an immunogenic composition. Reads from a
sequencing data set mapping to the target polymorphic genes in a canonical
reference genome sequence, and reads mapping within a defined threshold of the
target gene sequence locations are extracted from the
sequencing data set. Additionally, all reads from the sequencing
data set are matched against a probe reference set, and those reads that match with a high
degree of similarity are extracted. Either one of these sets of extracted reads, or the union of both, is included in a final extracted set for further analysis. The ethnicity of the individual may be inferred from the available sequencing data and may then serve as a basis for assigning prior probabilities to the
allele variants. The extracted reads are aligned to a
gene reference set of all known
allele variants. The
allele variant that maximizes a first
posterior probability or
posterior probability derived
score is selected as the first allele variant. A second
posterior probability or posterior probability derived
score is calculated for reads that map to one or more other allele variants and the first allele variant using
a weighting factor. The allele variant that maximizes the second posterior probability or posterior probability derived
score is selected as the second allele variant.

A system and method for identifying somatic changes in polymorphic loci using whole exome sequencing (WES) data. The exact pair of alleles corresponding to the polymorphic
gene are determined as described using a normal or
germline sample from an individual. A tumor or otherwise diseased sample is also retrieved from the individual and the corresponding WES data is generated. Reads corresponding to the polymorphic gene are extracted as described in the
paragraph above. These reads are then aligned to the inferred pair of allele sequences. The alignment of the
germline or normal reads to the inferred pair of alleles and the alignment of the tumor or diseased reads to the inferred pair of alleles are used simultaneously as inputs to somatic
change detection algorithms to identify somatic changes with greater precision and sensitivity.
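
The two-step allele selection described in the first paragraph can be illustrated with a minimal Python sketch. It is not the claimed implementation. The per-read log-likelihoods, the penalty for unmapped reads, the allele names (`X*01`, `X*02`, `X*03`), and the specific weighting rule (reads that already map to the first allele are down-weighted by `shared_read_weight`) are all assumptions made for illustration. The score is an unnormalized log posterior, i.e. a posterior probability derived score.

```python
import math

# Assumed penalty for a read that does not map to a given allele (hypothetical value).
UNMAPPED_LOG_LIK = math.log(1e-6)


def log_posterior_score(allele, reads, priors, weights=None):
    """Unnormalized log posterior: log prior plus weighted read log-likelihoods."""
    score = math.log(priors[allele])
    for read_id, log_liks in reads.items():
        w = 1.0 if weights is None else weights[read_id]
        score += w * log_liks.get(allele, UNMAPPED_LOG_LIK)
    return score


def call_allele_pair(reads, priors, shared_read_weight=0.5):
    """Select the first allele, then the second using down-weighted shared reads."""
    alleles = sorted(priors)

    # Step 1: the allele variant that maximizes the first score.
    first = max(alleles, key=lambda a: log_posterior_score(a, reads, priors))

    # Step 2: reads that also map to the first allele get a reduced weight
    # (assumed form of the weighting factor), then the remaining alleles are scored.
    weights = {
        read_id: shared_read_weight if first in log_liks else 1.0
        for read_id, log_liks in reads.items()
    }
    others = [a for a in alleles if a != first]
    second = max(
        others, key=lambda a: log_posterior_score(a, reads, priors, weights)
    )
    return first, second


# Toy input: read id -> {allele: log-likelihood of the read given that allele}.
match = math.log(0.9)
example_reads = {
    "r1": {"X*01": match, "X*02": match},
    "r2": {"X*01": match},
    "r3": {"X*01": match},
    "r4": {"X*02": match},
    "r5": {"X*01": match, "X*03": match},
}
# Uniform priors here; ethnicity-informed priors would replace these values.
example_priors = {"X*01": 1 / 3, "X*02": 1 / 3, "X*03": 1 / 3}

pair = call_allele_pair(example_reads, example_priors)
print(pair)
```

In this sketch the second step only considers allele variants other than the first, so a homozygous call is not represented; how that case is handled is outside what the sketch assumes.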