
Transcript Determination Method

A transcript determination method, applied in the field of transcript determination, that addresses problems such as a large number of parameters and achieves accurate assessment of transcript abundance.

Inactive Publication Date: 2016-11-10
LEXOGEN GMBH

AI Technical Summary

Benefits of technology

The patent describes a way to estimate the length of a transcript using a sum function composed of probability distribution functions. The start and end positions of the transcript can be estimated as the genetic coordinates at which the partial area under the curve reaches a given fraction of the total area, using either the sum function or the first or last probability distribution function. The method works with different shapes of probability distribution function and can be tested using model nucleic acids with known start and end positions. The estimated transcript length can help in understanding the transcript's function and structure.
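The start/end rule above can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: it assumes integer genetic coordinates with unit spacing, a discretized curve with one value per base, and hypothetical area fractions of 0.025 and 0.975; the function names are also hypothetical. Either the sum function, or the first (for the start) and last (for the end) probability distribution function, can be passed as the curve.

```python
import numpy as np

def area_fraction_position(coords, curve, fraction):
    """Smallest coordinate at which the cumulative area under `curve`
    reaches `fraction` of the total area (one value per base, unit spacing)."""
    cumulative = np.cumsum(np.asarray(curve, dtype=float))
    target = fraction * cumulative[-1]
    # First index whose cumulative area is >= the target area.
    idx = int(np.searchsorted(cumulative, target, side="left"))
    return int(coords[idx])

def estimate_bounds(coords, start_curve, end_curve,
                    start_fraction=0.025, end_fraction=0.975):
    """Hypothetical helper: start from one curve, end from another,
    and the inclusive length between them."""
    start = area_fraction_position(coords, start_curve, start_fraction)
    end = area_fraction_position(coords, end_curve, end_fraction)
    return start, end, end - start + 1
```

For a flat curve over coordinates 1000 to 1099, `estimate_bounds(coords, curve, curve)` trims 2.5 % of the area from each side and reports the remaining span.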

Problems solved by technology

Existing approaches result in a large number of parameters that need to be trained, even for a single transcript.


Examples


Example 1

Introduction to NGS Methods

[0091]In order to infer which mRNA molecules were present in the original sample, NGS reads are mapped onto a reference genome with known methods, such as the Burrows-Wheeler transform. For each read this gives a set of genetic coordinates, potentially including information about splice sites. The mapping process is visualized in FIG. 1. Here the location of the short read that has been produced by the sequencer is identified within the reference genome. This process is repeated for all the reads generated by the sequencer, which results in a large number of short sequences on the genetic axis as indicated by the short straight lines below the filled black curve in FIG. 1. The combined statistics of the mapped reads lead to different types of histograms on the genetic axis (i.e. different types of coverage envelope curves). The black filled curve in FIG. 1, for instance, depicts the coverage (envelope). At a given position on the genetic axis the value of ...
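A coverage envelope of the kind shown in FIG. 1 can be computed from already-mapped reads with a difference array. The sketch below assumes each read (or each aligned block of a spliced read) has already been reduced to a 1-based, inclusive (start, end) interval on the genetic axis; the read mapping step itself is not shown, and `coverage_envelope` is a hypothetical helper.

```python
import numpy as np

def coverage_envelope(reads, locus_start, locus_end):
    """Per-base coverage over [locus_start, locus_end] (1-based, inclusive).

    `reads` is an iterable of (start, end) intervals, inclusive, as produced
    by a read mapper. A spliced read is passed as one interval per block."""
    length = locus_end - locus_start + 1
    diff = np.zeros(length + 1, dtype=np.int64)
    for start, end in reads:
        # Clip each interval to the locus; skip reads entirely outside it.
        s = max(start, locus_start) - locus_start
        e = min(end, locus_end) - locus_start
        if s > e:
            continue
        diff[s] += 1       # coverage rises where the read begins
        diff[e + 1] -= 1   # and falls just after it ends
    # Running sum turns the +1/-1 markers into per-base read counts.
    return np.cumsum(diff[:-1])
```

The difference array keeps the cost linear in the number of reads plus the locus length, instead of touching every covered base once per read.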

Example 2

Coordinate Transformations

2.1. Positions in Genome and Transcript Coordinates

[0099]The genetic axis is the sequence of base pairs that have been sequenced for an organism; it usually starts at zero or one and, depending on the complexity of the organism, can be up to several hundred million base pairs long. In addition, the genetic axis is usually subdivided into chromosomes or contigs. The genetic axis is shown at the top of FIG. 5, which indicates that the graphic represents a section of the genome on chromosome 11, approximately between base pairs 53,242,500 and 53,244,200. A transcript is usually defined as a sequence of exons (exon_1, …, exon_N) on the genetic axis, where the i-th exon is an interval [s(exon_i), …, e(exon_i)] on the genetic axis that starts at s(exon_i) and ends at e(exon_i). The gap between two successive exons, [e(exon_i)+1, s(exon_{i+1})−1], is called an intron, and the connection from the last nucleotide preceding an intron to the first nucleotide fol...
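The exon and intron definitions above translate directly into code. The sketch below derives the introns as the gaps between successive exons and maps a genomic position to a 1-based transcript coordinate. It assumes sorted, non-overlapping exons given as 1-based inclusive intervals on the plus strand; strand handling and the patent's own transformation T(r), whose definition is truncated in this excerpt, are not reproduced, and both function names are hypothetical.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # (s(exon_i), e(exon_i)), 1-based, inclusive

def introns(exons: List[Exon]) -> List[Exon]:
    """Gaps [e(exon_i)+1, s(exon_{i+1})-1] between successive exons."""
    return [(e + 1, s_next - 1)
            for (_, e), (s_next, _) in zip(exons, exons[1:])
            if s_next - 1 >= e + 1]  # skip directly adjacent exons

def genome_to_transcript(pos: int, exons: List[Exon]) -> Optional[int]:
    """Map a genomic position to a 1-based transcript coordinate.

    Returns None if the position lies in an intron or outside the transcript,
    i.e. it is not compatible with this transcript."""
    offset = 0  # transcript bases contributed by the exons already passed
    for s, e in exons:
        if s <= pos <= e:
            return offset + (pos - s) + 1
        offset += e - s + 1
    return None
```

For exons (100, 109), (200, 204), (300, 319), position 200 maps to transcript coordinate 11, while position 150 falls in the first intron and maps to None.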

Example 3

Estimation of Transcript Probabilities and Transcript Specific Probability Distributions

[0109]The model that is described in the following uses mixtures of mixtures of functions and will therefore be called the Mix2 model.

3.1 Mathematical Foundations of the Mix2 Model

[0110]In the following, r can represent either a fragment or a position; for convenience, r will always be referred to as a fragment. The probability $p_{\mathrm{total}}(r)$ of observing a particular fragment r in the genetic locus is the sum, over all transcripts, of the probability of observing the fragment given a transcript, weighted by the probability that this transcript generates a fragment. Hence $p_{\mathrm{total}}(r)$ is given by the following mixture of probability distributions:

$$p_{\mathrm{total}}(r) = \sum_{i=1}^{N} \alpha_i \, p(r \mid t = i) \qquad (15)$$

[0111]As described in section 2, if r is compatible with t = i, then $p(r \mid t=i) = p_{\mathrm{trans}}(T(r) \mid t=i)$, and $p(r \mid t=i) = 0$ otherwise. The method assumes that the probability distributions $p_{\mathrm{trans}}(r \mid t=i)$ are themselves mixtures, i.e.

$$p_{\mathrm{trans}}(r \mid t = i) = \sum_{j=1}^{M_i} \beta_{i,j} \, p_{\mathrm{trans}}(r \mid t = i, b = j) \qquad (16)$$
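Equations (15) and (16) together define a mixture of mixtures. The following sketch evaluates p_total on a grid of positions, using the reading of r as a position. The Gaussian subfunction shape, the parameter layout, and the function names are assumptions for illustration only (the patent states that different shapes of probability distribution function can be used), and the sketch skips the compatibility rule by treating every position as compatible with every transcript.

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Normal density; an assumed shape for each subfunction p(r | t=i, b=j)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mix2_density(x, alphas, betas, components):
    """p_total(x) = sum_i alpha_i * sum_j beta_ij * p(x | t=i, b=j).

    alphas:     length-N transcript weights alpha_i (summing to 1)
    betas:      N sequences; betas[i] holds beta_i1..beta_iMi (summing to 1)
    components: N sequences of (mu, sigma) pairs, one per subfunction
    """
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for alpha_i, beta_i, comps_i in zip(alphas, betas, components):
        # Inner mixture, Eq. (16): transcript-specific distribution.
        p_trans = sum(b * gaussian(x, mu, sigma)
                      for b, (mu, sigma) in zip(beta_i, comps_i))
        # Outer mixture, Eq. (15): weight by the transcript probability alpha_i.
        total += alpha_i * p_trans
    return total
```

Because each set of β weights and the α weights each sum to 1, the result integrates to 1 whenever the subfunctions lie well inside the evaluation grid.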

[01...


Abstract

A method of estimating transcript abundances includes: (a) obtaining transcript fragment sequencing data from a potential mixture of transcripts of a genetic locus of interest; (b) assigning this data to genetic coordinates of the locus of interest to obtain a data set of fragment genetic coordinate coverage and a coverage envelope curve; (c) setting a number of transcripts of the mixture; (d) pre-setting a probability distribution function of modelled genetic coverage for each transcript i, composed of the product of a weight factor α_i and the sum of at least two probability subfunctions j, each independently weighted by a weight factor β_{i,j}; (e) adding the probability distribution functions for each transcript to obtain a sum function; (f) fitting the sum function to the coverage envelope curve to optimize α_i and β_{i,j} to increase the fit; and (g) repeating steps (e) and (f) until a pre-set convergence criterion has been fulfilled.
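Steps (e) to (g) describe an iterative fit, but the optimization procedure itself is not given in this excerpt. As one possible stand-in, the sketch below uses expectation-maximization (EM) for mixture weights with fixed, pre-set subfunction shapes, treating the coverage envelope as weighted observations. Only the products w_ij = α_i·β_ij affect the sum function, so the sketch fits those and then splits them into α_i and β_ij. The stopping rule (largest weight change below `tol`) stands in for the pre-set convergence criterion; all names and defaults are hypothetical.

```python
import numpy as np

def fit_weights(coverage, components, n_per_transcript, tol=1e-8, max_iter=10_000):
    """EM fit of mixture weights w_k = alpha_i * beta_ij to a coverage envelope.

    coverage:         non-negative per-base coverage, shape (L,)
    components:       fixed subfunction values on the same bases, shape (K, L);
                      each row sums to 1, rows grouped transcript by transcript
    n_per_transcript: M_i for each transcript i (must sum to K)
    """
    coverage = np.asarray(coverage, dtype=float)
    components = np.asarray(components, dtype=float)
    n_comp = components.shape[0]
    w = np.full(n_comp, 1.0 / n_comp)          # uniform starting weights
    total = coverage.sum()
    for _ in range(max_iter):
        model = w @ components                   # step (e): current sum function
        # Step (f): EM update w_k <- w_k * sum_x c(x) f_k(x) / model(x) / sum_x c(x)
        ratio = np.divide(coverage, model,
                          out=np.zeros_like(coverage), where=model > 0)
        w_new = w * (components @ ratio) / total
        converged = np.max(np.abs(w_new - w)) < tol  # step (g): convergence check
        w = w_new
        if converged:
            break
    # Split the flat weights back into alpha_i and beta_ij per transcript.
    bounds = np.cumsum([0] + list(n_per_transcript))
    alphas, betas = [], []
    for a, b in zip(bounds[:-1], bounds[1:]):
        block = w[a:b]
        s = block.sum()
        alphas.append(s)
        betas.append(block / s if s > 0 else np.full(b - a, 1.0 / (b - a)))
    return np.array(alphas), betas
```

Each EM iteration cannot decrease the coverage-weighted log-likelihood Σ_x c(x)·log(model(x)); also fitting the subfunction shapes themselves, rather than keeping them fixed, would need a different optimizer.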

Description

FIELD OF INVENTION
[0001]The present invention relates to providing information on transcript (e.g. mRNA) abundances based on next generation sequencing (NGS) reads.
BACKGROUND
[0002]Next generation sequencing technology produces a large number of short reads when sequencing a nucleic acid sample. An essential step in next generation sequencing is library preparation, or library prep for short. This process takes mRNA or cDNA as input and produces a library of short cDNA fragments, each corresponding to a section of an mRNA molecule. These fragments are then sequenced by an NGS sequencer, usually not in their entirety but partially at their start and/or their end. This results in short sequences of nucleotides, called reads, which the NGS sequencer most commonly stores as strings over a four-character ASCII alphabet such as A, C, G, T or 0, 1, 2, 3, representing the nucleobases of the genetic code. In order to infer which mRNA molecules were present in the original ...


Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F19/22, G06F19/24, G16B30/00, G16B40/00
CPC: G06F19/24, G06F19/22, G16B30/00, G16B40/00
Inventor: TURK, ANDREAS
Owner: LEXOGEN GMBH