Gene sequence similarity calculation method based on multiple k values
A similarity calculation technique for gene sequences, applied in sequence analysis, bioinformatics, instruments, etc. It addresses problems such as the loss of important information and the unavoidable selection of the parameter k. It achieves high comparison accuracy, avoids guesswork, and removes the unbounded, arbitrary choice of k.
Examples
Embodiment 1
[0026] Referring to Figure 2, the present invention performs the gene sequence similarity calculation based on multiple k values according to the following steps:
[0027] (1) Calculate the "Markov" background probability of the gene sequence
[0028] Step 1: Extract the k-mer sets of the sequences. Set a value range for k, S = [k_min, k_max].
[0029] For each value of k in S, a k-mer set d_k is extracted from the two sequences using a sliding window of size k; d_k is of size [expression missing]. Collecting d_k over all values of k gives the set d.
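The sliding-window extraction in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `extract_kmers` and `extract_kmer_sets` are hypothetical, and it assumes that d_k is the union of the distinct k-mers found in the two sequences, which the text does not state explicitly.

```python
from collections import Counter


def extract_kmers(seq, k):
    """Slide a window of size k over seq and count every k-mer it covers."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def extract_kmer_sets(x, y, k_min, k_max):
    """Return {k: d_k} for every k in [k_min, k_max].

    Assumption: d_k is the set of distinct k-mers seen in x or in y.
    """
    d = {}
    for k in range(k_min, k_max + 1):
        d[k] = set(extract_kmers(x, k)) | set(extract_kmers(y, k))
    return d


if __name__ == "__main__":
    sets = extract_kmer_sets("ACGT", "CGTA", 2, 3)
    for k, kmers in sets.items():
        print(k, sorted(kmers))
```

Keeping the per-sequence counts in a `Counter` rather than a plain set leaves the occurrence frequencies available for later steps that compare observed counts with the background probability.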
[0030] Step 2: Estimate the Markov transition probabilities by maximum likelihood. Assuming that each k-mer obeys the Markov model, apply the maximum likelihood method to the subsequence distribution of the two sequences x and y to obtain the transition probability T(S_i, S_j) of the Markov model.
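As a rough illustration of Step 2, the maximum likelihood estimate of a transition probability is the number of observed transitions from S_i to S_j divided by the number of all transitions leaving S_i. The sketch below makes two assumptions that the text does not confirm: the chain is first-order over single nucleotides, and the counts from x and y are pooled. The function name `estimate_transitions` is hypothetical.

```python
from collections import defaultdict


def estimate_transitions(sequences):
    """Maximum likelihood estimate of first-order transition probabilities.

    Assumption: states are single symbols and counts are pooled over all
    input sequences. T[a][b] = count(a followed by b) / count(a followed by anything).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Each adjacent pair of symbols is one observed transition.
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1

    T = {}
    for a, row in counts.items():
        total = sum(row.values())
        T[a] = {b: n / total for b, n in row.items()}
    return T


if __name__ == "__main__":
    x, y = "ACGTACGA", "CGTTACGA"
    for state, row in sorted(estimate_transitions([x, y]).items()):
        print(state, row)
```

Transitions that never occur in either sequence are simply absent from the table, so any later step that multiplies these probabilities would need to decide how to treat unseen transitions.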
[0031] Step 3: According to the "Markov" model, the probability of occurrence of each "k-mer" in the transition i...


