Gene sequence similarity calculating method based on multiple k values

A gene sequence similarity calculation technique, applied in sequence analysis and bioinformatics. It addresses the loss of important information and the unavoidable selection of the parameter k, and it achieves high comparison accuracy without guessing k, resolving the unbounded and arbitrary choice of that parameter.

Active Publication Date: 2019-06-18

EAST CHINA NORMAL UNIVERSITY

Problems solved by technology

Some segments of a sequence may be more meaningful and carry more important information for the current sequence comparison, while other segments may be less important. Treating all "k-mer" features in the same way may therefore lose important information.

Although related methods assign different weights to different "k-mers" to improve the accuracy of sequence comparison, they still cannot avoid selecting the parameter k.

Examples

Embodiment 1

[0026] Referring to Figure 2, the present invention calculates gene sequence similarity based on a plurality of k values according to the following steps:

[0027] (1) Calculate the "Markov" background probability of the gene sequence

[0028] Step 1: Extract the "k-mer" sets of the sequences. Set a value range for k, S = [k_min, k_max].

[0029] For each value of k in S, a "k-mer" set d_k is extracted from the two sequences with a sliding window of size k (the expression for the size of d_k is missing from the source text). This gives the collection d of the sets d_k for all values of k.
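The sliding-window extraction in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `kmer_counts` and `multi_k_counts`, the example sequences, and the range k = 2..3 are all assumptions made for the example.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every k-mer seen by a sliding window of size k over seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def multi_k_counts(seq, k_min, k_max):
    """Map each k in [k_min, k_max] to the k-mer count table for that k."""
    return {k: kmer_counts(seq, k) for k in range(k_min, k_max + 1)}

# Hypothetical example sequences and k range (not taken from the patent).
x = "ACGTACGA"
y = "ACGTTCGA"
counts_x = multi_k_counts(x, 2, 3)
counts_y = multi_k_counts(y, 2, 3)

# For each k, the set d_k of distinct k-mers observed in either sequence.
d = {k: set(counts_x[k]) | set(counts_y[k]) for k in counts_x}
```

A sequence of length n yields n - k + 1 windows for each k, so shorter windows contribute more (and more frequently repeated) k-mers than longer ones.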

[0030] Step 2: Estimate the "Markov" transition probabilities by maximum likelihood. Assuming that each "k-mer" obeys the "Markov" model, use the maximum likelihood method on the subsequence distribution of the two sequences x and y to obtain the transition probability T(S_i, S_j) of the "Markov" model.
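As a rough sketch of the maximum likelihood estimate in Step 2: for a Markov chain, the estimate of a transition probability is the observed count of that transition divided by the total number of transitions leaving the same state. The source does not state the order of the model or how the two sequences are combined, so the first-order default, the pooling of both sequences, and the name `transition_probs` are assumptions.

```python
from collections import defaultdict

def transition_probs(sequences, order=1):
    """Maximum likelihood transition probabilities of a Markov chain.

    Assumption: transitions are pooled over all given sequences, and a
    state is the `order` preceding symbols (order=1 by default).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(len(seq) - order):
            state = seq[i:i + order]      # current state S_i
            nxt = seq[i + order]          # symbol that follows it
            counts[state][nxt] += 1
    probs = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        # MLE: relative frequency of each transition out of `state`.
        probs[state] = {sym: n / total for sym, n in followers.items()}
    return probs

# Hypothetical example sequences (not taken from the patent).
T = transition_probs(["ACGT", "ACGA"])
```

Transitions that never occur in the data are simply absent from the table, which corresponds to an estimated probability of zero; a practical implementation might add pseudocounts to avoid that.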

[0031] Step 3: According to the "Markov" model, the probability of occurrence of each "k-mer" in the transition i...
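The source text for Step 3 is cut off, so the patent's exact expression is not available here. For orientation only, the standard first-order Markov chain probability of a word is the probability of its first symbol multiplied by the transition probabilities along the word. The sketch below implements that textbook formula; the function name `kmer_probability` and the toy `initial` and `trans` tables are assumptions.

```python
def kmer_probability(kmer, initial, trans):
    """Probability of a k-mer under a first-order Markov chain.

    P(w) = P(w_1) * T(w_1, w_2) * ... * T(w_{k-1}, w_k)
    Missing entries are treated as probability 0 (an assumption).
    """
    p = initial.get(kmer[0], 0.0)
    for a, b in zip(kmer, kmer[1:]):
        p *= trans.get(a, {}).get(b, 0.0)
    return p

# Toy model with hypothetical values (not taken from the patent).
initial = {"A": 0.5, "C": 0.5}
trans = {"A": {"C": 1.0}, "C": {"A": 0.5, "C": 0.5}}
p_aca = kmer_probability("ACA", initial, trans)
```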

Abstract

The invention discloses a gene sequence similarity calculation method based on multiple k values. The k-mers extracted for a plurality of k values form an integrated vector of counts and frequencies; the importance of each k-mer is identified automatically; and three weighted D2-type similarities, MKWD2, MKWDs2 and MKWD*2, are obtained. Compared with the prior art, the method extracts features accurately, keeps the analysis process simple, simplifies biological sequence processing, achieves high comparison accuracy, greatly shortens sequence comparison time, improves working efficiency, avoids guessing an optimal parameter k, and largely resolves the problem of the unbounded and arbitrary selection of the parameter k.
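To make the idea of a D2-type similarity over several k values concrete, the sketch below sums the classical D2 statistic (the inner product of two k-mer count vectors) over a range of k. It is not the patent's MKWD2: the source does not give the weighting scheme, so the optional `weights` table (defaulting to 1 for every k-mer) and the names `kmer_counts` and `multi_k_d2` are assumptions.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every k-mer seen by a sliding window of size k over seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def multi_k_d2(x, y, k_min, k_max, weights=None):
    """Sum of per-k D2 inner products, with optional per-k-mer weights.

    `weights` is a hypothetical dict from k-mer to weight; k-mers
    without an entry get weight 1.0.
    """
    total = 0.0
    for k in range(k_min, k_max + 1):
        cx, cy = kmer_counts(x, k), kmer_counts(y, k)
        # Only k-mers present in both sequences contribute to the product.
        for w in cx.keys() & cy.keys():
            wt = 1.0 if weights is None else weights.get(w, 1.0)
            total += wt * cx[w] * cy[w]
    return total

# Hypothetical example sequences (not taken from the patent).
score = multi_k_d2("ACGT", "ACGA", 1, 2)
```

Summing over k is equivalent to taking the inner product of the concatenated count vectors, which is one simple reading of an "integrated vector" across k values.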

Description

Technical field

[0001] The invention relates to the technical field of gene sequence comparison, in particular to a gene sequence similarity calculation method based on multiple k values.

Background

[0002] Calculating gene sequence similarity is of great significance for analyzing gene sequence function and structural relationships, as well as for constructing evolutionary trees. At present, gene sequences are generally compared with alignment-based methods, which are time-consuming and laborious and have low accuracy and low efficiency. With the development of next-generation gene sequencing technology and the exponential growth of biological data, alignment-based comparison is far from meeting the time requirements for processing large-scale gene sequence data. In order to speed up the proces...

Application Information

IPC(8): G16B30/00

Inventor 钱莹, 章炯民, 张雨

Owner EAST CHINA NORMAL UNIVERSITY
