Phylogenetic tree construction method based on sequence pattern mining algorithm

A phylogenetic tree construction technique based on sequence pattern mining, applied in sequence analysis, biostatistics, bioinformatics and related fields. It addresses the low accuracy of alignment-based methods and their inability to compare low-similarity sequences, and it optimizes the measurement of similarity between sequences.

Active Publication Date: 2019-03-29
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

[0007] (2) During evolution, as sequences accumulate single-base mutations and small insertions and deletions, the similarity between sequences keeps falling. Once sequence similarity drops below a certain critical point, the accuracy of alignment-based methods decreases rapidly, and the sequences may not be alignable at all.
[0008] (3) Most multiple sequence alignment methods are based on dynamic programming, so their time complexity and resource usage are high. Aligning large sets of low-similarity sequences is especially time-consuming and labor-intensive.
[0010] (5) The distance between two sequences depends too heavily on the regions that have been aligned, ignoring sequence fragments that have biological significance of their own.
However, when they calculate the distance between each pair of sequences, they use only the patterns common to the two sequences at hand, ignoring the potential relationships among sequences globally and the differences and connections within the set.

Examples

Embodiment Construction

[0043] The present invention will be described in detail below in conjunction with the accompanying drawings.

[0044] A method for constructing a phylogenetic tree based on a sequential pattern mining algorithm of the present invention includes the following two steps.

[0045] Step 1: The purpose of the present invention is to calculate the similarity between sequences accurately while overcoming two shortcomings of existing approaches: the complex computation and assorted speculative assumptions of methods based on multiple sequence alignment, and the tendency of k-mer or substring based methods to extract too many interfering patterns and to ignore similarity relationships among sequences globally. To that end, the invention develops a scheme based on a sequence pattern mining algorithm that mines the specific patterns appearing frequently across multiple sequences in large-scale, low-similarity sequence sets, and then builds a phylogenetic tree from these patterns. The task completed in step 1 is to mine the specific pattern...
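As a rough illustration of what mining frequent patterns by pattern growth can look like, the Python sketch below grows patterns one character at a time and keeps only those present in at least `min_support` sequences. It is not the patented algorithm: the function name `mine_frequent_patterns`, the restriction to contiguous patterns, and the `min_support` and `max_len` parameters are all assumptions made for this example.

```python
from typing import Dict, List, Set


def mine_frequent_patterns(
    seqs: List[str], min_support: int, max_len: int = 8
) -> Dict[str, Set[int]]:
    """Return {pattern: indices of the sequences that contain it}.

    Assumption: patterns are contiguous substrings, and support is the
    number of distinct sequences in which a pattern occurs.
    """
    alphabet = sorted(set("".join(seqs)))
    result: Dict[str, Set[int]] = {}

    # Seed the search with frequent single characters.
    frontier: Dict[str, Set[int]] = {}
    for ch in alphabet:
        support = {i for i, s in enumerate(seqs) if ch in s}
        if len(support) >= min_support:
            frontier[ch] = support

    # Pattern growth: extend each frequent pattern by one character.
    while frontier:
        result.update(frontier)
        next_frontier: Dict[str, Set[int]] = {}
        for pat, support in frontier.items():
            if len(pat) >= max_len:
                continue
            for ch in alphabet:
                cand = pat + ch
                # Only sequences that contain the prefix can contain the
                # extension, so the search is limited to those sequences.
                cand_support = {i for i in support if cand in seqs[i]}
                if len(cand_support) >= min_support:
                    next_frontier[cand] = cand_support
        frontier = next_frontier
    return result
```

Because an extension can never be more frequent than its prefix, infrequent patterns are dropped early and never extended, which is the property that keeps pattern growth from enumerating every substring.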

Abstract

The invention relates to a phylogenetic tree construction method based on a sequence pattern mining algorithm. The method comprises the following steps: mining the specific patterns that are hidden in a sequence set and can be used to measure sequence similarity, to obtain an initial pattern set; filtering out the non-closed frequent patterns in the initial pattern set to obtain an optimized pattern set capable of representing the sequence set; and constructing a pattern vector set and calculating the distances between the numeric vectors, so as to build a distance matrix for producing a phylogenetic tree. Sequence patterns that appear frequently in the sequence set are extracted by a sequence pattern mining algorithm; the filtered pattern set is then used either to convert the sequence set into binary vectors or to assign weight information to the pattern vector set, the distance matrix is calculated, and the phylogenetic tree is constructed. For large-scale, low-similarity sequence sets, a pattern growth strategy mines the more representative patterns in the sequence set, so that redundant patterns that are useless for measuring sequence similarity are not extracted and the measurement of similarity among sequences is optimized at a global scale.
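The later steps of the abstract (filtering non-closed patterns, building binary pattern vectors, and computing a distance matrix) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented procedure: the input is a hypothetical dictionary mapping each frequent pattern to the indices of the sequences containing it, a pattern counts as non-closed when a longer pattern containing it has the same support, and Jaccard distance is chosen here only because the source does not name a distance measure. All function names are illustrative.

```python
from typing import Dict, List, Set


def filter_closed(patterns: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Keep patterns that have no longer super-pattern with equal support."""
    closed: Dict[str, Set[int]] = {}
    for pat, sup in patterns.items():
        has_equal_super = any(
            len(other) > len(pat) and pat in other and len(other_sup) == len(sup)
            for other, other_sup in patterns.items()
        )
        if not has_equal_super:
            closed[pat] = sup
    return closed


def pattern_vectors(patterns: Dict[str, Set[int]], n_seqs: int) -> List[List[int]]:
    """One binary vector per sequence: 1 if the sequence contains the pattern."""
    ordered = sorted(patterns)
    return [[1 if i in patterns[p] else 0 for p in ordered] for i in range(n_seqs)]


def jaccard_distance(u: List[int], v: List[int]) -> float:
    """1 - |shared patterns| / |patterns present in either vector|."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def distance_matrix(vectors: List[List[int]]) -> List[List[float]]:
    """Symmetric matrix of pairwise distances between pattern vectors."""
    n = len(vectors)
    return [[jaccard_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical frequent patterns for three sequences (indices 0, 1, 2).
patterns = {
    "A": {0, 1}, "C": {0, 1, 2}, "G": {0, 1, 2}, "T": {0, 2},
    "AC": {0, 1}, "CG": {0, 1, 2}, "GT": {0, 2},
    "ACG": {0, 1}, "CGT": {0, 2},
}
closed = filter_closed(patterns)
matrix = distance_matrix(pattern_vectors(closed, n_seqs=3))
```

The resulting matrix could then be handed to any distance-based tree builder, for example neighbor joining or UPGMA; which one the invention uses is not stated in this excerpt.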

Description

Technical field

[0001] The invention relates to a method for constructing a phylogenetic tree, in particular to a method for constructing a phylogenetic tree based on a sequence pattern mining algorithm.

Background technique

[0002] Since the 1980s, with the continuous development of computer technology, sequencing technology and molecular biology, researchers in many countries have carried out a number of genome projects. Genome sequencing, protein sequencing and structural analysis have produced a large amount of molecular biology data. However, faced with this exponentially growing data, researchers have extracted only a small amount of valuable information; a large amount of potentially biologically significant information is buried in various databases, and the ability to analyze and process data has fallen far behind the ability to generate it. There is an urgent need for computing equipment with more powerful computing power and more af...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B40/00, G16B30/00
Inventor: 叶凯, 康永永, 杨晓飞, 贾鹏, 蔺佳栋, 郭立
Owner XI AN JIAOTONG UNIV