DNA sequence similarity analysis method based on S-PCNN and Huffman encoding

The method combines DNA sequence encoding with Huffman coding and is applied in the field of bioinformatics. It addresses problems of earlier approaches such as accumulated errors, low discrimination between DNA sequences, and errors in similarity analysis.

Inactive Publication Date: 2016-01-06
YUNNAN UNIV

AI Technical Summary

Problems solved by technology

Earlier DNA similarity analysis methods compare two DNA sequences position by position. When the two sequences differ in length, the comparison becomes more difficult.
In recent years, two-dimensional graphical representation of DNA sequences has become an important analysis method, but methods of this kind accumulate errors during encoding, which eventually cause errors in the similarity analysis.
Later, an algorithm was proposed that maps DNA sequences to a two-dimensional Cartesian coordinate system and then measures similarity with the dynamic time warping (DTW) distance, but its ability to discriminate between DNA sequences is not high.

Examples

Embodiment Construction

[0020] The present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments.

[0021] The basic idea of the present invention is as follows. Count the occurrences of the 64 kinds of triplet codons in the DNA fragment and obtain their occurrence probabilities. Apply Huffman coding to these 64 codon probabilities, convert each code into a decimal number, and normalize the encoding to the range 0 to 1, so that the triplet-codon character sequence of the DNA is replaced with a numeric sequence. Then feed the encoded numeric sequence into the S-PCNN model for clustering calculation to obtain the oscillation time series (OTS) of the DNA sequence. Finally, calculate the Euclidean distance between the oscillation time series of different DNA sequences, and judge the degree of kinship between species from that distance. The flow chart of the method is shown in Figure 1.
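The encoding stage of this idea can be sketched in Python as below. This is a minimal illustration, not the patent's implementation. The function names (`codon_counts`, `huffman_codes`, `encode_dna`) are hypothetical, and several details are assumptions the excerpt does not settle: codons are read non-overlapping from position 0, only codons that actually occur receive a code, each Huffman bit string is read as a binary integer, and normalization divides by the largest such integer. Reading a bit string as an integer drops leading zeros, so two different codes can map to the same value in this naive form.

```python
import heapq
from collections import Counter
from itertools import count


def codon_counts(dna):
    """Count non-overlapping triplet codons; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return Counter(dna[i:i + 3] for i in range(0, usable, 3))


def huffman_codes(freqs):
    """Build a Huffman table {symbol: bit string} from symbol frequencies."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {symbol: "0" for symbol in freqs}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prefix the two least frequent subtrees with 0 and 1, then merge them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode_dna(dna):
    """Map a DNA string to a list of per-codon values normalized to [0, 1]."""
    dna = dna.upper()
    counts = codon_counts(dna)
    if not counts:
        return []
    codes = huffman_codes(counts)
    # Assumption: "decimal number" means the bit string read as a binary integer.
    decimals = {codon: int(bits, 2) for codon, bits in codes.items()}
    top = max(decimals.values()) or 1  # guard against division by zero
    usable = len(dna) - len(dna) % 3
    return [decimals[dna[i:i + 3]] / top for i in range(0, usable, 3)]
```

Calling `encode_dna` on a sequence returns one value per codon, which is the kind of numeric sequence the paragraph describes passing on to the S-PCNN model. Using raw counts instead of probabilities gives the same Huffman tree, because dividing every count by the same total does not change their order.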

[0022] Specifically, the ...

Abstract

The present invention discloses a new deoxyribonucleic acid (DNA) sequence similarity analysis method that combines a simplified pulse coupled neural network (S-PCNN) model with Huffman encoding. First, Huffman encoding is carried out with triplet codons (formed from the bases A, G, C, T) as the basic encoding units, chosen to suit the S-PCNN model; this encoding digitizes the DNA character sequence and makes it suitable for feature extraction by the S-PCNN model. Second, feature clustering is carried out on the encoded DNA sequence with the S-PCNN model to obtain an oscillation time series (OTS). Finally, the similarity of two DNA sequences is measured by the Euclidean distance between their OTS. The method is tested on the DNA sequence of the first exon of the beta-globin gene, a commonly used benchmark, from nine species. The experiments show that the method can effectively distinguish DNA similarity between different species and has good classification performance.
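The last two stages can be illustrated with the following Python sketch. It is not the patent's S-PCNN, whose equations and parameters do not appear in this excerpt. It assumes a generic one-dimensional simplified pulse coupled network: the feeding input equals the stimulus, linking comes from the two neighbouring neurons, and the dynamic threshold decays exponentially and jumps after a neuron fires. The function names and the parameters `steps`, `beta`, `alpha` and `v_theta` are hypothetical choices made for illustration.

```python
import math


def oscillation_time_series(stimulus, steps=30, beta=0.2, alpha=0.3, v_theta=20.0):
    """Run a generic 1-D simplified PCNN; return the number of firing neurons per step."""
    n = len(stimulus)
    fired = [0] * n      # pulse outputs Y from the previous step
    theta = [1.0] * n    # dynamic threshold of every neuron
    decay = math.exp(-alpha)
    ots = []
    for _ in range(steps):
        new_fired = []
        for i, s in enumerate(stimulus):
            left = fired[i - 1] if i > 0 else 0
            right = fired[i + 1] if i < n - 1 else 0
            # Internal activity U = F * (1 + beta * L), with F = stimulus.
            internal = s * (1.0 + beta * (left + right))
            new_fired.append(1 if internal > theta[i] else 0)
        # Thresholds decay, then jump by v_theta for neurons that just fired.
        theta = [decay * t + v_theta * y for t, y in zip(theta, new_fired)]
        fired = new_fired
        ots.append(sum(fired))
    return ots


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Running both encoded sequences for the same number of steps yields OTS vectors of equal length, so `euclidean_distance` can compare sequences whose original lengths differ. A smaller distance would then be read as closer similarity, as the abstract describes.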

Description

Technical field

[0001] The invention belongs to the technical field of bioinformatics, and in particular relates to a DNA sequence similarity analysis method based on S-PCNN and Huffman coding.

Background technique

[0002] Deoxyribonucleic acid (DNA) is the main genetic material of organisms. It is composed of four bases: adenine (A), guanine (G), cytosine (C), and thymine (T). Each organism has its own unique base arrangement, and this arrangement stores the genetic information of the organism. With the development of human gene sequencing and protein sequencing technology and the implementation of the genome project, the amount of data in biological sequence databases (DNA, RNA and protein sequences) has grown at an unprecedented rate, and intelligent processing of this massive information has become an urgent need for biological researchers.

[0003] The analysis of DNA sequences can help people decipher the information of genetic codes, h...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F19/22
Inventor: 聂仁灿, 金鑫, 周冬明, 贺康建, 王佺, 何敏, 余介夫, 谭明川
Owner: YUNNAN UNIV