Biological sequence analysis method based on gap spectrum

An alignment-free (non-comparison) method for analyzing biological sequences. It targets two problems: similarity-based methods cannot always be used to study structure and function, and high speed and high sensitivity cannot be achieved at the same time. The stated effect is improved performance.

Status: Inactive | Publication Date: 2009-08-05
CHINA AGRI UNIV

AI Technical Summary

Problems solved by technology

This makes it impossible to use traditional methods based on sequence similarity to study the structure and function of these molecules.
Moreover, traditional computerized biological sequence analysis trades sensitivity for speed.
When processing massive amounts of data, it cannot deliver high speed and high sensitivity at the same time.



Examples


Embodiment 1

[0035] Example 1 DNA sequence analysis based on gap spectrum

[0036] The subsequence (seq1) of the DNA sequence (SEQ1) with a full length of 20 is marked with one-dimensional coordinates.

[0037] A A A A G G C C T A C A A T T A C T C T   (DNA sequence SEQ1, full length 20, minus-strand direction)

[0038] The DNA sequence (SEQ1), with a full length of 20, is labeled with one-dimensional coordinates along its minus-strand direction. Labeling starts at the 4th base (coordinate 1) and ends at the 5th base from the end (coordinate 13), yielding a DNA subsequence (seq1) of length 13 with one-dimensional coordinates. The result is as follows:

[0039] A  G  G  C  C  T  A  C  A  A  T  T  A    (DNA subsequence seq1, length 13)

[0040] 1  2  3  4  5  6  7  8  9  10 11 12 13   (one-dimensional coordinates of seq1)

[0041] Taking character A as an example, find the coordinates of the second, third, and last...
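The coordinate labeling in [0038] to [0040] and the lookup of character coordinates can be sketched in Python. This is a minimal illustration rather than the patent's implementation: the function names `label_subsequence` and `coordinates_of` are hypothetical, and the start and end offsets are taken from the example above.

```python
SEQ1 = "AAAAGGCCTACAATTACTCT"  # full-length sequence from [0037]

def label_subsequence(seq, start, end_from_last):
    """Return the bases from the `start`-th base up to the
    `end_from_last`-th base counted from the end (1-based, inclusive)."""
    return seq[start - 1 : len(seq) - end_from_last + 1]

def coordinates_of(subseq, char):
    """Return the 1-based coordinates at which `char` occurs in `subseq`."""
    return [i for i, c in enumerate(subseq, start=1) if c == char]

seq1 = label_subsequence(SEQ1, start=4, end_from_last=5)
print(seq1)                       # AGGCCTACAATTA
print(coordinates_of(seq1, "A"))  # [1, 7, 9, 10, 13]
```

The subsequence matches [0039], and the coordinates of A are the starting point for the distance calculations that follow.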

Embodiment 2

[0151] Building on Embodiment 1, this embodiment introduces a new feature for the two DNA sequences: for each character pair in the gap-spectrum frequency data of Embodiment 1 (Table 16), it determines the spacing between the characters at which the frequency is highest. The results are shown in Table 21.

[0152] Table 21. Character spacing at the maximum frequency

[0153]

Character pair   Spacing at max frequency in seq1   Spacing at max frequency in seq2
A-A              3,6                                3
A-C              3,4,7                              5,9
A-G              2                                  4,6,8...
A-T              2,5
C-A              5
C-C              3,4
C-G              /
C-T              7
G-A              7
G-C              2
G-G              /
G-T              9
T-A              2,3,4,7
T-C              2
T-G              /
T-T              5,6
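The seq1 values in Table 21 can be reproduced with a short Python sketch. The counting rule used here is an assumption inferred from the table, not a definition quoted from the patent: for each ordered character pair, every occurrence of the first character is paired with every later occurrence of the second, and only distances of 2 or more are counted (the G-G entry is "/" even though the two G bases are directly adjacent). The names `gap_spectrum`, `spacing_at_max`, and `min_gap` are hypothetical.

```python
from collections import Counter
from itertools import product

def gap_spectrum(seq, min_gap=2):
    """For every ordered character pair (x, y), count how often x is
    followed by y at each distance d >= min_gap (assumed rule)."""
    spectrum = {}
    for i, x in enumerate(seq):
        for j in range(i + min_gap, len(seq)):
            spectrum.setdefault((x, seq[j]), Counter())[j - i] += 1
    return spectrum

def spacing_at_max(seq, alphabet="ACGT"):
    """Map each ordered pair to the sorted distances with the highest
    count; pairs that never occur map to an empty list ("/" in the table)."""
    spectrum = gap_spectrum(seq)
    result = {}
    for x, y in product(alphabet, repeat=2):
        counts = spectrum.get((x, y))
        if not counts:
            result[f"{x}-{y}"] = []
            continue
        top = max(counts.values())
        result[f"{x}-{y}"] = sorted(d for d, n in counts.items() if n == top)
    return result

table = spacing_at_max("AGGCCTACAATTA")
print(table["A-A"])  # [3, 6]
print(table["T-A"])  # [2, 3, 4, 7]
print(table["G-G"])  # []
```

Under this assumed rule, all 16 seq1 entries agree with the table. The seq2 entries cannot be checked the same way because seq2 itself does not appear in this excerpt.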


Abstract

The invention provides a method for analyzing biological sequences, comprising the following steps: first, computing the gap spectrum of a biological sequence by calculating the distance between two adjacent characters in the sequence and counting the occurrence frequency of each identical distance between the characters; second, calculating the similarities among different biological sequences; and third, deriving the homology or the biological functions of the different biological sequences. The method has advantages such as high speed, high sensitivity, and high accuracy.
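The first two steps of the abstract can be illustrated with a small Python sketch. Several parts are assumptions: the excerpt does not state how similarity is computed, so cosine similarity between two gap spectra is used here purely as a stand-in; the rule of counting only distances of 2 or more is inferred from the worked example rather than quoted; and the second input sequence is a hypothetical variant made up for the illustration.

```python
from collections import Counter
from math import sqrt

def gap_spectrum(seq, min_gap=2):
    """Count occurrences of each (first char, second char, distance)
    triple for all position pairs at least `min_gap` apart (assumed rule)."""
    return Counter(
        (seq[i], seq[j], j - i)
        for i in range(len(seq))
        for j in range(i + min_gap, len(seq))
    )

def cosine_similarity(a, b):
    """Cosine similarity of two spectra treated as sparse vectors.
    A stand-in measure, not the one defined in the patent."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

s1 = gap_spectrum("AGGCCTACAATTA")  # seq1 from Embodiment 1
s2 = gap_spectrum("AGGCCTACAATTC")  # hypothetical variant, last base changed
print(s1[("A", "A", 3)])            # 2
print(cosine_similarity(s1, s2))
```

Because the spectrum depends only on character pairs and their distances, two sequences can be compared without aligning them, which is the property the method relies on.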

Description

Technical field

[0001] The invention belongs to the field of bioinformatics, and in particular relates to an alignment-free analysis method for biological sequences.

Background

[0002] Biological sequences include nucleic acid and amino acid sequences, which carry a great deal of information about life. Sequencing biological sequences is no longer a difficult task, and databases at home and abroad have accumulated a large amount of biological sequence data. To make good use of these massive data and reveal the deeper structural and functional information behind them, computerized biological sequence analysis methods have been developed. The basic idea of traditional computerized biological sequence analysis methods is that when two molecules have similar sequences, they are likely to have similar three-dimensional structures and functions. Therefore, searching for the homologous sequence of the target biolog...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): C12Q1/68; G06F19/00
Inventors: 安冬, 苏谦
Owner: CHINA AGRI UNIV