Method for Expression of RNA Secondary Structure Sequence Similarity Based on Cross-correlation Coefficient

A method based on cross-correlation coefficients and RNA secondary structure, applied in genomics, electrical digital data processing, special data processing applications, etc. It addresses problems such as growth in algorithm complexity, makes numerical features simple and convenient to extract, and eliminates the degradation phenomenon of the earlier representation.

Active Publication Date: 2018-02-13
DALIAN UNIV

AI Technical Summary

Problems solved by technology

As the number of sequences increases, the complexity of sequence alignment based on dynamic programming grows exponentially.



Examples


Embodiment 1

[0045] The present invention is further described below with reference to the accompanying drawings. The detailed steps are as follows:

[0046] Step 1: The RNA sequence is represented as a series of bases. Free (unpaired) bases and bases that belong to a base pair are written differently: free bases are represented by A, U, G, and C, and paired bases are represented by A′, U′, G′, and C′. This yields the characteristic sequence of the RNA secondary structure. For example, Figure 1 shows the structure of AIMV-3, whose characteristic sequence is AUGCU′C′A′U′G′C′A′AAACU′G′C′A′U′G′A′AUGCC′C′C′UAAG′G′G′AUGC (from 5′ to 3′);
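The encoding in Step 1 can be sketched in a few lines of Python. This is only an illustration: the function name `characteristic_sequence` is hypothetical, and the use of dot-bracket notation as the input for the secondary structure is an assumption, since the text does not say how the pairing information is supplied.

```python
def characteristic_sequence(seq: str, dot_bracket: str, mark: str = "'") -> str:
    """Return the characteristic sequence: paired bases get a prime, free bases do not.

    Assumption: the structure is given in dot-bracket notation, where '(' and ')'
    mark paired positions and '.' marks free (unpaired) positions.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    out = []
    for base, sym in zip(seq.upper(), dot_bracket):
        if base not in "AUGC":
            raise ValueError(f"unexpected base: {base!r}")
        if sym not in "().":
            raise ValueError(f"unexpected structure symbol: {sym!r}")
        # Paired positions are written A', U', G', C'; free positions stay A, U, G, C.
        out.append(base + mark if sym in "()" else base)
    return "".join(out)


print(characteristic_sequence("GGGAAACCC", "(((...)))"))  # G'G'G'AAAC'C'C'
```

The toy hairpin above is made up for the example and is not one of the viral structures discussed in the patent.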

[0047] Step 2: Assign each base a distinct initial position in three-dimensional space. The initial positions are as follows:

[0048]

[0049]

[0050] Step 3: Use the base correspondence formula to map the bases of each RNA secondary structure characteristic sequence into a seri...

Embodiment 2

[0056] Step 1: Represent the 9 RNA viruses as sequences. Free (unpaired) bases are represented by A, U, G, and C, and paired bases are represented by A′, U′, G′, and C′, so that each viral RNA can be represented by a characteristic sequence;

[0057] Step 2: Initialize the bases. The initial positions of the bases are as follows:

[0058]

[0059]

[0060] Step 3: Map the characteristic sequence to points in three-dimensional space according to the mapping formula, which is as follows:

[0061]

[0062] where the formula is as follows:

[0063]

[0064] By connecting the points in 3D space one after another, the 9 viral RNAs can be represented graphically.
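The initial positions and the mapping formula of Steps 2 and 3 are not reproduced in this text, so the following Python sketch only illustrates the general idea of a chaos game walk in three dimensions. Everything specific in it is an assumption: the eight symbols are placed on the corners of a unit cube, the walk starts at the cube centre, and each new point is the midpoint between the previous point and the corner of the current symbol (the textbook chaos game rule). The names `VERTICES`, `tokenize`, and `chaos_game_3d` are hypothetical.

```python
from itertools import product

SYMBOLS = ["A", "U", "G", "C", "A'", "U'", "G'", "C'"]
# Hypothetical assignment: one unit-cube corner per symbol (not the patent's values).
VERTICES = dict(zip(SYMBOLS, product((0.0, 1.0), repeat=3)))


def tokenize(feature_seq: str) -> list:
    """Split a characteristic sequence such as "AU'G" into ["A", "U'", "G"]."""
    tokens = []
    for ch in feature_seq.replace("\u2032", "'"):  # accept the typographic prime too
        if ch == "'":
            if not tokens or tokens[-1].endswith("'"):
                raise ValueError("prime without a preceding base")
            tokens[-1] += "'"
        else:
            tokens.append(ch)
    return tokens


def chaos_game_3d(feature_seq: str, start=(0.5, 0.5, 0.5)) -> list:
    """Return the 3D points visited by a midpoint-rule chaos game walk."""
    x, y, z = start
    points = []
    for tok in tokenize(feature_seq):
        vx, vy, vz = VERTICES[tok]  # KeyError for symbols outside SYMBOLS
        # Move halfway from the current point towards the symbol's corner.
        x, y, z = (x + vx) / 2, (y + vy) / 2, (z + vz) / 2
        points.append((x, y, z))
    return points


for point in chaos_game_3d("AU'G"):
    print(point)
```

Connecting the returned points in order gives a curve of the kind described in paragraph [0064]. With eight distinct corners, free and paired bases never share a target, which is one simple way to keep different characteristic sequences from collapsing onto the same curve.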

[0065] Step 4: Concatenate all the sequences to be compared into one new long sequence, compare each sequence with this new long sequence, and substitute them into the newly proposed cross-correlation coefficient formula to obtain each correlation coefficient. For the cross-correlation values, see Figure 4, the ...
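The newly proposed cross-correlation coefficient formula is not reproduced in this text. For orientation only, the sketch below computes the textbook normalized cross-correlation at zero lag (the Pearson coefficient) between two numeric series of equal length. It is a stand-in, not the patent's formula; the function name `normalized_cross_correlation` and the sample data are hypothetical, and the concatenation into a long reference sequence is not modelled.

```python
from math import sqrt


def normalized_cross_correlation(x, y):
    """Textbook zero-lag normalized cross-correlation (Pearson coefficient).

    Stand-in only: the patent's newly defined coefficient is not shown in the text.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no shape information
    return cov / sqrt(var_x * var_y)


# Made-up numeric features, e.g. one coordinate of two 3D curves.
print(normalized_cross_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(normalized_cross_correlation([1.0, 2.0, 3.0], [6.0, 4.0, 2.0]))
```

Values close to 1 indicate that two series rise and fall together, and values close to -1 indicate opposite trends. A coefficient of this kind could be applied to numeric features extracted from the 3D curves, but how the patent combines it with the long concatenated sequence is not recoverable from this excerpt.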


Abstract

The invention relates to the field of sequence similarity analysis. It provides a novel three-dimensional representation of RNA based on the chaos game, together with a similarity analysis of RNA secondary structures based on cross-correlation coefficients. The three-dimensional representation builds on the original two-dimensional representation and eliminates the degradation phenomenon of that representation. The similarity of RNA sequences is then described by extracting numerical features from the new graphical representation and applying the newly defined cross-correlation coefficients. Finally, the method is applied to 9 RNA viruses to assess its feasibility. The experimental results show that the method is feasible and that, for cluster analysis, effective data can be extracted more easily.

Description

Technical field

[0001] The present invention relates to a new three-dimensional graphical representation method and a cross-correlation coefficient for characterizing the similarity of RNA secondary structure sequences. Specifically, a new three-dimensional representation is used to represent RNA characteristic sequences, and cross-correlation coefficients are used to describe the degree of similarity of RNA secondary structures. The invention belongs to the field of sequence similarity analysis.

Background technique

[0002] Sequence alignment aligns the bases of sequences and obtains the highest score under a given scoring mechanism; this score reflects the degree of similarity between the sequences. Many sequence alignment algorithms exist, and most adopt the idea of dynamic programming. In 1970, Needleman and Wunsch proposed the Needleman-Wunsch algorithm for the global alignment of two sequences, which is a dynamic programming algorithm. Later, Smith and W...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F19/18
CPC: G16B20/00
Inventor: 张强, 邢姗姗, 王宾, 魏小鹏
Owner: DALIAN UNIV