Method and device for identifying repetitive regions in deoxyribonucleic acid (DNA) sequences

A DNA sequence identification method, applied in the field of systems biology, that addresses the problems of prior approaches (too many candidate patterns, long running times, and difficulty in finding repeated sequences in DNA) and thereby improves recognition efficiency.

Inactive Publication Date: 2018-11-06
CENT SOUTH UNIV

AI Technical Summary

Problems solved by technology

In 2011, Zhou et al. proposed a genome (DNA) frequent pattern mining method based on a frequent subtree mining strategy. The method uses a suffix tree structure to reduce the amount of sequence data stored. However, it extracts subsequences from the DNA sequence and compares them two at a time; that is, it must compare every pair of subsequences and count the identical ones from the comparison results. This makes it difficult to find repeated sequences that occur many times in a DNA sequence, and it is time-consuming.
In 2013, Jiang et al. introduced the concept of similarity, constructed frequent approximate patterns on that basis, and proposed the frequent approximate pattern mining method SFAP. The sequences mined by this method are similar rather than identical, so they are not, strictly speaking, repeat regions.
In 2015, Mao et al. proposed the AMSMA method, which stores the genome (DNA) sequence information obtained by scanning the database in an association matrix to improve time and space efficiency. Each row of the association matrix represents an identified DNA subsequence, and the four columns correspond to the four bases A, G, T and C. Combining the DNA subsequence of a row with the base of a column yields an extended DNA subsequence. However, this method generates too many candidate patterns, resulting in long running times and low efficiency.
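To make the candidate growth concrete, the following is a minimal sketch of the single-base extension step described for the row and column combination. It is not the AMSMA implementation; the function name `extend_by_single_base` is hypothetical and the column order simply follows the text.

```python
# Illustrative only: extend every identified subsequence (a "row")
# with every base (a "column"), as the text describes.
BASES = "AGTC"  # column order as listed in the text


def extend_by_single_base(patterns):
    """Return all candidates formed by appending one base to each pattern."""
    return {p + b for p in patterns for b in BASES}


rows = {"AC", "CG"}
candidates = extend_by_single_base(rows)
# Every row yields four candidates, whether or not they occur in the DNA.
print(len(rows), "rows ->", len(candidates), "candidates")
```

Because each row always produces four candidates, the candidate set grows in proportion to the number of rows at every step, and each candidate must then be checked against the sequence.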



Embodiment Construction

[0047] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the present invention and are not intended to limit its scope.

[0048] To overcome the above-mentioned problems in the prior art, an embodiment of the present invention provides a method for identifying repetitive regions in DNA sequences. Figure 1 is a schematic flowchart of the method according to an embodiment of the present invention. As shown in Figure 1, the method includes:

[0049] S101. For each constructed n-item sequence, count the number of occurrences of the n-item sequence in the DNA sequence.

[0050] It should be noted that, in the embodiment of the present invention, an n-item sequence denotes a DNA subsequence of length n, where n ≥ 2, and a l...
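Step S101 can be sketched as follows. This is a minimal illustration, not the patented procedure: the function name `count_occurrences` is hypothetical, and counting overlapping matches is an assumption, since the visible text does not say how overlaps are treated.

```python
def count_occurrences(dna: str, pattern: str) -> int:
    """Count how often an n-item sequence occurs in dna (overlaps allowed).

    Assumption: overlapping occurrences are counted separately.
    """
    if len(pattern) < 2:
        raise ValueError("an n-item sequence requires n >= 2")
    count = 0
    start = dna.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so that overlapping matches are found.
        start = dna.find(pattern, start + 1)
    return count


print(count_occurrences("ACGTACGT", "ACG"))
```

Note that `str.count` would skip overlapping matches, which is why the sketch uses repeated `str.find` calls instead.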


Abstract

The invention provides a method and device for identifying repetitive regions in deoxyribonucleic acid (DNA) sequences. The method comprises the steps of: counting the number of occurrences of each constructed n-item sequence in the DNA sequence; taking the n-item sequences whose number of occurrences is greater than a preset threshold as repetitive regions, and constructing an n-item sequence set from all n-item sequences that are repetitive regions; and, if the n-item sequence set contains more than one n-item sequence, constructing (n+1)-item sequences from pairs of n-item sequences in the set according to a preset rule. Compared with the prior art, the method provided by the embodiment of the invention has the following advantages: only the constructed DNA subsequences need to be identified, so the number of objects to identify is greatly reduced; the repetitive regions are obtained simply by counting occurrences during identification, which further improves efficiency; and longer DNA subsequences are constructed from the repetitive regions by the preset rule, with no need to first combine the repetitive regions with single bases and traverse the entire DNA sequence one by one. The efficiency of identifying genomic repetitive regions can therefore be greatly improved.
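The level-wise procedure in the abstract can be sketched as below. This is a sketch under stated assumptions, not the patented method: the abstract does not disclose the preset rule, so the code assumes an Apriori-style join (two n-item sequences are joined when the last n-1 bases of one equal the first n-1 bases of the other). It also assumes that the initial candidates are all 16 two-base combinations and that overlapping occurrences are counted. All function names are hypothetical.

```python
from itertools import product


def count_occurrences(dna, pattern):
    """Count occurrences of pattern in dna, overlaps allowed (assumption)."""
    count, start = 0, dna.find(pattern)
    while start != -1:
        count += 1
        start = dna.find(pattern, start + 1)
    return count


def join_pairs(repeats):
    """Assumed preset rule: join x and y when x[1:] == y[:-1]."""
    return {x + y[-1] for x in repeats for y in repeats if x[1:] == y[:-1]}


def find_repeats(dna, threshold):
    """Return {n: {n-item sequence: count}} for sequences with count > threshold."""
    n = 2
    # Assumption: the first constructed candidates are all 2-base combinations.
    candidates = {"".join(p) for p in product("ACGT", repeat=2)}
    result = {}
    while candidates:
        counts = {c: count_occurrences(dna, c) for c in candidates}
        repeats = {s: k for s, k in counts.items() if k > threshold}
        if not repeats:
            break
        result[n] = repeats
        if len(repeats) < 2:  # the set is "unique": nothing left to join
            break
        candidates = join_pairs(repeats)
        n += 1
    return result


for n, level in find_repeats("ACGACGACG", threshold=1).items():
    print(n, sorted(level.items()))
```

Only sequences built from pairs of already identified repeats are counted at each level, which is the source of the reduction in identified objects that the abstract claims.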

Description

Technical field

[0001] The invention relates to the technical field of systems biology and, more specifically, to a method and a device for identifying repetitive regions in DNA sequences.

Background

[0002] As is well known, deoxyribonucleic acid (DNA) is a double-stranded molecule composed of deoxyribonucleotides. The genetic information of organisms is stored in DNA sequences, which form the genetic instructions that guide biological development and vital functions. A DNA molecule consists of two linear strands coiled in a double helix structure, and each strand can be represented by a linear sequence of adenine (A), thymine (T), cytosine (C) and guanine (G). The two strands obey the base pairing rules (A with T and C with G). Modern bioinformatics therefore represents a DNA molecule as a string and stores it in a database for scientific research. With the development of bioinformatics and molecular biology experi...
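The base pairing rule (A with T, C with G) is what allows one strand to stand in for the whole molecule as a string. A minimal sketch follows; the function names are hypothetical and not taken from the patent.

```python
# Translation table implementing the pairing rule: A<->T, C<->G.
_PAIRING = str.maketrans("ATCG", "TAGC")


def complement(strand: str) -> str:
    """Return the base-by-base complement of a strand."""
    return strand.translate(_PAIRING)


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


print(complement("ATCG"))
print(reverse_complement("ATCG"))
```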


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F19/22
Inventor: 李敏 (Li Min), 刘莉娟 (Liu Lijuan), 廖兴宇 (Liao Xingyu), 王建新 (Wang Jianxin)
Owner: CENT SOUTH UNIV