Method and device for identifying repetitive regions in deoxyribonucleic acid (DNA) sequences

A DNA sequence identification method, applied in the field of systems biology, that addresses the problems of too many candidate patterns, long running time, and difficulty in finding repeated sequences in DNA sequences, and improves recognition efficiency.

Inactive Publication Date: 2018-11-06

CENT SOUTH UNIV


Problems solved by technology

In 2011, Zhou et al. proposed a genome (DNA) frequent pattern mining method based on a frequent subtree mining strategy. The method uses a suffix tree structure to reduce the amount of sequence stored, but it works by intercepting two subsequences from the DNA sequence and comparing them. In other words, it must compare all subsequences pairwise and count the identical subsequences from the comparison results, which makes it difficult to find repeated sequences that occur many times in a DNA sequence and is time-consuming.

In 2013, Jiang et al. introduced the concept of similarity, constructed frequent approximate patterns on that basis, and proposed the frequent approximate pattern mining method SFAP. However, the sequences mined by this method are similar rather than exactly identical, so they are not strictly repeat regions.

In 2015, Mao et al. proposed the AMSMA method, which stores the genome (DNA) sequence information obtained by scanning the database in an association matrix for better time and space efficiency. Each row of the association matrix represents an identified DNA subsequence, and there are four columns, one for each of the bases A, G, T and C. Combining the DNA subsequence of a row with the base of a column yields an extended DNA subsequence. However, this method produces too many candidate patterns, resulting in long running time and low efficiency.
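The candidate growth described for the base-by-base extension can be illustrated with a minimal sketch. This is not the AMSMA implementation; the function name `extend_with_bases` and the list-of-strings representation are assumptions made only to show that every identified subsequence yields four extended candidates.

```python
# Illustrative only: the name extend_with_bases and the data layout are
# hypothetical and are not taken from the AMSMA method itself.
BASES = "AGTC"  # the four column bases named in the text


def extend_with_bases(patterns):
    """Append each of the four bases to every pattern.

    The result always has 4 * len(patterns) entries, whether or not the
    extended subsequence actually occurs in the DNA sequence.
    """
    return [p + b for p in patterns for b in BASES]


if __name__ == "__main__":
    candidates = extend_with_bases(["AG", "GT"])
    print(len(candidates), candidates)
```

Because every candidate must then be checked against the sequence, the work per round grows with four times the number of identified subsequences, which is the inefficiency the text points to.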


Embodiment Construction

[0047] Specific implementations of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following examples illustrate the present invention but are not intended to limit its scope.

[0048] To overcome the above-mentioned problems in the prior art, an embodiment of the present invention provides a method for identifying repetitive regions in DNA sequences. Figure 1 is a schematic flowchart of the method according to an embodiment of the present invention. As shown in Figure 1, the method includes:

[0049] S101. For the constructed n-item sequence, identify the number of occurrences of the n-item sequence in the DNA sequence.
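Step S101 can be sketched as a simple occurrence count. The function name `count_occurrences` is hypothetical, and counting overlapping occurrences is an assumption; the text does not say whether overlaps are counted.

```python
# A minimal sketch of step S101 under stated assumptions: overlapping
# occurrences are counted, and the DNA sequence is a plain Python string.
def count_occurrences(dna, pattern):
    """Count how many times pattern occurs in dna, overlaps included."""
    count = 0
    start = dna.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are also found.
        start = dna.find(pattern, start + 1)
    return count


if __name__ == "__main__":
    print(count_occurrences("ATATAT", "AT"))
    print(count_occurrences("ATATAT", "ATA"))
```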

[0050] It should be noted that the n-item sequence in the embodiment of the present invention represents a DNA subsequence with a length of n and n≥2, and a l...


Abstract

The invention provides a method and device for identifying repetitive regions in deoxyribonucleic acid (DNA) sequences. The method comprises the steps of: identifying the number of occurrences of the constructed n-item sequences in the DNA sequence; taking the n-item sequences whose number of occurrences is greater than a preset threshold as repetitive regions and constructing an n-item sequence set of all n-item sequences that are repetitive regions; and, if the n-item sequence set contains more than one n-item sequence, constructing (n+1)-item sequences from pairs of n-item sequences in the set according to a preset rule. Compared with the prior art, the method provided by the embodiment of the invention only needs to identify the constructed DNA subsequences, so the number of objects to be identified is greatly reduced. The repetitive regions are obtained by counting occurrences during the identification process, which further improves identification efficiency. Longer DNA subsequences are constructed from the repetitive regions through the preset rule, with no need to first combine the repetitive regions with single bases and traverse the entire DNA sequence one by one, so the efficiency of identifying genomic repetitive regions can be greatly improved.
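The level-wise procedure in the abstract can be sketched as follows. The abstract does not state the preset rule or how the initial sequences are constructed, so this sketch makes two loudly labeled assumptions: the initial 2-item sequences are all distinct length-2 substrings of the input, and the rule joins two n-item sequences when the last n-1 bases of one equal the first n-1 bases of the other (an Apriori-style join). All function names are hypothetical.

```python
# A sketch under stated assumptions, not the patented method itself.
# ASSUMPTION 1: initial candidates are all distinct length-n substrings.
# ASSUMPTION 2: the "preset rule" is an overlap join (x[1:] == y[:-1]);
#               the source does not specify the rule.
def count_occurrences(dna, pattern):
    """Count overlapping occurrences of pattern in dna."""
    n = len(pattern)
    return sum(1 for i in range(len(dna) - n + 1) if dna[i:i + n] == pattern)


def join_candidates(repeats):
    """Build (n+1)-item candidates from pairs of n-item repeats."""
    return sorted({x + y[-1] for x in repeats for y in repeats
                   if x[1:] == y[:-1]})


def find_repeats(dna, threshold, n=2):
    """Return {length: {sequence: count}} for sequences seen > threshold times."""
    results = {}
    candidates = sorted({dna[i:i + n] for i in range(len(dna) - n + 1)})
    while candidates:
        counts = {c: count_occurrences(dna, c) for c in candidates}
        repeats = {c: k for c, k in counts.items() if k > threshold}
        if not repeats:
            break
        results[n] = repeats
        if len(repeats) < 2:
            # Only one repeat left: nothing to pair it with, so stop.
            break
        candidates = join_candidates(repeats)
        n += 1
    return results


if __name__ == "__main__":
    for length, found in find_repeats("ATGATGATG", threshold=1).items():
        print(length, found)
```

Only candidates built from sequences that already passed the threshold are counted at each level, which is how this sketch reflects the claimed reduction in identified objects compared with appending single bases to every subsequence.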

Description

Technical field

[0001] The invention relates to the technical field of systems biology and, more specifically, to a method and a device for identifying repetitive regions in DNA sequences.

Background art

[0002] As is well known, deoxyribonucleic acid (DNA) is a double-stranded molecule composed of deoxyribonucleotides. The genetic information of organisms is stored in DNA sequences, which form the genetic instructions that guide biological development and vital functions. A DNA molecule consists of two linear strands coiled in a double helix structure, and each strand can be represented by a linear sequence of adenine (A), thymine (T), cytosine (C) and guanine (G). Additionally, the two strands obey the base pairing rules (A with T and C with G). Therefore, modern bioinformatics organizes DNA molecules into strings and stores them in databases for scientific research. With the development of bioinformatics and molecular biology experi...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F19/22

Inventor: 李敏, 刘莉娟, 廖兴宇, 王建新

Owner: CENT SOUTH UNIV
