DNA sequence query system based on shared data approximation

A DNA sequence query system, applied in the field of data mining, that addresses the low space utilization and wasted space of data outlines, improving space utilization and saving storage space while ensuring query efficiency.

Active Publication Date: 2020-03-06
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

This results in low data outline space utilization for small archives and a serious waste of space.

Detailed Description of Embodiments

[0035] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the implementation of the present invention is described in detail below with reference to the drawings and examples.

[0036] As shown in Figure 1, the system consists of three subsystems: the data preprocessing subsystem, the data outline establishment and update subsystem, and the DNA sequence query subsystem. The input to the system is DNA data from different archives.

[0037] First, this DNA data is fed into the data preprocessing subsystem, which parses the raw data into two-tuples of DNA sequence and archive information.
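The parsing step in [0037] can be sketched as follows. This is a minimal illustration, assuming the raw data arrives as FASTA-like text (header lines starting with `>` followed by sequence lines); the source does not state the input format, and `parse_records` and `archive_id` are hypothetical names introduced here.

```python
from typing import Iterator, List, Tuple


def parse_records(raw_text: str, archive_id: str) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, archive_id) two-tuples from FASTA-like text.

    Assumption: each record starts with a '>' header line, and the
    sequence may span several following lines.
    """
    seq_lines: List[str] = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if seq_lines:
                yield ("".join(seq_lines).upper(), archive_id)
                seq_lines = []
        else:
            seq_lines.append(line)
    # Emit the final record after the last header.
    if seq_lines:
        yield ("".join(seq_lines).upper(), archive_id)


raw = ">read1\nACGT\nac\n>read2\nGGTT\n"
pairs = list(parse_records(raw, "ENA"))
```

Each resulting pair carries the sequence together with the archive it came from, which is the form the next subsystem consumes.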

[0038] Then, the processed two-tuple enters the data outline establishment and update subsystem, which maps the two-tuple to the data outline for compressed storage.

[0039] Finally, for a given DNA sequence of interest, the DNA sequence query subsystem will determine which arc...
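The storage and query steps in [0038] and [0039] can be illustrated with a small sketch. The excerpt does not describe the internal layout of the data outline, so a Bloom-filter-style bit array is swapped in here as a stand-in: all archives share one array, and each (sequence, archive) two-tuple is hashed into it. The class `SharedOutline`, the function `query_archives`, and the parameters `num_bits` and `num_hashes` are hypothetical and are not taken from the patent.

```python
import hashlib
from typing import Iterable, List


class SharedOutline:
    """One bit array shared by every archive (a stand-in, not the patented structure)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, sequence: str, archive_id: str) -> List[int]:
        # The key combines archive and sequence, so the same sequence
        # maps to different bit positions for different archives.
        key = f"{archive_id}\x00{sequence}".encode("utf-8")
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                key, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            positions.append(int.from_bytes(digest, "little") % self.num_bits)
        return positions

    def add(self, sequence: str, archive_id: str) -> None:
        """Map a (sequence, archive) two-tuple into the shared array."""
        for pos in self._positions(sequence, archive_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, sequence: str, archive_id: str) -> bool:
        """False means definitely absent; True may be a false positive."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(sequence, archive_id)
        )


def query_archives(
    outline: SharedOutline, sequence: str, archive_ids: Iterable[str]
) -> List[str]:
    """Return the archives that may hold the given sequence."""
    return [a for a in archive_ids if outline.might_contain(sequence, a)]


outline = SharedOutline()
outline.add("ACGT", "ENA")
outline.add("ACGT", "archive_B")
outline.add("GGTT", "archive_C")
hits = query_archives(outline, "ACGT", ["ENA", "archive_B", "archive_C"])
```

Because every archive writes into the same array, a small archive does not reserve space it cannot fill, which is the space-utilization concern the summary raises. The trade-off in this stand-in is that a query can return an archive that does not hold the sequence (a false positive), but it never misses an archive that does.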

Abstract

The invention provides a DNA sequence query system based on a shared data approximation (the "data outline" of the description). The system compresses DNA sequences from different archives into the same data approximation. For a DNA sequence of interest, the archives that contain the sequence can be quickly looked up through the data approximation. The DNA sequence query system comprises three subsystems: a data preprocessing subsystem, a data approximation establishment and update subsystem, and a DNA sequence query subsystem. Useful information is extracted from the given DNA sequences; an original data approximation is proposed to compress the DNA sequence data of the multiple archives; and DNA sequences are queried using the data approximation. The system can be used for DNA sequence query: by finding the archives in which a DNA sequence of interest is located, the archive data of interest can then be retrieved for further research.

Description

Technical field

[0001] The invention belongs to the technical field of data mining, and in particular relates to a DNA sequence query system based on a shared data outline.

Background

[0002] Whole-genome shotgun sequencing of microbial genomes has become an important part of comparative genomics research and is widely used in fields such as tracking foodborne disease outbreaks, mapping the distribution of drug resistance, and diagnosing infectious diseases. The resulting DNA sequence data are stored in various archives, such as the European Nucleotide Archive, and the same DNA sequence may be stored in multiple archives at once. Researchers focus on particular DNA sequences of interest; by finding the archives in which those sequences are located, they can obtain the archive data they need for follow-up research. Therefore, it is an important problem to quickly determine in which archives a DNA sequence of int...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B50/30, G16B50/50
CPC: G16B50/30, G16B50/50
Inventor: 王平辉, 李润东, 狄佳, 孙飞扬, 樊子恩
Owner: XI AN JIAOTONG UNIV