Multi-level acceleration method for biological sequence database searching based on stream processing

A multi-level acceleration method for biological sequence database search based on stream processing. It addresses the problem that stream processing technology had not previously been applied to searching biological sequence databases.

Inactive Publication Date: 2008-04-09
NAT UNIV OF DEFENSE TECH
1 Cites, 30 Cited by

AI Technical Summary

Problems solved by technology

None of the three types of existing technology mentioned above has yet adopted this kind of stream processing technology, and t...



Embodiment Construction

[0090] Specific implementation

[0091] Fig. 1 is the general flow chart of the present invention, which mainly comprises the following six steps. Steps 1 and 2 (building the cluster system and preprocessing the database sequences) need to be carried out only once, when the system is first set up, and again whenever the database is updated. Each subsequent database search task requires only steps 3 to 6.

[0092] 1. Build a cluster system with stream processors. The cluster system is composed of multiple personal computers, each of which is a node of the cluster system. Each node has its own independent storage system, and the nodes communicate with one another by message passing. Nodes are numbered sequentially 0, 1, 2, ..., n_p - 1, where n_p is the total number of nodes in the cluster system (in an actual system, the value of n_p is, for example, an integer power of 2). In order to facilitate subsequent processing, t...
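The node numbering described in step 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class name `ClusterLayout`, the `master_id` field and its default of 0 are hypothetical, and treating the power-of-two remark as a hard requirement is an assumption made only to keep the example small.

```python
from dataclasses import dataclass


def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive integer power of 2 (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


@dataclass(frozen=True)
class ClusterLayout:
    """Hypothetical description of a cluster with nodes numbered 0 .. n_p - 1."""

    n_p: int            # total number of nodes in the cluster
    master_id: int = 0  # assumed choice of master control node, not from the text

    def __post_init__(self) -> None:
        # Assumption: enforce the power-of-two node count mentioned in the text.
        if not is_power_of_two(self.n_p):
            raise ValueError(f"n_p must be a power of 2, got {self.n_p}")
        if not 0 <= self.master_id < self.n_p:
            raise ValueError(f"master_id must lie in [0, {self.n_p - 1}]")

    @property
    def node_ids(self) -> list[int]:
        """Sequential node numbers 0, 1, ..., n_p - 1."""
        return list(range(self.n_p))

    def is_master(self, node_id: int) -> bool:
        return node_id == self.master_id
```

For example, `ClusterLayout(n_p=8).node_ids` yields the eight node numbers 0 through 7, while `ClusterLayout(n_p=6)` is rejected by the assumed power-of-two check.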


Abstract

The invention discloses a stream-processing-based multi-level acceleration method for biological sequence database search, which aims to speed up searching of a biological sequence database while preserving search accuracy and keeping cost relatively low. In the technical solution, a cluster system composed of a plurality of personal computers is first built and a master control node is designated. The master control node distributes the database sequences to the node machines of the cluster system, where they are stored; it then pads and rearranges the query sequence and distributes it to all the node machines. The node machines execute the search task in parallel, each being responsible for searching the query sequence against its local database sequences. The master control node collects, summarizes and outputs the results of the parallel search tasks from all node machines. The search tasks are thus executed in parallel across the n node machines of the cluster, and each node machine distributes the comparison of two sequences to p hardware compute clusters that work in parallel, achieving multi-level parallel acceleration at three layers: the cluster node layer, the stream-level computation layer, and the stream kernel instruction layer.
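The cluster-node layer summarized in the abstract (distribute the database, search each local part in parallel, collect the results on the master) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented method: the visible text does not name the pairwise comparison algorithm, so a plain Smith-Waterman local alignment score is used as a stand-in; the round-robin partitioning, the thread pool standing in for node machines, and the names `sw_score`, `partition`, `search_local` and `cluster_search` are all hypothetical. Query padding and rearrangement, and the stream-processor layers, are omitted because the text shown here does not describe them.

```python
from concurrent.futures import ThreadPoolExecutor


def sw_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score with a linear gap penalty.

    Stand-in scorer (assumption); the scoring parameters are illustrative.
    """
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def partition(database, n_nodes):
    """Split (seq_id, sequence) records round-robin across nodes (assumed policy)."""
    return [database[i::n_nodes] for i in range(n_nodes)]


def search_local(query, local_db):
    """Work done by one node: score the query against its local sequences."""
    return [(seq_id, sw_score(query, seq)) for seq_id, seq in local_db]


def cluster_search(query, database, n_nodes=4, top_k=3):
    """Master-side flow: distribute, search in parallel, merge and rank hits."""
    parts = partition(database, n_nodes)
    # Threads stand in for node machines; a real cluster would use message passing.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        per_node = list(pool.map(lambda part: search_local(query, part), parts))
    hits = [hit for node_hits in per_node for hit in node_hits]
    # Highest score first; ties broken by sequence id for a stable order.
    return sorted(hits, key=lambda h: (-h[1], h[0]))[:top_k]
```

Because each node scores only its own partition, the merge step on the master is a simple concatenation followed by a sort, which is why the result does not depend on how many nodes the database is split across.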

Description

Technical field

[0001] The invention mainly relates to a method for retrieving and comparing massive amounts of biological information in life science and information science, and in particular to a multi-level acceleration method for biological sequence database search.

Background

[0002] In recent years, as the genome projects of various species have continued to develop and deepen, massive amounts of biological sequence data have been generated, mainly DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) sequence data and protein sequence data. For example, the amount of data in GenBank / EMBL / DDBJ, the three major international nucleic acid databases, doubles approximately every 15 months, and the growth is still accelerating. Compared with the growth in computer processing power described by Moore's Law, the growth rate of biological data already exceeds it or soon will, so on this trend searching and processing these data will take longer and longer (in the c...


Application Information

IPC(8): G06F17/30, G06F19/00, G06F19/22
Inventors: 王勇献, 王正华, 董蕴源, 车永刚, 徐传福, 彭宇行, 王意洁, 邢座程
Owner: NAT UNIV OF DEFENSE TECH