
PacBio sequencing platform IT architecture constructed on basis of Hadoop

Sequencing platform and data technology in the field of bioinformatics. It addresses problems such as poor accuracy, allows effectively unlimited expansion of storage capacity and computing power, and helps improve healthcare conditions.

Status: Inactive · Publication date: 2018-01-09
Owner: 华子昂
Cites: 3 · Cited by: 14

AI Technical Summary

Problems solved by technology

Third-generation sequencing, also known as single-molecule DNA sequencing, distinguishes differences between base signals by means of modern optics, polymers, nanotechnology and related techniques, so that sequence information is read directly. Its read lengths are longer than those of second-generation instruments, but its accuracy is lower. As the technology improves, third-generation sequencing instruments will become more stable and mature.



Examples


Embodiment 1

[0031] A Hadoop-based PacBio sequencing platform IT architecture comprises a hardware module, a software module, a data hierarchy module and a server group design module, wherein:

[0032] As shown in Figure 1, the hardware module mainly implements data transmission and storage by interconnecting basic hardware such as switches, routers, servers and external storage, and executes computations, as follows:

[0033] Basic hardware components: server nodes, switches and an uninterruptible power supply (UPS). Their minimum configuration requirements are as follows:

[0034] 1) Server nodes: at least 6 nodes; the minimum configuration of each node is shown in Table 1:

[0035] Table 1 Minimum hardware configuration of each node

[0036]
Quantity | Component
2 | Intel Xeon E5 v2 series processor, 8-core
1 | Riser card, up to 6 × PCIe x8 slots + 1 × PCIe x16 slot
8 | 8 GB RDIMM, 1600 MT/s, low voltage, single rank, x4 data width
1 | RAID controller, 1GB NV...
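Table 1 fixes a per-node floor, so the minimum aggregate capacity of the cluster follows by simple multiplication. The sketch below does that arithmetic for the fully legible rows only (the RAID controller row is truncated in the source). It assumes the 2 processors occupy separate sockets and counts physical cores only; the constant and function names (`minimum_capacity`, `ClusterCapacity`) are hypothetical.

```python
from dataclasses import dataclass

# Per-node minimums taken from Table 1 (fully legible rows only).
CPUS_PER_NODE = 2   # Intel Xeon E5 v2 series processors per node (assumed one per socket)
CORES_PER_CPU = 8   # 8-core parts
DIMMS_PER_NODE = 8  # 8 GB RDIMMs per node
GB_PER_DIMM = 8
MIN_NODES = 6       # "at least 6 nodes" ([0034])


@dataclass
class ClusterCapacity:
    """Aggregate resources of a cluster built to the Table 1 minimum (hypothetical helper)."""
    nodes: int
    physical_cores: int
    memory_gb: int


def minimum_capacity(nodes: int = MIN_NODES) -> ClusterCapacity:
    """Return total physical cores and RAM for `nodes` minimum-spec servers."""
    if nodes < MIN_NODES:
        raise ValueError(f"at least {MIN_NODES} nodes are required, got {nodes}")
    return ClusterCapacity(
        nodes=nodes,
        physical_cores=nodes * CPUS_PER_NODE * CORES_PER_CPU,
        memory_gb=nodes * DIMMS_PER_NODE * GB_PER_DIMM,
    )


if __name__ == "__main__":
    print(minimum_capacity())    # ClusterCapacity(nodes=6, physical_cores=96, memory_gb=384)
    print(minimum_capacity(10))  # ClusterCapacity(nodes=10, physical_cores=160, memory_gb=640)
```

At the 6-node minimum this works out to 96 physical cores and 384 GB of RAM across the cluster; adding capacity is a matter of raising `nodes`, which is consistent with the scale-out expansion the summary describes.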


Abstract

The invention discloses a PacBio sequencing platform IT architecture constructed on the basis of Hadoop. The architecture comprises a hardware module, a software module, a data hierarchy module and a server group design module. The hardware module realizes data transmission and storage through basic hardware and executes computation; the software module uploads the mass bioinformatics data collected by analysis instruments to HDFS, imports mass data obtained from third-party SQL databases into Hadoop for related processing, and finally mines valuable knowledge and rules from the experimental data; the data hierarchy module processes the resulting mass bioinformatics data; and the server group design module handles the co-design and role assignment of the PacBio SMRT Analysis server group and the big data server group hardware. The architecture achieves efficient processing of massive data and enables fast, low-cost reading of gene sequences. It applies to bioinformatics data analysis and to medical data simulation and analysis, and is significant for disease prediction, diagnosis and treatment.
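The software module's processing step can be pictured as a MapReduce job over reads stored in HDFS. The sketch below is an illustrative example, not the patent's method: it follows Hadoop Streaming's convention of tab-separated key/value lines, with reducer input sorted by key, and computes per-movie read counts, total bases and mean read length. It assumes a hypothetical one-read-per-line input of `read_id<TAB>sequence` and PacBio-style read names whose first `/`-separated field is the movie name; the demo reads are made up, and `run_locally` only simulates the shuffle/sort step in one process.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map phase: turn each 'read_id<TAB>sequence' line into 'movie<TAB>read_length'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        read_id, sequence = line.split("\t", 1)
        movie = read_id.split("/", 1)[0]  # assumed 'movie/zmw/...' read-name layout
        yield f"{movie}\t{len(sequence)}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Reduce phase: expects lines grouped by key, as Hadoop's shuffle/sort delivers them.

    Emits 'movie<TAB>read_count<TAB>total_bases<TAB>mean_length'.
    """
    pairs = (rec.rstrip("\n").split("\t", 1) for rec in sorted_lines if rec.strip())
    for movie, group in groupby(pairs, key=lambda kv: kv[0]):
        lengths = [int(value) for _, value in group]
        total = sum(lengths)
        yield f"{movie}\t{len(lengths)}\t{total}\t{total / len(lengths):.1f}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> shuffle/sort -> reduce in a single process (for testing only)."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Made-up demo reads; real input would come from HDFS via Hadoop Streaming.
    demo = [
        "movieA/1/0_10\tACGTACGTAC",
        "movieA/2/0_4\tACGT",
        "movieB/7/0_6\tGGCCAA",
    ]
    for row in run_locally(demo):
        print(row)
```

On a cluster, the mapper and reducer would be packaged as separate scripts that read stdin and write stdout, and passed to the Hadoop Streaming jar's `-mapper` and `-reducer` options; Hadoop then performs the sort between the two phases that `run_locally` imitates with `sorted`.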

Description

Technical field:
[0001] The invention relates to the technical field of bioinformatics, and in particular to a PacBio sequencing platform IT architecture built on the basis of Hadoop.

Background art:
[0002] In recent years, the development of high-throughput sequencing technology has produced ever more data for bioinformatics to process. How to mine valuable knowledge and patterns from massive experimental data efficiently and at low cost has become a major research hotspot in bioinformatics. Cloud computing is a newer outgrowth of grid, parallel and distributed computing: users obtain the computing power, storage capacity and infrastructure provided by the cloud over the network. A cloud computing platform can effectively store, process and analyze massive bioinformatics data.

[0003] In 1986, the first commercial gene sequencing equipment appeared. After 19 years, the second-generation sequencing equip...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F19/22; G06F19/28
Inventors: 罗崇珺, 万君兴, 华子昂, 华晨
Owner: 华子昂