Gene sequencing data reading method and system

A gene sequencing data reading technology, applied in the field of bioinformatics, that addresses the sharp increase in the number of gene sequence fragments and achieves file blocks of uniform size

Active Publication Date: 2016-09-07
SENRIS BIOTECHNOLOGY (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0004] Genome source sequences range in length from about 100,000 bases (such as swinepox virus and Escherichia coli) to about 1 billion bases (such as the human, cucumber, and giant panda genomes). Metagenome data from complex environments (such as seawater or the human large intestine) can even reach tens of billions of bases. Because the coverage of these samples needs to reach 30 to 100 times, the number of gene sequence fragments generated increases dramatically.

Examples

Embodiment 1

[0037] Embodiment 1 of the present invention provides a method for reading gene sequencing data. As shown in Figure 1, the method includes the following steps:

[0038] Step S101: Parse the user parameters to determine the number of tasks. The user parameters in this embodiment include hardware performance, the total size of the gene sequencing data, the length of the homologous gene reference sequence, and so on; the number of tasks required is chosen according to these parameters. The tasks in this embodiment are MPI processes.

[0039] Step S102: Divide the sequencing data into file blocks of the same size according to the number of tasks, and obtain the start position and end position of each file block. If the number of tasks is n and the total size of gene sequencing data is S, then the starting position of the...
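
The equal-size split in step S102 can be illustrated with a short sketch. The exact formula is cut off in the text above, so the integer-division rule used here, and the function name `split_into_blocks`, are assumptions made for illustration rather than the patent's own definition. Block i covers the half-open byte range from i*S/n to (i+1)*S/n, so block sizes differ by at most one byte.

```python
def split_into_blocks(total_size: int, n_tasks: int) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets, end exclusive, one pair per task.

    Assumed rule (not taken from the source): block i spans
    [i * S // n, (i + 1) * S // n), where S is total_size and n is n_tasks.
    """
    if n_tasks <= 0:
        raise ValueError("n_tasks must be positive")
    if total_size < 0:
        raise ValueError("total_size must be non-negative")
    return [
        (i * total_size // n_tasks, (i + 1) * total_size // n_tasks)
        for i in range(n_tasks)
    ]
```

For example, `split_into_blocks(100, 4)` yields four 25-byte blocks. These tentative boundaries ignore record structure, which is why a later adjustment step is needed.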

Embodiment 2

[0046] Embodiment 2 of the present invention provides a method for reading gene sequencing data. As shown in Figure 4, the method includes the following steps:

[0047] Step S201: Initialize the tasks, establish connections between all nodes, and collect node information and process information. In this embodiment, task initialization gathers the information of all computer nodes involved in the computation, each task's identification number for communication within the group, and the total number of tasks able to participate in group communication. The tasks in this embodiment are MPI processes.

[0048] Step S202: Analyze the user parameters to determine the number of tasks.

[0049] Step S203: Divide the sequencing data into file blocks of the same size according to the number of tasks.

[0050] Step S204: Adjust the start address and end address of each file block.

[0051] The above steps have been described in detail in...
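
The boundary adjustment in step S204 can be sketched as follows. The patent's own adjustment rule is not visible in this excerpt, so this is only one plausible approach under stated assumptions: the data is FASTA-like, every record begins with a line starting with `>`, and the file begins with a record at offset 0. The helper names `next_record_start` and `adjust_blocks` are hypothetical. Each tentative start is moved forward to the next record header, and each block ends where the next one starts, so no sequence is split across two blocks.

```python
import io


def next_record_start(stream, offset, total_size, marker=b">"):
    """Return the first offset >= `offset` at which a record header line begins."""
    if offset <= 0:
        return 0  # assumption: the file starts with a record header
    if offset >= total_size:
        return total_size
    # Step back one byte and consume the rest of that line, so the
    # stream is positioned at the beginning of a line.
    stream.seek(offset - 1)
    stream.readline()
    while True:
        pos = stream.tell()
        line = stream.readline()
        if not line:
            return total_size  # no further record: the block is empty
        if line.startswith(marker):
            return pos


def adjust_blocks(stream, blocks, total_size, marker=b">"):
    """Snap tentative (start, end) blocks to record boundaries."""
    starts = [next_record_start(stream, s, total_size, marker) for s, _ in blocks]
    ends = starts[1:] + [total_size]
    return list(zip(starts, ends))


# Tiny in-memory example: three records, 22 bytes in total.
demo = io.BytesIO(b">a\nACGT\n>b\nGG\n>c\nTTTT\n")
adjusted = adjust_blocks(demo, [(0, 11), (11, 22)], 22)
```

In this example the tentative boundary at byte 11 falls inside record `b`, so the second block is moved to start at byte 14, where record `c` begins. A real FASTQ file would need a different header test, because `@` can also appear at the start of a quality line.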

Embodiment 3

[0062] Embodiment 3 of the present invention provides a method for reading gene sequencing data. In this embodiment, a high-performance host uses different threads within one program to read the gene sequencing data. The method includes the following steps:

[0063] Step S301: Parse the user parameters to determine the number of program threads.

[0064] Step S302: Divide the sequencing data into file blocks of the same size according to the number of program threads.

[0065] Step S303: Adjust the start address and end address of each file block.

[0066] Step S304: Each thread reads its own file block according to the adjusted result.

[0067] Steps S302 to S304 have been described in detail in Embodiment 1, and will not be repeated here.
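
The thread-based reading in step S304 might look like the sketch below. It is a minimal illustration, not the patent's implementation: the function names `read_block`, `read_blocks_in_parallel`, and `demo` are hypothetical, and the block boundaries are assumed to have already been adjusted to record boundaries. Each thread opens its own file handle, seeks to its start offset, and reads only its own byte range.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_block(path, start, end):
    """Read bytes [start, end) using a file handle private to this call."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


def read_blocks_in_parallel(path, blocks):
    """Read every (start, end) block in its own thread; results keep block order."""
    with ThreadPoolExecutor(max_workers=max(1, len(blocks))) as pool:
        futures = [pool.submit(read_block, path, s, e) for s, e in blocks]
        return [fut.result() for fut in futures]


def demo():
    """Write a small FASTA-like file and read it back as two adjusted blocks."""
    data = b">a\nACGT\n>b\nGG\n>c\nTTTT\n"
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        tmp.write(data)
        tmp.close()
        # Assumed already-adjusted boundaries: each block starts at a header.
        chunks = read_blocks_in_parallel(tmp.name, [(0, 14), (14, 22)])
    finally:
        os.remove(tmp.name)
    return chunks, data
```

Giving each thread a separate file handle avoids sharing a seek position between threads. Concatenating the returned chunks in order reproduces the original file.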

Abstract

The invention relates to the technical field of bioinformatics and provides a method for reading gene sequencing data. The method includes the following steps: parsing user parameters to determine the number of tasks; dividing the sequencing data into file blocks of the same size according to the number of tasks; adjusting the start address and end address of each file block; and having each task read its file block according to the adjusted result. The invention further provides a gene sequencing data reading system and a gene sequencing data reading device equipped with that system. With the method, system, and device, gene sequencing data is read in parallel, the file blocks are uniform in size, and a single sequence is prevented from being split across two different file blocks.

Description

Technical field

[0001] The invention relates to the technical field of bioinformatics, in particular to a method and system for reading gene sequencing data.

Background

[0002] The sequencing of biological macromolecules, especially nucleic acids and proteins, has run through the development of bioinformatics from beginning to end. An organism's genome contains the genetic information for all of its cell structures and life activities, and fundamentally guides the organism's development. Accurate and timely access to the genetic information of organisms can effectively guide life science research. Sequencing technology can quickly obtain the genetic information carried on DNA and comprehensively reveal the diversity and complexity of the genome, and it plays an increasingly important role in bioinformatics research.

[0003] In recent years, next-generation sequencing technology has brought great changes to bioinformatics, and has achieved remarkable developments in ...

Application Information

Patent Type & Authority Patents(China)
IPC(8): G06F19/20, G06F17/21
Inventor 孟金涛, 魏延杰, 成杰峰, 冯圣中
Owner SENRIS BIOTECHNOLOGY (SHENZHEN) CO LTD