Method for quickly and uniformly fragmenting fastq files

A technique for uniformly fragmenting fastq files, applied in the field of gene sequencing. It addresses inefficient, time-consuming processing, and achieves improved work efficiency, time savings, and a reduced number of calls.

Pending Publication Date: 2020-10-16
BERRYONCOLOGY PRECISION MEDICAL DEVICE CO LTD

AI Technical Summary

Problems solved by technology

The traditional processing method is inefficient. Because a fastq file is very large and may contain hundreds of millions of reads, merely counting the total number of reads consumes a lot of time.

Embodiment Construction

[0057] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0058] In addition, the term "and/or" in this document only describes an association between associated objects and covers three possible relationships. For example, "A and/or B" may mean: A exists alone, A and B exist at the same time, or B exists alone. In addition, the character "/" in this article genera...

Abstract

The invention provides a method for quickly and uniformly fragmenting fastq files, which comprises the following steps: concurrently loading the fastq files through two threads, constructing read data, and outputting the read data; setting a plurality of fragments and distributing the read data to the fragment caches of the corresponding fragments according to a uniform fragmentation algorithm; and writing the read data in each fragment cache into the output fastq file of the corresponding fragment through an asynchronous fragment thread in an asynchronous fragment thread pool. The invention uses a parallel working mode in which a plurality of threads cooperate simultaneously, which improves operating efficiency. The uniform fragmentation algorithm calculates, efficiently and in real time during fragmentation, which fragment each read belongs to. This avoids the complete traversal of the fastq file that the traditional method needs in order to count the total number of reads, and so saves program run time. The fragment caching technique greatly reduces how often the system lock is acquired and how often the thread-safe queue and the file system write interface are used, which reduces the load on the operating system and achieves the goal of quickly and uniformly fragmenting fastq files.
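The distribute, cache, and write steps described above can be sketched as follows. This is a minimal illustration, not the patented algorithm: the abstract does not say how the uniform fragmentation algorithm assigns reads, so the sketch assumes plain round-robin assignment (read index modulo the number of fragments), which also needs no prior count of reads. The function name `shard_reads`, the `cache_size` parameter, and the use of one single-worker executor per fragment (to keep writes to each output in order) are all assumptions made for the example. The two-thread loading step is not modeled.

```python
import io
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Sequence, TextIO


def shard_reads(reads: Iterable[Sequence[str]],
                outputs: List[TextIO],
                cache_size: int = 1000) -> List[int]:
    """Distribute 4-line reads over `outputs` and return reads per fragment.

    Assumption: round-robin assignment stands in for the patent's
    unspecified "uniform fragmentation algorithm".
    """
    n = len(outputs)
    if n == 0:
        raise ValueError("at least one output fragment is required")

    caches: List[List[str]] = [[] for _ in range(n)]
    counts = [0] * n
    futures = []
    # One single-worker executor per fragment: batches for the same
    # fragment are written in submission order, off the main thread.
    writers = [ThreadPoolExecutor(max_workers=1) for _ in range(n)]

    def flush(shard: int) -> None:
        # Hand the whole cached batch to the writer in one call, so the
        # write interface is used once per batch rather than once per read.
        batch, caches[shard] = caches[shard], []
        if batch:
            futures.append(
                writers[shard].submit(outputs[shard].write, "".join(batch)))

    try:
        for index, read in enumerate(reads):
            shard = index % n  # fragment is known without a total count
            caches[shard].append("\n".join(read) + "\n")
            counts[shard] += 1
            if len(caches[shard]) >= cache_size:
                flush(shard)
        for shard in range(n):
            flush(shard)  # write whatever is left in each cache
    finally:
        for writer in writers:
            writer.shutdown(wait=True)

    for future in futures:
        future.result()  # re-raise any write error from a worker thread
    return counts


if __name__ == "__main__":
    demo_reads = [(f"@r{i}", "ACGT", "+", "IIII") for i in range(10)]
    demo_outputs = [io.StringIO() for _ in range(3)]
    print(shard_reads(demo_reads, demo_outputs, cache_size=2))
```

With round-robin assignment the fragment sizes differ by at most one read, and the input is traversed only once.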

Description

Technical field

[0001] Embodiments of the present invention generally relate to the field of gene sequencing, and more specifically to a method for quickly and evenly fragmenting fastq files.

Background

[0002] In the field of gene sequencing, samples are sequenced by a sequencer, which outputs files such as BCL files. After further data conversion and data splitting, fastq files are obtained. A fastq file is a text-based file used to store biological gene base sequences, the corresponding quality scores, and related information. It is the most commonly used file format in the field of gene sequencing. Each fastq file consists of a series of reads, and each read consists of 4 lines of data. Bioinformatics analysis of genes is in practice the analysis of fastq files or, more precisely, of the large number of reads they contain.

[0003] However, due to the huge amount of genetic data, the size of a fastq...
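The 4-lines-per-read layout described above can be illustrated with a small reader. This is a sketch under stated assumptions, not code from the patent: the names `Read` and `iter_reads` are hypothetical, and the field labels follow the widely used fastq convention (a header line starting with `@`, the base sequence, a separator line starting with `+`, and the quality string).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    """One fastq read: the four text lines of a record, newlines removed."""
    header: str     # conventionally starts with "@"
    sequence: str   # base sequence
    separator: str  # conventionally starts with "+"
    quality: str    # one quality character per base


def iter_reads(handle: TextIO) -> Iterator[Read]:
    """Yield reads from an open fastq text stream, four lines at a time."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return  # clean end of file
        if not lines[3]:
            raise ValueError("truncated fastq record at end of input")
        read = Read(*(line.rstrip("\n") for line in lines))
        if not read.header.startswith("@") or not read.separator.startswith("+"):
            raise ValueError(f"malformed fastq record: {read.header!r}")
        yield read


if __name__ == "__main__":
    sample = "@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nFFFF\n"
    for read in iter_reads(io.StringIO(sample)):
        print(read.header, read.sequence)
```

Because the reader is a generator, reads can be consumed one at a time without loading the whole file, which matters for files holding hundreds of millions of reads.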

Application Information

IPC(8): G06F16/17, G06F3/06, G06F9/50
CPC: G06F16/17, G06F3/0659, G06F3/064, G06F3/0643, G06F3/065, G06F9/5016, G06F9/5022, G06F2209/5011, Y02D10/00
Inventor: 黄俊松文晋邵艳军
Owner: BERRYONCOLOGY PRECISION MEDICAL DEVICE CO LTD