Method for quickly acquiring comparison result data of target genome region

A technology in the field of target genes and genomes, applied to quickly obtaining alignment result data for a target genome region. It addresses problems such as the storage still occupied by reduced BAM files, insufficient storage resources and a lack of universality, and achieves convenient operation, low computing resource requirements and a wide range of applications.

Pending Publication Date: 2021-10-08
SUZHOU SMK GENE TECH LTD
Cites: 0; Cited by: 4

AI Technical Summary

Problems solved by technology

[0009] b. A reduced BAM still occupies a certain amount of storage. As the number of samples grows, storage resources become insufficient again, which raises a further problem: how to store the reduced BAM files of a large number of samples.
[0010] c. The method is not universal. Different data analysts prefer different gene regions containing functional genes, a choice strongly tied to each analyst's background knowledge, so the same original secondary-data BAM file of a sample, reduced by different analysts, yields completely different reduced BAM files.



Examples


Embodiment 1

[0049] Example 1: Method for quickly obtaining the alignment ("comparison") result data of the target genome region from the sample's raw sequencing data

[0050] Overview of the overall process:

[0051] (1) Construction of reference genome and chromosome index;

[0052] (2) Acquisition of target gene coordinate interval;

[0053] (3) Construction of mapping files;

[0054] (4) Target sequence file generation;

[0055] (5) Target sequence chromosome alignment and BAM reconstruction.
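Steps (2) to (4) of the overall flow above can be illustrated with a minimal Python sketch. It follows the idea stated in the abstract (a mapping from each read's line number in the raw sequencing data to its genome alignment coordinates, used to rebuild the target-gene reads), but the gene-table layout, the in-memory inputs, the gene names and all function names (`target_intervals`, `build_mapping`, `extract_target_reads`) are hypothetical; the text shown here does not specify the mapping-file format.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (chromosome, start, end), 1-based inclusive; layout is an illustrative assumption
Interval = Tuple[str, int, int]


def target_intervals(gene_table: Iterable[str], targets: Set[str]) -> List[Interval]:
    """Step (2): select intervals of the target genes from a hypothetical
    tab-separated table with columns gene, chrom, start, end."""
    out: List[Interval] = []
    for line in gene_table:
        gene, chrom, start, end = line.rstrip("\n").split("\t")
        if gene in targets:
            out.append((chrom, int(start), int(end)))
    return out


def build_mapping(alignments: Iterable[Tuple[int, str, int, int]]) -> Dict[int, Interval]:
    """Step (3): map each FASTQ record number (0-based) to the (chrom, start, end)
    where that read aligned in a one-off whole-genome alignment."""
    return {rec: (chrom, start, end) for rec, chrom, start, end in alignments}


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two 1-based inclusive intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def extract_target_reads(fastq_lines: List[str], mapping: Dict[int, Interval],
                         intervals: List[Interval]) -> List[str]:
    """Step (4): rebuild a target FASTQ. Record n occupies list items 4n..4n+3,
    so each selected read is fetched directly by its line number."""
    keep = sorted(rec for rec, coord in mapping.items()
                  if any(overlaps(coord, iv) for iv in intervals))
    out: List[str] = []
    for rec in keep:
        out.extend(fastq_lines[4 * rec: 4 * rec + 4])
    return out


if __name__ == "__main__":
    genes = ["GENE_A\tchr17\t100\t200\n", "GENE_B\tchr17\t500\t600\n"]
    ivs = target_intervals(genes, {"GENE_B"})
    mapping = build_mapping([(0, "chr17", 120, 270),
                             (1, "chr17", 550, 700),
                             (2, "chr1", 550, 700)])
    fastq = ["@r0", "ACGT", "+", "IIII", "@r1", "GGCC", "+", "IIII",
             "@r2", "TTAA", "+", "IIII"]
    print(extract_target_reads(fastq, mapping, ivs))  # ['@r1', 'GGCC', '+', 'IIII']
```

Keying the mapping on record number rather than read name is one plausible design: a FASTQ record always spans four lines, so record n is found at lines 4n+1 to 4n+4 (1-based), and for paired-end data the same number addresses both mates in the R1 and R2 files.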

[0056] Detailed method flow and module explanation:

[0057] (1) Construction of reference genome and chromosome index

[0058] Referring to Figure 2, this step builds a reference genome index file, which is used to align the sample data to the reference genome and obtain the genomic coordinates of the aligned sequences; these coordinates are then used to construct the mapping file. The construction of the chromosome index is used to q...
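The rest of [0058] is truncated here, so what the chromosome index contains is not stated. As an assumption only, the sketch below builds an index in the style of a samtools `.fai` file (per chromosome: length, byte offset of the first base, bases per line, bytes per line), which is one common way to jump straight to a chromosome or region of a reference FASTA without reading the whole file. `FaiEntry`, `build_fai` and `fetch` are hypothetical names.

```python
from typing import Dict, NamedTuple, Optional


class FaiEntry(NamedTuple):
    length: int      # total bases in the chromosome
    offset: int      # byte offset of the first base in the file
    line_bases: int  # bases per full sequence line
    line_width: int  # bytes per full line, including the line terminator


def build_fai(fasta: bytes) -> Dict[str, FaiEntry]:
    """Index a FASTA whose sequence lines have a uniform width (as .fai assumes)."""
    index: Dict[str, FaiEntry] = {}
    pos = 0
    name: Optional[str] = None
    length = offset = line_bases = line_width = 0
    for line in fasta.splitlines(keepends=True):
        if line.startswith(b">"):
            if name is not None:
                index[name] = FaiEntry(length, offset, line_bases, line_width)
            name = line[1:].split()[0].decode()  # chromosome name up to first space
            length, offset, line_bases, line_width = 0, pos + len(line), 0, 0
        else:
            bases = len(line.rstrip(b"\r\n"))
            if line_bases == 0:  # first sequence line defines the layout
                line_bases, line_width = bases, len(line)
            length += bases
        pos += len(line)
    if name is not None:
        index[name] = FaiEntry(length, offset, line_bases, line_width)
    return index


def fetch(fasta: bytes, index: Dict[str, FaiEntry], chrom: str, start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) by computing byte offsets."""
    e = index[chrom]

    def byte_at(i: int) -> int:  # 0-based base position -> byte offset in file
        return e.offset + (i // e.line_bases) * e.line_width + i % e.line_bases

    raw = fasta[byte_at(start - 1): byte_at(end - 1) + 1]
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


if __name__ == "__main__":
    ref = b">chr1 desc\nACGTA\nCGTAC\nGG\n>chr2\nTTTT\n"
    idx = build_fai(ref)
    print(fetch(ref, idx, "chr1", 4, 8))  # TACGT
```

With a real reference, the same offsets would be used with `seek` on the open file, so only the requested bytes are read.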


Abstract

The invention discloses a method for quickly acquiring alignment ("comparison") result data for a target genome region. Starting from a sample's original sequencing data, the method obtains a reference genome sequence file and coordinate information files for all genes from a public genome database, and constructs a reference genome index file and a chromosome index file. It then constructs a mapping relationship between the sequence line numbers of the sample's original sequencing data and their genome alignment coordinates, and uses this mapping to quickly reconstruct the original sequencing data of the target gene sequences. Finally, it aligns these target-gene reads using the chromosome index file to obtain an original alignment data file for the target gene sequences, which is sorted and deduplicated to give the final alignment result data for the target genome region. The method is simple to deploy, convenient to operate, efficient, high-throughput and widely applicable. Compared with the original secondary-data BAM file, the result has essentially no information loss.
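The final sorting and duplicate-removal step can be sketched as follows. This is a simplified stand-in, not the patent's procedure: it assumes single-end reads with unique names, treats reads sharing chromosome, 5' position and strand as duplicates, and keeps the copy with the highest summed base quality. Tools such as samtools markdup or Picard MarkDuplicates apply more detailed rules (mate positions, soft clipping, libraries). `Aln`, `sort_alignments` and `remove_duplicates` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


class Aln(NamedTuple):
    name: str       # read name, assumed unique (single-end data)
    chrom: str
    pos: int        # leftmost mapped position, 1-based
    end: int        # rightmost mapped position, 1-based
    reverse: bool   # True if aligned to the reverse strand
    qual_sum: int   # sum of base qualities, used to pick the kept copy


def sort_alignments(alns: List[Aln], chrom_order: List[str]) -> List[Aln]:
    """Coordinate sort following the reference's chromosome order."""
    rank = {c: i for i, c in enumerate(chrom_order)}
    return sorted(alns, key=lambda a: (rank[a.chrom], a.pos, a.name))


def remove_duplicates(alns: List[Aln]) -> List[Aln]:
    """Keep one read per (chrom, 5' position, strand); input order is preserved."""
    best: Dict[Tuple[str, int, bool], Aln] = {}
    for a in alns:
        five_prime = a.end if a.reverse else a.pos  # 5' end depends on strand
        key = (a.chrom, five_prime, a.reverse)
        if key not in best or a.qual_sum > best[key].qual_sum:
            best[key] = a
    kept = {a.name for a in best.values()}
    return [a for a in alns if a.name in kept]


if __name__ == "__main__":
    alns = [Aln("r1", "chr2", 100, 150, False, 300),
            Aln("r2", "chr1", 50, 100, False, 200),
            Aln("r3", "chr1", 50, 90, False, 250),
            Aln("r4", "chr1", 10, 50, True, 100),
            Aln("r5", "chr1", 20, 50, True, 120)]
    result = remove_duplicates(sort_alignments(alns, ["chr1", "chr2"]))
    print([a.name for a in result])  # ['r5', 'r3', 'r1']
```

Sorting first matches the usual coordinate-sorted BAM convention; deduplication keeps that order because it only filters the already sorted list.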

Description

Technical field

[0001] The invention relates to genome variation detection in bioinformatics and precision medicine, and in particular to a method for quickly obtaining alignment result data of a target genome region based on a sample's original sequencing data.

Background art

[0002] With the rapid development of precision medicine, high-throughput sequencing (Next-Generation Sequencing, NGS) has gradually become the preferred method of genetic testing. NGS sequencing of samples also produces large amounts of sequencing data, placing ever higher demands on computing power and storage. At present, NGS data are classified into levels, and each level has different storage requirements. The general classification is as follows:

[0003] a. The original data of the sample, usually in FASTQ format, is the first-level data, which needs to be stored for a lo...


Application Information

Patent type & authority: Applications (China)
IPC (8): G16B30/10; G16B50/00; G06F16/22
CPC: G16B30/10; G16B50/00; G06F16/2228
Inventors: Li Haibo (栗海波), Jiang Yue (姜玥), Liang Mengmeng (梁萌萌)
Owner: SUZHOU SMK GENE TECH LTD