Method for removing human gene sequences from metagenomic sequencing data

A sequencing-data and metagenomics technology, applied in the field of genetic engineering, that addresses the insufficient removal of human genome sequences and the resulting high false-positive rate in microbial analysis, while increasing processing speed and saving computing resources.

Inactive Publication Date: 2018-06-22
深圳市泰康吉音生物科技研发服务有限公司

AI Technical Summary

Problems solved by technology

[0010] In view of the deficiencies in the above-mentioned technologies, the present invention provides a method for removing human gene sequences from metagenomic sequencing data, which solves the problem



Examples


Example Embodiment

[0062] In order to express the present invention more clearly, the present invention will be further described below with reference to the accompanying drawings.

[0063] Referring to Figure 1, the method of the present invention for removing human gene sequences from metagenomic sequencing data includes the following steps:

[0064] Step S1: Construct a reference gene set from the original sequencing data of the 1000 Genomes Project samples. After downloading the data, first perform quality control and filter out the low-quality data to obtain high-quality data, which is used for sequencing-read alignment so that human reads can be removed more effectively;
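The quality-control filter in Step S1 can be illustrated with a minimal sketch. The source does not state which tool or thresholds are used, so the mean-Phred cutoff, the N-fraction cutoff, the Phred+33 encoding, and the function names below are all assumptions made for illustration only.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (read id, bases, quality string)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding is assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_low_quality(
    records: Iterable[FastqRecord],
    min_mean_q: float = 20.0,      # hypothetical threshold, not from the patent
    max_n_fraction: float = 0.1,   # hypothetical threshold, not from the patent
) -> Iterator[FastqRecord]:
    """Yield only reads whose mean quality and N content pass both cutoffs."""
    for read_id, seq, qual in records:
        if not seq:
            continue  # skip empty reads
        n_fraction = seq.upper().count("N") / len(seq)
        if mean_phred(qual) >= min_mean_q and n_fraction <= max_n_fraction:
            yield read_id, seq, qual


reads = [
    ("r1", "ACGTACGT", "IIIIIIII"),  # Phred 40 throughout: kept
    ("r2", "ACGTACGT", "########"),  # Phred 2 throughout: dropped
    ("r3", "NNNNACGT", "IIIIIIII"),  # half the bases are N: dropped
]
kept = [read_id for read_id, _, _ in filter_low_quality(reads)]
```

In practice this step would more likely be delegated to a dedicated read-trimming tool; the sketch only shows the kind of criterion that separates high-quality from low-quality reads.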

[0065] Step S2: After obtaining the high-quality sequencing reads of the 1000 Genomes data, use genome assembly software to assemble them into longer gene fragments, which later serve as the reference sequence against which sequencing reads are aligned. After the assembly is completed, select the gene fragments longer than 150 bp to be used as t...
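The length selection in Step S2 can be sketched as follows, assuming the assembler writes its fragments in FASTA format (the source does not name the assembler or its output format). The helper names `parse_fasta` and `select_long_fragments` are hypothetical; only the 150 bp cutoff comes from the text.

```python
from typing import Dict, Iterable, Iterator, Tuple

MIN_LENGTH = 150  # cutoff stated in Step S2: keep fragments longer than 150 bp


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def select_long_fragments(
    contigs: Iterable[Tuple[str, str]], min_length: int = MIN_LENGTH
) -> Dict[str, str]:
    """Keep only fragments strictly longer than min_length bases."""
    return {name: seq for name, seq in contigs if len(seq) > min_length}


fasta_text = [
    ">contig_1 assembled",
    "A" * 100,
    "C" * 100,          # contig_1 totals 200 bp: kept
    ">contig_2",
    "G" * 150,          # exactly 150 bp: not longer than the cutoff, dropped
    ">contig_3",
    "T" * 151,          # 151 bp: kept
]
selected = select_long_fragments(parse_fasta(fasta_text))
```

The comparison is strict because the text says "longer than 150 bp"; a fragment of exactly 150 bp is therefore excluded in this sketch.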


Abstract

The invention discloses a method for removing human gene sequences from metagenomic sequencing data. The method comprises the following steps: a reference gene set is constructed from the original sequencing data of the 1000 Genomes Project samples; after high-quality sequencing reads of the 1000 Genomes data are obtained, genome assembly software assembles the reads into longer gene fragments, which later serve as the reference sequence against which sequencing reads are aligned; gene fragment data are extracted from all non-tumor samples in an NCBI database to serve as the NCBI BioProject gene fragment data for subsequent processing; the 1000 Genomes data and the NCBI BioProject data are merged and, after redundancy elimination, form a non-redundant gene fragment data set; viral genome sequences in the non-redundant data set are identified and removed; and the gene fragment sequences free of viral genome sequences serve as the reference genome for removing human sequences from metagenomic sequencing data.
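The merge, redundancy-elimination, and virus-removal steps described in the abstract can be sketched as below. This is a simplified illustration, not the patented procedure: it treats only exact duplicates (including reverse complements) as redundant, whereas real pipelines usually cluster by sequence similarity, and it assumes the viral fragments have already been identified by some external alignment step and are supplied as a set of names. All function and variable names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of a sequence and its reverse complement."""
    seq = seq.upper()
    reverse_complement = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, reverse_complement)


def merge_nonredundant(*fragment_sets: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Merge fragment sets, keeping the first copy of each exact duplicate.

    Fragment names are assumed to be unique across the input sets.
    """
    seen: Set[str] = set()
    merged: Dict[str, str] = {}
    for fragments in fragment_sets:
        for name, seq in fragments:
            key = canonical(seq)
            if key in seen:
                continue  # same sequence (either strand) already present
            seen.add(key)
            merged[name] = seq
    return merged


def drop_viral(fragments: Dict[str, str], viral_ids: Set[str]) -> Dict[str, str]:
    """Remove fragments flagged as viral (the flagging itself is assumed external)."""
    return {name: seq for name, seq in fragments.items() if name not in viral_ids}


thousand_genomes = [("kg_1", "ACGTTGCA"), ("kg_2", "GGGGCCCA")]
bioproject = [("bp_1", "TGGGCCCC"),   # reverse complement of kg_2: redundant
              ("bp_2", "ATATATCG")]
merged = merge_nonredundant(thousand_genomes, bioproject)
reference = drop_viral(merged, viral_ids={"bp_2"})  # bp_2 is flagged for illustration
```

Removing viral sequences from the reference matters because a viral fragment left in the human reference would cause genuine viral reads in a sample to be discarded as human.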

Description

Technical field

[0001] The invention relates to the field of genetic engineering, in particular to a method for removing human gene sequences from metagenomic sequencing data.

Background

[0002] At present, metagenomic sequencing can be applied to monitoring the status of the intestinal flora and to detecting infectious pathogenic microorganisms. Compared with other technologies, it offers high detection throughput and wide detection coverage, and it does not require the types of microorganisms to be predicted in advance. With the rapid decline in the cost of high-throughput gene sequencing and the rapid increase in sequencing speed, metagenomic sequencing will be applied more and more widely in microbial detection.

[0003] The main sources of metagenomic sequencing samples are body fluids or tissues from different parts of the human body. Generally, after the DNA in the samples is extracted, the extracted DNA is subjected to gene sequencing of the w...


Application Information

IPC(8): G06F19/22
CPC: G16B30/00
Inventor 苏政肖卫民苏闻赵崇涛黄瑞坤
Owner 深圳市泰康吉音生物科技研发服务有限公司