False positive gene mutation filtering method for targeted capture of gene sequencing data

A data-science method for targeted-capture gene sequencing that addresses three problems: the small size of the target region, the difficulty of obtaining training data, and the inability of existing methods to cope with batch differences. The method copes with batch differences and works with small-scale data.

Active Publication Date: 2019-08-02
XI AN JIAOTONG UNIV
Cites: 7 | Cited by: 7

AI Technical Summary

Problems solved by technology

The drawback of the MF strategy is that the large-scale training data it requires is often hard to obtain because of cost, and the resulting model only works on data whose characteristics resemble those of the training data; that is, it cannot cope with batch differences.
Moreover, because targeted-capture gene sequencing sequences only the captured target regions, the target region is small (a small sample set) and the number of candidate mutations is two to three orders of magnitude lower than in whole-genome sequencing. Even setting cost aside, it is therefore difficult to provide training data that meets the requirements of an MF model.

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0051] Referring to Figure 4, the method of the present invention for filtering false positive gene mutations in targeted-capture gene sequencing data comprises the following steps:

[0052] S1. Preprocess the sample data of the mutation sites detected by genetic testing;

[0053] S101. Read the gene mutation detection report produced by the mutation-calling software, i.e., an output file conforming to the VCF format standard; read the value of each attribute from it, and preprocess the sample data using data standardization and normalization. The mutation features that the data preprocessing module extracts from the VCF file, with their descriptions, are listed in Table 1.
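Step S101 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: Table 1 (the actual feature list) is not reproduced in this excerpt, so the feature set `FEATURES` (QUAL plus the reserved INFO keys `DP` and `MQ`), the helper names, and the mean imputation of missing values are all hypothetical. Z-score standardization and min-max normalization are shown as separate functions because the source does not say how the two are combined.

```python
import io

import numpy as np

# Hypothetical feature set for illustration only: Table 1 of the source lists the
# real features. QUAL is a fixed VCF column; DP and MQ are reserved INFO keys.
FEATURES = ("QUAL", "DP", "MQ")


def _to_float(value):
    # "." (or an absent key) marks a missing value in VCF
    return np.nan if value in (None, ".") else float(value)


def read_vcf_features(handle):
    """Parse VCF data lines into variant IDs and a (variants x features) matrix."""
    ids, rows = [], []
    for line in handle:
        if line.startswith("#"):  # meta-information (##) and header (#CHROM) lines
            continue
        chrom, pos, _id, ref, alt, qual, _filter, info = line.rstrip("\n").split("\t")[:8]
        # INFO is a ';'-separated list of key=value pairs; flags without '=' are skipped
        fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        fields["QUAL"] = qual
        ids.append(f"{chrom}:{pos}:{ref}>{alt}")
        rows.append([_to_float(fields.get(name)) for name in FEATURES])
    return ids, np.array(rows, dtype=float)


def _impute(X):
    # Assumption: replace missing values with the column mean before scaling
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)


def standardize(X):
    """Z-score standardization: each column to mean 0, standard deviation 1."""
    X = _impute(X)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at 0 instead of dividing by zero
    return (X - X.mean(axis=0)) / std


def min_max_normalize(X):
    """Min-max normalization: each column rescaled to [0, 1]."""
    X = _impute(X)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


if __name__ == "__main__":
    vcf = io.StringIO(
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "chr7\t100\t.\tA\tG\t50\tPASS\tDP=120;MQ=60\n"
        "chr7\t200\t.\tC\tT\t12\tPASS\tDP=15;MQ=40\n"
        "chr17\t300\t.\tG\tA\t.\tPASS\tDP=60\n"
    )
    ids, X = read_vcf_features(vcf)
    print(ids)
    print(standardize(X))
    print(min_max_normalize(X))
```

Scaling matters here because raw VCF attributes live on very different ranges (a depth in the hundreds next to a mapping quality of 60), and distance- or gradient-based learners downstream would otherwise be dominated by the largest-valued feature.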

[0054] Table 1: Characteristics of the mutations extracted from VCF files and their physical meaning

[0058] To ensure that the selected features are relevant and important, feature selection methods from machine learning are adopte...
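The excerpt breaks off before naming the feature-selection method, so the sketch below substitutes a simple filter method purely for illustration: rank each feature by the absolute Pearson correlation between it and the binary label (assumed here to be 1 = real mutation, 0 = false positive) and keep the top `k`. The function name `select_features` and the parameter `k` are hypothetical, and this is not claimed to be the method the patent uses.

```python
import numpy as np


def select_features(X, y, names, k):
    """Keep the k features with the largest |Pearson correlation| with the label.

    Stand-in filter method for illustration; not necessarily the source's method.
    """
    y = np.asarray(y, dtype=float)
    if y.std() == 0:
        raise ValueError("label vector is constant; correlation is undefined")
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        if X[:, j].std() > 0:  # a constant feature keeps score 0
            scores[j] = abs(np.corrcoef(X[:, j], y)[0, 1])
    # Indices sorted by descending score, truncated to the top k
    keep = np.argsort(scores, kind="stable")[::-1][:k]
    return [names[j] for j in keep], X[:, keep]
```

A filter method like this scores each feature independently of any classifier, which keeps it cheap on small target-capture datasets; it does not, however, detect redundancy between features, which wrapper or embedded selection methods can.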

Abstract

The invention discloses a method for filtering false positive gene mutations in targeted-capture gene sequencing data. The method comprises the following steps: preprocessing the gene mutation detection data; based on the tri-training method, selecting three different supervised learning algorithms to construct three different initial classifiers H1, H2 and H3, i.e., three different supervised learners, each generated from the initial training set; training H1, H2 and H3 to obtain extended training sets and updating the models; and labeling the unlabeled sample set U with the trained models and filtering according to the labeling result. The method solves the problem that traditional methods cannot effectively cope with batch differences.
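The tri-training loop summarized in the abstract can be sketched as below. This is a simplified illustration under stated assumptions, not the patented method: the three learners (nearest centroid, Gaussian naive Bayes, k-nearest neighbors) are hypothetical stand-ins for the unnamed "three different supervised learning algorithms", labels are assumed binary (1 = real mutation, 0 = false positive), and the stopping rule (predictions on U no longer change) is an assumption. The original formulation of tri-training also checks each classifier's estimated error rate before accepting new pseudo-labels; that check is omitted here.

```python
import numpy as np


class NearestCentroid:
    """Assigns each sample to the class whose mean feature vector is closest."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


class GaussianNaiveBayes:
    """Per-class independent Gaussians over each feature."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mu_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) + 1e-6 for c in self.classes_])
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        diff2 = (X[:, None, :] - self.mu_[None]) ** 2
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff2 / self.var_[None]).sum(axis=2)
        return self.classes_[(log_lik + self.log_prior_).argmax(axis=1)]


class KNearestNeighbors:
    """Majority vote among the k closest training samples (integer labels >= 0)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X_, self.y_ = X, y.astype(int)
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.X_[None]) ** 2).sum(axis=2)
        nearest = self.y_[np.argsort(d, axis=1)[:, : self.k]]
        return np.array([np.bincount(row).argmax() for row in nearest])


def tri_train(X_l, y_l, X_u, max_rounds=10, seed=0):
    """Simplified tri-training over labeled set (X_l, y_l) and unlabeled set X_u."""
    rng = np.random.default_rng(seed)
    learners = [NearestCentroid(), GaussianNaiveBayes(), KNearestNeighbors(k=3)]
    # Initial classifiers H1, H2, H3: each fitted on a bootstrap sample of L.
    for h in learners:
        idx = rng.integers(0, len(X_l), len(X_l))
        h.fit(X_l[idx], y_l[idx])
    prev = None
    for _ in range(max_rounds):
        preds = [h.predict(X_u) for h in learners]
        if prev is not None and all(np.array_equal(a, b) for a, b in zip(preds, prev)):
            break  # predictions on U are stable, so the extended sets no longer change
        for i in range(3):
            j, k = (m for m in range(3) if m != i)
            agree = preds[j] == preds[k]
            # Extended training set for H_i: L plus the samples of U on which
            # the other two classifiers agree, pseudo-labeled with that label.
            learners[i].fit(np.vstack([X_l, X_u[agree]]),
                            np.concatenate([y_l, preds[j][agree]]))
        prev = preds
    return learners


def filter_variants(learners, X_u):
    """Majority vote of H1-H3 on U: 1 = keep as real mutation, 0 = false positive."""
    votes = np.stack([h.predict(X_u) for h in learners])
    return (votes.sum(axis=0) >= 2).astype(int)
```

A call such as `filter_variants(tri_train(X_l, y_l, X_u), X_u)` returns one 0/1 label per unlabeled variant, and variants labeled 0 would be dropped. Using three structurally different learners is what makes agreement between two of them informative: similar learners would tend to make the same mistakes, and their agreement would add little beyond the small labeled set.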

Description

Technical field
[0001] The invention belongs to the field of data science, with precision medicine as its application background, and specifically relates to a method for filtering false positive gene mutations in targeted-capture gene sequencing data.
Background
[0002] Since the draft of the human genome was completed, gene sequencing technology has achieved many milestone breakthroughs over the past two decades and has quickly entered the market. Among these technologies, second-generation sequencing (next-generation sequencing, NGS) is the most mature. Targeted-capture gene sequencing is a clinical application of NGS; because of its high cost-effectiveness and strong scalability, it is currently one of the most widely used technologies in precision tumor diagnosis and treatment. In recent years, targeted-capture gene sequencing has gradually become part of routine tumor diagnosis and treatment, and a l...

Application Information

Patent type & authority: Applications (China)
IPC (8): G06K9/62; C12Q1/6869
CPC: C12Q1/6869; G06F18/2155; G06F18/214
Inventors: 王旭文, 王嘉寅, 张选平, 韩博, 刘涛, 管彦芳, 王申杰, 王妙
Owner: XI AN JIAOTONG UNIV