A False Positive Gene Mutation Filtering Method for Targeted Capture Genetic Sequencing Data

A gene sequencing and target capture technology, applied in the field of data science, that addresses the problems of a small target region, hard-to-obtain training data, and the inability to cope with batch differences, and achieves the effect of small scale

Active Publication Date: 2021-08-13
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

The disadvantage of the MF strategy is that the large-scale training data it requires is often difficult to obtain for cost reasons, and the model is only suitable when the characteristics of the data match those of the training data; that is, it cannot cope with batch differences.
In addition, because targeted capture gene sequencing sequences only the target capture region, the target region is small (a small sample set) and the number of potential gene mutations is 2-3 orders of magnitude lower than in whole genome sequencing. Even regardless of cost, it is difficult to provide training data that meets the requirements of the MF model.



Examples


Embodiment Construction

[0051] Referring to Figure 4, the method of the present invention for filtering false positive gene mutations in targeted capture gene sequencing data comprises the following steps:

[0052] S1. Preprocess the sample data of the detected gene mutation sites;

[0053] S101. Read the gene mutation detection report file output by the mutation detection software, that is, an output file conforming to the VCF format standard, read the data of each attribute from it, and preprocess the sample data using data standardization and normalization methods. In the data preprocessing module, the mutation features extracted from the VCF file and their descriptions are shown in Table 1.

[0054] Table 1. Features of mutations extracted from VCF format files and their physical meaning

[0055]

[0056]
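The preprocessing in step S101 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: Table 1 is not reproduced in this text, so the chosen features (`QUAL` plus the standard `DP` and `AF` INFO keys) are assumptions, and the helper names `parse_vcf_features`, `standardize`, `min_max`, and `scale_features` are hypothetical. It assumes every record carries a numeric `QUAL` and all requested INFO keys.

```python
import math

def parse_vcf_features(lines, info_keys=("DP", "AF")):
    """Extract QUAL plus selected numeric INFO fields from VCF text lines."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header line
        cols = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual, _filt, info = cols[:8]
        info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        # For comma-separated (multi-allelic) values, keep only the first one.
        feats = [float(qual)] + [float(info_map[k].split(",")[0]) for k in info_keys]
        rows.append(((chrom, int(pos), ref, alt), feats))
    return rows

def standardize(column):
    """Z-score standardization: zero mean, unit (population) variance."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std if std else 0.0 for v in column]

def min_max(column):
    """Min-max normalization onto the interval [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in column]

def scale_features(rows, scaler):
    """Apply a per-column scaler and return one scaled feature list per variant."""
    columns = [scaler(list(col)) for col in zip(*(f for _, f in rows))]
    return [list(r) for r in zip(*columns)]

# Made-up records, for illustration only.
EXAMPLE = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tT\t50\tPASS\tDP=100;AF=0.5",
    "chr1\t200\t.\tG\tC\t30\tPASS\tDP=300;AF=0.1",
    "chr2\t300\t.\tC\tG\t40\tPASS\tDP=200;AF=0.3",
]
rows = parse_vcf_features(EXAMPLE)
normalized = scale_features(rows, min_max)
standardized = scale_features(rows, standardize)
```

Scaling is done per feature column so that attributes with large ranges, such as read depth, do not dominate attributes bounded in [0, 1], such as allele frequency.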

[0057] To ensure the relevance and importance of the selected features, a feature selection method from machine learning is adopted, and the t...


Abstract

The invention discloses a false positive gene mutation filtering method for targeted capture gene sequencing data. The method preprocesses the gene mutation detection data; selects three different supervised learning algorithms, following the tri-training method, to construct three different initial classifiers H1, H2, H3, that is, each of the three supervised learning algorithms generates a learner from the initial training set; trains H1, H2, H3 to obtain an expanded training set, thereby updating the model; and uses the trained model to label the unlabeled sample set U, completing the filtering according to the labeling results. The invention solves the problem that traditional methods cannot effectively deal with batch differences.
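The tri-training loop summarized above might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the claimed method: the text here does not name the three supervised algorithms, so a nearest-centroid rule, a k-nearest-neighbor rule, and a decision stump stand in as hypothetical choices for H1, H2, H3; labels are assumed binary (1 = true mutation, 0 = false positive); and the error-rate checks used in the published tri-training algorithm are omitted for brevity.

```python
from collections import Counter

def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class NearestCentroid:
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            pts = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(c) / len(pts) for c in zip(*pts)]
        return self
    def predict(self, x):
        return min(self.centroids, key=lambda c: _dist2(x, self.centroids[c]))

class KNN:
    def __init__(self, k=1):
        self.k = k
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self
    def predict(self, x):
        nearest = sorted(zip(self.X, self.y), key=lambda p: _dist2(x, p[0]))[: self.k]
        return Counter(t for _, t in nearest).most_common(1)[0][0]

class Stump:
    """Single-feature threshold rule with the lowest training error (labels 0/1)."""
    def fit(self, X, y):
        majority = Counter(y).most_common(1)[0][0]
        # Fallback: always predict the majority class.
        best = (sum(t != majority for t in y), 0, float("inf"), majority, majority)
        for j in range(len(X[0])):
            vals = sorted({x[j] for x in X})
            for v1, v2 in zip(vals, vals[1:]):
                thr = (v1 + v2) / 2  # midpoint between neighbouring values
                for lo, hi in ((0, 1), (1, 0)):
                    err = sum((hi if x[j] > thr else lo) != t for x, t in zip(X, y))
                    if err < best[0]:
                        best = (err, j, thr, lo, hi)
        _, self.j, self.thr, self.lo, self.hi = best
        return self
    def predict(self, x):
        return self.hi if x[self.j] > self.thr else self.lo

def tri_train(models, X_lab, y_lab, X_unl, max_rounds=10):
    """Simplified tri-training: each model learns from samples the other two agree on."""
    for m in models:
        m.fit(X_lab, y_lab)  # all three start from the same initial training set
    prev = [None] * 3
    for _ in range(max_rounds):
        pseudo = []
        for i in range(3):
            a, b = (models[j] for j in range(3) if j != i)
            added = []
            for x in X_unl:
                label = a.predict(x)
                if label == b.predict(x):
                    added.append((x, label))
            pseudo.append(added)
        if pseudo == prev:
            break  # pseudo-labels stopped changing
        for m, added in zip(models, pseudo):
            m.fit(X_lab + [x for x, _ in added], y_lab + [t for _, t in added])
        prev = pseudo
    return models

def vote(models, x):
    """Majority vote of the three classifiers."""
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]

# Made-up, already-scaled feature vectors for illustration only.
X_lab = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8]]
y_lab = [0, 0, 1, 1]
X_unl = [[0.2, 0.1], [0.05, 0.1], [0.8, 0.9], [1.1, 1.0], [0.15, 0.0], [0.95, 1.05]]
models = tri_train([NearestCentroid(), KNN(1), Stump()], X_lab, y_lab, X_unl)
kept = [x for x in X_unl if vote(models, x) == 1]  # variants retained after filtering
```

Because the classifiers label the unlabeled set U for one another, the labeled training set only needs to be small, which is the property the abstract relies on when large training sets are unavailable.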

Description

Technical field

[0001] The invention belongs to the technical field of data science with an application background in precision medicine, and specifically relates to a method for filtering false positive gene mutations in targeted capture gene sequencing data.

Background

[0002] In the two decades since the completion of the draft human genome, gene sequencing technology has achieved many milestone breakthroughs and has quickly entered the market. Among these technologies, second-generation sequencing (Next Generation Sequencing, NGS) is the most mature. Targeted capture gene sequencing is a clinical application of NGS. Because of its high cost performance and strong scalability, it is currently one of the most widely used technologies in precision tumor diagnosis and treatment. In recent years, targeted capture gene sequencing has been gradually popularized in the routine diagnosis and treatment of tumors, and a l...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62, C12Q1/6869
CPC: C12Q1/6869, G06F18/2155, G06F18/214
Inventor: 王旭文, 王嘉寅, 张选平, 韩博, 刘涛, 管彦芳, 王申杰, 王妙
Owner: XI AN JIAOTONG UNIV