Genomic microsatellite wide-area length distribution estimation method considering tumor purity factors

A technology relating to microsatellite length distribution, applied in the field of data science, that addresses problems such as calculation deviation.

Active Publication Date: 2019-09-13
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is to provide a method for estimating the wide-area length distribution of genome microsatellites that considers the tumor purity factor. The method addresses the calculation deviation caused by the purity of the tumor sample in the input data and removes the limit that sequencing read length places on the length of detectable genomic microsatellites, thereby enabling wide-area length detection.

Examples

Embodiment Construction

[0083] The present invention provides ELMSI (Estimation of Long Micro-satellite), a method for estimating genome microsatellite length distribution and state that takes tumor purity into account. The input is sequencing data from a normal sample and its paired tumor sample. Tumor purity is first estimated with purity estimation software, and the estimated purity is used to accelerate the deconvolution that resolves the length distribution of the mixed microsatellites. Microsatellite read counting is combined with maximum likelihood estimation to accurately identify the length distribution and state of short microsatellites in the mixed sample, and the expectation-maximization algorithm is combined with the central limit theorem to infer the length distribution and state of long microsatellites. This overcomes the limitations that sample tumor purity and sequencing read length impose on MSI detection.
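The role of tumor purity in the deconvolution step can be illustrated with a minimal sketch. It assumes a simple linear mixture, mixed = purity × tumor + (1 − purity) × normal, over repeat lengths; this model, the function name `deconvolve_tumor_lengths`, and the toy distributions are illustrative assumptions and are not taken from the patent text, which does not spell out its deconvolution procedure in this excerpt.

```python
from typing import Dict


def deconvolve_tumor_lengths(
    mixed: Dict[int, float],
    normal: Dict[int, float],
    purity: float,
) -> Dict[int, float]:
    """Recover a tumor-only length distribution from a mixed sample.

    Assumes (hypothetically) mixed = purity * tumor + (1 - purity) * normal,
    where each dict maps microsatellite repeat length -> probability.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    lengths = set(mixed) | set(normal)
    # Invert the linear mixture for every observed length.
    raw = {
        n: (mixed.get(n, 0.0) - (1.0 - purity) * normal.get(n, 0.0)) / purity
        for n in lengths
    }
    # Sampling noise can push estimates slightly below zero; clip and renormalize.
    clipped = {n: max(v, 0.0) for n, v in raw.items()}
    total = sum(clipped.values())
    if total == 0.0:
        raise ValueError("no probability mass left after deconvolution")
    return {n: v / total for n, v in sorted(clipped.items())}


# Toy example: build a mixed sample from known components, then invert it.
normal = {10: 0.9, 11: 0.1}
tumor = {10: 0.2, 14: 0.8}
purity = 0.6
mixed = {
    n: purity * tumor.get(n, 0.0) + (1 - purity) * normal.get(n, 0.0)
    for n in (10, 11, 14)
}
estimate = deconvolve_tumor_lengths(mixed, normal, purity)
```

With a known purity the inversion is a single closed-form pass per length, which is one plausible reading of why an external purity estimate would speed up the deconvolution compared with fitting purity and length distribution jointly.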

[0084] The present invention is based on the following assumptions, which are generally accepted in the academic community:

[0085] 1. When the patient ...

Abstract

The invention discloses a genomic microsatellite wide-area length distribution estimation method considering tumor purity factors. The method includes the steps of: completing data feature extraction; finding microsatellite candidate regions; using a clustering algorithm to filter out ignored microsatellite candidate regions; traversing the reads of the regions and segmenting them; estimating the tumor purity of a given sequencing sample; estimating the length distribution parameters of the tumor tissue microsatellites; using the average length of a long microsatellite to reflect its overall length distribution; estimating the average length of the microsatellite from the coverage of a specified window containing the microsatellite, then iteratively re-estimating the coverage of the specified window using the updated microsatellite mean length, thereby completing the detection of long microsatellites in the pure tumor sample; and determining the state of the long tumor microsatellites to complete the wide-area length distribution estimation. The invention solves the calculation deviation caused by the purity of the tumor sample in the input data, removes the limit that sequencing read length places on the length of detectable genomic microsatellites, and realizes wide-area length detection.
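The step of finding microsatellite candidate regions can be sketched as a scan for tandem repeats. The sketch below relies on the definition given in the Description (a repeat unit of roughly 1-6 nucleotides) and uses a regular expression with a backreference; the function name `find_microsatellites`, the `min_repeats` threshold, and the toy sequence are assumptions for illustration, and the patent's own candidate search may work differently.

```python
import re
from typing import List, Tuple


def find_microsatellites(seq: str, min_repeats: int = 5) -> List[Tuple[int, str, int]]:
    """Return (start, repeat_unit, repeat_count) for perfect tandem repeats.

    A repeat unit is 1-6 nucleotides long and must occur at least
    `min_repeats` times in a row. The lazy quantifier tries the shortest
    unit first, so "CACACA..." is reported with unit "CA", not "CACA".
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    # Group 1 is the unit; \1{k,} requires k further consecutive copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Toy sequence with a (CA)x8 repeat and an (A)x12 homopolymer run.
toy_sequence = "GGT" + "CA" * 8 + "TTG" + "A" * 12 + "GC"
candidates = find_microsatellites(toy_sequence)
```

This sketch only detects perfect, uninterrupted repeats. Real reads contain sequencing errors and interrupted repeats, which is presumably part of why the method follows candidate detection with a filtering step.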

Description

Technical field

[0001] The invention belongs to the technical field of data science with the application background of precision medicine, and specifically relates to a method for estimating the wide-area length distribution of genome microsatellites considering the factor of tumor purity.

Background

[0002] A genomic microsatellite (MS) is a DNA sequence composed of repeats of a specific oligonucleotide unit (usually a fragment of 1-6 nucleotides). Its length varies, and this variation is often referred to as the length distribution. Microsatellite instability (MSI) refers to a hypermutation pattern caused by defective DNA mismatch repair (dMMR). It is characterized by extensive length diversity of microsatellite repeats and increased frequency of single nucleotide variants (English name: sin...

Application Information

IPC(8): G16B20/00, G16B30/10, G16B40/00, G16B5/00
CPC: G16B5/00, G16B20/00, G16B30/10, G16B40/00
Inventor: 王嘉寅, 王以瑄, 张选平, 闫新兴, 冯旋, 赵仲孟
Owner XI AN JIAOTONG UNIV