A method for estimating the wide-area length distribution of genomic microsatellites considering the factor of tumor purity

A technology concerning microsatellite length distributions, applied in the field of data science. It addresses calculation bias, with the effects of breaking through read-length limitations, enabling wide-area length detection, and correcting calculation bias.

Active Publication Date: 2021-08-13
XI AN JIAOTONG UNIV
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is to provide a method for estimating the wide-area length distribution of genomic microsatellites that accounts for tumor purity. The method corrects the calculation bias caused by the purity of the tumor sample in the input data and overcomes the limit that sequencing read length places on the length of detectable genomic microsatellites, enabling wide-area length detection.



Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0083] The present invention provides ELMSI (Estimation of Long Micro-satellite), a tumor-purity-aware method for estimating the length distribution and state of genomic microsatellites. The input data are a normal sample and its paired tumor sample. Tumor purity is first estimated with purity-estimation software, and the estimated purity is then used to speed up the deconvolution that resolves the length distribution of the mixed microsatellites. Microsatellite read counting is combined with maximum likelihood estimation to accurately identify the length distribution and status of short microsatellites in mixed samples, and the expectation-maximization (EM) algorithm is combined with the central limit theorem to infer the length distribution and status of long microsatellites. This overcomes the limitations that sample tumor purity and sequencing read length impose on MSI detection.
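The purity-guided deconvolution for short microsatellites can be illustrated with a small EM sketch. It assumes, as a simplification not stated in the text, that each read at a locus in the tumor sample comes from tumor cells with probability equal to the estimated purity and from normal cells otherwise, that the normal-cell length distribution equals the one observed in the paired normal sample, and that lengths are already binned (for example by repeat-unit count). The function name `deconvolve_tumor_distribution` and its parameters are hypothetical.

```python
def deconvolve_tumor_distribution(mixed_counts, normal_dist, purity,
                                  max_iter=5000, tol=1e-12):
    """Maximum-likelihood tumor length distribution via EM, purity held fixed.

    mixed_counts: read counts per length bin observed in the tumor sample
    normal_dist:  length distribution (or counts) from the paired normal sample
    purity:       estimated tumor purity in (0, 1]
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    total_normal = float(sum(normal_dist))
    normal = [x / total_normal for x in normal_dist]
    k = len(mixed_counts)
    tumor = [1.0 / k] * k  # uniform starting guess
    for _ in range(max_iter):
        # E-step: expected number of reads in each bin contributed by tumor cells
        expected = []
        for c, t, n in zip(mixed_counts, tumor, normal):
            tumor_part = purity * t
            mix = tumor_part + (1.0 - purity) * n
            expected.append(c * tumor_part / mix if mix > 0 else 0.0)
        # M-step: the tumor distribution is the normalized expected tumor counts
        s = sum(expected)
        new_tumor = [e / s for e in expected]
        if max(abs(a - b) for a, b in zip(new_tumor, tumor)) < tol:
            return new_tumor
        tumor = new_tumor
    return tumor
```

For example, with a paired normal distribution `[0.1, 0.8, 0.1, 0, 0]`, purity 0.6, and counts generated exactly from the tumor distribution `[0.05, 0.10, 0.20, 0.45, 0.20]`, the function recovers that tumor distribution to within numerical tolerance. Because the purity is fixed rather than fitted jointly, each iteration only reallocates reads between two components with known weights, which is one plausible way a purity estimate could simplify the deconvolution.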

[0084] The present invention is based on the following assumptions, which are generally accepted in the field:

[0085] 1. When the patient ...



Abstract

The invention discloses a method for estimating the wide-area length distribution of genomic microsatellites that accounts for tumor purity. The method extracts data features; finds microsatellite candidate regions; uses a clustering algorithm to screen for candidate regions that were missed; traverses and segments the reads in each region; estimates the tumor purity of a given sequencing sample; estimates the length distribution parameters of microsatellites in the tumor tissue; uses the average length of long microsatellites to represent their overall length distribution; estimates the average microsatellite length and then uses the updated average length to iteratively re-estimate the coverage of the specified window, completing the detection of long microsatellites in the pure tumor sample; and determines the status of the long tumor microsatellites to complete the estimation of the wide-area length distribution. The invention corrects the calculation bias caused by tumor sample purity in the input data, overcomes the limit that sequencing read length places on the length of detectable genomic microsatellites, and achieves wide-area length detection.

Description

Technical field

[0001] The invention belongs to the technical field of data science, with precision medicine as its application background, and specifically relates to a method for estimating the wide-area length distribution of genomic microsatellites that accounts for tumor purity.

Background art

[0002] A genomic microsatellite (MS) is a DNA sequence composed of a specific oligonucleotide unit (usually 1-6 nucleotides) repeated in tandem. Microsatellites vary in length, and this variation is often described as the length distribution. Microsatellite instability (MSI) refers to a hypermutation pattern caused by defective DNA mismatch repair (dMMR). It is characterized by extensive length diversity of microsatellite repeats and an increased frequency of single nucleotide variants (sin...
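A naive scan for the tandem repeats defined above (a 1-6 nt unit repeated in tandem) makes the definition concrete and resembles the candidate-region search mentioned in the abstract. It is a greedy sketch, not the patent's candidate-finding procedure; the function name `find_microsatellites` and the default threshold of 5 copies are hypothetical choices.

```python
def find_microsatellites(seq, min_copies=5, max_unit=6):
    """Greedy left-to-right scan for tandem repeats of 1..max_unit nt units.

    Returns (unit, start, end) tuples with 0-based, end-exclusive coordinates.
    Units that are themselves repeats of a shorter unit ("AA", "ATAT") are
    skipped so a run is reported only with its primitive unit.
    """
    seq = seq.upper()
    hits = []
    i, n = 0, len(seq)
    while i < n:
        best = None
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for this unit length
            if any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)):
                continue  # non-primitive unit
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and (best is None or copies * k > best[2] - best[1]):
                best = (unit, i, i + copies * k)  # keep the longest run starting at i
        if best:
            hits.append(best)
            i = best[2]  # jump past the reported run
        else:
            i += 1
    return hits
```

For example, `find_microsatellites("GGT" + "CA" * 6 + "TTG" + "A" * 8 + "C")` returns `[("CA", 3, 15), ("A", 18, 26)]`. Trailing partial copies of a unit are not included in a run, which a real candidate-region finder would likely handle more carefully.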


Application Information

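Patent Type & Authority: Patents (China)
IPC (8): G16B20/00, G16B30/10, G16B40/00, G16B5/00
CPC: G16B5/00, G16B20/00, G16B30/10, G16B40/00
Inventors: 王嘉寅 (Wang Jiayin), 王以瑄 (Wang Yixuan), 张选平 (Zhang Xuanping), 闫新兴 (Yan Xinxing), 冯旋 (Feng Xuan), 赵仲孟 (Zhao Zhongmeng)
Owner: XI AN JIAOTONG UNIV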