
A Benchmark Set-Based Method for Genome Structural Variation Performance Detection

A structural variation detection method, applied in fields such as genomics, proteomics, and instrumentation. It addresses the lack of a method for evaluating variant identification results and the insufficiency of existing genome structural variation detection methods. It makes data processing and analysis convenient, provides a fast evaluation method, and speeds up the overall pace of work.

Active Publication Date: 2022-06-21
HARBIN NORMAL UNIVERSITY
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0013] The purpose of the present invention is to solve two problems: existing genome structural variation detection methods are not comprehensive enough, and no public method exists for evaluating variant identification results. To this end, it proposes a benchmark-set-based genome structural variation performance detection method.



Examples


Specific Embodiment 1

[0044] Embodiment 1: The benchmark-set-based genome structural variation performance detection method of this embodiment proceeds as follows:

[0045] The present invention proposes a public method for evaluating genome structural variation detection performance, and analyzes the common structural variation types in the genome (insertion, deletion, duplication, inversion, and translocation) more systematically and in greater detail.

[0046] Depending on whether simulated or real data are used, methods for evaluating genome structural variation detection performance fall into two categories: those based on simulated data and those based on real data.

[0047] With simulated data, a benchmark set of structural variations is available, so this type of evaluation method is suitable for objectively analyzing and comparing different structural variation detection methods.

[0048] On real sequencing data, due to the lack of a benchmark set of structural varia...

Specific Embodiment 2

[0058] Embodiment 2: This embodiment differs from Embodiment 1 in that, in step 1, statistics on quantitative indicators for the insertion, deletion, duplication, and inversion variants among the genome structural variations are calculated from the user's variant identification result set and the benchmark set, and are output to the terminal screen. The specific process is as follows:

[0059] The identification results of existing methods (such as Sniffles, nextSV, PBHoney, and SMRT-SV) usually contain some variants with excessively long intervals. Because of their excessive length, such variants carry no clear significance. The SV_STAT method therefore treats them as invalid variants, which must be removed before genome structural variation performance is evaluated in order to obtain more objective results. The higher the number of such invalid vari...
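The invalid-variant screening can be summarized as a per-type count. This sketch is illustrative only: the `Variant` record and `count_invalid` function are hypothetical, and the 100 kb cutoff is borrowed from step (1) of Embodiment 3 rather than from this paragraph.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical call record; SV_STAT's real input format is not described here."""
    svtype: str  # "INS", "DEL", "DUP" or "INV"
    length: int  # variant length in bp


# Assumed: 100 kb read as 100,000 bp (threshold from step (1) of Embodiment 3).
MAX_VALID_LENGTH = 100_000


def count_invalid(calls: Iterable[Variant]) -> tuple[Counter, float]:
    """Return per-type counts of over-long (invalid) calls and their overall fraction."""
    calls = list(calls)
    invalid = Counter(v.svtype for v in calls if v.length > MAX_VALID_LENGTH)
    fraction = sum(invalid.values()) / len(calls) if calls else 0.0
    return invalid, fraction


calls = [Variant("DEL", 300), Variant("DEL", 250_000),
         Variant("INV", 1_500_000), Variant("INS", 80)]
per_type, frac = count_invalid(calls)  # two invalid calls out of four
```

Reporting the fraction alongside the raw count makes call sets of different sizes comparable.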

Specific Embodiment 3

[0075] Embodiment 3: This embodiment differs from Embodiment 1 in that it calculates three groups of values. The first is the numbers of true positives and false positives in the user's insertion, deletion, duplication, or inversion identification results after invalid variants are removed. The second is the number of true negatives identified and the number of false negatives not identified for deletion, duplication, or inversion variants. The third is the recall, precision, and F1 score. The specific process is as follows:
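Once the true-positive, false-positive, and false-negative counts are known, the summary scores follow from them. This excerpt does not reproduce the embodiment's own formulas. The minimal sketch below assumes the standard ones: precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 as their harmonic mean. The function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall and F1; returns 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
```

Here precision is 0.8 and recall is 2/3, giving F1 = 2·80 / (2·80 + 20 + 40) = 8/11.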

[0076] (1) Traverse $S_1$ and remove invalid variants whose length is greater than 100 kb, obtaining the variant set $S'_1$ whose variant lengths meet the requirement;
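Step (1) amounts to a length filter. The following minimal sketch makes three assumptions: $S_1$ holds the user's calls as hypothetical `Variant` records, each record carries a reported length, and 100 kb means 100,000 bp.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical call record (field names are illustrative)."""
    chrom: str
    start: int
    end: int
    svtype: str
    length: int  # reported variant length in bp


MAX_VALID_LENGTH = 100_000  # 100 kb from step (1); 1 kb = 1,000 bp assumed


def remove_invalid(s1: list[Variant]) -> list[Variant]:
    """Build S'_1: keep only calls whose length does not exceed 100 kb."""
    return [v for v in s1 if v.length <= MAX_VALID_LENGTH]


s1 = [
    Variant("chr1", 1_000, 1_500, "DEL", 500),
    Variant("chr1", 5_000, 205_000, "DUP", 200_000),  # invalid: longer than 100 kb
    Variant("chr2", 100, 101, "INS", 300),
]
s1_prime = remove_invalid(s1)
```

A call of exactly 100 kb is kept, because the text removes only variants *greater than* 100 kb.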

[0077] (2) For both $S'_1$ and $S_2$, count the number of variants at each variant length Size with $0 \le \text{Size} \le 2$ kb, and store the counts as two-tuples (Size, Num);
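Step (2) is a bounded histogram of variant lengths. In this sketch, `size_distribution` is a hypothetical helper. The two input lists stand in for the lengths of $S'_1$ and $S_2$, and 2 kb is assumed to mean 2,000 bp.

```python
from collections import Counter
from typing import Iterable

MAX_SIZE = 2_000  # 2 kb upper bound from step (2); 1 kb = 1,000 bp assumed


def size_distribution(lengths: Iterable[int]) -> list[tuple[int, int]]:
    """Return sorted (Size, Num) two-tuples for lengths with 0 <= Size <= 2 kb."""
    counts = Counter(n for n in lengths if 0 <= n <= MAX_SIZE)
    return sorted(counts.items())


# Hypothetical lengths for S'_1 (user calls) and S_2 (benchmark).
user_lengths = [50, 50, 120, 3_000, 120, 120]
bench_lengths = [50, 120, 120, 1_999, 2_000, 2_001]
user_dist = size_distribution(user_lengths)
bench_dist = size_distribution(bench_lengths)
```

Sorting by Size lets the two distributions be plotted or compared directly.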

[0078] (3) Compare each variant $\text{Region}_i$ in $S'_1$ pairwise with each variant $\text{Region}_j$ in $S_2$, and for $\text{Region}_i$ and $\text{Region}_j$ calculate the o...
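The text of step (3) is cut off before the comparison criterion is stated, so the following sketch should not be read as SV_STAT's rule. It assumes one common convention: two regions match when they have the same SV type, lie on the same chromosome, and overlap reciprocally by at least 50%. Under that assumption, an all-pairs comparison of $S'_1$ against $S_2$ yields TP, FP, and FN counts. `Region`, `reciprocal_overlap`, `match_counts`, and the 0.5 threshold are all hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical interval record; half-open coordinates [start, end) assumed."""
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: Region, b: Region) -> float:
    """Overlap length as a fraction of the longer of the two regions."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:  # disjoint, or a zero-length region such as a point insertion
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def match_counts(s1_prime: list[Region], s2: list[Region],
                 threshold: float = 0.5) -> tuple[int, int, int]:
    """Compare every Region_i in S'_1 with every Region_j in S_2 (O(n*m)).

    Returns (TP, FP, FN): matched calls, unmatched calls, unmatched benchmark regions.
    """
    matched_calls: set[int] = set()
    matched_truth: set[int] = set()
    for i, ri in enumerate(s1_prime):
        for j, rj in enumerate(s2):
            if ri.svtype == rj.svtype and reciprocal_overlap(ri, rj) >= threshold:
                matched_calls.add(i)
                matched_truth.add(j)
    tp = len(matched_calls)
    return tp, len(s1_prime) - tp, len(s2) - len(matched_truth)


calls = [Region("chr1", 100, 1_100, "DEL"), Region("chr1", 5_000, 6_000, "INV"),
         Region("chr2", 0, 500, "DUP")]
truth = [Region("chr1", 150, 1_150, "DEL"), Region("chr1", 5_000, 6_000, "DEL"),
         Region("chr3", 0, 100, "DUP")]
tp, fp, fn = match_counts(calls, truth)
```

A call that overlaps several benchmark regions still counts once as a true positive. Point insertions have zero interval length and never match under this rule, so insertions would need a separate criterion, such as one based on breakpoint distance.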


Abstract

The invention relates to a benchmark-set-based genome structural variation performance detection method. Its purpose is to solve two problems: existing genome structural variation detection methods are not comprehensive enough, and no public method exists for evaluating variant identification results. The method proceeds as follows. Step 1: based on the user's variant identification result set and the benchmark set, calculate statistics on quantitative indicators for the insertion, deletion, duplication, and inversion variants among the genome structural variations. Step 2: based on the user's variant identification result set and the benchmark set, calculate quantitative indicators for the breakpoint intervals of the translocation identification results. The invention is used in the field of genome structural variation performance detection.
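Step 2 is only summarized here, so the details of the breakpoint-interval indicator are not available. As a hypothetical illustration, this sketch tallies each called translocation by how far its breakpoints lie from the nearest benchmark translocation on the same chromosome pair. The record layout, the distance rule (larger of the two breakpoint offsets), and the interval edges of 100, 500, and 1,000 bp are all assumptions.

```python
from bisect import bisect_left
from collections import Counter
from typing import NamedTuple, Optional


class Translocation(NamedTuple):
    """Hypothetical record: one breakpoint on each partner chromosome."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


# Hypothetical interval edges in bp; the patent's own intervals are not given here.
EDGES = [100, 500, 1_000]


def breakpoint_distance(call: Translocation, truth: Translocation) -> Optional[int]:
    """Larger of the two breakpoint offsets, or None if the chromosome pairs differ."""
    if (call.chrom_a, call.chrom_b) != (truth.chrom_a, truth.chrom_b):
        return None
    return max(abs(call.pos_a - truth.pos_a), abs(call.pos_b - truth.pos_b))


def interval_counts(calls: list[Translocation],
                    benchmark: list[Translocation]) -> Counter:
    """Count calls by the distance interval of their nearest benchmark translocation."""
    counts: Counter = Counter()
    for call in calls:
        dists = [d for t in benchmark
                 if (d := breakpoint_distance(call, t)) is not None]
        if not dists:
            counts["unmatched"] += 1
            continue
        k = bisect_left(EDGES, min(dists))
        counts[f"<={EDGES[k]}" if k < len(EDGES) else f">{EDGES[-1]}"] += 1
    return counts


bench = [Translocation("chr1", 10_000, "chr5", 20_000)]
calls = [
    Translocation("chr1", 10_050, "chr5", 19_980),  # offset 50 bp
    Translocation("chr1", 10_400, "chr5", 20_000),  # offset 400 bp
    Translocation("chr1", 15_000, "chr5", 20_000),  # offset 5,000 bp
    Translocation("chr2", 1_000, "chr3", 2_000),    # no benchmark on this pair
]
counts = interval_counts(calls, bench)
```

Each bucket is labelled by its upper edge, so `<=500` covers offsets from 101 to 500 bp. The sketch also assumes both sets list partner chromosomes in the same order; real data would need each pair normalized first.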

Description

Technical field
[0001] The present invention relates to a benchmark-set-based genome structural variation performance detection method.

Background art
[0002] The study of human genome structural variation is of great significance for genome evolution, population polymorphism analysis, pathogenic variants, and human health. Variations in the human genome fall mainly into three categories: (1) single nucleotide variations (SNVs), which generally refer to a difference in a single DNA base; (2) small indels (a collective term for insertions and deletions), which refer to the insertion or deletion of a short sequence fragment at a certain position in the genome, usually less than 50 bp in length; (3) large structural variations, of which there are many types, including insertions of large sequence fragments longer than 50 bp, deletions, chromosomal inversions, sequence translocations within or between chromosomes, and some...
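The three size classes in the background section can be expressed as a small classifier. This sketch handles only sequence-resolved REF/ALT alleles, and `classify` is a hypothetical name. The text places small indels below 50 bp and large insertions above it, so assigning exactly 50 bp to the structural class is an assumption. Inversions and translocations cannot be recognized from allele lengths alone.

```python
SV_MIN_LENGTH = 50  # 50 bp boundary from the background section


def classify(ref: str, alt: str) -> str:
    """Coarse size class for a sequence-resolved REF -> ALT change (sketch only)."""
    if ref == alt:
        return "no change"
    if len(ref) == len(alt):
        # Equal-length changes: a 1 bp change is an SNV; longer ones are out of scope here.
        return "SNV" if len(ref) == 1 else "other"
    size = abs(len(alt) - len(ref))
    # Assumption: a length change of exactly 50 bp is treated as structural.
    return "small indel" if size < SV_MIN_LENGTH else "structural variant"
```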


Application Information

Patent type & authority: Patents (China)
IPC: IPC(8) G16B20/20
CPC: G16B20/20
Inventors: 朱晓雷宇孟悦边奕心赵松丁云鸿李玉霞
Owner: HARBIN NORMAL UNIVERSITY