
Variation probability of genetic transcription and algorithm of variation direction

A technology for gene transcription and mutation, which is applied in the field of high-throughput sequencing of biological information, and can solve the problems of complex calculation, long time required, and mis-location of seeds.

Status: Inactive
Publication Date: 2018-10-26
中科政兴(上海)医疗科技有限公司
Cites: 2. Cited by: 9.

AI Technical Summary

Problems solved by technology

[0002] With the birth and rapid development of high-throughput sequencing technology, sequencing costs keep falling and throughput keeps rising, which has greatly advanced bioinformatics research. Sequencing helps to find sites associated with disease, which is of great significance for the subsequent pathological determination of the disease and the exploration of treatment options. However, the massive volume of high-throughput data and the accuracy required of the results make InDel detection very challenging.

Directly aligning short sequences to the reference sequence raises two problems. First, the mapping computation is complex and takes a long time. Second, a read is mapped to its first match on the reference sequence, which is usually not the best match. To solve these two problems, the present invention first builds a hash table over the seed set of the reference sequence generated by the sliding window method, and then uses the hash table to position each read during alignment. Because the reference sequence carries a large amount of information and building the hash table consumes memory, the sequence is binary-compressed while the hash table is built, which greatly reduces memory usage and improves the speed of analysis.

InDel detection itself also faces two problems. First, the reads generated by high-throughput sequencing are very short, and the seeds, being subsequences of reads, are shorter still, so a seed is often positioned at multiple places in the reference sequence. Second, InDels are distributed randomly along a read, and a seed sometimes covers the InDel position, so that the seed is located at a wrong position on the reference sequence.

To improve the correctness of InDel detection, the present invention first applies the sliding window method to each read to select multiple subsequences and aligns them to the reference sequence to obtain their respective candidate sites. To reduce false positives among the candidate sites, the present invention introduces supportNum and, in the subsequent evaluation, sets a threshold based on supportNum, narrowing down the final set of InDel candidates.
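The seed-voting idea described above can be sketched roughly as follows. This is an illustrative Python sketch, not the patent's implementation: the function names `build_seed_index` and `candidate_sites`, the parameter `min_support` (standing in for the supportNum threshold), and the rule that each seed hit votes for an implied read start position are all assumptions made for the example.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Slide a window of length k over the reference and record where each seed starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_sites(read, index, k, min_support):
    """Return {implied read start: number of supporting seeds}.

    Only starts supported by at least min_support seeds are kept
    (min_support is a hypothetical stand-in for the supportNum threshold).
    """
    support = defaultdict(int)
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for ref_pos in index.get(seed, ()):
            # A seed at read offset `offset` hitting ref_pos implies the read starts here.
            support[ref_pos - offset] += 1
    return {start: n for start, n in support.items() if n >= min_support}


reference = "ACGTTGCAAGGCTTACGATC"
index = build_seed_index(reference, 4)

exact_read = reference[5:15]                        # no InDel
deleted_read = reference[5:10] + reference[11:16]   # one base deleted

print(candidate_sites(exact_read, index, 4, 3))     # {5: 7}
print(candidate_sites(deleted_read, index, 4, 2))   # {5: 2, 6: 3}
```

In the second call the seeds before and after the deleted base vote for start positions that differ by one, and the seeds that cover the deletion find no hit at all. That is the situation the paragraph above describes, and a support threshold is one way to discard weakly supported sites.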


Examples


Embodiment Construction

[0035] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0036] Referring to Figures 1 and 2, in an embodiment of the present invention, an algorithm for gene transcription variation probability and variation direction comprises the following steps:

[0037] 1. Hash table creation

[0038] A hash table is a data structure that is accessed directly by key value. It accesses a record by mapping the key value to a position in the table, which speeds up lookup. This map...
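As a minimal illustration of mapping a key to a position in a table, the sketch below stores reference seeds in a fixed number of slots with chaining. Everything here is an assumption made for the example rather than a detail taken from the patent: the names `seed_key`, `SeedTable`, and `build_table`, the choice of `key % num_slots` as the mapping, and the restriction to the four bases A, C, G, T.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def seed_key(seed):
    """Turn a seed into an integer key, 2 bits per base (assumes only A/C/G/T)."""
    key = 0
    for base in seed:
        key = (key << 2) | BASE_CODE[base]
    return key


class SeedTable:
    """Fixed-size hash table with chaining: slot = key % number of slots."""

    def __init__(self, num_slots):
        self.slots = [[] for _ in range(num_slots)]

    def _slot(self, key):
        return key % len(self.slots)

    def add(self, seed, pos):
        key = seed_key(seed)
        bucket = self.slots[self._slot(key)]
        for stored_key, positions in bucket:
            if stored_key == key:
                positions.append(pos)
                return
        bucket.append((key, [pos]))

    def lookup(self, seed):
        key = seed_key(seed)
        for stored_key, positions in self.slots[self._slot(key)]:
            if stored_key == key:
                return positions
        return []


def build_table(reference, k, num_slots=1024):
    """Insert every length-k window of the reference together with its start position."""
    table = SeedTable(num_slots)
    for pos in range(len(reference) - k + 1):
        table.add(reference[pos:pos + k], pos)
    return table


table = build_table("ACGTACGT", 4, num_slots=8)
print(table.lookup("ACGT"))  # [0, 4]
print(table.lookup("TTTT"))  # []
```

A lookup touches only one slot, so its cost depends on how many keys share that slot rather than on the length of the reference.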



Abstract

The invention discloses an algorithm for gene transcription variation probability and variation direction. Massive high-throughput data and the accuracy required of detection results make InDel detection very challenging. Aligning short sequences directly to a reference sequence causes two problems: the mapping computation is complex and takes a long time, and a read is mapped to its first match on the reference sequence, which is generally not the best match. To solve these two problems, a hash table is first built over the seed set of the reference sequence produced by a sliding window method, and reads are then positioned using the hash table during alignment. Because the reference sequence carries a massive amount of information and building the hash table consumes memory, the sequence is binary-compressed while the hash table is built, which greatly reduces memory usage.
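The abstract does not spell out the binary compression scheme. One common approach, shown here purely as an assumption, stores each base in 2 bits so that four bases fit in one byte. The helper names `pack` and `unpack` are hypothetical, and the sketch handles only A, C, G, and T.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def pack(sequence):
    """Pack a DNA string into bytes, 4 bases per byte (2 bits each)."""
    packed = bytearray((len(sequence) + 3) // 4)
    for i, base in enumerate(sequence):
        packed[i // 4] |= BASE_CODE[base] << (2 * (i % 4))
    return bytes(packed)


def unpack(packed, length):
    """Recover the first `length` bases from a packed byte string."""
    return "".join(
        CODE_BASE[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


sequence = "ACGTTGCAAG"
packed = pack(sequence)
print(len(sequence), len(packed))             # 10 3
print(unpack(packed, len(sequence)) == sequence)  # True
```

Compared with one byte per character, this layout uses roughly a quarter of the memory. The original length has to be stored alongside the packed bytes, because the last byte may be only partly filled.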

Description

Technical field

[0001] The invention relates to the bioinformatics field of high-throughput sequencing, in particular to an algorithm for gene transcription variation probability and variation direction.

Background technique

[0002] With the birth and rapid development of high-throughput sequencing technology, sequencing costs keep falling and throughput keeps rising, which has greatly advanced bioinformatics research. Sequencing helps to find sites associated with disease, which is of great significance for the subsequent pathological determination of the disease and the exploration of treatment options. However, the massive volume of high-throughput data and the accuracy required of the results make InDel detection very challenging. Directly aligning short sequences to the reference sequence raises two problems. One is that the mapping computation is complex and takes a long time. The other is ...


Application Information

IPC(8): G06F19/28, G06F19/12, G06F19/18
CPC: G16B5/00, G16B20/00, G16B40/00, G16B50/00
Inventors: 邵莉, 佟艳辉, 李鹏
Owner: 中科政兴(上海)医疗科技有限公司