Method and device for quick contrast and analysis of short sequence for second-generation sequencing

A second-generation sequencing analysis method, applied in the field of bioinformatics, that addresses the high memory usage and low efficiency of sequencing-data comparison, achieving faster comparison, shorter comparison time, and lower memory use.

Active Publication Date: 2017-01-04
北京普康瑞仁医学检验所有限公司

AI Technical Summary

Problems solved by technology

[0004] In view of this, the present invention provides a method and device for quick comparison and analysis of next-genera



Examples


Embodiment 1

[0096] Example 1: Data analysis for non-invasive prenatal testing, based on the rapid short-sequence comparison and analysis method for next-generation sequencing

[0097] To implement non-invasive prenatal testing, the module flow of the present invention first uses Perl and shell commands to preprocess the human reference genome hg19.fa into the format required by the index query library, which has three columns:

[0098] The first column: 36-mer DNA sequence fragment;

[0099] The second column: the chromosome number where the fragment is located;

[0100] The third column: the position on the chromosome where the fragment is located.
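The three-column format above can be illustrated with a minimal sketch. The source performs this step with Perl and shell commands; the Python below is only an illustration of the record layout, not the patent's preprocessing script. The function names `kmer_records` and `format_row`, the one-base sliding window, the 1-based positions, the tab separator, and the skipping of windows containing non-ACGT bases are all assumptions.

```python
from typing import Iterator, Tuple

K = 36  # fragment length stated in the source

Record = Tuple[str, str, int]  # (k-mer, chromosome, position)


def kmer_records(chrom: str, sequence: str, k: int = K) -> Iterator[Record]:
    """Yield one (k-mer, chromosome, position) row per window of length k.

    Assumptions (not from the source): the window slides one base at a
    time, positions are 1-based, bases are upper-cased, and any window
    containing a character other than A/C/G/T is skipped.
    """
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if set(kmer) <= set("ACGT"):
            yield kmer, chrom, start + 1


def format_row(record: Record) -> str:
    """Render one record as a tab-separated line (separator is an assumption)."""
    kmer, chrom, pos = record
    return f"{kmer}\t{chrom}\t{pos}"


if __name__ == "__main__":
    # Toy input with k=4 so the output stays readable; a real run would
    # use k=36 on each chromosome of the reference genome.
    for row in kmer_records("chr1", "ACGTNACGTA", k=4):
        print(format_row(row))
```

With the toy input, only the windows free of `N` are emitted, each carrying the chromosome name and the start position of the fragment.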

[0101] The core method is then implemented with algorithms and data structures written in C.

[0102] Designing and running the present invention requires the following hardware and software environment: a Linux system; more than 3 cores; more than 35 memory; the C library under the Linux platform; the GCC compiler; the GDB debug...



Abstract

The invention discloses a method and a device for rapid comparison and analysis of short sequences for second-generation sequencing, which address the low comparison efficiency and high memory usage of sequencing data. The method comprises the following steps: obtaining a short DNA (deoxyribonucleic acid) sequence produced by sequencing, and mapping and encoding it with a first hash algorithm and a second hash algorithm to obtain a first index and a second index, respectively; comparing the short DNA sequence against a reference genome according to a preset index query library, the first index, and the second index, wherein the index query library consists of an array of unit structures, each unit structure comprises a value and an index 2, and the array offset of each unit structure serves as its corresponding index 1, that is, the index value into the structure array, and wherein K is the length of the fragment sequence; and, when the comparison result is correct, obtaining the value of the K-mer fragment matched to the short DNA sequence and determining from it the chromosome number of the short DNA sequence and its position on the chromosome.
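The two-index lookup described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation (which the source says is written in C). The abstract does not name the hash algorithms or say how collisions are handled, so `first_hash` (2-bit base packing reduced modulo the table size), `second_hash` (64-bit FNV-1a), linear probing, and all function names here are hypothetical choices. What the sketch keeps from the abstract is the layout: an array of units, each holding a value and an index 2, with the array offset acting as index 1.

```python
from typing import List, Optional, Sequence, Tuple

Value = Tuple[str, int]             # (chromosome number, position on chromosome)
Unit = Optional[Tuple[Value, int]]  # (value, index 2); None marks an empty slot

_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def first_hash(kmer: str, table_size: int) -> int:
    """Hypothetical first hash: pack bases into 2 bits each, then reduce
    modulo the table size to get an array offset (index 1)."""
    h = 0
    for base in kmer:
        h = (h << 2) | _CODE[base]
    return h % table_size


def second_hash(kmer: str) -> int:
    """Hypothetical second hash: 64-bit FNV-1a over the ASCII bases (index 2)."""
    h = 0xCBF29CE484222325
    for byte in kmer.encode("ascii"):
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def build_table(rows: Sequence[Tuple[str, str, int]], table_size: int) -> List[Unit]:
    """Build the unit array from (k-mer, chromosome, position) rows.

    Assumes k-mers are unique and the table is larger than the row count.
    """
    if len(rows) >= table_size:
        raise ValueError("table_size must exceed the number of rows")
    table: List[Unit] = [None] * table_size
    for kmer, chrom, pos in rows:
        slot = first_hash(kmer, table_size)
        while table[slot] is not None:      # linear probing (assumption)
            slot = (slot + 1) % table_size
        table[slot] = ((chrom, pos), second_hash(kmer))
    return table


def lookup(table: List[Unit], read: str) -> Optional[Value]:
    """Return (chromosome, position) when the stored index 2 matches, else None."""
    if not all(base in _CODE for base in read):
        return None                         # reads with other characters never match
    size = len(table)
    slot = first_hash(read, size)
    index2 = second_hash(read)
    for _ in range(size):
        unit = table[slot]
        if unit is None:
            return None
        value, stored_index2 = unit
        if stored_index2 == index2:
            return value
        slot = (slot + 1) % size
    return None
```

Because each unit stores only the value and the second index rather than the K-mer string itself, the per-entry footprint stays small, which is consistent with the memory-saving goal the abstract states. The trade-off in this sketch is that a match is confirmed by hash equality rather than by comparing the sequences directly.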

Description

Technical field

[0001] The invention belongs to the field of bioinformatics engineering and relates to bioinformatics technology and computer application technology, in particular to a rapid short-sequence comparison and analysis method for second-generation sequencing of DNA sequences.

Background

[0002] DNA sequencing plays the most fundamental and far-reaching role in deciphering the genetic code of living species. DNA sequencing techniques were reported as early as the discovery of the DNA double helix, but the procedures were too complicated. Shortly afterwards, in 1977, Sanger invented the chain-termination sequencing method, which was a milestone. Since then, with the development of bioinformatics, the Sanger sequencing method has become unable to meet research needs, and second-generation sequencing technology, with lower cost, higher throughput, and faster speed, has emerged in response. Its core idea is to...


Application Information

IPC(8): G06F19/22
CPC: G16B30/00
Inventor: 郑洪坤, 郭强, 许德德, 马威锋, 孙乔慧
Owner: 北京普康瑞仁医学检验所有限公司