Discontinuous context modeling and maximum entropy principle based gene compression method

A compression method and context-modeling technology, applied in the field of electrical components and code conversion, that makes the set of models more diverse and comprehensive, synthesizes more accurate probabilities, and improves compression efficiency and integrity.

Active Publication Date: 2014-01-29
SHANGHAI JIAO TONG UNIV
4 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

When a sequence is statistically modeled to obtain probabilities for arithmetic coding, almost all compression methods scan the symbols one by one, predict each symbol with sequential models, and obtain the final predicted probability by Bayesian averaging. Such oversimplified predictive models are poor at capturing the correlations within gene sequences, whose regularities are arranged in non-traditional patterns.
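
The sequential modeling described above can be illustrated with a minimal sketch. It assumes an adaptive order-k model over the alphabet A, C, G, T with Laplace-smoothed counts, and reports the number of bits an ideal arithmetic coder would spend. The function name `ideal_code_length` and the smoothing choice are illustrative assumptions, not taken from the patent.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def ideal_code_length(seq, order=2):
    """Bits an ideal arithmetic coder would spend on seq under an
    adaptive model that conditions on the previous `order` symbols."""
    # One count table per context; every symbol starts at 1 (Laplace smoothing).
    counts = defaultdict(lambda: {s: 1 for s in ALPHABET})
    bits = 0.0
    for i, sym in enumerate(seq):
        ctx = seq[max(0, i - order):i]       # contiguous context only
        table = counts[ctx]
        p = table[sym] / sum(table.values())
        bits += -math.log2(p)                # cost of coding sym with probability p
        table[sym] += 1                      # adapt after coding
    return bits

print(ideal_code_length("ACGT" * 50))
```

A strictly periodic input costs far less than the 2 bits per base of a fixed-length code, because the contiguous context predicts it well. Sequences whose dependencies skip positions would not benefit in the same way from this kind of model.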

Embodiment Construction

[0025] The present invention will be described in detail below in conjunction with specific embodiments. The following examples will help those skilled in the art to further understand the present invention, but do not limit the present invention in any form. It should be noted that those skilled in the art can make several modifications and improvements without departing from the concept of the present invention. These all belong to the protection scope of the present invention.

[0026] The invention provides a gene compression method based on discontinuous context modeling and the principle of maximum entropy. It makes the statistical model more comprehensive by combining discontinuous context modeling with traditional continuous context modeling, and it uses a Logistic regression model derived from the principle of maximum entropy to produce more accurate predictions, which are fed into the arithmetic coder. In addition, two situations of referen...
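
The difference between a continuous and a discontinuous context can be sketched as follows. The helper `context_key`, the offset tuples, and the use of nucleotide symbols instead of bits are assumptions made for readability; the excerpt does not specify which positions the patent combines.

```python
def context_key(history, offsets):
    """Build a context from the symbols at the given backward offsets.

    Offset 1 is the most recent symbol. Positions before the start of
    the history are filled with '-'.
    """
    return "".join(history[-k] if k <= len(history) else "-" for k in offsets)

history = "ACGTTGCA"

# Continuous context: the last three symbols, with no gaps.
continuous = context_key(history, (1, 2, 3))

# Discontinuous context: offsets 1, 3 and 6, skipping 2, 4 and 5.
discontinuous = context_key(history, (1, 3, 6))

print(continuous, discontinuous)  # prints "ACG AGG"
```

Each offset tuple defines one model. A predictor in this style would keep separate statistics per tuple and let a later mixing step decide how much to trust each one.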

Abstract

The invention provides a gene compression method based on discontinuous context modeling and the maximum entropy principle. In the first stage, the method handles both practical cases, with and without a reference sequence, and represents repeated sequences within or among gene sequences by a dictionary method, which improves compression efficiency. In the second stage, a statistical encoder composed of a predictor and an arithmetic encoder handles the non-repetitive sequences: the predictor adds models built from combinations of discontinuous bit contexts to the traditional continuous context models, and the prediction probabilities generated independently by all models are merged by a Logistic regression formula derived from the maximum entropy principle, so that a more accurate final prediction probability is obtained and passed to the arithmetic encoder. The method markedly improves compression efficiency and enables efficient storage.
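
The merging step in the second stage can be sketched as below. The excerpt does not reproduce the patent's regression formula, so this sketch assumes the logistic mixing form familiar from PAQ-style compressors: each model's bit probability is mapped to the logit domain, combined by a weighted sum, and squashed back to a probability. The class name `LogisticMixer`, the learning rate, and the initial weights are hypothetical.

```python
import math

def stretch(p):
    """Logit: map a probability to the real line."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Logistic function: inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))

class LogisticMixer:
    """Merge per-model probabilities that the next bit is 1."""

    def __init__(self, n_models, lr=0.02):
        self.w = [1.0 / n_models] * n_models
        self.lr = lr
        self.inputs = [0.0] * n_models
        self.p = 0.5

    def mix(self, probs):
        # Clamp so that stretch() never sees exactly 0 or 1.
        self.inputs = [stretch(min(max(p, 1e-6), 1.0 - 1e-6)) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.inputs)))
        return self.p

    def update(self, bit):
        # Gradient step that reduces the coding cost -log p(bit).
        err = bit - self.p
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.inputs)]

mixer = LogisticMixer(2)
for _ in range(50):
    mixer.mix([0.9, 0.1])   # model 0 is right, model 1 is wrong
    mixer.update(1)         # the bit that actually occurred
print(mixer.w)
```

After training on bits that model 0 predicts well, its weight grows and the other shrinks, so the merged probability moves toward the more reliable model before it reaches the arithmetic encoder.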

Description

Technical field

[0001] The present invention relates to an information compression method for very large-scale gene sequences, and specifically to a statistical compression method that combines discontinuous context modeling with traditional continuous context modeling and uses the principle of maximum entropy to obtain the final predicted probability.

Background art

[0002] DNA is an important material basis for the survival, continuation and development of organisms, and has great scientific and social value. At present, DNA research is widely used in many important fields such as biology, medicine, and genetic science, for example protecting endangered species by collecting and preserving DNA information, predicting information from human gene sequences, and finding the rules of gene variation in order to treat cancers and tumors. Various DNA sequence determination projects that provide basic experimental data for these disciplines ha...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): H03M7/40
CPC: H03M7/4018, H03M7/3079
Inventor: 熊红凯, 李平好
Owner: SHANGHAI JIAO TONG UNIV