Method and device for classifying chromosome sequences and plasmid sequences

A chromosome and plasmid sequence classification technology, applied in DNA microarray pattern recognition, character and pattern recognition, instruments, etc. It addresses the low accuracy of classifying chromosome sequences and plasmid sequences and the low efficiency and poor effect of model training, improving training efficiency, training effect, and classification accuracy.

Active Publication Date: 2016-06-01
SHENZHEN INST OF ADVANCED TECH

AI Technical Summary

Problems solved by technology

[0006] In view of this, an embodiment of the present invention provides a method and device for classifying chromosome sequences and plasmid sequences, to solve the problems of low accuracy in classifying chromosome sequences and plasmid sequences, low efficiency of model training, and poor training effect.


Embodiment Construction

[0028] To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.

[0029] Figure 1 shows the implementation flowchart of the method for classifying chromosome sequences and plasmid sequences provided by the embodiment of the present invention, described in detail as follows:

[0030] In step S101, chromosome sequences and plasmid sequences are obtained.

[0031] For example, the chromosome sequences and plasmid sequences of all sequenced bacteria (Bacteria) were obtained from the US National Center for Biotechnology Information (NCBI), comprising 2044 chromosome sequences and 3198 plasmid sequences, and these data were used as experimental data.

[0032] In step S102, t...

Abstract

The invention is applicable to the technical field of data mining and provides a method and a device for classifying chromosome sequences and plasmid sequences. The method comprises the following steps: chromosome sequences and plasmid sequences are acquired, and a first training sample and a second training sample are obtained; the frequency characteristics of all k-character short strings and their reverse-complement sequence pairs are extracted, and a first frequency characteristic table and a second frequency characteristic table are generated, where 2 ≤ k ≤ 5; a training set and a test set are extracted from the first and second frequency characteristic tables, and a chi-square test algorithm is used to calculate a weight value for each item of characteristic data in the training set; a random forest algorithm is used to train a classification model on the characteristic data whose weight values meet preset conditions; and the chromosome sequences and plasmid sequences are classified according to the classification model. The training efficiency and training effect of the classification model are thereby improved, as is the accuracy of classifying chromosome sequences and plasmid sequences.
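The feature-extraction and weighting steps named in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names (`canonical`, `feature_names`, `feature_vector`, `chi2_weights`) are hypothetical, frequencies are assumed to be normalised separately for each k, and the chi-square weight is assumed to compare per-class feature totals against the totals expected from class proportions. The source does not specify any of these details.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return one shared key for a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def feature_names(k_min=2, k_max=5):
    """List every reverse-complement pair key for k_min <= k <= k_max."""
    names = []
    for k in range(k_min, k_max + 1):
        keys = {canonical("".join(p)) for p in product("ACGT", repeat=k)}
        names.extend(sorted(keys))
    return names


def feature_vector(seq, k_min=2, k_max=5):
    """Relative frequency of each pair key (assumption: normalised per k)."""
    seq = seq.upper()
    freqs = {}
    for k in range(k_min, k_max + 1):
        counts = Counter()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N etc.
                counts[canonical(kmer)] += 1
        total = sum(counts.values())
        for key, n in counts.items():
            freqs[key] = n / total
    return [freqs.get(name, 0.0) for name in feature_names(k_min, k_max)]


def chi2_weights(rows, labels):
    """Chi-square weight per feature column (assumed formulation):
    observed per-class feature totals vs. totals expected from class shares."""
    n = len(rows)
    classes = sorted(set(labels))
    weights = []
    for j in range(len(rows[0])):
        total = sum(r[j] for r in rows)
        w = 0.0
        for c in classes:
            share = labels.count(c) / n
            observed = sum(r[j] for r, y in zip(rows, labels) if y == c)
            expected = total * share
            if expected > 0:
                w += (observed - expected) ** 2 / expected
        weights.append(w)
    return weights
```

Under these assumptions, features whose weights pass a chosen threshold would then be handed to a random forest learner (for example scikit-learn's `RandomForestClassifier`, which is one possible choice and is not named in the source). Merging each short string with its reverse complement makes the features independent of which DNA strand was sequenced.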

Description

Technical field

[0001] The invention belongs to the technical field of data mining, and in particular relates to a method and a device for classifying chromosome sequences and plasmid sequences.

Background art

[0002] Metagenomics is an omics technology that treats the many microbial cells in an environment as a single mixed sample and uses genome sequencing to obtain all of the DNA (deoxyribonucleic acid) data in that sample. Metagenomic data provide information on all microbial populations active in the environment, and play a key role in the study of major issues including human diseases, biomass energy, and the evolution of life in nature.

[0003] After the DNA is extracted, since the existing sequencing technology can only sequence sequences with a certain length limit, it is necessary to break the very long DNA chain into fragments for sequencing. After sequencing all the fragments, it is necessary to assemble the fragments and resto...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06K9/46, G06K9/52
CPC: G06V10/464, G06V10/431, G06V2201/04, G06F18/241
Inventor: 周丰丰, 彭超, 王普, 葛瑞泉
Owner: SHENZHEN INST OF ADVANCED TECH