Virus identification method and device

A virus identification method and a virus identification device, applied in the field of virus identification, that address problems such as heavy dependence on existing databases and improve the detection rate and accuracy.

Active Publication Date: 2016-10-19
深圳华大因源医药科技有限公司

AI Technical Summary

Problems solved by technology

Current methods are mostly based on homologous sequence comparison and rely heavily on previously constructed databases.



Examples


Embodiment 1

[0041] Figure 2 shows the virus identification process and the boundary determination process, which mainly include:

[0042] 1. Sample selection

[0043] This embodiment uses 16 plant samples, named Cooks_footf, Grass100f, Poplar100f, TCV_add, TCV, TCV-TYMV_add, TCV-TYMV, TGM-CK, TYMV-2, TYMV, Willow100f, GSM548932, GSM548933, peach_flower, peach_fruit, and peach_leaf. TGM-CK is a sterile seedling grown in pure culture in the laboratory. Samples TCV, TCV-TYMV, and TYMV are pure-culture seedlings that were deliberately infected with virus: sample TCV with turnip crinkle virus (TCV), sample TYMV with turnip yellow mosaic virus (TYMV), and sample TCV-TYMV with both viruses. Sample TYMV-2 is an experimental replicate, and samples TCV_add and TCV-TYMV_add are repeated sequencing data (technical dup...

Embodiment 2

[0063] Consideration and determination of the factors relevant to virus identification, including the amount of sequencing data, the assembly method, the five-element judgment criteria, and the proposal and evaluation of prediction models.

[0064] 1. Determine the amount of sequencing data

[0065] To find as many of the viruses in a sample as possible, we would like to sequence more data, but the cost of sequencing rises with the amount sequenced. Moreover, if the amount already sequenced covers all the sequences in the sample, sequencing more is wasteful. It is therefore necessary to determine roughly how much small RNA sequencing data is needed.

[0066] To evaluate this amount, we sequenced 100M sequences and used the ratio of the number of non-redundant sequences at different data volumes to the number of non-redundant sequences as a reference for evaluating the saturation of the sequencing data.
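The saturation check described in [0066] can be sketched as follows. This is a minimal illustration, not the patent's procedure: it assumes the denominator is the number of non-redundant sequences in the full data set, and the function name `saturation_curve`, the random subsampling, and the toy reads are all hypothetical.

```python
import random


def saturation_curve(reads, fractions, seed=0):
    """Return (fraction, ratio) pairs.

    ratio = distinct sequences in a random subsample of the reads
            divided by distinct sequences in the full set (assumed denominator).
    """
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)  # random subsampling order (an assumption)
    total_unique = len(set(shuffled))
    curve = []
    for frac in fractions:
        n = int(len(shuffled) * frac)
        unique_in_subsample = len(set(shuffled[:n]))
        curve.append((frac, unique_in_subsample / total_unique))
    return curve


# Toy data; a real run would read the sequences from the fq/fasta file.
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC", "TTGA", "ACGT", "GGCC"]
for frac, ratio in saturation_curve(reads, [0.25, 0.5, 1.0]):
    print(f"{frac:.2f} of reads -> {ratio:.2f} of non-redundant sequences")
```

A curve that flattens close to 1.0 well before the full data volume would suggest that additional sequencing adds few new sequences.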

[0067] Figure 3 shows...

Embodiment 3

[0084] Prediction and validation of whether an unknown sequence is from a virus.

[0085] 1) Low-quality data filtering. Adapters and low-quality reads are removed from the fq file obtained by sequencing, a fasta-format file is generated, and the redundancy of each sequence is counted, so that the ID of each sequence includes the number of times the sequence occurs, for example t00001200, where t00001 is the ID and 200 is the repeat count of the sequence.
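The redundancy counting in step 1) can be sketched as below. This is an illustration under stated assumptions: adapter and quality trimming are taken as already done, and the underscore separator between ID and count, the `t` prefix width, and the function names `collapse_reads` and `to_fasta` are hypothetical (the source prints its example ID as t00001200 without a visible separator).

```python
from collections import Counter


def collapse_reads(sequences, prefix="t", sep="_"):
    """Collapse identical reads into (id, sequence) records.

    Each ID carries the repeat count, e.g. 't00001_200'. The separator
    is an assumption made here so the count can be parsed back out.
    """
    counts = Counter(sequences)
    records = []
    # most_common() orders by count, highest first; ties keep first-seen order.
    for index, (seq, count) in enumerate(counts.most_common(), start=1):
        records.append((f"{prefix}{index:05d}{sep}{count}", seq))
    return records


def to_fasta(records):
    """Render (id, sequence) records as fasta text."""
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, seq in records)


records = collapse_reads(["ACGT", "TTGA", "ACGT"])
print(to_fasta(records), end="")
```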

[0086] 2) Known ncRNA filtering. The fasta files produced by the preliminary filtering are compared with known ncRNA databases, including Rfam (http://www.sanger.ac.uk/software/Rfam) and GenBank's noncoding RNA database (http://www.ncbi.nlm.nih.gov/), using BLAST 2.2.23 and requiring an e-value < 0.01. The sequences are then compared with the miRBase (release 18) database, requiring an exact match. This yields further filtered fasta-format files.
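The filtering logic in step 2) can be sketched as follows. It assumes the BLAST results are available as 12-column tab-separated text with the query ID in column 1 and the e-value in column 11; the source does not state the output format, and the function names and toy records are hypothetical.

```python
def ids_with_hits(blast_tabular_lines, max_evalue=0.01):
    """Collect query IDs that have at least one hit with e-value < max_evalue.

    Assumes 12-column tabular BLAST output (query ID first, e-value 11th).
    """
    hits = set()
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue < max_evalue:
            hits.add(query_id)
    return hits


def drop_known_ncrna(records, hit_ids, exact_sequences=frozenset()):
    """Keep records with no database hit and no exact sequence match."""
    return [
        (seq_id, seq)
        for seq_id, seq in records
        if seq_id not in hit_ids and seq not in exact_sequences
    ]


blast_lines = [
    "t00001_2\trfam_x\t100.0\t21\t0\t0\t1\t21\t5\t25\t1e-05\t42.1",
    "t00002_1\trfam_y\t90.0\t20\t2\t0\t1\t20\t3\t22\t0.5\t20.3",
]
records = [("t00001_2", "ACGT"), ("t00002_1", "TTGA"), ("t00003_1", "GGCC")]
kept = drop_known_ncrna(records, ids_with_hits(blast_lines), {"GGCC"})
print(kept)
```

Here the first record is dropped for its significant hit, the third for an exact sequence match (standing in for the miRBase comparison), and the second is kept because its only hit has an e-value above the cutoff.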

[0087] 3) Data assembly. If the species already has a genome sequence, then th...


Abstract

The invention discloses a virus identification method comprising the following steps: obtaining RNA sequencing data of a sample to be tested; assembling a first portion of the sequencing data to obtain assembled sequences; comparing the first portion of the sequencing data with the assembled sequences to obtain a comparison result; determining mutation sites on the assembled sequences according to the comparison result, and determining at least one of the following a-c for the assembled sequences: a, the mutation site proportion together with at least one of the average entropy and the median entropy; b, the mutation site proportion together with at least one of the average mutation rate and the median mutation rate; c, the mutation site proportion; and comparing at least one of a-c for the assembled sequences with the corresponding boundary, and determining an assembled sequence that falls within the boundary to be from a virus. The invention also discloses a virus identification device. The virus identification method and/or device can accurately predict whether an unknown sequence is a virus sequence without depending on homologous sequence alignment.
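The quantities named in the abstract can be illustrated with a small sketch. The excerpt does not define them, so everything here is an assumption: entropy is taken as Shannon entropy of the base counts at a site, mutation rate as the fraction of aligned bases differing from the assembled base, a mutation site as a position whose mutation rate reaches a hypothetical `min_rate`, and the boundary as a set of made-up numeric intervals.

```python
import math
from statistics import mean, median


def site_entropy(base_counts):
    """Shannon entropy (bits) of the base counts at one covered position."""
    total = sum(base_counts.values())
    entropy = 0.0
    for count in base_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def site_mutation_rate(base_counts, assembled_base):
    """Fraction of aligned bases that differ from the assembled base."""
    total = sum(base_counts.values())
    return 1.0 - base_counts.get(assembled_base, 0) / total


def contig_features(assembled_seq, pileup, min_rate=0.05):
    """Summarise one assembled sequence; pileup[i] maps base -> read count."""
    entropies = [site_entropy(c) for c in pileup]
    rates = [site_mutation_rate(c, b) for c, b in zip(pileup, assembled_seq)]
    mutated = sum(1 for r in rates if r >= min_rate)  # hypothetical cutoff
    return {
        "mean_entropy": mean(entropies),
        "median_entropy": median(entropies),
        "mean_mutation_rate": mean(rates),
        "median_mutation_rate": median(rates),
        "mutation_site_proportion": mutated / len(assembled_seq),
    }


def within_boundary(features, boundary):
    """True if every listed feature lies inside its (low, high) interval."""
    return all(lo <= features[name] <= hi for name, (lo, hi) in boundary.items())


features = contig_features("AC", [{"A": 10}, {"C": 5, "T": 5}])
# Illustrative boundary only; real boundaries would come from training samples.
boundary = {"mean_entropy": (0.2, 2.0), "mutation_site_proportion": (0.1, 1.0)}
print(features, within_boundary(features, boundary))
```

Passing only `mutation_site_proportion` in the boundary corresponds to option c, adding an entropy feature to option a, and adding a mutation-rate feature to option b.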

Description

Technical field

[0001] The invention relates to the field of biological detection, and in particular to a virus identification method and a virus identification device.

Background

[0002] As of June 30, 2014, the International Committee on Taxonomy of Viruses (ICTV) had published 2,827 virus species, compared with 2,484 species in 2011 and 2,285 species in 2009. Almost 100 viruses were discovered every year. This rate lags far behind that of other microorganisms such as bacteria, because a virus cannot grow on its own and must parasitize a host cell, which makes it difficult to isolate and culture. However, some researchers predict that wherever there are cells there will be viruses. Viruses are extremely widespread and numerous in nature, and the viruses known today amount to less than one ten-thousandth of the total. With the development of sequencing technology, the cost continues to decrease and the throughput continues to increase,...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/22, C12Q1/70, C12Q1/68, C12M1/34
Inventor: 麻锦敏, 王珲
Owner: 深圳华大因源医药科技有限公司