Biological classification method and system for species based on triple neural network

A neural-network-based classification technique applied in the field of species classification. It addresses problems such as complex models and long preprocessing and learning times.

Active Publication Date: 2020-08-21
XIAMEN UNIV

AI Technical Summary

Problems solved by technology

However, these methods require a complex preprocessing process for the input data, and have complex requirements for the model ...



Examples


Embodiment approach

[0065] As an implementation manner, step 102 specifically includes:

[0066] Determine the frequency $a_j$ with which k-tuple $j$ appears in the sequence to be classified, where $j = 1, \dots, 4^k$, $k$ is the tuple length, and $4^k$ is the number of distinct k-tuples;

[0067] The k-tuple frequency vector of the sequence to be classified is determined as $(a_1, a_2, \dots, a_{4^k})$.

[0068] For example, for a DNA sequence G, slide a window of length k across the entire sequence from beginning to end, count the number of times (frequency) each k-tuple appears, and obtain the k-tuple frequency vector.
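
The sliding-window count described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `ktuple_frequency_vector`, the lexicographic ordering of the tuples, and the choice to skip windows containing characters outside `ACGT` are all assumptions made for the example.

```python
from itertools import product


def ktuple_frequency_vector(sequence, k, alphabet="ACGT"):
    """Count how often each k-tuple over `alphabet` occurs in `sequence`.

    Returns a list of length len(alphabet)**k (4**k for DNA), ordered
    lexicographically by tuple. The ordering is an assumption of this sketch.
    """
    # Map every possible k-tuple to a fixed position in the output vector.
    index = {"".join(t): i for i, t in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = sequence.upper()
    # Slide a window of length k from the first to the last valid start position.
    for start in range(len(seq) - k + 1):
        position = index.get(seq[start:start + k])
        if position is not None:  # assumption: ignore windows with symbols such as N
            counts[position] += 1
    return counts


vector = ktuple_frequency_vector("ACGTAC", k=2)
```

For `"ACGTAC"` with `k=2` the windows are `AC`, `CG`, `GT`, `TA`, `AC`, so the vector has 16 entries that sum to 5, with the `AC` entry equal to 2.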

[0069] As an implementation manner, before step 102, this embodiment also includes: training the neural network model. As an optional implementation manner, the training method of the neural network model includes:

[0070] Build three identical neural networks with weight sharing;

[0071] Obtain a sample sequence; the sample sequence includes several sequences in each category;

[0072] ...
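
The idea of three identical networks with weight sharing can be illustrated with a small sketch in which a single set of weights embeds three inputs. Everything specific here is an assumption: the excerpt does not give the network architecture or the training objective, so the one-layer `tanh` embedding and the margin-based triplet loss below are common stand-ins, and the names `make_weights`, `embed`, and `triplet_loss` are hypothetical.

```python
import math
import random


def make_weights(n_in, n_out, seed=0):
    """Random weights for one dense layer (placeholder architecture)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def embed(weights, x):
    """Map a frequency vector to a lower-dimensional vector (one tanh layer)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(weights, anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (an assumed objective, not taken from the source).

    The same `weights` object is applied to all three inputs, which is what
    weight sharing between the three branches means in practice.
    """
    ea, ep, en = (embed(weights, x) for x in (anchor, positive, negative))
    return max(0.0, euclidean(ea, ep) - euclidean(ea, en) + margin)


weights = make_weights(n_in=16, n_out=4)
anchor = [2, 0, 1, 0] * 4      # toy 2-tuple frequency vectors
positive = [2, 0, 1, 1] * 4    # same category as the anchor
negative = [0, 3, 0, 2] * 4    # different category
loss = triplet_loss(weights, anchor, positive, negative)
```

Because the three branches reuse one weight set, only one network has to be stored and updated, and after training the same `embed` function can be applied to a single sequence for dimension reduction.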


Abstract

The invention discloses a biological classification method and system for species based on a triple neural network. The method comprises the following steps: acquiring a to-be-classified sequence, wherein the to-be-classified sequence is a DNA sequence, an RNA sequence, an amino acid sequence, a genome data sequence, a transcriptome data sequence, a metagenome data sequence or a metatranscriptome data sequence; determining a k-tuple frequency vector of the to-be-classified sequence; carrying out dimension reduction processing on the k-tuple frequency vector of the to-be-classified sequence by adopting a neural network model; respectively calculating distances between the to-be-classified sequence and various sample sequences based on the k-tuple frequency vector having undergone dimension reduction; and determining the category closest to the to-be-classified sequence as the category of the to-be-classified sequence. The method has the characteristics of simple data preprocessing and high classification speed.
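
The final two steps of the abstract (distance calculation and choosing the closest category) might look like the following sketch. It assumes the vectors have already been reduced by the neural network, uses Euclidean distance, and defines the distance to a category as the distance to its nearest sample; the excerpt does not specify either choice, and the name `classify` is hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(query_vec, samples_by_category):
    """Return (category, distance) for the category nearest to `query_vec`.

    `samples_by_category` maps a category label to a non-empty list of
    reduced vectors. Distance to a category is taken as the minimum distance
    to any of its samples (an assumption of this sketch).
    """
    best_category, best_distance = None, float("inf")
    for category, vectors in samples_by_category.items():
        distance = min(euclidean(query_vec, v) for v in vectors)
        if distance < best_distance:
            best_category, best_distance = category, distance
    return best_category, best_distance


# Toy reduced vectors for two hypothetical categories.
samples = {
    "category_A": [[0.0, 0.0], [1.0, 0.0]],
    "category_B": [[5.0, 5.0]],
}
label, distance = classify([0.9, 0.1], samples)
```

Here the query lies closest to the second sample of `category_A`, so `label` is `"category_A"`. Using the mean distance or a per-category centroid instead of the minimum would be an equally plausible reading of the abstract.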

Description

Technical field

[0001] The invention relates to the technical field of species classification, in particular to a method and system for biological classification of species based on a triple neural network.

Background

[0002] With the rapid development of sequencing technology, large amounts of unknown sequence data are being generated in the biological field. Classifying and placing these sequences is a key step in sequence analysis. Traditional species classification is based on sequence comparison, which not only requires substantial computing power and time but also has low accuracy.

[0003] Species classification methods based on deep learning are more computationally efficient than traditional comparison-based methods and have been widely used in the classification of genomes and metagenomes. Existing deep learning-based classification algorithms are able to model complex dependencies between input data (such as genome fragments) and target variables (such as origin of s...


Application Information

IPC(8): G16B30/00, G16B40/20
CPC: G16B30/00, G16B40/20
Inventor: 王颖, 王怡雯
Owner: XIAMEN UNIV