A method and system for species biological classification based on triplet neural network

A neural-network-based classification method, applied in the field of species classification, that addresses the complex model structures and long preprocessing and learning times of existing approaches, and achieves convenient and fast computation, low computing power requirements, and fast processing speed.

Active Publication Date: 2022-04-29
XIAMEN UNIV

AI Technical Summary

Problems solved by technology

However, existing deep learning-based methods require complex preprocessing of the input data and place complex requirements on the model results. Preprocessing and learning therefore take a long time, which limits the application of these methods in species classification.



Examples


Embodiment approach

[0065] In one implementation, step 102 specifically includes:

[0066] Determine the frequency $a_j$ with which k-tuple $j$ appears in the sequence to be classified, where $j = 1, \dots, 4^k$, $k$ is the length of the tuple, and $4^k$ is the number of distinct tuples;

[0067] The k-tuple frequency vector of the sequence to be classified is determined as the vector of these frequencies, $(a_1, a_2, \dots, a_{4^k})$.

[0068] For example, for a DNA sequence G, slide a window of length k over the entire sequence from beginning to end, count the number of times (frequency) that each k-tuple appears in the sequence, and obtain the k-tuple frequency vector.
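
The sliding-window count in [0068] can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `kmer_frequency_vector`, the lexicographic ordering of the tuples, and the choice to skip windows containing symbols outside `ACGT` are all assumptions made for the example.

```python
from itertools import product


def kmer_frequency_vector(sequence, k, alphabet="ACGT"):
    """Count how often each k-tuple occurs in `sequence` (hypothetical helper)."""
    # Enumerate all len(alphabet)**k tuples; lexicographic order is an assumption.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = sequence.upper()
    # Slide a window of length k from the start to the end of the sequence.
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:  # assumption: ignore windows with symbols such as N
            counts[pos] += 1
    return counts


vector = kmer_frequency_vector("ACGTAC", k=2)
```

For k = 2 the vector has 4^2 = 16 entries, and the windows of "ACGTAC" are AC, CG, GT, TA, AC, so the entry for AC is 2.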

[0069] In one implementation, before step 102, this embodiment also includes training the neural network model. Optionally, the training method of the neural network model includes:

[0070] Build three identical neural networks with weight sharing;

[0071] Obtain a sample sequence; the sample sequence includes several sequences in each category;

[0072] ...
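
The weight sharing in [0070] can be illustrated with a small sketch in which one set of weights embeds all three inputs. The source text shown here does not give the architecture or the loss, so everything below is an assumption for illustration: the single tanh layer, the helper names `make_weights`, `embed` and `triplet_loss`, and the margin-based triplet loss, which is a common choice for triplet networks but is not confirmed by the excerpt.

```python
import math
import random


def make_weights(input_dim, embed_dim, seed=0):
    """Create one weight matrix; it is reused by all three branches."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
            for _ in range(embed_dim)]


def embed(weights, vector):
    """Hypothetical embedding: one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector)))
            for row in weights]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(weights, anchor, positive, negative, margin=1.0):
    """Assumed margin loss: pull same-class pairs together, push others apart."""
    # The same `weights` object embeds all three inputs, which is what
    # "three identical networks with weight sharing" amounts to in practice.
    ea, ep, en = (embed(weights, x) for x in (anchor, positive, negative))
    return max(0.0, euclidean(ea, ep) - euclidean(ea, en) + margin)
```

Here `anchor` and `positive` would be k-tuple frequency vectors from the same category and `negative` a vector from a different category; training would adjust `weights` to reduce this loss over many such triplets.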


Abstract

The invention discloses a method and system for biological classification of species based on a triplet neural network. The method includes: obtaining a sequence to be classified, where the sequence is a DNA sequence, an RNA sequence, an amino acid sequence, a genome data sequence, a transcriptome data sequence, a metagenomic data sequence, or a metatranscriptomic data sequence; determining the k-tuple frequency vector of the sequence to be classified; using a neural network model to reduce the dimensionality of that k-tuple frequency vector; calculating the distance between the sequence to be classified and the sample sequences of each category; and assigning the sequence to the category at the closest distance. The invention features simple data preprocessing and fast classification.
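
The final step of the abstract, assigning the category at the closest distance, can be sketched as below. This is an illustration under stated assumptions: the inputs are taken to be the reduced-dimension vectors produced by the neural network, the distance is assumed to be Euclidean, and the distance to a category is assumed to be the mean distance to its sample sequences. The excerpt does not specify either choice, and the name `classify` is hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(query_embedding, class_embeddings):
    """Return (label, distance) for the category closest to the query.

    class_embeddings maps a category label to a list of embedded
    sample sequences for that category.
    """
    best_label, best_dist = None, math.inf
    for label, samples in class_embeddings.items():
        # Assumption: category distance = mean distance to its samples.
        dist = sum(euclidean(query_embedding, s) for s in samples) / len(samples)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


classes = {"A": [[0.0, 0.0], [0.0, 1.0]], "B": [[5.0, 5.0]]}
label, distance = classify([0.0, 0.5], classes)
```

Because only distances in the low-dimensional space are compared, classifying a new sequence needs no retraining, which is consistent with the fast classification the abstract claims.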

Description

Technical field

[0001] The invention relates to the technical field of species classification, and in particular to a method and system for biological classification of species based on a triplet neural network.

Background

[0002] With the rapid development of sequencing technology, large amounts of unknown sequence data are being generated in the biological field. Classifying and positioning these sequences is a key step in sequence analysis. Traditional species classification is based on sequence comparison, which requires substantial computing power and time and also has low accuracy.

[0003] Species classification methods based on deep learning are more computationally efficient than traditional comparison-based methods and have been widely used in the classification of genomic and metagenomic data. Existing deep learning-based classification algorithms are able to model complex dependencies between input data (such as genome fragments) and target variables (such as origin of s...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B30/00, G16B40/20
CPC: G16B30/00, G16B40/20
Inventor: 王颖, 王怡雯
Owner: XIAMEN UNIV