Genome annotation method utilizing second-generation and third-generation transcriptome sequencing data

A genome annotation technology based on transcriptome sequencing, applied in fields such as genomics, proteomics, and instruments. It addresses the slow speed of existing approaches, their inability to use second-generation and third-generation data at the same time, and the insufficient accuracy and efficiency of gene prediction, with the aim of improving accuracy, operability, and efficiency.

Active Publication Date: 2020-06-19
三亚博瑞源科技有限公司

AI Technical Summary

Problems solved by technology

The advantage of these two strategies is that they make full use of the information provided by second-generation sequencing data; their drawback is that they cannot directly use third-generation sequencing data.
The disadvantage of PASA is that it cannot make full use of the large amount of intron position information that NGS data can p...



Examples


Embodiment Construction

[0019] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0020] In this example, the genome annotation method of the present invention is used to annotate the whole-genome gene structure of a plateau fish. As shown in Figure 1, the genome annotation method using second-generation and third-generation transcriptome sequencing data includes the following steps:

[0021] Step 1: Align the third-generation full-length transcriptome sequences to the target genome to obtain the initial structural information of each coding gene. This step specifically includes the following sub-steps:

[0022] Step 1.1: Predict the protein-coding sequences of the third-generation full-length transcriptome sequences: analyze the third-generation full-length transcriptome sequencing data to obtain full-length non-chimeric (FLNC) sequences, and predict the possible protein-coding...
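The visible text of Step 1.1 does not say which tool or rule is used to predict protein-coding sequences from FLNC reads. As a rough illustration only, the sketch below scans the three forward reading frames of one sequence for the longest ATG-to-stop open reading frame. The function name `longest_orf`, the `min_codons` threshold, and the assumption that the read is already oriented on the sense strand are all hypothetical and are not taken from the patent.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq, min_codons=30):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    min_codons is a hypothetical length cutoff that also counts the stop codon.
    Returns None when no ORF meets the cutoff.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                long_enough = (end - start) // 3 >= min_codons
                if long_enough and (best is None or end - start > best[1] - best[0]):
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    # Toy FLNC-like sequence: 2 nt of 5' UTR, ATG, four GCT codons, TAA, 2 nt of 3' UTR.
    toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
    print(longest_orf(toy, min_codons=3))
```

A dedicated coding-region predictor would normally also score codon usage and handle incomplete ORFs; this sketch only shows the basic idea of locating a candidate coding region on a full-length transcript.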


Abstract

The invention relates to the technical field of genome annotation and provides a genome annotation method that uses second-generation and third-generation transcriptome sequencing data. The method comprises the following steps: 1, aligning the third-generation full-length transcriptome sequences to a target genome to obtain initial structure information for each coding gene; 2, aligning the second-generation transcriptome short reads to the target genome and extracting intron splice-site information from the alignment file; and 3, obtaining the final structure information of each coding gene by combining the initial structure information of each coding gene with the intron splice-site information. The method exploits the fact that third-generation transcriptome data yield full-length transcript sequences without assembly, while also taking full account of the large amount of intron splice-site evidence that second-generation transcriptome sequencing data can provide, so that the accuracy and efficiency of genome annotation are greatly improved.
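Steps 2 and 3 of the abstract can be illustrated with a small sketch. It assumes the short-read alignments are available as SAM-style records, where a CIGAR `N` operation marks a skipped reference region (an intron), and it assumes one simple way of combining evidence: flagging introns of a long-read gene model that have no short-read support. The abstract does not specify the combination rule, so the helper names `introns_from_cigar`, `junction_support`, and `unsupported_introns`, and the `min_reads` threshold, are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_cigar(pos, cigar):
    """Return 1-based inclusive (start, end) introns for one spliced alignment.

    pos is the SAM POS field (1-based leftmost reference position).
    """
    ref = pos
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":  # skipped region on the reference, i.e. an intron
            introns.append((ref, ref + length - 1))
        if op in "MDN=X":  # operations that consume reference bases
            ref += length
    return introns


def junction_support(alignments):
    """Count short reads supporting each intron.

    alignments is an iterable of (chrom, pos, cigar) tuples.
    """
    counts = Counter()
    for chrom, pos, cigar in alignments:
        for start, end in introns_from_cigar(pos, cigar):
            counts[(chrom, start, end)] += 1
    return counts


def unsupported_introns(chrom, model_introns, counts, min_reads=1):
    """Return introns of a gene model that fewer than min_reads short reads support."""
    return [iv for iv in model_introns if counts[(chrom, iv[0], iv[1])] < min_reads]


if __name__ == "__main__":
    # Two toy short reads spanning the same intron on chr1.
    reads = [("chr1", 100, "50M200N50M"), ("chr1", 120, "30M200N70M")]
    counts = junction_support(reads)
    # Hypothetical long-read gene model with one supported and one unsupported intron.
    print(unsupported_introns("chr1", [(150, 349), (500, 599)], counts))
```

In practice the alignment file would be read with a SAM/BAM parser rather than hand-built tuples, and an unsupported intron would be a candidate for correction or review rather than automatic removal.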

Description

Technical field

[0001] The invention relates to the technical field of genome annotation, specifically to a method for annotating the coding gene structure of the whole genome of eukaryotic organisms, and in particular to a method for genome annotation using second-generation and third-generation transcriptome sequencing data.

Background art

[0002] Generally, after the whole genome sequence has been assembled, the structure of the protein-coding genes needs to be predicted. Prediction usually combines three strategies: de novo prediction, prediction based on sequence homology with closely related species, and prediction based on transcriptome data. Since the transcriptome is the most direct evidence of the genes expressed by a species, the prediction results based on transcriptome data are considered the most credible, and when the prediction results of all strategies are integrated to obtain the final gene set, the weight given to this strategy is generally the hi...


Application Information

IPC(8): G16B20/00, G16B20/30
CPC: G16B20/00, G16B20/30
Inventor: 袁晓辉, 刘海平, 肖世俊
Owner: 三亚博瑞源科技有限公司