A Genome Annotation Method Using Second- and Third-Generation Transcriptome Sequencing Data

A genome annotation technology based on transcriptome sequencing, applied in genomics, proteomics, instruments, etc. It addresses problems such as slow speed, the inability to use second-generation and third-generation sequencing data at the same time, and insufficient accuracy and efficiency of genome prediction, and it achieves strong operability, improved accuracy, and improved efficiency.

Active Publication Date: 2022-05-24
三亚博瑞源科技有限公司

AI Technical Summary

Problems solved by technology

The advantage of these two strategies is that they make full use of the information in second-generation sequencing data, but they cannot use third-generation sequencing data directly.
The disadvantages of PASA are that it cannot make full use of the large amount of intron position information that NGS data can provide, and that it is slow.
[0004] Existing genome annotation methods thus each have their own advantages and disadvantages, but none can exploit the strengths of second-generation and third-generation sequencing at the same time, and the accuracy and efficiency of genome prediction are not high enough.


Examples


Embodiment Construction

[0019] The present invention will be further described below with reference to the accompanying drawings and specific embodiments.

[0020] In this example, the genome annotation method of the present invention is used to annotate the genome structure of a plateau fish. As shown in Figure 1, the genome annotation method of the present invention, which uses second-generation and third-generation transcriptome sequencing data, includes the following steps:

[0021] Step 1: Align the third-generation full-length transcriptome sequencing reads to the target genome to obtain the initial structural information of each coding gene. This specifically includes the following steps:

[0022] Step 1.1: Predict the protein-coding sequences of the third-generation full-length transcriptome sequencing reads: analyze the third-generation full-length transcriptome sequencing reads to obtain full-length non-chimeric (FLNC) sequences, and predict the possible protein-coding sequence o...
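As a rough illustration of what predicting a protein-coding sequence from an FLNC sequence can involve, the sketch below scans the three forward reading frames for the longest open reading frame (ATG to the first in-frame stop codon). This is an assumption made for illustration only: the text available here does not say which tool or criteria the method uses, and the function name `longest_orf` is hypothetical.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest complete ORF on the forward strand.

    Coordinates are 0-based and end-exclusive; the span includes the stop
    codon. Returns None when no ATG...stop ORF exists in any frame.
    Illustrative only: real predictors also score coding potential.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # close this ORF and look for the next ATG
    return best


example = "CCATGAAATTTGGGTAACC"
span = longest_orf(example)
if span is not None:
    print(span, example[span[0]:span[1]])
```

Only the forward strand is scanned here, on the assumption that the sequences have already been oriented in the transcribed direction; unoriented input would also need the reverse complement checked.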


Abstract

The invention relates to the technical field of genome annotation and provides a genome annotation method using second-generation and third-generation transcriptome sequencing data. The method includes the following steps. Step 1: align the third-generation full-length transcriptome sequencing reads to the target genome to obtain the initial structural information of each coding gene. Step 2: align the second-generation transcriptome sequencing short reads to the target genome and extract intron splice-site information from the alignment file. Step 3: combine the initial structural information of each coding gene with the intron splice-site information to obtain the final structural information of each coding gene. The invention exploits the high accuracy of third-generation transcriptome data, which yields full-length transcript sequences without assembly, and also takes into account that second-generation transcriptome sequencing data provides a large amount of intron splice-site evidence, greatly improving the accuracy and efficiency of genome annotation.
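Steps 2 and 3 can be sketched as follows, under stated assumptions. In SAM-format alignments, a spliced read records each intron as an `N` operation in its CIGAR string, so intron coordinates can be recovered from the POS and CIGAR fields. How the method actually combines the two kinds of evidence is not described in the text available here; the sketch simply flags each intron of a long-read gene model as supported or unsupported by short reads. The function names and the `min_reads` threshold are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (chromosome, 1-based intron start, 1-based inclusive intron end)
Intron = Tuple[str, int, int]

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_alignment(chrom: str, pos: int, cigar: str) -> List[Intron]:
    """Return the introns implied by 'N' operations in one SAM alignment.

    pos is the 1-based leftmost reference position (the SAM POS field).
    """
    introns: List[Intron] = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            introns.append((chrom, ref, ref + n - 1))
        if op in "MDN=X":  # only these operations consume reference bases
            ref += n
    return introns


def count_junctions(alignments: Iterable[Tuple[str, int, str]]) -> Counter:
    """Count how many short-read alignments support each intron."""
    counts: Counter = Counter()
    for chrom, pos, cigar in alignments:
        counts.update(introns_from_alignment(chrom, pos, cigar))
    return counts


def annotate_support(model_introns: Iterable[Intron],
                     junction_counts: Counter,
                     min_reads: int = 2) -> Dict[Intron, bool]:
    """Flag each gene-model intron as supported by >= min_reads short reads.

    min_reads is an illustrative threshold, not a value from the source.
    """
    return {intron: junction_counts.get(intron, 0) >= min_reads
            for intron in model_introns}


short_reads = [
    ("chr1", 100, "50M200N50M"),
    ("chr1", 120, "30M200N70M"),
    ("chr1", 100, "5S45M100N50M"),
]
counts = count_junctions(short_reads)
model = [("chr1", 150, 349), ("chr1", 145, 244), ("chr1", 500, 600)]
print(annotate_support(model, counts))
```

Flagging rather than rewriting the gene model keeps the sketch neutral about how conflicts between long-read and short-read evidence should be resolved.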

Description

Technical Field

[0001] The invention relates to the technical field of genome annotation, specifically to the annotation of protein-coding gene structures across eukaryotic whole genomes, and in particular to a genome annotation method using second-generation and third-generation transcriptome sequencing data.

Background

[0002] After a whole-genome sequence has been assembled, the structures of its protein-coding genes generally need to be predicted. Prediction typically combines three strategies: de novo prediction, prediction based on sequence homology with closely related species, and prediction based on transcriptome data. Because the transcriptome is the most direct evidence of the genes a species expresses, predictions based on transcriptome data are considered the most reliable, and this strategy is generally given the highest weight when the results of all strategies are integrated into the final gene set.

[0003] Among t...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B20/00, G16B20/30
CPC: G16B20/00, G16B20/30
Inventors: 袁晓辉, 刘海平, 肖世俊
Owner: 三亚博瑞源科技有限公司