Method and device for assembling genomes by utilizing long transcriptome sequencing result

A transcriptome sequencing and genome assembly technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses problems such as limited transcriptome data

Inactive Publication Date: 2012-11-21
CHINESE ACAD OF FISHERY SCI
Cites: 6; Cited by: 14

AI Technical Summary

Problems solved by technology

However, the transcriptome data that can be compared by the above three comparison programs



Examples


Example 1

[0083] Assembly of the zebrafish genome sequence using Sanger sequencing reads from the zebrafish transcriptome

[0084] Materials: 1,546,467 zebrafish transcriptome Sanger sequencing reads in FASTA format were downloaded from the website of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/dbEST/index.html). 37,298 pre-assembled zebrafish genome fragments, with an average length of 143,274 bp, were downloaded from the Sanger Institute, UK (http://www.sanger.ac.uk/Projects/D_rerio/wgs.shtml).

[0085] Step 01: Download the BLAT (BLAST-like alignment tool) program from the University of California, Santa Cruz (http://hgdownload.cse.ucsc.edu/admin/exe/) and select the stand-alone mode. Each transcriptome sequencing read was used as a query sequence, and the genome fragments were used as the matching (target) sequences. With the default parameters of the alignment program, 1,546,467 Sanger sequencin...
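As an illustration of how the alignment output of Step 01 might be read for the later steps, the following is a minimal sketch that parses BLAT's tab-separated PSL output into records. The file names in the comment and the `PslHit` and `parse_psl` names are hypothetical and are not part of the patent text; only the standard PSL column order is relied on.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical invocation (file names are placeholders, not from the patent):
#   blat genome_fragments.fa transcriptome_reads.fa alignments.psl -noHead


@dataclass
class PslHit:
    """Subset of the 21 PSL columns needed to relate a read to a fragment."""
    matches: int
    strand: str
    q_name: str   # transcriptome read (query)
    q_size: int
    q_start: int
    q_end: int
    t_name: str   # genome fragment (target)
    t_size: int
    t_start: int
    t_end: int


def parse_psl(lines: Iterable[str]) -> Iterator[PslHit]:
    """Yield one PslHit per alignment row, skipping header and blank lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows have 21 tab-separated columns and start with a number.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        yield PslHit(
            matches=int(fields[0]),
            strand=fields[8],
            q_name=fields[9],
            q_size=int(fields[10]),
            q_start=int(fields[11]),
            q_end=int(fields[12]),
            t_name=fields[13],
            t_size=int(fields[14]),
            t_start=int(fields[15]),
            t_end=int(fields[16]),
        )
```

In practice the function would be fed an open file handle, for example `parse_psl(open("alignments.psl"))`, and the resulting records grouped by `q_name` to see which fragments each read touches.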

Example 2

[0112] Assembly of tilapia genome fragments using 454-sequenced reads from the tilapia transcriptome

[0113] Materials: 5,900 tilapia whole-genome sequence fragments, with an average length of 2.8M, were downloaded from the Broad Institute, USA (http://bouillabase.org/). Tilapia 454 sequencing reads in FASTQ format were downloaded from the website of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/sra/SRX078333 and http://www.ncbi.nlm.nih.gov/sra/SRX078329).

[0114] Methods: First, the SolexaQA software package (solexaqa.sourceforge.net) was used with default parameters to filter out low-quality and short transcriptome sequencing reads. The FASTQ files were then converted to FASTA format using fastq2fasta.pl from the srtoolbox package (http://brianknaus.com/software/srtoolbox/).
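The preprocessing in this paragraph can be approximated in a few lines. The sketch below is not SolexaQA or fastq2fasta.pl: it is a simplified stand-in that drops reads by length and mean Phred score and then writes FASTA. The thresholds `min_len` and `min_mean_q` and the Phred+33 offset are assumptions chosen for illustration, not the defaults of either tool.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from simple 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue
        seq = next(it, "").strip()
        next(it, "")  # '+' separator line, ignored
        qual = next(it, "").strip()
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed by default)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def fastq_to_fasta(lines: Iterable[str], min_len: int = 50,
                   min_mean_q: float = 20.0, offset: int = 33) -> str:
    """Return FASTA text for reads passing the length and quality filters."""
    kept = []
    for name, seq, qual in read_fastq(lines):
        if len(seq) < min_len or not qual:
            continue  # too short, or no quality string to score
        if mean_phred(qual, offset) < min_mean_q:
            continue  # low average quality
        kept.append(f">{name}\n{seq}")
    return "\n".join(kept) + ("\n" if kept else "")
```

For real data, a maintained parser and the original tools are the safer choice; this sketch only shows the shape of the filter-then-convert step.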

[0115] Next, the tilapia genome fragments were assembled according to steps 01 to 07 in Example 1.

[0116] Results: The average length of tila...


Abstract

The invention relates to a method and a device for assembling genomes by using long transcriptome sequencing results. In the method, transcriptome sequencing reads are aligned against genome fragments of the same species; reads that align to only one genome fragment are removed; the query blocks of the retained reads are filtered under specified conditions; block connections among the retained query blocks are derived according to the specified conditions; and the genome fragments are joined on the basis of these block connections, completing the assembly of the genome sequence. Because the method can use long-read sequencing data, including the large amount of existing, published Sanger data, genome sequences can be assembled from long transcriptome sequencing reads.
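The core idea of the abstract can be sketched as follows, under loud simplifications: each alignment is reduced to a (read, fragment) pair, reads touching a single fragment are discarded, and the remaining reads vote for links between fragments. The "specified conditions" for block filtering, as well as fragment order and orientation, are not given in this text and are not modeled; `fragment_links`, `connected_groups`, and `min_support` are hypothetical names introduced for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Hit = Tuple[str, str]  # (read_name, fragment_name)


def fragment_links(hits: Iterable[Hit],
                   min_support: int = 1) -> Dict[Tuple[str, str], int]:
    """Count, for each fragment pair, how many reads align to both."""
    frags_by_read: Dict[str, Set[str]] = defaultdict(set)
    for read, frag in hits:
        frags_by_read[read].add(frag)

    support: Dict[Tuple[str, str], int] = defaultdict(int)
    for frags in frags_by_read.values():
        if len(frags) < 2:
            continue  # read aligns to one fragment only: it cannot link anything
        ordered = sorted(frags)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                support[(ordered[i], ordered[j])] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


def connected_groups(links: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group fragments joined by at least one link (simple union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[str, List[str]] = defaultdict(list)
    for frag in parent:
        groups[find(frag)].append(frag)
    return sorted(sorted(g) for g in groups.values())
```

Each resulting group is a set of fragments that some read spans; turning a group into an ordered, oriented scaffold would additionally need the alignment coordinates and strands, which this sketch deliberately leaves out.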

Description

Technical field

[0001] The invention relates to a method and a device for assembling a genome by using long transcriptome sequencing results.

Background

[0002] At present, at least four sequencing technologies are used in genome and transcriptome research: the traditional first-generation Sanger sequencing technology, and the second-generation high-throughput technologies Roche 454, Illumina, and Applied Biosystems (AB) SOLiD. The average read length produced by Sanger sequencing and Roche 454 sequencing is more than 300 bp, while the read length produced by Illumina sequencing and SOLiD sequencing is less than 150 bp.

[0003] During genome sequence assembly, genome sequencing libraries of different lengths need to be constructed to connect two genome fragments. Using a sequencing library with relatively short genome fr...


Application Information

IPC(8): G06F19/20
Inventor: 李炯棠, 薛尉, 汪金兔, 祝雅萍, 孙效文
Owner: CHINESE ACAD OF FISHERY SCI