Transcriptome analysis method and system without reference genome sequence

A transcriptome analysis technique for samples that lack a reference genome, applied in the field of transcriptome sequencing data analysis. It addresses the problem of analyzing a transcriptome without a reference genome sequence and achieves high accuracy.

Active Publication Date: 2021-02-23
TIANJIN MODERN INNOVATIVE TCM TECH CO LTD

AI Technical Summary

Problems solved by technology

[0006] The main purpose of the present invention is to provide a transcriptome analysis method and system without a reference genome sequence to solv

Examples

Embodiment 1

[0047] In this embodiment, a transcriptome analysis method without a reference genome sequence is provided. The flow chart of the analysis method is shown in Figure 1 and includes:

[0048] Step S101, obtaining effective second-generation RNA sequencing data and effective third-generation RNA sequencing data of the sample to be tested;

[0049] Step S102, using the effective second-generation RNA sequencing data to correct the effective third-generation RNA sequencing data, to obtain third-generation corrected effective data;

[0050] Step S103, performing de-redundancy on the third-generation corrected effective data to obtain a unigene sequence;

[0051] Step S104, performing sequence alignment of the effective second-generation RNA sequencing data against the unigene sequence to obtain an alignment file;

[0052] Step S105, using the alignment file, counting the number of reads on each unigene sequence to obtain the expression level (FPKM value) of each gene.
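
As an illustration of step S105, the following is a minimal sketch of how read counts per unigene can be turned into FPKM values (fragments per kilobase of transcript per million mapped fragments). It is not taken from the patent: the function names, the input layout (pairs of fragment ID and unigene ID already extracted from an alignment file), and the rule of counting both mates of a read pair as one fragment are assumptions made for the example.

```python
from collections import Counter


def count_fragments(alignments):
    """Count fragments per unigene.

    `alignments` is an iterable of (fragment_id, unigene_id) pairs.
    Assumption: both mates of a paired-end read share one fragment_id,
    so a pair aligned to the same unigene is counted once.
    """
    seen = set()
    counts = Counter()
    for fragment_id, unigene_id in alignments:
        key = (fragment_id, unigene_id)
        if key not in seen:
            seen.add(key)
            counts[unigene_id] += 1
    return counts


def fpkm(counts, lengths):
    """Compute FPKM for every unigene listed in `lengths` (lengths in bp).

    FPKM = fragments * 1e9 / (unigene length in bp * total mapped fragments)
    """
    total = sum(counts.values())
    if total == 0:
        return {unigene: 0.0 for unigene in lengths}
    return {
        unigene: counts.get(unigene, 0) * 1e9 / (length * total)
        for unigene, length in lengths.items()
    }


# Hypothetical toy data: four fragments, three unigenes.
alignments = [
    ("frag1", "U1"), ("frag1", "U1"),  # two mates of the same fragment
    ("frag2", "U1"),
    ("frag3", "U2"),
    ("frag4", "U2"),
]
lengths = {"U1": 2000, "U2": 1000, "U3": 500}

counts = count_fragments(alignments)
values = fpkm(counts, lengths)
for unigene in sorted(values):
    print(unigene, values[unigene])
```

With equal fragment counts, the shorter unigene U2 receives the higher FPKM, which is the length normalization the measure is designed to provide. A production pipeline would read the counts from the actual alignment file rather than from an in-memory list.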

[0053] The...

Embodiment 2

[0067] In this embodiment, a detailed transcriptome analysis method without a reference genome sequence is provided. Figure 2 shows a detailed schematic flow diagram of the analysis method, which specifically includes the following steps:

[0068] (1) Extract total RNA from biological samples without a reference genome, construct libraries for the Illumina platform and the ONT sequencing platform, sequence them, and obtain the raw sequencing data.

[0069] (2) The raw output of the Illumina platform is in fastq format and can be analyzed directly. The raw output of the ONT platform is in fast5 format. It must be basecalled with guppy v3.6.0, the ONT platform's own software, using a high-accuracy basecalling configuration file, to obtain the fastq file.

[0070] (3) For the Illumina data, filter and clean with the software fastp, and filter the data with reads leng...

Embodiment 3

[0086] In this embodiment, a transcriptome analysis system without a reference genome sequence is provided. As shown in Figure 3, the analysis system includes an acquisition module 10, a correction module 20, a de-redundancy module 30, a first alignment module 40 and an expression statistics module 50, wherein:

[0087] The acquisition module 10 is configured to acquire effective second-generation RNA sequencing data and effective third-generation RNA sequencing data of the sample to be tested;

[0088] The correction module 20 is configured to correct the effective third-generation RNA sequencing data using the effective second-generation RNA sequencing data, to obtain third-generation corrected effective data;

[0089] The de-redundancy module 30 is configured to perform de-redundancy on the third-generation corrected effective data to obtain a unigene sequence;

[0090] The first alignment module 40 is configured to perform sequence alignment on th...

Abstract

The invention provides a transcriptome analysis method and system without a reference genome sequence. The analysis method comprises the following steps: acquiring effective second-generation RNA sequencing data and effective third-generation RNA sequencing data of a sample to be tested; correcting the effective third-generation RNA sequencing data using the effective second-generation RNA sequencing data; performing de-redundancy on the third-generation corrected effective data to obtain a unigene sequence; aligning the effective second-generation RNA sequencing data against the unigene sequence; and counting the number of reads on each unigene sequence using the alignment file to obtain the expression level (FPKM value) of each gene. The analysis method combines the advantages of second-generation and third-generation sequencing technology, and solves the problem that there is currently no method for analyzing a transcriptome without a reference genome sequence using third-generation sequencing data.

Description

Technical field

[0001] The present invention relates to the field of transcriptome sequencing data analysis, in particular to a transcriptome analysis method and system without a reference genome sequence.

Background

[0002] Next-generation sequencing technology refers to the second-generation sequencing technology developed based on the principle of sequencing-by-synthesis. The Illumina sequencing platform is a widely used next-generation sequencing technology, with the advantages of high sequencing throughput, low cost, and accurate sequencing results. Its disadvantages are a short sequencing read length (below 500 bp) and a cumbersome sample preparation process.

[0003] Third-generation sequencing technology refers to sequencing technology developed based on the principle of single-molecule sequencing. The nanopore sequencing platform (Oxford Nanopore Technologies, ONT) is the third-generation gene sequencing tec...

Application Information

IPC(8): G16B30/10, G16B20/30, G16B25/10, G16B5/00
CPC: G16B30/10, G16B20/30, G16B25/10, G16B5/00, Y02A90/10
Inventors: 田振阳, 王苹
Owner: TIANJIN MODERN INNOVATIVE TCM TECH CO LTD