Third-generation sequencing RNA-seq comparison method based on WFA algorithm

Algorithm and sequencing technology, applied in the field of sequence alignment in bioinformatics to improve alignment accuracy

Pending Publication Date: 2022-05-27
GUILIN UNIV OF ELECTRONIC TECH

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to address the limited accuracy of third-generation sequencing RNA-seq alignment algorithms by providing a third-generation sequencing RNA-seq alignment method based on the WFA algorithm...

Examples

Embodiment

[0037] A third-generation sequencing RNA-seq alignment method based on the WFA algorithm, comprising the following steps:

[0038] (1) Obtain the data sets required for the evaluation: 4 simulated data sets and 3 real data sets. The 4 simulated data sets were generated with the PBSIM tool and simulate PacBio ROI reads from Saccharomyces cerevisiae, Drosophila melanogaster, and human chromosome 19, plus ONT R2 2D Drosophila melanogaster data. The 3 real data sets are PacBio ROI Drosophila melanogaster, error-corrected PacBio ROI Drosophila melanogaster, and PacBio subreads of Drosophila melanogaster. Using data from different species and different sequencing technologies allows the performance of the algorithm to be evaluated more comprehensively. Table 1 shows information about the third-generation sequencing RNA-seq reads. Each species has its own reference genome ref, and the reads are aligned against ref to find their positions in ref;...
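Step (1) ends with locating reads in ref, and the abstract lists indexing the reference genome as the next step. The patent text available here does not describe the index structure, so the following is only a minimal sketch assuming a plain hash table of exact k-mers. `build_kmer_index` and `find_seed_hits` are hypothetical names, not the authors' code; production aligners typically use more compact indexes (minimap2, for instance, indexes minimizers).

```python
from collections import defaultdict


def build_kmer_index(ref: str, k: int) -> dict[str, list[int]]:
    """Hypothetical helper: map every k-mer of the reference to the
    0-based positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(ref) - k + 1):
        index[ref[pos:pos + k]].append(pos)
    return dict(index)


def find_seed_hits(read: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Hypothetical helper: return (read_pos, ref_pos) pairs for every
    exact k-mer match between the read and the indexed reference."""
    hits = []
    for qpos in range(len(read) - k + 1):
        for rpos in index.get(read[qpos:qpos + k], []):
            hits.append((qpos, rpos))
    return hits


if __name__ == "__main__":
    ref = "ACGTACGTTGCA"
    idx = build_kmer_index(ref, 4)
    print(find_seed_hits("CGTTG", idx, 4))  # → [(0, 5), (1, 6)]
```

Each hit pairs a read position with a reference position; hits whose reference positions advance together with the read positions (as in the example) suggest the region a read comes from.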

Abstract

The invention discloses a third-generation sequencing RNA-seq alignment method based on the WFA algorithm. The method comprises the following steps: acquiring a data set containing target sequences and query sequences; indexing the reference genome; performing region selection and graph mapping; searching for the longest common subsequence in k-length substrings (LCSk), followed by anchor filtering and anchor alignment; introducing an annotation file to obtain the alignment result; and finally evaluating the alignment result. Experiments show that the method effectively improves the accuracy of sequence alignment, especially at splice sites, and somewhat reduces alignment time.
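The abstract names LCSk, the longest common subsequence in k-length substrings: the largest number of non-overlapping pairs of identical k-length substrings that occur in the same order in both sequences. The patent does not give its LCSk procedure, so what follows is a textbook dynamic-programming sketch for illustration only; `lcsk` is a hypothetical name. Practical aligners such as GraphMap compute a sparse variant (LCSk++) over k-mer hits rather than filling a full table.

```python
def lcsk(a: str, b: str, k: int) -> int:
    """Hypothetical helper: LCSk score of a and b, i.e. the maximum number of
    non-overlapping, in-order pairs of identical k-length substrings."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n, m = len(a), len(b)
    # dp[i][j] holds the LCSk score of the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Skip the last character of a or of b.
            best = max(dp[i - 1][j], dp[i][j - 1])
            # Or match the k-length substrings ending at i and j.
            if i >= k and j >= k and a[i - k:i] == b[j - k:j]:
                best = max(best, dp[i - k][j - k] + 1)
            dp[i][j] = best
    return dp[n][m]


if __name__ == "__main__":
    print(lcsk("ABCDEF", "ABXCDEF", 2))  # → 3 (AB, CD, EF)
```

This full table costs O(nmk) time, which is why the sparse formulation matters for long reads: only positions that already share a k-mer can contribute a match.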

Description

Technical field

[0001] The invention relates to the technical field of sequence alignment in bioinformatics, and in particular to a third-generation sequencing RNA-seq alignment method based on the WFA algorithm.

Background

[0002] Third-generation RNA sequencing technology can determine RNA sequences tens of thousands to hundreds of thousands of bases in length, so it is widely used in transcriptome detection, gene expression estimation, and identification of splicing isoforms. However, third-generation reads are very long, and RNA-seq reads span splice sites; both properties make aligner design challenging.

[0003] The existing third-generation RNA-seq alignment algorithms mainly include STAR, BBMap, GMAP, Minimap2, and GraphMap2. All of these methods use a seed-and-extend strategy and can handle splice junctions. However, most algorithms have low accuracy, do not handle short exons well, and have l...
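In a seed-and-extend aligner, the bases between and around seeds still have to be aligned one by one, and the title of this invention names the WFA (wavefront alignment) algorithm as its alignment engine. The patent text shown here does not give its scoring parameters, so the following is a simplified sketch, not the authors' implementation: it uses unit-cost edit distance (mismatch and gap both cost 1), whereas the published WFA uses gap-affine penalties with separate mismatch, insertion, and deletion wavefronts. `wfa_edit_distance` is a hypothetical name.

```python
def wfa_edit_distance(a: str, b: str) -> int:
    """Hypothetical helper: unit-cost edit distance computed with wavefronts
    of furthest-reaching points, a simplified form of WFA."""
    n, m = len(a), len(b)
    final_diag = m - n          # diagonal d = j - i containing the end cell (n, m)
    wavefront = {0: 0}          # diagonal d -> furthest offset j reached in b
    score = 0
    while True:
        # Extend: matches are free, so slide along each diagonal while equal.
        for d, j in list(wavefront.items()):
            i = j - d
            while i < n and j < m and a[i] == b[j]:
                i += 1
                j += 1
            wavefront[d] = j
        if wavefront.get(final_diag) == m:
            return score
        # Compute the wavefront for score + 1 from the current one.
        score += 1
        new_wavefront = {}
        for d in range(min(wavefront) - 1, max(wavefront) + 2):
            candidates = []
            if d in wavefront:        # mismatch: advance i and j
                candidates.append(wavefront[d] + 1)
            if d - 1 in wavefront:    # insertion: advance j only
                candidates.append(wavefront[d - 1] + 1)
            if d + 1 in wavefront:    # deletion: advance i only
                candidates.append(wavefront[d + 1])
            # Keep only offsets that stay inside both sequences.
            valid = [j for j in candidates if j <= m and j - d <= n]
            if valid:
                new_wavefront[d] = max(valid)
        wavefront = new_wavefront


if __name__ == "__main__":
    print(wfa_edit_distance("kitten", "sitting"))
```

The attraction of the wavefront approach is that its work grows with the alignment score rather than with the product of the sequence lengths, which suits a read aligned to its true locus, where the score is small relative to the length.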

Application Information

IPC(8): G16B20/30; G16B30/10
CPC: G16B20/30; G16B30/10
Inventors: 张艳菊李琪王荣兴齐王璟
Owner: GUILIN UNIV OF ELECTRONIC TECH