
Genome sequence splicing method based on protein information

A technology involving protein sequences and genome sequences, applied in the field of bioinformatics. It addresses problems of existing methods such as failing to meet sensitivity and accuracy requirements, consuming a lot of time, and reducing the sensitivity of scaffolding results.

Active Publication Date: 2018-11-27
CENT SOUTH UNIV
9 cited references; cited by 2 documents.

AI Technical Summary

Problems solved by technology

[0008] The accuracy of the tool ESPRIT depends on the predicted gene set, and because gene prediction is extremely time-consuming, the practical application of ESPRIT is limited.
Similarly, to improve the matching between sequences, SWiPS must align the protein sequences against the genome sequences many times, which also takes a lot of time.
The tool PEP_scaffolder directly deletes protein sequences whose matching coverage is too large, which reduces the sensitivity of the scaffolding results to a certain extent.
Therefore, existing scaffolding methods cannot meet the requirements for sensitivity and accuracy.




Embodiment Construction

[0097] The present invention is further described below with reference to an example. In this example, contigs are referred to as DNA sequences.

[0098] As shown in Figure 2, the method provided by the present invention for generating genome scaffolds based on protein information comprises the following steps:

[0099] Step 1: Preprocessing.

[0100] The purpose of this step (S1) is to obtain the alignment information between the DNA sequences to be spliced and the protein sequences. The specific procedure is as follows:

[0101] S11: Align the DNA sequences to be spliced against the protein sequences one by one to obtain an alignment file (out.psl).

[0102] The alignment file contains the alignment information for all matched pairs of DNA sequences and protein sequences, and each row in the file represents one alignment record. If there are n matching positions between a DNA sequence and a protein sequence, then n pieces of alignment ...
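The alignment file produced in S11 carries the PSL extension. As a minimal sketch, assuming out.psl follows the standard 21-column PSL layout written by BLAT, with the protein as the query and the DNA sequence (contig) as the target (this excerpt does not say which side is which), its rows can be read and grouped by protein as below. `PslHit`, `parse_psl_line` and `load_psl` are hypothetical names, and only the columns needed for later steps are kept.

```python
from collections import defaultdict
from typing import NamedTuple

class PslHit(NamedTuple):
    """The PSL columns this sketch keeps from one alignment record (one row)."""
    matches: int
    strand: str
    q_name: str   # assumed: protein sequence name (query)
    q_size: int
    q_start: int
    q_end: int
    t_name: str   # assumed: DNA sequence / contig name (target)
    t_size: int
    t_start: int
    t_end: int

def parse_psl_line(line: str) -> PslHit:
    """Parse one tab-separated, 21-column PSL data row."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 21:
        raise ValueError(f"expected 21 PSL columns, got {len(f)}")
    return PslHit(
        matches=int(f[0]), strand=f[8],
        q_name=f[9], q_size=int(f[10]), q_start=int(f[11]), q_end=int(f[12]),
        t_name=f[13], t_size=int(f[14]), t_start=int(f[15]), t_end=int(f[16]),
    )

def load_psl(lines):
    """Group alignment records by protein; n matching positions give n records."""
    hits_by_protein = defaultdict(list)
    for line in lines:
        # data rows start with the 'matches' count; skip blanks and psLayout header lines
        if not line.strip() or not line[0].isdigit():
            continue
        hit = parse_psl_line(line)
        hits_by_protein[hit.q_name].append(hit)
    return dict(hits_by_protein)

# Usage (hypothetical file path):
#     with open("out.psl") as fh:
#         hits = load_psl(fh)
```

Grouping by protein fits the next stage of the method, which examines the DNA sequences aligned to each protein; coordinates are left exactly as PSL stores them.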



Abstract

The invention discloses a genome sequence splicing method based on protein information. The method comprises the following steps: obtaining the alignment information between the DNA sequences to be spliced and the protein sequences; determining the adjacency relationships between the DNA sequences aligned to each protein sequence; constructing connecting edges between adjacent DNA sequences and obtaining, for each protein sequence, the support information for the connecting edges of each DNA sequence; denoising the support information of the connecting edges of each DNA sequence in turn; denoising the preceding and following nodes of each DNA sequence in turn based on a weight scoring function; calculating the connection spacing for all DNA sequence connecting edges that have support information; and obtaining the genome sequence splicing paths by linking the preceding and following nodes of each DNA sequence in series according to the connection spacing of the connecting edges. The method can improve the sensitivity and accuracy of genome sequence splicing results.
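The core idea of the abstract (order the DNA sequences along each protein, turn consecutive pairs into supported edges, then chain each DNA sequence's front and back nodes into paths) can be illustrated with a toy sketch. It is not the patented algorithm: the weight scoring function, the two denoising stages and the spacing calculation are not specified in this excerpt, so here an edge's weight is simply the number of proteins supporting it, each contig keeps only its heaviest successor and predecessor that do not close a cycle, and strand/orientation is ignored. The input is a hypothetical mapping from protein name to `(contig, protein_start, protein_end)` tuples, for example derived from parsed alignment records; all function names are hypothetical.

```python
from collections import Counter

def adjacency_edges(hits_by_protein):
    """Count how many proteins place contig a immediately before contig b.

    hits_by_protein: protein name -> list of (contig, protein_start, protein_end).
    Assumption: orientation/strand is ignored in this sketch.
    """
    support = Counter()
    for hits in hits_by_protein.values():
        ordered = sorted(hits, key=lambda h: h[1])  # order contigs along the protein
        for (a, _, _), (b, _, _) in zip(ordered, ordered[1:]):
            if a != b:  # consecutive hits of the same contig give no join
                support[(a, b)] += 1
    return support

def _reaches(succ, start, target):
    """True if following successor links from start arrives at target."""
    node = start
    while node in succ:
        node = succ[node]
        if node == target:
            return True
    return False

def select_neighbours(support, min_support=1):
    """Greedily keep at most one successor and one predecessor per contig.

    Hypothetical weight: the support count (the patent's weight scoring
    function is not given in this excerpt). Heavier edges are taken first;
    ties are broken by contig names so the result is deterministic.
    """
    succ, pred = {}, {}
    for (a, b), weight in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
        if weight < min_support or a in succ or b in pred:
            continue
        if _reaches(succ, b, a):  # a -> b would close a cycle
            continue
        succ[a] = b
        pred[b] = a
    return succ, pred

def build_paths(contigs, succ, pred):
    """Chain front/back links into scaffold paths; unlinked contigs stay single."""
    paths = []
    for c in sorted(contigs):
        if c in pred:
            continue  # not the first contig of a path
        path = [c]
        while path[-1] in succ:
            path.append(succ[path[-1]])
        paths.append(path)
    return paths

hits = {
    "p1": [("c2", 50, 90), ("c1", 0, 40)],
    "p2": [("c1", 0, 30), ("c2", 35, 60), ("c3", 70, 100)],
    "p3": [("c3", 0, 20), ("c1", 30, 60)],
}
support = adjacency_edges(hits)
succ, pred = select_neighbours(support)
print(build_paths({"c1", "c2", "c3", "c4"}, succ, pred))  # [['c1', 'c2', 'c3'], ['c4']]
```

In the toy input, two proteins support c1 → c2 and one supports c2 → c3, while a single protein suggests c3 → c1; the greedy selection rejects that last edge because it would close a cycle, leaving the path c1 → c2 → c3 plus the unplaced c4.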

Description

Technical field

[0001] The invention belongs to the field of bioinformatics, and in particular relates to a genome sequence splicing method based on protein information.

Background

[0002] The development of new low-cost sequencing technologies has dramatically changed the landscape of whole-genome sequencing, enabling scientists to launch numerous genome projects to decode the genomes of previously unsequenced organisms. Sequencing technologies allow most species, including mammals, to be sequenced deeply in just a few days. However, DNA sequencing technologies cannot directly generate complete chromosome-level sequences. Instead, they generate large numbers of reads, each sampling a contiguous stretch of tens to thousands of bases from a different part of the genome. Longer genome sequences are then assembled from the millions or billions of short reads produced by the sequencing technologies.

[0003] Due to the lack of a reference genome fo...


Application Information

IPC(8): G06F19/22; G06F19/18; G06F19/12
Inventors: Wang Jianxin (王建新), Shang Juan (尚娟), Li Hongdong (李洪东)
Owner: CENT SOUTH UNIV