Metagenome-based method for multiple-sequence alignment of proteins

A metagenomics-based technology for protein multiple sequence alignment. It addresses the difficulty of finding sufficient homologous sequences and the resulting inability to reliably extract evolutionary information to guide protein folding. Its effect is to improve sequence diversity and increase the number of effective sequences.

Inactive Publication Date: 2021-08-13
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

For many target protein sequences, especially novel proteins, it is difficult to find enough homologous sequences in databases to reliably extract the evolutionary information that guides protein folding.
[0004] In summary, existing protein multiple sequence alignment methods fall short in the quantity and quality of the homologous sequences they find and in computational efficiency. As a result, evolutionary information cannot be reliably extracted to guide protein folding, so improvements are needed.



Examples


Embodiment Construction

[0019] The present invention will be further described below in conjunction with the accompanying drawings.

[0020] Referring to Figures 1 and 2, a metagenomics-based method for protein multiple sequence alignment comprises the following steps:

[0021] 1) First, using the sequence of the target protein as the query, run HHblits to perform an initial search against the UniClust30 database with the parameters "-cpu 10 -diff inf -id 90 -cov 50 -n 3". Here "-cpu 10" sets the number of CPU cores used during the multiple sequence search (default 2); "-diff inf" concerns the option that selects the most diverse sequence set (default 1000), and the value "inf" turns this option off; "-cov 50" filters out of the resulting MSA any sequence whose gapped residues exceed 50% of the target sequence length (default 0); and "-n 3" sets the number of search iterations to 3;

[0022] 2) Use hhfilter on the obtained multiple sequence alignment file to filter o...
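The initial search in step 1) can be sketched as a small Python helper that assembles the HHblits command line. This is a minimal sketch, not the inventors' script: the function name, the file names and the database prefix are hypothetical, and the `-i`, `-d` and `-oa3m` options (query, database and A3M output) are standard HHblits options that the text above does not mention. Only the five search parameters come from paragraph [0021].

```python
import shlex


def build_hhblits_command(query_fasta, database_prefix, output_a3m,
                          cpu=10, iterations=3):
    """Assemble the argument list for the initial HHblits search.

    The paths are hypothetical placeholders; the search parameters
    mirror the ones quoted in step 1).
    """
    return [
        "hhblits",
        "-i", query_fasta,        # query sequence (assumed file name)
        "-d", database_prefix,    # UniClust30 database prefix (assumed path)
        "-oa3m", output_a3m,      # write the resulting MSA in A3M format
        "-cpu", str(cpu),         # number of CPU cores
        "-diff", "inf",           # turn off the diversity filter
        "-id", "90",              # parameter quoted in step 1)
        "-cov", "50",             # coverage filter quoted in step 1)
        "-n", str(iterations),    # number of search iterations
    ]


cmd = build_hhblits_command("target.fasta", "UniClust30/uniclust30", "initial.a3m")
# Print the command for inspection; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell quoting problems if the command is later executed with `subprocess.run(cmd, check=True)`.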


Abstract

The invention relates to a metagenome-based method for multiple sequence alignment (MSA) of proteins, comprising the following steps: performing an initial search of the UniClust30 database with HHblits using the sequence of the target protein, filtering out of the resulting MSA any sequence whose gapped residues exceed 50% of the target sequence length, and iterating the search three times; filtering the resulting multiple sequence alignment file with hhfilter to remove sequences whose gap proportion exceeds 25%, yielding MSA1; calculating the number of effective sequences, Meff, of MSA1; if Meff is greater than or equal to 10L, ending the search and taking MSA1 as the output of the multiple sequence alignment; otherwise, constructing a hidden Markov model (HMM) from MSA1, searching the Metaclust50 metagenome database to obtain a second alignment, MSA2, and combining MSA1 with MSA2 to obtain the final MSA. The invention increases both the number and the quality of the homologous sequences found and improves computational efficiency.
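The stopping rule in the abstract can be illustrated with a short Python sketch. The page does not define Meff, so the code assumes one commonly used definition: each sequence is weighted by the reciprocal of the number of sequences (itself included) that share at least 80% identity with it, and Meff is the sum of those weights. The 80% threshold, the reading of L as the target sequence length, and all function names are assumptions for illustration only.

```python
def sequence_identity(a, b):
    """Fraction of alignment columns where both rows hold the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def effective_sequence_count(msa, identity_threshold=0.8):
    """Meff under the assumed definition: sum of 1 / (neighbour count)."""
    meff = 0.0
    for i, seq_i in enumerate(msa):
        neighbours = 1  # every sequence counts as its own neighbour
        for j, seq_j in enumerate(msa):
            if i != j and sequence_identity(seq_i, seq_j) >= identity_threshold:
                neighbours += 1
        meff += 1.0 / neighbours
    return meff


def needs_metagenome_search(msa, target_length, factor=10):
    """True when Meff < factor * L, i.e. the Metaclust50 search is required."""
    return effective_sequence_count(msa) < factor * target_length


# Toy alignment: two identical rows and one unrelated row.
toy_msa = ["ACDE", "ACDE", "WYWY"]
print(effective_sequence_count(toy_msa))
print(needs_metagenome_search(toy_msa, target_length=4))
```

The pairwise loop is quadratic in the number of sequences, which is acceptable for a sketch but would need clustering or vectorisation for large alignments.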

Description

Technical field

[0001] The invention relates to the fields of bioinformatics and computer applications, and in particular to a metagenomics-based protein multiple sequence alignment method.

Background

[0002] Proteins are the focus of many areas of life science research because they are responsible for most of the biological functions of living organisms. Sequence alignment is a basic component and an important foundation of bioinformatics. Its basic idea rests on the general biological principle that sequence determines structure and structure determines function: nucleic acid sequences and the primary-structure sequences of proteins are both treated as strings of basic characters, and the similarities between sequences are detected in order to discover information about function, structure and evolution in biological sequences. The theoretical basis of sequence comparison is the theory of evolution. If there is enough sim...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B15/20, G16B40/00
CPC: G16B15/20, G16B40/00
Inventor: 张贵军, 郭赛赛, 刘俊, 侯铭桦, 杨涛, 周晓根
Owner: ZHEJIANG UNIV OF TECH