A method for improving the strain assembly efficiency of metagenomic nanopore sequencing data
A metagenomics and sequencing data technology, applied in the field of bioinformatics analysis, that addresses the increased difficulty and long running time of assembly, so as to improve assembly efficiency, improve identification efficiency, and ensure validity and accuracy.
Examples
Embodiment 1
[0069] Example 1 Construction of this patented method
[0070] The focus of this patent is to pre-group the metagenomic data and then assemble the grouped reads, which improves assembly efficiency.
[0071] The process of method optimization
[0072] Two aspects need to be explained first: how the 5-mer frequency matrix is derived from the read sequences, and how a cluster label is obtained for each read.
[0073] In the specific calculation,
[0074] 1. First, calculate the 5-mer frequency matrix from the read sequences:
[0075] - The number of 5-mer sequence types is 4*4*4*4*4 / 2 = 512 (the division by 2 presumably merges each 5-mer with its reverse complement);
[0076] - Calculate the frequency of each of these 512 5-mers in each read;
[0077] - Obtain a 5-mer frequency matrix.
[0078] 2. Then use Umap to reduce the dimensionality of the frequency matrix, and use hdbscan to assign a cluster label to each read.
[0079] 3. Then use Canu / meta-Flye software to assemble the reads of each cluster separately.
[0080] 4. Finally, use blast to compare the assembly ...
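Steps 1 and 3 above can be illustrated with a minimal Python sketch. It is not the patent's implementation: it assumes that the 512 sequence types are canonical 5-mers (each 5-mer merged with its reverse complement), that "frequency" means the count normalized per read, and that cluster labels are supplied by a separate clustering step. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement (assumed merging rule)."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def canonical_kmers(k=5):
    """List all canonical k-mers in sorted order; for k = 5 there are 512."""
    return sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})


def kmer_frequencies(read, kmers, k=5):
    """Relative frequency of each canonical k-mer in one read (normalization is an assumption)."""
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = read.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Skip windows that contain ambiguous bases such as N.
        if set(window) <= set("ACGT"):
            counts[index[canonical(window)]] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def frequency_matrix(reads, k=5):
    """Build the reads x canonical-k-mers frequency matrix as a list of rows."""
    kmers = canonical_kmers(k)
    return [kmer_frequencies(read, kmers, k) for read in reads]


def group_reads_by_cluster(read_ids, labels):
    """Collect read IDs per cluster label so each group can be assembled separately."""
    groups = defaultdict(list)
    for read_id, label in zip(read_ids, labels):
        groups[label].append(read_id)
    return dict(groups)
```

In practice the matrix would be passed to a dimensionality reduction and clustering step (for example `UMAP.fit_transform` from umap-learn followed by `HDBSCAN.fit_predict` from hdbscan, which labels unclustered points as -1), and each resulting group of reads would be written to its own file for Canu or meta-Flye. How noise reads are handled is not stated in the text above.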
Embodiment 2
[0093] Embodiment 2 Umap grouping effect of the patented method
[0094] By means of pre-grouping, the present invention groups the official Zymo ONT sequencing data under different time / data volume gradients, and reads from the same species tend to be grouped into the same cluster. The specific implementation follows the process of Example 1.
[0095] The dimensionality reduction results after Umap grouping are shown in Figures 2-6: Figure 2 is the dimensionality reduction clustering result for 1h, Figure 3 for 2h, Figure 4 for 3h, Figure 5 for 4h, and Figure 6 for 5h. In each case, all reads are divided into distinct clusters by pre-clustering.
Embodiment 3
[0096] Example 3 Evaluation of the assembly efficiency of the patented method
[0097] By means of pre-grouping, the present invention significantly improves the assembly efficiency of the official Zymo ONT sequencing data across different time / data volume gradients (the base data volumes obtained after 1h to 5h). The specific implementation follows the flow of Example 1.
[0098] The assembly time results are shown in Table 1. With Umap pre-grouping, the assembly time is shortened by more than half at every time point: by roughly 68% at 1h and by roughly 86% to 93% at 2h to 5h.
[0099] Table 1
[0100]
Time | Bases (bp) | Assembly time (no_Umap) | Assembly time (Umap)
1h | 458,473,600 | 45m47.655s | 14m36.602s
2h | 919,961,649 | 503m13.250s | 36m54.974s
3h | 1,375,306,551 | 749m23.833s | 65m43.655s
4h | 1,796,485,159 | 1126m10.946s | 154m36.229s
5h | 2,205,881,698 | 1359m9.468s | 179m0.873s
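The timings in Table 1 can be converted to seconds to quantify the reduction in assembly time. The following is a small sketch of that arithmetic; the helper names are hypothetical, and the values are copied from Table 1 as given.

```python
import re

# Assembly times from Table 1: (without Umap pre-grouping, with Umap pre-grouping).
TABLE_1 = {
    "1h": ("45m47.655s", "14m36.602s"),
    "2h": ("503m13.250s", "36m54.974s"),
    "3h": ("749m23.833s", "65m43.655s"),
    "4h": ("1126m10.946s", "154m36.229s"),
    "5h": ("1359m9.468s", "179m0.873s"),
}


def to_seconds(duration):
    """Convert a duration such as '45m47.655s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def reduction_percent(baseline, grouped):
    """Percentage by which the grouped assembly time is shorter than the baseline."""
    return 100.0 * (1.0 - to_seconds(grouped) / to_seconds(baseline))


if __name__ == "__main__":
    for label, (no_umap, with_umap) in TABLE_1.items():
        print(f"{label}: {reduction_percent(no_umap, with_umap):.1f}% shorter")
```

Whether the Umap column includes the time spent on k-mer counting, dimensionality reduction, and clustering is not stated in the table, so the percentages describe the reported assembly times only.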


