A Distributed Parallel Construction Method of Universal Suffix Tree

A construction method and distributed technology applied to the distributed parallel construction of general suffix trees. It addresses poor portability, the inability to build general suffix trees directly, and inefficient construction, and it achieves good scalability and good portability.

Active Publication Date: 2020-05-01
NANJING UNIV
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] Purpose of the invention: In view of the problems and deficiencies of the prior art described above, the purpose of the present invention is to provide a method for building a general suffix tree in parallel on a general-purpose cluster. It addresses the following problems of existing methods: they depend on expensive platforms and are difficult to port; they lack generality and cannot construct a general suffix tree directly; and they scale poorly and construct inefficiently on large-scale sequence data.

Embodiment Construction

[0025] The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments only illustrate the present invention and are not intended to limit its scope. After reading the present invention, modifications of its various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of this application.

[0026] The present invention proposes an efficient, fully parallelized method for constructing general suffix trees. It solves the general suffix tree construction problem by dividing the tree into subtrees in parallel and constructing each subtree through lcp-range multi-way merging, and it designs an I/O optimization scheme and a load-balancing scheme suited to the invention. The present invention is further designed into the above-mentioned mutually independent steps, ...
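The subtree division and load-balancing ideas can be illustrated with a minimal single-process Python sketch. It rests on assumptions that are not taken from the patent: every sequence ends with a terminator `$` that occurs nowhere else, a subtree task is identified by a suffix prefix, a prefix is extended one character at a time until it covers at most `threshold` suffixes, and tasks are placed by a greedy largest-first rule. The names `count_prefixes`, `divide` and `assign` are hypothetical, and the patent's actual division and balancing schemes may differ.

```python
import heapq
from collections import Counter
from typing import Dict, List

# Assumptions (hypothetical, not from the patent): every sequence ends with "$",
# "$" occurs nowhere else, and the alphabet (including "$") is known in advance.


def count_prefixes(partition: List[str], prefixes: List[str]) -> Counter:
    """Count, within one node's partition, how many suffixes start with each prefix."""
    counts: Counter = Counter()
    for seq in partition:
        for off in range(len(seq)):
            for p in prefixes:
                if seq.startswith(p, off):
                    counts[p] += 1
    return counts


def divide(partitions: List[List[str]], alphabet: str, threshold: int) -> Dict[str, int]:
    """Extend prefixes one character at a time until each covers <= threshold suffixes."""
    tasks: Dict[str, int] = {}
    frontier = list(alphabet)
    while frontier:
        # On a cluster each node would count its own partition in parallel;
        # this sketch just sums the per-partition counts sequentially.
        total = sum((count_prefixes(p, frontier) for p in partitions), Counter())
        next_frontier: List[str] = []
        for prefix in frontier:
            n = total[prefix]
            if n == 0:
                continue  # no suffix starts with this prefix
            if n <= threshold or prefix.endswith("$"):
                tasks[prefix] = n  # one subtree task of size n
            else:
                next_frontier.extend(prefix + c for c in alphabet)
        frontier = next_frontier
    return tasks


def assign(tasks: Dict[str, int], num_nodes: int) -> List[List[str]]:
    """Greedy largest-first assignment: each task goes to the currently least-loaded node."""
    heap = [(0, node) for node in range(num_nodes)]  # (load, node id)
    plan: List[List[str]] = [[] for _ in range(num_nodes)]
    for prefix, size in sorted(tasks.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        plan[node].append(prefix)
        heapq.heappush(heap, (load + size, node))
    return plan


# Usage: each inner list stands for the sequences held by one compute node.
partitions = [["banana$"], ["bandana$"]]
tasks = divide(partitions, "$abdn", threshold=3)
# tasks == {"$": 2, "b": 2, "d": 1, "a$": 2, "na": 3, "nd": 1, "ana": 3, "and": 1}
plan = assign(tasks, num_nodes=2)
# plan == [["ana", "$", "b", "nd"], ["na", "a$", "and", "d"]]  (loads 8 and 7)
```

Only prefixes that exceed the threshold are refined, so every task stays below a bounded size; under that assumption a simple size-based greedy placement tends to yield fairly even loads across nodes.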

Abstract

The invention discloses a distributed parallel construction method for a general suffix tree. The method includes the following steps: (1) the input sequences are integrated and evenly distributed to the compute nodes; (2) the frequencies of subsequences are counted in parallel, and all subtree construction tasks are determined; (3) the subtree construction tasks are distributed across the compute nodes as evenly as possible according to their sizes; (4) all subtrees are constructed in batches over several rounds. Each round of batch construction consists of three steps: (a) the input is scanned in parallel to locate the suffixes needed in that round, the suffixes are sorted locally, and the sorted results are gathered at the compute nodes responsible for the corresponding tasks; (b) multi-way merging produces a globally ordered sequence of suffixes; (c) the corresponding suffix subtrees are generated from the sorted results. With this method, the general suffix tree can be constructed efficiently in parallel, which addresses the problems that current general suffix tree construction methods rely too heavily on I/O or main-memory capacity, lack generality, and have difficulty handling large-scale input.
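One construction round (local sorting, multi-way merging of the sorted runs, and generation of the subtree) can be sketched in Python as follows. This is a single-process illustration under assumptions that are not from the patent: suffixes are hypothetical `(sequence id, offset)` pairs, every sequence ends with a shared terminator `$` that occurs nowhere else, each node's locally sorted run is a plain list, and the subtree is built with the standard stack-based method that turns a sorted suffix list plus adjacent LCP values into a suffix tree. It is not claimed to be the patent's lcp-range merging; it only shows how a globally sorted run with LCP values determines a subtree. `suffix_key`, `merge_runs`, `build_subtree` and `render` are hypothetical names.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

Suffix = Tuple[int, int]  # hypothetical representation: (sequence id, start offset)


def suffix_key(seqs: List[str], s: Suffix) -> Tuple[str, int]:
    # Order by suffix text; identical suffixes from different sequences
    # (both ending in the shared "$") are ordered by sequence id.
    return seqs[s[0]][s[1]:], s[0]


def lcp(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def merge_runs(seqs: List[str], runs: Iterable[List[Suffix]]) -> List[Tuple[Suffix, int]]:
    """Multi-way merge of locally sorted runs; pair each suffix with its LCP to the previous one."""
    out: List[Tuple[Suffix, int]] = []
    prev: Optional[str] = None
    for s in heapq.merge(*runs, key=lambda x: suffix_key(seqs, x)):
        text = seqs[s[0]][s[1]:]
        out.append((s, 0 if prev is None else lcp(prev, text)))
        prev = text
    return out


class Node:
    def __init__(self, depth: int, rep: Suffix):
        self.depth = depth               # string depth of this node
        self.rep = rep                   # any suffix below this node, used to read edge labels
        self.ids: List[Suffix] = []      # suffixes ending here (leaves only)
        self.children: List["Node"] = []


def build_subtree(seqs: List[str], merged: List[Tuple[Suffix, int]]) -> Node:
    """Stack-based construction from a non-empty sorted suffix list with adjacent LCPs."""
    root = Node(0, merged[0][0])
    stack = [root]  # rightmost root-to-leaf path
    for s, l in merged:
        length = len(seqs[s[0]]) - s[1]
        last = None
        while stack[-1].depth > l:
            last = stack.pop()
        top = stack[-1]
        if top.ids and top.depth == length:
            top.ids.append(s)  # same suffix text in another sequence: share the leaf
            continue
        if top.depth < l:
            # Split the edge to `last` with a new internal node at depth l.
            assert last is not None
            mid = Node(l, last.rep)
            top.children[-1] = mid
            mid.children.append(last)
            stack.append(mid)
        leaf = Node(length, s)
        leaf.ids.append(s)
        stack[-1].children.append(leaf)
        stack.append(leaf)
    return root


def render(seqs: List[str], node: Node, parent_depth: int = 0) -> tuple:
    """Nested (edge label, leaf suffixes, children) tuples for inspection."""
    sid, off = node.rep
    label = seqs[sid][off + parent_depth: off + node.depth]
    return label, tuple(node.ids), tuple(render(seqs, c, node.depth) for c in node.children)


# Usage: one subtree task (prefix "an"), with each sequence held by a different node.
seqs = ["banana$", "bandana$"]
run0 = sorted([(0, 1), (0, 3)], key=lambda x: suffix_key(seqs, x))  # node 0, local sort
run1 = sorted([(1, 1), (1, 4)], key=lambda x: suffix_key(seqs, x))  # node 1, local sort
merged = merge_runs(seqs, [run0, run1])
# merged == [((0, 3), 0), ((1, 4), 4), ((0, 1), 3), ((1, 1), 2)]
tree = build_subtree(seqs, merged)
```

In the resulting subtree the root leads to an edge labeled `an`, which branches into `a` (with leaves `$`, shared by both sequences, and `na$`) and `dana$`. Because the merge only compares heads of already sorted runs, each node can sort its own suffixes independently before the results are gathered.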

Description

Technical field

[0001] The invention relates to the technical field of sequence processing and parallel computing, and in particular to a distributed parallel construction method for a general suffix tree.

Background

[0002] Sequences are a commonly used form of data organization with a wide range of applications in text processing, time-series analysis, biotechnology and other fields. As a powerful data structure for sequence processing, the general suffix tree can efficiently solve many common sequence analysis problems, such as matching, searching, and frequent pattern mining. However, constructing a general suffix tree is very complicated. In past related work, there is no method that constructs the general suffix tree directly; […] transformed into a general suffix tree. Earlier researchers proposed single-machine suffix tree construction methods, represented by Ukkonen's method. Although these methods can be very efficient in theory, it has be...

Application Information

Patent type & authority: Patents (China)
IPC (8): G06F9/50
CPC: G06F9/5088
Inventors: Gu Rong (顾荣), Huang Yihua (黄宜华), Guo Chen (郭晨), Zhu Guanghui (朱光辉)
Owner: NANJING UNIV