A parallel suffix sorting method and system

A parallel suffix sorting technology, applied in the field of data processing, that addresses the low running speed of the serial IS algorithm and its inability to use the computer's full performance. It achieves a high speedup ratio, increased running speed, and high parallelism.

Active Publication Date: 2019-02-22
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

[0003] To solve the problem that the serial IS algorithm in the prior art runs relatively slowly and cannot use the computer's full performance, the present invention provides a parallel suffix sorting method and system.


Examples


Embodiment 1

[0039] As shown in Figure 1, a parallel suffix sorting method includes the following steps:

[0040] Step S101: find the LMS substrings in the character string X. The specific implementation is as follows:

[0041] (1) The last element of the string X is an appended "$", which is the smallest character in the string. Define the suffix types as follows: when X[i] < X[i+1], suffix(X,i) is S type; when X[i] > X[i+1], suffix(X,i) is L type; when X[i] = X[i+1], suffix(X,i) has the same type as suffix(X,i+1). Use the L/S suffix recognizer to scan the string X from right to left, and store the result in an array t of length n.
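The right-to-left type scan can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the standard SA-IS conventions that suffix(X,i) is S type when X[i] < X[i+1], that the sentinel suffix "$" is S type, and that an LMS position is an S-type position whose left neighbor is L type. The function names `classify_suffixes` and `lms_positions` are hypothetical.

```python
def classify_suffixes(x):
    """Return t, where t[i] is 'S' or 'L' for suffix(x, i).

    Assumes x ends with the sentinel '$', which compares smaller
    than every other character in x.
    """
    n = len(x)
    t = ["S"] * n  # the sentinel suffix is S type by convention (assumption)
    for i in range(n - 2, -1, -1):  # scan from right to left
        if x[i] < x[i + 1]:
            t[i] = "S"
        elif x[i] > x[i + 1]:
            t[i] = "L"
        else:
            t[i] = t[i + 1]  # equal characters inherit the type on the right
    return t


def lms_positions(t):
    """Return the positions i where t[i] is 'S' and t[i-1] is 'L'."""
    return [i for i in range(1, len(t)) if t[i] == "S" and t[i - 1] == "L"]


t = classify_suffixes("banana$")
print("".join(t))        # LSLSLLS
print(lms_positions(t))  # [1, 3, 6]
```

For "banana$" the scan marks positions 1, 3, and 6 as LMS positions; the LMS substrings are the pieces of X delimited by consecutive positions of this kind.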

[0042] (2) During the scan, also count the size of each bucket and the number of L-type and S-type suffixes in each bucket. Use the array bucket to record the number of occurrences of each character in the string X: traverse X from left to right and add one to bucket[X[i]] for every character visited. Then traverse the bucket array from left to right and set bucket[i] += bucket[...
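The counting pass in (2) can be sketched as follows. The source text is truncated at `bucket[i] += bucket[...`, so this sketch assumes the usual running prefix sum `bucket[i] += bucket[i-1]`, after which `bucket[c]` is the index just past the end of character c's bucket. The function name `bucket_statistics`, the alphabet size `sigma=256`, and the separate `l_count` and `s_count` arrays are illustrative assumptions.

```python
def bucket_statistics(x, t, sigma=256):
    """Count characters and per-bucket L/S suffixes, then prefix-sum the counts.

    x: the input string (characters with code points below sigma).
    t: suffix types, t[i] in {'L', 'S'}, same length as x.
    """
    bucket = [0] * sigma   # occurrences of each character
    l_count = [0] * sigma  # L-type suffixes per bucket
    s_count = [0] * sigma  # S-type suffixes per bucket
    for i, ch in enumerate(x):  # left-to-right traversal
        c = ord(ch)
        bucket[c] += 1
        if t[i] == "L":
            l_count[c] += 1
        else:
            s_count[c] += 1
    # Assumed completion of the truncated step: cumulative sum,
    # so bucket[c] becomes the exclusive end of bucket c.
    for i in range(1, sigma):
        bucket[i] += bucket[i - 1]
    return bucket, l_count, s_count


bucket, l_count, s_count = bucket_statistics("banana$", "LSLSLLS")
print(bucket[ord("$")], bucket[ord("a")], bucket[ord("b")], bucket[ord("n")])
# 1 4 5 7
```

With these end positions, the start of a bucket is the end of the previous one, and the L and S counts indicate how each bucket splits between the two suffix types.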

Embodiment 2

[0063] As shown in Figure 2, a parallel suffix sorting system includes a front unit, an analysis unit, and a storage unit. The front unit performs steps S101 to S102; the analysis unit performs steps S103 to S111; the storage unit stores the temporary data generated by multi-threaded parallel induced sorting.

[0064] The front unit includes a decision subunit, an LMS substring calculation subunit, and an SA block subunit.

[0065] The decision subunit reads the string X from the storage unit, uses the L/S suffix recognizer to obtain its suffix type array t, counts the number of L-type and S-type suffixes, and writes the results to the storage unit. The LMS substring calculation subunit reads the suffix type array t from the storage unit, finds all LMS characters, computes the LMS substring positions, and writes them to the storage unit. The SA block subunit is used to divid...


Abstract

The invention relates to a parallel suffix sorting method and system. For a string X of length n whose size is much larger than the computer's cache, the suffix array SA is divided into blocks to increase the cache hit rate and reduce the number of transfers between cache and memory, which greatly reduces the sorting time. The invention uses the parallel computing resources of modern multi-core computers, parallelizing the data access operations of the sorting process with multiple threads. This effectively improves the running speed of the algorithm: the induced sorting process is highly parallel, so the system can obtain a high speedup ratio and greatly improved work efficiency.
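The general idea of splitting the work into contiguous blocks and handing each block to a thread can be sketched as follows. This is not the patent's parallel induced-sorting procedure, whose details are not shown in this excerpt; it only illustrates block partitioning with a thread pool, using per-block character counting as a stand-in task. The names `split_into_blocks`, `count_block`, and `parallel_char_counts`, and the default block size and worker count, are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(n, block_size):
    """Return half-open (start, end) index ranges covering [0, n)."""
    return [(s, min(s + block_size, n)) for s in range(0, n, block_size)]


def count_block(x, start, end):
    """Stand-in per-block task: count the characters in x[start:end]."""
    return Counter(x[start:end])


def parallel_char_counts(x, block_size=4, workers=2):
    """Process each block in a worker thread and merge the partial results."""
    blocks = split_into_blocks(len(x), block_size)
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields the partial results in block order
        for partial in pool.map(lambda b: count_block(x, *b), blocks):
            total += partial
    return total


print(split_into_blocks(7, 3))  # [(0, 3), (3, 6), (6, 7)]
print(parallel_char_counts("banana$", block_size=3) == Counter("banana$"))  # True
```

Each worker touches only its own contiguous range, which is the property that helps cache locality. In CPython the global interpreter lock limits the real speedup of such threads, so the sketch shows the structure only; an implementation aiming at the speedup described here would more plausibly use native threads in a language such as C or C++.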

Description

Technical field

[0001] The present invention relates to the field of data processing, and more specifically to a parallel suffix sorting method and system.

Background

[0002] The CPU of a modern computer reads and writes memory through the cache, so the data locality of an algorithm has a great impact on its running speed. When sorting the suffixes of large-scale character strings, the serial IS algorithm has poor data locality and long read/write latency. This lowers the running speed of the algorithm, prevents the computer's performance from being fully used, greatly reduces working efficiency, and increases time cost.

Contents of the invention

[0003] To solve the problem that the serial IS algorithm in the prior art runs slowly and cannot use the computer's full performance, the invention provides a sorting method and system o...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/46, G06F16/9032
CPC: G06F9/46
Inventor: 彭炯瑜, 解静仪, 农革
Owner: SUN YAT SEN UNIV