String suffix array construction method based on radix sorting

The method applies radix sorting and suffix array techniques in fields such as electrical digital data processing, special data processing applications, and instruments. It addresses the large space complexity, slow running speed, and limited applicability of existing approaches, and aims for fast running speed, easy implementation, and small space consumption.

Inactive Publication Date: 2011-05-25
农革
0 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

Existing linear-time suffix array construction algorithms suffer from slow running speed and large space complexity [3, 4, 5, 7, 8], which limits their practical application.

Embodiment Construction

[0063] The present invention will be further elaborated below in conjunction with the accompanying drawings.

[0064] As shown in Figure 1, the pseudocode for each step in the flow chart of the radix-sort-based string suffix array construction method of the present invention is given below. The elements of each array are stored from left to right, that is, the first element is at the far left and the last element is at the far right.

[0065] By the definition of a d-substring, every d-substring has a fixed length of d+2 characters, where d≥2. Therefore, when sorting all the fixed-length d-substrings in S, a simple and fast radix sort algorithm can be used. This feature is a unique advantage of the method of the present invention over other linear-time suffix array construction algorithms.
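The fixed-length property in [0065] is what makes a least-significant-digit radix sort applicable. The following is a minimal sketch, not the patent's own pseudocode: the function name `radix_sort_fixed_substrings`, the `starts` list of substring pointers, and the treatment of positions past the end of the string as a smallest sentinel are all assumptions made for illustration. How the d-substrings are located is outside the scope of this sketch.

```python
def radix_sort_fixed_substrings(s, starts, width):
    """Sort start positions by the width-character substring beginning at each.

    Assumption for this sketch: positions past the end of s compare as a
    sentinel that is smaller than every real character.
    """
    def key(pos, offset):
        i = pos + offset
        return ord(s[i]) + 1 if i < len(s) else 0

    # Number of possible keys: 0 is the sentinel, 1..max_ord+1 are characters.
    sigma = max((ord(c) for c in s), default=0) + 2
    order = list(starts)

    # LSD radix sort: one stable counting sort per character, last to first.
    for offset in range(width - 1, -1, -1):
        counts = [0] * (sigma + 1)
        for pos in order:
            counts[key(pos, offset) + 1] += 1
        # Prefix sums turn counts[k] into the first output slot for key k.
        for k in range(sigma):
            counts[k + 1] += counts[k]
        out = [0] * len(order)
        for pos in order:
            k = key(pos, offset)
            out[counts[k]] = pos
            counts[k] += 1
        order = out
    return order


# Sorting the length-3 substrings of "banana" that start at positions 0..3:
# "ban", "ana", "nan", "ana"
print(radix_sort_fixed_substrings("banana", [0, 1, 2, 3], 3))  # [1, 3, 0, 2]
```

Each pass costs time proportional to the number of substrings plus the alphabet size, so for a fixed width the total work stays linear in the input, which is the property the paragraph above relies on.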

[0066] SA-IS (S, SA)

[0067] S: input string (n characters long, containing n1 d-substrings);

[0068] SA: suffix array of S;

[0069] S1: integer...

Abstract

The invention discloses a string suffix array construction method based on radix sorting, which comprises the following steps: (1) scanning a string S from right to left and comparing the two adjacent characters S[i] and S[i+1] currently being scanned to obtain the type of each character and each suffix, recording the types in an array t; (2) scanning the array t from left to right to find the positions at which all d-characters appear, obtaining the starting pointers of all d-substrings, and recording the pointer of each d-substring in a d-substring pointer array P1; (3) radix sorting all d-weighted substrings in S using the d-substring pointer array P1, an array B, and an array SA; (4) renaming each d-weighted substring in S according to the sorted result of step (3) to form a new, shortened string S1; (5) if every character of S1 is unique, sorting the suffixes of S1 directly to compute the suffix array SA1 of S1, and otherwise recursively calling the SA-IS algorithm with S1 and SA1 as input parameters; (6) inducing the suffix array SA of S from the suffix array SA1 of S1 obtained in step (5); and (7) returning.
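Step (1) of the abstract can be sketched as follows. The abstract does not spell out the type rule, so this sketch assumes the usual L-type/S-type convention from the induced-sorting literature: a suffix is S-type if it is lexicographically smaller than the suffix that follows it, and L-type otherwise. It also assumes the string ends with a unique smallest sentinel character, typed S. The function name `classify_types` and the boolean encoding of the array t are hypothetical.

```python
def classify_types(s):
    """Return the type array t for s: True marks S-type, False marks L-type.

    Assumptions for this sketch: s ends with a unique sentinel smaller than
    every other character, and the sentinel itself is S-type.
    """
    n = len(s)
    t = [False] * n
    if n == 0:
        return t
    t[n - 1] = True  # the sentinel suffix is S-type by convention
    # Scan right to left, comparing each character with its right neighbour.
    for i in range(n - 2, -1, -1):
        if s[i] < s[i + 1]:
            t[i] = True          # smaller than its successor: S-type
        elif s[i] == s[i + 1]:
            t[i] = t[i + 1]      # equal characters inherit the type to the right
        # otherwise s[i] > s[i + 1] and the position stays L-type
    return t


types = classify_types("mmiissiissiippii$")
print("".join("S" if is_s else "L" for is_s in types))  # LLSSLLSSLLSSLLLLS
```

A single right-to-left pass is enough because the type at position i depends only on S[i], S[i+1], and the already computed type at i+1.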

Description

Technical field

[0001] The invention relates to a method for constructing the suffix array of a character string, in particular to a method by which a computer automatically constructs the suffix array of a character string in linear time using radix sorting.

Background technique

[0002] The suffix array of a string is a space-saving alternative to the suffix tree. It was first proposed by Manber and Myers [1, 2] and supports algorithms equivalent to those on the suffix tree in less space. Suffix arrays are used extensively in applications such as data indexing and pattern matching. The present invention provides a new suffix array construction algorithm that uses radix sorting and a "slice-merge" method to construct the suffix array of any given string in linear time.

[0003] The following terms are used in this description:

[0004] Character set: A character set ∑ is a set on which a total order is defined, that is, any two diffe...
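To make the suffix array and the pattern-matching use mentioned in the background concrete, here is a small reference sketch. It is deliberately naive: `naive_suffix_array` sorts whole suffixes by comparison and is not the linear-time method of the invention, and both function names are hypothetical.

```python
def naive_suffix_array(s):
    """Return the start positions of all suffixes of s in lexicographic order.

    Comparison-based reference construction for illustration only; it is far
    slower than a linear-time algorithm on long inputs.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_occurrences(s, sa, pattern):
    """Return all start positions of pattern in s using binary search on sa."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "banana"
sa = naive_suffix_array(text)
print(sa)                                  # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))   # [1, 3]
```

Because the suffixes are stored in sorted order, all occurrences of a pattern occupy one contiguous range of the array, which is why two binary searches suffice.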

Application Information

IPC(8): G06F17/30
Inventor: 农革
Owner: 农革