Method and system for constructing suffix arrays (SAs) in parallel in constant working space

A technology for constructing suffix arrays in constant workspace, applied in the fields of electrical digital data processing, special-purpose data processing applications, and natural language data processing. It addresses the large memory requirement and slow running speed of existing methods, which cannot meet the need for fast processing of large-scale string data, and it optimizes time and space complexity.

Inactive Publication Date: 2018-11-06
FOSHAN SHUNDE SUN YAT SEN UNIV RES INST +2
5 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

With the explosive growth of data scale, existing serial methods and systems can no longer meet the requirement for fast processing of large-scale string data: they run slowly and require a large amount of memory, so they are not applicable to computer systems with relatively small memory. In short, although they can still construct the suffix array of a string, their time and space complexity is poor.

Embodiment Construction

[0057] The following technical terms are used in the description of the present invention and are explained here:

[0058] Workspace: the part of the total space that remains after excluding the space used by the string X and its suffix array.

[0059] Constant workspace: the workspace is constant for any character string X defined on a constant character set, that is, the space remaining after excluding X and its suffix array does not grow with the length of X. A suffix array construction algorithm or system with this space-time complexity is, in theory, the best achievable solution.

[0060] Character set: A character set Σ is a set on which a total order is defined, that is, any two distinct elements α and β in Σ can be compared, so that either α < β or α > β. The elements of Σ are called characters, and the smallest character is '$'. The size of the character set involved in the present invention can be constant, O(1), or integer, O(n).

[0061] Strin...

Abstract

The invention discloses a method and system for constructing suffix arrays (SAs) in parallel in a constant working space. The method comprises: obtaining the first-character pointers of all LMS (leftmost S-type) substrings in a character string X and recording them in an array P1; sorting all the LMS substrings by parallel induced sorting in the constant working space, using P1 and SA; obtaining a character string X1; distinguishing the different input parameters for SA construction according to whether the characters in X1 are unique; and finally computing the SA of the character string X1 by parallel induction in the constant working space, through the correspondence between X1 and SA1, and storing it in SA. The beneficial effects of the disclosed method are that it reduces the computer memory requirement, runs faster, optimizes time and space complexity, and is suitable for constructing the SAs of large-scale character strings.
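The first step of the abstract, collecting the starting positions of the LMS substrings into P1, can be illustrated with a short serial sketch. It assumes the standard induced-sorting definitions, which the excerpt above does not spell out: a suffix is S-type if it is smaller than the suffix starting one position later, L-type if it is larger, and an LMS position is an S-type position whose left neighbor is L-type. The function name `lms_positions` is hypothetical, and the sketch deliberately uses an O(n) type array for readability, so it is neither parallel nor constant-workspace like the patented method.

```python
def lms_positions(x: str) -> list[int]:
    """Return the start positions of all LMS substrings of x (the role of P1).

    Assumes x ends with a unique smallest sentinel such as '$'.
    Illustration only: uses O(n) extra space, unlike the patented method.
    """
    n = len(x)
    if n == 0:
        return []
    is_s = [False] * n
    is_s[n - 1] = True  # the sentinel suffix is S-type by convention
    # Scan right to left: the type of suffix i depends on x[i], x[i+1],
    # and, when the two characters are equal, on the type of suffix i+1.
    for i in range(n - 2, -1, -1):
        if x[i] < x[i + 1]:
            is_s[i] = True
        elif x[i] == x[i + 1]:
            is_s[i] = is_s[i + 1]
    # An LMS position is S-type with an L-type position immediately before it.
    return [i for i in range(1, n) if is_s[i] and not is_s[i - 1]]


p1 = lms_positions("banana$")
print(p1)
```

For the string "banana$" the suffix types from left to right are L, S, L, S, L, L, S, so the LMS positions are 1, 3, and 6.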

Description

Technical field

[0001] The invention relates to the field of suffix array construction for strings, and in particular to a method and system for constructing suffix arrays in parallel in constant workspace.

Background technique

[0002] The suffix array (SA) is a space-saving alternative to the suffix tree (ST). Suffix arrays are commonly used to index strings; they support many string processing tasks and are widely used in applications such as full-text indexing and gene matching.

[0003] In recent years, the memory capacity of general-purpose computers has continued to increase, making it possible to process large-scale text and genetic data quickly in memory. With the explosive growth of data scale, however, the existing serial methods and systems can no longer meet the requirement for fast processing of large-scale string data: they run slowly and require a large amount of memory. For some memory ...
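As a concrete illustration of the data structure discussed in the background, the sketch below builds a suffix array in the simplest possible way, by sorting the suffix start positions by the suffixes themselves. This is not the method of the invention: it is serial, takes far more than linear time on long inputs, and copies suffixes during comparison. The function name `naive_suffix_array` is hypothetical and is used only to show what a suffix array contains.

```python
def naive_suffix_array(x: str) -> list[int]:
    """Return the start positions of the suffixes of x in lexicographic order.

    Illustration only: sorting whole suffixes is slow and memory-hungry,
    which is exactly the problem that dedicated construction methods address.
    """
    return sorted(range(len(x)), key=lambda i: x[i:])


sa = naive_suffix_array("banana$")
for i in sa:
    # Print each start position next to the suffix it refers to.
    print(i, "banana$"[i:])
```

For "banana$" the result is [6, 5, 3, 1, 0, 4, 2], corresponding to the sorted suffixes "$", "a$", "ana$", "anana$", "banana$", "na$", "nana$". The sentinel '$' sorts first because it is smaller than every letter.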

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22, G06F17/30
CPC: G06F40/126
Inventor: 劳斌, 解静仪, 徐文涛, 农革
Owner: FOSHAN SHUNDE SUN YAT SEN UNIV RES INST