Method and system for block-based inductive sorting of text suffix index

A sorting method and indexing technique, applied in the field of data processing, that addresses the poor data locality of the inductive sorting process and thereby improves data locality, resource utilization, and time efficiency.

Active Publication Date: 2021-11-05
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

[0005] In view of this, the embodiments of the present application provide a block-based inductive sorting method and system for a text suffix index, to solve the prior-art problem of poor data locality in the inductive sorting process when constructing a text suffix index.


Examples


Embodiment Construction

[0030] In the following description, specific details such as particular system structures and techniques are set forth for the purpose of explanation rather than limitation. However, it will be apparent to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.

[0031] The technical solution of the present application will be described below by way of example.

[0032] First, the technical terms that may be used in this application will be described herein.

[0033] String: A string x of length n is a character array x[0 ... n-1] of n characters taken, in order, from its character set Σ, where x[n-1] is a character $ that is lexicographically smallest and occurs only once in x.
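
To make the definition in [0033] concrete, the following minimal Python sketch appends a sentinel to a raw text and checks the two stated properties (smallest character, occurring only once). The helper name `with_sentinel` and the use of `$` as an ordinary one-character Python string are illustrative assumptions, not part of the application. Python compares characters by code point, so `$` is smaller than letters and digits but not smaller than, for example, a space.

```python
def with_sentinel(text, sentinel="$"):
    """Return text + sentinel after checking the properties in the definition.

    Hypothetical helper: the name and the error handling are assumptions.
    """
    # The sentinel must be strictly smaller than every character of the text.
    # This also guarantees that it occurs only once, at position n-1.
    if any(ch <= sentinel for ch in text):
        raise ValueError("every character must be greater than the sentinel")
    return text + sentinel


x = with_sentinel("banana")
n = len(x)
# x[n-1] is the sentinel, it is the smallest character, and it is unique.
assert x[n - 1] == "$" and min(x) == "$" and x.count("$") == 1
```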

[0034] Subtri...


Abstract

The embodiments of the present application are applicable to the technical field of data processing and provide a block-based inductive sorting method and system for a text suffix index. The method includes: for any character string, determining multiple substrings or multiple suffixes of the character string, and storing them in multiple preset data blocks; scanning the data blocks in a preset order; for any scanned current data block, sorting the substrings or suffixes in the current data block with a preset stable sorting method; scanning each substring or suffix in the current data block in the preset order; for any scanned target substring or target suffix, determining the target data block to which the specific-type preceding substring of the target substring, or the specific-type preceding suffix of the target suffix, belongs; and writing the preceding substring or preceding suffix into the target data block to which it belongs. The embodiments can solve the problem of poor data locality in the inductive sorting process when constructing a text suffix index.
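
The scan-sort-write loop summarized in the abstract can be sketched in Python as follows. This is only an illustration under several assumptions that the text does not state: there is one block per leading character, the "specific type" is taken to be the L-type of classical induced sorting, the S-type suffixes are pre-sorted naively just to seed the blocks, and the text ends with a unique smallest sentinel `$`. The names `induce_with_blocks`, `L_TYPE`, and `S_TYPE` are hypothetical, and the block layout claimed in the application may differ.

```python
L_TYPE, S_TYPE = 0, 1  # assumed tags; L-type entries precede S-type in a block


def induce_with_blocks(text):
    """Return the suffix array of text (which must end with a unique '$')."""
    n = len(text)
    # Classify suffixes: suffix i is S-type if it is smaller than suffix i+1.
    is_s = [False] * n
    is_s[-1] = True
    for i in range(n - 2, -1, -1):
        is_s[i] = text[i] < text[i + 1] or (text[i] == text[i + 1] and is_s[i + 1])

    # One block per leading character, visited in alphabetical order.
    alphabet = sorted(set(text))
    blocks = {c: [] for c in alphabet}

    # Assumption: seed the blocks with S-type suffixes sorted the naive way.
    for i in sorted((i for i in range(n) if is_s[i]), key=lambda i: text[i:]):
        blocks[text[i]].append((S_TYPE, i))

    suffix_array = []
    for c in alphabet:                        # scan the blocks in a fixed order
        block = blocks[c]
        block.sort(key=lambda item: item[0])  # stable: keeps arrival order per type
        l_count = sum(1 for kind, _ in block if kind == L_TYPE)
        k = 0
        while k < len(block):                 # scan the entries of the current block
            _, j = block[k]
            suffix_array.append(j)
            if j > 0 and not is_s[j - 1]:     # preceding suffix is L-type
                target = text[j - 1]          # block the preceding suffix belongs to
                if target == c:
                    # Same block: place it after the L-type entries seen so far.
                    block.insert(l_count, (L_TYPE, j - 1))
                    l_count += 1
                else:
                    blocks[target].append((L_TYPE, j - 1))
            k += 1
    return suffix_array


print(induce_with_blocks("banana$"))  # [6, 5, 3, 1, 0, 4, 2]
```

Every write lands in the current block or a later one, because an L-type preceding suffix never starts with a smaller character than the suffix that induces it. Each block is therefore complete by the time its scan finishes, and the working set at any moment is confined to one block being read plus append-only writes to others.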

Description

Technical field

[0001] This application belongs to the technical field of data processing, and in particular relates to a block-based inductive sorting method and system for a text suffix index.

Background technique

[0002] In data processing technology, the suffix array (SA) is an important and widely used data structure. It is a compact alternative to the suffix tree and can be used in data compression, genome comparison, full-text search, and other fields. With the development of Internet technologies, massive amounts of multi-source heterogeneous data are being produced worldwide. To manage and retrieve this data effectively, there is an urgent need for a full-text index that can be constructed efficiently, and the suffix index, that is, the suffix array, is a full-text index suited to this scenario.

[0003] Constructing the suffix index of multi-source heterogeneous data follows the same principle as constructing the suffix index (or suffix array) of text, and the latter has always been a key topic of researchers expl...
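
As a point of reference for the suffix array and the full-text search use case mentioned in the background, here is a minimal Python sketch. It builds the array by directly sorting all suffixes, which is simple but far slower than the inductive sorting the application targets, and then looks up a pattern by binary search. The function names `naive_suffix_array` and `find_occurrences` are hypothetical and are not taken from the application.

```python
def naive_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order."""
    # Direct comparison sort of the suffixes; for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions where pattern occurs in text."""
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix at the given rank.
        return text[sa[rank]:sa[rank] + m]

    # Lower bound: first rank whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first rank whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "banana$"
sa = naive_suffix_array(text)
print(sa)                                 # [6, 5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))  # [1, 3]
```

All suffixes that begin with the pattern occupy one contiguous range of the array, which is why two binary searches are enough to find every occurrence.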


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/166, G06F40/131
Inventors: 解静仪, 农革
Owner: SUN YAT SEN UNIV