A compression indexing method and device for a character string sequence

A character string and string-group technology, applied in the field of data management, that addresses the reduced capacity of encoding index branch nodes, the increased number of branch nodes and search complexity, and the excessively long difference prefix lengths of the underlying leaf nodes, so as to reduce the number of index nodes, reduce index search complexity, and increase branch node capacity.

Active Publication Date: 2022-03-29
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

[0006] Embodiments of the present invention provide a compression indexing method and device for character string sequences, to solve the problem that, in the existing CS-Prefix-Tree encoding and indexing process, excessively long difference prefix lengths in the underlying leaf nodes reduce the capacity of the encoding index branch nodes, which increases the number of branch nodes and the search complexity.


Examples

Embodiment 1

[0102] Figure 3 is a flow chart of a compression indexing method for a character string sequence provided by an embodiment of the present invention. The method is performed by the compression indexing device 10 shown in Figure 2. As shown in Figure 3, the method may include the following steps:

[0103] S101: Obtain a character string sequence, where the character string sequence includes more than one character string arranged in order.

[0104] Optionally, the string sequence can be read directly from the columnar database.

[0105] It should be noted that the ordered character strings may be arranged in ascending or descending dictionary (lexicographic) order. This embodiment of the present invention does not limit this; the present invention only takes a sequence of character strings arranged in ascending dictionary order as an example. The compressed index meth...
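The excerpt above does not define the difference prefix length. The following minimal sketch assumes the usual prefix-tree meaning: the length of the shortest prefix that distinguishes a string from its predecessor in the ascending sequence. The function names are hypothetical and do not come from the patent.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def difference_prefix_lengths(strings: list[str]) -> list[int]:
    """Assumed definition: for each string in an ascending sequence, the
    length of the shortest prefix that differs from the previous string.
    The first string has no predecessor, so one character is taken."""
    lengths = []
    prev = None
    for s in strings:
        if prev is None:
            lengths.append(1 if s else 0)
        else:
            # min() only guards against duplicates; in a strictly ascending
            # sequence the common prefix is always shorter than s.
            lengths.append(min(len(s), common_prefix_len(prev, s) + 1))
        prev = s
    return lengths


strings = ["apple", "application", "apply", "banana", "band"]
print(difference_prefix_lengths(strings))  # [1, 5, 5, 1, 4]
```

Under this assumption, runs of strings with a long shared prefix ("apple", "application", "apply") produce long difference prefixes, which is the situation the patent describes as a problem for branch node capacity.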

Embodiment 2

[0193] Figure 8 is a structural diagram of a compression indexing device 20 provided by an embodiment of the present invention, used to implement the method described in Embodiment 1. As shown in Figure 8, the device may include:

[0194] The acquiring unit 201 is configured to acquire a character string sequence, where the character string sequence includes more than one character string arranged in order.

[0195] The grouping unit 202 is configured to group the character string sequence according to the difference prefix length of each character string in the sequence acquired by the acquiring unit 201, obtaining M character string groups such that the difference prefix length of the first character string in each character string group is the shortest within the preset character string range, where M is an integer greater than or equal to 1, each character string group contains at least one character string, and eac...
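The grouping rule can be sketched as follows. The visible text does not say how the preset character string range is specified, so this example assumes it is a window of allowed group sizes (`min_size` to `max_size`, both hypothetical parameters) and assumes the same difference prefix definition as a shortest distinguishing prefix relative to the previous string. It is an illustration of the idea, not the patent's procedure.

```python
def diff_prefix_len(prev: str, cur: str) -> int:
    """Assumed definition: shortest prefix of cur that differs from prev."""
    n = 0
    for x, y in zip(prev, cur):
        if x != y:
            break
        n += 1
    return min(len(cur), n + 1)


def group_strings(strings: list[str], min_size: int, max_size: int) -> list[list[str]]:
    """Split an ascending sequence into groups so that each new group starts
    at the string with the shortest difference prefix among the candidate
    start positions allowed by [min_size, max_size]. Needs 1 <= min_size <= max_size."""
    lengths = [1 if strings and strings[0] else 0] + [
        diff_prefix_len(strings[i - 1], strings[i]) for i in range(1, len(strings))
    ]
    groups, start, n = [], 0, len(strings)
    while start < n:
        if n - start <= max_size:
            end = n  # the remainder fits in one group
        else:
            candidates = range(start + min_size, start + max_size + 1)
            # min() returns the earliest candidate on ties
            end = min(candidates, key=lambda i: lengths[i])
        groups.append(strings[start:end])
        start = end
    return groups


strings = ["apple", "application", "apply", "banana", "band", "bandana", "cat"]
for group in group_strings(strings, min_size=2, max_size=3):
    print(group)
# ['apple', 'application', 'apply']
# ['banana', 'band', 'bandana']
# ['cat']
```

In this example every group starts with a string whose difference prefix is a single character ("a", "b", "c"), so a key derived from the first string of a group stays short.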


Abstract

A compression indexing method and device for a character string sequence, relating to the technical field of data management, that solve the problem that, in the existing CS-Prefix-Tree encoding index process, excessively long difference prefix lengths in the underlying leaf nodes reduce the capacity of the encoding index branch nodes, which increases the number of branch nodes and the search complexity. The method includes: grouping the character string sequence according to the difference prefix length of each character string in the sequence to obtain M character string groups, such that the difference prefix length of the first character string in each character string group is the shortest within the preset character string range (S102); storing the M character string groups in N memory pages in turn (S103); and constructing a jump table index according to the index keys of the N memory pages (S104).
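Steps S103 and S104 can be illustrated with a small sketch. Several details are assumptions, because the visible text does not specify them: page capacity is counted in groups (`groups_per_page` is hypothetical), the index key of a page is taken to be the first string of its first group, and a sorted key list searched with `bisect` stands in for the jump table (skip list) index, since both serve ordered lookups over the page keys.

```python
from bisect import bisect_right


def pack_into_pages(groups: list[list[str]], groups_per_page: int) -> list[list[list[str]]]:
    """S103 (sketch): store the string groups in pages, in order."""
    return [groups[i:i + groups_per_page] for i in range(0, len(groups), groups_per_page)]


def build_index(pages: list[list[list[str]]]) -> list[str]:
    """S104 (sketch): one index key per page. Assumption: the key is the
    first string of the page's first group. A sorted list replaces the
    skip list here."""
    return [page[0][0] for page in pages]


def find_page(index_keys: list[str], key: str) -> int:
    """Return the position of the last page whose index key is <= key."""
    return max(bisect_right(index_keys, key) - 1, 0)


def contains(pages: list[list[list[str]]], index_keys: list[str], key: str) -> bool:
    """Locate the candidate page through the index, then scan its groups."""
    page = pages[find_page(index_keys, key)]
    return any(key in group for group in page)


groups = [["apple", "application", "apply"], ["banana", "band", "bandana"], ["cat"]]
pages = pack_into_pages(groups, groups_per_page=2)
index_keys = build_index(pages)

print(index_keys)                           # ['apple', 'cat']
print(find_page(index_keys, "band"))        # 0
print(contains(pages, index_keys, "band"))  # True
print(contains(pages, index_keys, "dog"))   # False
```

A real skip list would offer the same ordered search while supporting cheaper insertion of new page keys than a flat sorted list.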

Description

Technical field

[0001] The invention relates to the technical field of data management, in particular to a compression indexing method and device for character string sequences.

Background technique

[0002] As databases are widely used in various fields of social production, the scale and attributes of database records are becoming more and more complex. Under this premise, the advantages of column-first storage (referred to as "column storage") are becoming more and more prominent. When column storage is used, dictionary encoding can be used to store data in order to reduce storage overhead. At present, the CS-Prefix-Tree (cache-aware prefix tree) order-preserving compression index mechanism proposed by Carsten Binnig et al. in 2009 is usually used to support queries on the compressed dictionary without decompression.

[0003] As shown in Figure 1, the CS-Prefix-Tree consists of two parts: shared leaves and an encode index. Th...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/901, G06F16/903, H03M7/30
CPC: H03M7/30, G06F16/00
Inventor: 魏建生, 朱俊华
Owner: HUAWEI TECH CO LTD