Associated data compressing method friendly to query

An associated data compression technique, applied in the field of big data, that addresses problems such as degraded performance and reduced query efficiency, and improves the compression ratio.

Active Publication Date: 2017-05-24
WUHAN UNIV OF SCI & TECH +1
5 Cites 5 Cited by

AI-Extracted Technical Summary

Problems solved by technology

Although more and more storage media can be used to store increasingly large linked data sets, large data sets not only lead to...

Abstract

The invention relates to a query-friendly compression method for associated data. The method comprises the following steps: defining relation mining rules and mining the potential association relations among triples; defining a compressed query memory model that consists of a subject vector, a predicate vector, and an object matrix; defining a serialization mode for the compressed query memory model, with serialization and deserialization implemented using three auxiliary symbols; defining a mode for executing SPARQL queries on the compressed query memory model, in which subjects and predicates are queried by binary search and objects are queried by linear traversal; and defining a scheme that addresses slow queries caused by an overly large object matrix by dividing a large data block into several small data blocks. Compared with most existing compression schemes, an associated data set processed by this method has a higher compression ratio, and SPARQL queries can be executed directly on the data in its compressed state.

Technology Topic

Serialization; Compression ratio

Example Embodiment

[0047] The technical solution of the present invention will be described in detail below with reference to the drawings and embodiments.
[0048] The technical solution provided by the present invention is an associated data set compression algorithm based on a relation matrix, which specifically includes the following steps:
[0049] 1. Define the triple memory model, consisting of three data segments: subject S, predicate P, and object O;
[0050] 2. Read the associated data in N-Triples format and parse it to obtain a set of triples;
[0051] The detailed process is as follows:
[0052] 2.1. Filter out blank lines and lines starting with "#";
[0053] 2.2. Read each remaining line and split the string on spaces;
[0054] 2.3. Map the resulting segments to the subject, predicate, and object to form a triple;
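The parsing in step 2 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `parse_ntriples` is hypothetical, and limiting the split to the first two spaces (so that a literal object containing spaces stays in one piece) is an assumption added here, since the source only says to split on spaces.

```python
def parse_ntriples(text):
    """Parse N-Triples text into (subject, predicate, object) tuples."""
    triples = []
    for raw in text.splitlines():
        line = raw.strip()
        # Step 2.1: skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Drop the " ." that terminates an N-Triples statement.
        if line.endswith("."):
            line = line[:-1].rstrip()
        # Steps 2.2-2.3: split on whitespace. Assumption: at most two
        # splits, so an object literal with spaces remains one segment.
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError(f"malformed line: {raw!r}")
        triples.append(tuple(parts))
    return triples
```

A full N-Triples parser would also need to handle escapes, language tags, and datatypes; this sketch only covers the three sub-steps listed above.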
[0055] 3. Construct a dictionary and replace the terms of each triple with IDs;
[0056] The detailed process is as follows:
[0057] 3.1. Flatten the triples obtained in the previous step and remove duplicate data items;
[0058] 3.2. Assign a unique ID to each data item to obtain the Dictionary;
[0059] 3.3. Extract the header information shared by the data items in the Dictionary to obtain the Header;
[0060] 3.4. Replace the original triple data with the IDs to obtain a set of ID-based triples;
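A minimal Python sketch of steps 3.1, 3.2, and 3.4 is shown below. The function name `build_dictionary`, the choice of integer IDs starting at 1, and the sorted ID order are assumptions made for illustration. Step 3.3 is left out because the source does not say how the Header is derived.

```python
def build_dictionary(triples):
    """Assign an integer ID to each distinct term and ID-ize the triples."""
    # Step 3.1: flatten the triples and drop duplicate terms.
    # Sorting is an assumption; it only makes the IDs deterministic.
    terms = sorted({term for triple in triples for term in triple})
    # Step 3.2: one unique ID per term (starting at 1 is arbitrary).
    term_to_id = {term: i for i, term in enumerate(terms, start=1)}
    # Step 3.4: replace every term with its ID.
    id_triples = [tuple(term_to_id[t] for t in triple) for triple in triples]
    return term_to_id, id_triples
```

Inverting `term_to_id` gives the mapping needed to translate query results back into the original terms.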
[0061] 4. The first step of relation mining merges the triples that share the same subject and predicate; see Step 1 in Figure 1 and Rule 1 in Figure 3;
[0062] 5. The second step of relation mining classifies all triples by subject, merges all predicates and objects of the same subject into a predicate vector and an object vector, then extracts and sorts the predicate vector of each subject; see Step 2 in Figure 1 and Rule 2 in Figure 3;
[0063] 6. The third step of relation mining merges the triples that share the same predicate (predicate vector) and object; see Step 3 in Figure 1 and Rule 3 in Figure 3;
[0064] 7. The fourth step of relation mining classifies all triples by predicate (predicate vector) and merges the subjects and objects of the same predicate (predicate vector) into an internally sorted subject vector and an object matrix. The structure composed of such a subject vector, predicate vector, and object matrix is called a data block; see Step 4 in Figure 1 and Rule 4 in Figure 3;
[0065] 8. Extract the subject vector, predicate vector, and object matrix of each data block to establish the compressed query memory model; see Figure 2 for the model;
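The grouping in steps 4 to 8 can be illustrated with the Python sketch below. It is one possible reading of the rules, since Figures 1 to 3 are not reproduced here. The function name `build_data_blocks` and the dict layout with keys `subjects`, `predicates`, and `objects` are hypothetical. As a simplification, the sketch skips the Rule 3 merge and keeps one object-matrix row per subject, with each cell holding a tuple of object IDs.

```python
from collections import defaultdict

def build_data_blocks(id_triples):
    """Group ID-based triples into data blocks keyed by predicate vector."""
    # Rule 1: merge triples that share the same subject and predicate.
    sp_objects = defaultdict(set)
    for s, p, o in id_triples:
        sp_objects[(s, p)].add(o)

    # Rule 2: collect the predicates and objects of each subject.
    by_subject = defaultdict(dict)
    for (s, p), objs in sp_objects.items():
        by_subject[s][p] = tuple(sorted(objs))

    # Rule 4: group subjects by their sorted predicate vector.
    grouped = defaultdict(list)
    for s, pred_to_objs in by_subject.items():
        pvec = tuple(sorted(pred_to_objs))
        row = [pred_to_objs[p] for p in pvec]  # aligned with pvec
        grouped[pvec].append((s, row))

    blocks = []
    for pvec in sorted(grouped):
        rows = sorted(grouped[pvec], key=lambda item: item[0])
        blocks.append({
            "subjects": [s for s, _ in rows],        # sorted subject vector
            "predicates": list(pvec),                # sorted predicate vector
            "objects": [row for _, row in rows],     # object matrix
        })
    return blocks
```

In this layout, row `i` of `objects` belongs to `subjects[i]` and column `j` belongs to `predicates[j]`, so subjects with identical predicate sets store that predicate vector only once.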
[0066] 9. Execute SPARQL queries in the compressed state; query operations can be performed concurrently on all data blocks;
[0067] The detailed process is as follows:
[0068] 9.1. Subject query: traverse the subject vectors of all data blocks. Because each subject vector is internally sorted, binary search is used, so the time complexity of a subject query is O(log₂ n);
[0069] 9.2. Predicate query: traverse the predicate vectors of all data blocks. Because each predicate vector is internally sorted, binary search is used, so the time complexity of a predicate query is O(log₂ n);
[0070] 9.3. Object query: traverse the object matrices of all data blocks. Because an object matrix is not internally sorted, it can only be searched sequentially, with time complexity O(n). A data block with a particularly large object matrix can be divided into smaller blocks to reduce the time overhead of the linear traversal; see Figure 4;
[0071] 9.4. Complex queries: every complex query can be decomposed into the three simple queries above, and the results of the simple queries are then combined.
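The three simple queries can be sketched in Python as follows. The sketch assumes a hypothetical block layout: each data block is a dict with a sorted `subjects` list, a sorted `predicates` list, and an `objects` matrix whose cell at row `i`, column `j` is a tuple of object IDs for `subjects[i]` and `predicates[j]`. All function names are illustrative. The blocks are visited sequentially here, although the source notes they can be queried concurrently, and `split_block` is only one plausible way to divide a large block, since Figure 4 is not available.

```python
from bisect import bisect_left

def find_sorted(vector, key):
    """Binary search in a sorted vector; return the index or -1."""
    i = bisect_left(vector, key)
    return i if i < len(vector) and vector[i] == key else -1

def query_subject(blocks, s):
    """Step 9.1: return (predicate, object) pairs for subject s."""
    out = []
    for b in blocks:
        row = find_sorted(b["subjects"], s)
        if row >= 0:
            for p, objs in zip(b["predicates"], b["objects"][row]):
                out.extend((p, o) for o in objs)
    return out

def query_predicate(blocks, p):
    """Step 9.2: return (subject, object) pairs for predicate p."""
    out = []
    for b in blocks:
        col = find_sorted(b["predicates"], p)
        if col >= 0:
            for s, row in zip(b["subjects"], b["objects"]):
                out.extend((s, o) for o in row[col])
    return out

def query_object(blocks, o):
    """Step 9.3: linear scan; return (subject, predicate) pairs for object o."""
    out = []
    for b in blocks:
        for s, row in zip(b["subjects"], b["objects"]):
            for p, objs in zip(b["predicates"], row):
                if o in objs:
                    out.append((s, p))
    return out

def split_block(block, max_rows):
    """Divide one block into blocks of at most max_rows subjects each."""
    return [
        {"subjects": block["subjects"][i:i + max_rows],
         "predicates": block["predicates"],
         "objects": block["objects"][i:i + max_rows]}
        for i in range(0, len(block["subjects"]), max_rows)
    ]
```

Because `split_block` slices the subject vector and the object matrix at the same row boundaries, each resulting block keeps its subject vector sorted, and the three query functions work on the smaller blocks unchanged.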
[0072] 10. Write the model to a file serially: set the storage length of every ID to be the same, and use the auxiliary symbols "|" (the "or" identifier), "," (the subject-predicate-object separator), and "/" (the data block separator) to implement serialization and deserialization.
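The source does not give the exact file layout, so the Python sketch below is only one possible encoding built from the three auxiliary symbols. The assumptions are: IDs are zero-padded decimal strings of width `WIDTH`; "," separates the subject, predicate, and object sections of a block; "/" separates blocks; and "|" joins several object IDs that share one matrix cell. The block dict layout (`subjects`, `predicates`, `objects`) and all function names are hypothetical.

```python
WIDTH = 4  # assumed fixed storage length of every ID, in characters

def _encode(ids):
    """Concatenate IDs as fixed-width, zero-padded decimal strings."""
    return "".join(f"{i:0{WIDTH}d}" for i in ids)

def _decode(section):
    """Cut a section into fixed-width IDs."""
    return [int(section[i:i + WIDTH]) for i in range(0, len(section), WIDTH)]

def serialize(blocks):
    parts = []
    for b in blocks:
        # Cells are written row by row; "|" joins the IDs inside one cell.
        cells = ["|".join(f"{o:0{WIDTH}d}" for o in cell)
                 for row in b["objects"] for cell in row]
        parts.append(",".join(
            [_encode(b["subjects"]), _encode(b["predicates"]), "".join(cells)]))
    return "/".join(parts)

def deserialize(data):
    blocks = []
    for part in data.split("/"):
        subj, pred, obj = part.split(",")
        subjects, predicates = _decode(subj), _decode(pred)
        # Walk the object section: a "|" after an ID continues the cell.
        cells, i = [], 0
        while i < len(obj):
            cell = [int(obj[i:i + WIDTH])]
            i += WIDTH
            while i < len(obj) and obj[i] == "|":
                cell.append(int(obj[i + 1:i + 1 + WIDTH]))
                i += 1 + WIDTH
            cells.append(tuple(cell))
        n = len(predicates)
        objects = [cells[r * n:(r + 1) * n] for r in range(len(subjects))]
        blocks.append(
            {"subjects": subjects, "predicates": predicates, "objects": objects})
    return blocks
```

Fixed-width IDs are what allow the vectors to be stored without a separator between neighboring IDs. In this sketch any ID with more than `WIDTH` digits would break decoding, so a real implementation would choose the width from the dictionary size.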
[0073] The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the present invention pertains can make various modifications or additions to the described embodiments, or substitute similar alternatives, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.

