An index data compression method for web graphs

An index data compression technique, applied in the field of big data processing, that addresses the high memory usage caused by a large index ratio and improves the compression ratio, the cache hit rate, and the decompression speed.

Active Publication Date: 2019-01-22
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] In view of the defects of the prior art, the purpose of the present invention is to solve the technical problem that index data makes up a relatively large share of a compressed web graph in the prior art, which causes excessive memory usage when web graphs are processed in parallel in memory.



Examples


Embodiment Construction

[0026] In order to make the object, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it.

[0027] A web graph (webgraph) is a large-scale graph composed of web pages and the hyperlinks between them; it describes the link relationships between web pages on the World Wide Web. Web graphs have many practical applications. For example, search engines use web graphs to calculate the PageRank value of each web page and then order the pages in the search results shown to users according to those PageRank values; in web page content analysis, web graphs are used to detect similar topics; in the HITS algorithm (also a webpage va...


Abstract

The invention discloses an index data compression method for web graphs. The index data (degrees and displacements, i.e. offsets) are divided into blocks of hundreds to thousands of nodes. Most of these blocks contain only low-degree nodes, so the degrees and displacement differences in them can usually be stored in one or two bytes. The index data therefore shrinks by more than 50% (from the previous four bytes per value to one or two bytes), which improves compressibility. The degree and displacement codewords are stored interleaved, so the two codewords of the same node fall in the same cache line with high probability, which greatly improves the cache hit rate. Fixed-length encoding provides true random access: the position of a node's compressed data can be computed directly from the node's subscript, which improves decompression speed.
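The scheme in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Block`, `compress_index`, and `lookup`, the per-block header layout, the little-endian byte order, and the choice of separate byte widths for degrees and displacement differences are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


def _width(max_value: int) -> int:
    """Smallest number of whole bytes (at least 1) that can hold max_value."""
    return max(1, (max_value.bit_length() + 7) // 8)


@dataclass
class Block:
    # Hypothetical per-block header; the field names are illustrative.
    base_offset: int  # displacement of the block's first node, kept uncompressed
    deg_width: int    # bytes per degree codeword in this block
    off_width: int    # bytes per displacement-difference codeword in this block
    data: bytes       # interleaved codewords: deg0, diff0, deg1, diff1, ...


def compress_index(degrees: List[int], block_size: int) -> List[Block]:
    """Split the index into blocks and encode each block with fixed widths."""
    blocks = []
    offset = 0  # running displacement into the edge array
    for start in range(0, len(degrees), block_size):
        chunk = degrees[start:start + block_size]
        base = offset
        diffs = []
        for d in chunk:
            diffs.append(offset - base)  # displacement relative to block start
            offset += d
        dw = _width(max(chunk))
        ow = _width(max(diffs))
        buf = bytearray()
        for d, diff in zip(chunk, diffs):
            # Interleaving keeps both codewords of a node next to each other.
            buf += d.to_bytes(dw, "little")
            buf += diff.to_bytes(ow, "little")
        blocks.append(Block(base, dw, ow, bytes(buf)))
    return blocks


def lookup(blocks: List[Block], block_size: int, node: int) -> Tuple[int, int]:
    """Return (degree, displacement) of a node by computing its position."""
    block = blocks[node // block_size]
    stride = block.deg_width + block.off_width
    pos = (node % block_size) * stride  # fixed-length codewords: no scanning
    degree = int.from_bytes(block.data[pos:pos + block.deg_width], "little")
    diff = int.from_bytes(block.data[pos + block.deg_width:pos + stride], "little")
    return degree, block.base_offset + diff
```

For the degree list `[3, 1, 0, 2, 300, 5, 1, 1]` with `block_size=4`, the first block holds only low-degree nodes and uses one byte per codeword, while the second block contains a node of degree 300 and falls back to two bytes. Either way, `lookup` finds a node with one division, one modulo, and one multiplication.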

Description

Technical field

[0001] The invention belongs to the field of big data processing, and more specifically relates to a method for compressing index data of web graphs.

Background technique

[0002] A web graph is a large-scale graph composed of web pages and hyperlinks between them. It describes the link relationship between web pages on the World Wide Web. It has the following characteristics: First, the scale of the web graph is very large. Every crawlable web page on the World Wide Web may be a node in the web graph, and each hyperlink on each web page may be an edge in the web graph. From this, it can be imagined that the scale of the web graph is very large; secondly, the web graph is very sparse, that is, the average degree of nodes in the web graph is relatively low; finally, the degree of the web graph is distributed in a power law. On the one hand, the single-machine memory space is limited, and the scale of the web graph grows rapidly, which limits the scope of use ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/958, G06F16/901
Inventor: 王芳, 冯丹, 张永选
Owner HUAZHONG UNIV OF SCI & TECH