Partitioning and parallel distribution processing method of super-large scale RDF graph data

A distributed-processing technique for ultra-large-scale data, applied in the field of big data processing. It addresses the long partitioning time, low partitioning quality, and unbalanced task load of existing methods, and achieves fast partitioning with high partitioning quality.

Active Publication Date: 2015-07-29
HUAZHONG UNIV OF SCI & TECH
Cites: 6 | Cited by: 17

AI Technical Summary

Problems solved by technology

[0005] In view of the above deficiencies of the prior art and the need for improvement, the present invention provides a method and system for the partitioning and parallel distribution processing of ultra-large-scale graph data. The hyperedge data is divided equally along the path, so that both the uniformity of data distribution and the balance of task load are taken into account. By using bit-block transmission and pipeline processing, the method addresses the long partitioning time, low partitioning quality, and unbalanced task load of existing partitioning methods.


Examples


Detailed Description of the Embodiments

[0073] To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the present invention, not to limit it. In addition, the technical features of the various embodiments described below can be combined with one another as long as they do not conflict.

[0074] As shown in Figure 1, the partitioning and parallel distribution processing method for ultra-large-scale RDF graph data of the present invention comprises the following steps:

[0075] (1) Preprocess the original RDF graph data, generate the corresponding hash dictionary file and integer-encoded three-column (triple) table data, and convert the integer-encoded three-column table data into an association matrix M;

[0076] (2...
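
The preprocessing in step (1) can be illustrated with a minimal sketch. It assumes that the "hash dictionary" maps each RDF term to an integer ID and that the three-column table stores triples as integer IDs. The function name `encode_triples` and the sequential ID scheme are hypothetical, and the construction of the association matrix M is left out because its layout is not given in this excerpt.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
EncodedTriple = Tuple[int, int, int]


def encode_triples(triples: List[Triple]) -> Tuple[Dict[str, int], List[EncodedTriple]]:
    """Dictionary-encode (subject, predicate, object) string triples.

    Assumption: IDs are assigned sequentially in first-seen order; the patent's
    actual hashing scheme is not described in this excerpt.
    """
    dictionary: Dict[str, int] = {}
    encoded: List[EncodedTriple] = []
    for s, p, o in triples:
        # setdefault returns the existing ID, or stores and returns the next free one.
        s_id = dictionary.setdefault(s, len(dictionary))
        p_id = dictionary.setdefault(p, len(dictionary))
        o_id = dictionary.setdefault(o, len(dictionary))
        encoded.append((s_id, p_id, o_id))
    return dictionary, encoded
```

Working on integer IDs instead of long IRI strings keeps the triple table compact, which is the usual motivation for this kind of dictionary encoding.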


Abstract

The invention discloses a partitioning and parallel distribution processing method for super-large-scale RDF graph data. The method comprises the following steps: preprocessing the original RDF graph data, generating a corresponding hash dictionary file and integer-encoded three-column table data, and converting the three-column table data into an association matrix M; creating a hypergraph model of the association matrix M, in which the subjects, predicates and objects of M are hyperedges and the data associated with a hyperedge is hyperedge data; judging whether the RDF graph data is a connected or a disconnected graph and, if it is disconnected, partitioning it into a plurality of connected graphs; on the basis of the hypergraph model and concurrent breadth-first traversal, placing the hyperedge data equidistantly on a path, classifying and ranking the hyperedge data, and partitioning it uniformly into K portions that are placed on K slave nodes; and establishing the mapping relationships between the hyperedge data and the slave nodes. The method partitions quickly and with high quality, balances data and task loads, and offers highly parallel, fast query processing.
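
The traversal and even-split steps in the abstract can be approximated with a simplified sketch. It is not the patented method: it uses a plain graph over subject and object IDs instead of the hypergraph model, ignores predicates as connectors, and runs a sequential rather than concurrent breadth-first traversal. The function names `bfs_order_triples` and `split_evenly` are hypothetical. Restarting the traversal at every unvisited term is how this sketch handles a disconnected graph.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

EncodedTriple = Tuple[int, int, int]


def bfs_order_triples(triples: List[EncodedTriple]) -> List[int]:
    """Return triple indices in breadth-first discovery order, over all components."""
    incident = defaultdict(list)  # term ID -> indices of triples touching that term
    for idx, (s, _p, o) in enumerate(triples):
        incident[s].append(idx)
        incident[o].append(idx)

    seen_terms, seen_triples, order = set(), set(), []
    for start in incident:  # one restart per connected component
        if start in seen_terms:
            continue
        seen_terms.add(start)
        queue = deque([start])
        while queue:
            term = queue.popleft()
            for idx in incident[term]:
                if idx in seen_triples:
                    continue
                seen_triples.add(idx)
                order.append(idx)
                s, _p, o = triples[idx]
                for nxt in (s, o):
                    if nxt not in seen_terms:
                        seen_terms.add(nxt)
                        queue.append(nxt)
    return order


def split_evenly(order: List[int], k: int) -> Dict[int, int]:
    """Map each triple index to one of k slave nodes; chunk sizes differ by at most 1."""
    base, extra = divmod(len(order), k)
    mapping: Dict[int, int] = {}
    pos = 0
    for node in range(k):
        size = base + (1 if node < extra else 0)
        for idx in order[pos:pos + size]:
            mapping[idx] = node
        pos += size
    return mapping
```

Cutting the traversal order into contiguous chunks keeps neighbouring triples on the same node where possible, while the near-equal chunk sizes keep the data load balanced.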

Description

Technical field

[0001] The invention belongs to the field of big data processing, and more specifically relates to a method for the partitioning and parallel distribution processing of ultra-large-scale RDF graph data.

Background

[0002] The Resource Description Framework (RDF) is the core of the Semantic Web architecture and is widely used to describe information resources on the Internet. As RDF data continues to grow, a single machine can no longer process it, so the data must be partitioned across multiple machines for processing.

[0003] For partitioning ultra-large-scale RDF graph data, commonly used methods include heuristic partitioning and parallel hierarchical partitioning. A heuristic method generally defines an objective function and then partitions in the direction that optimizes it, but choosing the objective function is difficult. For parallel hierarchical partitioning, i...


Application Information

IPC(8): G06F17/30
Inventor: 袁平鹏, 金海, 谢昌凤, 罗毅
Owner: HUAZHONG UNIV OF SCI & TECH