The invention discloses a method for partitioning and parallel distributed
processing of super-large-scale
RDF graph data. The method comprises the following steps: preprocessing the original
RDF graph data to generate a corresponding hash dictionary file and reshaped three-column
list data, and converting the reshaped three-column
list data into an incidence matrix M; creating a
hypergraph model of the matrix M, wherein the subjects, predicates and objects in M are the hyperedges of the
hypergraph model, and the data associated with the hyperedges are the hyperedge data; determining whether the
RDF graph data is a connected graph or a disconnected graph; if it is disconnected, partitioning it into a plurality of connected subgraphs; on the basis of the
hypergraph model and concurrent breadth-first traversal, placing the hyperedge data equidistantly along a path, classifying and
sorting the hyperedge data, and evenly partitioning the hyperedge data into K portions to be placed on K slave nodes; and establishing mapping relationships between the hyperedge data and the slave nodes. The method
offers fast partitioning, high partitioning quality, balanced data and task
loads, highly parallel query processing, and fast query processing.
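
The overall flow of the steps above can be illustrated with a minimal Python sketch. It is not the disclosed procedure: the abstract gives no algorithmic detail, so the sketch makes several simplifying assumptions. It treats the data as an ordinary undirected graph over subjects and objects rather than as a hypergraph, it uses a sequential rather than concurrent breadth-first traversal, and it replaces the classifying and sorting step with a plain sort by traversal order. The function names (encode_triples, connected_components, partition) and the in-memory dictionary are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def encode_triples(triples):
    """Assign an integer ID to each distinct term (a stand-in for the hash
    dictionary file) and return the dictionary plus the ID-encoded triples."""
    dictionary = {}
    encoded = []
    for s, p, o in triples:
        ids = tuple(dictionary.setdefault(t, len(dictionary)) for t in (s, p, o))
        encoded.append(ids)
    return dictionary, encoded


def connected_components(encoded):
    """Split the graph into connected components with a sequential BFS.
    Assumption: subjects and objects are nodes, each triple is an undirected edge."""
    adj = defaultdict(set)
    for s, _, o in encoded:
        adj[s].add(o)
        adj[o].add(s)
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        order, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)  # nodes are recorded in BFS visiting order
            for nxt in sorted(adj[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(order)
    return components


def partition(encoded, k):
    """Order triples by the BFS rank of their subject and object, cut the
    ordered list into k contiguous, near-equal parts, and record which
    part (slave node index) holds each triple."""
    rank = {}
    for component in connected_components(encoded):
        for node in component:
            rank[node] = len(rank)
    ordered = sorted(encoded, key=lambda t: (rank[t[0]], rank[t[2]], t[1]))
    base, extra = divmod(len(ordered), k)
    parts, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` parts get one more
        parts.append(ordered[start:start + size])
        start += size
    placement = {t: i for i, part in enumerate(parts) for t in part}
    return parts, placement


if __name__ == "__main__":
    sample = [
        ("a", "knows", "b"),
        ("b", "knows", "c"),
        ("x", "likes", "y"),
        ("y", "likes", "z"),
    ]
    dictionary, encoded = encode_triples(sample)
    parts, placement = partition(encoded, 2)
    for node_index, part in enumerate(parts):
        print(node_index, part)
```

Keeping the parts contiguous in traversal order tends to place neighbouring triples on the same slave node, which is one plausible reading of how balanced loads and fast queries could be obtained together. The placement dictionary plays the role of the mapping between data and slave nodes.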