
Knowledge graph distributed mass data importing method based on load balancing

This load balancing and knowledge graph technology, applied in the field of knowledge graph data import, addresses the problems of difficult parallelization, low efficiency, and inability to keep up with the explosive growth of massive data, and achieves the effect of improving parallelism and import efficiency.

Pending Publication Date: 2022-04-08
PEKING UNIV

AI Technical Summary

Problems solved by technology

[0008] The first JanusGraph data import solution is extremely inefficient: importing billion-scale data may take several days, which cannot keep up with the explosive growth of data in many application scenarios. The second solution, embedded JanusGraph, opens an embedded JanusGraph graph instance from within a JVM-based user application. In this case JanusGraph is part of the user application; once the application is closed, the graph instance is also deleted, so data persistence cannot be achieved. The third solution mainly writes in batches to Cassandra, one of JanusGraph's storage backends. However, besides Cassandra, JanusGraph also supports storage backends such as HBase and BerkeleyDB. HBase, a top-level Apache project, provides highly reliable, high-performance, column-oriented, scalable storage with real-time reads and writes for massive data and is widely used, but the third solution cannot support all storage backends, such as HBase. In addition, none of the three solutions takes into account that a super node cannot be split and is therefore difficult to import in parallel, which degrades data import performance.



Examples


Embodiment Construction

[0042] The present invention will be further described below in conjunction with specific embodiments and the accompanying drawings.

[0043] Figure 1 shows the flow chart of the load-balancing-based knowledge graph distributed massive data import method according to the present invention, which includes the following steps:

[0044] S1. Build a Spark distributed computing cluster. Specific steps include:

[0045] S11. Install the Docker images of components such as HBase, Elasticsearch (ES), Spark, JanusGraph, HDFS, and YARN;

[0046] S12. Create a Docker network so that all containers in the distributed cluster run on the same Docker network;

[0047] S13. Write the docker-compose.yml file, configure the dependency order of container startup, and build the entire service with docker-compose.
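
The startup dependency order mentioned in S13 can be illustrated with a small sketch. It is not taken from the patent: the service names mirror the components listed in S11, but the dependency edges in `DEPENDS_ON` and the helper `startup_order` are assumptions made only for illustration. The sketch derives one valid start order from a dependency map, which is the kind of ordering a `depends_on` configuration in docker-compose.yml would express.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each service lists the services that must
# start before it. The edges are assumptions, not taken from the patent.
DEPENDS_ON = {
    "hdfs": [],
    "yarn": ["hdfs"],
    "hbase": ["hdfs"],
    "elasticsearch": [],
    "spark": ["yarn"],
    "janusgraph": ["hbase", "elasticsearch"],
}


def startup_order(depends_on):
    """Return one valid start order in which every service follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = startup_order(DEPENDS_ON)
print(order)
```

Several orders can satisfy the same constraints, so the printed list is only one valid possibility; what matters is that each service appears after everything it depends on.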

[0048] S2. Resolve the JAR dependency conflicts and version conflicts between JanusGraph and Spark, and use the SparkGraphComputer interface for connection testing. The specific method is:

[0049] ...



Abstract

The invention relates to a knowledge graph distributed mass data import method based on load balancing, and belongs to the technical field of knowledge graph data import. The method comprises the following steps: S1, building a Spark distributed computing cluster; S2, resolving the JAR dependency conflicts and version conflicts between JanusGraph and Spark, and carrying out a connection test using the SparkGraphComputer interface; S3, determining the input file format that Spark processes in batch graph data import operations, and generating data in that file format; S4, adjusting the resource allocation of the worker nodes of the Spark cluster according to data import integrity and data import speed, and optimizing the import speed; and S5, segmenting and load balancing the super nodes in the data, and accelerating the import of super node data. With the method provided by the invention, data import efficiency can be greatly improved through distributed computing, and the import of super node data is parallelized through a load balancing method based on node segmentation, so that super node data is ultimately imported efficiently.
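
The idea behind step S5 can be sketched in a few lines. The text available here does not say how the patent segments super nodes or distributes the pieces, so everything below is an assumption for illustration: the threshold `MAX_EDGES`, the helpers `split_super_node`, `balance`, and `worker_loads`, and the greedy largest-first assignment to the least-loaded worker. The sketch only shows why splitting a high-degree node's edge list lets its import be spread over several workers.

```python
import heapq

MAX_EDGES = 4  # hypothetical threshold: a chunk holds at most this many edges


def split_super_node(node_id, edges, max_edges):
    """Split one node's edge list into chunks of at most max_edges edges."""
    return [(node_id, edges[i:i + max_edges])
            for i in range(0, len(edges), max_edges)]


def balance(chunks, num_workers):
    """Greedy assignment: largest chunk first, to the least-loaded worker."""
    heap = [(0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    for chunk in sorted(chunks, key=lambda c: len(c[1]), reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(chunk)
        heapq.heappush(heap, (load + len(chunk[1]), worker))
    return assignment


def worker_loads(assignment):
    """Total number of edges assigned to each worker."""
    return {w: sum(len(edges) for _, edges in chunks)
            for w, chunks in assignment.items()}


# Toy data: "hub" is a super node with 10 edges; "a" and "b" are ordinary nodes.
adjacency = {"hub": list(range(10)), "a": [1, 2], "b": [3]}

chunks = []
for node, edges in adjacency.items():
    chunks.extend(split_super_node(node, edges, MAX_EDGES))

assignment = balance(chunks, num_workers=3)
loads = worker_loads(assignment)
print(loads)
```

Without the split, all 10 edges of `hub` would land on a single worker; with it, no chunk exceeds `MAX_EDGES`, so the per-worker loads stay close to each other.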

Description

Technical field

[0001] The invention belongs to the technical field of knowledge graph data import, and in particular relates to a load-balancing-based method for importing distributed mass data into knowledge graphs.

Background technology

[0002] A graph database (GDB) is a database that uses a graph structure for semantic queries; it uses nodes, edges, and attributes to represent and store data. The key concept of the system is the graph, which directly relates stored data items to a collection of nodes and of edges representing the relationships between nodes. A graph database is a non-relational database designed to address the limitations of existing relational databases. In relational databases in particular, complex multi-table join queries are required in most application scenarios (for example, searching for the friends of a user's friends). A large number of multi-table joint query operations, when there are many records in the table, the calculation amount...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/36, G06F8/61, G06F16/182, G06F11/36, G06F9/50
Inventor: 王亚沙, 赵俊峰, 徐涌鑫, 杨恺, 单中原, 王子健, 尹思菁
Owner: PEKING UNIV