Large-scale social network service-oriented graph data storage and query method

A graph data storage and query technology for social networks, classified under memory addressing/allocation/relocation. It addresses three problems: the server program cannot proxy requests for different graph data, graph computation cannot be supported effectively, and graph databases have difficulty accessing and updating graph data efficiently.

Active Publication Date: 2015-09-09
INST OF INFORMATION ENG CHINESE ACAD OF SCI +1

AI Technical Summary

Problems solved by technology

Redis leaves data partitioning to the client, so the server program cannot proxy requests for different graph data.
[0009] The research work above shows that the data access requirements of graph computing and the characteristics of graph data pose challenges for graph data storage. Traditional storage methods, such as file systems, distributed file systems and graph databases built on the Key-Value storage model, have difficulty supporting graph data access and updates efficiently, and therefore cannot effectively support graph computation.



Examples


Example 1

[0063] Example 1: data access

[0064] The write performance of the present invention and of the comparison systems Redis and Neo4j was tested under different levels of concurrency. Each reported value is the average over ten data sets. In Figure 3, the present invention is labeled NYNN.

[0065] As Figure 3 shows, the local write performance of the present invention is clearly better than that of Redis and Neo4j. The metadata design, the data transmission format and the data writing mechanism all affect write performance. First, the present invention partitions graph data into contiguous, equal-width vertex intervals. The metadata can be cached locally on the first data write request, and the locally cached metadata is then used for data addressing, which reduces addressing I/O overhead. When writing data, write it to the local memory-mapped f...
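The addressing step described in [0065] can be illustrated with a minimal sketch. It assumes that a vertex ID maps to a server by integer division over a fixed interval width; the names `PartitionMetadata`, `MetadataService` and `Client` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionMetadata:
    """Hypothetical metadata: interval width plus the owner of each interval."""
    interval_width: int
    servers: tuple  # servers[i] owns vertex IDs [i * width, (i + 1) * width)


class MetadataService:
    """Stand-in for the remote metadata holder; counts how often it is asked."""

    def __init__(self, metadata):
        self._metadata = metadata
        self.lookups = 0

    def fetch(self):
        self.lookups += 1
        return self._metadata


class Client:
    """Caches metadata on first use, then resolves addresses locally."""

    def __init__(self, service):
        self._service = service
        self._cached = None

    def locate(self, vertex_id):
        if self._cached is None:
            # Only the first request pays for a metadata lookup.
            self._cached = self._service.fetch()
        meta = self._cached
        index = vertex_id // meta.interval_width
        if not 0 <= index < len(meta.servers):
            raise KeyError(f"vertex {vertex_id} is outside every interval")
        return meta.servers[index]
```

Because the intervals are contiguous and of equal width, the owner of a vertex is found with one division and no further metadata round trips, which is the saving in addressing I/O that the paragraph attributes to local caching.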

Example 2

[0074] Example 2: data update

[0075] The data update performance of the present invention and of the comparison systems Redis and Neo4j was tested. Each reported value is the average over ten data sets.

[0076] As shown in Figure 8, the present invention uses chained allocation for incremental data updates, which eliminates the frequent data movement otherwise caused by insertions and deletions. Redis and Neo4j, by contrast, use sequential allocation: as data is updated, it has to be moved and relocated, and new queries cannot be processed while this is in progress.
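The chained allocation idea in [0076] can be sketched as fixed-size blocks linked into a doubly linked list. This is an illustration only, not the patent's actual layout. It assumes edges arrive in timestamp order, and the class names and the block capacity are hypothetical.

```python
class Block:
    """One fixed-capacity block of (timestamp, target) edges."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.edges = []
        self.prev = None
        self.next = None


class ChainedNeighbourhood:
    """Edges of one vertex, kept in a doubly linked chain of blocks."""

    def __init__(self, block_capacity=4):
        self.block_capacity = block_capacity
        self.head = None
        self.tail = None

    def add_edge(self, timestamp, target):
        # A full tail block is left untouched; a new block is linked instead.
        if self.tail is None or len(self.tail.edges) >= self.block_capacity:
            block = Block(self.block_capacity)
            block.prev = self.tail
            if self.tail is None:
                self.head = block
            else:
                self.tail.next = block
            self.tail = block
        self.tail.edges.append((timestamp, target))

    def remove_edge(self, target):
        block = self.head
        while block is not None:
            for i, (_, t) in enumerate(block.edges):
                if t == target:
                    del block.edges[i]  # shifts entries inside this block only
                    if not block.edges:
                        self._unlink(block)
                    return True
            block = block.next
        return False

    def _unlink(self, block):
        if block.prev is None:
            self.head = block.next
        else:
            block.prev.next = block.next
        if block.next is None:
            self.tail = block.prev
        else:
            block.next.prev = block.prev

    def edges(self):
        block = self.head
        while block is not None:
            yield from block.edges
            block = block.next

    def block_count(self):
        count, block = 0, self.head
        while block is not None:
            count, block = count + 1, block.next
        return count
```

An insertion never relocates existing blocks, and a deletion touches a single block, so other blocks stay where they are while the neighbourhood grows or shrinks.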

Example 3

[0077] Example 3: remote data access

[0078] This experiment tests the performance difference of the present invention with and without the prefetch mechanism. The present invention provides several data prefetch mechanisms, so that an upper-layer application can select the prefetch mode that best fits its own workload. This experiment uses BFS prefetching. The algorithm used in the experiment counts the total number of edges in the graph data set. Its idea is as follows: divide the vertices of the graph into n slices and assign each slice a thread for processing; all threads access the graph data in the present invention in parallel. Each thread puts the vertices to be visited into a queue, then processes the vertex at the head of the queue, visits its neighbourhood, and counts the number of vertices and edges. If the end point of the current edge is located in the shard to which the thre...
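The sliced, queue-driven counting described in [0078] can be sketched as follows. This is a simplified illustration over an in-memory adjacency dictionary, not the experiment's code. The paragraph is truncated before it explains how edges ending in another shard are handled, so the sketch assumes each edge is counted once at its source vertex. It does not model prefetching, and all function names are hypothetical.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(vertices, n):
    """Split the sorted vertex IDs into at most n contiguous slices."""
    ordered = sorted(vertices)
    size = max(1, -(-len(ordered) // n))  # ceiling division, at least 1
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def count_slice(adjacency, slice_vertices):
    """Process one slice with a queue; return (vertex count, edge count)."""
    queue = deque(slice_vertices)
    vertices = edges = 0
    while queue:
        v = queue.popleft()  # take the vertex at the head of the queue
        vertices += 1
        edges += len(adjacency.get(v, ()))  # visit its neighbourhood
    return vertices, edges


def count_graph(adjacency, n_slices):
    """Count vertices and edges using one worker thread per slice."""
    slices = split_into_slices(adjacency.keys(), n_slices)
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        results = list(pool.map(lambda s: count_slice(adjacency, s), slices))
    return sum(r[0] for r in results), sum(r[1] for r in results)
```

For a directed graph stored as `{1: [2, 3], 2: [3], 3: [], 4: [1]}`, `count_graph(adjacency, 2)` returns `(4, 4)`.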



Abstract

The invention discloses a graph data storage and query method for large-scale social network services. A data storage manager stores received graph data in Key-Value form, with the vertex ID of the graph data as the Key and the vertex neighbourhood as the Value. For each vertex neighbourhood, the edges connected to the vertex are stored in timestamp order in fixed-size memory blocks that form a doubly linked list, and the attribute information and index information of the vertex are stored in a data structure. When the data storage manager receives a request to access a vertex v, it sends v and the k-order neighbourhood of v to the requester. The requester caches the returned data locally. On the next query it first checks the local cache, and it sends an access request to the data storage manager only if the queried vertex is not found there. The method suits dynamically updated data and handles sparse data and random access.
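The request path in the abstract can be illustrated with a minimal sketch. It assumes the k-order neighbourhood means every vertex within k hops of v, and it stands in for the data storage manager with an in-memory dictionary; `StorageManager` and `Requester` are hypothetical names.

```python
from collections import deque


class StorageManager:
    """Key-Value store: vertex ID -> list of neighbour IDs."""

    def __init__(self, adjacency):
        self._adjacency = adjacency
        self.requests = 0

    def fetch(self, vertex, k):
        """Return the neighbourhoods of `vertex` and all vertices within k hops."""
        self.requests += 1
        result = {}
        seen = {vertex}
        frontier = deque([(vertex, 0)])
        while frontier:
            v, depth = frontier.popleft()
            result[v] = list(self._adjacency.get(v, ()))
            if depth < k:
                for neighbour in result[v]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        frontier.append((neighbour, depth + 1))
        return result


class Requester:
    """Checks its local cache first and contacts the manager only on a miss."""

    def __init__(self, manager, k):
        self._manager = manager
        self._k = k
        self._cache = {}

    def neighbours(self, vertex):
        if vertex not in self._cache:
            self._cache.update(self._manager.fetch(vertex, self._k))
        return self._cache[vertex]
```

With k = 1 on the chain 1 → 2 → 3 → 4, asking for vertex 1 also caches vertex 2, so a later query for vertex 2 is answered locally and only the query for vertex 3 reaches the manager again.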

Description

Technical field

[0001] The invention relates to a graph data storage and query method for large-scale social networks, and belongs to the field of software technology.

Background

[0002] At present, the mainstream approach to graph data storage is to preprocess the graph data into edge and vertex records and store them as sequential data sets in large files on a distributed file system. To access the graph data, these large files are read by sequential scan. This organization cannot deliver effective storage and access performance for graph computing applications that run many iterations. To improve graph data access performance, in-memory management of graph data has become an important trend, as in systems such as Trinity and Giraph.

[0003] Neo4j is a graph database using the Key-Value storage model. The most basic storage units are vertices and edges. When the breadth traverses BFS...


Application Information

Patent Type & Authority Applications(China)
IPC IPC(8): G06F12/06
Inventor 周薇包秀国马宏远程工冉攀峰刘春阳王卿韩冀中庞琳李雄贺敏刘玮
Owner INST OF INFORMATION ENG CHINESE ACAD OF SCI