Spark-based diversity graph sorting method for large-scale graph data

A diversity graph sorting technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the inability of existing methods to effectively process large-scale graph data, and offers fast processing, an intuitive model, good scalability, and high efficiency.

Active Publication Date: 2017-02-01
YUNNAN UNIV

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to overcome the deficiencies of the prior art by providing a Spark-based diversity graph sorting method for large-scale graph data. The method addresses the inability of existing diversity graph sorting techniques to effectively process large-scale graph data, and provides technical support for diversity graph sorting of large-scale graph data and its applications.


Examples

Embodiment

[0040] The Spark-based diversity graph sorting method for large-scale graph data of the present invention comprises two major steps: (1) a calculation preparation part, whose main function is, first, to run personalized PageRank to obtain the set of relevant nodes and the personalized PageRank value of each node (abbreviated as ppr), and second, to collect the neighbor information of the nodes on the graph, which lays the foundation for calculating the distance between nodes; (2) a calculation implementation part, whose main function is to use the ppr-weighted distance values between nodes to obtain, through k iterations, a top-k node ranking that combines relevance and diversity.
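The calculation implementation part can be illustrated with a small single-machine sketch. It is not the patented Spark procedure: the scoring rule (ppr multiplied by the hop distance to the nearest already selected node), the cap used for unreachable nodes, and the names `hop_distances` and `diversified_top_k` are all assumptions made for illustration, since the excerpt does not give the exact formula.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first hop counts from source over an adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diversified_top_k(adj, ppr, k, unreachable=None):
    """Greedy selection in at most k iterations.

    Assumed score: ppr[v] * (hop distance from v to the nearest
    node already selected). The first pick uses ppr alone.
    """
    if unreachable is None:
        unreachable = len(adj)  # assumed cap for disconnected pairs
    candidates = set(ppr)
    selected = []
    min_dist = {}  # candidate -> distance to nearest selected node

    for _ in range(min(k, len(candidates))):
        def score(v):
            weight = min_dist[v] if selected else 1
            return ppr[v] * weight

        # Iterating in sorted order makes ties resolve deterministically.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)

        # Update each remaining candidate's distance to the selected set.
        d = hop_distances(adj, best)
        for v in candidates:
            min_dist[v] = min(min_dist.get(v, unreachable),
                              d.get(v, unreachable))
    return selected
```

On the path graph b - a - c - d with hypothetical ppr values a = 0.4, b = 0.3, c = 0.05, d = 0.25, a call with k = 2 selects a first and then d: d has a lower ppr than b but lies two hops from a, so its weighted score is higher. This is the intended trade-off between relevance and diversity.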

[0041] The present invention is described in detail below with reference to an example. As shown in Figure 1, it specifically includes the following steps:

[0042] (1) Obtain the query-related node set of person...
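The personalized PageRank computation named in the preparation step can be sketched as a plain power iteration. This is a minimal single-machine illustration, not the Spark implementation described in the patent; the restart probability of 0.15, the fixed iteration count, the handling of nodes without outgoing edges, and the function name `personalized_pagerank` are assumptions.

```python
def personalized_pagerank(adj, query, restart=0.15, iterations=50):
    """Power iteration for personalized PageRank with a single query node.

    adj maps each node to a list of its out-neighbors. At every step a
    fraction `restart` of the total mass returns to the query node, and
    the rest flows along outgoing edges.
    """
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    nodes.add(query)

    rank = {v: 0.0 for v in nodes}
    rank[query] = 1.0  # all mass starts at the query node

    for _ in range(iterations):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            out = adj.get(u, ())
            flow = (1.0 - restart) * rank[u]
            if out:
                share = flow / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Assumed rule: nodes with no out-edges send mass to the query.
                nxt[query] += flow
        # Total mass is 1, so the restart term contributes exactly `restart`.
        nxt[query] += restart
        rank = nxt
    return rank
```

The returned dictionary maps every node to its ppr value with respect to the query node, and the values sum to 1. Nodes with the highest values can then be taken as the query-related node set.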

Abstract

The invention discloses a Spark-based diversity graph sorting method for large-scale graph data. Aiming at diversity graph sorting of large-scale graph data, and building on a measure of the distance between nodes in the graph, the method combines the classic personalized PageRank algorithm with a distance-based diversity measure to sort the graph data. The method is scalable and efficient, meets the data storage and computation requirements of diversity graph sorting on massive graph data, and provides technical support for key open problems in the analysis, processing, and mining of massive graph data.

Description

Technical field

[0001] The invention belongs to the technical field of data mining and information retrieval, and more specifically relates to a Spark-based diversity graph sorting method for large-scale graph data.

Background art

[0002] Ranking is one of the basic tasks of information retrieval, data mining, and social network analysis. In an information retrieval system, a good sorting method ensures that results with high relevance to the user query and low information redundancy are presented in the limited display space, thereby minimizing the user's query abandonment rate, which is of great significance for improving the user experience of information retrieval services.

[0003] Graph data is composed of a large number of nodes and of edges representing the relationships between nodes. Because a graph has no explicit order, graph sorting is particularly critical in the process of graph data analysis a...

Application Information

Patent Type & Authority Applications (China)
IPC IPC(8): G06F17/30
CPC G06F16/24578, G06F16/2465
Inventor 李劲岳昆胡矿王钰杰高仁尚
Owner YUNNAN UNIV