Parallel overlapping-community discovery method using in-memory iteration on the Spark platform

An overlapping-community discovery method, applicable to website content management, instrumentation, and other database retrieval fields. It addresses problems such as the lack of support for in-memory iterative computation and the poor fit for describing complex data processing workflows, and it achieves faster computation.

Inactive Publication Date: 2015-11-18
SHANDONG UNIV +1
7 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

Hadoop's MapReduce model is not well suited to describing complex data processing workflows.
In addition, Hadoop does not support in-memory iterative computation.


Examples


Embodiment Construction

[0060] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0061] Figure 1 is the algorithm flow chart of the present invention. The implementation of the algorithm and its specific details are described below with reference to the flow chart.

[0062] A parallel overlapping-community discovery algorithm based on in-memory iteration on the Spark platform comprises the following steps:

[0063] Step (1): On a computing cluster configured with the Spark environment, read the original community network data through GraphX and construct a graph.
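As a single-machine illustration of Step (1), the sketch below parses a small edge list into a deduplicated set of undirected edges. It is plain Python, not GraphX code. The input format (one whitespace-separated vertex pair per line, `#` for comments) is an assumption, since the source does not describe the data format, and `parse_edge_list` is a hypothetical helper name.

```python
def parse_edge_list(text):
    """Parse 'u v' lines into a sorted list of undirected edges (u < v)."""
    edges = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        u, v = map(int, line.split()[:2])
        if u != v:  # ignore self-loops
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


# Toy input (assumed format); "2 1" duplicates "1 2" in an undirected graph.
sample = """# toy network
1 2
2 1
1 3
2 3
3 4
"""
edges = parse_edge_list(sample)
print(edges)
```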

[0064] Step (2): Using GraphX, calculate the neighbor node set of each vertex of the graph in parallel, and store it as the attribute of that vertex.
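The neighbor-set computation in Step (2) can be sketched sequentially as follows. This is only an illustration in plain Python. GraphX offers operations such as `collectNeighborIds` for this purpose, but the source does not say which call it uses. The function name `neighbor_sets` and the toy edge list are assumptions.

```python
from collections import defaultdict


def neighbor_sets(edges):
    """Map each vertex to the set of its neighbors in an undirected graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    return dict(neighbors)


# Toy graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
nbrs = neighbor_sets(edges)
print(nbrs)
```

In the distributed setting each vertex would carry this set as its attribute, so that later steps can read both endpoints' neighbor sets from the edge they belong to.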

[0065] Step (3): Initially treat each edge as its own community, and calculate the similarity between every pair of edges in the graph that share a common vertex.
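The source text is truncated and does not give the similarity formula. The sketch below assumes the Jaccard similarity used in common link-clustering approaches: for edges (i, k) and (j, k) sharing vertex k, the similarity is the overlap between the inclusive neighbor sets of i and j. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def inclusive(neighbors, v):
    """Neighbor set of v plus v itself."""
    return neighbors[v] | {v}


def edge_pair_similarities(edges, neighbors):
    """Return {(e1, e2): similarity} for every pair of edges sharing a vertex."""
    incident = {}
    for e in edges:
        for v in e:
            incident.setdefault(v, []).append(e)
    sims = {}
    for k, inc in incident.items():
        for e1, e2 in combinations(inc, 2):
            # i and j are the endpoints that are NOT the shared vertex k.
            i = e1[0] if e1[1] == k else e1[1]
            j = e2[0] if e2[1] == k else e2[1]
            a, b = inclusive(neighbors, i), inclusive(neighbors, j)
            sims[(e1, e2)] = len(a & b) / len(a | b)  # assumed Jaccard index
    return sims


# Toy graph: triangle 1-2-3 with a pendant vertex 4 attached to 3.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
neighbors = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
sims = edge_pair_similarities(edges, neighbors)
for pair, s in sorted(sims.items()):
    print(pair, s)
```

Only edge pairs with a common vertex are compared, which keeps the number of similarity computations proportional to the sum of squared vertex degrees rather than to the square of the edge count.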

[0066] Step (4): Find the two communities with ...


Abstract

The invention discloses a parallel overlapping-community discovery method based on in-memory iteration on the Spark platform. The method comprises the following steps: reading the original community network data via GraphX on a computing cluster configured with the Spark environment, and building a graph instance; calculating in parallel, through GraphX, the neighbor node set of each vertex in the graph instance, and using that set as the attribute of the vertex; initializing each edge of the graph instance as one community, and calculating, from the neighbor node sets, the similarity between every two edges that share a common vertex; finding the two communities with the maximum similarity and merging them into one new community; updating the community similarity set; using a division density formula to calculate the division quality of the current community division; checking whether the current number of communities is greater than 1 or equal to 1; and, when the current number of communities is equal to 1, obtaining the community division with the highest division quality.
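The merge loop described in the abstract can be sketched on a single machine as below. This is an illustration under stated assumptions, not the patented Spark implementation: similarity is assumed to be the Jaccard index of inclusive neighbor sets, merging is assumed to be single-linkage (processing edge pairs in descending similarity), and the "division density" is assumed to be the partition density D = (2/M) Σ_c m_c (m_c − (n_c − 1)) / ((n_c − 2)(n_c − 1)) from the link-clustering literature, where M is the total edge count and m_c and n_c are the edge and vertex counts of community c. The name `link_communities` is hypothetical, and the input is assumed to be a simple graph without duplicate edges.

```python
from itertools import combinations


def link_communities(edges):
    """Merge edge communities greedily; return (best density, best division)."""
    # Neighbor set of each vertex.
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    # Similarity of every pair of edges sharing vertex k (assumed Jaccard).
    pairs = []
    for k, adj in nbrs.items():
        for i, j in combinations(sorted(adj), 2):
            a, b = nbrs[i] | {i}, nbrs[j] | {j}
            e1 = (min(i, k), max(i, k))
            e2 = (min(j, k), max(j, k))
            pairs.append((len(a & b) / len(a | b), e1, e2))
    pairs.sort(key=lambda p: p[0], reverse=True)

    # Each edge starts as its own community.
    norm_edges = [(min(u, v), max(u, v)) for u, v in edges]
    comm_of = {e: idx for idx, e in enumerate(norm_edges)}
    members = {idx: {e} for idx, e in enumerate(norm_edges)}

    def density():
        # Assumed partition density; communities with 2 vertices contribute 0.
        total = 0.0
        for es in members.values():
            m = len(es)
            n = len({v for e in es for v in e})
            if n > 2:
                total += m * (m - (n - 1)) / ((n - 2) * (n - 1))
        return 2.0 * total / len(norm_edges)

    best_d, best = density(), [set(es) for es in members.values()]
    for _, e1, e2 in pairs:
        c1, c2 = comm_of[e1], comm_of[e2]
        if c1 == c2:
            continue  # already in the same community
        for e in members[c2]:  # merge community c2 into c1
            comm_of[e] = c1
        members[c1] |= members.pop(c2)
        d = density()
        if d > best_d:  # remember the best division seen so far
            best_d, best = d, [set(es) for es in members.values()]
        if len(members) == 1:
            break  # stop once a single community remains
    return best_d, best


# Two triangles sharing vertex 3.
toy = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
best_density, division = link_communities(toy)
for community in division:
    print(sorted(community))
```

On this toy graph the best division keeps the two triangles as separate edge communities, so vertex 3 belongs to both. Because communities are sets of edges rather than vertices, overlap between vertex sets arises without any extra bookkeeping.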

Description

Technical field

[0001] The invention is used to discover communities in a network, and in particular relates to a parallel discovery method for overlapping communities based on in-memory iteration on the Spark platform.

Background

[0002] A complex network is an abstraction of a complex system. In practice, many complex systems can be described and analyzed using the characteristics of a complex network. Nodes in the network represent individuals in the system, and edges represent the relationships between individuals, as in social networks, power grids, etc.

[0003] A community is a subgraph of a complex network. The nodes within the same community are closely connected, while the connections between communities are relatively sparse.

[0004] Community discovery divides the graph into a collection of a certain number of communities. If the intersection of the vertex sets of any two communities is empty, the set is called a non-overlapping...
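The non-overlapping condition in [0004] (no two communities share a vertex) can be checked with a few lines of Python. This is a minimal sketch. The function name `is_non_overlapping` and the sample divisions are assumptions for illustration.

```python
from itertools import combinations


def is_non_overlapping(communities):
    """True if the vertex sets of every two communities are disjoint."""
    return all(a.isdisjoint(b) for a, b in combinations(communities, 2))


disjoint = [{1, 2}, {3, 4}]
overlapping = [{1, 2, 3}, {3, 4, 5}]  # vertex 3 belongs to both communities
print(is_non_overlapping(disjoint), is_non_overlapping(overlapping))
```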


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/958
Inventor: 郭山清, 鲁宗飞, 崔立真, 许信顺, 刘士军, 王昌圆, 杨伯宇, 陶立冬, 田燕琛, 李文哲
Owner: SHANDONG UNIV