Parallel overlapping community discovery method based on label propagation under Spark

A label propagation and overlapping community technology, applied to the field of parallel overlapping community discovery. It addresses problems such as unsatisfactory results and increased computing overhead, and achieves the effects of reducing the number of labels, improving efficiency, and improving accuracy.

Inactive Publication Date: 2017-07-28
NANJING UNIV OF INFORMATION SCI & TECH

AI Technical Summary

Problems solved by technology

Although many scholars have optimized and improved different problems to improve the stability and accuracy of label propagation to a cert

Embodiment Construction

[0030] The implementation of the technical scheme is described in further detail below with reference to the accompanying drawings:

[0031] The parallel overlapping community discovery method based on label propagation under Spark according to the present invention is described in further detail with reference to the flow chart and an implementation case.

[0032] In this implementation case, under the Spark framework, complete subgraphs are used to reduce the number of initial labels, which improves the execution efficiency of the algorithm, and the label selection method is improved, which in turn improves the accuracy of the algorithm. As shown in Figure 1, the method includes the following steps:

[0033] Step 10: design the map and reduce functions for the network data set. The map function maps an edge to a pair (a, b), which means that there is an edge between node a and node b; the reduce function uses the first element of the pair ...
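The map side of Step 10 can be illustrated with a small sketch in plain Python (no Spark required). The source text is truncated before it finishes describing the reduce function, so the reduce side below is an assumption: it treats the first element of each pair as a key and collects the second elements into a neighbor set. The names `map_edge` and `reduce_by_first` and the whitespace-separated edge format are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def map_edge(line):
    """Map one edge line 'a b' to the pair (a, b): an edge between a and b."""
    a, b = line.split()[:2]
    return (a, b)

def reduce_by_first(pairs):
    """Assumed reduce step: group pairs by their first element (the key)
    and collect the second elements into a neighbor set."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
    return dict(adjacency)

# Hypothetical edge list, one edge per line.
edges = ["1 2", "1 3", "2 3", "3 4"]
pairs = [map_edge(e) for e in edges]
adjacency = reduce_by_first(pairs)
```

Under Spark, the same shape would typically be expressed as a `map` over the edge lines followed by a by-key grouping or reduction on the resulting pair RDD. For an undirected network the reversed pair (b, a) would probably also need to be emitted; the excerpt does not say whether the method does this.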

Abstract

The invention provides a parallel community discovery method based on label propagation under Spark, and relates to the field of data mining. In the invention, complete subgraphs are found in a network and the nodes in each complete subgraph are given the same label, which mitigates the problem of too many labels in the initial stage and improves the execution efficiency of the algorithm. Then, according to the weights of the nodes, the propagation probability of the nodes in the network is calculated; both the label propagation probability and the similarity among nodes are considered in the label selection stage, which improves the accuracy of label selection. The whole algorithm is executed in the Spark framework, which gives good scalability for massive data. The execution efficiency and accuracy of the invention are both significantly improved, and the quality of community discovery is also greatly improved.
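The label-reduction idea in the abstract (nodes of a complete subgraph share one initial label) can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not describe how complete subgraphs are found, so the greedy clique growth below is a swapped-in heuristic, not the patent's procedure. The function name `init_labels` and the toy graph are hypothetical, and the adjacency map is assumed to be symmetric with every node present as a key.

```python
def init_labels(adjacency):
    """Assign initial labels so that each greedily grown complete subgraph
    (clique) shares one label, namely the id of its seed node.

    Assumption: `adjacency` is symmetric and has every node as a key.
    """
    labels = {}
    for seed in sorted(adjacency):
        if seed in labels:
            continue  # already absorbed into an earlier clique
        clique = [seed]
        for cand in sorted(adjacency[seed]):
            if cand in labels:
                continue
            # Keep the candidate only if it is adjacent to every clique member.
            if all(cand in adjacency[member] for member in clique):
                clique.append(cand)
        for node in clique:
            labels[node] = seed
    return labels

# Hypothetical toy network: a triangle a-b-c, plus a tail c-d-e.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
labels = init_labels(adjacency)
```

In this toy network the five nodes start with two distinct labels instead of five, which is the kind of reduction the abstract attributes to complete-subgraph initialization.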

Description

Technical field

[0001] The invention belongs to the field of data mining, and specifically relates to a parallel overlapping community discovery method that mines communities in a network using the idea of label propagation.

Background

[0002] With the rapid development of the Internet, social networks have quickly entered people's lives, which has led to a large increase in the amount of online personal information and has attracted great attention from researchers. In simple terms, what social networks accomplish is to move part of people's daily life onto online platforms. In social networks, users can make new friends, exchange ideas, share interesting things they have encountered, and so on. This personal information includes their activities, their connections with individuals or groups, and their opinions and ideas. With the emergence and rapid popularity of online social networks such as Sina Weibo, WeChat Moments, Facebook, Twitter, etc. The growin...

Application Information

IPC(8): G06Q50/00
CPC: G06Q50/01
Inventors: 马廷淮, 岳明亮, 薛羽, 曹杰
Owner: NANJING UNIV OF INFORMATION SCI & TECH