Parallel community discovery method and device

A community discovery technology for social networks, applied in the field of community discovery solutions. It addresses problems such as computing bottlenecks and achieves high fault tolerance, improved stability, and good adaptability.

Active Publication Date: 2014-10-01
ZTE CORP

AI Technical Summary

Problems solved by technology

[0007] The technical problem to be solved by the present invention is to provide a parallel community discovery method and device that overcome the computational bottleneck problem existing in the prior art.



Examples


Embodiment 1

[0058] An embodiment of the present invention provides a community discovery method including:

[0059] Step 1) The input module reads the original text file of social network data from HDFS (Hadoop Distributed File System), models it as a weighted or unweighted undirected graph, and stores the graph's similarity matrix S (the adjacency matrix W) in distributed form on HDFS;

[0060] Step 2) On a computing cluster configured with the Hadoop environment, the Laplacian generation module of the community discovery system calculates the degree matrix D of the graph's adjacency matrix and the Laplacian matrix L_sym = I - D^{-1/2} S D^{-1/2};
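
As a rough single-machine illustration of the formula L_sym = I - D^{-1/2} S D^{-1/2}, the sketch below computes the degree matrix and the normalized Laplacian for a small dense matrix. It does not reproduce the distributed Hadoop computation described in the text. The function name `normalized_laplacian` and the handling of zero-degree nodes are assumptions made for this example.

```python
import math

def normalized_laplacian(S):
    """Return (D, L_sym) for a symmetric similarity matrix S (list of lists).

    L_sym = I - D^{-1/2} S D^{-1/2}, where D is the diagonal degree matrix.
    """
    n = len(S)
    # The degree of node i is the sum of row i of S.
    degrees = [sum(row) for row in S]
    # Entries of D^{-1/2}. Assumption: isolated nodes (degree 0) get 0
    # to avoid division by zero; the source does not cover this case.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    D = [[degrees[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    L = [[(1.0 if i == j else 0.0) - inv_sqrt[i] * S[i][j] * inv_sqrt[j]
          for j in range(n)] for i in range(n)]
    return D, L

# Toy graph: node 0 is linked to nodes 1 and 2 with weight 1.
S = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]
D, L = normalized_laplacian(S)
```

In a real deployment the matrix would be sparse and partitioned across the cluster; the dense lists here only keep the arithmetic visible.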

[0061] Step 3) The eigendecomposition module of the community discovery system uses the Hadoop framework to run a parallel Lanczos numerical solution for the eigenvalues and eigenvectors of the Laplacian matrix, and obtains the first K largest eigenvalues I=λ_1 ≥ λ_2 ≥ … ≥ λ_K of the matrix, a...
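
For orientation, the following is a minimal serial sketch of the Lanczos iteration, which reduces a symmetric matrix to a small tridiagonal matrix whose eigenvalues approximate those of the original. It is not the parallel Hadoop implementation described in the text. The helper names (`matvec`, `dot`, `lanczos`) and the use of full reorthogonalization are assumptions of this example.

```python
import math

def matvec(A, v):
    """Dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lanczos(A, v0, m):
    """Run up to m Lanczos steps on the symmetric matrix A.

    Returns (alphas, betas): the diagonal and off-diagonal entries of the
    tridiagonal matrix T produced by the iteration.
    """
    norm = math.sqrt(dot(v0, v0))
    basis = [[x / norm for x in v0]]
    alphas, betas = [], []
    for j in range(m):
        w = matvec(A, basis[j])
        alphas.append(dot(w, basis[j]))
        # Full reorthogonalization against every earlier Lanczos vector
        # (an assumption of this sketch, chosen for numerical stability).
        for q in basis:
            c = dot(w, q)
            w = [wk - c * qk for wk, qk in zip(w, q)]
        beta = math.sqrt(dot(w, w))
        # Stop after m steps, or early if an invariant subspace is found.
        if j == m - 1 or beta < 1e-12:
            break
        betas.append(beta)
        basis.append([x / beta for x in w])
    return alphas, betas

A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
alphas, betas = lanczos(A, [1.0, 0.0, 0.0], 3)
```

The dominant cost is the matrix-vector product, which is the part a distributed implementation would spread across the cluster.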

Embodiment 2

[0127] This embodiment provides a system for implementing community discovery, including:

[0128] The input module reads in the original social network data, converts it into the form of adjacency matrix and stores it on the HDFS file system;

[0129] Specifically, the input module reads in the original social network data, models it as a weighted undirected graph model or an unweighted undirected graph model, and distributes and stores the adjacency matrix W of the graph obtained after modeling on HDFS.

[0130] The original social network data is either a text file stored on the HDFS file system, in which each line has the format "username 1 username 2 relationship weight" and indicates the relationship strength between two users; or Sequence files stored on the HDFS file system; or relational data stored in a database of the Hadoop platform.
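
The line format "username 1 username 2 relationship weight" can be turned into an adjacency matrix along the following lines. This is a small in-memory sketch, assuming whitespace-separated fields and user names without spaces; the function name `parse_edge_list` is hypothetical, and reading from HDFS is left out.

```python
def parse_edge_list(lines):
    """Build a user-name index and a dense symmetric weight matrix W.

    Each valid line is assumed to be "<user1> <user2> <weight>".
    """
    index = {}
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        u, v, w = parts[0], parts[1], float(parts[2])
        for name in (u, v):
            index.setdefault(name, len(index))  # assign the next free row
        edges.append((index[u], index[v], w))
    n = len(index)
    W = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        # The graph is undirected, so the matrix is filled symmetrically.
        W[i][j] = w
        W[j][i] = w
    return index, W

sample = ["alice bob 2.5", "bob carol 1", ""]
index, W = parse_edge_list(sample)
```

For an unweighted graph, every line would carry the weight 1.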

[0131] The Laplacian generation module calculates the degree matrix D and the Laplacian matrix L of the adjacency matrix ...


Abstract

The invention discloses a parallel community discovery method and device, and relates to the field of data mining. The method comprises the following steps: reading the original social network data, converting it into adjacency matrix form, and storing it on HDFS (Hadoop Distributed File System); calculating, on a computing cluster configured with a Hadoop environment, the degree matrix D and the Laplacian matrix of the adjacency matrix of the graph stored on HDFS; applying a parallel Lanczos numerical solution for eigenvalues and eigenvectors to the Laplacian matrix to obtain the eigenvectors corresponding to the first K largest eigenvalues of the matrix; constructing an eigenvector matrix and normalizing it to obtain a normalized eigenvector matrix from which features are extracted; treating each row as a point and clustering the points into K classes by a clustering method; and, according to the correspondence between points and individuals, dividing the individuals of the original community into K classes to complete the classification of the communities. The invention also discloses a parallel community discovery device. The technical scheme of the invention adapts well to large-scale data.
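
The last steps of the abstract (normalize the eigenvector matrix, treat each row as a point, cluster the points into K classes) can be sketched as follows. The abstract only says "a clustering method"; k-means with a deterministic farthest-point initialization is an assumption of this example, as are the function names and the toy matrix `U`, which stands in for an eigenvector matrix and is not computed from real data.

```python
import math

def normalize_rows(X):
    """Scale each row of X to unit Euclidean length (zero rows are kept)."""
    out = []
    for row in X:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-point initialization (deterministic)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Pick the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical 4 x 2 eigenvector matrix: one row per individual, K = 2.
U = [[2.0, 0.1], [1.0, 0.0], [0.1, 3.0], [0.0, 0.5]]
Y = normalize_rows(U)
labels = kmeans(Y, 2)
```

Row i of the matrix corresponds to individual i, so `labels[i]` is the community assigned to that individual.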

Description

Technical field

[0001] The invention relates to the field of data mining, in particular to parallel computation on large-scale data and to community discovery schemes in social networks.

Background

[0002] A social network consists of relationships between individuals. Individuals usually include people, organizations, and other social entities, and can also represent web pages, blogs, mailboxes, text messages, papers, locations, and so on. Social relationships generally include friends, relatives, and classmates, and can also represent behaviors such as clicks, follows, sending messages, and citations.

[0003] A social network has a community structure: the relationships between individuals within a community are close, while the relationships between communities are not. Community identification (also known as community discovery) is the task of detecting and identifying these communities. Community discovery can be used as the basis of targe...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 陆平, 罗圣美, 胡磊, 王桥, 林云龙, 邹俊洋, 钟齐炜, 陆建
Owner ZTE CORP