Integrated data mining method for similarity of accounts on social network sites

A comprehensive data mining technology for social networking sites, applied in the field of computer Internet data mining. It addresses two problems: social relationships are not equivalent to the degree of similarity, and existing methods are not suitable for correlation analysis of social networking site accounts.

Active Publication Date: 2015-09-09
THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP
5 Cites, 28 Cited by

AI Technical Summary

Problems solved by technology

Existing social networking site data mining methods cannot be directly applied to the similarity analysis of social networking site accounts: 1) social relationships are not equivalent to the degree of similarity between social networking site accounts; 2) the similarity between social networking site accounts is a comprehensive indicator af...



Examples


Embodiment 1

[0142] Using the method provided by the present invention, a Sina Weibo account similarity calculation system was constructed. The system selects 27 influencing factors of a Sina Weibo account: 14 personal attribute factors, 12 interactive behavior factors, and 1 content factor. The comprehensive similarity of two Weibo accounts is determined from the similarity calculated for each influencing factor. The system was used to run automatic comprehensive similarity detection on more than 400,000 randomly selected Sina Weibo accounts.
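As a rough illustration of how per-factor similarities might be combined into one comprehensive score, the sketch below uses a normalized weighted sum. The linear form, the function name `comprehensive_similarity`, and the example values are assumptions made for illustration; the source states only that each factor's similarity carries a weight in the comprehensive similarity.

```python
from typing import Sequence


def comprehensive_similarity(factor_sims: Sequence[float],
                             weights: Sequence[float]) -> float:
    """Combine per-factor similarities into a single score.

    Assumption: a weighted average, so that factor similarities in [0, 1]
    and non-negative weights give a result in [0, 1].
    """
    if len(factor_sims) != len(weights):
        raise ValueError("factor_sims and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * s for w, s in zip(weights, factor_sims))
    return weighted / total_weight


# Hypothetical example: 14 personal attribute factors, 12 interactive
# behavior factors and 1 content factor, as in the embodiment (27 in total).
personal = [0.5] * 14
behavior = [0.5] * 12
content = [0.5]
sims = personal + behavior + content
equal_weights = [1.0] * len(sims)  # placeholder weights, not from the source

score = comprehensive_similarity(sims, equal_weights)
```

Dividing by the total weight keeps the score on the same scale as the factor similarities, which makes it easier to compare against a manually assigned similarity.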

[0143] First, 500 Sina Weibo account training samples are input into the system. Each sample contains all the information of two accounts and the comprehensive similarity Y_l of those two accounts, where l = 1, 2, ..., 500. The following method is used to determine the weight of each factor that affects the comprehensive similarity:

[0144] Step 1: Input 500 training samples;

[0145] Step ...
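The remaining steps are not shown here, so the following is only a sketch of one common way to obtain factor weights from labeled samples: ordinary least squares, assuming that the comprehensive similarity Y_l is approximately a weighted sum of the 27 factor similarities. The least-squares choice, the function name `fit_factor_weights`, and the synthetic data are all assumptions for illustration and are not taken from the patent.

```python
import numpy as np


def fit_factor_weights(factor_sims, target_sims):
    """Fit one weight per factor by ordinary least squares.

    factor_sims: array of shape (n_samples, n_factors), the similarity of
                 each factor for each training pair of accounts.
    target_sims: array of shape (n_samples,), the labeled comprehensive
                 similarity Y_l of each training pair.
    """
    X = np.asarray(factor_sims, dtype=float)
    y = np.asarray(target_sims, dtype=float)
    if X.ndim != 2 or y.shape != (X.shape[0],):
        raise ValueError("expected X of shape (n, k) and y of shape (n,)")
    # lstsq returns (solution, residuals, rank, singular_values)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


# Synthetic stand-in for the 500 training samples and 27 factors.
rng = np.random.default_rng(0)
n_samples, n_factors = 500, 27
X = rng.random((n_samples, n_factors))
true_w = rng.random(n_factors)
true_w /= true_w.sum()          # hypothetical "ground truth" weights
y = X @ true_w                  # noiseless labels built from those weights

fitted = fit_factor_weights(X, y)
```

With noiseless synthetic labels the fitted weights match the generating weights up to floating-point error. Real labels assigned by people would be noisy, and a constrained fit (for example, non-negative weights) might be preferable.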


Abstract

The present invention discloses an integrated data mining method for the similarity of accounts on social network sites. The method can be used for monitoring online public opinion and addresses the problem of identifying multiple accounts held by the same user on social network sites. The method comprehensively considers three kinds of factors that affect the integrated similarity of social network site accounts (personal attributes, interactive behavior, and content) and uses training samples to determine the weight of each factor's similarity in the calculation of the integrated similarity. Compared with the prior art, the method has the following technical advantages: (1) it provides a quantitative, reliable, and comprehensive reference for identifying multiple accounts of the same user on social network sites, and is suitable for automatic processing by computer in a big data environment; and (2) because the weight of each factor's similarity in the integrated similarity is determined from training samples, the results remain consistent with those of manual processing.

Description

Technical field

[0001] The invention belongs to computer Internet data mining technology used for controlling the dissemination of Internet data, and in particular relates to a comprehensive data mining method for the similarity of social networking site accounts.

Background art

[0002] The rise of social networking sites represented by Weibo has greatly increased the speed and breadth of information dissemination on the Internet. Through operations such as mutual following, reposting, and commenting, users of social networking sites can spread information on a large scale in a very short time. This rapid, large-scale dissemination makes it much easier for users to obtain information, but it also leads to the serious problem of Internet rumors proliferating.

[0003] To deal with the flood of rumors on the Internet, public opinion monitoring is an indispensable part of the management of social networking sites. ...


Application Information

IPC(8): G06F17/30, G06Q50/00
CPC: G06F16/958, G06Q50/01
Inventor: 徐琳, 王犇, 葛唯益, 刘畅, 徐欣
Owner: THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP