Method for acquiring PCU association data in specific topic microblogs

A linked-data and microblog technology, applied in special data processing applications, electronic digital data processing, instruments, etc. It addresses the inability of existing approaches to organize social network data and to adapt to the complex correlations of social networks. Its effects are efficient data acquisition, reasonable organization, avoidance of wasted overhead, and accurate acquisition and organization of the data.

Inactive Publication Date: 2015-09-16
XI AN JIAOTONG UNIV
4 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

The prior patented method mainly addresses the crawl direction of web crawlers and relies on a knowledge base built from social annotations. However, it cannot establish relationships among the acquired content and cannot adapt to the naturally complex correlations of social networks, so it cannot be used to organize social network data effectively.


Examples

Embodiment Construction

[0043] The present invention will be further described below in conjunction with accompanying drawings and examples.

[0044] The implementation process for constructing PCU association data from elements scattered across various Sina Weibo pages and their logical relationships is shown in Figure 1. It can be divided into the following three processes:

[0045] (1) Acquisition of data access rights, including 5 steps.

[0046] Step 1: Use selenium to start the IE browser, automatically open the Sina Weibo login page http://www.weibo.com/login.php, and locate the HTML tags for entering the account and password;

[0047] Step 2: Use selenium to automatically fill in the registered user name and password at the tags located in step 1, for example the account "robbersunsohu.com" and the password "897fgCKdf";

[0048] Step 3: Determine whether a verification code must be entered, according to whether the login page contains the corresponding HTML tag; ...
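
Steps 1 to 3 might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The field names `username`, `password`, and `verifycode` are hypothetical placeholders, because the source does not preserve the actual HTML tags. The `driver` argument is assumed to be any object exposing Selenium's WebDriver interface (`get`, `find_element`, `page_source`).

```python
from html.parser import HTMLParser

LOGIN_URL = "http://www.weibo.com/login.php"


class _InputNameCollector(HTMLParser):
    """Collects the name attribute of every <input> tag in a page."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.add(name)


def page_has_input(page_html, field_name):
    """Return True if the page contains an <input> with the given name."""
    collector = _InputNameCollector()
    collector.feed(page_html)
    return field_name in collector.names


def login(driver, account, password,
          user_field="username",       # hypothetical field name
          pass_field="password",       # hypothetical field name
          captcha_field="verifycode"):  # hypothetical field name
    """Open the login page, fill in the credentials, and report whether
    the page also asks for a verification code (step 3)."""
    driver.get(LOGIN_URL)                                       # step 1
    driver.find_element("name", user_field).send_keys(account)  # step 2
    driver.find_element("name", pass_field).send_keys(password)
    return page_has_input(driver.page_source, captcha_field)    # step 3
```

With Selenium installed, a driver created by `webdriver.Ie()` could be passed as `driver`. The tag check uses only the standard library, so it can be exercised on saved page HTML without launching a browser.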


Abstract

The invention discloses a method for acquiring PCU association data from microblogs, and aims to overcome the inability of the prior art to acquire associated microblog posts, comments and posters. The method comprises the following steps: (1) gaining data access permission: automatically filling in identity authentication information by analyzing the HTML (Hypertext Markup Language) tags of the login page; (2) downloading PCU association data pages: automatically and sequentially downloading the pages containing PCU association data, guided by the logical relations of the PCU data and by the HTML structure and tag semantics of microblog pages; and (3) structured parsing and construction of the PCU association data: fusing post relations, user-friend relations and user-post sub-relations to construct a heterogeneous network, namely a PCU association data network. With this method, the PCU association data in Sina microblogs can be acquired automatically and a structured association data network can be constructed, providing a good data set for subsequent social network mining.
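
The fusion in step (3) might look like the following Python sketch. It is an illustration under assumptions, not the method claimed in the patent: PCU is taken to mean posts, comments, and users, and the relation names `comment_of`, `follows`, and `authored`, as well as the input tuple layouts, are hypothetical.

```python
from collections import defaultdict


class PCUNetwork:
    """Heterogeneous graph: nodes are (kind, id) pairs and edges are
    grouped by relation name."""

    def __init__(self):
        self.nodes = set()
        self.edges = defaultdict(set)

    def add_edge(self, relation, src, dst):
        # Register both endpoints, then store the typed edge.
        self.nodes.update((src, dst))
        self.edges[relation].add((src, dst))

    def neighbors(self, node, relation):
        """Targets reachable from node through one edge of the relation."""
        return sorted(dst for src, dst in self.edges[relation] if src == node)


def build_pcu_network(comment_of, follows, authored):
    """Fuse three relation lists into one network.

    comment_of: iterable of (comment_id, post_id)       (assumed layout)
    follows:    iterable of (follower_id, followee_id)  (assumed layout)
    authored:   iterable of (user_id, kind, item_id),
                where kind is "post" or "comment"        (assumed layout)
    """
    net = PCUNetwork()
    for comment_id, post_id in comment_of:
        net.add_edge("comment_of", ("comment", comment_id), ("post", post_id))
    for follower, followee in follows:
        net.add_edge("follows", ("user", follower), ("user", followee))
    for user_id, kind, item_id in authored:
        net.add_edge("authored", ("user", user_id), (kind, item_id))
    return net
```

Keeping the node kind in the node key lets posts, comments, and users share one graph without identifier collisions, and grouping edges by relation keeps each sub-relation separately queryable.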

Description

Technical field

[0001] The invention belongs to the field of computer social network data acquisition technology, and in particular relates to a method for automatically acquiring PCU association data on a specific topic in microblogs.

Background

[0002] The Internet and Web 2.0 have promoted the rapid development of social networks. Social networks have a large number of users and generate data rapidly. The growing accumulation of data and its complex correlation structure make it increasingly difficult to obtain and understand information.

[0003] Sina Weibo, the most influential social networking site in China, contains a large amount of potentially valuable information. An important aspect of studying this information is analyzing the posts on specific topics, the comments on those posts, and the posting users in Sina Weibo. These data are scattered across different pages, so people cannot quickly and accurately find or understand the useful information among a large number of pages.

[0004] There...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/9535
Inventor: 刘均, 陈浩, 米建红, 吕彦章, 占梦婷
Owner: XI AN JIAOTONG UNIV