An agent-based intrusive social data acquisition method

This information-collection technique for social data addresses the frequent operations, complicated implementation, and troublesome login of existing methods. It improves data collection efficiency, collects data comprehensively, and avoids repeated collection.

Active Publication Date: 2019-05-31
USTC SINOVATE SOFTWARE

AI Technical Summary

Problems solved by technology

Descriptions of WeChat official account article collection exist on the Internet, but most are incomplete or give only a brief overview. Among related patents, some implementations are complicated, and others obtain the data through interaction between the browser and the Internet.
[0003] Existing WeChat official account article collection technology uses Sogou WeChat as the entrance. The disadvantages of this method are: (1) anti-crawler restrictions, which require the help of an IP proxy and a CAPTCHA-solving platform; (2) the collected article links are not permanent; (3) the number of likes, reads, and comments of an article cannot be collected; (4) only the 10 most recent articles can be collected.
Another method obtains data through the interface provided by the material management feature of the WeChat public platform. Its disadvantages are: (1) login is troublesome, requiring the user to log in and scan a code to confirm; (2) anti-crawler restrictions mean that frequent operations lead directly to a ban; (3) the obtained article links still do not provide like counts, view counts, or comments.



Examples


Embodiment Construction

[0036] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

[0037] As shown in Figures 1-2, the present invention is an agent-based intrusive social data collection method, including the following steps:

[0038] Step S1: Start the scheduler's scheduled task, fetch the official accounts from the database, put them into Redis, and deduplicate them;

[0039] Step S2: Periodically take addresses out of Redis, put them into the RabbitMQ queue, and start the WeChat crawler program;

[0040] ...
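
Steps S1 and S2 can be sketched as follows. This is a minimal sketch, not the patented implementation: an in-memory set stands in for the Redis set used for deduplication, and a deque stands in for the RabbitMQ queue, so the example runs without either service. The function names load_accounts and dispatch, the batch_size parameter, and the sample account IDs are all hypothetical.

```python
from collections import deque


def load_accounts(db_rows, pending, seen):
    """S1 (sketch): copy official accounts from the database into the store.

    `seen` plays the role of a Redis set used for deduplication;
    `pending` holds the accounts that still await dispatch.
    Returns the number of accounts newly added.
    """
    added = 0
    for account in db_rows:
        if account in seen:
            continue  # already stored, skip the duplicate
        seen.add(account)
        pending.append(account)
        added += 1
    return added


def dispatch(pending, queue, batch_size=10):
    """S2 (sketch): move up to batch_size entries from the store to the crawl queue.

    `queue` stands in for the RabbitMQ queue read by the crawler program.
    Returns the number of entries moved.
    """
    moved = 0
    while pending and moved < batch_size:
        queue.append(pending.popleft())
        moved += 1
    return moved


# Hypothetical sample data: one account appears twice in the database rows.
seen = set()
pending = deque()
crawl_queue = deque()

rows = ["gh_account_a", "gh_account_b", "gh_account_a"]
added = load_accounts(rows, pending, seen)
moved = dispatch(pending, crawl_queue)
```

In a deployment, the set would plausibly map to a Redis set (the SADD command reports how many members were newly added) and the deque to a RabbitMQ publish call; both mappings are assumptions rather than details given in the text.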


Abstract

The invention discloses an agent-based intrusive social data acquisition method and relates to the field of information acquisition. The system comprises a WeChat client, a proxy server, a program server, and a WeChat server. A packet capture tool obtains the data packet returned by the server to the client, injects JS into it, and returns it to the client. The JS code executes automatically when the client loads the page, connecting the browser to the program, and the program then sends instructions to the browser to control the whole collection process. The method loads more data through pull-down operations, captures the article links, and then visits each detail link to obtain the article content, like count, read count, comments, and so on. The official account article data collected is comprehensive, operation is simple, and data collection efficiency is improved.
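
The injection step can be illustrated with a small sketch. It assumes the proxy hands the response body to a hook as a string; the function name inject_script, the WebSocket URL, and the sample page are hypothetical, and the text does not specify how the injected JS connects back to the program.

```python
import re

# Matches a closing body tag regardless of letter case.
_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def inject_script(html: str, js: str) -> str:
    """Insert a <script> block just before the last </body> tag.

    If the page has no </body> tag, the script is appended at the end.
    """
    tag = f"<script>{js}</script>"
    matches = list(_BODY_CLOSE.finditer(html))
    if not matches:
        return html + tag
    pos = matches[-1].start()
    return html[:pos] + tag + html[pos:]


# Hypothetical control script: the page opens a channel to the collection program.
CONTROL_JS = 'var ws = new WebSocket("ws://127.0.0.1:9000/control");'

page = "<html><body><div id='list'></div></body></html>"
modified = inject_script(page, CONTROL_JS)
```

Placing the script before the closing body tag means it runs after the page content has been parsed, which suits a script that must then scroll the list and read article links.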

Description

Technical field

[0001] The invention belongs to the field of information collection, and particularly relates to an agent-based intrusive social data collection method.

Background technique

[0002] With its rapid development, the Internet has become the most important means for people to obtain information, and as the amount of data keeps growing, effectively obtaining and using that data has become a critical step. Information collection technology can obtain the specific data users want more accurately, but large-scale collection has also spurred the rise of anti-crawler technology, making data collection increasingly difficult. WeChat is mainstream social software, and its official account articles have become an important source for information collection. There are three existing WeChat official account article collection entrances: (1) Sogou WeChat, (2) WeChat public platform material management interfac...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/08, H04L29/06, G06F16/951
Inventors: 李森, 李凌悦, 苏磊
Owner: USTC SINOVATE SOFTWARE