Microblog acquisition system and method based on events

An event-based microblog collection system, applied in the field of information security, that addresses the problems of a low degree of automation, difficulty in meeting industrial needs, and the inability to integrate information automatically.

Inactive Publication Date: 2014-07-16
上海数据分析与处理技术研究所 +2
Cites: 4; Cited by: 7

AI Technical Summary

Problems solved by technology

However, this technology only solves the acquisition of dynamic news and forum information, and cannot realize the automatic integration of informati...

Examples


Embodiment 1

[0023] As shown in Figure 1, this embodiment relates to an event-based microblog acquisition system comprising a URL construction module, a JSSH client module, a browser collection module, and an HTML parsing module. The URL construction module is connected to the JSSH client module and transmits the collected URL information to it; the JSSH client module is connected to the browser collection module and transmits JSSH instructions to it; and the browser collection module is connected to the HTML parsing module and transmits HTML text information to it.
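
The data flow between the four modules can be sketched as a simple chain of functions. This is a minimal illustration under stated assumptions, not the patented implementation: the search endpoint `SEARCH_BASE`, the placeholder `jump` instruction format, and the stand-in functions `fake_browser` and `fake_parser` are all hypothetical and do not come from the source.

```python
from typing import Callable, Dict, List
from urllib.parse import quote

# Hypothetical search endpoint; the source does not give the real URL format.
SEARCH_BASE = "https://example.com/search?q="


def construct_urls(event_keywords: List[str]) -> List[str]:
    """URL construction module: one URL per event keyword (assumed scheme)."""
    return [SEARCH_BASE + quote(keyword) for keyword in event_keywords]


def collect(
    event_keywords: List[str],
    send_instruction: Callable[[str], str],
    parse_html: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Wire the modules: URL -> JSSH instruction -> browser HTML -> parsed record."""
    records = []
    for url in construct_urls(event_keywords):
        instruction = f"jump {url}"           # placeholder instruction format
        html = send_instruction(instruction)  # JSSH client + browser collection
        records.append(parse_html(html))      # HTML parsing module
    return records


def fake_browser(instruction: str) -> str:
    """Stand-in for the browser: echoes the requested URL inside an HTML page."""
    url = instruction.split(" ", 1)[1]
    return f"<html><body>{url}</body></html>"


def fake_parser(html: str) -> Dict[str, str]:
    """Stand-in for the HTML parsing module: wraps the raw HTML in a record."""
    return {"html": html}
```

Passing the browser and parser in as callables keeps each module replaceable, which mirrors the one-way connections between modules described in the embodiment.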

[0024] The JSSH instructions include, but are not limited to, action instructions such as browser jumps.
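
A browser-jump instruction of this kind might be built and sent as follows. This is a hedged sketch: it assumes the JSSH extension accepts JavaScript text over a local TCP socket, and the JavaScript expression `content.location.href = ...`, the host, and the port `9997` are assumptions made for illustration; none of them are stated in the source. The helper names `build_jump_instruction` and `send_instruction` are hypothetical.

```python
import json
import socket


def build_jump_instruction(url: str) -> str:
    """Build a JavaScript statement asking the browser to navigate to `url`.

    json.dumps is used to produce a quoted, escaped string literal.
    The exact expression a JSSH shell expects is an assumption here.
    """
    return f"content.location.href = {json.dumps(url)};\n"


def send_instruction(
    instruction: str,
    host: str = "127.0.0.1",  # assumed: JSSH shell running on the local machine
    port: int = 9997,         # assumed port; adjust to the actual configuration
    timeout: float = 5.0,
) -> None:
    """Send one instruction to the (assumed) JSSH TCP shell."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(instruction.encode("utf-8"))
```

Only `build_jump_instruction` can be exercised without a running browser; `send_instruction` needs a listening JSSH shell.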

[0025] As shown in Figure 2, taking the Firefox browser as an example, the system collects microblogs through the following steps:

[0026] In the first step, browser instructions are passed through the JSSH client, which connects to the Weibo login page to perform the login act...

Abstract

The invention discloses an event-based microblog acquisition system and method and belongs to the technical field of information security. The system comprises a URL construction module, a JSSH client module, a browser acquisition module, and an HTML parsing module. The URL construction module is connected to the JSSH client module and transmits the acquired URL information; the JSSH client module is connected to the browser acquisition module and transmits JSSH instructions; and the browser acquisition module is connected to the HTML parsing module and transmits HTML text. By parsing, the system and method obtain the summary data of a microblog message: the author name, the author homepage URL, the author avatar URL, the body content, the short link, the publication time, the publishing client, the number of forwards, and the number of comments. Each piece of unstructured data is thereby converted into structured data that can be used in subsequent data mining.
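
The nine fields listed in the abstract can be pictured as one structured record filled in by an HTML parser. The sketch below uses only the Python standard library; the CSS class names (`author`, `avatar`, `body`, `short-link`, `time`, `client`, `forwards`, `comments`) and the flat sample markup are hypothetical, since the source does not describe the actual page structure.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class MicroblogRecord:
    """Structured form of one microblog message (fields from the abstract)."""
    author_name: str = ""
    author_homepage_url: str = ""
    author_avatar_url: str = ""
    body: str = ""
    short_link: str = ""
    publish_time: str = ""
    client: str = ""
    forward_count: int = 0
    comment_count: int = 0


class MicroblogParser(HTMLParser):
    # Hypothetical CSS class -> record field, for fields taken from element text.
    TEXT_FIELDS = {
        "author": "author_name", "body": "body", "time": "publish_time",
        "client": "client", "forwards": "forward_count",
        "comments": "comment_count",
    }

    def __init__(self) -> None:
        super().__init__()
        self.record = MicroblogRecord()
        self._field = None  # field that the next text node should fill

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        css = attr.get("class", "")
        # Fields taken from attributes rather than text.
        if tag == "a" and css == "author":
            self.record.author_homepage_url = attr.get("href", "")
        elif tag == "img" and css == "avatar":
            self.record.author_avatar_url = attr.get("src", "")
        elif tag == "a" and css == "short-link":
            self.record.short_link = attr.get("href", "")
        self._field = self.TEXT_FIELDS.get(css)

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._field is None or not text:
            return
        if self._field in ("forward_count", "comment_count"):
            setattr(self.record, self._field, int(text))
        else:
            setattr(self.record, self._field, text)


def parse_microblog(html: str) -> MicroblogRecord:
    parser = MicroblogParser()
    parser.feed(html)
    parser.close()
    return parser.record


SAMPLE = (
    '<div class="post">'
    '<img class="avatar" src="https://example.com/a.png">'
    '<a class="author" href="https://example.com/u/1">alice</a>'
    '<p class="body">Hello world</p>'
    '<a class="short-link" href="https://example.com/s/abc">link</a>'
    '<span class="time">2012-01-01 10:00</span>'
    '<span class="client">web</span>'
    '<span class="forwards">3</span>'
    '<span class="comments">5</span>'
    '</div>'
)
```

Calling `parse_microblog(SAMPLE)` returns a `MicroblogRecord` whose fields can be written to a table or passed to a data-mining step. The parser assumes flat markup; nested elements inside a field would need a tag stack.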

Description

Technical field

[0001] The present invention relates to a system and method in the technical field of information security, in particular to an event-based microblog collection system and method. The microblog information acquired by the system can be used for data mining and data analysis.

Background art

[0002] Most existing collection systems collect websites directly, for example those described by Liu Lan and Wu Zhenxin in "Web Archive Information Collection Process and Key Issues Research" (Information Theory and Practice, 2009) and by Lin Ying, Wu Zhenxin, and Zhang Zhixiong in "WebArchive Archiving Strategy Analysis" (Modern Library and Information Technology, 2009). The defects of these acquisition systems are mainly as follows. First, efficiency is low, and the load placed on the acquisition system is very high. Second, they must cope with a wide variety of website conditions, and the format analysis of the collection source is relatively complicate...

Application Information

IPC(8): G06F17/30
CPC: G06F16/86, G06F16/986
Inventor: 李翔裘瑛黄豫蕾王佳凯陈继国林祥陈璐艺冯皪魏
Owner: 上海数据分析与处理技术研究所