Public sentiment data real-time collecting method and system based on distribution

A real-time data collection technology, applied to relational databases, database models, network data retrieval, and related fields. It addresses the inability of existing tools to discover and obtain key information well, to keep data real-time, and to execute multiple tasks at the same time. Its stated effects are enhanced scalability and portability, improved reusability, and high stability.

Active Publication Date: 2016-11-09
SOUTHWEST PETROLEUM UNIV

AI Technical Summary

Problems solved by technology

However, these general search engines also have certain limitations:
1. The results returned by general search engines include a large number of web pages that users do not care about. General search engines are also often ineffective on information-dense data with a certain structure, and cannot discover and obtain key information well.
2. Simple Internet-based data acquisition systems use a single acquisition method and cannot perform multiple tasks at the same time, which leads to low acquisition efficiency and fails to meet the real-time requirements of the data.
3. In other current public opinion analysis and detection systems, most public opinion data is processed offline, which inevitably introduces a certain time delay into the data.



Examples


Embodiment Construction

[0050] The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:

[0051] As shown in Figure 1, a distributed method for real-time collection of public opinion data includes the following steps:

[0052] S1: Establish a class library of public opinion data websites, classify the public opinion data source sites, and define the data items to crawl for each type of website. This includes the following sub-steps:

[0053] S11: Group all data source sites into three categories: news, Weibo, and forums. For each type of website, collect information including website address, page title, and keywords to establish a related keyword library;
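
Sub-step S11 can be pictured as a small registry keyed by site category. The following Python sketch is only an illustration under stated assumptions: the names `SiteCategory`, `SiteEntry`, `SiteLibrary`, and `CRAWL_ITEMS` are hypothetical, and the per-category data items are placeholders, since the available text does not list the actual items.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class SiteCategory(Enum):
    """The three categories named in S11."""
    NEWS = "news"
    WEIBO = "weibo"
    FORUM = "forum"


@dataclass
class SiteEntry:
    """Information collected for one data source site."""
    url: str
    page_title: str
    keywords: list[str]


@dataclass
class SiteLibrary:
    """Hypothetical website class library: category -> registered sites."""
    sites: dict[SiteCategory, list[SiteEntry]] = field(default_factory=dict)

    def register(self, category: SiteCategory, entry: SiteEntry) -> None:
        self.sites.setdefault(category, []).append(entry)

    def keyword_library(self, category: SiteCategory) -> list[str]:
        """Return the sorted, de-duplicated keywords of one category."""
        entries = self.sites.get(category, [])
        return sorted({kw for entry in entries for kw in entry.keywords})


# Placeholder data items to crawl per category (assumed, not from the patent).
CRAWL_ITEMS = {
    SiteCategory.NEWS: ("title", "publish_time", "body"),
    SiteCategory.WEIBO: ("author", "publish_time", "text"),
    SiteCategory.FORUM: ("thread_title", "author", "post_text"),
}
```

Keeping the crawl items in one table per category means a new site only has to be registered under an existing category, rather than described from scratch.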

[0054] The data sources of public opinion information mainly include news portals, social networking sites, forum blogs and other mainstream information sharing websites. Although there are great differences in page format and layout between websites, the data item settings of the s...



Abstract

The invention discloses a distributed method and system for real-time collection of public sentiment data. The method comprises the following steps: first, a class library of public sentiment data websites is established, and the sites and the data items to crawl are classified and defined; second, a list of data acquisition websites is transmitted to a data collection server, which allocates corresponding crawlers to crawl the data cyclically, sleeping between rounds; third, the crawled source web page data is subjected to tag analysis, the position of each target data item is located, and the target data item is obtained; fourth, the resulting data items are encapsulated into the uniform format of the corresponding class; fifth, the encapsulated data is stored in the corresponding database; sixth, a monitoring log file is generated. According to the invention, the architecture uses the factory pattern as the main design pattern of the system, so that new instances can be generated quickly. Core system functions such as browser access, log generation, data encapsulation, proxy setting, and queue setting are encapsulated, which enhances the extensibility and portability of the system and improves the reusability of the code and the maintainability of the system.
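
The six steps and the factory pattern described in the abstract can be sketched roughly as follows. This is a minimal, single-process Python illustration, not the patented implementation: every class and function name here is hypothetical, page fetching is injected as a `fetch` callable so no real network or browser API is assumed, only the page title is extracted, and a single SQLite table stands in for the "corresponding database" of each class.

```python
import logging
import sqlite3
import time
from dataclasses import dataclass
from html.parser import HTMLParser

log = logging.getLogger("collector")  # step 6: monitoring log


@dataclass
class Record:
    """Step 4: uniform format for one collected item (fields are assumed)."""
    category: str
    url: str
    title: str


class TitleParser(HTMLParser):
    """Step 3: tag analysis that locates <title> and keeps its text."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class Crawler:
    category = "generic"

    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical callable: url -> HTML string

    def collect(self, url):
        parser = TitleParser()
        parser.feed(self.fetch(url))
        return Record(self.category, url, parser.title.strip())


class NewsCrawler(Crawler):
    category = "news"


class WeiboCrawler(Crawler):
    category = "weibo"


class ForumCrawler(Crawler):
    category = "forum"


def crawler_factory(category, fetch):
    """Factory: map a site category to a new crawler instance."""
    classes = {"news": NewsCrawler, "weibo": WeiboCrawler, "forum": ForumCrawler}
    return classes[category](fetch)


def run(site_list, fetch, db, rounds=1, sleep_seconds=0.0):
    """Step 2: crawl every (category, url) pair, sleeping between rounds."""
    db.execute("CREATE TABLE IF NOT EXISTS records (category TEXT, url TEXT, title TEXT)")
    for round_no in range(rounds):
        for category, url in site_list:
            rec = crawler_factory(category, fetch).collect(url)
            # Step 5: store the encapsulated record.
            db.execute("INSERT INTO records VALUES (?, ?, ?)",
                       (rec.category, rec.url, rec.title))
            log.info("stored %s record from %s", rec.category, rec.url)
        db.commit()
        if round_no < rounds - 1:
            time.sleep(sleep_seconds)
```

For example, `run([("news", url)], fetch, sqlite3.connect(":memory:"))` would crawl one news page once. Because callers only go through `crawler_factory`, supporting a further site category in this sketch means adding one subclass and one dictionary entry.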

Description

Technical field

[0001] The present invention relates to a method and system for distributed, highly concurrent collection of Internet public opinion data, in particular to a method and system for efficient, real-time collection of target data, and especially to a distributed method and system for real-time collection of public opinion data.

Background

[0002] Public opinion refers to the social and political attitudes that the public holds toward social managers, within a certain social space, around the occurrence, development, and change of intermediary social events. At present, the Internet has become one of the main carriers of public opinion and plays an important role in its dissemination. Outbreaks of public opinion crises have led more and more people to pay attention to the emergence and development of public opinion. Party and government organs at all levels must keep abreast of information, strengthen monitor...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/1734, G06F16/1805, G06F16/182, G06F16/284, G06F16/951
Inventor: 李平陈雁胡栋代臻刘婷许斌孙先林辉赵玲
Owner: SOUTHWEST PETROLEUM UNIV