Distributed online real-time processing method and system for multi-source and heterogeneous flow-state big data

A real-time processing and distributed technology, applied in fields such as electronic digital data processing, special-purpose data processing applications, and text database clustering/classification. It addresses problems such as the difficulty of real-time social-security monitoring and early warning, and achieves stability and reliability, avoids unnecessary resource consumption, and improves write performance.

Pending Publication Date: 2019-05-10
Information Research Institute of Shandong Academy of Sciences
Cites: 1; Cited by: 24

AI Technical Summary

Problems solved by technology

Whether it is the discovery of sensitive topics, the mining of criminal organization relationships, or the dissemination of rumors, current research relies on the accumulation and analysis of a certain amount of data, which belongs to ...

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine



Embodiment Construction

[0064] The disclosure will be further described below in conjunction with the drawings and embodiments.

[0065] It should be pointed out that the following detailed descriptions are all illustrative and are intended to provide further explanations for the application. Unless otherwise specified, all technical and scientific terms used in the present disclosure have the same meaning as commonly understood by those of ordinary skill in the technical field to which the present application belongs.

[0066] It should be noted that the terms used here serve only to describe specific implementations and are not intended to limit the exemplary implementations according to the present application. As used herein, unless the context clearly indicates otherwise, the singular form is also intended to include the plural form. In addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps,...



Abstract

The invention provides a distributed online real-time processing method and system for multi-source, heterogeneous flow-state big data. The method comprises the following steps: crawling the webpage data of each source with a distributed crawler de-duplication algorithm; preprocessing the crawled pages by constructing a corresponding tree with a visual page segmentation algorithm, pruning noise nodes according to visual rules, classifying multi-layer pages, determining predicates for each page type according to its characteristics, and inferring data-record block nodes and data-attribute nodes through the rules; distributing the preprocessed data sources through a distributed message system to provide data streams, and describing the state of each data node in a stream to form state information; selectively storing the data streams in the Hadoop Distributed File System (HDFS); and detecting the processed data with a K-means text clustering method to identify texts similar to predetermined sensitive-information texts and screen out the sensitive information.
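The final step of the abstract, finding texts similar to predetermined sensitive-information texts via K-means text clustering, can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: it uses unit-length term-frequency vectors as a stand-in for whatever text weighting the invention uses, a spherical (cosine-based) K-means with deterministic farthest-first seeding, and a hypothetical similarity threshold of 0.5. The function names `to_vector`, `cosine`, `kmeans`, and `screen_sensitive` and the sample posts are all hypothetical. Chinese text would first need a word segmenter, since `\w+` does not split Chinese words.

```python
import math
import re
from collections import Counter

# Sparse term -> weight mapping; every vector built here is L2-normalized.
Vector = dict[str, float]


def normalize(v: Vector) -> Vector:
    """Scale a sparse vector to unit length (an empty vector stays empty)."""
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {t: w / norm for t, w in v.items()} if norm else {}


def to_vector(text: str) -> Vector:
    """Unit-length term-frequency vector; an assumed stand-in for TF-IDF."""
    return normalize(dict(Counter(re.findall(r"\w+", text.lower()))))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two unit vectors, i.e. their dot product."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors: list[Vector], k: int, iters: int = 20) -> tuple[list[int], list[Vector]]:
    """Spherical K-means: assign by cosine, recompute centroids as normalized sums."""
    # Farthest-first seeding keeps the sketch deterministic (assumed; not from the patent).
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = [-1] * len(vectors)
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            summed: Counter = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    summed.update(v)  # adds the float weights term by term
            if summed:  # an empty cluster keeps its previous centroid
                centroids[j] = normalize(dict(summed))
    return labels, centroids


def screen_sensitive(docs: list[str], sensitive_refs: list[str], k: int,
                     threshold: float = 0.5) -> list[int]:
    """Indices of docs that resemble any predetermined sensitive-information text."""
    vectors = [to_vector(d) for d in docs]
    labels, centroids = kmeans(vectors, k)
    flagged: set[int] = set()
    for ref in sensitive_refs:
        r = to_vector(ref)
        # Compare the reference only with members of its nearest cluster.
        best = max(range(k), key=lambda j: cosine(r, centroids[j]))
        for i, (v, lab) in enumerate(zip(vectors, labels)):
            if lab == best and cosine(r, v) >= threshold:
                flagged.add(i)
    return sorted(flagged)


if __name__ == "__main__":
    posts = [
        "protest rally downtown tonight",
        "protest rally downtown square",
        "phone battery review",
        "phone battery life review",
    ]
    print(screen_sensitive(posts, ["protest rally downtown"], k=2))  # → [0, 1]
```

Clustering first confines each reference text to the group of topically related posts nearest to it, so the per-post similarity check runs only inside that group. In a real deployment the input would be text read from the stored data streams, and the threshold would need tuning against labeled examples.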

Description

Technical field
[0001] The present disclosure relates to a method and system for distributed online real-time processing of multi-source, heterogeneous flow-state big data.
Background art
[0002] The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
[0003] The network technology revolution marked by the formation of the Internet has pushed human society into the era of information networking and created a brand-new space for social life, the network environment, which reflects all aspects of social life in real time. With the rapid development of the mobile network and the Internet, the explosive growth of information has made the current security situation more complicated, and cyber warfare has become an important issue in the field of non-traditional social security.
[0004] Because social networking sites such as forums, microblogs, blogs, private spaces, and Renren.com carry a la...


Application Information

IPC(8): G06F16/951; G06F16/953; G06F16/9535; G06F16/955; G06F16/35; G06K9/62
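Inventors: Yu Junfeng (于俊凤), Wei Moji (魏墨济), Yang Zijiang (杨子江), Li Sisi (李思思), Zhu Shiwei (朱世伟), Guo Jianping (郭建萍), Yang Aiqin (杨爱芹), Li Chen (李晨), Liu Cuiqin (刘翠芹)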
Owner: Information Research Institute of Shandong Academy of Sciences