
Distributed internet information acquisition system and method

A distributed Internet information collection technology that addresses the problems of narrow application range and slow data collection speed, achieving strong adaptability, the ability to meet high-concurrency requirements, and high collection efficiency.

Inactive
Publication Date: 2020-02-07
河南拓普计算机网络工程有限公司
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] The invention provides a distributed Internet information collection system and method that address the slow data collection speed and narrow application range of existing techniques.



Examples


Embodiment 1

[0029] Embodiment 1: A distributed Internet information collection system (see Figures 1 and 2) is centered on the task scheduling module, which is mainly responsible for task allocation and for parsing downloaded resources. Internet information collection involves downloading original resources (HTML, images, attachments, etc.) from the server and then parsing them so that the client can read them. Accordingly, the acquisition script loaded into the task scheduling module includes a resource download part and a logic parsing part. The acquisition script is loaded by the lower computer to the task scheduling module, and the task scheduling module arranges the execution sequence of the acquisition scripts according to priority, load-balancing, and timing strategies. The acquisition script is written according to the collection requirements and the requirements of the source website. The script includes but is not limited to the following l...
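The ordering of acquisition scripts by priority and timing described in Embodiment 1 can be illustrated with a minimal sketch. The class name `TaskScheduler`, its methods, and the rule "a script becomes eligible once its start time has passed, and the highest-priority eligible script runs first" are assumptions made for illustration; the patent text does not specify the actual scheduling algorithm.

```python
import heapq
import itertools


class TaskScheduler:
    """Hypothetical scheduler: timing gates eligibility, priority orders execution."""

    def __init__(self):
        self._waiting = []  # min-heap of (not_before, seq, priority, name)
        self._ready = []    # min-heap of (-priority, seq, name)
        self._seq = itertools.count()  # tie-breaker that preserves load order

    def load(self, name, priority=0, not_before=0.0):
        """Register an acquisition script with a priority and an earliest start time."""
        heapq.heappush(self._waiting, (not_before, next(self._seq), priority, name))

    def next_script(self, now):
        """Return the highest-priority script that is due at time `now`, or None."""
        # Move every script whose start time has arrived into the ready heap.
        while self._waiting and self._waiting[0][0] <= now:
            _, seq, priority, name = heapq.heappop(self._waiting)
            heapq.heappush(self._ready, (-priority, seq, name))
        if self._ready:
            return heapq.heappop(self._ready)[2]
        return None


scheduler = TaskScheduler()
scheduler.load("news", priority=1)
scheduler.load("tenders", priority=5)
scheduler.load("nightly", priority=9, not_before=100.0)
order = [scheduler.next_script(now=0.0) for _ in range(3)]
```

In this sketch the timed script is held back until its start time even though it has the highest priority, so timing and priority act as two separate strategies rather than one combined score.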


Abstract

The invention discloses a distributed Internet information acquisition system and method that aim to solve the technical problems of low data acquisition speed and narrow application range in the prior art. The acquisition system comprises an acquisition script compiling module, a task scheduling module, a micro-service framework, and a data storage module. The acquisition script compiling module generates an acquisition script that records the acquisition requirements; the task scheduling module adjusts the execution sequence of the acquisition scripts; the micro-service framework communicates with the task scheduling module, receives the acquisition scripts, and distributes them to different download nodes; and the data storage module stores the downloaded content transmitted by the download nodes. The acquisition method comprises the steps of task loading, task analysis, and node distribution. The beneficial effects of the invention are a wide application range, strong adaptability, and high acquisition efficiency.
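The node distribution step summarized in the abstract can be sketched as follows. The names `DownloadNode` and `Dispatcher` and the least-loaded selection rule are assumptions for illustration only; the abstract states that scripts are distributed to different download nodes but does not say how a node is chosen.

```python
class DownloadNode:
    """Hypothetical download node that tracks how many scripts it is running."""

    def __init__(self, name):
        self.name = name
        self.active = 0


class Dispatcher:
    """Hypothetical stand-in for the distribution role of the micro-service framework."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def assign(self, script_name):
        """Send a script to the node with the fewest running scripts."""
        # min() returns the first node on a tie, so earlier nodes fill first.
        node = min(self.nodes, key=lambda n: n.active)
        node.active += 1
        return node

    def finish(self, node):
        """Mark one script on `node` as completed."""
        node.active -= 1


node_a, node_b = DownloadNode("a"), DownloadNode("b")
dispatcher = Dispatcher([node_a, node_b])
assigned = [dispatcher.assign(s).name for s in ("s1", "s2", "s3")]
dispatcher.finish(node_a)
dispatcher.finish(node_a)
after_finish = dispatcher.assign("s4").name
```

A least-loaded rule is only one possible load-balancing strategy; round-robin or weighted selection would fit the same interface.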

Description

Technical field

[0001] The invention relates to the field of Internet information technology, in particular to a distributed Internet information collection system and method.

Background

[0002] The Internet contains a large amount of valuable information needed by users from all walks of life. Users usually collect this information in one of three ways: (1) A content management system (CMS), such as the Zhimeng and Empire brands. Such a system includes a collection service: its back end provides a simple collection service for general-purpose news websites. However, its scope of use is limited, its flexibility is unsatisfactory, and it requires manual configuration of the corresponding regular-expression extraction rules, which is cumbersome to operate. (2) An information collection system for special websites, the system can be customized and developed according to the information structure requirements of webpages published ...


Application Information

IPC(8): H04L29/08
CPC: H04L67/06, H04L67/10
Inventor 李善平
Owner 河南拓普计算机网络工程有限公司