Webpage data acquisition method and device, equipment and storage medium

A web page data collection technology in the field of data processing that addresses the low efficiency of task execution on acquisition nodes and improves operation efficiency.

Status: Inactive. Publication date: 2019-11-12
NEW H3C BIG DATA TECH CO LTD
Cites 16 documents; cited by 4

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present disclosure is to provide a web page data collection method, device, equipment and storage medium, so as to solve the problem of low task execution efficiency on nodes in the prior art.

Embodiment Construction

[0044] To make the purposes, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in these embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. The components of the disclosed embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a variety of different configurations.

[0045] Accordingly, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. Based on the embodiments in the present disclosure, all other embodiments obtained by persons of ordinary sk...

Abstract

The invention provides a webpage data collection method and device, equipment and a storage medium, and relates to the technical field of data processing. The method includes: creating a task corresponding to a data acquisition requirement according to information about that requirement and a preset script template, so that corresponding tasks are created for different data acquisition requirements and their preset script templates; obtaining the operation resources required by the task and the operation resource states of a plurality of acquisition nodes; determining, from the plurality of acquisition nodes, an acquisition node that matches the task as the target node; and issuing the task to the target node so that the target node performs data acquisition according to the task. Because the operation resources required by the task are matched against the operation resource states of the acquisition nodes, a suitable acquisition node is selected for each task, which improves the efficiency with which acquisition nodes execute tasks, that is, the data acquisition efficiency.
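
The abstract describes two steps: building a task from requirement information plus a preset script template, and matching the task's required operation resources against each acquisition node's resource state. The Python sketch below illustrates one way those steps could fit together under stated assumptions; it is not the patented implementation. The template placeholders (`start_url`, `depth`, `selector`), the resource fields (`cpu_cores`, `memory_mb`), the helper names, and the "most free memory left after placement" selection rule are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from string import Template

# Hypothetical preset script template; placeholder names are assumptions.
PRESET_TEMPLATE = Template("crawl start_url=$start_url depth=$depth selector=$selector")


@dataclass
class Task:
    name: str
    script: str       # script generated from the preset template
    cpu_cores: int    # operation resources the task is assumed to need
    memory_mb: int


@dataclass
class Node:
    name: str
    free_cpu_cores: int   # assumed operation resource state of an acquisition node
    free_memory_mb: int


def create_task(requirement: dict, template: Template = PRESET_TEMPLATE) -> Task:
    """Create a task by filling the preset script template with requirement info."""
    script = template.substitute(
        start_url=requirement["start_url"],
        depth=requirement["depth"],
        selector=requirement["selector"],
    )
    return Task(requirement["name"], script,
                requirement["cpu_cores"], requirement["memory_mb"])


def select_target_node(task: Task, nodes: list[Node]) -> Node | None:
    """Return a node whose free resources cover the task, or None if none fits.

    Assumed policy: among fitting nodes, prefer the one left with the most free
    memory (then free CPU cores) after the task is placed.
    """
    fitting = [n for n in nodes
               if n.free_cpu_cores >= task.cpu_cores and n.free_memory_mb >= task.memory_mb]
    if not fitting:
        return None
    return max(fitting, key=lambda n: (n.free_memory_mb - task.memory_mb,
                                       n.free_cpu_cores - task.cpu_cores))


def issue_task(task: Task, node: Node) -> None:
    """Stand-in for dispatch: reserve the target node's resources for the task."""
    node.free_cpu_cores -= task.cpu_cores
    node.free_memory_mb -= task.memory_mb
```

For example, with nodes offering (2 cores, 1024 MB), (8 cores, 4096 MB) and (4 cores, 8192 MB), a task needing 4 cores and 2048 MB fits only the last two, and this rule picks the third because it keeps 6144 MB free. Ranking by leftover resources is only one simple choice; a real scheduler might weigh CPU, memory, bandwidth and current load differently.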

Description

Technical field

[0001] The present disclosure relates to the technical field of data processing, and in particular to a web page data collection method, device, equipment and storage medium.

Background

[0002] A web crawler is a program or script that automatically fetches information from the World Wide Web according to preset rules. Web crawler technology can locate different kinds of data, such as pictures, databases and audio/video multimedia, so that the data can be retrieved according to configured settings. It is widely used by major search engines to discover and acquire information-dense, structured data under given crawling conditions.

[0003] Existing websites with large amounts of data, such as news and social networking sites, usually use a distributed crawling system, which provides stronger crawling capability. Different from the single-point crawler system, the dist...
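
The background defines a crawler as a program that fetches web information according to preset rules. As a minimal illustration not taken from the disclosure, the Python sketch below treats a "preset rule" as a URL regular expression and applies it to the links of an already-downloaded HTML page, using only the standard library. The names `LinkCollector`, `apply_rule` and `ARTICLE_RULE` and the example rule are hypothetical; fetching pages and distributing work across nodes are left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []  # raw href strings in document order

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def apply_rule(page_url: str, html: str, rule: re.Pattern) -> list[str]:
    """Return absolute links in `html` matching the preset `rule`, first-seen order, no duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    absolute = (urljoin(page_url, href) for href in parser.hrefs)  # resolve relative links
    matched = [url for url in absolute if rule.search(url)]
    return list(dict.fromkeys(matched))  # de-duplicate while keeping order


# Hypothetical preset rule: follow only dated article pages under /news/.
ARTICLE_RULE = re.compile(r"/news/\d{4}/\d{2}/\d{2}/")
```

Resolving relative links with `urljoin` before matching keeps the rule independent of how each page writes its hrefs.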

Application Information

Patent type & authority: Application (China)
IPC(8): G06F16/951; G06F9/50; H04L29/08
CPC: G06F9/5005; G06F9/5083; G06F16/951; H04L67/1008
Inventors: 董晨辉 (Dong Chenhui), 任延辉 (Ren Yanhui), 谷广鹏 (Gu Guangpeng)
Owner NEW H3C BIG DATA TECH CO LTD