Data processing method and system based on web crawler and cloud platform

A web crawler technology, applied in the fields of network data indexing, network data retrieval, and special data processing applications, that can solve the problems of crawler data pollution and reduced data crawling reliability.

Active Publication Date: 2021-02-19
金服软件(广州)有限公司
Cites: 11. Cited by: 1.

AI Technical Summary

Problems solved by technology

In large-scale data analysis scenarios, the large volume of crawled data means that crawler data from different webpages may pollute each other, which reduces the reliability of data crawling.



Examples


Embodiment Construction

[0074] To make the purpose, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings of some embodiments of the present invention. Obviously, the described embodiments are some, but not all, of the embodiments of the invention. The components of the invention generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations.

[0075] Accordingly, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. Based on some embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the prese...

Abstract

The invention relates to the technical field of the Internet and data processing, and in particular to a data processing method and system based on web crawlers and a cloud platform. The method comprises the following steps: a webpage crawler instruction input by a user is acquired, the instruction comprising target webpage information and a crawling object set; target crawler data corresponding to the target webpage information and the crawling object set is then acquired; and the target crawler data is stored in a target distributed storage node, the target distributed storage node being the storage node corresponding to the webpage object set in the distributed storage system. Compared with the prior art, the reliability of crawler data storage during large-scale data crawling can be improved. In addition, by crawling both the current webpage content data and the historical webpage content data, the data required by a user can be crawled fully, which improves the integrity of data crawling.
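The three steps in the abstract (acquire the instruction, acquire the matching crawler data, store it on the node that corresponds to the object set) can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the names `CrawlInstruction`, `StorageNode`, `select_node`, and `process` are hypothetical, the fetcher is a stub that returns a dictionary instead of crawling a real page, and mapping an object set to a node by hashing is an assumption, since the abstract does not say how the correspondence is established.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CrawlInstruction:
    """Hypothetical user instruction: target page plus the set of objects to crawl."""
    target_page: str
    object_set: frozenset


@dataclass
class StorageNode:
    """In-memory stand-in for one node of a distributed storage system."""
    name: str
    records: dict = field(default_factory=dict)


def select_node(nodes, object_set):
    """Pick the node for an object set deterministically (hashing is an assumption)."""
    key = ",".join(sorted(object_set)).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def process(instruction, fetch, nodes):
    """Acquire crawler data for the instruction and store it on its target node."""
    page = fetch(instruction.target_page)  # stub fetcher: returns {object_name: value}
    # Keep only the objects the user asked for.
    data = {name: page[name] for name in instruction.object_set if name in page}
    node = select_node(nodes, instruction.object_set)
    node.records[(instruction.target_page, instruction.object_set)] = data
    return node, data


def fake_fetch(url):
    """Stand-in for a real crawler; returns fixed page content."""
    return {"title": "Example", "price": "9.99", "ads": "banner"}


nodes = [StorageNode(f"node-{i}") for i in range(3)]
instr = CrawlInstruction("https://example.com/item", frozenset({"title", "price"}))
node, data = process(instr, fake_fetch, nodes)
```

Because the same object set always maps to the same node in this sketch, data crawled for different object sets is kept apart rather than mixed in one local store, which is the kind of separation the abstract credits with improving storage reliability.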

Description

Technical field

[0001] The present invention relates to the technical field of the Internet and data processing, and in particular to a data processing method, system, and cloud platform based on a web crawler.

Background art

[0002] A web crawler is a program or script that automatically grabs webpage information according to set rules. A web crawler can quickly obtain the webpage data that users require, thereby providing technical support for large-scale data collection.

[0003] When a web crawler is used to crawl data, the prior art can save the crawled data locally on the device. However, in large-scale data analysis scenarios, the large volume of crawled data means that crawler data from different webpages may pollute each other, which reduces the reliability of data crawling.

Summary of the invention

[0004] The purpose of the present invention is to provide a web crawler-based data processing method, system and cl...

Application Information

IPC(8): G06F16/951, G06F16/27
CPC: G06F16/27, G06F16/951
Inventors: 詹能勇, 刘振宇
Owner: 金服软件(广州)有限公司