
Data acquisition method and system thereof based on distributed crawler

A data acquisition technology based on a distributed crawler. It addresses problems such as low crawling efficiency and achieves fast crawling, improved collection efficiency, and the ability to meet time requirements.

Status: Inactive; Publication Date: 2018-09-14
合肥俊刚机械科技有限公司
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a data acquisition method based on a distributed crawler, and a system thereof. Web pages are divided, and the crawler crawling parameters are determined according to that division. Web page information is then crawled using those parameters, and jump links connect the crawler to other web pages, so that web page information is crawled rapidly. This solves the problem of low crawling efficiency in existing crawlers.



Examples


Embodiment Construction

[0027] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0028] Referring to Figure 1, the present invention is a data acquisition method based on a distributed crawler, comprising the following steps:

[0029] S1. Divide the web pages according to their attributes;

[0030] S2. For each division of web pages, determine at least one crawler crawling parameter;

[0031] S3. Crawl the current webpage information according to the determined crawler crawling parameters, and analyze the captured webpage i...
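Steps S1 and S2 can be illustrated with a short Python sketch. The text does not say which page attributes are used or what the crawling parameters are, so the attribute (the URL host), the `CrawlParams` fields, the sizing rule, and the example URLs below are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrawlParams:
    """Hypothetical per-division crawling parameters (not from the patent)."""
    max_depth: int
    delay_seconds: float


def divide_pages(urls):
    """S1: group pages by an attribute; here the URL host (an assumption)."""
    divisions = defaultdict(list)
    for url in urls:
        divisions[urlparse(url).netloc].append(url)
    return dict(divisions)


def determine_params(divisions):
    """S2: assign at least one crawling parameter set to each division.

    Assumed rule for illustration only: larger divisions get a shallower
    depth and a longer delay between requests.
    """
    params = {}
    for name, pages in divisions.items():
        if len(pages) > 2:
            params[name] = CrawlParams(max_depth=1, delay_seconds=1.0)
        else:
            params[name] = CrawlParams(max_depth=3, delay_seconds=0.2)
    return params


urls = [
    "https://news.example.com/a",
    "https://news.example.com/b",
    "https://news.example.com/c",
    "https://blog.example.org/post",
]
divisions = divide_pages(urls)
params = determine_params(divisions)
```

Keeping the parameters in a mapping keyed by division lets each crawler worker look up its settings from the division a page belongs to, which is one plausible reading of "determine parameters according to the division".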


Abstract

The invention discloses a data acquisition method based on a distributed crawler. The method comprises the following steps: dividing web pages according to their attributes; determining at least one crawler crawling parameter for each division of web pages; crawling the current web page information according to the determined crawling parameter and analyzing the crawled information to obtain analysis data; compiling the analysis data obtained; collecting the compiled web page information and reporting it to a data saving module; and storing the collected web page data. In this method and its system, the crawling parameter is determined by dividing the web pages and is then used to crawl web page information. In addition, jump links establish connections to other web pages, so that web page information is crawled rapidly. Data can be collected within a short period of time, which increases collection efficiency and meets the time requirement.
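The crawl, analyze, compile, collect, and store steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory `PAGES` dictionary stands in for real HTTP fetches, a local thread pool stands in for distributed crawler nodes, the `store` dictionary stands in for the data saving module, and "compiling" is reduced to building one record per page. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://site.test/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "http://site.test/a": '<title>Page A</title><a href="/b">B</a>',
    "http://site.test/b": '<title>Page B</title>',
}


class PageParser(HTMLParser):
    """Analyze one page: collect the title text and outgoing links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl_and_parse(url):
    """Crawl one page and compile the analysis data into a record."""
    parser = PageParser()
    parser.feed(PAGES.get(url, ""))
    links = [urljoin(url, href) for href in parser.links]
    return {"url": url, "title": parser.title, "links": links}


def run(seed, max_depth=2, workers=4):
    """Crawl level by level, following jump links, and store each record."""
    store = {}  # stand-in for the data saving module
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth + 1):
            # Drop duplicates and pages that were already stored.
            frontier = [u for u in dict.fromkeys(frontier) if u not in store]
            if not frontier:
                break
            records = list(pool.map(crawl_and_parse, frontier))
            next_frontier = []
            for record in records:
                store[record["url"]] = record  # collect and store
                next_frontier.extend(record["links"])
            frontier = next_frontier
    return store


store = run("http://site.test/")
```

Here `max_depth` plays the role of a crawling parameter, and the links extracted from each page form the next frontier, which corresponds to following jump links to other web pages.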

Description

Technical field

[0001] The invention belongs to the technical field of network data acquisition, and relates to a distributed crawler-based data acquisition method and a system thereof.

Background

[0002] With the rapid development of the network, the World Wide Web has become the carrier of a large amount of information, and how to extract and use this information effectively has become a huge challenge. Search engines, such as the traditional general-purpose search engines AltaVista, Yahoo and Google, serve as tools that help people retrieve information and have become the entrance and guide for users accessing the World Wide Web. However, these general-purpose search engines also have certain limitations. Web crawlers emerged as a result. A web crawler is a program or script that automatically captures information on the World Wide Web according to certain rules. Other less commonly used names include ant, autoindex, emulator, or worm. A web crawler is a program th...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 王华伟
Owner: 合肥俊刚机械科技有限公司