Distributed network data acquisition method and acquisition system thereof

A data acquisition system based on distributed network technology, applied in network data indexing, network data retrieval, transmission systems and related fields. It addresses problems such as a crawler being attacked by the acquired website and becoming unable to capture data.

Active Publication Date: 2019-04-19
SOUTH CHINA AGRI UNIV
Cites: 10 | Cited by: 6

AI Technical Summary

Problems solved by technology

[0005] When collecting data from agricultural websites, even a crawler that complies with the Robots protocol in its interaction with the website may, after long-running and/or frequent normal crawling, be mistakenly attacked by the website's anti-crawler mechanism, so that its crawling jobs can no longer proceed normally.


Examples

Embodiment Construction

[0045] The present invention is described in detail below through specific embodiments in combination with the drawings. It should be noted that, where there is no conflict, the embodiments of the present invention and the features in those embodiments may be combined with one another, and the protection scope of the present invention is not limited to these embodiments.

[0046] See Figure 1, which shows the implementation environment involved in each embodiment of the present invention. The implementation environment includes a host 100, a slave 200, and the Internet 300.

[0047] The host 100 is the computer that issues the main commands; it may be a desktop computer, a portable computer, a tablet computer, or any other intelligent terminal capable of issuing main commands. The host 100 is provided with the Scrapy framework. The Scrapy framework mainly includes an engine, a scheduler that interacts with the engine through a scheduling middleware, a downloader that in...

Abstract

The invention relates to the technical field of network data acquisition, and in particular to a distributed network data acquisition method and an acquisition system thereof. The method comprises the following steps: de-duplicating the links in a request queue by means of a scheduler, assigning the request queue to corresponding slave nodes, and performing network data acquisition; when the network data acquisition behavior of an acquisition node encounters an attack by the acquired website, triggering a corresponding defense mechanism; determining, by the defense mechanism, the attack type from the attack behavior, and judging whether the attack type matches a preset defense type of the slave node corresponding to that acquisition node; if so, executing the defense measure corresponding to the defense type to eliminate the attack; if not, cancelling the network data acquisition behavior of the acquisition node and returning the not-yet-acquired request queue to the scheduler to await reassignment. In this way, when normal network data acquisition work is mistakenly attacked by the acquired website, corresponding measures can be taken promptly to resolve the problem.
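The control flow in the abstract can be sketched roughly as follows. This is only an illustrative Python outline, not the patented implementation: the names `Scheduler`, `SlaveNode`, `run_batch`, `demo_fetch` and `slow_down`, the attack type string `"rate_limit"`, and the `max_defenses` cap (added so the loop always terminates) are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SlaveNode:
    """Hypothetical slave node with preset defense types."""
    name: str
    defenses: dict  # attack type -> defense measure, a callable taking the node
    state: dict = field(default_factory=dict)
    collected: list = field(default_factory=list)


class Scheduler:
    """Hypothetical scheduler: de-duplicates links and hands out batches."""

    def __init__(self):
        self.seen = set()
        self.queue = deque()

    def enqueue(self, urls):
        # Keep only links that have never been queued before.
        for url in urls:
            if url not in self.seen:
                self.seen.add(url)
                self.queue.append(url)

    def assign(self, batch_size):
        # Take up to batch_size links off the front of the queue.
        batch = []
        while self.queue and len(batch) < batch_size:
            batch.append(self.queue.popleft())
        return batch

    def requeue(self, urls):
        # Links handed back by a node wait here for reassignment.
        self.queue.extend(urls)


def run_batch(node, batch, scheduler, fetch, max_defenses=3):
    """Acquire a batch on one node; return True if the whole batch was acquired.

    fetch(node, url) is assumed to return None on success or an attack-type
    string when the website's anti-crawler mechanism reacts.
    """
    pending = deque(batch)
    defenses_used = 0
    while pending:
        attack = fetch(node, pending[0])
        if attack is None:
            node.collected.append(pending.popleft())
            continue
        defense = node.defenses.get(attack)
        if defense is None or defenses_used >= max_defenses:
            # No matching defense type: cancel and return the unacquired links.
            scheduler.requeue(pending)
            return False
        defense(node)  # matching defense type: execute the measure, then retry
        defenses_used += 1
    return True


def demo_fetch(node, url):
    # Simulated website: "/blocked" pages rate-limit nodes that have not slowed down.
    if url.endswith("/blocked") and not node.state.get("slowed"):
        return "rate_limit"
    return None


def slow_down(node):
    # Simulated defense measure for the "rate_limit" attack type.
    node.state["slowed"] = True
```

In this sketch a node without a matching defense hands its remaining links back to the scheduler, and another node whose preset defenses cover the attack type can pick them up on the next assignment.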

Description

Technical field

[0001] The invention relates to the technical field of network data collection, and in particular to a distributed network data collection method and its collection system.

Background technique

[0002] Network data collection refers to the use of Internet search engine technology to achieve targeted, industry-specific and accurate data capture, and to classify the data according to corresponding rules to form a database file.

[0003] Patent publication CN108121706A discloses a distributed crawler optimization method. The specific steps of the distributed crawler optimization method are as follows: the dispatch center issues tasks; the crawler grabs the content of the webpage according to the URL; the parser parses the content of the webpage; If there are more updates, the content of the web page is returned to the data warehouse; the parser parses the links in the web page and uses the Bloom filter locally to de-duplicate; hash the URLs that have passed the local de-d...
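The local Bloom-filter de-duplication mentioned in the cited prior art can be illustrated with a minimal sketch. It is not taken from CN108121706A: the `BloomFilter` class, the `dedupe` helper, the filter size, and the use of seeded SHA-256 digests as hash functions are all assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def dedupe(urls, seen):
    """Return the URLs not yet recorded in `seen`, recording them as we go."""
    fresh = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

Because a Bloom filter can produce false positives, a small fraction of unseen URLs may be skipped; the filter size and number of hash functions trade memory against that error rate.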

Application Information

IPC(8): H04L29/06, G06F16/951
CPC: H04L63/1416, H04L63/145, H04L63/1466
Inventors: 王乐乐, 杨自尚, 韩宇星
Owner: SOUTH CHINA AGRI UNIV