
Web crawler flow control automatic degradation method based on time window

A flow control technique for web crawlers, applicable to data exchange networks, digital transmission systems, electrical components, and similar fields. It addresses a problem that existing approaches leave unsolved: prioritizing between a web crawler's historical data and its real-time data.

Active Publication Date: 2016-05-25
HUNAN ANTVISION SOFTWARE

AI Technical Summary

Problems solved by technology

The prior-art flow control method cited in the Description can reduce server system overhead, filter client requests more accurately, and improve the system's filtering efficiency. However, it cannot solve the problem of prioritizing between a web crawler's historical data and its real-time data.



Examples


Embodiment Construction

[0025] As shown in Figure 1, the time-window-based automatic degradation method for web crawler flow control of the present invention comprises the following steps.

[0026] Step S101: Data reception. Web crawler data is received, and the reception of the crawler data is decoupled from its processing through message middleware.
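The decoupling in step S101 can be illustrated with a minimal sketch. The source does not name the message middleware, so an in-memory bounded queue stands in for it here. The names `CrawlRecord`, `InMemoryBroker`, and `receive`, and the record fields, are hypothetical and used only for illustration.

```python
import queue
import time
from dataclasses import dataclass


@dataclass
class CrawlRecord:
    url: str
    original_time: float  # original time of the crawled data (epoch seconds)
    received_time: float  # time at which the receiver accepted the record


class InMemoryBroker:
    """Stand-in for message middleware (assumption): a bounded FIFO buffer."""

    def __init__(self, maxsize=1000):
        self._q = queue.Queue(maxsize=maxsize)

    def publish(self, record):
        # The receiver only enqueues; it never waits for processing to finish.
        self._q.put(record)

    def consume(self, timeout=0.1):
        # The processor pulls at its own pace; None means nothing is waiting.
        try:
            return self._q.get(timeout=timeout)
        except queue.Empty:
            return None


def receive(broker, url, original_time, now=None):
    """Accept one crawled item and hand it to the broker."""
    received = time.time() if now is None else now
    record = CrawlRecord(url, original_time, received)
    broker.publish(record)
    return record
```

Because the receiver and the processor share only the broker, a burst of incoming data fills the buffer instead of stalling the receiver, and the later steps can decide in which order to drain it.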

[0027] Step S102: Determine the data processing priority. The delay between the original time of the web crawler data and the current time is recorded, a normal distribution of the delay times is generated, and the priority of the data is determined from the delay values at different quantiles. This step specifically includes the following sub-steps.

[0028] Step S21: Delay time recording. The delay between the original time of the web crawler data and the current time is recorded. While the system is running, it records, for each piece of data received within a period of time, the delay between the data's actual time and the current time, and, through the normal distribution graph ...
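One way to read step S102 is sketched below: fit a normal distribution to the recorded delays, take the delay values at chosen quantiles as cut points, and assign a priority by comparing each item's delay against them. The quantiles 0.5 and 0.9, the three-level priority scale, and the function names are assumptions for illustration; the source text does not specify them.

```python
from statistics import NormalDist


def fit_delay_distribution(delays):
    """Fit a normal distribution to recorded delays (seconds).

    Needs at least two samples that are not all identical.
    """
    return NormalDist.from_samples(delays)


def delay_thresholds(dist, quantiles=(0.5, 0.9)):
    """Delay values at the chosen quantiles (the quantiles are illustrative)."""
    return [dist.inv_cdf(q) for q in quantiles]


def priority_for(delay, thresholds):
    """Return 0 for the highest priority; longer delays get larger numbers."""
    for level, limit in enumerate(thresholds):
        if delay <= limit:
            return level
    return len(thresholds)


# Example: delays recorded over one observation period.
recorded = [10, 20, 30, 40, 50]
cuts = delay_thresholds(fit_delay_distribution(recorded))
levels = [priority_for(d, cuts) for d in (5, 35, 5000)]
```

Under this reading, data close to real time (short delay) falls below the lower cut point and is handled first, while historical data with a long delay lands in the lowest priority level.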


Abstract

The invention relates to the technical field of network flow control, and in particular to a time-window-based automatic degradation method for web crawler flow control. The method comprises: S101, data reception, in which web crawler data is received and its reception is decoupled through message middleware; S102, determining the data processing priority, in which the delay between the original time of the web crawler data and the current time is recorded, a normal distribution of the delay times is generated, and the priority of the data is determined from the delay values at different percentiles; S103, flow control, in which the processing speed of the flow control system is compared with a threshold, data with lower delay values is processed first, and, if the processing speed allows, temporarily stored data is processed at the same time at a controlled rate; and S104, completing the processing. The method dynamically adjusts the processing speed and adjusts the delay time window for message processing according to the current speed, which keeps the system running stably.
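The flow control in S103 and the window adjustment mentioned at the end of the abstract can be sketched roughly as follows. The abstract does not give the adjustment rule, so everything concrete here is an assumption: halving or doubling the window, the window bounds, the `stored_share` cap on temporarily stored data, and the function names `adjust_window` and `plan_batch`.

```python
def adjust_window(window, speed, threshold, step=0.5,
                  min_window=1.0, max_window=3600.0):
    """Return the next delay window in seconds.

    Assumed rule: shrink the window when the measured speed exceeds the
    threshold (degrade), widen it again when there is headroom (recover).
    """
    if speed > threshold:
        return max(min_window, window * step)
    return min(max_window, window / step)


def plan_batch(delays, stored, window, capacity, stored_share=0.25):
    """Plan one processing round.

    delays: delays (seconds) of newly received items
    stored: delays of items that were temporarily stored earlier
    Returns (process_now, process_stored, keep_stored).
    """
    # Items inside the window count as real-time; lowest delay goes first.
    fresh = sorted(d for d in delays if d <= window)
    late = [d for d in delays if d > window]

    process_now = fresh[:capacity]
    overflow = fresh[capacity:]

    # Spare capacity may be spent on stored data, but only up to a capped share.
    spare = capacity - len(process_now)
    quota = min(spare, int(capacity * stored_share))
    backlog = stored + overflow + late
    return process_now, backlog[:quota], backlog[quota:]
```

For instance, `plan_batch([5, 500, 20], [900], window=60, capacity=8)` processes the two low-delay items first and then spends the capped quota on the backlog, while the same call with `capacity=2` leaves the whole backlog stored for a later round.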

Description

Technical field

[0001] The invention relates to the technical field of network flow control, and in particular to a time-window-based automatic degradation method for web crawler flow control.

Background

[0002] When a web crawler system captures data, occasional emergencies can cause the number of captured messages to rise sharply, and most of those messages are historical data. This puts the system in a peak state and can exceed its processing capacity, so that the system cannot effectively process valuable real-time data. It can even bring down the entire system, leaving it unable to provide service.

[0003] Chinese invention patent application CN103107948A discloses a flow control method, including: intercepting the client's request at the application server; obtaining a combination of user information according to the request; matching the combination of user information with the combination of user informati...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L12/851
CPC: H04L47/24
Inventor: 覃璐
Owner: HUNAN ANTVISION SOFTWARE