Web crawler flow control automatic degradation method based on time window
A flow control technique for web crawling, applied in data exchange networks, digital transmission systems, electrical components, and similar fields. It addresses problems such as the inability of web crawlers to prioritize between historical data and real-time data.
Embodiment Construction
[0025] As shown in Figure 1, the time-window-based web crawler flow control automatic degradation method of the present invention comprises the following steps:
[0026] Step S101: Data reception. Web crawler data is received, and the reception of the crawler data is decoupled from its processing through message middleware.
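The decoupling in step S101 can be illustrated with a minimal Python sketch. It assumes an in-process `queue.Queue` as a stand-in for the message middleware, since the text does not name a specific broker; the names `middleware`, `receive`, and `consume` are hypothetical and chosen only for illustration.

```python
import queue
import threading
import time

# In-process stand-in for the message middleware (assumption: a real
# deployment would use a message broker; the source does not name one).
middleware = queue.Queue()


def receive(record):
    """Receiver side: hand the crawler record to the middleware and return."""
    middleware.put(record)


def consume(handler, stop_marker=None):
    """Consumer side: pull records until the stop marker arrives."""
    while True:
        record = middleware.get()
        if record is stop_marker:
            break
        handler(record)


# The consumer runs in its own thread, so the receiver never waits on processing.
processed = []
worker = threading.Thread(target=consume, args=(processed.append,))
worker.start()

for i in range(3):
    receive({"id": i, "origin_time": time.time()})
receive(None)  # stop marker
worker.join()
```

Because the receiver only enqueues, a slow processing stage does not block data reception, which is the property the step relies on.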
[0027] Step S102: Determine the data processing priority. The delay between the original time of the web crawler data and the current time is recorded, a normal distribution of the delay times is generated, and the priority of the data is determined according to the delay values at different quantiles. Specifically, this step includes the following sub-steps:
[0028] Step S21: Delay time recording. The delay between the original time of the web crawler data and the current time is recorded. While the system is running, it records, for each piece of data received within a period of time, the delay between the actual time of the data and the current time, and, through the normal distribution graph ...
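The delay recording and quantile-based prioritization described in steps S102 and S21 can be sketched as follows. This is a hedged illustration, not the patented implementation: the class `DelayWindow`, the function `priority`, the 60-second window, the quantile levels 0.5 and 0.9, and the rule that smaller delays receive higher priority are all assumptions, because the source does not specify them.

```python
from collections import deque
from statistics import NormalDist, mean, stdev


class DelayWindow:
    """Holds the delays observed within the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self.samples = deque()  # (arrival time, delay) pairs, oldest first

    def record(self, origin_time, now):
        """Record the delay between a record's original time and the current time."""
        delay = now - origin_time
        self.samples.append((now, delay))
        # Evict samples that have fallen out of the time window.
        while self.samples and now - self.samples[0][0] > self.window_seconds:
            self.samples.popleft()
        return delay

    def fit(self):
        """Fit a normal distribution to the delays currently in the window.

        Needs at least two samples with distinct delays, otherwise the
        standard deviation is undefined or zero.
        """
        delays = [d for _, d in self.samples]
        return NormalDist(mean(delays), stdev(delays))


def priority(delay, dist, quantiles=(0.5, 0.9)):
    """Map a delay to a tier; 0 is the highest priority (assumed: freshest data first)."""
    cutoffs = [dist.inv_cdf(q) for q in quantiles]
    for tier, cutoff in enumerate(cutoffs):
        if delay <= cutoff:
            return tier
    return len(cutoffs)


window = DelayWindow(window_seconds=60.0)
now = 1000.0
for d in (1.0, 2.0, 3.0, 4.0, 5.0):
    window.record(origin_time=now - d, now=now)
dist = window.fit()

tiers = [priority(d, dist) for d in (2.0, 4.0, 30.0)]
```

Under these assumptions, a record whose delay is below the median of the fitted distribution lands in tier 0, one between the median and the 0.9 quantile lands in tier 1, and anything slower lands in tier 2, which a flow controller could process last or drop when degrading.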