
Response scheduling method in crawler system

A scheduling method for crawler systems, applied in the field of network resource search. It addresses problems such as wasted system resources and poor real-time performance, and avoids wasting service resources.

Status: Inactive
Publication date: 2019-02-26
真相网络科技(北京)有限公司 +1

AI Technical Summary

Problems solved by technology

[0004] To overcome, at least to some extent, the poor real-time performance or waste of system resources caused by setting one uniform scanning frequency for a site, this application provides a response scheduling method in a crawler system.

Method used



Detailed Description of the Embodiments

[0032] The present invention will be described in detail below in conjunction with the accompanying drawings and embodiments.

[0033] Figure 1 is a flowchart of a response scheduling method in a crawler system provided by an embodiment of the present application.

[0034] As shown in Figure 1, the method of this embodiment includes:

[0035] S1: Divide the website into multiple entry seed tasks according to its hierarchical structure;

[0036] S2: Preset an initial collection frequency for each entry seed task according to its level and its news volume per unit time;

[0037] S3: Preset multiple news attributes, and formulate adjustment rules corresponding to those attributes;

[0038] S4: Recalculate and update the sampling frequency of each entry seed task in real time according to the adjustment rules.
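
Steps S1 to S4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class and function names, the nested-dictionary site layout, the interval bounds, the two news attributes ("breaking", "no_new_items"), and every numeric factor are assumptions chosen only to make the flow concrete. Frequency is expressed here as a collection interval in seconds, so a shorter interval means a higher frequency.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical bounds on the collection interval (seconds).
MIN_INTERVAL_S = 60.0
MAX_INTERVAL_S = 6 * 3600.0


@dataclass
class SeedTask:
    url: str
    level: int              # depth in the site hierarchy (1 = top-level section)
    news_per_hour: float    # news volume per unit time
    interval_s: float = 0.0 # current collection interval


def clamp(value: float) -> float:
    """Keep an interval inside the assumed bounds."""
    return max(MIN_INTERVAL_S, min(MAX_INTERVAL_S, value))


def divide_seed_tasks(sections: Dict[str, dict], level: int = 1) -> List[SeedTask]:
    """S1: walk an assumed {url: {"news_per_hour": x, "children": {...}}} tree."""
    tasks: List[SeedTask] = []
    for url, info in sections.items():
        tasks.append(SeedTask(url, level, info.get("news_per_hour", 0.0)))
        tasks.extend(divide_seed_tasks(info.get("children", {}), level + 1))
    return tasks


def initial_interval(task: SeedTask) -> float:
    """S2: preset the interval from level and news volume (assumed formula)."""
    base = 3600.0 / max(task.news_per_hour, 0.1)  # about one visit per expected item
    return clamp(base * task.level)               # deeper levels are visited less often


# S3: hypothetical news attributes, each mapped to an adjustment rule.
RULES: Dict[str, Callable[[float], float]] = {
    "breaking": lambda s: s * 0.5,      # hot news: collect twice as often
    "no_new_items": lambda s: s * 1.5,  # nothing new: back off
}


def update_interval(task: SeedTask, attributes: List[str]) -> float:
    """S4: apply the rules for the observed attributes and store the result."""
    interval = task.interval_s
    for attr in attributes:
        rule = RULES.get(attr)
        if rule is not None:
            interval = rule(interval)
    task.interval_s = clamp(interval)
    return task.interval_s


if __name__ == "__main__":
    site = {
        "https://example.com/news": {
            "news_per_hour": 6.0,
            "children": {"https://example.com/news/local": {"news_per_hour": 0.0}},
        }
    }
    tasks = divide_seed_tasks(site)
    for t in tasks:
        t.interval_s = initial_interval(t)
    update_interval(tasks[0], ["breaking"])
    for t in tasks:
        print(t.url, t.level, t.interval_s)
```

Storing an interval per seed task, rather than one value per site, is what lets a busy section be scanned often while a quiet sub-section backs off toward the upper bound.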

[0039] As an optional implementation manner of the present invention, the dividing a plurality of entry seed tasks according to the hierarchical block stru...


Abstract

The invention relates to a response scheduling method in a crawler system. The method comprises the following steps: dividing a website into a plurality of entry seed tasks according to its hierarchical section structure; presetting an initial collection frequency for each entry seed task according to its level and section and its news volume per unit time; presetting a plurality of news attributes and formulating adjustment rules corresponding to the news attributes; and updating the calculated sampling frequency of each entry seed task in real time according to the adjustment rules. The present application allocates an optimal collection frequency to each entry seed in real time, avoids the waste of service resources caused by a mismatch between information output and collection frequency, and indirectly reduces the load on the monitored site.

Description

Technical field

[0001] The present application relates to the technical field of network resource search, and in particular to a response scheduling method in a crawler system.

Background

[0002] With the advent of the big data era, mining and analyzing massive data has become a research hotspot, and data collection is the basis of data mining and analysis. In data collection, what matters most is that collection is timely, accurate, and comprehensive. Timeliness matters because whether a piece of information is discovered in time can directly affect how an event develops, so when designing a crawler program, the frequency at which the program scans monitored sites becomes very important.

[0003] In related technologies, the data acquisition system usually uniformly sets a scanning frequency for a site, but during the acquisition process, it often occurs that the real-time performance is poor when colle...

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F16/951
Inventor: 石松孙志国
Owner: 真相网络科技(北京)有限公司