A method for distributed collection of public page data
A distributed page data collection technology, applied in the fields of network data indexing, network data retrieval, and other database retrieval. It addresses the problems of resource preemption, weak cluster scalability, and the inability to allocate resources dynamically, with the effect of improving scalability, conserving cluster resources during allocation, and improving resource utilization.
Embodiment Construction
[0031] With reference to the accompanying drawings, a specific embodiment of the method for distributed collection of public page data of the present invention is described below:
[0032] Multi-source news data collection: three news sites are manually designated as A, B, and C. The corresponding crawler programs PA, PB, and PC are adapted to the different page structures and data transmission interfaces of sites A, B, and C. The programs are packaged as docker images and uploaded to the image server. Tasks are published, with their parameters specified, on the cluster host; each machine node receives its task and starts the crawler image configured with those parameters, so that the crawler images for the three sites are distributed and started across the machine cluster. A distributed database receives the data crawled by all machines in the cluster and stores it.
[0033] An example is shown in Figure 4.
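The task publishing step described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `CrawlTask` type, the image names, the environment variable names, and the round-robin assignment of tasks to nodes are all assumptions made for illustration. The sketch only builds the `docker run` command each node would execute; it does not contact a cluster or an image server.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class CrawlTask:
    """One published task (hypothetical structure, not from the patent)."""
    site: str                                   # site label, e.g. "A"
    image: str                                  # crawler image name on the image server
    params: dict = field(default_factory=dict)  # task parameters passed to the crawler


def docker_run_command(task):
    """Build the `docker run` argument list a node would execute for a task.

    Parameters are passed as environment variables (an assumption; the
    source only says the image is started "configured with parameters").
    """
    cmd = ["docker", "run", "--rm", "-d", "--name", f"crawler-{task.site.lower()}"]
    for key, value in sorted(task.params.items()):
        cmd += ["-e", f"{key}={value}"]
    cmd.append(task.image)
    return cmd


def assign_round_robin(tasks, nodes):
    """Spread tasks over nodes in round-robin order (assumed policy)."""
    if not nodes:
        raise ValueError("at least one machine node is required")
    plan = {node: [] for node in nodes}
    for task, node in zip(tasks, cycle(nodes)):
        plan[node].append(task)
    return plan
```

With three tasks for sites A, B, and C and two nodes, `assign_round_robin` places A and C on the first node and B on the second. A real scheduler would more likely pick nodes by available resources, which is the problem the method sets out to address.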
[0034] Referring to Figures 1 to 3, the execution steps are broken down as follows:
[0035] 1...


