
Task allocation method and device for controlling web crawler

A task allocation technique for controlling web crawlers, applied in the Internet field, that addresses the complexity of crawler systems and simplifies their structure.

Active Publication Date: 2020-02-28
BEIJING GRIDSUM TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] The main purpose of this application is to provide a task allocation method and device for controlling web crawlers, so as to solve the problem in the related art that assigning web crawler tasks through an intermediate controller makes the crawler system more complicated.



Examples


Embodiment Construction

[0020] It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other. The present application will be described in detail below with reference to the accompanying drawings and embodiments.

[0021] In order to enable those skilled in the art to better understand the solution of the present application, the technical solution in the embodiments of the application will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative effort shall fall within the scope of protection of this application.

[0022] It should be noted that the terms "first" and "second...


Abstract

The invention discloses a task allocation method and device for controlling web crawlers. A web crawler executes tasks using multiple threads, which are pre-stored in a thread pool. The method comprises the following steps: using a semaphore to judge whether the number of task threads among the multiple threads has reached the maximum number of parallel tasks, where the initial value of the semaphore is the maximum number of parallel tasks and a task thread is a thread that has passed through the semaphore; when the semaphore indicates that the number of task threads has reached the maximum number of parallel tasks, stopping the threads in the thread pool from obtaining tasks from the URL queue; and when the semaphore indicates that the number of task threads has not reached the maximum number of parallel tasks, controlling the threads in the thread pool to obtain tasks from the URL queue. The method and device solve the technical problem in the related art that allocating web crawler tasks through an intermediate controller makes the crawler system more complicated.
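The semaphore gating described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `run_crawl`, `MAX_PARALLEL`, `POOL_SIZE`, and `fetch`, the example URLs, and the simulated fetch delay are all assumptions made for illustration. Each pool thread must pass through a semaphore, initialised to the maximum number of parallel tasks, before it may take a URL from the queue.

```python
import queue
import threading
import time

MAX_PARALLEL = 3  # hypothetical "task maximum parallel number"
POOL_SIZE = 8     # hypothetical number of threads pre-stored in the pool


def run_crawl(urls, max_parallel=MAX_PARALLEL, pool_size=POOL_SIZE):
    """Process every URL with at most max_parallel threads working at once."""
    url_queue = queue.Queue()
    for url in urls:
        url_queue.put(url)

    # Initial value of the semaphore is the maximum number of parallel tasks.
    slots = threading.Semaphore(max_parallel)
    lock = threading.Lock()
    stats = {"active": 0, "peak": 0, "done": []}

    def fetch(url):
        time.sleep(0.01)  # stand-in for a real page download (assumption)

    def worker():
        while True:
            # Blocks while max_parallel threads are already past the semaphore,
            # which keeps this thread from obtaining a task from the queue.
            slots.acquire()
            try:
                try:
                    url = url_queue.get_nowait()
                except queue.Empty:
                    return  # no work left; the finally clause frees the slot
                with lock:
                    stats["active"] += 1
                    stats["peak"] = max(stats["peak"], stats["active"])
                fetch(url)
                with lock:
                    stats["active"] -= 1
                    stats["done"].append(url)
            finally:
                slots.release()

    threads = [threading.Thread(target=worker) for _ in range(pool_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats


if __name__ == "__main__":
    result = run_crawl([f"http://example.test/page{i}" for i in range(20)])
    print("peak concurrent task threads:", result["peak"])
```

In this sketch the pool holds more threads than the parallel limit, so the semaphore alone decides how many threads pull from the URL queue at any moment, with no separate controller component involved.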

Description

Technical field

[0001] The present application relates to the field of the Internet, and in particular to a task allocation method and device for controlling web crawlers.

Background art

[0002] When a web crawler crawls web pages, it starts from the Uniform Resource Locators (URLs) of one or several initial web pages, extracts all URLs found on those pages, and puts them into a URL queue. The crawler then takes a new URL from the URL queue and continues crawling. The existing method assigns tasks to web crawlers through an intermediate controller, so task assignment depends heavily on that controller. When the intermediate controller behaves abnormally, the web crawler may be assigned no tasks or too many tasks. If the web crawler can't assign tasks, it will be idle all the time, wasting machine resources; if the we...
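The crawl loop outlined in the background (seed URLs, link extraction, a URL queue) can be sketched as follows. This is only an illustrative sketch: the in-memory `PAGES` dictionary, the `example.test` URLs, the `LinkExtractor` and `crawl` names, and the `seen` set used to skip already-queued URLs are assumptions that do not appear in the source.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": '<a href="/">home</a>',
    "http://example.test/c": "",
}


def crawl(seeds, fetch):
    """Visit pages starting from the seeds, queueing each newly found URL once."""
    url_queue = deque(seeds)
    seen = set(seeds)  # deduplication is an assumption, not stated in the source
    visited = []
    while url_queue:
        url = url_queue.popleft()      # obtain a new URL from the URL queue
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(fetch(url))        # extract the URLs on this page
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                url_queue.append(link)
    return visited


if __name__ == "__main__":
    for page in crawl(["http://example.test/"], lambda u: PAGES.get(u, "")):
        print(page)
```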


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/951
CPC: G06F16/951
Inventor: 杨杰
Owner: BEIJING GRIDSUM TECH CO LTD