Network content asynchronous grasping system and method

An asynchronous network content crawling technology in the Internet field. It addresses problems such as overly coarse control granularity, poor crawling performance, and avalanche effects in the crawling system, and it saves system resources, ensures stability, and improves crawling performance.

Active Publication Date: 2017-04-26
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
Cites 5 documents; cited by 2 documents.

AI Technical Summary

Problems solved by technology

[0004] Under these two methods, the control granularity is too coarse. For slow back-end systems, crawling performance is poor, and...

Embodiment Construction

[0020] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention. On the contrary, the embodiments of the present invention include all changes, modifications and equivalents coming within the spirit and scope of the appended claims.

[0021] Figure 1 is a schematic structural diagram of a system for asynchronously grabbing network content according to an embodiment of the present invention.

[0022] Referring to Figure 1, the asynchronous network content grabbing system includes: a task queue manager 100, configured to provide at least one task queue; a scheduler 200, configured to read the Uniform Resource Locator (URL)...
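The hand-off between the task queue manager 100 and the scheduler 200 can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each task queue is an in-memory FIFO keyed by task ID, that the scheduler visits the queues round-robin, and that drivers are plain callables registered per back-end environment type (the abstract states only that the scheduler triggers a driver according to that type). `TaskQueueManager`, `Scheduler`, `next_batch`, `dispatch`, `env_of` and the example URLs are illustrative names, not taken from the patent.

```python
from collections import deque


class TaskQueueManager:
    """Provides at least one task queue (hypothetical layout: task ID -> FIFO of URLs)."""

    def __init__(self):
        self.queues = {}

    def submit(self, task_id, url):
        self.queues.setdefault(task_id, deque()).append(url)


class Scheduler:
    """Reads URLs from every task queue and triggers a driver per back-end type."""

    def __init__(self, manager):
        self.manager = manager

    def next_batch(self):
        # Take at most one URL from each non-empty queue so that one large task
        # cannot starve the others (round-robin is an assumption of this sketch).
        return [(task_id, queue.popleft())
                for task_id, queue in self.manager.queues.items() if queue]

    def dispatch(self, env_of, drivers):
        # env_of: task ID -> back-end environment type (hypothetical lookup)
        # drivers: environment type -> callable that schedules (task_id, url)
        batch = self.next_batch()
        for task_id, url in batch:
            drivers[env_of[task_id]](task_id, url)
        return len(batch)


manager = TaskQueueManager()
manager.submit("task-a", "http://a.example/1")
manager.submit("task-a", "http://a.example/2")
manager.submit("task-b", "http://b.example/1")

scheduled = []
drivers = {"slow": lambda task_id, url: scheduled.append(("slow", url)),
           "fast": lambda task_id, url: scheduled.append(("fast", url))}
env_of = {"task-a": "slow", "task-b": "fast"}
scheduler = Scheduler(manager)
scheduler.dispatch(env_of, drivers)   # one URL from each task
scheduler.dispatch(env_of, drivers)   # only task-a still has a URL
print(scheduled)
# [('slow', 'http://a.example/1'), ('fast', 'http://b.example/1'), ('slow', 'http://a.example/2')]
```

Tagging each URL with its task ID is what lets the next stage look up that task's own information, such as the QPS and concurrency values named in the abstract.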

Abstract

The invention proposes an asynchronous network content grasping (crawling) system and method. The system comprises: a task queue manager, configured to provide at least one task queue; a scheduler, configured to read the uniform resource locator (URL) of network content to be grasped from each task queue and to trigger a driver to schedule the URL according to the environment type of the back end on which the task owning the URL is located; the driver, configured, once triggered by the scheduler, to read the task information of the task to which the URL belongs, inject the URL into a grasping pool based on that task information, and control the frequency at which URLs are injected into the grasping pool according to the task information, wherein the task information comprises a queries-per-second (QPS) value and a concurrency value; and an actuator, configured to read the URL from the grasping pool and grasp the content at that URL. The invention can ensure the stability of the grasping system under high concurrency, effectively save system resources, and improve grasping performance.
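The driver/actuator split described in the abstract can be sketched with Python's `asyncio`. This is a minimal sketch under stated assumptions, not the patentee's implementation. The grasping pool is modelled as an `asyncio.Queue`. The concurrency value is modelled as a per-task `asyncio.Semaphore` that the driver acquires before injecting a URL and the actuator releases after grasping it. The QPS value is modelled as a fixed pause between injections. `TaskInfo`, `driver`, `actuator`, `crawl` and `fake_fetch` are hypothetical names, and the scheduler's routing by back-end environment type is omitted.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class TaskInfo:
    # Hypothetical per-task record holding the two values named in the abstract.
    qps: float         # maximum URL injections per second
    concurrency: int   # maximum URLs of this task in flight at once


async def driver(urls, info, pool):
    """Inject one task's URLs into the shared grasping pool under its limits."""
    slots = asyncio.Semaphore(info.concurrency)
    interval = 1.0 / info.qps
    for url in urls:
        await slots.acquire()           # wait while `concurrency` URLs are in flight
        await pool.put((url, slots))    # the actuator releases the slot when done
        await asyncio.sleep(interval)   # pace injections to at most `qps` per second


async def actuator(pool, fetch, results):
    """Repeatedly take a URL from the grasping pool and grasp it."""
    while True:
        url, slots = await pool.get()
        try:
            results[url] = await fetch(url)
        except Exception as exc:        # record the failure instead of stopping
            results[url] = exc
        finally:
            slots.release()
            pool.task_done()


async def crawl(tasks, fetch, workers=4):
    """tasks maps a task ID to (urls, TaskInfo); returns {url: content or error}."""
    pool = asyncio.Queue()
    results = {}
    actuators = [asyncio.create_task(actuator(pool, fetch, results))
                 for _ in range(workers)]
    await asyncio.gather(*(driver(urls, info, pool) for urls, info in tasks.values()))
    await pool.join()                   # every injected URL has been grasped
    for worker in actuators:
        worker.cancel()
    await asyncio.gather(*actuators, return_exceptions=True)
    return results


async def fake_fetch(url):
    await asyncio.sleep(0.01)           # stand-in for a network request
    return f"<html>{url}</html>"


tasks = {
    "slow-backend": ([f"http://a.example/{i}" for i in range(3)], TaskInfo(qps=20, concurrency=1)),
    "fast-backend": ([f"http://b.example/{i}" for i in range(5)], TaskInfo(qps=100, concurrency=4)),
}
start = time.monotonic()
pages = asyncio.run(crawl(tasks, fake_fetch))
elapsed = time.monotonic() - start
```

In this sketch each task carries its own limits, so a slow back end (low QPS, low concurrency) throttles only its own URLs while the shared actuators keep serving faster tasks.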

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to a system and method for asynchronously capturing network content.

Background

[0002] With the development of the Internet, the Internet contains a large amount of network content. In some application scenarios, computer technology is needed to extract the network content that users need from this massive volume of content; this technology is called crawling. For example, web content can be crawled by a crawler.

[0003] In the related art, a crawler adopts either a concurrency control strategy or a queries-per-second (QPS) control strategy. Under the concurrency control strategy, the total length of the concurrent queue is controlled independently through threads or processes, and each process or thread fetches synchronously, ensuring that the total length of the queue is fixed and that the pressure on the system...
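The two related-art strategies in [0003] can be contrasted with a small threaded simulation. Paragraph [0003] is truncated before the QPS strategy is described, so `crawl_fixed_qps` is only an assumed, typical form of such a limiter; `fetch_blocking`, `crawl_fixed_concurrency` and the `slow.example` URLs are likewise hypothetical, and `time.sleep` stands in for a slow back end.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_blocking(url, latency):
    """Stand-in for a synchronous fetch: the calling thread is blocked for `latency` s."""
    time.sleep(latency)
    return url


def crawl_fixed_concurrency(urls, concurrency, latency):
    """Concurrency control: a fixed number of threads, each fetching synchronously.

    Throughput is capped at roughly concurrency / latency URLs per second,
    so a slow back end directly lowers crawling performance.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda url: fetch_blocking(url, latency), urls))


def crawl_fixed_qps(urls, qps, latency):
    """QPS control (assumed form): start one fetch every 1/qps seconds.

    Requests in flight grow to about qps * latency, so a slow back end
    accumulates pending work instead of slowing the request rate.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        futures = []
        for url in urls:
            futures.append(pool.submit(fetch_blocking, url, latency))
            time.sleep(1.0 / qps)
        return [future.result() for future in futures]


urls = [f"http://slow.example/{i}" for i in range(8)]

start = time.monotonic()
by_concurrency = crawl_fixed_concurrency(urls, concurrency=2, latency=0.05)
concurrency_elapsed = time.monotonic() - start  # 8 URLs / 2 threads = 4 rounds of 0.05 s

start = time.monotonic()
by_qps = crawl_fixed_qps(urls, qps=100, latency=0.05)
qps_elapsed = time.monotonic() - start
```

Neither function can throttle a slow back end separately from a fast one: each applies one global setting, which is the coarse control granularity criticised in [0004].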

Application Information

IPC (8): G06F17/30; G06F9/48
CPC: G06F9/4881; G06F16/951
Inventors: 卢刚, 孙鹏宇, 覃安
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD