Data acquisition system and method based on the Scrapy crawler framework
A data acquisition system based on crawler technology, applied in the fields of network data indexing, network data retrieval, and other database retrieval. It addresses the problems of slow crawling speed and exhaustion of single-machine memory, with the effects of ensuring reliability, improving crawling breadth, and improving crawling stability.
Example Embodiment
[0023] The specific embodiments of the present invention will be described in further detail below in conjunction with the accompanying drawings.
[0024] The data acquisition system based on the Scrapy crawler framework proposed by the present invention, as shown in Figure 1, includes a crawler queue module 1, a crawler execution module 2, and a task scheduling module 3. The crawler queue module 1 includes a crawler seed queue 11, a crawler seed processing unit 12, and a crawler task queue 13; the crawler execution module 2 includes a web page download unit 21 and a URL mining unit 22; the task scheduling module 3 includes a crawler process queue 31 and a process manager 32.
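The composition described in paragraph [0024] can be pictured with a minimal Python sketch. It only mirrors the module and unit names from the text; every class name, field name, and placeholder string below is a hypothetical choice made for illustration, and the sketch does not claim to reproduce the invention's actual implementation or any Scrapy API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CrawlerQueueModule:
    """Crawler queue module 1 (hypothetical representation)."""
    seed_queue: deque = field(default_factory=deque)  # crawler seed queue 11
    seed_processing_unit: str = "crawler_seed_processing_unit_12"  # placeholder only
    task_queue: deque = field(default_factory=deque)  # crawler task queue 13


@dataclass
class CrawlerExecutionModule:
    """Crawler execution module 2 (hypothetical representation)."""
    download_unit: str = "web_page_download_unit_21"  # placeholder only
    url_mining_unit: str = "url_mining_unit_22"       # placeholder only


@dataclass
class TaskSchedulingModule:
    """Task scheduling module 3 (hypothetical representation)."""
    process_queue: deque = field(default_factory=deque)  # crawler process queue 31
    process_manager: str = "process_manager_32"          # placeholder only


@dataclass
class DataAcquisitionSystem:
    """Top-level system made of the three modules named in the text."""
    queue_module: CrawlerQueueModule = field(default_factory=CrawlerQueueModule)
    execution_module: CrawlerExecutionModule = field(default_factory=CrawlerExecutionModule)
    scheduling_module: TaskSchedulingModule = field(default_factory=TaskSchedulingModule)


system = DataAcquisitionSystem()
```

Using `default_factory` gives each system instance its own queues, so two instances never share state by accident.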
[0025] The crawler seed queue 11 is used to store crawler tasks, including but not limited to crawler tasks issued by users and new crawler tasks submitted by the crawler execution module 2; the crawler seed processing unit 12 is used to process the crawler tasks in the crawler seed queue 11 by de-duplication screening and pr...
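The de-duplication step can be illustrated with a short Python sketch. It assumes that a crawler task is identified by its URL and that de-duplicated tasks are handed on to the crawler task queue 13; neither detail is confirmed by the truncated text. The names `CrawlerSeedQueue`, `submit`, `pop_all`, and `deduplicate` are hypothetical, and only de-duplication is shown.

```python
from collections import deque


class CrawlerSeedQueue:
    """Hypothetical stand-in for crawler seed queue 11."""

    def __init__(self):
        self._tasks = deque()

    def submit(self, url, source="user"):
        # Tasks may come from users or from the crawler execution module.
        self._tasks.append({"url": url, "source": source})

    def pop_all(self):
        # Hand every pending task to the seed processing step.
        tasks = list(self._tasks)
        self._tasks.clear()
        return tasks


def deduplicate(tasks, seen):
    """Keep the first task per URL; `seen` persists across calls (assumption)."""
    unique = []
    for task in tasks:
        if task["url"] not in seen:
            seen.add(task["url"])
            unique.append(task)
    return unique


seed_queue = CrawlerSeedQueue()
seed_queue.submit("https://example.com/")
seed_queue.submit("https://example.com/a", source="execution_module")
seed_queue.submit("https://example.com/")  # duplicate of the first task

seen_urls = set()
# Assumption: the surviving tasks go to crawler task queue 13.
task_queue = deque(deduplicate(seed_queue.pop_all(), seen_urls))
print([task["url"] for task in task_queue])
# prints ['https://example.com/', 'https://example.com/a']
```

Keeping the `seen` set outside the function means URLs resubmitted later by the execution module are also filtered, at the cost of memory that grows with the number of distinct URLs.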