Dual-cycle crawler system based on Spark Streaming and running method thereof
A crawler system using dual-cycle technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It achieves strong versatility and stable operation, and addresses the difficulty of expanding a crawler.
Embodiment Construction
[0028] The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments serve only to illustrate the invention and are not intended to limit its scope. After reading the present disclosure, modifications of the invention in various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of the present application.
[0029] As shown in Figure 1, the dual-cycle crawler system based on Spark Streaming includes: a page download module, a DNS cache module, a URL distribution scheduling module, a URL extraction module, a URL deduplication module, a page scheduling module, a page analysis module, a page extraction module, a storage system, and a web backend.
[0030] (1) Page download module
[0031] The page download module is responsible for downloading pages. When downloading, it uses the DNS information cached in the DNS ...
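The idea of a downloader consulting cached DNS information before fetching a page can be illustrated with a minimal sketch. It is not the implementation described in the patent: the class name DnsCache, the time-to-live policy, and the injectable resolver and clock parameters are all assumptions introduced here for illustration. Only socket.gethostbyname and time.monotonic are real standard-library calls.

```python
import socket
import time
from typing import Callable, Dict, Tuple


class DnsCache:
    """Hypothetical in-memory DNS cache with a per-entry time-to-live.

    Illustrative only; the source does not specify how its DNS cache
    module stores or expires entries.
    """

    def __init__(
        self,
        resolver: Callable[[str], str] = socket.gethostbyname,
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolver = resolver      # function mapping host name -> IP string
        self._ttl = ttl_seconds        # assumed expiry policy, not from the source
        self._clock = clock            # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, host: str) -> str:
        """Return the cached IP for host, resolving it again once expired."""
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and cached[1] > now:
            return cached[0]           # cache hit: no DNS query is issued
        ip = self._resolver(host)      # cache miss or expired entry
        self._entries[host] = (ip, now + self._ttl)
        return ip
```

A downloader built along these lines would call lookup(host) once per URL and connect to the returned address, so repeated requests to the same site avoid repeated DNS queries. Passing the resolver and clock as parameters is a design choice made for this sketch so the cache can be exercised without network access.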