Distributed web crawler system
A distributed web crawler system, applied in the field of distributed web crawling, that addresses problems such as the inability to relate pages to topics and crawling speed and quality that fail to meet user requirements.
Embodiment Construction
[0044] The system of the present invention adopts a distributed architecture based on data extractors and consists of a central master control node and distributed crawler servers, as shown in Figure 1.
[0045] As shown in Figure 1, the present invention mainly consists of the following modules:
[0046] 1. Management Portal
[0047] The management portal is the web interface that the crawler system provides to the administrator. Through it, the administrator can view the logs of the central server and the sub-servers, add and configure topics, update the seed URLs of a given topic, set parameters such as how often a topic is crawled, and control the status of the crawler. The central node and the distributed crawlers form the main body of the system; they carry out topic operations, the learning of data extractors, page analysis, and the storage of target pages.
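The portal settings described above (topics, seed URLs, crawl frequency, crawler status) can be pictured with a minimal Python sketch. It is an illustration only, not the implementation of the invention: the names `TopicConfig`, `ManagementPortal`, `CrawlerStatus`, and `crawl_interval_hours`, the three status values, and the example URLs are all assumptions introduced here and do not appear in the source.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class CrawlerStatus(Enum):
    """Hypothetical crawler states; the source only says status is controllable."""
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"


@dataclass
class TopicConfig:
    """Hypothetical per-topic settings an administrator could edit."""
    name: str
    seed_urls: List[str] = field(default_factory=list)
    crawl_interval_hours: float = 24.0  # assumed unit for "frequency"


def _dedupe(urls: List[str]) -> List[str]:
    # Drop duplicate URLs while keeping their first-seen order.
    return list(dict.fromkeys(urls))


@dataclass
class ManagementPortal:
    """In-memory stand-in for the settings behind the web interface."""
    topics: Dict[str, TopicConfig] = field(default_factory=dict)
    status: CrawlerStatus = CrawlerStatus.STOPPED

    def add_topic(self, name: str, seed_urls: List[str],
                  crawl_interval_hours: float = 24.0) -> TopicConfig:
        if name in self.topics:
            raise ValueError(f"topic already exists: {name}")
        if crawl_interval_hours <= 0:
            raise ValueError("crawl interval must be positive")
        topic = TopicConfig(name, _dedupe(seed_urls), crawl_interval_hours)
        self.topics[name] = topic
        return topic

    def update_seeds(self, name: str, seed_urls: List[str]) -> None:
        # Replace the seed URL list of an existing topic.
        self.topics[name].seed_urls = _dedupe(seed_urls)

    def set_frequency(self, name: str, crawl_interval_hours: float) -> None:
        if crawl_interval_hours <= 0:
            raise ValueError("crawl interval must be positive")
        self.topics[name].crawl_interval_hours = crawl_interval_hours

    def set_status(self, status: CrawlerStatus) -> None:
        self.status = status


if __name__ == "__main__":
    portal = ManagementPortal()
    portal.add_topic("sports", ["http://example.com/a"], crawl_interval_hours=6.0)
    portal.update_seeds("sports", ["http://example.com/a", "http://example.com/b"])
    portal.set_status(CrawlerStatus.RUNNING)
    print(portal.topics["sports"], portal.status)
```

In a real deployment these settings would presumably be persisted and pushed to the central node rather than held in memory; the sketch only shows which pieces of state the portal exposes.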
[0048] 2. Central node server
[0049] The crawler's central master control node is the control center of the system. It mainly includes a URL controller, an extractor module and theme cont...