A multi-thread-based web crawler system and webpage crawling method thereof
A web crawler and multi-threading technology, applied in the field of multi-threaded web crawler systems, can solve problems such as low efficiency, difficult maintenance, complex programs, etc., and achieve the effect of improving concurrent efficiency
- Summary
- Abstract
- Description
- Claims
- Application Information
AI Technical Summary
Problems solved by technology
Method used
Image
Examples
Embodiment
[0046] Such as figure 1 As shown, a multithread-based web crawler system includes a URL processing module, a webpage crawling module, a webpage analysis module and a webpage storage module.
[0047] The URL processing module obtains the host name, port number, and file name of each URL through URL class processing.
[0048] The general form of URL is: : / / : / . In this program, it can be made simple, so a class for storing URLs is designed, which includes Host (host name), Port (port), File (file path), Fname (this is for this web page called name). The following code is all members of the URL class and its member functions:
[0049] class URL
[0050] {
[0051] public:
[0052] URL() {}
[0053] void SetHost(const string& host) {Host = host;}
[0054] string GetHost() {return Host;}
[0055] void SetPort(int port) {Port = port;}
[0056] int GetPort() {return Port;}
[0057] void SetFile(const string& file) {File = file;}
[0058] string GetFile() {return File;}
[...
PUM
Abstract
Description
Claims
Application Information
- R&D Engineer
- R&D Manager
- IP Professional
- Industry Leading Data Capabilities
- Powerful AI technology
- Patent DNA Extraction
Browse by: Latest US Patents, China's latest patents, Technical Efficacy Thesaurus, Application Domain, Technology Topic, Popular Technical Reports.
© 2024 PatSnap. All rights reserved.Legal|Privacy policy|Modern Slavery Act Transparency Statement|Sitemap|About US| Contact US: help@patsnap.com