Webpage classification recognition method based on comprehensive subject term vertical search and focused crawler
A focused-crawler and web page classification technology, applied in the field of web search engines, that addresses problems such as low web page recognition rates, the curse of dimensionality, and weak extraction of structured information, with the aim of improving efficiency and achieving high application value.
Embodiment
[0075] The invention proposes a technical framework that can effectively identify the various URLs in dynamic webpages, and provides a detailed algorithm. The system is divided into three layers, from bottom to top: the acquisition layer, the analysis layer and the presentation layer.
[0076] 1. Web page data collection layer
[0077] Function: This layer collects dynamic webpage data and hands it to the upper layer for content analysis.
[0078] Interface: This layer is the interface between the focused crawler and the network. It is responsible for providing the web page source code, as a string, as input data to the upper layer.
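The hand-off described for this layer, page source delivered to the upper layer as a string, could look roughly like the following. This is a minimal sketch, not the patent's implementation: the function name `fetch_page_source`, the timeout, and the UTF-8 fallback are assumptions, and it only reads the raw response body, so content generated by client-side scripts on a dynamic page would not be captured.

```python
import urllib.request


def fetch_page_source(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its source code as a string (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset declared in the Content-Type header if present;
        # falling back to UTF-8 is an assumption of this sketch.
        charset = response.headers.get_content_charset() or "utf-8"
    # Replace undecodable bytes rather than fail on a badly encoded page.
    return raw.decode(charset, errors="replace")
```

The returned string is what the analysis layer would receive as its input.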
[0079] 2. Web page content analysis layer
[0080] Function: This layer is the core layer of the entire design. It analyzes the content of the pages collected by the web page data collection layer, obtains effective hyperlinks according to the weights of the subject words, and builds the URL queue sequence list to be crawle...
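As an illustration of selecting hyperlinks by subject-word weight and ordering them into a crawl queue, the sketch below scores each link by the weights of the subject words found in its anchor text. The scoring rule, the threshold, and the names `LinkExtractor`, `build_url_queue` and `term_weights` are all assumptions made for this example; the source does not specify how the weights are computed or applied.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_url_queue(page_source, base_url, term_weights, threshold=0.0):
    """Return URLs whose anchor-text score exceeds threshold, best first."""
    parser = LinkExtractor()
    parser.feed(page_source)
    scored = {}
    for href, text in parser.links:
        url = urljoin(base_url, href)  # resolve relative links
        lowered = text.lower()
        # Assumed scoring rule: sum the weights of subject words present.
        score = sum(w for term, w in term_weights.items() if term.lower() in lowered)
        if score > threshold and url not in scored:
            scored[url] = score
    # Highest score first; ties broken by URL so the order is deterministic.
    return sorted(scored, key=lambda u: (-scored[u], u))
```

With weights such as `{"crawler": 2.0, "python": 1.0}`, a link whose anchor text mentions both words is queued ahead of one that mentions only "crawler", and a link that mentions neither is dropped.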