Webpage classification recognition method based on comprehensive subject term vertical search and focused crawler
A focused crawler and web page classification technology, applied in the field of web search engines, that addresses problems such as a low web page recognition rate, the curse of dimensionality, and a weak ability to obtain structured information, to achieve high application value, improve efficiency, and reduce the number of effects.
Embodiment
[0075] The invention proposes a technical framework that can effectively identify the various URLs in dynamic webpages, and provides a detailed algorithm. The system is divided into three layers, from top to bottom: the acquisition layer, the analysis layer, and the presentation layer.
[0076] 1. Web page data collection layer
[0077] Function: The main function of this layer is to collect dynamic webpage data and hand it over to the upper layer for content analysis.
[0078] Interface: This layer is the interface between the focused crawler and the network, and is responsible for providing web page source code strings as input data to the upper layer.
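The role of the collection layer described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the names `http_fetch` and `collect_page_source`, the User-Agent string, the timeout, and the UTF-8 default are all assumptions made for illustration. The sketch only downloads the server response and does not execute any scripts in the page.

```python
from typing import Callable
from urllib.request import Request, urlopen


def http_fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw bytes of a page over HTTP(S) (hypothetical helper)."""
    request = Request(url, headers={"User-Agent": "focused-crawler-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


def collect_page_source(
    url: str,
    fetch: Callable[[str], bytes] = http_fetch,
    encoding: str = "utf-8",
) -> str:
    """Return the page source as a string, ready for the analysis layer."""
    raw = fetch(url)
    # Undecodable bytes are replaced instead of raising, so one badly
    # encoded page does not stop the crawl (an assumed design choice).
    return raw.decode(encoding, errors="replace")
```

Passing the fetcher as a parameter keeps the layer boundary explicit: the upper layer only ever sees a source code string, and the network access can be swapped out or stubbed.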
[0079] 2. Web page content analysis layer
[0080] Function: This layer is the core layer of the entire design. It mainly analyzes the content of the pages collected by the web page data collection layer, obtains effective hyperlinks according to the weight of the subject words, and builds the URL queue sequence list to be crawle...
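The analysis step described above (extract hyperlinks, keep those that match the subject words, and queue them for crawling) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's algorithm: the text available here does not say how subject-word weights are computed or applied, so the sketch simply sums the weights of the terms found in each link's anchor text. The names `LinkExtractor`, `build_url_queue`, `term_weights`, and `threshold` are hypothetical.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from a page source string."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_url_queue(page_source, base_url, term_weights, threshold=0.0):
    """Return URLs whose anchor text scores above threshold, best first.

    term_weights maps lowercase subject terms to weights (assumed format).
    """
    parser = LinkExtractor()
    parser.feed(page_source)
    heap = []
    seen = set()
    for order, (href, text) in enumerate(parser.links):
        url = urljoin(base_url, href)
        if url in seen:
            continue  # the first occurrence of a URL decides its score
        seen.add(url)
        lowered = text.lower()
        score = sum(w for term, w in term_weights.items() if term in lowered)
        if score > threshold:
            # heapq is a min-heap, so the score is negated; `order` keeps
            # links with equal scores in page order.
            heapq.heappush(heap, (-score, order, url))
    queue = []
    while heap:
        _, _, url = heapq.heappop(heap)
        queue.append(url)
    return queue
```

For example, with `term_weights = {"crawler": 2.0, "vertical search": 3.0}`, a link whose anchor text mentions both terms is placed ahead of a link that mentions only "crawler", and a link that mentions neither is dropped.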