Method and device for web page classification
A web page classification technology, applied in the field of Internet communication, that addresses the low efficiency of existing page classification. It achieves fast and efficient classification, reduces impact, and improves user experience.
Examples
Embodiment 1
[0067] Current webpage classification methods cannot classify webpages efficiently and quickly. To address this, this embodiment provides a webpage classification method that differs from the prior art: it uses the structural similarity of webpage addresses (URLs) to classify webpages quickly. As shown in Figure 1, the webpage classification method of this embodiment includes the following steps:
[0068] Step 101: Establish a feature word classifier according to a set of webpage address samples. The set of webpage address samples includes a plurality of sample webpage addresses and the webpage type corresponding to each sample webpage address.
[0069] Before webpages are classified, the method of this embodiment selects in advance some sample webpage addresses and the webpage type corresponding to each sample webpage address; for example, webpage address 1-financial affairs, webpage address 2-sports, w...
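Step 101 can be illustrated with a minimal sketch. The source does not state how feature words are extracted or scored, so everything here is an assumption made for illustration: the tokenization (alphabetic runs in the host and path), the scoring (summed per-type word counts), and the names `url_words`, `build_feature_word_classifier`, and `classify` are all hypothetical, as are the example.com addresses.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse


def url_words(url):
    """Split a URL's host and path into lowercase alphabetic tokens (assumed tokenization)."""
    parsed = urlparse(url)
    return re.findall(r"[a-z]+", (parsed.netloc + parsed.path).lower())


def build_feature_word_classifier(samples):
    """Build {webpage_type: Counter(word -> count)} from (address, type) sample pairs."""
    table = defaultdict(Counter)
    for url, page_type in samples:
        table[page_type].update(url_words(url))
    return dict(table)


def classify(table, url):
    """Return the type whose feature words best match the URL, or None if nothing matches."""
    words = url_words(url)
    scores = {t: sum(counts[w] for w in words) for t, counts in table.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


# Hypothetical sample set: each sample address is paired with its webpage type.
samples = [
    ("http://example.com/finance/stocks/1.html", "finance"),
    ("http://example.com/finance/funds/2.html", "finance"),
    ("http://example.com/sports/nba/3.html", "sports"),
    ("http://example.com/sports/soccer/4.html", "sports"),
]
table = build_feature_word_classifier(samples)
print(classify(table, "http://example.com/sports/tennis/9.html"))
```

Words shared by every type (such as the host name) contribute equally to all scores, so in this sketch only type-specific words decide the result; a real classifier would likely weight or filter them.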
Embodiment 2
[0179] This embodiment provides a webpage classification device. As shown in Figure 5, it includes: a feature word classifier building module, an acquisition and recognition module, a webpage address processing module, a storage module, and a webpage classification module.
[0180] The feature word classifier building module is used to establish a feature word classifier according to a set of webpage address samples. The set of webpage address samples includes a plurality of sample webpage addresses and the webpage type corresponding to each sample webpage address.
[0181] The acquisition and recognition module is used to obtain a predetermined number of webpage addresses and to determine, through the feature word classifier, the webpage type to which each of the webpage addresses belongs.
[0182] The webpage address processing module is used to perform de-redundancy processing on the webpage addresses whose webpage type has been determined by the acquisition and identific...
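The flow through the acquisition and recognition module and the webpage address processing module might be sketched as follows. The source text is truncated before it defines the de-redundancy processing, so the rule used here (replacing digit runs in the path with `*` and keeping one pattern per structure) is purely an assumption, chosen because the method relies on the structural similarity of webpage addresses. `structural_pattern`, `classify_and_deduplicate`, and `toy_classifier` are hypothetical names, and the toy classifier stands in for the feature word classifier.

```python
import re
from urllib.parse import urlparse


def structural_pattern(url):
    """Assumed de-redundancy rule: collapse digit runs in the path to '*'."""
    parsed = urlparse(url)
    return parsed.netloc + re.sub(r"\d+", "*", parsed.path)


def classify_and_deduplicate(urls, classifier, limit):
    """Classify the first `limit` addresses and keep one structural pattern per type."""
    patterns_by_type = {}
    for url in urls[:limit]:
        page_type = classifier(url)
        if page_type is None:
            continue  # address the classifier could not assign to any type
        patterns = patterns_by_type.setdefault(page_type, [])
        pattern = structural_pattern(url)
        if pattern not in patterns:
            patterns.append(pattern)
    return patterns_by_type


def toy_classifier(url):
    """Stand-in for the feature word classifier (hypothetical rules)."""
    if "/sports/" in url:
        return "sports"
    if "/finance/" in url:
        return "finance"
    return None


urls = [
    "http://example.com/sports/2024/101.html",
    "http://example.com/sports/2023/7.html",
    "http://example.com/finance/55.html",
    "http://example.com/about",
    "http://example.com/finance/56.html",
]
# limit plays the role of the "predetermined number" of addresses to acquire.
result = classify_and_deduplicate(urls, toy_classifier, limit=4)
print(result)
```

In this sketch the returned dictionary is the kind of per-type record a storage module could hold, although the source does not say what is actually stored.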