Webpage classification method and device based on URL analysis
A webpage classification technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses the problem of slow webpage classification and achieves fast and effective classification.
Examples
Embodiment 1
[0041] Figure 1 shows a flow chart of a method for classifying webpages based on URL analysis provided by the present invention. The method comprises the following steps:
[0042] Step S1: the complete URL is divided into blocks, feature words are screened out from the URL blocks according to the URL dictionary, and the URLs are roughly classified according to the URL dictionary and the feature words, to obtain the webpages that can be roughly classified and their corresponding categories.
[0043] Step S2: the webpage text of each webpage that cannot be roughly classified is preprocessed and converted into a vector model, then classified by the generated classifier, to obtain the webpages that cannot be roughly classified and their corresponding categories.
[0044] Step S3: the complete URLs, the webpages that can be roughly classified and their corresponding categories, and the webpages that cannot be roughly classified and their corresponding categories are stored.
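Step S1 can be illustrated with a minimal sketch. The excerpt does not specify the contents of the URL dictionary, the delimiters used for blocking, or how conflicting feature words are resolved, so the dictionary entries, the split on non-alphanumeric characters, and the majority vote below are all assumptions made for illustration. The names `URL_DICTIONARY`, `split_url_into_blocks`, and `rough_classify` are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical URL dictionary: feature word -> category (not taken from the patent).
URL_DICTIONARY = {
    "sports": "sports",
    "nba": "sports",
    "finance": "finance",
    "stock": "finance",
    "news": "news",
}


def split_url_into_blocks(url):
    """Divide a complete URL into lowercase blocks.

    Assumption: blocks are the runs of letters and digits found in the
    host, path, and query string.
    """
    parts = urlsplit(url)
    text = " ".join([parts.netloc, parts.path, parts.query])
    return [block for block in re.split(r"[^0-9a-zA-Z]+", text.lower()) if block]


def rough_classify(url, dictionary=URL_DICTIONARY):
    """Return a category when some block is a feature word, otherwise None.

    None marks a webpage that cannot be roughly classified, which would
    then be handed to the text classifier of step S2.
    """
    blocks = split_url_into_blocks(url)
    feature_words = [block for block in blocks if block in dictionary]
    if not feature_words:
        return None
    # Assumption: the category backed by the most feature words wins.
    votes = Counter(dictionary[word] for word in feature_words)
    return votes.most_common(1)[0][0]
```

For example, `rough_classify("https://sports.example.com/nba/scores?id=7")` finds the feature words `sports` and `nba` and returns `"sports"`, while a URL with no dictionary hit returns `None`.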
[0045] As shown in Figure 2 ...
Embodiment 2
[0058] Figure 4 shows a functional block diagram of a webpage classification device based on URL analysis provided by the present invention. The device includes:
[0059] The webpage rough classification module 10, used to divide the complete URL into blocks, screen out feature words from the URL blocks according to the URL dictionary, and roughly classify the URLs according to the URL dictionary and the feature words, so as to obtain the webpages that can be roughly classified and their corresponding categories.
[0060] The webpage text classification module 20, used to preprocess the webpage text of each webpage that cannot be roughly classified, convert it into a vector model, and then classify it through the generated classifier, so as to obtain the webpages that cannot be roughly classified and their corresponding categories.
[0061] The storage module 30 is configured to store complete URLs, webpages that can be roughly cl...
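How the three modules might cooperate can be sketched as follows. The excerpt does not say which preprocessing, vector model, or classifier is used, so this sketch substitutes a bag-of-words term-frequency vector and a nearest-centroid classifier with cosine similarity. These are stand-ins, not the patent's method. The rough classifier is passed in as a callable that returns `None` for URLs it cannot classify, and a plain dictionary stands in for the storage module. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and extract word tokens (a simple stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(tokens):
    """Bag-of-words term-frequency vector, stored sparsely as a Counter."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_centroids(labeled_texts):
    """Build one centroid per category by summing term-frequency vectors."""
    centroids = {}
    for text, category in labeled_texts:
        centroids.setdefault(category, Counter()).update(to_vector(preprocess(text)))
    return centroids


def classify_text(text, centroids):
    """Return the category whose centroid is most similar to the text vector."""
    vector = to_vector(preprocess(text))
    return max(centroids, key=lambda c: cosine(vector, centroids[c]))


def classify_and_store(url, page_text, rough_classifier, centroids, store):
    """Try rough URL classification first, fall back to text, then store."""
    category = rough_classifier(url)
    method = "url"
    if category is None:
        # The webpage cannot be roughly classified: classify its text instead.
        category = classify_text(page_text, centroids)
        method = "text"
    store[url] = (category, method)
    return category
```

The point of the ordering is the speed gain the document describes: page text is only preprocessed and vectorized when the cheaper URL lookup fails.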