A web page classification method based on deep learning with the fusion of text and structural features
A technology combining structural features with web page classification, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the insufficient accuracy of web page classification based on text features alone and achieves comprehensive, effective and highly accurate classification.
Embodiment Construction
[0027] The present invention will be further described below in conjunction with the accompanying drawings.
[0028] As shown in Figure 1, a web page classification method based on deep learning fusion of text and structural features is characterized in that the method includes the following steps:
[0029] (1) Obtain web page information
[0030] The URL of the web page is entered; a Scrapy crawler fetches the HTML document of the web page and stores it in a MongoDB database.
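A minimal sketch of step (1), assuming Python. To stay self-contained it fetches the page with the standard library's `urllib.request` in place of the Scrapy crawler the method names; the helper names `fetch_html` and `store_page` and the document fields `url`, `html` and `fetched_at` are hypothetical. `store_page` accepts any object with an `insert_one` method, such as a pymongo `Collection`.

```python
import urllib.request
from datetime import datetime, timezone


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it as text.

    Stand-in for the Scrapy crawler (assumption: plain urllib is enough here).
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def store_page(collection, url: str, html: str) -> dict:
    """Insert one page document into a MongoDB-style collection.

    `collection` only needs an insert_one(doc) method (e.g. a pymongo
    Collection); the field names below are illustrative assumptions.
    """
    doc = {
        "url": url,
        "html": html,
        "fetched_at": datetime.now(timezone.utc),
    }
    collection.insert_one(doc)
    return doc
```

With pymongo installed, a collection could be obtained as `MongoClient("mongodb://localhost:27017")["webpages"]["pages"]`; the connection string, database and collection names are placeholders. In a Scrapy project the same insert would typically live in an item pipeline's `process_item` method.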
[0031] (2) Extract web page text features
[0032] First, text information is extracted from the HTML tags that represent the web page title, meta information, headings at all levels, hyperlinks, etc., which carry the main information of the web page. The extracted text is then preprocessed: it is converted to lowercase, and garbled characters, abbreviations, numbers and stop words are removed. Stop words are frequently occurring words that contribute little to classification, and the stop...
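Step (2) can be sketched in Python with the standard library's `html.parser`. Because the original tag list is missing, the tag set here (`<title>`, `<h1>` to `<h6>`, `<a>`, plus the `content` of `<meta name="description">` and `<meta name="keywords">`) is an assumption based on the description of titles, meta information, headings and hyperlinks. Treating "abbreviations" as English contraction suffixes (`'s`, `n't`, ...), dropping every character that is not an ASCII letter as garbled text or numbers, and the short stop-word list are likewise illustrative assumptions, not the patent's exact rules.

```python
import re
from html.parser import HTMLParser

# Assumed tag set: title, headings at all levels, hyperlinks.
TEXT_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "a"}
# Assumed <meta name="..."> values whose content attribute is kept.
META_NAMES = {"description", "keywords"}
# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "for", "on", "with"}
# Contraction suffixes, treated here (assumption) as the "abbreviations" to drop.
CONTRACTION = re.compile(r"n't\b|'(?:s|re|ve|ll|d|m)\b")


class KeyTextExtractor(HTMLParser):
    """Collects text inside TEXT_TAGS and selected <meta> contents."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside one of TEXT_TAGS
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in TEXT_TAGS:
            self.depth += 1
        elif tag == "meta":
            a = dict(attrs)
            if (a.get("name") or "").lower() in META_NAMES and a.get("content"):
                self.chunks.append(a["content"])

    def handle_endtag(self, tag):
        if tag in TEXT_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def preprocess(text):
    """Lowercase, strip contractions, keep ASCII words, drop stop words."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    text = CONTRACTION.sub(" ", text)
    # Keeping only runs of a-z removes digits and garbled symbols.
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def extract_features(html):
    """Return the preprocessed token list for one HTML document."""
    parser = KeyTextExtractor()
    parser.feed(html)
    parser.close()
    return preprocess(" ".join(parser.chunks))
```

Keeping only ASCII letters suits English pages; pages in other languages would need a different tokenizer and stop-word list.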