Method and device for identifying webpage categories
A webpage category identification method, applied in the Internet field, addresses problems such as the inability to identify webpage categories and the lack of a method for doing so, and achieves the effect of easy extraction.
Examples
Embodiment 1
[0023] Figure 1 shows the implementation flow of the method for identifying the webpage category provided by the first embodiment of the present invention. The details are as follows:
[0024] In step S101, the page features of the webpage to be identified are acquired.
[0025] Specifically, the webpage to be identified includes a webpage address, page information, and corresponding webpage source code information.
[0026] The page features of the webpage to be identified may be acquired before the webpage content is extracted or, when the terminal is a mobile terminal, before the page content is viewed through a mobile phone browser. Alternatively, when user behavior is being analyzed, the category of the webpage may be identified before or after the user obtains and views the webpage.
[0027] Specifically, the page features may include one or more of the following features: web page address features, web page title features, secondary navigation features, document object model DOM tr...
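As an illustration of step S101, the following minimal sketch derives a few page features from a webpage address and its source code. The feature names (`url_depth`, `title_length`, `img_count`, and so on) and the function `extract_page_features` are hypothetical and chosen only for this example; the source does not specify how each feature is computed.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _TitleAndTagCounter(HTMLParser):
    """Collects the <title> text and counts how often each tag occurs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.tag_counts = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_page_features(url, html):
    """Return a dict of illustrative page features (all names are assumptions)."""
    parser = _TitleAndTagCounter()
    parser.feed(html)
    path = urlparse(url).path
    return {
        # Webpage address features
        "url_depth": len([seg for seg in path.split("/") if seg]),
        "url_has_digits": any(ch.isdigit() for ch in path),
        # Webpage title feature
        "title_length": len(parser.title.strip()),
        # Simple tag counts taken from the page source
        "img_count": parser.tag_counts.get("img", 0),
        "p_count": parser.tag_counts.get("p", 0),
    }
```

A call such as `extract_page_features(url, html)` returns a flat dictionary, which is a convenient input shape for a decision tree that tests one feature per node.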
Embodiment 2
[0060] Figure 2 shows the implementation flow of a method for identifying webpage categories provided by the second embodiment of the present invention. The details are as follows:
[0061] In step S201, web page samples marked with web page categories are obtained.
[0062] The webpage samples may be labeled in advance by staff, based on experience, as text pages or picture-text pages. The samples used for training may also be labeled with other categories as needed, according to the specific content of the webpage.
[0063] In step S202, a decision tree model is obtained by training with a classification and regression algorithm on the webpage categories and page features of the webpage samples.
[0064] As a preferred implementation manner, according to the webpage category and the page features of the webpage samples, a recursive method may be used to divide the samples into multiple sma...
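The recursive division of labeled samples in step S202 can be sketched as follows. This is a simplified illustration and not the patented procedure: it assumes numeric features, threshold splits, and Gini impurity as the split criterion, none of which the source confirms. The sample values and the names `gini`, `best_split`, and `build_tree` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Find the (feature, threshold) that minimises weighted Gini impurity."""
    best = None  # (impurity, feature, threshold)
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put samples on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


def build_tree(rows, labels, max_depth=3):
    """Recursively split the samples until a node is pure or depth runs out."""
    can_split = max_depth > 0 and len(set(labels)) > 1
    split = best_split(rows, labels) if can_split else None
    if split is None:
        # Leaf node: store the majority category of the remaining samples.
        return {"category": Counter(labels).most_common(1)[0][0]}
    _, feature, threshold = split
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], max_depth - 1),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], max_depth - 1),
    }


# Hypothetical labeled samples: text pages versus picture-text pages.
samples = [
    {"img_count": 1, "p_count": 12},
    {"img_count": 0, "p_count": 9},
    {"img_count": 8, "p_count": 2},
    {"img_count": 11, "p_count": 1},
]
categories = ["text", "text", "picture_text", "picture_text"]
tree = build_tree(samples, categories)
```

Each recursive call works on a smaller subset of the samples, so the tree stops growing once a subset contains a single category or the depth limit is reached.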
Embodiment 3
[0074] Figure 3 shows a structural block diagram of a device for identifying webpage categories provided by the third embodiment of the present invention. The details are as follows:
[0075] The identification device of the webpage category described in the embodiment of the present invention includes:
[0076] A page feature acquiring unit 301, configured to acquire the page feature of the webpage to be identified;
[0077] A page feature loading unit 302, configured to load the page features according to a pre-generated decision tree model, where the decision tree model is generated by training on a plurality of sample webpages whose webpage categories have been determined;
[0078] The traversal search unit 303 is configured to recursively traverse the decision tree model, search for leaf classification nodes of the decision tree corresponding to the page features, and obtain the webpage category of the webpage to be identified from the leaf nodes.
[0079] Specifically,...
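The recursive traversal performed by the traversal search unit 303 might look like the sketch below. The dictionary layout of the tree (`feature`, `threshold`, `left`, `right`, `category`), the hand-written example tree, and the function name `classify` are assumptions made for illustration; the source does not describe the model's storage format.

```python
def classify(node, features):
    """Recursively walk the tree until a leaf classification node is reached."""
    if "category" in node:
        # Leaf node: it holds the webpage category directly.
        return node["category"]
    # Internal node: compare one page feature against the node's threshold.
    branch = "left" if features[node["feature"]] <= node["threshold"] else "right"
    return classify(node[branch], features)


# Hypothetical pre-generated decision tree model.
example_tree = {
    "feature": "img_count",
    "threshold": 3,
    "left": {"category": "text"},
    "right": {
        "feature": "p_count",
        "threshold": 5,
        "left": {"category": "picture_text"},
        "right": {"category": "text"},
    },
}

page_features = {"img_count": 9, "p_count": 2}
category = classify(example_tree, page_features)
```

Because every internal node tests a single feature, the number of comparisons for one webpage is bounded by the depth of the tree.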