Web page classification standard acquisition method and device and web page classification method and device
A web page classification standard acquisition technology, applied in the communication field, that addresses the low accuracy of web page classification and thereby enables more accurate web page classification.
Examples
Embodiment 1
[0055] This embodiment describes in detail the process of acquiring a web page classifier, that is, the standard used to classify web pages.
[0056] Figure 1 shows a schematic structural diagram of the web page classification standard acquisition device, which includes:
[0057] The web page obtaining module is used to obtain sample web pages for each standard classification. In this embodiment, a standard classification is a web page category in the initial classification library; the initial classifications are the categories into which web pages are initially intended to be divided, and sample web pages are selected specifically for each of them. The sample web pages for a classification in the initial classification library can be obtained by looking up the required category on a navigation website or by querying a search engine;...
Embodiment 2
[0080] Embodiment 1 describes the process of obtaining a web page classifier. Building on it, this embodiment focuses on the process of classifying web pages of unknown type, that is, the web pages to be classified.
[0081] As shown in Figure 4, the web page classification device in this embodiment includes:
[0082] The label acquisition module is used to extract the label content of the web page to be classified; the extraction rules can be the same as those used to obtain label content when building the web page classifier;
[0083] The feature word acquisition module is used to extract feature words from the label content according to the standard feature words in the standard proportion list obtained in Embodiment 1. Specifically, it can take as feature words those words in the label content that are identical to standard feature words in the list, or extract from the label content the same and si...
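A minimal Python sketch of the exact-match variant of feature word extraction described above. It assumes the label content has already been segmented into tokens and that the standard proportion list can be represented as a hypothetical dictionary mapping each category to its standard feature words and their proportions; the text here does not specify that structure, and the truncated similar-word variant is not covered.

```python
from typing import Dict, List


def extract_feature_words(label_tokens: List[str],
                          standard_proportions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the tokens of the page's label content that are identical to a
    standard feature word of any category (exact-match variant only).

    `standard_proportions` is a hypothetical layout of the standard proportion
    list: {category: {standard_feature_word: proportion}}.
    """
    # Gather every standard feature word across all categories.
    standard_words = set()
    for word_proportions in standard_proportions.values():
        standard_words.update(word_proportions)

    # Keep matching tokens in their original order, dropping repeats.
    seen = set()
    features = []
    for token in label_tokens:
        if token in standard_words and token not in seen:
            seen.add(token)
            features.append(token)
    return features


if __name__ == "__main__":
    # Toy values for illustration only; they are not taken from the patent.
    proportions = {
        "catering": {"restaurant": 0.4, "menu": 0.3},
        "finance and economics": {"stock": 0.5, "fund": 0.2},
    }
    tokens = ["best", "restaurant", "menu", "stock", "restaurant"]
    print(extract_feature_words(tokens, proportions))
```

Only membership in the list matters for this step; the proportion values would presumably be used later, when the extracted feature words are compared against each category.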
Embodiment 3
[0105] To aid understanding of the present invention, the solution it provides is further illustrated by applying it to a specific scenario. Referring to Figure 6, the process includes:
[0106] Step 601: Acquire the standard classifications (that is, the initial classifications), here including catering, animation websites, transportation and tourism, education and culture, finance and economics, military defense, and automobile websites;
[0107] Step 602: For each category, obtain one sample web page through a navigation website or a search engine. Multiple sample pages can of course be used (for example, 10 per category); only one is used here for illustration;
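Steps 601 and 602 amount to building a library that maps each standard classification to its sample pages. The sketch below assumes the pages have already been fetched (fetching from a navigation website or search engine is out of scope) and uses a hypothetical `min_samples` parameter to reflect the choice between one and several samples per category.

```python
from typing import Dict, List

# Standard (initial) classifications listed in Step 601.
STANDARD_CLASSIFICATIONS = [
    "catering",
    "animation websites",
    "transportation and tourism",
    "education and culture",
    "finance and economics",
    "military defense",
    "automobile websites",
]


def build_sample_library(samples: Dict[str, List[str]],
                         categories: List[str],
                         min_samples: int = 1) -> Dict[str, List[str]]:
    """Map every standard classification to its sample pages (raw HTML).

    Raises ValueError if a category has fewer than `min_samples` pages, so a
    missing category is caught before any classifier is built.
    """
    library: Dict[str, List[str]] = {}
    for category in categories:
        pages = samples.get(category, [])
        if len(pages) < min_samples:
            raise ValueError(
                f"{category!r} has {len(pages)} sample page(s); need {min_samples}"
            )
        library[category] = list(pages)
    return library


if __name__ == "__main__":
    # Placeholder HTML stands in for pages fetched elsewhere.
    fetched = {c: [f"<html><title>{c}</title></html>"] for c in STANDARD_CLASSIFICATIONS}
    library = build_sample_library(fetched, STANDARD_CLASSIFICATIONS)
    print({c: len(pages) for c, pages in library.items()})
```

Failing loudly on an empty category is a design choice of this sketch, not something the text prescribes; raising `min_samples` to 10 corresponds to the "10 web pages per category" variant.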
[0108] Step 603: Extract the label content of the sample web page; assume the extracted label content includes the title, keywords, description, and the h1, h2 and h3 headings;
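One way to extract the label content named in Step 603 is with Python's standard `html.parser.HTMLParser`. This sketch assumes that "keywords" and "description" refer to the `<meta name="keywords">` and `<meta name="description">` tags and that the title and h1 to h3 text is collected verbatim; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List


class LabelContentParser(HTMLParser):
    """Collect the text of <title> and <h1>-<h3>, plus the content attribute
    of <meta name="keywords"> and <meta name="description">."""

    TEXT_TAGS = {"title", "h1", "h2", "h3"}
    META_NAMES = {"keywords", "description"}

    def __init__(self) -> None:
        super().__init__()
        self.labels: Dict[str, List[str]] = {}
        self._open: List[str] = []  # stack of currently open text tags

    def handle_starttag(self, tag, attrs):
        if tag in self.TEXT_TAGS:
            self._open.append(tag)
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            content = attr.get("content")
            if name in self.META_NAMES and content:
                self.labels.setdefault(name, []).append(content.strip())

    def handle_endtag(self, tag):
        # Close a text tag only when it is the innermost one we are tracking.
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Text nested inside other tags (e.g. <b>) is attributed to the
        # enclosing title/heading; text outside them is ignored.
        if self._open and data.strip():
            self.labels.setdefault(self._open[-1], []).append(data.strip())


def extract_label_content(html: str) -> Dict[str, List[str]]:
    parser = LabelContentParser()
    parser.feed(html)
    parser.close()
    return parser.labels


if __name__ == "__main__":
    page = """<html><head><title>City Food Guide</title>
<meta name="keywords" content="restaurant, menu">
<meta name="description" content="Reviews of local restaurants">
</head><body><h1>Top restaurants</h1><h2>Menus</h2><p>ignored</p></body></html>"""
    print(extract_label_content(page))
```

Because the same extraction rules are reused for pages to be classified, one parser like this could serve both the classifier-building and the classification stages.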
[0109] Step 604: Perform word segmentation on the obtained label content to obtain standard ...
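Step 604 is truncated here, so the following is only a hedged sketch of what segmentation followed by counting might look like. It assumes that a word's "proportion" in the standard proportion list is its share of all segmented words in a category's label content; that definition, the stop-word list and the function names are hypothetical. The naive regex tokenizer only suits space-delimited text; Chinese label content would need a real word segmenter (for example, jieba).

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "to"}


def segment(text: str) -> List[str]:
    """Naive segmentation: lowercase alphanumeric runs, minus stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def word_proportions(label_texts: Iterable[str]) -> Dict[str, float]:
    """Segment one category's label content and return each word's share of
    all segmented words (an assumed definition of 'proportion')."""
    counts = Counter()
    for text in label_texts:
        counts.update(segment(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


if __name__ == "__main__":
    catering_labels = ["Top restaurants", "restaurant, menu", "Reviews of local restaurants"]
    for word, share in sorted(word_proportions(catering_labels).items()):
        print(f"{word}: {share:.3f}")
```

Running this per category would yield a per-category word-to-proportion table of the kind the feature word extraction step consumes. Note that without stemming, "restaurant" and "restaurants" are counted as different words.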