Web page classification standard acquisition method and device and web page classification method and device

A web page classification standard acquisition and web page classification technology, applied in the communication field, that addresses the problem of low classification accuracy and achieves more accurate web page classification.

Inactive. Publication Date: 2015-03-18
ZTE CORP

AI Technical Summary

Problems solved by technology

[0004] The main technical problem to be solved by the present invention is to provide a web page classification standard acquisition method and device, and a web page classification method and device, so as to solve the low classification accuracy that results from extracting keywords from the web page body text in existing web page classification.


Examples

Embodiment 1

[0055] This embodiment describes in detail the process of acquiring a web page classifier, that is, a standard for classifying web pages.

[0056] Figure 1 shows a schematic structural diagram of a web page classification standard acquisition device, which includes:

[0057] The web page obtaining module is used to obtain sample web pages for each standard classification. In this embodiment, a standard classification is a web page category in the initial classification library, that is, one of the categories into which web pages are initially expected to be divided; sample web pages are then selected specifically for each initial classification. The sample web pages corresponding to a classification in the initial classification library can be obtained by looking up the required classification on a navigation website or through a search engine;...

Embodiment 2

[0080] Embodiment 1 above describes the process of obtaining a web page classifier. On this basis, this embodiment focuses on the process of classifying web pages of unknown type, that is, the web pages to be classified.

[0081] As shown in Figure 4, the web page classification device in this embodiment includes:

[0082] The tag acquisition module is used to extract the tag content of the web page to be classified; its extraction rules can be the same as the tag extraction rules used above when obtaining the web page classifier;

[0083] The feature word acquisition module is used to extract feature words from the tag content according to the standard feature words in the standard proportion list obtained in Embodiment 1; specifically, it can extract from the tag content the words that are identical to standard feature words in the list and use them as feature words; or extract from the tag content the same and si...
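
The exact-match variant of feature word extraction described in [0083] could look roughly like the following sketch. The function names `segment` and `extract_feature_words` are hypothetical, and the regular-expression tokenizer is only a stand-in assumption for whatever word segmentation the device actually uses; the truncated second variant is not covered.

```python
import re

def segment(text):
    # Assumed stand-in for a real word segmenter:
    # split into lowercase alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def extract_feature_words(tag_content, standard_words):
    # Keep only the tokens that exactly match a standard feature word,
    # preserving their order of appearance (duplicates included).
    standard = {w.lower() for w in standard_words}
    return [tok for tok in segment(tag_content) if tok in standard]

# Hypothetical example data, not taken from the patent.
words = extract_feature_words(
    "Best Noodles and Hotpot Restaurant",
    ["noodles", "restaurant", "bank"],
)
print(words)
```

Duplicates are kept on purpose in this sketch, so that a later weighting step can take word frequency into account if needed.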

Embodiment 3

[0105] To aid understanding of the present invention, the solution it provides is applied below to a specific scenario as an example; referring to Figure 6, the process includes:

[0106] Step 601: Acquire the standard classifications (that is, the initial classifications), including catering, animation websites, transportation and tourism, education and culture, finance and economics, military defense, and automobile websites;

[0107] Step 602: For each category, obtain a sample web page through website navigation or a search engine. Multiple sample pages can of course be obtained, for example 10 web pages per category; here only one per category is used for illustration;

[0108] Step 603: Extract the tag content of the sample web page; assume the extracted tag content includes: title, keywords, description, and h1, h2, h3;

[0109] Step 604: Segment the obtained tag content to obtain standard ...
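
As an illustration of Step 603, the sketch below pulls the title, keywords, description, and h1 to h3 content out of an HTML page using Python's standard `html.parser` module. The class name `TagContentExtractor`, the helper `extract_tag_content`, and the sample page are assumptions made for this example; the patent text shown here does not prescribe a particular parser.

```python
from html.parser import HTMLParser

class TagContentExtractor(HTMLParser):
    """Collects text from <title>, <h1>-<h3> and keywords/description meta tags."""
    TEXT_TAGS = {"title", "h1", "h2", "h3"}
    META_NAMES = {"keywords", "description"}

    def __init__(self):
        super().__init__()
        self.content = {}   # tag name -> list of extracted strings
        self._open = []     # stack of currently open text tags

    def handle_starttag(self, tag, attrs):
        if tag in self.TEXT_TAGS:
            self._open.append(tag)
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in self.META_NAMES and attr.get("content"):
                self.content.setdefault(name, []).append(attr["content"].strip())

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if self._open and text:
            self.content.setdefault(self._open[-1], []).append(text)

def extract_tag_content(html):
    parser = TagContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.content

# Hypothetical sample page for a "catering" category.
page = (
    '<html><head><title>Noodle House</title>'
    '<meta name="keywords" content="noodles, restaurant">'
    '<meta name="description" content="Best noodles"></head>'
    '<body><h1>Menu</h1><h2>Soups</h2><p>ignored</p></body></html>'
)
print(extract_tag_content(page))
```

Body text outside the selected tags (the `<p>` element here) is deliberately ignored, which mirrors the idea of relying on tag content rather than the full page text.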

Abstract

Disclosed are a method and device for obtaining web page category standards, and a method and device for categorizing web pages. Tag contents are extracted from sample pages; standard characteristic terms are extracted from the tag contents; on the basis of the standard characteristic terms extracted from the tag contents, a list of standard categories and standard weights of standard characteristic terms, i.e. the standard for web page categories, are obtained; web pages to be categorized are categorized on the basis of the standard.
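
The flow summarized in the abstract can be sketched as follows. The text available here does not state how the standard weights are computed or how a page is scored, so this sketch assumes that a term's weight is its relative frequency within a category's sample feature words and that a page is assigned to the category with the highest summed weight. The names `build_standard` and `classify` and the sample data are hypothetical.

```python
from collections import Counter

def build_standard(samples):
    """samples: {category: [feature words from that category's sample pages]}.

    Returns {category: {term: weight}}, with weight assumed to be the
    term's relative frequency within the category.
    """
    standard = {}
    for category, words in samples.items():
        counts = Counter(words)
        total = sum(counts.values())
        standard[category] = {w: c / total for w, c in counts.items()}
    return standard

def classify(feature_words, standard):
    # Score each category by summing the weights of the page's feature words.
    scores = {
        category: sum(weights.get(w, 0.0) for w in feature_words)
        for category, weights in standard.items()
    }
    best = max(scores, key=scores.get)
    # Return None when no feature word matches any category.
    return best if scores[best] > 0 else None

# Hypothetical sample data, not taken from the patent.
samples = {
    "catering": ["noodles", "restaurant", "noodles", "menu"],
    "finance": ["bank", "stock", "bank", "loan"],
}
standard = build_standard(samples)
print(classify(["noodles", "menu"], standard))
```

Other weighting schemes would fit the same structure; only `build_standard` would need to change.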

Description

Technical field

[0001] The invention relates to the communication field, in particular to a web page classification standard acquisition method and device, and a web page classification method and device.

Background

[0002] Web pages are classified so that records of users' web page visits can be analyzed to obtain the users' online preferences; Internet services can then be provided based on these preferences, improving the user experience. At present, there are many methods for classifying web pages. These methods extract keywords from the body text of the web page, build an algorithm model on the keywords to obtain a classifier (that is, the web page classification standard), and use this classifier to classify web pages of unknown category. These existing web page classification methods all have the following technical problems:

[0003] Existing webpage classifications extract keywords from the ...

Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F17/30, G06F17/30707, G06F16/353, G06F16/958, G06F16/285, G06F40/117
Inventor: 于波
Owner: ZTE CORP