Webpage classification dictionary generation method and apparatus

A web page classification dictionary technology, applied in the field of Internet search, that addresses the low accuracy of generated web page classification dictionaries.

Active Publication Date: 2016-12-07
NEW H3C TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] In the prior art, when evaluating the importance of a sample word in a sample file, only the number of files in which the sample word appears is considered. As a result, the determined importance of the sample word in the sample file (i.e., the corresponding weight value) is not very accurate, and the generated web page classification dictionary has low accuracy.



Embodiment Construction

[0026] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0027] Embodiments of the present invention provide a method and device for generating a webpage classification dictionary, so as to generate a webpage classification dictionary with higher accuracy.

[0028] The following first introduces a method for generating a webpage classification dictionary provided by an embodiment of the present invention.

[0029] As shown in Figure 1, a method for generating a web page classification dictionary provided in an em...


Abstract

Embodiments of the invention disclose a webpage classification dictionary generation method and apparatus. The method comprises the steps of: determining, according to a predetermined webpage classification standard, the sample uniform resource locators (URLs) corresponding to the webpage classification samples of each category, and obtaining the sample webpage content corresponding to each sample URL; extracting the sample text information from each piece of sample webpage content, performing word segmentation on the sample text information, and obtaining the corresponding sample words; looking up the reverse word frequency value corresponding to each sample word in a pre-stored set of correspondences between learning words and reverse word frequency values, wherein the reverse word frequency value is determined according to the occurrence frequency of each learning word in the corresponding learning text information; and storing the sample word, together with a weight value determined according to the corresponding reverse word frequency value, in a webpage classification dictionary. A webpage classification dictionary with higher accuracy is thereby generated.
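The steps in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not give the formula for the reverse word frequency value or for the weight, so the sketch assumes `log(total_words / word_count)` over the learning text and multiplies it by the word's relative frequency in the category samples. The function names, the regular-expression tokenizer standing in for word segmentation, and the example texts are all hypothetical, and fetching webpage content from sample URLs is omitted.

```python
import math
import re
from collections import Counter


def segment(text):
    """Stand-in for word segmentation: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_reverse_frequency_table(learning_texts):
    """Map each learning word to log(total_words / word_count).

    Assumed formula: rarer learning words receive larger values.
    """
    counts = Counter(w for t in learning_texts for w in segment(t))
    total = sum(counts.values())
    return {w: math.log(total / c) for w, c in counts.items()}


def build_classification_dictionary(samples, reverse_freq):
    """samples: {category: [sample text, ...]} -> {category: {word: weight}}."""
    dictionary = {}
    for category, texts in samples.items():
        counts = Counter(w for t in texts for w in segment(t))
        total = sum(counts.values())
        weights = {}
        for word, count in counts.items():
            if word not in reverse_freq:
                continue  # no pre-stored value for this word; skipped (assumption)
            # Assumed weighting: relative frequency times reverse word frequency.
            weights[word] = (count / total) * reverse_freq[word]
        dictionary[category] = weights
    return dictionary


# Hypothetical learning text and category samples.
learning_texts = [
    "the match ended in a draw",
    "the store has a sale on shoes",
    "the team won the match",
]
samples = {
    "sports": ["team won the match goalkeeper"],
    "shopping": ["the store has a sale"],
}

table = build_reverse_frequency_table(learning_texts)
dictionary = build_classification_dictionary(samples, table)
```

With these inputs, a common word such as "the" receives a smaller weight than a category-specific word such as "team", because its value in the pre-stored table is lower; "goalkeeper" never appears in the learning text and is therefore left out of the dictionary.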

Description

Technical field

[0001] The invention relates to the technical field of Internet search, in particular to a method and device for generating a web page classification dictionary.

Background

[0002] The number of websites on the Internet is extremely large, and they are of many types, such as news, sports, shopping and so on. Faced with this variety, enterprises or organizations want internal staff to visit only work-related websites, so filtering the websites that internal staff can access is an urgent and important requirement. This makes it necessary to classify each website and to filter websites by category, so that websites that are not allowed to be accessed are filtered out.

[0003] In the face of a large number of websites on the Internet, URLs (Uniform Resource Locators) corresponding to the websites can be classified. When setting reasonable classificati...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/9535, G06F16/9566, G06F40/289
Inventor: 张惊申
Owner: NEW H3C TECH CO LTD