
Web page classification method, web page classification device and network equipment

A web page classification technology, applied in the field of network communication. It addresses the problems of inaccurate word segmentation, low information extraction accuracy, and low web page classification accuracy, and achieves the effect of improving accuracy.

Status: Inactive. Publication date: 2012-12-12
BEIJING XINWANG RUIJIE NETWORK TECH CO LTD
Cites: 6. Cited by: 9

AI Technical Summary

Problems solved by technology

At present, commonly used web page information extraction methods, such as those based on the Document Object Model (DOM) tree, suffer from low extraction accuracy. Commonly used word segmentation methods, such as string-matching segmentation, understanding-based segmentation, and statistical segmentation, are likewise inaccurate. Together, these defects lower the accuracy of web page classification.



Examples


Embodiment Construction

[0018] Figure 1 is a flowchart of a web page classification method provided by an embodiment of the present invention. This embodiment is executed by a web page classification device. As shown in Figure 1, the method of this embodiment includes:

[0019] Step 101: extract information of different classification weight levels from the source file of the web page.
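
Step 101 can be illustrated with a minimal sketch. The excerpt does not say which parts of the source file belong to which classification weight level, so the mapping below (title at level 1, meta keywords/description and headings at level 2, all other visible text at level 3) is purely an assumption for illustration, as are the names `TAG_LEVELS`, `WeightLevelExtractor`, and `extract_levels`. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to weight levels (1 = highest).
# The patent excerpt does not list the levels; these are assumptions.
TAG_LEVELS = {"title": 1, "h1": 2, "h2": 2, "h3": 2}
META_LEVEL = 2      # <meta name="keywords|description" content="...">
DEFAULT_LEVEL = 3   # any other visible text
SKIPPED_TAGS = {"script", "style"}


class WeightLevelExtractor(HTMLParser):
    """Collects text fragments of a page, grouped by assumed weight level."""

    def __init__(self):
        super().__init__()
        self.levels = {1: [], 2: [], 3: []}
        self._stack = []  # currently open tags, innermost last

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("keywords", "description") and attr.get("content"):
                self.levels[META_LEVEL].append(attr["content"].strip())
            return  # meta is a void element, so it is never pushed
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self._stack:
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or any(t in SKIPPED_TAGS for t in self._stack):
            return
        # The most important enclosing tag decides the level.
        level = min(
            (TAG_LEVELS[t] for t in self._stack if t in TAG_LEVELS),
            default=DEFAULT_LEVEL,
        )
        self.levels[level].append(text)


def extract_levels(source):
    """Return {level: [text, ...]} for one web page source file."""
    parser = WeightLevelExtractor()
    parser.feed(source)
    parser.close()
    return parser.levels
```

Calling `extract_levels(html_source)` returns a dictionary keyed by level number, so later steps can process the levels in order from the highest weight to the lowest.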

[0020] A web page is a file stored on a computer that is connected to the Internet. Each web page is identified and accessed through its web address, the uniform resource locator (Uniform/Universal Resource Locator, URL). For example, when a user enters a web address in the browser of a terminal device, the web page corresponding to that address is sent to the terminal device, and the user can view it in the browser. A web page usually uses a Hy...


Abstract

The invention provides a web page classification method, a web page classification device, and network equipment. The method comprises the following steps: extracting information of different classification weight levels from the source file of a web page; performing word segmentation on the information of each classification weight level to obtain the segmented words of that level; and classifying the web page using the segmented words of each classification weight level, in order of classification weight level from high to low. Because more important information in a web page has a greater influence on the classification result, the technical scheme classifies the web page using the information with higher classification weight levels first. This helps reduce the influence of invalid information in the web page on classification, and thereby helps improve the accuracy of web page classification.
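
The segment-then-classify flow described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the regular-expression tokenizer stands in for a real word segmenter, the keyword lexicon `CATEGORY_KEYWORDS` is hypothetical, and the stopping rule (return at the first level that produces a clear winner) is one possible reading of "from high to low", since the excerpt does not spell out how the levels are combined.

```python
import re

# Hypothetical category lexicon; a real system would use a trained classifier.
CATEGORY_KEYWORDS = {
    "sports": {"football", "match", "league"},
    "finance": {"stock", "market", "bank"},
}


def segment(text):
    """Stand-in word segmentation: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_by_levels(levels, min_hits=1):
    """Classify using segmented words of each level, highest level first.

    `levels` maps a level number (1 = highest weight) to a list of text
    fragments. Returns (category, level_used), or (None, None) if no level
    yields a clear winner.
    """
    for level in sorted(levels):
        words = [w for text in levels[level] for w in segment(text)]
        hits = {
            category: sum(1 for w in words if w in keywords)
            for category, keywords in CATEGORY_KEYWORDS.items()
        }
        ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
        top_category, top = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        # Assumed rule: accept only an unambiguous winner at this level;
        # otherwise fall through to the next, lower-weight level.
        if top >= min_hits and top > runner_up:
            return top_category, level
    return None, None
```

With this rule, lower-weight text is consulted only when higher-weight text is inconclusive, which is how the sketch limits the influence of less important page content on the result.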

Description

Technical field

[0001] The invention relates to network communication technology, and in particular to a web page classification method, device and network equipment.

Background art

[0002] With the rapid development of the Internet and the rapid growth in the amount of web page data, people have entered an era of abundant information. Faced with disorganized web page information resources, people need to classify and organize massive amounts of web page information so that they can quickly find the useful information they want. Automatic web page classification is the key technology for processing and organizing large-scale collections of web pages, and an important means of organizing information resources reasonably and effectively. The accuracy of web page classification depends largely on the extraction of web page information.

[0003] The existing webpage classification process includes: extracting webpage information from webpage source files (also called denoising processing on webpage s...


Application Information

IPC(8): G06F17/30
Inventor: 王祖海
Owner: BEIJING XINWANG RUIJIE NETWORK TECH CO LTD