
Web page clustering method and device

A web page clustering method and device, applied in the computer field, which can solve problems such as the inability to cluster web pages.

Active Publication Date: 2017-05-24
NSFOCUS INFORMATION TECHNOLOGY CO LTD +1
5 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a web page clustering method and device to solve the problem in the prior art that web pages cannot be clustered according to the web page frame.



Examples


Embodiment Construction

[0063] In order to make the purpose, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0064] Figure 1 is a schematic flow chart of a web page clustering method provided by an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0065] S101: Obtain URLs of multiple pages to be clustered;

[0066] S102: For each URL of the page to be clustered, determine a rewriting rule for the URL and perform URL classification according to the rewriting rule for the URL;

[0067] S103: For ea...
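
Steps S101 and S102 can be illustrated with a minimal Python sketch. The source does not spell out what a rewriting rule looks like, so the rule used here is an assumption made purely for illustration: every run of digits in the path is replaced by a `{num}` placeholder and every query value by `{val}`. The function names `rewrite_rule` and `classify_urls` are hypothetical and do not come from the patent.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def rewrite_rule(url: str) -> str:
    """Return an assumed rewriting rule for a URL.

    Assumption (not from the patent): digit runs in the path become
    "{num}" and every query value becomes "{val}".
    """
    parts = urlsplit(url)
    # Generalize numeric parts of the path, e.g. /news/123.html -> /news/{num}.html
    path = re.sub(r"\d+", "{num}", parts.path)
    # Keep only the (sorted, de-duplicated) query keys; values are generalized.
    keys = sorted({key for key, _ in parse_qsl(parts.query, keep_blank_values=True)})
    query = "&".join(f"{key}={{val}}" for key in keys)
    return f"{parts.netloc}{path}" + (f"?{query}" if query else "")


def classify_urls(urls):
    """S101/S102: group the URLs to be clustered by their rewriting rule."""
    classes = defaultdict(list)
    for url in urls:
        classes[rewrite_rule(url)].append(url)
    return dict(classes)
```

Under this assumed rule, URLs that differ only in numeric identifiers or query values fall into the same URL class, which is the input that the later frame-based steps would operate on.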


Abstract

The invention discloses a web page clustering method and device. The method clusters web pages according to the web page framework and includes the following steps: acquiring the URLs (uniform resource locators) of a plurality of web pages to be clustered; determining a rewriting rule for the URL of each web page to be clustered and classifying the URLs according to their rewriting rules; determining the web page framework of the web page corresponding to each URL in each URL class, and determining whether each URL can be clustered according to the web page framework of its web page; and retaining the URL class if each URL can be clustered. With this method, web pages with the same web page framework structure can be clustered into one class, which solves the problem that existing clustering methods cannot cluster web pages according to the web page framework, and makes the clustering method better suited to web page framework processing.
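
The frame-based check described in the abstract can be sketched as follows. The abstract does not define how a web page framework is represented or compared, so this sketch makes its own assumptions: a page's framework is taken to be the set of tag paths in its HTML (text content is ignored), two frameworks are compared with Jaccard similarity, and a URL class is retained only if every page is at least `threshold` similar to the first page of the class. The names `FrameExtractor`, `page_frame`, `frame_similarity`, `class_is_clusterable` and the threshold value 0.8 are all hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FrameExtractor(HTMLParser):
    """Collect tag paths such as 'html/body/div' and ignore all text."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to (and including) the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass


def page_frame(html: str) -> frozenset:
    """Assumed framework representation: the set of tag paths in the page."""
    parser = FrameExtractor()
    parser.feed(html)
    parser.close()
    return frozenset(parser.paths)


def frame_similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two frameworks (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def class_is_clusterable(frames, threshold: float = 0.8) -> bool:
    """Retain a URL class only if every page matches the first page's frame."""
    frames = list(frames)
    if not frames:
        return False
    reference = frames[0]
    return all(frame_similarity(reference, f) >= threshold for f in frames)
```

In this sketch the page HTML is passed in directly rather than fetched, so the example stays independent of any network access; two pages with different text but identical markup yield the same framework, which is the property that text-feature clustering lacks.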

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a web page clustering method and device.

Background technique

[0002] The existing web page clustering methods are mainly based on the clustering of web page text features, that is, to extract key content or words with high frequency of occurrence as feature values for clustering, so that pages with similar content are clustered into one class. The method is only suitable for text processing such as text retrieval and can obviously improve processing efficiency.

[0003] However, in addition to text features, web pages also have the characteristics of hyperlinks, semi-structured content, large scale, and heterogeneous formats. For the same type of web pages with the same web page framework, their text features may be different. Clustering based on the text features of web pages cannot group web pages with the same frame into one class. Therefore, the method of clust...


Application Information

IPC(8): G06F17/30
CPC: G06F16/9535, G06F16/9566, G06F16/951, G06F16/986
Inventor: 郭洋洋, 刘少彬, 李菲, 李虎, 刘丽君
Owner: NSFOCUS INFORMATION TECHNOLOGY CO LTD