A web page clustering method and device

A clustering method and technology, applied in the computer field, that can solve problems such as the inability to cluster web pages.

Active Publication Date: 2019-11-19
NSFOCUS INFORMATION TECHNOLOGY CO LTD +1
Cites: 5 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a web page clustering method and device to solve the prior-art problem that web pages cannot be clustered according to the web page frame.

Embodiment Construction

[0063] In order to make the purpose, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0064] Figure 1 is a schematic flow chart of a web page clustering method provided by an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0065] S101: Obtain URLs of multiple pages to be clustered;

[0066] S102: For the URL of each page to be clustered, determine a rewriting rule for the URL and classify the URL according to that rewriting rule;

[0067] S103: For ea...
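
Steps S101 and S102 can be illustrated with a minimal Python sketch. The excerpt does not say how a rewriting rule is determined, so the `rewrite_rule` function below is purely an assumption made for illustration: it keeps the host, replaces every run of digits in the path with a `{num}` placeholder, and keeps only the query parameter names. The function names `rewrite_rule` and `classify_urls` are hypothetical and do not come from the patent.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit


def rewrite_rule(url: str) -> str:
    """Derive a rule string for a URL (ASSUMED heuristic, not the patent's).

    Digit runs in the path become "{num}" and query values are dropped,
    so URLs that differ only in numeric IDs or parameter values share a rule.
    """
    parts = urlsplit(url)
    path = re.sub(r"\d+", "{num}", parts.path)
    keys = sorted(p.split("=", 1)[0] for p in parts.query.split("&") if p)
    query = "?" + "&".join(keys) if keys else ""
    return f"{parts.netloc}{path}{query}"


def classify_urls(urls):
    """S102: group the URLs obtained in S101 by their rewriting rule."""
    classes = defaultdict(list)
    for url in urls:
        classes[rewrite_rule(url)].append(url)
    return dict(classes)


# S101: the URLs of the pages to be clustered (example.com is a placeholder).
urls = [
    "http://example.com/news/123.html",
    "http://example.com/news/456.html",
    "http://example.com/about",
]
url_classes = classify_urls(urls)
```

Under this assumed rule, the two `/news/` URLs fall into one URL classification and `/about` into another. A real implementation would substitute whatever rule-determination procedure the full description specifies.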

Abstract

The invention discloses a web page clustering method and device for clustering web pages according to the web page frame. The method comprises: obtaining the uniform resource locators (URLs) of multiple pages to be clustered; for the URL of each page to be clustered, determining the rewriting rule of the URL and classifying the URL according to that rewriting rule; for each URL classification, determining the page frame of the page corresponding to each URL in the classification and, according to those page frames, determining whether the URLs can be clustered; and, if the URLs can be clustered, retaining the URL classification. In this way, web pages with the same page frame structure can be clustered into one group, which overcomes the inability of existing clustering methods to cluster by web page frame and provides a clustering method better suited to processing that involves page frames.
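
The page-frame check described in the abstract can be sketched as follows. The excerpt does not define how a page frame is represented or what makes URLs "clusterable", so two assumptions are made here and should be read as illustrative only: the frame is taken to be the sequence of HTML start tags with all text discarded, and a URL classification is retained only when every page in it has an identical frame. The names `FrameExtractor`, `page_frame`, and `retain_clusterable` are hypothetical, and the pages are passed in as strings rather than fetched over the network.

```python
from html.parser import HTMLParser


class FrameExtractor(HTMLParser):
    """Collect start-tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def page_frame(html: str) -> tuple:
    """ASSUMED frame representation: the tuple of start tags in the page."""
    parser = FrameExtractor()
    parser.feed(html)
    parser.close()
    return tuple(parser.tags)


def retain_clusterable(url_classes, pages):
    """Keep a URL classification only if all its pages share one frame.

    url_classes maps a rule string to a list of URLs; pages maps each URL
    to its HTML source. Identical frames is an ASSUMED criterion.
    """
    retained = {}
    for rule, urls in url_classes.items():
        frames = {page_frame(pages[u]) for u in urls}
        if len(frames) == 1:
            retained[rule] = urls
    return retained


pages = {
    "a": "<html><body><div><p>first article</p></div></body></html>",
    "b": "<html><body><div><p>other text</p></div></body></html>",
    "c": "<html><body><table><tr><td>1</td></tr></table></body></html>",
    "d": "<html><body><div><p>more text</p></div></body></html>",
}
url_classes = {"rule1": ["a", "b"], "rule2": ["c", "d"]}
retained = retain_clusterable(url_classes, pages)
```

Pages `a` and `b` differ in text but share a frame, so `rule1` is retained; `c` and `d` have different tag structures, so `rule2` is dropped. A looser similarity threshold could replace the exact-match test if the full description calls for one.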

Description

Technical Field

[0001] The present invention relates to the field of computer technology, in particular to a web page clustering method and device.

Background

[0002] Existing web page clustering methods are mainly based on web page text features: key content or frequently occurring words are extracted as feature values for clustering, so that pages with similar content are grouped into one class. This approach is suitable only for text processing such as text retrieval, where it can noticeably improve processing efficiency.

[0003] However, in addition to text features, web pages are also characterized by hyperlinks, semi-structured content, large scale, and heterogeneous formats. Web pages of the same type that share a web page framework may have different text features, so clustering based on text features cannot group web pages with the same frame into one class. Therefore, the method of clust...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/9535, G06F16/955, G06F16/906
CPC: G06F16/9535, G06F16/9566, G06F16/951, G06F16/986
Inventors: 郭洋洋, 刘少彬, 李菲, 李虎, 刘丽君
Owner: NSFOCUS INFORMATION TECHNOLOGY CO LTD