Webpage clustering method based on node property label propagation

A clustering method and attribute labeling technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the high time complexity and long running time of existing methods, which make them poorly suited to large-scale networks, and achieves fast, high-quality clustering while requiring little information.

Inactive Publication Date: 2012-11-07
HARBIN ENG UNIV
Cites: 2 | Cited by: 43

AI Technical Summary

Problems solved by technology

As Internet technology develops and data volumes keep growing, the above-mentioned algorithms generally have high time complexity, and mining all cluster structures in the network takes a long time, so they are not well suited to large-scale networks.



Examples


Embodiment Construction

[0037] The present invention is described in more detail below with reference to the accompanying drawings:

[0038] 1) Construct a topology graph model based on the link relationship between web pages

[0039] To analyze and study the network, it must first be described by a suitable mathematical model, and graph theory is closely related to networks. The network is converted into a graph model G(V,E) as follows:

[0040] Map the web pages in the network to nodes in the graph model, and use V to represent the set of all nodes {v_1, v_2, ..., v_N};

[0041] The links between web pages are mapped to the edges between nodes in the graph model, and E is used to represent the set of edges connecting node pairs.
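
The mapping in step 1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the function name `build_graph`, the sample page names, and the choice to treat links as undirected and to ignore self-links are all assumptions, since the text above does not specify edge direction.

```python
from collections import defaultdict

def build_graph(links):
    """Build a graph model G(V, E) from (source_page, target_page) link pairs.

    Assumption: links are treated as undirected and self-links are ignored.
    """
    nodes = set()
    adjacency = defaultdict(set)
    for src, dst in links:
        nodes.update((src, dst))      # every linked page becomes a node in V
        if src != dst:
            adjacency[src].add(dst)   # store the edge in both directions
            adjacency[dst].add(src)
    # Each undirected edge is stored once as an unordered pair.
    edges = {frozenset((u, v)) for u in adjacency for v in adjacency[u]}
    return sorted(nodes), edges, dict(adjacency)

# Hypothetical sample data: four pages and four links.
links = [
    ("a.html", "b.html"),
    ("b.html", "c.html"),
    ("c.html", "a.html"),
    ("d.html", "a.html"),
]
V, E, adj = build_graph(links)
```

Here `V` is the node set, `E` the edge set, and `adj` an adjacency map that later steps can use to look up the neighbors of a node.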

[0042] 2) Construct attribute vectors for each node in the graph model

[0043] The webpages in the network have their own attribute information, and the attribute vector is construct...



Abstract

The invention provides a webpage clustering method based on node property label propagation, which comprises the following steps: constructing a topological graph model according to the link relationships between webpages; building a property feature vector for each node in the graph model; initializing the node labels in the network; iteratively updating the node labels according to the topological structure and the node properties; and defining the condition for stopping the label updates. The method achieves effective, high-quality webpage clustering in nearly linear time using only the topological relationships of the Internet and the property information that characterizes each webpage. It requires no prior knowledge, such as the number or size of webpage groups, and no predefined parameters, such as threshold values. The algorithm is conceptually simple, easy to understand and implement, and has low time complexity, and it produces high-quality network clustering results, so it can be applied to the large-scale Internet.
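
The steps listed in the abstract can be illustrated with a small sketch. The exact update rule and stopping condition are not given in this excerpt, so the following are assumptions: each neighbor's vote is weighted by the cosine similarity of the two nodes' attribute vectors, ties are broken deterministically, and updating stops when a full pass changes no label. The names `cosine`, `propagate_labels`, `adj`, and `attrs` are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity of two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def propagate_labels(adj, attrs, max_iters=100):
    """Cluster nodes by attribute-weighted label propagation (assumed rule)."""
    labels = {v: v for v in adj}  # initialization: every node gets a unique label
    for _ in range(max_iters):
        changed = False
        for v in sorted(adj):  # fixed order keeps the sketch deterministic
            scores = defaultdict(float)
            for u in adj[v]:
                # A neighbor's label counts more if its attributes resemble v's.
                scores[labels[u]] += cosine(attrs[v], attrs[u])
            if not scores:
                continue  # isolated node keeps its own label
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            # Keep the current label on a tie, otherwise take the smallest.
            new = labels[v] if labels[v] in candidates else candidates[0]
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:  # assumed stopping condition: no label changed
            break
    clusters = defaultdict(set)
    for v, l in labels.items():
        clusters[l].add(v)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical example: two triangles joined by one link (2-3),
# with a different attribute vector for each triangle.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
attrs = {0: [1, 0], 1: [1, 0], 2: [1, 0], 3: [0, 1], 4: [0, 1], 5: [0, 1]}
clusters = propagate_labels(adj, attrs)
print(clusters)
```

Each pass visits every node once and touches each edge twice, which is consistent with the nearly linear running time claimed above as long as the number of passes stays small.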

Description

Technical field

[0001] The invention relates to a web page clustering method.

Background technique

[0002] As data mining theory has been explored in greater depth, complex network analysis has attracted the attention of researchers in many fields. Complex network clustering has become a research focus for scholars both in China and abroad. Many of them have studied clustering techniques for complex networks and have discovered network cluster structures using a variety of methods.

[0003] Nodes in a network may be organized at different levels; for example, a large cluster structure may contain smaller cluster structures. Hierarchical clustering is a traditional clustering approach that includes top-down divisive hierarchical clustering and bottom-up agglomerative hierarchical clustering. A representative algorithm is the GN algorithm, published in PNAS, 2002, 99(12), in the article Community structure in ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 张乐君, 夏磊, 张健沛, 杨静, 国林
Owner: HARBIN ENG UNIV