XML (extensible markup language) document spectrum clustering method based on affinity propagation

A technique combining affinity propagation with spectral clustering, applied in the field of Web data management to address the low retrieval accuracy of XML document search.

Status: Inactive
Publication Date: 2012-11-28
NORTH CHINA ELECTRIC POWER UNIV (BAODING)

AI Technical Summary

Problems solved by technology

[0005] To address the low retrieval precision of XML retrieval in Web data management described in the background above, the present invention proposes an XML document spectral clustering method based on affinity propagation.



Examples


Embodiment Construction

[0026] The preferred embodiment is described in detail below with reference to Figure 1. It should be emphasized that the following description is only exemplary and is not intended to limit the scope of the invention or its application.

[0027] 1. Extract the XML paths, normalize the element tags in them by reducing each word to its base form, and remove any path that is contained in another path. Then represent each XML document by the feature vector formed from its XML paths.

[0028] Suppose the XML document set contains the following documents:

[0029]

[0030]

[0031]

[0032]

[0033] Extracting the node labels from the root node to each leaf node in the above XML documents forms the following paths:

[0034] P1 = persons/person/name

[0035] P2 = persons/person/books/book

[0036] P3 = persons/person/papers/paper

[0037] P4 = persons/person/courses/course

[0038] P5 = persons/person/articles/article/title

[0039] P6 = persons/person/articles/article/time

...
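
The path-extraction step in [0027] can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the two sample documents are hypothetical, lower-casing stands in for the word-normalization step, "contained in another path" is read as "is a proper prefix of another path", and the feature vector is taken to be a 0/1 indicator over the retained paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; the patent's own XML listing is not available here.
DOC_A = "<persons><person><name>A</name><books><book>B</book></books></person></persons>"
DOC_B = "<persons><person><name>C</name><papers><paper>P</paper></papers></person></persons>"


def leaf_paths(xml_text):
    """Return the set of root-to-leaf tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        # Assumption: lower-casing stands in for reducing tags to a base form.
        path = prefix + [node.tag.lower()]
        children = list(node)
        if not children:
            paths.add("/".join(path))
        for child in children:
            walk(child, path)

    walk(root, [])
    return paths


def drop_contained(paths):
    """Drop every path that is a proper prefix of another path (assumed meaning of 'contained')."""
    return {p for p in paths if not any(q.startswith(p + "/") for q in paths)}


def feature_vectors(docs):
    """Return (vocabulary, vectors): one 0/1 vector per document over the retained paths."""
    per_doc = [leaf_paths(d) for d in docs]
    vocab = sorted(drop_contained(set().union(*per_doc)))
    vectors = [[1 if p in doc_paths else 0 for p in vocab] for doc_paths in per_doc]
    return vocab, vectors


vocab, vectors = feature_vectors([DOC_A, DOC_B])
print(vocab)
print(vectors)
```

With these two sample documents, the shared path persons/person/name is set in both vectors, while the books and papers paths each appear in only one.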


Abstract

The invention discloses an XML (extensible markup language) document spectrum clustering method based on affinity propagation, in the technical field of Web data management. The method comprises the following steps: representing each XML document by a feature vector formed from its XML paths; calculating the initial similarity between each pair of XML document vectors to obtain an initial similarity matrix W, and from it determining an initial affinity relation matrix N; correcting the similarity between each pair of implicitly similar XML document vectors with an affinity propagation algorithm to obtain a final similarity matrix A; and finally determining the number of clusters and the clustering result by applying a first specified method to the final similarity matrix A. Because the initial similarity matrix obtained with a traditional similarity calculation is corrected by the affinity propagation algorithm, the method can reflect the similarity between implicitly similar XML documents. The method does not depend on the order of the XML documents and is applicable to clustering XML document retrieval results arranged in any order.
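
The abstract names the matrices W, N, and A but does not give their formulas, so the following is only a minimal sketch of the pipeline's shape under loudly stated assumptions: cosine similarity for W, a fixed threshold for N, and a simple transitive rule for the propagation step (if i and j are related and j and k are related, the similarity of i and k is raised to at least the weaker of the two links). None of these choices is taken from the patent, and the final clustering step is omitted because the "first specified method" is not described here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def initial_similarity(vectors):
    """Initial similarity matrix W over all document pairs."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def affinity_relation_matrix(W, threshold):
    """Initial relation matrix N: pairs at or above a threshold (assumed rule)."""
    n = len(W)
    return [[i != j and W[i][j] >= threshold for j in range(n)] for i in range(n)]


def propagate(W, N):
    """Stand-in propagation producing a corrected matrix A; not the patent's update rule."""
    n = len(W)
    A = [row[:] for row in W]
    N = [row[:] for row in N]
    changed = True
    while changed:  # repeat until no relation or similarity changes
        changed = False
        for j in range(n):
            for i in range(n):
                for k in range(n):
                    if i != k and N[i][j] and N[j][k]:
                        bridged = min(A[i][j], A[j][k])
                        if bridged > A[i][k] or not N[i][k]:
                            A[i][k] = A[k][i] = max(A[i][k], bridged)
                            N[i][k] = N[k][i] = True
                            changed = True
    return A


# Hypothetical path-indicator vectors: documents 0 and 2 share no path directly,
# but both overlap with document 1.
vectors = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
W = initial_similarity(vectors)
N = affinity_relation_matrix(W, threshold=0.4)
A = propagate(W, N)
print(W[0][2], A[0][2])
```

In this toy input, documents 0 and 2 start with zero similarity in W and end up with a nonzero entry in A because both are related to document 1, which is the kind of implicit similarity the abstract describes.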

Description

Technical field

[0001] The invention belongs to the technical field of Web data management, and in particular relates to an XML document spectral clustering method based on affinity propagation.

Background technique

[0002] With the massive growth and wide application of XML-format data on the Web, the demand for searching XML documents is becoming increasingly urgent. For the large number of free XML documents on the Internet, keyword-based XML document search does not require users to learn complex query languages or to understand XML schemas, so it suits the retrieval needs of ordinary users. However, because the element tags and element contents of XML documents contain synonyms and polysemous words, the result set of an XML keyword search includes a large number of documents unrelated to the user's intended meaning, the retrieval result accuracy is low, an...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 李新叶
Owner: NORTH CHINA ELECTRIC POWER UNIV (BAODING)