XML (Extensible Markup Language) document structure and semantic similarity calculation method based on extended adjacency matrix

An adjacency-matrix and document-structure technology, applied in the field of data mining, that addresses problems such as failing to consider the different contributions of nodes and high time complexity.

Inactive Publication Date: 2010-08-11
NANKAI UNIV
1 Cites, 8 Cited by

AI Technical Summary

Problems solved by technology

The advantage of this method is that it can well show how many nodes are different between different documents, but it does not co

Abstract

The invention discloses a method for calculating the structural and semantic similarity of XML (Extensible Markup Language) documents based on an extended adjacency matrix, belonging to the technical field of data mining. The method comprises the following steps: encoding the XML document tree; for two encoded documents, first generating a schema document node list and a data source document node list, and then generating a schema extended adjacency matrix and a data source extended adjacency matrix (P1, P2); and calculating the similarity of the XML documents as cos(P1, P2). The method fully considers the different contributions that nodes at different levels make to a document. For an XML document with n nodes, its worst-case time complexity is O(n^2), which is better than that of the edit distance algorithm.
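The final step, cos(P1, P2), can be illustrated with a minimal Python sketch. Everything beyond the cosine itself is an assumption made for illustration: this excerpt does not give the patent's tree encoding, node lists, or level weights, so the hypothetical `weighted_edges` helper simply records each parent-child tag pair with a weight of 1/depth (so that higher-level nodes contribute more), and the hypothetical `matrix_cosine` helper treats the two sparse matrices as flattened vectors.

```python
import math
import xml.etree.ElementTree as ET


def weighted_edges(xml_text):
    """Return a sparse matrix {(parent_tag, child_tag): weight} for one document.

    ASSUMPTION: the 1/depth weight (children of the root have depth 1) is a
    hypothetical stand-in for the patent's level-dependent contribution,
    which is not specified in this excerpt.
    """
    root = ET.fromstring(xml_text)
    edges = {}
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        for child in node:
            key = (node.tag, child.tag)
            edges[key] = edges.get(key, 0.0) + 1.0 / depth
            stack.append((child, depth + 1))
    return edges


def matrix_cosine(p1, p2):
    """Cosine of two sparse matrices, each treated as a flattened vector."""
    dot = sum(w * p2.get(key, 0.0) for key, w in p1.items())
    norm1 = math.sqrt(sum(w * w for w in p1.values()))
    norm2 = math.sqrt(sum(w * w for w in p2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # a document with no edges is treated as dissimilar
    return dot / (norm1 * norm2)


# Illustrative documents (not taken from the patent).
doc_a = "<book><title/><author><name/></author></book>"
doc_b = "<book><title/><author><email/></author></book>"
similarity = matrix_cosine(weighted_edges(doc_a), weighted_edges(doc_b))
```

In this sketch the two documents differ only in one second-level node, so the differing edge carries half the weight of the shared first-level edges and the similarity stays close to 1. Storing only the non-zero entries keeps the comparison linear in the number of edges; a dense n-by-n matrix would match the O(n^2) bound stated above.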

Description

XML Document Structure and Semantic Similarity Calculation Method Based on Extended Adjacency Matrix

【Technical field】
The invention belongs to the technical field of data mining, and in particular relates to a reasonable and effective method for calculating the similarity of XML documents.

【Background art】
As a markup language, XML has become a standard for data representation and data exchange on the Internet, especially in electronic commerce. As the volume of network data continues to expand, XML data, one of the network data standards, is also growing rapidly. How to find the data we need in this massive body of XML data, and even to uncover hidden information that was previously unknown, has become an important research direction in data mining. Within this direction, quantifying the similarity of two XML documents is a key problem. XML can describe not only structured data but also semi-structured data. At present, m...

Application Information

IPC(8): G06F17/30
Inventor 卫金茂, 张学良, 袁晓洁, 刘伟, 杨汀
Owner NANKAI UNIV