XML (Extensible Markup Language) document structure and semantic similarity calculation method based on extended adjacency matrix
An adjacency-matrix and document-structure technique, applied in the field of data mining, that addresses problems such as ignoring the differing contributions of nodes at different levels and high time complexity.
Inactive Publication Date: 2010-08-11
NANKAI UNIV
1 Cites, 8 Cited by
AI Technical Summary
Problems solved by technology
The advantage of this method is that it clearly shows how many nodes differ between documents. However, it does not account for the different contributions that nodes at different levels make to a document, and its time complexity, O(n^3), is too high.
Abstract
The invention discloses a method for calculating XML (Extensible Markup Language) document structure and semantic similarity based on an extended adjacency matrix, belonging to the technical field of data mining. The method comprises the following steps: encoding the XML document tree; for the two encoded documents, first generating a schema document node list and a data source document node list, then generating a schema extended adjacency matrix and a data source extended adjacency matrix (P1, P2); and calculating the similarity of the XML documents as cos(P1, P2). The method fully considers the different contributions that nodes at different levels make to a document, and when the number of XML document nodes is n its worst-case time complexity is O(n^2), which is better than that of an edit distance algorithm.
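The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not define the tree encoding, the node lists, or the entries of the extended adjacency matrix, so everything below is an assumption made for illustration. In particular, the sketch uses one shared tag list for both documents, and the rule that a parent-child edge at depth d adds decay**d to its matrix cell (so that edges nearer the root count more) is hypothetical. The function names weighted_edges, to_matrix, cosine and xml_similarity are likewise invented here.

```python
import math
import xml.etree.ElementTree as ET


def weighted_edges(xml_text, decay=0.5):
    """Map (parent_tag, child_tag) -> accumulated level weight.

    Assumption: an edge at depth d contributes decay**d, so edges near
    the root weigh more. The patent's actual weights are not given.
    """
    edges = {}
    stack = [(ET.fromstring(xml_text), 0)]
    while stack:
        node, depth = stack.pop()
        for child in node:
            key = (node.tag, child.tag)
            edges[key] = edges.get(key, 0.0) + decay ** depth
            stack.append((child, depth + 1))
    return edges


def to_matrix(edges, tags):
    """Build a len(tags) x len(tags) matrix of edge weights (assumed layout)."""
    index = {tag: i for i, tag in enumerate(tags)}
    matrix = [[0.0] * len(tags) for _ in tags]
    for (parent, child), weight in edges.items():
        matrix[index[parent]][index[child]] = weight
    return matrix


def cosine(p1, p2):
    """Cosine of two equally sized matrices treated as flat vectors."""
    flat1 = [x for row in p1 for x in row]
    flat2 = [x for row in p2 for x in row]
    dot = sum(a * b for a, b in zip(flat1, flat2))
    norm1 = math.sqrt(sum(a * a for a in flat1))
    norm2 = math.sqrt(sum(b * b for b in flat2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # a document without edges has no structure to compare
    return dot / (norm1 * norm2)


def xml_similarity(doc_a, doc_b, decay=0.5):
    """Similarity of two XML strings as cos(P1, P2) over a shared tag list."""
    e1 = weighted_edges(doc_a, decay)
    e2 = weighted_edges(doc_b, decay)
    tags = sorted({tag for key in list(e1) + list(e2) for tag in key})
    return cosine(to_matrix(e1, tags), to_matrix(e2, tags))


if __name__ == "__main__":
    print(xml_similarity("<a><b/><c/></a>", "<a><b/><d/></a>"))
```

With at most n distinct tags, each matrix has n x n cells, so the cosine step in this sketch takes O(n^2) time, in line with the bound stated in the abstract.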
Description
XML Document Structure and Semantic Similarity Calculation Method Based on Extended Adjacency Matrix
【Technical field】
The invention belongs to the technical field of data mining, and in particular relates to a reasonable and effective method for calculating XML document similarity.
【Background technique】
As a markup language, XML has become a standard for data representation and data exchange on the Internet, especially in electronic commerce. As network data continues to expand, XML data, one of the network data standards, is also growing rapidly. How to find the data we need in this mass of XML data, and even to uncover hidden information not previously known, has become an important research direction in data mining. In this research direction, quantifying the similarity of two XML documents is a key problem. XML can describe not only structured data but also semi-structured data. At present, m...
Application Information
IPC(8): G06F17/30
Inventor: 卫金茂, 张学良, 袁晓洁, 刘伟, 杨汀
Owner: NANKAI UNIV