XML (Extensible Markup Language) structural similarity measuring method based on frequency-associated tag sequence
A technique based on structural similarity and tag sequences, applied to measuring the structural similarity of XML documents. It addresses the problems of reduced accuracy and imprecise structural similarity, and yields more accurate similarity measurements.
Embodiment Construction
[0044] For a given set of XML documents C, the present invention calculates the similarity between any two documents according to the flow shown in Figure 2, which includes the following steps:
[0045] 1. Preprocess the document set to obtain the tag sequence database TSDB. The processing flow is shown in Figure 3. During parsing, a path that occurs more than once in the same XML document appears only once in TSDB. In the figure, d_TS denotes the set of tag sequences contained in document d, and d.id denotes the identifier of document d.
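The preprocessing step can be illustrated with a minimal Python sketch. It is not the patent's implementation: it assumes each tag sequence is the tuple of tag names along a root-to-leaf path, and that TSDB can be modeled as a dictionary from d.id to d_TS. The function names `tag_sequences` and `build_tsdb`, and the input layout (document id mapped to an XML string), are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def tag_sequences(xml_text):
    """Return the set of distinct root-to-leaf tag paths of one XML document.

    Using a set means a path repeated within the same document is kept once.
    """
    root = ET.fromstring(xml_text)
    sequences = set()
    # Iterative depth-first traversal; each stack entry carries the path so far.
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        children = list(node)
        if not children:
            # Leaf node reached: the accumulated path is one tag sequence.
            sequences.add(path)
        for child in children:
            stack.append((child, path + (child.tag,)))
    return sequences


def build_tsdb(documents):
    """Build a hypothetical TSDB: a dict mapping d.id to d_TS.

    `documents` is assumed to map a document id to its XML text.
    """
    return {doc_id: tag_sequences(text) for doc_id, text in documents.items()}


if __name__ == "__main__":
    docs = {"d1": "<a><b><c/></b><b><c/></b><d/></a>"}
    for doc_id, seqs in build_tsdb(docs).items():
        print(doc_id, sorted(seqs))
```

In this example the path a/b/c occurs twice in document d1 but is stored once, which mirrors the rule that the same path of the same document appears only once in TSDB.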
[0046] A tag sequence is an ordered list of tags drawn from the tag set. The order of the tags follows a path from the root node to a leaf node in the tag tree corresponding to the XML document. A tag sequence α can be formally expressed as $\alpha = \langle a_1, a_2, \ldots, a_n \rangle$, where each $a_i$ is a tag in the tag set; the number of tags it contains is called the length of the tag sequence, and the tag sequenc...