
Methods and systems for compressing and decompressing extensible markup language (XML) data

A data compression technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problem that existing methods ignore the irregular structure of the XML tree and the presence of micro data blocks, which results in a low compression rate, and thereby improves the compression rate.

Active Publication Date: 2013-05-15
PEKING UNIV +2

AI Technical Summary

Problems solved by technology

[0007] Although the above compression methods compress some specific XML files well, they mainly deal with the regular parts of XML documents, and most of them compress data by node blocks without taking into account the irregular structure of the XML tree corresponding to the document or the presence of micro data blocks (i.e., data blocks of small size). Therefore, when the XML document structure is relatively complex, a large number of micro data blocks may arise, and the existing compression methods often achieve low compression ratios on such documents.



Examples


Embodiment 1

[0023] Figure 1 is a flow chart of the XML data compression method according to the first embodiment of the present invention. Referring to Figure 1, the XML data compression method includes the following steps:

[0024] Step S100, optimizing the XML schema to remove redundant structural information and indirect use between nodes, and storing the optimized schema;

[0025] Step S101, using the optimized schema to extract the structural information part of the XML data that conforms to the schema;

[0026] Step S102, dividing the data part of the XML data into multiple data blocks according to the nodes of the optimized schema; and

[0027] Step S103, compressing the structural information part and the data blocks separately using a general-purpose compression method, and outputting the compression results to a file.
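Steps S101 to S103 can be illustrated with a minimal Python sketch. It is not the patented method: it skips the schema optimization of step S100 (whose principles are truncated in this text) and derives the structure directly from the document instead of from an optimized schema. The function names, the sample document, and the choice of zlib as the general-purpose compressor are all assumptions made for illustration.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; the tag names are illustrative only.
SAMPLE = (
    "<people>"
    "<person><name>Mike</name><age>2</age></person>"
    "<person><name>Adam</name><age>3</age></person>"
    "</people>"
)


def split_structure_and_data(xml_text):
    """Separate an XML document into structure tokens and per-node data blocks.

    Returns (structure, blocks): structure is a list of open/close tokens in
    document order; blocks maps each node path to the text values found there.
    Text that follows a child element (tail text) is ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    structure = []
    blocks = defaultdict(list)

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        structure.append(f"<{elem.tag}")   # element opens
        text = (elem.text or "").strip()
        if text:
            blocks[here].append(text)      # group data by node path
        for child in elem:
            walk(child, here)
        structure.append(">")              # element closes

    walk(root, "")
    return structure, dict(blocks)


def compress_parts(structure, blocks):
    """Compress the structure and each data block independently with zlib."""
    packed_structure = zlib.compress(" ".join(structure).encode("utf-8"))
    packed_blocks = {
        path: zlib.compress("\n".join(values).encode("utf-8"))
        for path, values in blocks.items()
    }
    return packed_structure, packed_blocks


structure, blocks = split_structure_and_data(SAMPLE)
packed_structure, packed_blocks = compress_parts(structure, blocks)
```

Grouping values by node path places similar data (all names, all ages) next to each other, which is the general reason block-wise compression tends to help; how the patent maps schema nodes to blocks is not shown in this excerpt.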

[0028] Specifically, in step S100, the XML schema can be optimized according to the following optimization principles:

[0029] 1. For a node connected to another node by means ...

Embodiment 2

[0059] Figure 4 is a flow chart of the XML data compression method according to the second embodiment of the present invention. Comparing Figure 4 with Figure 1 shows that the second embodiment differs from the first in step S402: after the data part of the XML data is divided into multiple data blocks according to the nodes of the optimized schema, micro data blocks smaller than a given threshold are merged. Merging optimizes the storage of micro data blocks, further increasing the compression ratio.

[0060] Figure 5 is a block diagram of an XML data compression system implementing the method shown in Figure 4. Comparing Figure 5 with Figure 3 shows that the difference is an added merging unit 507, which merges micro data blocks smaller than a given threshold and sends the merged blocks to the compression output unit 506.
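The merging of micro data blocks can be sketched as follows. The text above does not say how merged blocks are laid out or how they are separated again on decompression, so the (name, length) index, the function names, and the byte-size threshold used here are assumptions for illustration only.

```python
def merge_micro_blocks(blocks, threshold):
    """Merge every block smaller than `threshold` bytes into one payload.

    `blocks` maps a block name to its bytes. Returns (large, merged, index):
    large holds the blocks left untouched, merged is the concatenation of the
    small blocks, and index lists (name, length) so they can be split again.
    """
    large = {}
    merged = bytearray()
    index = []
    for name, payload in blocks.items():
        if len(payload) < threshold:
            index.append((name, len(payload)))
            merged.extend(payload)
        else:
            large[name] = payload
    return large, bytes(merged), index


def split_merged(merged, index):
    """Inverse of the merge step: recover the small blocks from the payload."""
    out = {}
    offset = 0
    for name, length in index:
        out[name] = merged[offset:offset + length]
        offset += length
    return out


# Hypothetical blocks: one large, two micro blocks.
example = {"a": b"x" * 100, "b": b"hi", "c": b"yo!"}
large, merged, index = merge_micro_blocks(example, threshold=16)
restored = split_merged(merged, index)
```

The motivation is that a general-purpose compressor adds fixed per-stream overhead (a zlib stream, for instance, carries a header and a checksum), so compressing many tiny blocks one by one can cost more than it saves; compressing one merged payload pays that overhead once.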

[0061] The decompression method and ...

Example 1

[0075] In the first instance, the XML schema is defined as follows:

[0076] to [0089] (XML schema listing not preserved in the source text)

[0090] In this schema, there is a sequence indicator containing a choice indicator and a sequence indicator without child nodes. In the choice indicator, its three child nodes e1, e2 and e3 are all optional nodes.

[0091] An example of XML data conforming to the above schema is as follows:

[0092]

[0093]

[0094] Mike

[0095] 2

[0096] Adam

[0097] 3

[0098]

[0099] For this XML data, the compression process of the present invention is as follows:

[0100] Step 1. Optimize the XML schema and store the optimized schema.

[0101] Figure 6a shows the original schema structure, and Figure 6b shows the optimized schema structure. From Figure 6a...



Abstract

The invention provides a method for compressing extensible markup language (XML) data. The method comprises the following steps: optimizing an XML schema to remove redundant structural information and indirect use between nodes, and storing the optimized schema; using the optimized schema to extract the structural information part of the XML data that conforms to the schema; dividing the data part of the XML data into a plurality of data blocks according to the nodes of the optimized schema; and compressing the structural information part and the data blocks separately using a general-purpose compression method, and outputting the compression results to a file. Correspondingly, the invention provides a decompression method, a compression system and a decompression system for XML data. In the invention, the minimum structural information is obtained by simplifying the XML schema, and the grouped storage strategy for the data is improved, so that the compression rate is improved. Furthermore, the storage of micro data blocks is optimized, so that the compression rate is further improved.

Description

Technical field

[0001] The invention relates to the field of XML data processing, in particular to an XML data compression and decompression method and system.

Background technique

[0002] XML (Extensible Markup Language), as a cross-platform standard data exchange format, is widely used in web services, data exchange and storage, etc. It is a powerful tool for processing structured document information. Since the XML document contains a large number of repeated tags and structural information, there is a large amount of data redundancy in the XML document, so it needs to be compressed in many applications. Commonly used XML compression methods are XMILL, XMLPPM, XWRT and so on.

[0003] The XMILL compression method first separates the XML structure information from the XML document through the syntax parser SAX, then reorganizes the XML document data items into different containers according to different semantics, and finally uses GZip to compress each container separate...


Application Information

Patent Type & Authority Patents(China)
IPC IPC(8): G06F17/30
Inventor 仇睿恒胡薇
Owner PEKING UNIV