XML document compressing method based on file difference

A compression method for XML documents, applied in the database field, that achieves a good compression effect and reduces system overhead.

Inactive Publication Date: 2009-02-11
FUDAN UNIV
0 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

This feature makes XMill very unsuitable for compressing a large num

Embodiment Construction

[0031] About the XDrill compression system framework:

[0032] Figure 2 is the framework diagram of the XDrill compression system. The system consists of two main parts. The first is the SAX parser, which reads the original file and calls the segmentation module to generate the corresponding reference and target files. The second is the compressor module (Compressor), which calls the underlying zdelta compressor to compress the original file.

[0033] An XML segment is the XML fragment produced by the segmentation module.
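
The parser side of this framework can be illustrated with a minimal Python sketch. It assumes that segmentation separates tag structure from text content, as the structure and contents caches described for the algorithm suggest. The class name `SplitHandler`, the function `split_document`, and the token notation (`<name`, `/name`, `#`) are hypothetical and do not come from the patent.

```python
import xml.sax


class SplitHandler(xml.sax.ContentHandler):
    """Collects tag events and text values into two separate streams."""

    def __init__(self):
        super().__init__()
        self.structure = []  # tag events plus '#' placeholders for text
        self.contents = []   # text values, in document order

    def startElement(self, name, attrs):
        # Attributes are ignored in this sketch.
        self.structure.append("<" + name)

    def endElement(self, name):
        self.structure.append("/" + name)

    def characters(self, data):
        # A SAX parser may deliver one text node in several chunks;
        # this sketch treats every non-blank chunk as its own value.
        text = data.strip()
        if text:
            self.structure.append("#")
            self.contents.append(text)


def split_document(xml_bytes):
    """Parse XML bytes and return (structure, contents) lists."""
    handler = SplitHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.structure, handler.contents


if __name__ == "__main__":
    structure, contents = split_document(b"<a><b>x</b><b>y</b></a>")
    print(structure)
    print(contents)
```

Keeping the two streams apart groups repetitive tag sequences together, which is where a difference-based compressor would be expected to find most of its matches.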

[0034] About XDrill compression algorithm and process:

[0035] The XDrill compression algorithm is shown in Table 1. The XDrill compression system maintains six system caches. Of these, ref1_structure and ref2_structure store the structural fragment information of the two reference files, and ref1_contents and ref2_contents store the content...

Abstract

The invention belongs to the technical field of databases and provides a novel XML file compression algorithm comprising the following steps: a. dividing an XML file into 64k document fragments; b. calculating the differences among the document fragments; and c. compressing those differences. Decompression performs the same steps in reverse order. The result is an efficient XML file compression algorithm based on file difference calculation. By dividing the XML document tree, XDrill mines the redundant information within a document and among documents, thereby achieving a good compression effect. Compared with traditional XML compression algorithms, XDrill has lower document-processing overhead and is more flexible to use.
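
Steps a to c can be sketched in Python under stated assumptions. The description names zdelta as the underlying compressor; this sketch does not use zdelta but substitutes the standard-library `zlib` with a preset dictionary (`zdict`), which likewise encodes a target against a reference. The choice of the previous fragment as reference, the reading of "64k" as 64 KiB, and all function names are assumptions made for illustration.

```python
import zlib

FRAGMENT_SIZE = 64 * 1024  # "64k" from the abstract, read here as bytes
WINDOW = 32 * 1024         # deflate can only reference the last 32 KiB


def split_fragments(data, size=FRAGMENT_SIZE):
    """Step a: cut the document into fixed-size fragments."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_against(reference, target):
    """Steps b and c: encode target relative to reference."""
    if not reference:
        return zlib.compress(target)
    comp = zlib.compressobj(zdict=reference[-WINDOW:])
    return comp.compress(target) + comp.flush()


def decompress_against(reference, blob):
    """Inverse of compress_against; needs the same reference bytes."""
    if not reference:
        return zlib.decompress(blob)
    decomp = zlib.decompressobj(zdict=reference[-WINDOW:])
    return decomp.decompress(blob) + decomp.flush()


def compress_document(data):
    """Compress each fragment against the fragment before it."""
    blobs, reference = [], b""
    for fragment in split_fragments(data):
        blobs.append(compress_against(reference, fragment))
        reference = fragment
    return blobs


def decompress_document(blobs):
    """Rebuild the document, recovering references fragment by fragment."""
    parts, reference = [], b""
    for blob in blobs:
        fragment = decompress_against(reference, blob)
        parts.append(fragment)
        reference = fragment
    return b"".join(parts)


if __name__ == "__main__":
    doc = b"<r>" + b"<item><id>1</id></item>" * 6000 + b"</r>"
    blobs = compress_document(doc)
    assert decompress_document(blobs) == doc
    print(len(doc), sum(len(b) for b in blobs))
```

Because each fragment is decoded against the one before it, decompression has to run in document order. A real difference compressor such as zdelta is not limited to the 32 KiB window that applies to this stand-in.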

Description

Technical field

[0001] The invention belongs to the technical field of databases and relates in particular to a novel, efficient XML document compression algorithm, the XDrill compression algorithm. XDrill segments an XML document to mine the redundant information within the document and between documents, and achieves a good compression effect.

Background art

[0002] As a self-describing markup language, XML is widely used in many fields, such as electronic document exchange and hospital electronic medical records, and is a common way to describe semi-structured data. To support this self-describing feature, XML documents contain a large number of tags that mark the structure of the document content. Storing structure together with content facilitates document querying and machine interaction, but it also introduces a great deal of redundant information. In some resource-constrained systems, the redundancy probl...

Application Information

IPC(8): G06F17/30
Inventors: 周傲英, 耿志华, 王晓玲
Owner: FUDAN UNIV