Query workload estimation-based extensible markup language (XML) fragmentation method

A workload-estimation and XML-tree technique applied to efficient XML sharding. It addresses problems such as large data overhead, achieves good load balancing, improves the speedup and scaleup metrics, and improves load balancing at query time.

Status: Inactive; publication date: 2012-01-18
BEIHANG UNIV
Cites 4 documents; cited by 7.

AI Technical Summary

Problems solved by technology

If the cluster scales, it will be expensive to ...



Embodiment Construction

[0029] Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0030] First, the principle of the method of the present invention will be described.

[0031] Research shows that, in XML fragmentation methods that consider query load balancing, the estimated query workload strongly affects the fragmentation result and, in turn, the performance of the entire parallel system. To estimate the query workload from the XML structure alone, the relationship between XML structure and XML queries must be examined. XPath is the language most commonly used to select and locate nodes in XML queries. A typical XML query engine splits a complex XPath expression, such as a/b//c[attr="XX"], into multiple sub-query steps, such as "a/b", "b//c", "c[attr]" and "attr="XX"". It then evaluates each sub-query step and finally combines these results as the query result of the orig...
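The decomposition described in [0031] can be sketched as follows. This is a minimal illustration, assuming a small XPath subset: name tests joined by / or //, with at most one [attr] or [attr="value"] predicate per step, written without the @ prefix to match the notation above. The regular expression and the split_steps function are hypothetical and are not the query engine the patent refers to.

```python
import re

# One location step: an optional separator ("/" or "//"), a name test, and an
# optional predicate of the form [name] or [name="value"].
# Assumption: this covers only a tiny XPath subset (no wildcards, named axes,
# functions, or nested predicates).
STEP = re.compile(r'(//|/)?([\w.-]+)(?:\[([\w.-]+)(?:="([^"]*)")?\])?')


def split_steps(xpath):
    """Split a simple XPath into pairwise sub-query steps (hypothetical helper)."""
    steps, prev, pos = [], None, 0
    while pos < len(xpath):
        m = STEP.match(xpath, pos)
        if not m:
            raise ValueError(f"unsupported syntax at {pos}: {xpath[pos:]!r}")
        sep, name, attr, value = m.groups()
        if prev is not None:
            # Structural step linking the previous name test to this one.
            steps.append(f"{prev}{sep or '/'}{name}")
        if attr is not None:
            # Predicate existence step, then the value-comparison step.
            steps.append(f"{name}[{attr}]")
            if value is not None:
                steps.append(f'{attr}="{value}"')
        prev, pos = name, m.end()
    return steps


print(split_steps('a/b//c[attr="XX"]'))
```

For the expression in [0031], the function yields the four steps listed there, in the same order. A real engine would also record which results each step joins on; that bookkeeping is omitted here.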


Abstract

The invention discloses a query workload estimation-based extensible markup language (XML) fragmentation method comprising the following steps: (1) encoding each node in the XML tree with Zhang's interval encoding rule; (2) generating the related XPath query steps for each node and adding them to an XPath queue; (3) recursively estimating the query workload of each node, starting from the root of the XML document tree, in depth-first traversal order; (4) dividing the XML document tree into subtrees, each with a query workload of W0, according to the workload estimates; and (5) sorting the resulting XML fragments by estimated query workload and distributing them to the processing nodes in a "double-square" manner. The method estimates query workload using only the XML document structure, without knowledge of user queries, and uses the workload estimates as the metric for XML storage distribution, thereby achieving better query load balance and scalability.
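Steps (1) and (3) to (5) can be sketched end to end as below. This is a hedged illustration under stated assumptions, not the patented method. The cost model in local_cost (one structural step per node plus one predicate step per attribute) is a hypothetical stand-in for the per-node XPath steps of step (2). W0 is treated as an upper bound on fragment workload. Because the "double-square" distribution is not described in the available text, step (5) substitutes a different, plainly named technique: greedy least-loaded (LPT) assignment. All names (Node, encode, estimate, partition, assign) are illustrative.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    start: int = 0  # interval code: counter value when the node is entered
    end: int = 0    # interval code: counter value when the node is left
    level: int = 0  # depth, root = 1
    work: int = 0   # estimated workload of the whole subtree


def encode(node, counter=None, level=1):
    """Step (1): assign (start, end, level) interval codes in document order."""
    if counter is None:
        counter = [0]
    counter[0] += 1
    node.start, node.level = counter[0], level
    for child in node.children:
        encode(child, counter, level + 1)
    counter[0] += 1
    node.end = counter[0]


def is_ancestor(a, d):
    """Interval codes turn the ancestor/descendant test into two comparisons."""
    return a.start < d.start and d.end < a.end


def local_cost(node):
    # HYPOTHETICAL cost model: one structural step per node plus one
    # predicate step per attribute. The patent's formula is not given here.
    return 1 + len(node.attrs)


def estimate(node):
    """Step (3): depth-first, post-order sum of local costs over the subtree."""
    node.work = local_cost(node) + sum(estimate(c) for c in node.children)
    return node.work


def partition(node, w0, fragments, residual):
    """Step (4): cut off maximal subtrees whose workload is at most w0."""
    if node.work <= w0:
        fragments.append(node)
        return
    residual.append(node)  # the node itself stays in a top-level fragment
    for child in node.children:
        partition(child, w0, fragments, residual)


def assign(loads, k):
    """Step (5) SUBSTITUTE: greedy least-loaded (LPT), not 'double-square'."""
    heap = [(0, p) for p in range(k)]
    placement = {p: [] for p in range(k)}
    for name, w in sorted(loads, key=lambda x: -x[1]):  # heaviest first
        total, p = heapq.heappop(heap)                   # least-loaded node
        placement[p].append(name)
        heapq.heappush(heap, (total + w, p))
    return placement


# Usage on a small hypothetical document.
book3 = Node("book", {"id": "3", "lang": "en"},
             [Node("title"), Node("author"), Node("year")])
root = Node("lib", {}, [
    Node("book", {"id": "1"}, [Node("title"), Node("author")]),
    Node("book", {"id": "2"}, [Node("title")]),
    book3,
])
encode(root)
estimate(root)
fragments, residual = [], []
partition(root, 5, fragments, residual)
loads = [(f"{f.tag}@{f.start}", f.work) for f in fragments]
loads.append(("top", sum(local_cost(n) for n in residual)))
placement = assign(loads, 2)
weight = dict(loads)
totals = {p: sum(weight[n] for n in names) for p, names in placement.items()}
print(placement, totals)
```

Because a parent's workload is never smaller than any child's, the nodes left in residual always form a connected top portion of the tree, so they can be shipped as one extra fragment. With this sample tree, W0 = 5 and two processing nodes, the greedy assignment gives each processing node a total estimated load of 7.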

Description

Technical field

[0001] The present invention relates to a high-efficiency XML fragmentation method for distributed storage and parallel querying of massive XML data and, in particular, to a fragmentation method that estimates query workload from the structure of the XML document itself, without prior knowledge of user queries, in order to achieve better query load balance.

Background art

[0002] The eXtensible Markup Language (XML) offers extensibility, self-description and self-compatibility, and has become a standard for data representation, storage and exchange on the Internet. The resulting volume of XML data makes the efficient storage and management of XML a new problem. Parallel XML processing is an effective solution, and data fragmentation is the most critical factor affecting the overall performance of a parallel system.

[0003] Query load balancing is an important factor affecting the efficiency of parallel queries. In the previous researc...


Application Information

IPC(8): H04L29/08; G06F17/30
Inventors: 张静 (Zhang Jing), 郎波 (Lang Bo), 段亚伟 (Duan Yawei), 牛虹婷 (Niu Hongting), 李未 (Li Wei)
Owner: BEIHANG UNIV