A language for semi-structured documents,
XML has emerged as the core of the web services architecture, and plays a crucial role in messaging systems, databases, and
document processing. However, the
processing of
XML documents has a reputation for poor performance. A number of optimizations have been developed to address this problem from different perspectives, but none has been entirely satisfactory. Parallel
XML parsing leverages the growing
prevalence of multicore architectures in all sectors of the computer market and yields significant performance improvements. The design consists of an initial preparsing phase to determine the structure of the XML document (or other data document), followed by a full, parallel parse. The results of the preparsing phase are used to help partition the XML document for
data-parallel processing. The parallel
parsing phase is, for example, a modification of the libxml2 XML parser, which demonstrates that the approach applies to real-world,
production-quality parsers. An empirical study shows that the parallel XML
parsing algorithm can significantly improve XML parsing performance and scales well.
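
The two-phase design described above can be illustrated with a minimal sketch. It is not the libxml2-based implementation. It is a simplified Python stand-in built on the standard library. The names `preparse` and `parallel_parse` are hypothetical. The preparser assumes a well-formed document with no comments, CDATA sections, or `>` characters inside attribute values, and it assumes the root element declares no namespaces that the children rely on. Text placed directly between the root's children is ignored.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Matches start, end, and self-closing tags; skips "<?...?>" and "<!...>".
TAG = re.compile(r"<(/?)[^\s<>/!?][^<>]*?(/?)>")


def preparse(doc):
    """Preparsing phase: return (start, end) offsets of each child of the root.

    Only tag nesting depth is tracked, so this pass is far cheaper than a
    full parse. Simplifying assumptions are listed in the lead-in text.
    """
    spans, depth, start = [], 0, None
    for m in TAG.finditer(doc):
        is_end_tag = m.group(1) == "/"
        is_self_closing = m.group(2) == "/"
        if is_end_tag:
            depth -= 1
            if depth == 1:  # just closed a direct child of the root
                spans.append((start, m.end()))
        elif is_self_closing:
            if depth == 1:  # an empty direct child of the root
                spans.append((m.start(), m.end()))
        else:
            if depth == 1:  # opening a direct child of the root
                start = m.start()
            depth += 1
    return spans


def parallel_parse(doc, workers=4):
    """Parallel phase: fully parse each chunk found by preparse()."""
    chunks = [doc[s:e] for s, e in preparse(doc)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves document order of the chunks.
        return list(pool.map(ET.fromstring, chunks))


doc = "<?xml version='1.0'?><root><a id='1'><b/></a><c>text</c><d/></root>"
print([element.tag for element in parallel_parse(doc)])  # ['a', 'c', 'd']
```

Threads keep the sketch short, but in CPython the global interpreter lock may limit the actual speedup. The point here is the structure: a cheap structural pass produces partition boundaries, and independent full parses then run on each partition.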