Method for analyzing XML file by indexing

A document and index-table technology, applied in special data processing applications, instruments, and electrical digital data processing. It addresses the problems that existing methods are unsuitable for rapid parsing of large-scale XML documents, consume large amounts of computing resources, or lack random access and online modification capabilities for XML documents. Its stated effects are reduced memory use, faster retrieval, and good performance.

Inactive Publication Date: 2010-07-14
NORTHWESTERN POLYTECHNICAL UNIV

AI Technical Summary

Problems solved by technology

Although the DOM method builds the complete structure of an XML document and supports random access, it consumes substantial computing resources and is not suitable for fast parsing of large-scale XML documents. The SAX method consumes fewer resources and can parse large-scale XML documents more efficiently, but it provides neither random access nor online modification of XML documents.
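The trade-off between DOM and SAX can be illustrated with Python's standard library. This is only a sketch of the two parsing styles, not code from the patent; the sample document and the `IsbnCollector` handler are assumptions made for illustration.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document (not taken from the patent).
XML = b"<BookSet><Book><ISBN>111</ISBN></Book><Book><ISBN>222</ISBN></Book></BookSet>"

# DOM style: the whole tree is built in memory, so any node can be
# reached directly (random access) and modified afterwards.
dom = minidom.parseString(XML)
books = dom.getElementsByTagName("Book")
dom_second_isbn = books[1].getElementsByTagName("ISBN")[0].firstChild.data


# SAX style: events stream past once; the handler keeps only what it
# chooses to keep, and earlier parts of the document cannot be revisited.
class IsbnCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.isbns = []
        self._in_isbn = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "ISBN":
            self._in_isbn = True
            self._buf = []

    def characters(self, content):
        if self._in_isbn:
            self._buf.append(content)

    def endElement(self, name):
        if name == "ISBN":
            self.isbns.append("".join(self._buf))
            self._in_isbn = False


collector = IsbnCollector()
xml.sax.parseString(XML, collector)
sax_isbns = collector.isbns
```

The DOM variant can jump straight to the second book because the full tree is resident in memory; the SAX variant must read every event in order to reach the same value.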

Examples

Embodiment Construction

[0034] The invention is further described below with reference to an embodiment and the accompanying drawings:

[0035] Taking the BookSet.xml document as an example, the implementation process of the IXP parsing method is explained. The DTD format of the BookSet.xml document conforms to the definition in the following table:

[0036]

[0037] IXP scans the XML data and splits the entire document into many element subtrees, each rooted at a specified element. In this DTD, the subtree element is "Book"; the positions of all start tags (<Book>) and closing tags (</Book>) are recorded to form the element subtree index table. If the subtree containing the target element can be located quickly, retrieval is greatly accelerated. An index can be created on any tag, although elements with unique values are recommended. In this example, ISBN is used as the key tag, and the text between "<ISBN>" and "</ISBN>" is recorded as the index value. After the ...
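The two index tables described in this paragraph can be sketched in Python as follows. This is an illustrative approximation under stated assumptions, not the patent's IXP implementation: the function names `build_indexes` and `parse_subtree`, the `Title` element, and the sample data are hypothetical, and the sketch assumes that Book subtrees are not nested and that the tags do not appear inside comments or CDATA sections.

```python
import re
import xml.etree.ElementTree as ET


def build_indexes(data, subtree_tag="Book", key_tag="ISBN"):
    """Scan raw XML bytes once and return (subtree_index, key_index).

    subtree_index: list of (start, end) byte offsets, one per subtree.
    key_index: dict mapping the key element's text to a position in
    subtree_index.
    """
    sub = re.escape(subtree_tag.encode())
    key = re.escape(key_tag.encode())
    # Matches <Book ...> ... </Book>, but not <BookSet>.
    subtree_pat = re.compile(
        rb"<" + sub + rb"(?:\s[^>]*)?>.*?</" + sub + rb"\s*>", re.S
    )
    key_pat = re.compile(
        rb"<" + key + rb"(?:\s[^>]*)?>(.*?)</" + key + rb"\s*>", re.S
    )
    subtree_index, key_index = [], {}
    for m in subtree_pat.finditer(data):
        subtree_index.append((m.start(), m.end()))
        # Look for the key element only inside this subtree's byte range.
        k = key_pat.search(data, m.start(), m.end())
        if k:
            key_index[k.group(1).strip().decode("utf-8")] = len(subtree_index) - 1
    return subtree_index, key_index


def parse_subtree(data, subtree_index, key_index, key):
    """Parse only the subtree whose key element text equals `key`."""
    start, end = subtree_index[key_index[key]]
    return ET.fromstring(data[start:end])


# Hypothetical sample data; the Title element is an assumption.
SAMPLE = (
    b"<BookSet>"
    b"<Book><ISBN>111</ISBN><Title>Alpha</Title></Book>"
    b"<Book><ISBN>222</ISBN><Title>Beta</Title></Book>"
    b"</BookSet>"
)

subtree_index, key_index = build_indexes(SAMPLE)
book = parse_subtree(SAMPLE, subtree_index, key_index, "222")
title = book.findtext("Title")
```

Once the indexes exist, a lookup by ISBN parses only the bytes of one Book subtree instead of the whole document, which is the source of the retrieval speed-up the paragraph describes.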

Abstract

The invention relates to a method for parsing an XML file by indexing. The method comprises the following steps: traversing the DTD file that describes the structure of the XML file and extracting the subtree tag names under the root node; creating a hash table, traversing the XML file to be parsed according to the extracted subtree tag names, locating and recording the relative start position of every subtree tag in the XML file, constructing a new entry from these data items, and adding each entry to the hash table to form a subtree index table; and creating a key element index table, then parsing with either the non-validating or the validating IXP parsing model. For large XML files, the IXP method is stated to parse far faster than the DOM and SAX methods. Because it exposes a general interface, the method can be applied widely to parsing various XML files and provides a new way of parsing XML text.
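The first step of the abstract, extracting the subtree tag names under the root node from the DTD, might look like the following Python sketch. The DTD text is hypothetical (the patent's own DTD table is not reproduced on this page), the function name `subtree_tag_names` is an assumption, and the sketch assumes that the first `<!ELEMENT ...>` declaration is the root.

```python
import re

# Hypothetical DTD for a BookSet document; not the patent's actual table.
DTD = """
<!ELEMENT BookSet (Book*)>
<!ELEMENT Book (ISBN, Title)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Title (#PCDATA)>
"""

# Content-model keywords that are not element names.
_KEYWORDS = {"EMPTY", "ANY"}


def subtree_tag_names(dtd_text, root=None):
    """Return the child element names declared in the root's content model.

    If `root` is None, the first declared element is assumed to be the root.
    """
    decls = re.findall(r"<!ELEMENT\s+([\w.:-]+)\s+([^>]*)>", dtd_text)
    if not decls:
        return []
    if root is None:
        root = decls[0][0]
    model = dict(decls).get(root, "")
    # Collect names, skipping tokens such as #PCDATA that follow a '#'.
    names = re.findall(r"(?<!#)\b[A-Za-z_][\w.:-]*", model)
    names = [n for n in names if n not in _KEYWORDS]
    # Remove duplicates while keeping declaration order.
    return list(dict.fromkeys(names))


root_children = subtree_tag_names(DTD)
book_children = subtree_tag_names(DTD, root="Book")
```

For this sample DTD the root's only child is Book, which matches the subtree element used in the embodiment.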

Description

Technical field

[0001] The invention relates to a method for parsing an XML document through an index, and belongs to the field of XML information processing.

Background art

[0002] XML (eXtensible Markup Language), proposed by the W3C, has become a common format for organizing and describing information in Web 2.0 applications and in information processing systems generally, owing to its flexibility and self-describing nature, and its use continues to grow. Widely used XML parsers currently include IBM's XML4J, Microsoft's MSXML, Oracle's XML Parser, Sun's JavaTM, JDOM, etc. The parsing methods adopted by these parsers fall into two categories: DOM and SAX.

[0003] DOM is a tree-based parsing method proposed by the W3C. When DOM parses an XML file, it treats the elements, attributes, comments, and processing instructions in the document as nodes of a tree structure, and organizes the content of the XML doc...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 杨刚, 周兴社, 张海辉, 詹涛
Owner: NORTHWESTERN POLYTECHNICAL UNIV