JSAX (joint simple API (application program interface) for XML (extensible markup language)) parser and parsing method based on syntactic analysis of backtracking automaton

A parser technology based on a backtracking automaton, applied in the Web field, addresses the low efficiency of existing parsers: it parses efficiently and simplifies parser design and implementation.

Active Publication Date: 2012-10-03
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

[0008] The present invention mainly addresses the inefficiency and implementation difficulty of the syntax analyzer in XML parsers. By improving the backtracking automaton, it provides a new JSAX parser and parsing method based on backtracking-automaton syntax analysis that is efficient, easy to implement, and able to recognize tag-matched string languages with nested structure.

Examples

Embodiment 1

[0084] The JSAX parser of the present invention, based on backtracking-automaton syntax analysis (see Figure 2), comprises a lexical analyzer, a syntax analyzer, and an event handler. The lexical analyzer reads the content of the XML document and passes the tokens it reads to the syntax analyzer. The syntax analyzer identifies the language structures in the input token stream according to the XML specification and passes the corresponding event information to the event handler. The event handler accepts all events reported by the parser and processes the data found, which completes the parsing of the XML document. The syntax analyzer is constructed from an automaton: the backtracking automaton is a five-tuple M = (S, ∑, δ, q₀, F), together with a state stack that saves part of the run history. The syntax analyzer of the present inventio...

Embodiment 2

[0143] The JSAX parser and parsing method based on backtracking-automaton syntax analysis are the same as in Embodiment 1. This embodiment describes the invention in detail from the perspective of the parser's composition.

[0144] The JSAX parser of the present invention, based on backtracking-automaton syntax analysis, mainly comprises a lexical analyzer, a syntax analyzer, and an event handler.

[0145] Design and implementation of the lexical analyzer of the JSAX parser based on backtracking-automaton syntax analysis:

[0146] Because a finite automaton (FA) is easy to construct and analyzes input efficiently, FAs are widely used in the design of lexical analyzers. The JSAX parser of the present invention is an XML document parser based on the SAX interface and implemented in Java, and it likewise performs lexical analysis by constructing an FA.
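As a rough illustration of FA-based lexical analysis, the sketch below tokenizes a tiny XML subset (plain start tags, end tags, and text) with an explicit state variable. All class, enum, and method names are hypothetical, and the state set is an assumption made for illustration; the patent's actual lexer, which this page does not reproduce, would also need to handle attributes, comments, entities, and the rest of the XML grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a four-state finite automaton over characters.
final class TinyXmlLexerSketch {
    enum Kind { START_TAG, END_TAG, TEXT }

    static final class Token {
        final Kind kind;
        final String value;
        Token(Kind kind, String value) { this.kind = kind; this.value = value; }
        @Override public String toString() { return kind + "(" + value + ")"; }
    }

    // FA states: outside a tag, just after '<', inside a start-tag name, inside an end-tag name.
    private enum State { TEXT, TAG_OPEN, START_NAME, END_NAME }

    static List<Token> tokenize(String input) {
        List<Token> out = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        State state = State.TEXT;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
                case TEXT:
                    if (c == '<') {
                        // Flush any pending character data before entering a tag.
                        if (buf.length() > 0) {
                            out.add(new Token(Kind.TEXT, buf.toString()));
                            buf.setLength(0);
                        }
                        state = State.TAG_OPEN;
                    } else {
                        buf.append(c);
                    }
                    break;
                case TAG_OPEN:
                    if (c == '/') {
                        state = State.END_NAME;
                    } else if (Character.isLetter(c)) {
                        buf.append(c);
                        state = State.START_NAME;
                    } else {
                        throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
                    }
                    break;
                case START_NAME:
                case END_NAME:
                    if (c == '>') {
                        if (buf.length() == 0) {
                            throw new IllegalArgumentException("empty tag name at " + i);
                        }
                        Kind kind = (state == State.START_NAME) ? Kind.START_TAG : Kind.END_TAG;
                        out.add(new Token(kind, buf.toString()));
                        buf.setLength(0);
                        state = State.TEXT;
                    } else if (Character.isLetterOrDigit(c)) {
                        buf.append(c);
                    } else {
                        throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
                    }
                    break;
            }
        }
        if (state != State.TEXT) {
            throw new IllegalArgumentException("unterminated tag");
        }
        if (buf.length() > 0) {
            out.add(new Token(Kind.TEXT, buf.toString()));
        }
        return out;
    }
}
```

Calling `TinyXmlLexerSketch.tokenize("<a>hi</a>")` yields a start-tag token, a text token, and an end-tag token in that order, which is the kind of token stream a syntax analyzer would consume.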

[0147] As shown in Figure 4, the lexical analyzer is responsible for reading the content of the XML document. The ...

Embodiment 3

[0185] The composition and grammar rules of the JSAX parser based on backtracking-automaton syntax analysis are the same as in Embodiments 1-2, as is the JSAX parsing method.

[0186] The specific improvements to the backtracking automaton are described in detail below with reference to the accompanying drawings.

[0187] The syntax analyzer of the present invention is based on a backtracking automaton, defined as follows: a deterministic backtracking automaton (DTA) is a five-tuple M = (S, ∑, δ, q₀, F), where

[0188] M represents the constructed backtracking automaton;

[0189] S = {S₀, S₁, ..., Sₙ} is a non-empty set of states;

[0190] ∑ is the input character set;

[0191] q₀ ∈ S is the initial state;

[0192] F ⊆ S is a non-empty set of terminal states;

[0193] δ is a mapping S × ∑ → S ∪ {trace}.
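The five-tuple can be made concrete with a small table-driven sketch. It assumes that the value `trace` means "backtrack by popping the state stack", which this excerpt does not spell out, and it borrows the push-on-start-tag rule from the Abstract. The three-symbol alphabet, the two-state table, and every identifier are hypothetical; the toy automaton only recognizes balanced open/close sequences and is not the patent's transition table.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of M = (S, Σ, δ, q0, F) with a state stack.
final class BacktrackingAutomatonSketch {
    /** Sentinel for the extra "trace" value in the codomain of δ. */
    static final int TRACE = -1;

    /** Toy input alphabet Σ (an assumption for illustration). */
    enum Sym { OPEN, CLOSE, TEXT }

    private final Map<Integer, Map<Sym, Integer>> delta; // δ : S × Σ → S ∪ {trace}
    private final int q0;                                 // initial state
    private final Set<Integer> finals;                    // terminal states F

    BacktrackingAutomatonSketch(Map<Integer, Map<Sym, Integer>> delta, int q0, Set<Integer> finals) {
        this.delta = delta;
        this.q0 = q0;
        this.finals = finals;
    }

    /** State 0 is the top level; state 1 is "inside an element". */
    static BacktrackingAutomatonSketch toy() {
        Map<Integer, Map<Sym, Integer>> delta = Map.of(
            0, Map.of(Sym.OPEN, 1),
            1, Map.of(Sym.OPEN, 1, Sym.TEXT, 1, Sym.CLOSE, TRACE));
        return new BacktrackingAutomatonSketch(delta, 0, Set.of(0));
    }

    boolean accepts(List<Sym> input) {
        Deque<Integer> stack = new ArrayDeque<>();
        int state = q0;
        for (Sym a : input) {
            Integer next = delta.getOrDefault(state, Map.of()).get(a);
            if (next == null) {
                return false;                 // δ undefined here: reject
            }
            if (next == TRACE) {
                if (stack.isEmpty()) {
                    return false;             // nothing to backtrack to
                }
                state = stack.pop();          // popped state becomes the next state
            } else {
                if (a == Sym.OPEN) {
                    stack.push(state);        // remember where to return after the matching CLOSE
                }
                state = next;
            }
        }
        return stack.isEmpty() && finals.contains(state);
    }
}
```

With this table, the input OPEN OPEN TEXT CLOSE CLOSE is accepted, while OPEN CLOSE CLOSE is rejected because δ is undefined for CLOSE in state 0. The stack is what lets a finite transition table track nesting depth.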

[0194] The backtracking automaton is composed of an input t...

Abstract

The invention relates to a JSAX parser (an XML document parser implemented in Java on the SAX interface, the Simple API for XML) and a parsing method based on backtracking-automaton syntax analysis. The transition rule δ of the backtracking automaton is redefined, and the improved automaton is applied to the syntax analyzer, which simplifies the analyzer's design and implementation and improves the efficiency of the XML parser. During syntax analysis, the backtracking automaton takes the token stream produced by the lexical analyzer as input. When the token read is a start tag, the automaton pushes its current state onto the stack; when the token read is an end tag, it pops a state off the top of the stack and uses it as its next state; for all other tokens it performs no stack operation. XML document information that conforms to the syntax specification is returned to the user through standard callback functions. The JSAX parser and parsing method address the complex structure and low performance of the syntax analyzers in XML document parsers, are easy to implement and efficient, and can be applied to parsing XML documents.
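The stack rule and the callback delivery described in the Abstract can be sketched as follows. This is a simplified illustration, not the patented implementation: the saved "state" is reduced to the name of the open element, attributes and namespaces are ignored, and the `Token` and `Kind` types and the `parse` method are hypothetical. The callbacks use the standard `org.xml.sax.ContentHandler` interface from the JDK.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch: push on start tag, pop on end tag, no stack operation otherwise.
final class StackRuleSketch {
    enum Kind { START_TAG, END_TAG, TEXT }

    static final class Token {
        final Kind kind;
        final String value;
        Token(Kind kind, String value) { this.kind = kind; this.value = value; }
    }

    /** Returns true if the tags nest correctly; events are reported through SAX callbacks. */
    static boolean parse(List<Token> tokens, ContentHandler handler) throws SAXException {
        Deque<String> open = new ArrayDeque<>();
        handler.startDocument();
        for (Token t : tokens) {
            switch (t.kind) {
                case START_TAG:
                    open.push(t.value);                       // start tag: push
                    handler.startElement("", t.value, t.value, new AttributesImpl());
                    break;
                case END_TAG:
                    // End tag: pop, and require that it closes the innermost open element.
                    if (open.isEmpty() || !open.pop().equals(t.value)) {
                        return false;
                    }
                    handler.endElement("", t.value, t.value);
                    break;
                case TEXT:
                    char[] ch = t.value.toCharArray();        // other tokens: no stack operation
                    handler.characters(ch, 0, ch.length);
                    break;
            }
        }
        if (!open.isEmpty()) {
            return false;                                     // some element was never closed
        }
        handler.endDocument();
        return true;
    }
}
```

A caller would typically pass a subclass of `org.xml.sax.helpers.DefaultHandler` that overrides only the callbacks it cares about, which is the usual way SAX clients receive document content.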

Description

Technical field

[0001] The invention belongs to the field of Web technologies and mainly relates to parsing Extensible Markup Language (XML) documents, in particular to XML document parsing based on the Simple API for XML (SAX). Specifically, it is a JSAX parser and parsing method based on backtracking-automaton syntax analysis, which can be applied to parsing XML documents.

Background

[0002] In recent years, XML has been widely used in data transmission and exchange, data integration, document storage, and other fields in the Web environment because it is simple to apply and flexible to use. Both the SOAP protocol and WSDL are based on XML. XML is also applied in fields such as mathematics, chemistry, and physics; for example, the Chemical Markup Language (CML) describes molecular information in chemistry. XML document parser pla...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 段振华, 张柯柯, 王小兵, 田聪
Owner: XIDIAN UNIV