Digitization method and system for XML document

A document digitization technology, applied in electronic digital data processing, natural language data processing, instruments, etc. It addresses two problems: existing methods emphasize XML structural features while ignoring semantic features, and they cannot accurately reflect the information in an XML document. It achieves high similarity detection sensitivity, efficient and fast processing, and convenient document classification and application processing.

Pending Publication Date: 2021-02-12
NANJING INST OF TECH
Cites: 1; Cited by: 0

AI Technical Summary

Problems solved by technology

However, current XML document digitization methods have the following problems: first, the digitization result is coarse and cannot accurately reflect the information in the XML document; second, the representation method is complex and the conversion efficiency is low; in addition, they emphasize structural features while ignoring semantic features.

Method used

The method comprises three steps: S1, extract the trunk structure tree; S2, fill pseudo nodes to unify the tree structure; S3, extract full paths and generate tuple strings.

Embodiment Construction

[0039] The present invention is now described in further detail with reference to the accompanying drawings.

[0040] It should be noted that terms such as "upper", "lower", "left", "right", "front", and "rear" used in the present invention are only for clarity of description and do not limit the practicable scope of the present invention; changes or adjustments to their relative relationships, without substantive changes to the technical content, shall also be regarded as falling within the practicable scope of the present invention.

[0041] With reference to Figure 1, the present invention provides a digitization method for XML documents that is suitable for comparing the similarity between XML documents. The digitization method comprises the following steps:

[0042] S1, extract the trunk structure tree:

[0043] Preprocess the imported XML document to find its trunk structure tree, removing redundant nodes so that each path appears only once in the trunk structure tree.

[0044] ...
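The text does not spell out the S1 preprocessing in detail. The following is a minimal Python sketch, assuming that "redundant nodes" means sibling elements repeating an already-seen root-to-node path, so that merging them leaves each path exactly once. The names `TrunkNode`, `extract_trunk`, and `paths` are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class TrunkNode:
    """One node of the (hypothetical) trunk structure tree."""
    tag: str
    # Children keyed by tag: a repeated sibling tag maps to the same node,
    # so each root-to-node path is stored only once.
    children: dict[str, "TrunkNode"] = field(default_factory=dict)


def _merge(elem: ET.Element, node: TrunkNode) -> None:
    """Fold the children of `elem` into `node`, reusing nodes for repeated tags."""
    for child in elem:
        sub = node.children.get(child.tag)
        if sub is None:  # first time this path is seen
            sub = TrunkNode(child.tag)
            node.children[child.tag] = sub
        _merge(child, sub)  # redundant siblings are merged, not duplicated


def extract_trunk(xml_text: str) -> TrunkNode:
    """S1 sketch: parse the document and build a tree in which every path is unique."""
    root = ET.fromstring(xml_text)
    trunk = TrunkNode(root.tag)
    _merge(root, trunk)
    return trunk


def paths(node: TrunkNode, prefix: str = "") -> Iterator[str]:
    """Yield every root-to-node path of the trunk tree in depth-first order."""
    here = f"{prefix}/{node.tag}"
    yield here
    for child in node.children.values():
        yield from paths(child, here)
```

For `<lib><book><title/><author/></book><book><title/><year/></book></lib>`, the two `book` elements collapse into one node and the trunk paths are `/lib`, `/lib/book`, `/lib/book/title`, `/lib/book/author`, `/lib/book/year`. Keying children by tag in an insertion-ordered dict keeps the first-seen sibling order while guaranteeing path uniqueness.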

Abstract

The invention discloses a digitization method for XML (Extensible Markup Language) documents, suitable for similarity comparison between XML documents, comprising the following steps: S1, extracting the trunk structure tree; S2, filling pseudo nodes to unify the tree structure; and S3, extracting full paths and generating tuple strings. By combining the structural and semantic features of XML documents, the method digitizes an XML document in three stages: extracting the trunk structure tree, unifying the structure tree, and converting it into a tuple string. The processing is efficient and fast, the digitized result offers high sensitivity for similarity detection, and the method is suitable for wide application. The method can digitally represent massive numbers of XML documents in a complex network environment, simplifying them and facilitating subsequent document classification and application processing.
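Steps S2 and S3 are only named here, so the following is a hedged Python sketch under stated assumptions, not the patented procedure. It assumes each document is reduced to its distinct full paths, that the element text stands in for a node's semantic feature, that paths a document lacks are filled with a pseudo-node marker `#` so all documents share one unified path list, and that the tuple string is a `;`-joined list of `(path,value)` tuples. The path-agreement ratio used for comparison is also an assumption, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

PSEUDO = "#"  # hypothetical marker for a filled-in pseudo node


def full_paths(xml_text: str) -> dict[str, str]:
    """Map each distinct root-to-node path to its text (first occurrence wins).

    The element text stands in for the node's semantic feature (an assumption).
    """
    result: dict[str, str] = {}

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        result.setdefault(path, (elem.text or "").strip())  # each path kept once
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return result


def unify(docs: list[dict[str, str]]) -> tuple[list[str], list[dict[str, str]]]:
    """S2 sketch: give every document the union of all paths, filling gaps with PSEUDO."""
    order = sorted(set().union(*docs))  # one fixed path order shared by all documents
    filled = [{p: d.get(p, PSEUDO) for p in order} for d in docs]
    return order, filled


def tuple_string(doc: dict[str, str], order: list[str]) -> str:
    """S3 sketch: serialize the unified document as (path,value) tuples in path order."""
    return ";".join(f"({p},{doc[p]})" for p in order)


def similarity(a: dict[str, str], b: dict[str, str], order: list[str]) -> float:
    """Hypothetical measure: fraction of unified paths on which two documents agree."""
    if not order:
        return 1.0
    return sum(a[p] == b[p] for p in order) / len(order)
```

For `<book><title>XML</title><author>Wu</author></book>` and `<book><title>XML</title><year>2021</year></book>`, the unified path list is `/book`, `/book/author`, `/book/title`, `/book/year`; the first document serializes to `(/book,);(/book/author,Wu);(/book/title,XML);(/book/year,#)`, and the two documents agree on 2 of 4 paths (similarity 0.5). Because every document is padded to the same path list, tuple strings from different documents line up position by position, which is what makes the comparison cheap.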

Description

Technical Field

[0001] The present invention relates to the technical field of digital processing of XML documents, and in particular to a digitization method and system for XML documents.

Background

[0002] With the rapid development of the network, a large amount of semi-structured data stored as XML has been generated on the Internet. These data, accumulated across different fields, have great potential and value. As a representative form of semi-structured data, XML documents are used by more and more enterprises and institutions because of their platform independence, convenient data processing, and flexible Web applications. Therefore, given the huge volume of XML data, the digital representation of documents is the basis for data analysis, classification, and other data processing, and its quality directly affects all subsequent operations. For example, patent No. CN108984713A discloses an XML file processing method and device. By splittin...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/154
CPC: G06F40/154
Inventors: Wu Haitao (吴海涛), Guo Lihong (郭丽红), Yang Jie (杨洁)
Owner: NANJING INST OF TECH