
XML file classification method and system

A file classification technology in the fields of instruments, computing, and electrical digital data processing, which improves classification effectiveness.

Status: Inactive. Publication date: 2015-01-14
Applicants: PEKING UNIV and 2 others
Cites 3 documents; cited by 2.

AI Technical Summary

Problems solved by the technology

[0008] As described above, for classifying structured files, methods based solely on file modeling, edit distance, or frequent subitems cannot classify files automatically with good results.




Detailed Description of Embodiments

[0031] The present invention will be described in further detail below with reference to the accompanying drawings and examples.

[0032] To address the problems in the prior art, an embodiment of the present invention provides a method and device for classifying XML files. The method classifies large-scale collections of XML files (usually more than 100,000 files) automatically and improves both classification efficiency and classification effectiveness.

[0033] As shown in Figure 1, the flowchart of the XML file classification method according to an embodiment of the present invention comprises the following steps:

[0034] Step 101: preprocess the training XML files in the training corpus.

[0035] Preprocessing the training XML files mainly includes extracting link information, compressing the file tree, filtering file features, and calculating file feature values.
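The preprocessing steps other than link extraction can be sketched in Python as follows. The text does not define how the file tree is compressed, which features are kept, or how feature values are computed, so this sketch assumes that compression merges sibling elements sharing a tag, that features are root-to-node tag paths, that filtering drops paths found in fewer than `min_df` documents, and that feature values are smoothed IDF weights. The names `compress`, `paths`, `preprocess`, and `min_df` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def compress(elem):
    """Merge sibling elements that share a tag into the first of them
    (an assumed reading of 'file tree compression')."""
    first = {}
    for child in list(elem):
        if child.tag in first:
            first[child.tag].extend(list(child))  # adopt the duplicate's children
            elem.remove(child)
        else:
            first[child.tag] = child
    for child in elem:
        compress(child)
    return elem


def paths(elem, prefix=""):
    """Yield the root-to-node tag path of every node, e.g. '/doc/body'."""
    path = f"{prefix}/{elem.tag}"
    yield path
    for child in elem:
        yield from paths(child, path)


def preprocess(xml_docs, min_df=2):
    """Compress each tree, keep paths seen in >= min_df documents (feature
    filtering), and weight kept paths by smoothed IDF (feature values)."""
    doc_paths = [set(paths(compress(ET.fromstring(d)))) for d in xml_docs]
    df = Counter(p for ps in doc_paths for p in ps)
    vocab = sorted(p for p, n in df.items() if n >= min_df)
    n = len(xml_docs)
    idf = {p: math.log((1 + n) / (1 + df[p])) + 1.0 for p in vocab}
    # After compression each path occurs at most once per document, so the
    # value is presence x IDF rather than a full TF-IDF count.
    vectors = [[idf[p] if p in ps else 0.0 for p in vocab] for ps in doc_paths]
    return vocab, vectors


docs = [
    "<a><b><c/></b><b><d/></b></a>",
    "<a><b><c/></b></a>",
    "<a><e/></a>",
]
vocab, vectors = preprocess(docs)
print(vocab)  # ['/a', '/a/b', '/a/b/c']
```

Merging same-tag siblings makes every root-to-node path unique within a document, which keeps the feature space small; the trade-off is that repetition counts are lost.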

[0036] Specifically, the link information in the training XML file can be extracted (the link infor...
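Paragraph [0036] is truncated in this copy, so the attributes the method actually reads are not known. As an assumption-laden sketch, the following Python treats link information as the targets of plain `href` or XLink `xlink:href` attributes and turns them into a count vector over a fixed list of targets. The functions `extract_links` and `link_vector` and the sample file names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"  # standard XLink namespace


def extract_links(xml_text):
    """Return link targets in document order.

    Assumption: links live in `href` or `xlink:href` attributes.
    """
    root = ET.fromstring(xml_text)
    targets = []
    for elem in root.iter():  # root first, then descendants in document order
        for attr in ("href", XLINK_HREF):
            value = elem.get(attr)
            if value:
                targets.append(value.strip())
    return targets


def link_vector(targets, vocab):
    """Count how often each known target is linked (hypothetical link features)."""
    counts = Counter(targets)
    return [counts[t] for t in vocab]


doc = (
    '<article xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<ref xlink:href="doc42.xml"/><a href="doc7.xml">see</a>'
    '<ref xlink:href="doc42.xml"/></article>'
)
targets = extract_links(doc)
print(targets)  # ['doc42.xml', 'doc7.xml', 'doc42.xml']
print(link_vector(targets, ["doc7.xml", "doc42.xml", "doc99.xml"]))  # [1, 2, 0]
```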


Abstract

The invention provides an XML file classification method and system. In the method, training XML files in a training corpus are preprocessed, where preprocessing comprises link information extraction, file tree compression, file feature filtering, and file feature value calculation; closed frequent subtrees are extracted from the preprocessed training corpus; an SLVM file vector model based on the closed frequent subtrees and an SLVM file vector model based on the link information are built; and the XML files to be tested are classified with an SVM algorithm based on the SLVM file vector models. The method and system classify XML files automatically and improve classification effectiveness.
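The pipeline in the abstract can be sketched end to end in Python. This is a simplification under stated assumptions, not the patented method: closed frequent subtrees are restricted to rooted paths, the SLVM vector model (the acronym is not expanded in this text) is approximated by a binary vector with one dimension per closed frequent path, the link-based model is omitted, and the SVM is a small linear SVM trained with Pegasos-style subgradient descent rather than a library solver. All function names and the toy documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def rooted_paths(elem, prefix=()):
    """Yield every root-to-node tag path as a tuple, e.g. ('doc', 'body')."""
    path = prefix + (elem.tag,)
    yield path
    for child in elem:
        yield from rooted_paths(child, path)


def closed_frequent_paths(doc_paths, min_support):
    """Rooted paths contained in >= min_support documents that have no
    longer extension with the same support (closedness over paths only)."""
    support = Counter(p for paths in doc_paths for p in paths)
    frequent = {p: s for p, s in support.items() if s >= min_support}
    return sorted(
        p for p, s in frequent.items()
        if not any(len(q) > len(p) and q[:len(p)] == p and sq == s
                   for q, sq in frequent.items())
    )


def to_vector(paths, patterns):
    """Binary vector: 1.0 where the document contains the pattern."""
    return [1.0 if p in paths else 0.0 for p in patterns]


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on the regularized hinge loss.
    Labels are -1/+1; a constant 1.0 is appended to each x as a bias term."""
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):  # fixed order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            xb = x + [1.0]
            margin = label * sum(wi * xi for wi, xi in zip(w, xb))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularizer)
            if margin < 1.0:  # hinge loss active: step toward the example
                w = [wi + eta * label * xi for wi, xi in zip(w, xb)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, x + [1.0]))
    return 1 if score > 0 else -1


train = [
    ("<doc><title/><body><sec/></body></doc>", 1),
    ("<doc><title/><body><sec/><sec/></body></doc>", 1),
    ("<doc><items><item/></items><total/></doc>", -1),
    ("<doc><items><item/><item/></items></doc>", -1),
]
doc_paths = [set(rooted_paths(ET.fromstring(xml))) for xml, _ in train]
patterns = closed_frequent_paths(doc_paths, min_support=2)
X = [to_vector(paths, patterns) for paths in doc_paths]
w = train_linear_svm(X, [label for _, label in train])

query = set(rooted_paths(ET.fromstring("<doc><title/><body/></doc>")))
print(predict(w, to_vector(query, patterns)))  # 1
```

A path is kept as closed when no longer frequent path extending it has the same support; since support can only shrink as a path grows, this removes redundant prefixes such as `('doc', 'body')` while keeping `('doc', 'body', 'sec')`. A link-based vector could be concatenated to each row of `X` before training.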

Description

Technical Field

[0001] The invention relates to the technical field of digital publishing, and in particular to a method and system for classifying XML files.

Background Art

[0002] The Internet has become a huge data warehouse of XML-format data that contains a wealth of information. Mining XML documents has therefore become one of the best ways to obtain information from the Internet quickly and effectively.

[0003] XML (Extensible Markup Language) files are semi-structured: they store content in a tree-like nested structure, which is sometimes too complex for classic data mining algorithms.

[0004] The prior art therefore classifies XML files according to their data characteristics in order to reduce the complexity of data mining algorithms. The main related technologies are as follows:

[0005] 1. First model the XML file, and then use the XML file mod...


Application Information

IPC (8th edition): G06F17/30
CPC: G06F16/355
Inventors: 王松林 (Wang Songlin), 杨建武 (Yang Jianwu), 洪毅虹 (Hong Yihong)
Owner: PEKING UNIV