Automatic generation method for wrapper on the basis of DOM (Document Object Model) tree abstraction

A DOM-tree-based technique for automatic wrapper generation, applied in the field of cloud computing, that achieves good accuracy and time performance while reducing the use of storage resources.

Active Publication Date: 2018-04-20
FUZHOU UNIV
Cites: 10 | Cited by: 6
AI Technical Summary

Problems solved by technology

[0002] Traditional wrappers are defined manually, and a different wrapper must be written for each type of page, so maintaining wrappers is costly. Whenever the style of the original page changes, the existing wrappers must be redefined.

Examples

Embodiment Construction

[0035] The technical solution of the present invention will be specifically described below in conjunction with the accompanying drawings.

[0036] The method of the present invention for automatically generating wrappers based on DOM tree abstraction comprises the following steps:

[0037] Step S1, wrapper generation phase:

[0038] Step S11: the user inputs a set of web pages; web page preprocessing removes noise from the source code, and each page is parsed into a DOM tree, yielding a DOM tree set;

[0039] Step S12: the DOM trees are merged. Each DOM tree is traversed, child nodes with the same tag are merged, and every node is annotated with its path feature, so that the DOM tree set is converted into a merged tree set;

[0040] Step S13: an abstraction operation is performed on the merged tree set to obtain an abstract tree, and the abstract tree is stored in the database;

[0041] Step S14: determine the path features of the structured data in the merged tree according to the confi...
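Steps S11 to S13 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Node`, `TreeBuilder`, `merged_tree`, `paths` and `common_paths` are hypothetical, "impurity information" is assumed to mean `script` and `style` elements, and a path feature is assumed to be the slash-joined chain of tag names from the root. The excerpt does not say how the abstraction operation works, so the intersection of path sets used in `common_paths` is only an illustrative stand-in.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
NOISE_TAGS = {"script", "style"}  # assumption: what preprocessing removes


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.path = ""  # path feature, set by merge()


class TreeBuilder(HTMLParser):
    """S11: parse HTML into a simple element tree, skipping noise tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in VOID_TAGS:
            return
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def merge(node, prefix=""):
    """S12: collapse same-tag siblings into one child and label each node."""
    merged = Node(node.tag)
    merged.path = prefix if node.tag == "#root" else f"{prefix}/{node.tag}"
    groups = {}  # tag -> original children carrying that tag
    for child in node.children:
        groups.setdefault(child.tag, []).append(child)
    for tag, same_tag in groups.items():
        combined = Node(tag)
        for original in same_tag:
            combined.children.extend(original.children)
        child = merge(combined, merged.path)
        child.parent = merged
        merged.children.append(child)
    return merged


def merged_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return merge(builder.root)


def paths(node):
    """Collect the path features of every node below the root."""
    found = {node.path} if node.path else set()
    for child in node.children:
        found |= paths(child)
    return found


def common_paths(trees):
    """S13 stand-in (assumption): keep the paths shared by all merged trees."""
    shared = paths(trees[0])
    for tree in trees[1:]:
        shared &= paths(tree)
    return shared


page_a = "<html><body><script>var x=1;</script><ul><li>a</li><li>b</li></ul><p>t</p></body></html>"
page_b = "<html><body><ul><li>c</li></ul><div>ad</div></body></html>"
trees = [merged_tree(page_a), merged_tree(page_b)]
print(sorted(common_paths(trees)))
```

In this sketch the two `li` elements of `page_a` collapse into a single merged node, and the page-specific `p` and `div` branches drop out of the shared path set.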

Abstract

The invention relates to a method for automatically generating wrappers based on DOM (Document Object Model) tree abstraction. The method comprises a wrapper generation stage and a structured data extraction stage. In the wrapper generation stage, the user inputs a set of web pages, from which a DOM tree set is obtained; the DOM trees are then merged by traversing each tree, merging child nodes with the same tag, and annotating every node with its path feature, which converts the DOM tree set into a merged tree set; an abstraction operation is performed on the merged tree set to obtain an abstract tree, which is stored in a database; and, according to a configuration file, the path features of the structured data in the merged tree are determined, processed, and written to a file to generate the wrapper. In the structured data extraction stage, the target web page is parsed into a DOM tree and matched against the abstract tree to determine whether it is a page of the type the wrapper corresponds to; the paths in the configuration file are then read to extract data from the target DOM tree. With this method, wrappers can be generated automatically, and the method exhibits good accuracy and time performance.
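The extraction stage described above can be sketched in Python under strong assumptions. The names `matches` and `extract` and the wrapper layout (`abstract_paths`, `fields`) are hypothetical, a page is represented simply as a mapping from path feature to the text values found at that path, and the matching rule (the page must contain every path of the abstract tree) is an assumption, since the abstract does not state the actual criterion.

```python
def matches(abstract_paths, page):
    """Assumed test: the page contains every path of the abstract tree."""
    return abstract_paths.issubset(page)


def extract(wrapper, page):
    """Return the configured fields, or None if the page type does not match."""
    if not matches(wrapper["abstract_paths"], page):
        return None
    return {field: page.get(path, [])
            for field, path in wrapper["fields"].items()}


# Hypothetical wrapper: abstract-tree paths plus field -> path configuration.
wrapper = {
    "abstract_paths": {"/html/body/h1", "/html/body/div/span"},
    "fields": {"title": "/html/body/h1", "price": "/html/body/div/span"},
}

product_page = {
    "/html/body/h1": ["Title A"],
    "/html/body/div": [],
    "/html/body/div/span": ["9.99"],
}
other_page = {"/html/body/p": ["unrelated text"]}

print(extract(wrapper, product_page))
print(extract(wrapper, other_page))
```

Checking the page type before reading any paths keeps a wrapper from silently returning empty fields for a page it was never generated for.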

Description

Technical field

[0001] The invention belongs to the field of cloud computing, and in particular relates to a method for automatically generating wrappers based on DOM tree abstraction.

Background

[0002] Traditional wrappers are defined manually, and a different wrapper must be written for each type of page, so maintaining wrappers is costly. Whenever the style of the original page changes, the existing wrappers must be redefined. The current mainstream research trend is therefore the automatic generation of wrappers, and this application proposes a feasible technique for generating wrappers automatically based on DOM tree abstraction. The technique consists of two main parts: first, DOM tree abstraction of web pages of the target type; second, feature acquisition and location of the target nodes. With this technique, wrappers can be generated automatically for various types of web pages. Experiments are carried out on 5 websites, a...

Application Information

IPC(8): G06F17/30
CPC: G06F16/986
Inventors: 陈星, 张佳俊, 王一洲
Owner: FUZHOU UNIV