Page data extraction method, device, storage medium and electronic equipment

A page data extraction method, applied in the computer field, that addresses the cumbersome process, error-prone configuration, and high labor cost of existing approaches and achieves simpler operation.

Active Publication Date: 2020-11-27
NEUSOFT CORP
Cites: 5; Cited by: 0
AI Technical Summary

Problems solved by technology

[0004] However, the current non-visual extraction method is relatively cumbersome: the feature value may appear in multiple positions on the same page, which causes interference, so a secondary correction is required to find the position of the required element.
With the current visual extraction method, the user can locate the data to be extracted directly in the HTML page, but defining the extraction action is relatively complicated and the configuration is prone to errors. The operator also needs strong HTML skills, so the labor cost is high and the efficiency is low.
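The interference problem described above can be illustrated with a minimal sketch. It is not the patent's implementation: the helper names (`FeaturePathFinder`, `find_feature_paths`), the sample page, and the idea of representing a feature path as a slash-separated tag path are all assumptions made for illustration. It only shows that a single feature value can resolve to more than one path, which is why a secondary correction would be needed.

```python
from html.parser import HTMLParser


class FeaturePathFinder(HTMLParser):
    """Collects the tag path of every text node equal to a feature value.

    Hypothetical helper: the path format is an assumption, not the patent's.
    """

    def __init__(self, feature_value):
        super().__init__()
        self.feature_value = feature_value
        self.stack = []    # open tags from the root down to the current node
        self.matches = []  # one path string per occurrence of the value

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates unclosed inner tags).
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip() == self.feature_value:
            self.matches.append("/" + "/".join(self.stack))


def find_feature_paths(html, feature_value):
    """Return every tag path at which the feature value occurs."""
    finder = FeaturePathFinder(feature_value)
    finder.feed(html)
    finder.close()
    return finder.matches


# Made-up page in which the same value appears in two places.
SAMPLE_PAGE = """
<html><body>
  <div><span>Neusoft</span></div>
  <table><tr><td>Neusoft</td><td>2020</td></tr></table>
</body></html>
"""

print(find_feature_paths(SAMPLE_PAGE, "Neusoft"))
# ['/html/body/div/span', '/html/body/table/tr/td']
```

Because the value yields two candidate paths, the feature value alone cannot tell which element the user meant.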

Examples

Detailed Description of Embodiments

[0060] Specific embodiments of the present disclosure will be described in detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present disclosure, and are not intended to limit the present disclosure.

[0061] Figure 1 is a flow chart of a page data extraction method according to an exemplary embodiment. As shown in Figure 1, the page data extraction method can be applied to electronic devices and includes the following steps.

[0062] Step S11: Receive the attribute name of the data to be extracted and the extraction mode, both input by the user; the extraction mode includes an exact extraction mode and an associative extraction mode;

[0063] Step S12: Determine element position features according to the extraction mode and the target web page element selected by the user in the HTML page;

[0064] Step S13: Extract the web page data in the HTML page acc...
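Steps S11 to S13 can be sketched roughly as follows. This excerpt does not define what an element position feature looks like or how the two modes differ, so everything here is an assumption: the feature is modeled as a list of (tag, sibling index) steps from the root, the exact mode keeps every index, and the associative mode drops the last index so that same-tag siblings of the selected element also match. The names `Node`, `TreeBuilder`, `position_feature`, `extract`, `run`, and `pick_second_item` are hypothetical, and the user's click is simulated by a function that picks a node.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children, self.text = tag, parent, [], ""


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree; assumes well-formed, fully closed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def position_feature(node, mode):
    """S12 (assumed form): (tag, index among same-tag siblings) per level."""
    if mode not in ("exact", "associative"):
        raise ValueError(f"unknown extraction mode: {mode}")
    steps = []
    while node.parent is not None:
        same_tag = [c for c in node.parent.children if c.tag == node.tag]
        steps.append((node.tag, same_tag.index(node)))
        node = node.parent
    steps.reverse()
    if mode == "associative":
        # Assumption: associative mode matches any index at the last level.
        steps[-1] = (steps[-1][0], None)
    return steps


def extract(root, feature):
    """S13: return the text of every element matching the feature."""
    matched = [root]
    for tag, index in feature:
        next_matched = []
        for parent in matched:
            same_tag = [c for c in parent.children if c.tag == tag]
            if index is None:
                next_matched.extend(same_tag)
            elif index < len(same_tag):
                next_matched.append(same_tag[index])
        matched = next_matched
    return [n.text for n in matched]


def run(html, attribute_name, mode, select):
    """S11 inputs are attribute_name and mode; select stands in for the click."""
    root = parse(html)
    feature = position_feature(select(root), mode)
    return {attribute_name: extract(root, feature)}


PAGE = "<html><body><ul><li>alpha</li><li>beta</li><li>gamma</li></ul></body></html>"


def pick_second_item(root):
    # Stand-in for the user's selection: html -> body -> ul -> second li.
    return root.children[0].children[0].children[0].children[1]


print(run(PAGE, "name", "exact", pick_second_item))
# {'name': ['beta']}
print(run(PAGE, "name", "associative", pick_second_item))
# {'name': ['alpha', 'beta', 'gamma']}
```

Under these assumptions, one selection is enough to extract either a single value or a whole list of similar values, without the user writing any path expression by hand.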

Abstract

The invention relates to a page data extraction method and device, a storage medium, and electronic equipment, which can make it more convenient to extract data from an HTML page. The method comprises the following steps: receiving an attribute name of the data to be extracted and an extraction mode, both input by a user, wherein the extraction mode comprises an exact extraction mode and an associative extraction mode; determining an element position feature according to the extraction mode and a target web page element selected by the user in the HTML page; extracting the web page data in the HTML page according to the element position feature; and generating output data according to the attribute name and the extracted web page data, and then outputting the output data.
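The final step, generating output data from the attribute name and the extracted values, might look like the sketch below. The background section of the description names JSON and XML as typical standard formats, but the exact output layout is not given in this excerpt; the function names `to_json` and `to_xml`, the `result` wrapper element, and the sample values are assumptions. The XML variant also assumes the attribute name is a valid XML element name.

```python
import json
import xml.etree.ElementTree as ET


def to_json(attribute_name, values):
    """Encapsulate the extracted values as a JSON object keyed by attribute name."""
    return json.dumps({attribute_name: values}, ensure_ascii=False)


def to_xml(attribute_name, values, root_tag="result"):
    """Encapsulate the extracted values as one XML element per value.

    root_tag is a hypothetical wrapper name, not taken from the source.
    """
    root = ET.Element(root_tag)
    for value in values:
        ET.SubElement(root, attribute_name).text = value
    return ET.tostring(root, encoding="unicode")


print(to_json("company", ["NEUSOFT CORP"]))
# {"company": ["NEUSOFT CORP"]}
print(to_xml("company", ["NEUSOFT CORP"]))
# <result><company>NEUSOFT CORP</company></result>
```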

Description

Technical field

[0001] The present disclosure relates to the field of computer technology, and in particular to a page data extraction method, device, storage medium and electronic equipment.

Background

[0002] In the data acquisition (integration) business, it is often necessary to extract the data in an HTML (HyperText Markup Language) page and encapsulate the extracted data into a standard format, such as JSON (JavaScript Object Notation) or XML (Extensible Markup Language), and then output it for use by other applications.

[0003] At present, data in HTML pages is mainly extracted by non-visual extraction methods and visual extraction methods. In the non-visual extraction method, the user provides the feature value of the data to be extracted in the specified HTML page, the feature path of the data is then calculated, and data extraction is then performed based on t...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/9535, G06F9/445
CPC: G06F9/4451
Inventor: 卢鹏飞
Owner: NEUSOFT CORP