Method and device for constructing visual webpage information extracting rule

A web page information extraction technology, applied in network data retrieval, network data indexing, special data processing applications, etc. It reduces the difficulty of maintenance and improves construction efficiency.

Active Publication Date: 2017-04-19

SURFILTER NETWORK TECH

Cites: 4. Cited by: 5.

Problems solved by technology

[0004] To solve the problem that existing methods for constructing extraction rules demand a high level of expertise from rule writers and are inefficient to write and maintain, the embodiments of the present invention provide a method and device for constructing a visual webpage information extraction rule.

Examples

Embodiment 1

[0044] This embodiment of the present invention provides a method for constructing a visual web page information extraction rule. Referring to Figure 1, the method may include:

[0045] Step S11: according to the web page element selected by the user, use a web page node analysis algorithm to obtain the parameter information of the web page element. The parameter information may include the xpath, attributes, and text value of the web page element.

[0046] In this embodiment, the web page elements contain the web page information that the user wants to extract. XPath is the XML (Extensible Markup Language) Path Language, a language for locating parts of an XML document. Based on the tree structure of XML, XPath provides the ability to find nodes in the data structure tree. In practical applications, using the webpage node analysis algorithm to obtain the xpath, attributes, and text values of webp...
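The parameter information named in Step S11 can be illustrated with a minimal sketch. The patent does not disclose its web page node analysis algorithm, so the approach below (walking from the selected node up to the root and counting same-tag siblings) is an assumption, as are the function name `describe_element` and the sample page. It uses Python's standard `xml.etree.ElementTree`, which only handles well-formed markup; real HTML would need an HTML parser.

```python
import xml.etree.ElementTree as ET


def describe_element(root, target):
    """Return the xpath, attributes, and text value of `target` within `root`.

    Hypothetical helper: the patent does not specify how these are computed.
    """
    # ElementTree nodes do not store their parent, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        # XPath positions are 1-based and count only siblings with the same tag.
        same_tag = [sib for sib in parent if sib.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    steps.append(root.tag)

    return {
        "xpath": "/" + "/".join(reversed(steps)),
        "attributes": dict(target.attrib),
        "text": (target.text or "").strip(),
    }


# Sample page (an assumption for illustration only).
page = ET.fromstring(
    "<html><body>"
    "<div class='news'><h1 id='title'>Headline</h1><p>First</p><p>Second</p></div>"
    "</body></html>"
)
selected = page.find("./body/div/p[2]")  # stands in for the element the user clicked
info = describe_element(page, selected)
print(info)
```

Here the second `<p>` yields the xpath `/html/body[1]/div[1]/p[2]`, an empty attribute map, and the text `Second`. Positional steps make the path unique for this page, though a production tool would likely prefer steps based on stable attributes such as `id`.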

Embodiment 2

[0075] This embodiment of the present invention provides a device for constructing a visual web page information extraction rule, which uses the construction method described in Embodiment 1. Referring to Figure 4, the device may include: a first acquiring module 100, a processing module 200, and a first generating module 300.

[0076] The first acquiring module 100 is configured to acquire the parameter information of the webpage element by using a webpage node analysis algorithm according to the webpage element selected by the user. The parameter information includes the xpath, attributes, and text value of the webpage element.

[0077] In this embodiment, the web page elements contain the web page information that the user wants to extract. XPath is the XML Path Language, a language for locating parts of an XML document. Based on the tree structure of XML, XPath provides the ability to find no...

Abstract

The invention discloses a method and a device for constructing a visual webpage information extraction rule. The method comprises the following steps: according to a webpage element selected by a user, obtaining parameter information of the webpage element by using a webpage node analysis algorithm; according to the obtained parameter information, filling in the configuration parameters required by the corresponding webpage information extraction actions; and, in a preset visual rule action management area, performing corresponding operations on the required webpage information extraction actions to generate the corresponding webpage information extraction rule. The method spares the user from analyzing the webpage structure, which lowers the expertise required, and the preset visual rule action management area gives the user a convenient way to manage webpage information extraction actions. This greatly reduces the difficulty of writing and maintaining webpage information extraction rules and improves the efficiency of constructing them.
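The second and third steps of the abstract (filling in configuration parameters, then operating on actions to generate a rule) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the class names `ExtractAction` and `RuleBuilder`, the fields of an action, the choice of add, remove, and reorder as the "corresponding operations", and JSON as the rule format. The source does not specify any of these.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ExtractAction:
    """One extraction action; the field names are hypothetical."""
    name: str
    xpath: str
    attribute: str = ""  # empty means "take the element's text value"


@dataclass
class RuleBuilder:
    """Stands in for the visual rule action management area (assumed design)."""
    actions: list = field(default_factory=list)

    def add_from_element(self, name, element_info, attribute=""):
        # Fill the action's configuration from the parameters obtained for the
        # selected element, so the user never types the xpath by hand.
        self.actions.append(ExtractAction(name, element_info["xpath"], attribute))

    def remove(self, name):
        self.actions = [a for a in self.actions if a.name != name]

    def move(self, name, new_index):
        # Action names are assumed to be unique within one rule.
        action = next(a for a in self.actions if a.name == name)
        self.actions.remove(action)
        self.actions.insert(new_index, action)

    def to_rule(self):
        # The generated rule is the ordered list of configured actions.
        return json.dumps({"actions": [asdict(a) for a in self.actions]}, indent=2)


builder = RuleBuilder()
builder.add_from_element("title", {"xpath": "/html/body[1]/div[1]/h1[1]"})
builder.add_from_element("link", {"xpath": "/html/body[1]/div[1]/a[1]"}, attribute="href")
builder.add_from_element("ad", {"xpath": "/html/body[1]/div[2]"})
builder.remove("ad")     # the user deletes an unwanted action
builder.move("link", 0)  # the user reorders the remaining actions
rule = builder.to_rule()
print(rule)
```

Keeping the rule as plain data separates editing from execution: the management area only manipulates a list of actions, and a separate extractor can interpret the serialized rule later.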

Description

Technical field

[0001] The invention relates to the technical field of web page information extraction, in particular to a method and device for constructing a visual web page information extraction rule.

Background

[0002] Web page information extraction is a technology for extracting target information from web pages. When developing data analysis products or services for a given field, data must be extracted from the massive Internet data of various websites. When extracting data from the pages of a single website, programmers can construct rules so that target information can be extracted in batches from multiple web pages that share the same page structure.

[0003] However, the existing technology has the following deficiencies in the construction of target extraction rules: first, the author of the extraction rules needs to analyze the structure of the webpage, and obtain the selector that can uniquel...
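To make the batch extraction in [0002] concrete, the sketch below applies one hand-written rule to two pages with the same structure. The rule format (a mapping from field name to path), the function name `apply_rule`, and the sample pages are assumptions for illustration. The paths use the limited XPath subset supported by Python's standard `xml.etree.ElementTree`, which requires well-formed markup.

```python
import xml.etree.ElementTree as ET


def apply_rule(rule, page_source):
    """Extract one record from a page using a {field: path} rule (hypothetical format)."""
    root = ET.fromstring(page_source)
    record = {}
    for field_name, path in rule.items():
        node = root.find(path)
        # A missing node or empty text becomes None instead of raising.
        record[field_name] = node.text.strip() if node is not None and node.text else None
    return record


# One rule, written once for the shared page structure.
rule = {"title": "./body/h1", "price": "./body/span[@class='price']"}

# Sample pages (assumptions) that share that structure.
pages = [
    "<html><body><h1>Widget A</h1><span class='price'>10</span></body></html>",
    "<html><body><h1>Widget B</h1><span class='price'>12</span></body></html>",
]

records = [apply_rule(rule, source) for source in pages]
print(records)
```

Writing the two paths in `rule` by hand is exactly the step that requires the author to analyze the page structure.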

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

CPC: G06F16/951

Inventor: 李少敏, 王毅敏, 范娜, 刘刚, 唐新民, 沈智杰, 景晓军

Owner: SURFILTER NETWORK TECH
