Method and device for constructing visual webpage information extracting rule

A technique for constructing web page information extraction rules, applicable to network data retrieval, network data indexing, special data processing applications, and similar fields. It reduces the difficulty of writing and maintaining extraction rules and improves the efficiency of constructing them.

Active Publication Date: 2017-04-19
SURFILTER NETWORK TECH
4 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

[0004] Existing methods for constructing extraction rules demand a high level of expertise from rule writers and make rules slow to write and maintain. To solve this problem, the embodiment of the present invention provides a method and device for constructing visual web page information extraction rules.


Examples

Embodiment 1

[0044] The embodiment of the present invention provides a method for constructing a visual web page information extraction rule. Referring to Figure 1, the method may include:

[0045] Step S11: according to the web page element selected by the user, use a web page node analysis algorithm to obtain the parameter information of the web page element. The parameter information may include the xpath, attributes, and text value of the web page element.

[0046] In this embodiment, the web page elements include the web page information that the user wants to extract. XPath is the XML (Extensible Markup Language) Path Language, a language for locating a particular part of an XML document. It is based on the tree structure of XML and provides the ability to find nodes in the document tree. In practical applications, using the webpage node analysis algorithm to obtain the xpath, attributes, and text values of webp...
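
As an illustration of Step S11, the following is a minimal sketch of how the xpath, attributes, and text value of a selected element might be obtained. It is not the patent's webpage node analysis algorithm, which this excerpt does not specify. The function name `element_parameters`, the sample page, and the positional xpath format are all assumptions, and the page is assumed to be well-formed XML so that the Python standard library can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; assumed to be well-formed XML.
PAGE = """<html><body>
<div class="news"><h1 id="title">Hello</h1><p>First</p><p class="lead">Second</p></div>
</body></html>"""


def element_parameters(root, target):
    """Return the xpath, attributes, and text value of `target` within `root`."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    steps = []
    node = target
    while node is not None:
        parent = parent_of.get(node)
        if parent is None:
            # The root element needs no positional index.
            steps.append(node.tag)
        else:
            # XPath positions are 1-based and count siblings with the same tag.
            same_tag = [c for c in parent if c.tag == node.tag]
            steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent

    return {
        "xpath": "/" + "/".join(reversed(steps)),
        "attributes": dict(target.attrib),
        "text": (target.text or "").strip(),
    }


root = ET.fromstring(PAGE)
# Stand-in for the element the user selected in the visual interface.
selected = root.find(".//p[@class='lead']")
params = element_parameters(root, selected)
print(params)
```

Real web pages are often not well-formed XML, so a practical tool would more likely read these values from the browser's DOM or from a lenient HTML parser.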

Embodiment 2

[0075] An embodiment of the present invention provides a device for constructing a visual web page information extraction rule, which adopts the construction method described in Embodiment 1. Referring to Figure 4, the apparatus may include: a first acquiring module 100, a processing module 200, and a first generating module 300.

[0076] The first acquiring module 100 is configured to acquire parameter information of the webpage element by using a webpage node analysis algorithm according to the webpage element selected by the user. The parameter information includes the xpath, attributes, and text value of the webpage element.

[0077] In this embodiment, the web page elements include the web page information that the user wants to extract. XPath is the XML Path Language, a language for locating a particular part of an XML document. Based on the tree structure of XML, xpath provides the ability to find no...
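
The excerpt does not detail the processing module 200 or the first generating module 300, but the abstract says they fill the configuration parameters of extraction actions and generate a rule from the actions managed in the rule action management area. A rough sketch of that flow, under stated assumptions, might look like the following. The class names `ExtractAction` and `RuleArea`, the function `fill_action`, the field names, and the JSON rule format are all hypothetical and do not come from the patent.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ExtractAction:
    """One extraction action; all field names are hypothetical."""
    name: str
    xpath: str
    attribute: str = ""  # empty string means "extract the text value"


@dataclass
class RuleArea:
    """Stand-in for the visual rule action management area."""
    actions: list = field(default_factory=list)

    def add(self, action):
        self.actions.append(action)

    def remove(self, name):
        self.actions = [a for a in self.actions if a.name != name]

    def move(self, name, new_index):
        action = next(a for a in self.actions if a.name == name)
        self.actions.remove(action)
        self.actions.insert(new_index, action)

    def generate_rule(self):
        # Assumed rule format: a JSON document listing the actions in order.
        return json.dumps({"actions": [asdict(a) for a in self.actions]}, indent=2)


def fill_action(name, params, attribute=""):
    """Fill an action's configuration from the acquired parameter information."""
    return ExtractAction(name=name, xpath=params["xpath"], attribute=attribute)


# Parameter information as the acquiring module might return it (sample values).
title_params = {"xpath": "/html/body[1]/div[1]/h1[1]", "attributes": {"id": "title"}, "text": "Hello"}
link_params = {"xpath": "/html/body[1]/div[1]/a[1]", "attributes": {"href": "/more"}, "text": "More"}

area = RuleArea()
area.add(fill_action("link", link_params, attribute="href"))
area.add(fill_action("title", title_params))
area.move("title", 0)  # reorder, as a user might by dragging in the visual area
rule = area.generate_rule()
print(rule)
```

The point of the sketch is that the user never types an xpath: the action's configuration is filled from the selected element, and the rule is just the serialized list of managed actions.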

Abstract

The invention discloses a method and a device for constructing a visual webpage information extraction rule. The method comprises the following steps: according to a webpage element selected by a user, obtaining parameter information of the webpage element by means of a webpage node analysis algorithm; according to the obtained parameter information, filling in the configuration parameters required by the corresponding webpage information extraction actions; and, in a preset visual rule action management area, performing the corresponding operations on the required extraction actions to generate the corresponding webpage information extraction rule. With this method, the user no longer needs to analyze the webpage structure, which lowers the expertise required of the user. The preset visual rule action management area also gives the user a convenient way to manage extraction actions. As a result, extraction rules become much easier to write and maintain, and the efficiency of constructing them improves.

Description

Technical field

[0001] The invention relates to the technical field of web page information extraction, in particular to a method and device for constructing a visual web page information extraction rule.

Background

[0002] Web page information extraction is a technology for extracting target information from web pages. When developing data analysis products or services for a particular field, it is necessary to extract data from the massive amount of Internet data on various websites. When extracting data from the pages of a single website, programmers can construct rules so that target information can be extracted in batches from multiple web pages with the same structure.

[0003] However, the existing technology has the following deficiencies in the construction of target extraction rules: first, the author of the extraction rules needs to analyze the structure of the webpage, and obtain the selector that can uniquel...
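
To make the batch extraction described in the background concrete, here is a minimal sketch of one hand-written rule applied to several pages that share a structure. The rule format (a mapping from field name to path), the sample pages, and the function `apply_rule` are assumptions for illustration. The paths use the limited XPath subset supported by Python's `xml.etree.ElementTree`, and the pages are assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule: field name -> path relative to the root <html> element.
RULE = {"title": "./body/div/h1", "summary": "./body/div/p[1]"}

# Two sample pages with the same structure but different content.
PAGES = [
    "<html><body><div><h1>Page A</h1><p>Alpha</p></div></body></html>",
    "<html><body><div><h1>Page B</h1><p>Beta</p></div></body></html>",
]


def apply_rule(rule, page_source):
    """Extract one record from a page by evaluating every path in the rule."""
    root = ET.fromstring(page_source)
    record = {}
    for field_name, path in rule.items():
        node = root.find(path)
        # A missing node or empty text yields None rather than raising.
        record[field_name] = node.text.strip() if node is not None and node.text else None
    return record


records = [apply_rule(RULE, page) for page in PAGES]
print(records)
```

Writing the paths in `RULE` by hand is exactly the step that requires the author to analyze the page structure.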

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/951
Inventors: 李少敏, 王毅敏, 范娜, 刘刚, 唐新民, 沈智杰, 景晓军
Owner: SURFILTER NETWORK TECH