Universal method for capturing webpage contents of any webpage

A webpage content capture technology, applied in the network field, that addresses problems such as repetitive work, error-prone parsing, and low efficiency, and achieves the effect of improving efficiency.

Inactive Publication Date: 2010-06-30
SUZHOU CODYY NETWORK SCI & TECH
0 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

However, with the explosive growth in the amount of information, it has become a headache for ordinary netizens to easily and conveniently grab valuable content from any other webpage on the Internet and use it on their own webpages.
Traditional crawling methods have high technical barriers: to capture a particular piece of content from a particular webpage, it is often necessary to perform a complex analysis of the page's data before the needed content can be extracted. Switching to another webpage means redesigning the program's analysis code. This process involves a great deal of duplicated work and is inefficient, because all of the analysis code must be designed by hand and does not use the system's native functions for analysis. As a result, parsing errors occur easily, and such complex operations are difficult for ordinary netizens to perform.


Examples

Embodiment Construction

[0020] A general method for grabbing webpage content, applicable to any webpage, comprising the following steps:

[0021] 1) On the client, the user enters the URL of the target webpage to be captured and a conditional expression, and the client generates a subpage that displays the full content of that webpage;

[0022] 2) The client parses the conditional expression into an array variable of node tags and conditions;

[0023] 3) The client traverses the array, finds the nodes on the subpage that meet the conditions, and saves all nodes that meet the last condition to an array variable;

[0024] 4) The client obtains the innerHTML attribute value or the outerHTML attribute value of every node in the saved array variable.
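Steps 2) to 4) can be sketched roughly as follows. This is a minimal illustration, not the patent's actual program: the expression grammar ("/"-separated steps of the form `tag` or `tag[@attr=value]`), the choice to search all descendants at each step, the exact-match attribute comparison, and the function names `parseExpression`, `selectNodes`, and `capture` are all assumptions made for the example. Nodes only need `tagName`, `children`, `getAttribute`, and `innerHTML`/`outerHTML`, which browser elements provide.

```javascript
// Assumed grammar (not specified in the source): "/"-separated steps,
// each either `tag` or `tag[@attr=value]`. Attribute values that contain
// "/" are not supported by this simple splitter.
function parseExpression(expression) {
  const stepPattern =
    /^([A-Za-z][A-Za-z0-9]*|\*)(?:\[@([\w-]+)=["']?([^"'\]]*)["']?\])?$/;
  return expression
    .split("/")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const match = stepPattern.exec(part);
      if (!match) {
        throw new Error(`Unsupported step: ${part}`);
      }
      // Unmatched optional groups are undefined, so default them to null.
      const [, tag, attr = null, value = null] = match;
      return { tag: tag.toLowerCase(), attr, value };
    });
}

// Collect every descendant of a node in document order.
function descendants(node) {
  const found = [];
  for (const child of Array.from(node.children || [])) {
    found.push(child, ...descendants(child));
  }
  return found;
}

// A node satisfies a step when its tag matches and, if the step carries
// an attribute condition, the attribute equals the value exactly.
function matchesStep(node, step) {
  const tagOk = step.tag === "*" || node.tagName.toLowerCase() === step.tag;
  const attrOk =
    step.attr === null || node.getAttribute(step.attr) === step.value;
  return tagOk && attrOk;
}

// Traverse the step array; the nodes that satisfy the last step are returned.
function selectNodes(root, steps) {
  let current = [root];
  for (const step of steps) {
    const next = new Set(); // a Set avoids duplicates from nested matches
    for (const node of current) {
      for (const candidate of descendants(node)) {
        if (matchesStep(candidate, step)) {
          next.add(candidate);
        }
      }
    }
    current = Array.from(next);
  }
  return current;
}

// Read innerHTML (default) or outerHTML from every selected node.
function capture(root, expression, property = "innerHTML") {
  return selectNodes(root, parseExpression(expression)).map(
    (node) => node[property]
  );
}
```

In a browser, the root could be `document`, for example `capture(document, "/div[@class=list]", "outerHTML")`. Under the exact-match assumption, an element whose class attribute is `list item` would not be selected by `@class=list`.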

[0025] The specific steps of the method of the present invention are described in further detail below:

[0026] Referring to Figure 1 and Figure 2, a valid URL (for example: www.baidu.com) and a conditional expression (for example: /div[@class=list]) are entered on the client side to send a request to...



Abstract

The invention discloses a universal method for capturing the content of any webpage, and belongs to the technical field of networks. The method comprises the following steps: the user inputs the URL to be captured and a conditional expression, creates a subpage, and passes the URL and the conditional expression to the subpage; the subpage sends a request to a server, obtains the full content of the webpage at that URL, and embeds a segment of JavaScript program into the webpage content; the JavaScript program converts the conditional expression into an array variable, traverses the array, and finds all nodes that meet the conditions; and the JavaScript program obtains the innerHTML or outerHTML attribute value of all such nodes, thereby obtaining the corresponding webpage content. With this method, the user can capture any content of a webpage simply by modifying the conditional expression, without writing code to parse the content of each individual webpage.
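The embedding step mentioned in the abstract might look something like the sketch below. The source does not say where or how the script is embedded, so inserting an inline `<script>` before the last `</body>` tag, passing the expression through a global variable, and the names `buildCaptureScript`, `embedScript`, and `window.captureExpression` are all hypothetical choices made for illustration.

```javascript
// Hypothetical: hand the conditional expression to the embedded program
// through a global variable. JSON.stringify yields a safely quoted literal.
function buildCaptureScript(expression) {
  return `window.captureExpression = ${JSON.stringify(expression)};`;
}

// Hypothetical: insert an inline script before the last </body> tag of the
// fetched page, or append it when the page has no </body> tag.
function embedScript(pageHtml, scriptSource) {
  // Escape "</script" inside the source so it cannot end the element early.
  const safeSource = scriptSource.replace(/<\/(script)/gi, "<\\/$1");
  const tag = `<script>${safeSource}</script>`;

  const matches = [...pageHtml.matchAll(/<\/body\s*>/gi)];
  if (matches.length === 0) {
    return pageHtml + tag;
  }
  const at = matches[matches.length - 1].index;
  return pageHtml.slice(0, at) + tag + pageHtml.slice(at);
}
```

Placing the script at the end of the body means the page's nodes already exist when the embedded program starts traversing them. That is one reasonable design choice, not something the source specifies.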

Description

Technical field

[0001] The invention belongs to the field of network technology, and in particular relates to a general method for grabbing webpage content that can be used for any webpage.

Background technique

[0002] In the Internet age, abundant Internet resources have greatly facilitated people's information life. However, with the explosive growth in the amount of information, it has become a headache for ordinary netizens to easily and conveniently grab valuable content from any other webpage on the Internet and use it on their own webpages. Traditional crawling methods have high technical barriers: to capture a particular piece of content from a particular webpage, it is often necessary to perform a complex analysis of the page's data before the needed content can be extracted. And once you switch to another webpage to grab its content, you have to redesign the analysis code of th...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F17/2264, G06F40/151
Inventor: 胡加明
Owner: SUZHOU CODYY NETWORK SCI & TECH