Network information fetching method and device

A network information fetching method and device, in the field of network technology, that addresses the inability of existing crawlers to directly obtain the content of dynamic web pages.

Status: Inactive
Publication date: 2014-03-12
HUAWEI TECH CO LTD +1
Cites: 5 · Cited by: 13

AI Technical Summary

Problems solved by technology

The HTML webpages obtained by existing web crawling technology are generally static webpages. The content of a static webpage is completely determined by the URL visited, so every user obtains the same content. A dynamic webpage, by contrast, contains not only static webpage content but also a large number of URLs that can only be obtained by executing client-side scripts. In other words, different users who access the same URL of a dynamic webpage may obtain different content. Therefore, when personalized data must be obtained for different users, existing web crawling technology cannot directly obtain the URLs in a dynamic webpage or the content corresponding to those URLs.
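The following minimal Python sketch illustrates the limitation described above. It uses a hypothetical page, and the markup, the `openProfile` function, and the `/user/...` path are all invented for illustration. Extracting literal `href` values from the raw HTML recovers the static link. It does not recover the per-user link, because that link only exists once the browser runs the script.

```python
import re

# Matches only URLs written literally in an href="..." attribute.
HREF_RE = re.compile(r'href="([^"]+)"')

# Hypothetical dynamic page: one link is written out in the HTML, while the
# other is only produced when the browser executes the script (per-user ID).
PAGE = """
<html><body>
  <a href="/news/today">Today</a>
  <button onclick="openProfile(42)">My profile</button>
  <script>
    function openProfile(uid) {
      location.href = '/user/' + uid + '/orders';
    }
  </script>
</body></html>
"""


def static_links(html: str) -> list[str]:
    """Return the links that a crawler which does not execute scripts can see."""
    return HREF_RE.findall(html)


if __name__ == "__main__":
    print(static_links(PAGE))  # the /user/42/orders URL never appears literally
```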



Examples

Embodiment Construction

[0036] The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0037] An embodiment of the present invention discloses a method for fetching network information. As shown in Figure 1, the method includes the following steps:

[0038] 101. The network information fetching device accesses a static Uniform Resource Locator (URL) through a preset browser client.

[0039] Optionally, before step 101, the method further includes:

[0040] 101a. Create a network connection for the browser client;

[0041] 101b. Set the browser version a...
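Steps 101a, 101b and 101 can be sketched with Python's standard `urllib.request`. This sketch assumes that the "preset browser client" amounts to an HTTP client that presents browser-like headers. The user-agent string and the helper names `build_browser_request` and `fetch_html` are hypothetical, because the text of step 101b is truncated. A plain HTTP client does not execute JavaScript, so a full implementation would need a real browser engine.

```python
import urllib.request

# Hypothetical "preset browser client" setting. Step 101b is truncated in the
# source, so this browser version string is illustrative only.
PRESET_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0"
)


def build_browser_request(
    static_url: str, user_agent: str = PRESET_USER_AGENT
) -> urllib.request.Request:
    """Steps 101a/101b (sketch): prepare a request that presents itself as a browser."""
    return urllib.request.Request(
        static_url,
        headers={"User-Agent": user_agent, "Accept": "text/html"},
    )


def fetch_html(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Step 101 (sketch): access the static URL and return the HTML file as text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage: `fetch_html(build_browser_request("http://example.com/"))` returns the raw HTML. Any scripts in that HTML are returned as text and are not executed.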


Abstract

An embodiment of the invention discloses a network information fetching method and device, relating to the field of network technology, that can fetch the uniform resource locators (URLs) in a dynamic webpage and the content corresponding to those URLs. The method comprises the following steps. A static URL is accessed through a preset browser client, and the hypertext markup language (HTML) file corresponding to the static URL is obtained. The scripting-language functions whose execution carries out user operations, including JavaScript functions, are obtained. These scripting-language functions are parsed to obtain a parsed webpage. Other static URLs are then extracted from the webpage, the webpage is stored, and regular expressions are used to extract the other static URLs. The method and device are used for fetching network information.
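The flow in the abstract can be outlined as follows. This is a sketch under stated assumptions, not the patented implementation. The "scripting language functions corresponding to user operations" are approximated here as inline `on*` event-handler attributes. Script execution is delegated to a caller-supplied `run_script` callable (hypothetical) that stands in for a JavaScript engine, because the Python standard library cannot run JavaScript.

```python
import re
from typing import Callable

# Static URLs: literal href="..." values (the regular-expression extraction step).
HREF_RE = re.compile(r'href="([^"]+)"')
# Assumption: script functions tied to user operations appear as inline handlers.
HANDLER_RE = re.compile(r'\son(?:click|change|submit|mouseover)="([^"]+)"')


def find_user_action_scripts(html: str) -> list[str]:
    """Collect the JavaScript snippets that would run when a user acts on the page."""
    return HANDLER_RE.findall(html)


def crawl_dynamic_page(
    html: str,
    run_script: Callable[[str, str], str],
    store: dict[str, str],
    page_id: str,
) -> list[str]:
    """Process one already-fetched HTML page and return every URL found.

    run_script(html, snippet) is a hypothetical stand-in for a JavaScript
    engine; it must return the page HTML after the snippet has executed.
    """
    urls = HREF_RE.findall(html)  # static URLs present in the original HTML
    for i, snippet in enumerate(find_user_action_scripts(html)):
        rendered = run_script(html, snippet)  # execute the script function
        store[f"{page_id}#{i}"] = rendered  # store the resulting webpage
        for url in HREF_RE.findall(rendered):  # regex-extract further URLs
            if url not in urls:
                urls.append(url)
    return urls
```

In a test, `run_script` can be a stub that returns modified HTML. In practice it would wrap a headless browser that clicks the element and returns the updated DOM.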

Description

Technical field

[0001] The invention relates to the field of network technology, and in particular to a method and device for fetching network information.

Background

[0002] Many products, such as shopping search websites, now recognize how important it is to users that the massive amount of information on the Internet be acquired and integrated. In these products, web crawling is one of the search engine technologies and a crucial link in the overall structured system. It helps major search engines crawl web pages and build web page databases. Web crawling is a technology in which a program or script automatically downloads specific content from the Internet according to certain rules. A web crawler generally starts from a specific Uniform Resource Locator (URL), obtains the Hypertext Markup Language (HTML) webpage re...
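The generic crawling loop described in the background works as follows: start from a seed URL, fetch its HTML, extract further URLs, and repeat. It can be sketched as a breadth-first traversal. The `fetch` callable, the regex-based link extraction, and the `max_pages` limit are illustrative assumptions and do not come from the source.

```python
import re
from collections import deque
from typing import Callable, Optional
from urllib.parse import urljoin

# Literal href values, ignoring fragment-only links.
LINK_RE = re.compile(r'href="([^"#]+)"')


def crawl(
    seed: str, fetch: Callable[[str], Optional[str]], max_pages: int = 100
) -> dict[str, str]:
    """Breadth-first crawl from a seed URL, returning {url: html} for fetched pages."""
    pages: dict[str, str] = {}
    frontier = deque([seed])
    seen = {seed}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:  # unreachable page: skip it
            continue
        pages[url] = html  # add the page to the web page database
        for href in LINK_RE.findall(html):
            link = urljoin(url, href)  # resolve relative links against the page URL
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```

Passing `fetch` as a parameter keeps the traversal independent of the transport. A real crawler would plug in an HTTP client, while a test can use an in-memory dictionary's `get` method.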


Application Information

Patent type & authority: Applications (China)
IPC (8th edition): G06F17/30
CPC: G06F16/951
Inventors: 邓志鸿 (Deng Zhihong), 张杰 (Zhang Jie), 赖博彦 (Lai Boyan), 刘河 (Liu He)
Owner HUAWEI TECH CO LTD