Method, device and system for extracting hidden URLs (Uniform Resource Locators) in a webpage

An extraction method and extraction device, applied in the computer field, that address the inability of static approaches to obtain URLs generated during page loading, redirected URLs, and URLs hidden in dynamic web pages, thereby improving the efficiency and coverage of security testing and the network coverage of web crawlers.

Active Publication Date: 2013-08-28
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD +1
Cites: 6; Cited by: 18

AI Technical Summary

Problems solved by technology

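[0003] At present, web crawlers can obtain the URLs in a page by matching page tags with regular expressions. This static method has the following disadvantages: (1) it cannot obtain URLs generated during the page loading process; (2) it cannot obtain URLs redirected by the server after the page has loaded; (3) it cannot obtain URLs hidden in dynamic web pages.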
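
The static approach described above can be illustrated with a minimal Python sketch. The regular expression, the `static_urls` helper and the sample markup are assumptions made for illustration; the patent does not give a concrete pattern.

```python
import re

# Assumed pattern: match the href attribute of <a> tags only.
HREF_RE = re.compile(r"""<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def static_urls(html):
    """Return URLs found by plain tag matching, without running any script."""
    return HREF_RE.findall(html)


# Hypothetical page: two ordinary links, one URL that only appears when an
# event handler runs, and one link that is assembled by script at load time.
sample = """
<a href="/news">News</a>
<a class="x" href='/about'>About</a>
<span onclick="location.href='/members/' + userId">Members</span>
<script>document.write('<a hr' + 'ef="/late">late</a>');</script>
"""

print(static_urls(sample))
```

On this sample the sketch prints `['/news', '/about']`. The URL reached through the `onclick` handler and the link written by the script at load time are both missed, which is the kind of gap the listed disadvantages describe.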

Embodiment Construction

[0023] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the figures are exemplary, are intended only to explain the present invention, and should not be construed as limiting it. On the contrary, the embodiments of the present invention include all changes, modifications and equivalents coming within the spirit and scope of the appended claims.

[0024] In the description of the present invention, it should be understood that the terms "first", "second" and so on are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance. In the description of the present invention, it should be noted that unless otherwise specified and limited, the terms "connected" and "connect...

Abstract

The invention discloses a method, a device and a system for extracting hidden URLs (Uniform Resource Locators) in a webpage. The extraction method comprises the following steps: acquiring and loading the webpage; parsing the webpage to extract the event handler code from the JavaScript in the webpage; and loading the event handler code with a JavaScript engine and acquiring the hidden URLs in the webpage according to the loading result. According to the method of the embodiment of the invention, on the one hand, more of the URLs hidden in a webpage can be covered when a security test is carried out on a website, which improves the efficiency and coverage of the security test; on the other hand, the extraction method can be provided to a web spider, so that by acquiring the hidden URLs in webpages the spider can mine information in the network more deeply, which improves the spider's network coverage.
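
The steps in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: it collects only inline `on*` attributes as event handler code, and `toy_engine` is a hypothetical stand-in for a real JavaScript engine that merely recognises two literal navigation forms. In practice that callable would be replaced by something that actually executes the handler and records the URLs it navigates to.

```python
from html.parser import HTMLParser
import re


class HandlerCollector(HTMLParser):
    """Collect inline event handler code (on* attributes) while parsing."""

    def __init__(self):
        super().__init__()
        self.handlers = []  # list of (tag, attribute, code) tuples

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("on") and value:
                self.handlers.append((tag, name, value))


def extract_handlers(html):
    """Parse the page and return the event handler code found in it."""
    collector = HandlerCollector()
    collector.feed(html)
    collector.close()
    return collector.handlers


def toy_engine(code):
    """Hypothetical stand-in for a JavaScript engine (assumption).

    It does not execute anything; it only picks out string literals that are
    assigned to location / location.href or passed to window.open.
    """
    pattern = r"""(?:location(?:\.href)?\s*=\s*|window\.open\(\s*)['"]([^'"]+)['"]"""
    return re.findall(pattern, code)


def hidden_urls(html, engine=toy_engine):
    """Hand each handler to the engine and gather the URLs it reports."""
    urls = []
    for _tag, _attr, code in extract_handlers(html):
        for url in engine(code):
            if url not in urls:  # keep first-seen order, drop duplicates
                urls.append(url)
    return urls


# Hypothetical page with one visible link and two handler-only URLs.
page = """
<html><body>
  <a href="/visible">visible link</a>
  <div onclick="window.open('/hidden/a')">open</div>
  <button onmouseover="location.href = '/hidden/b'">go</button>
</body></html>
"""

print(hidden_urls(page))
```

Passing the engine in as a parameter keeps the parsing step separate from the execution step, so the stand-in can be swapped for a real engine binding without touching the extraction code.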

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method, device and system for extracting hidden URLs in web pages.

Background

[0002] Web crawlers obtain the URLs (Uniform Resource Locators) of web pages on the Internet so that search engines and similar services can provide users with more information.

[0003] At present, web crawlers can obtain the URLs in a page by matching page tags with regular expressions. This static method has the following disadvantages: (1) it cannot obtain URLs generated during the page loading process; (2) it cannot obtain URLs redirected by the server after the page has loaded; (3) it cannot obtain URLs hidden in dynamic web pages.

Contents of the invention

[0004] The present invention aims to solve at least one of the above-mentioned technical problems.

[0005] For this reason, the first object of the present inven...

Application Information

IPC(8): G06F17/30
Inventor 周正吉李鸣雷张彪王丹练坤梅刘磊
Owner BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD