Method and system for integrally acquiring webpage information

A technology for completely collecting web page information, applied in special data processing applications, instruments, electrical digital data processing, etc. It solves the problem that the link pages and request results of links dynamically generated by AJAX cannot be collected.

Active Publication Date: 2013-07-03
北京中金云网科技有限公司
4 Cites, 26 Cited by

AI Technical Summary

Problems solved by technology

[0004] Therefore, the problem to be solved by the present invention is that the dynamic web page acquisition method disclosed in the above-mentioned patent documents cannot obtain the link pages and request results of some links that are dynamical...

Embodiment Construction

[0066] Figure 1 shows a flow chart of a method for completely collecting web page information according to an embodiment of the present invention. The method includes the following steps:

[0067] S1: In a browser with the FireBug plug-in and the Cookies Manager plug-in installed, simulate a user's browsing behavior, and save the Cookies login information generated during browsing, all URL requests, and the first response result returned by the server for those URLs. The FireBug plug-in saves all URL requests and the first response result returned by the server, and the Cookies Manager plug-in saves the Cookies login information. The first response result is the page response information obtained by simulating the user's browsing behavior in the browser.
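
The data saved in S1 can be pictured as a simple container of cookies, URL requests, and first response results. The sketch below is a minimal illustration only: the source does not specify a storage format, so it assumes, hypothetically, that the captured traffic has been exported as a HAR 1.2 JSON file and that the relevant cookies appear in each recorded request. The names `CapturedSession` and `load_har` are invented for illustration and are not part of the patent.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CapturedSession:
    """Hypothetical container for the data saved in step S1."""
    # Cookie name -> value (the Cookies login information)
    cookies: dict[str, str] = field(default_factory=dict)
    # URL -> response body (the first response result), in capture order
    first_responses: dict[str, str] = field(default_factory=dict)

    @property
    def url_requests(self) -> list[str]:
        """All captured URL requests, in the order they were recorded."""
        return list(self.first_responses)


def load_har(har_text: str) -> CapturedSession:
    """Build a CapturedSession from a HAR 1.2 JSON export (assumed format)."""
    har = json.loads(har_text)
    session = CapturedSession()
    for entry in har["log"]["entries"]:
        request = entry["request"]
        url = request["url"]
        body = entry["response"].get("content", {}).get("text", "")
        # If a URL was requested more than once, keep its earliest response
        session.first_responses.setdefault(url, body)
        # Assumption: login cookies are taken from the recorded request cookies
        for cookie in request.get("cookies", []):
            session.cookies[cookie["name"]] = cookie["value"]
    return session
```

Reading the cookies from the HAR file is a simplification; in the described method the Cookies Manager plug-in saves them separately.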

[0068] S2: A browser running in the background simulates the user's browsing behavior according to the Cookies login information saved in the above-mentioned brows...
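
Replaying the session with the saved login cookies, as described in S2, might look roughly like the following. This is a hypothetical sketch: it swaps the background browser for Python's standard `urllib.request` HTTP client. Because that client does not execute JavaScript, AJAX-generated content will be missing from its results, which is the kind of gap step (3) of the abstract is meant to fill. The helper names `cookie_header`, `build_replay_requests`, and `fetch_second_responses` are invented for illustration.

```python
import urllib.request


def cookie_header(cookies: dict[str, str]) -> str:
    """Serialize saved cookies into a single Cookie request-header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def build_replay_requests(urls: list[str], cookies: dict[str, str]) -> list[urllib.request.Request]:
    """Prepare one GET request per saved URL, each carrying the saved login cookies."""
    header = cookie_header(cookies)
    return [urllib.request.Request(url, headers={"Cookie": header}) for url in urls]


def fetch_second_responses(urls: list[str], cookies: dict[str, str],
                           timeout: float = 10.0) -> dict[str, str]:
    """Send the requests and collect a URL -> body map (the second response result).

    Requires network access. Stand-in for the background browser (assumption).
    """
    second: dict[str, str] = {}
    for req in build_replay_requests(urls, cookies):
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                second[req.full_url] = resp.read().decode("utf-8", errors="replace")
        except OSError:
            # URLError, HTTPError and timeouts are all OSError subclasses;
            # failed URLs are simply left out of the second response result
            continue
    return second
```

Only `build_replay_requests` runs offline; `fetch_second_responses` needs network access.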

Abstract

The invention relates to a method and a system for integrally acquiring web page information. The method comprises the following steps: (1) in a browser provided with FireBug and Cookies Manager, simulating a user's browsing behavior and storing the Cookies information, all URL (uniform resource locator) requests, and a first response result returned by the server; (2) simulating the user's browsing behavior in a browser running in the background and storing a second response result; (3) compensating the second response result with the web page information that is available in the first response result but unavailable in the second response result; and (4) acquiring and storing, by the browser running in the background, the web page information according to the compensated second response result. With the method and system provided by the invention, the browser running in the background can acquire all of the web page information according to the compensated second response result without occupying browser resources, which solves the problem that link pages dynamically generated by AJAX (Asynchronous JavaScript and XML) cannot be obtained by prior-art dynamic web page acquisition methods.
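
The compensation in step (3) amounts to a merge keyed by URL. The following is a minimal sketch, assuming (as a simplification not stated in the source) that both response results are dictionaries mapping each URL to its response body. Entries the background run did obtain are kept, and only missing or empty ones are filled from the first response result. The function names `missing_in_second` and `compensate` are hypothetical.

```python
def missing_in_second(first: dict[str, str], second: dict[str, str]) -> list[str]:
    """URLs present in the first result but missing or empty in the second
    (for example, requests generated by AJAX that the background run did not make)."""
    return [url for url in first if not second.get(url)]


def compensate(first: dict[str, str], second: dict[str, str]) -> dict[str, str]:
    """Step (3): fill gaps in the second response result from the first one."""
    compensated = dict(second)  # copy so the caller's second result is not mutated
    for url in missing_in_second(first, second):
        compensated[url] = first[url]
    return compensated
```

Letting the second result take precedence follows the wording of step (3): the first response result only supplies information that is unavailable in the second, and the compensated result is what step (4) then acquires and stores.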

Description

Technical field

[0001] The invention relates to the field of web page information collection, and in particular to a method and system for completely collecting web page information.

Background art

[0002] With the development of Internet technology, users can obtain all kinds of information through the Internet. Web pages on the Internet are currently divided into static web pages and dynamic web pages. Static web pages are web page files that are compiled in advance and stored on the server. They contain no programs and cannot be interacted with, so a static web page has no corresponding database on the server, and its information can be collected from the server where the web page file is located. A dynamic web page, by contrast, does not exist as an independent web page file on the server. Generally, a webpage is provided with a database and a program for the webpage on th...

Application Information

IPC(8): G06F17/30
Inventor: 全小飞, 柳香
Owner: 北京中金云网科技有限公司