Website state reconnaissance method and device

A website state reconnaissance method and device in the field of network information. The technology addresses the time-consuming manual troubleshooting of crawler failures. It reduces the unavailability of collection programs, improves comprehensive management capability, and reduces invalid collection operations.

Status: Inactive. Publication date: 2016-09-28
SHENZHEN AUDAQUE DATA TECH
Cites 3 documents; cited by 5

AI Technical Summary

Problems solved by technology

When a crawler program produces this kind of abnormal output, considerable manual time is needed to troubleshoot and modify the program before web page information collection can resume.

Method used

An access request is sent periodically to each collection target webpage according to a preset reconnaissance period, the response information returned by the server is processed, and first alarm information is sent when the response indicates that the collection target webpage is not accessible.


Examples


Embodiment 1

[0026] As shown in Figure 1, the method for detecting website status in this embodiment includes:

[0027] Step S10: According to a preset reconnaissance cycle, periodically send an access request to the collection target webpage, and receive response information returned by the server of the collection target webpage;

[0028] Step S20: Process the response information;

[0029] Step S30: Determine whether the response information indicates that the collection target webpage is accessible, and when the collection target webpage is not accessible, perform step S40:

[0030] Step S40: Send first alarm information, which indicates that the collection target webpage is not accessible. The collection target webpage is an item in the task webpage list.
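Steps S10 to S40 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The names `probe`, `reconnoiter`, and `ProbeResult` are hypothetical. "Accessible" is assumed to mean that an HTTP request completes with a success status. The alarm is simply passed to a caller-supplied function, which prints by default.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ProbeResult:
    url: str
    accessible: bool
    status: Optional[int]  # HTTP status code, or None if no HTTP response arrived
    detail: str


def probe(url: str, timeout: float = 10.0) -> ProbeResult:
    """Steps S10/S20 (hypothetical helper): send one request, reduce the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # urlopen follows redirects and raises HTTPError for other non-2xx codes,
            # so reaching this point means the server answered successfully.
            return ProbeResult(url, True, resp.status, "ok")
    except urllib.error.HTTPError as exc:  # server answered with an error code
        return ProbeResult(url, False, exc.code, str(exc.reason))
    except (urllib.error.URLError, OSError, ValueError) as exc:  # DNS, refused, timeout, bad URL
        return ProbeResult(url, False, None, str(exc))


def reconnoiter(
    task_urls: Iterable[str],
    probe_fn: Callable[[str], ProbeResult] = probe,
    alert_fn: Callable[[str], None] = print,
    cycles: int = 1,
    period_s: float = 0.0,
) -> List[str]:
    """Repeat S10-S40 for `cycles` reconnaissance periods; return every first alarm sent."""
    urls = list(task_urls)
    alarms: List[str] = []
    for cycle in range(cycles):
        for url in urls:
            result = probe_fn(url)  # S10 + S20
            if not result.accessible:  # S30
                msg = f"FIRST ALARM: {url} not accessible (status={result.status}, {result.detail})"
                alert_fn(msg)  # S40
                alarms.append(msg)
        if cycle < cycles - 1:
            time.sleep(period_s)  # wait for the next reconnaissance period
    return alarms
```

For example, `reconnoiter(task_urls, cycles=24, period_s=3600)` would probe every URL in a task list once an hour for a day. A production deployment would more likely hand the timing to a scheduler than block in `time.sleep`. Injecting `probe_fn` keeps the S30/S40 decision logic testable without network access.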

[0031] Preferably, webpage information corresponding to each collection target webpage in the task webpage list is collected periodically according to a preset collection period.
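The reconnaissance period and the collection period run independently, so the two can be sketched as a single scheduler tick. This sketch goes beyond the text above. It assumes that a page whose latest reconnaissance reported it inaccessible is skipped in its collection slot, which is one plausible way to reduce invalid collection operations. `TaskState` and `plan_tick` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TaskState:
    url: str
    next_recon: float = 0.0    # time (s) the next reconnaissance request is due
    next_collect: float = 0.0  # time (s) the next collection run is due
    accessible: bool = True    # result of the most recent reconnaissance


def plan_tick(
    tasks: List[TaskState], now: float, recon_period: float, collect_period: float
) -> Tuple[List[str], List[str]]:
    """Return (urls to reconnoiter, urls to collect) that are due at time `now`."""
    to_recon: List[str] = []
    to_collect: List[str] = []
    for task in tasks:
        if now >= task.next_recon:
            to_recon.append(task.url)
            task.next_recon = now + recon_period
        if now >= task.next_collect:
            task.next_collect = now + collect_period
            # Assumption: a page flagged inaccessible is skipped for this collection
            # slot instead of producing an invalid collection run.
            if task.accessible:
                to_collect.append(task.url)
    return to_recon, to_collect
```

A short reconnaissance period (for example, minutes) combined with a longer collection period would let problems surface before the next collection run is attempted.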

[0032] Due to the natura...

Embodiment 2

[0036] The method for detecting website status in this embodiment is based on Embodiment 1 and further describes reconnaissance of the webpage structure.

[0037] As shown in Figure 2, the method for detecting website status in this embodiment further includes:

[0038] Step S50: When the response information indicates that the collection target webpage is accessible, access the collection target webpage to obtain its webpage structure information;

[0039] Step S60: When a change in the webpage structure of the collection target webpage is detected from the webpage structure information, issue second alarm information, which indicates that the webpage structure of the collection target webpage has changed.

[0040] Crawler programs usually extract information using collection templates customized for each web page. Therefore, if the web page structure changes, causing the...
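One way to implement steps S50 and S60 is to reduce a page to a fingerprint of its tag structure and compare that fingerprint with a stored baseline. The sketch assumes that "webpage structure information" can be approximated by the ordered sequence of start and end tags, ignoring text and attribute values. `structure_fingerprint` and `check_structure` are hypothetical names.

```python
import hashlib
from html.parser import HTMLParser
from typing import Callable, Dict, List


class _TagSequence(HTMLParser):
    """Records start and end tags in document order (captures nesting, ignores text)."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def structure_fingerprint(html: str) -> str:
    parser = _TagSequence()
    parser.feed(html)
    parser.close()
    return hashlib.sha256(" ".join(parser.tags).encode("utf-8")).hexdigest()


def check_structure(
    url: str,
    html: str,
    baselines: Dict[str, str],
    alert_fn: Callable[[str], None] = print,
) -> bool:
    """Step S60: compare against the stored baseline and send a second alarm on change."""
    fingerprint = structure_fingerprint(html)
    # The first observation becomes the baseline. It is not overwritten on change,
    # so the alarm repeats until an operator updates the template and resets it.
    baseline = baselines.setdefault(url, fingerprint)
    if fingerprint != baseline:
        alert_fn(f"SECOND ALARM: webpage structure of {url} has changed")
        return True
    return False
```

Because text is ignored, a routine content update (for example, a new price) does not trigger the alarm, whereas replacing a `div` with a `span` that a collection template depends on does.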

Embodiment 3

[0043] The method for detecting website status in this embodiment is based on Embodiment 2, and further describes the method for detecting changes in the webpage structure.

[0044] As shown in Figure 3, in the method for detecting website status in this embodiment, detecting from the webpage structure information that the webpage structure of the collection target webpage has changed may include one or more of the following:

[0045] Step S61: It is detected that the frame information of the collection target webpage has changed;

[0046] Step S62: It is detected that the content information of the collection target webpage has changed;

[0047] Step S63: It is detected that the rendering information of the collection target webpage has changed;

[0048] Step S64: It is detected that the format information of the collection target webpage has changed.

[0049] A webpage is a kind of compound file that carries content displayed in a certain layout. Regular web...
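Steps S61 to S64 could be distinguished by fingerprinting four aspects of a page separately. This excerpt does not define frame, content, rendering, and format information, so the mapping below is an assumption:

- frame = tag nesting
- content = visible text
- rendering = stylesheet and script references plus inline CSS/JS
- format = tag, `id`, and `class` markup

`aspect_fingerprints` and `changed_aspects` are hypothetical names.

```python
import hashlib
from html.parser import HTMLParser
from typing import Dict, List, Optional


class _AspectParser(HTMLParser):
    """Splits a page into four assumed aspects matching steps S61-S64."""

    def __init__(self) -> None:
        super().__init__()
        self.frame: List[str] = []      # S61: tag nesting
        self.content: List[str] = []    # S62: visible text
        self.rendering: List[str] = []  # S63: stylesheets and scripts
        self.format: List[str] = []     # S64: tag/id/class markup
        self._raw: Optional[str] = None  # set while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        self.frame.append(tag)
        self.format.append(f"{tag}#{a.get('id') or ''}.{a.get('class') or ''}")
        if tag in ("script", "style"):
            self._raw = tag
        if tag == "script" and a.get("src"):
            self.rendering.append("script:" + a["src"])
        if tag == "link" and a.get("rel") == "stylesheet":
            self.rendering.append("css:" + (a.get("href") or ""))

    def handle_endtag(self, tag):
        self.frame.append("/" + tag)
        if tag == self._raw:
            self._raw = None

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Inline CSS/JS counts as rendering information; everything else as content.
            (self.rendering if self._raw else self.content).append(text)


def _digest(items: List[str]) -> str:
    return hashlib.sha256("\n".join(items).encode("utf-8")).hexdigest()


def aspect_fingerprints(html: str) -> Dict[str, str]:
    p = _AspectParser()
    p.feed(html)
    p.close()
    return {
        "frame": _digest(p.frame),
        "content": _digest(p.content),
        "rendering": _digest(p.rendering),
        "format": _digest(p.format),
    }


def changed_aspects(old_html: str, new_html: str) -> List[str]:
    """Return which of frame/content/rendering/format differ between two snapshots."""
    old, new = aspect_fingerprints(old_html), aspect_fingerprints(new_html)
    return [name for name in old if old[name] != new[name]]
```

Reporting which aspects changed, rather than a single yes/no, would let an operator tell a text-only update (S62) apart from a markup change that is more likely to break a collection template (S61, S64).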



Abstract

The invention relates to the technical field of network information, and in particular to a website state reconnaissance method and device. The method includes the following steps. An access request is sent periodically to a collection target webpage according to a preset reconnaissance period, and response information returned by the server of the collection target webpage is received. The response information is then processed. When the response information indicates that the collection target webpage is not accessible, first alarm information is sent to indicate this. By periodically checking whether the collection target webpage is accessible and sending alarm information when it cannot be accessed, the method improves the availability of information collection programs and the ability to manage large-scale collection.

Description

Technical field

[0001] The present invention relates to the technical field of network information, and in particular to a method and device for detecting website status.

Background art

[0002] A web crawler is a program or script that automatically crawls Internet information according to certain rules. Web crawlers collect web pages from the Internet and extract information from them; more specifically, they obtain web content data from a web server.

[0003] When current crawler programs perform large-scale web page information collection according to pre-configured parameters, they often return unexpected abnormal output instead of useful information and are forced to suspend collection. Troubleshooting such abnormal output and modifying the crawler program so that collection can continue takes a lot of manual time.

Summary ...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F11/36; G06F11/30; G06F17/30
CPC: G06F11/3006; G06F11/3612; G06F16/951
Inventors: 张军 (Zhang Jun), 贾西贝 (Jia Xibei)
Owner: SHENZHEN AUDAQUE DATA TECH