
HTTP message collection method based on proxy, terminal equipment and storage medium

A proxy-based HTTP message collection method, applied in the Internet field, that addresses two problems caused by re-collecting web pages: the increased load on the collected site's server and the loss of network virtual property. It avoids re-collection, reduces the server load, and avoids the loss of network virtual property.

Active Publication Date: 2021-06-18
XIAMEN MEIYA PICO INFORMATION

AI Technical Summary

Problems solved by technology

[0003] After downloading an HTTP response message, a traditional crawler parses the response body with a regular-expression or XPath rule base, keeps only the content that matches the collection rules, and discards the rest. When the rules are wrong or the requirements change, the web pages that have already been collected must be collected again. This re-collection increases the load on the collected site's server, and when access to the collected content consumes network virtual property, it also causes a loss of network virtual property for the collecting party.


Examples

Embodiment 1

[0024] This embodiment of the present invention provides a proxy-based HTTP message collection method. As shown in Figure 1, the method includes the following steps:

[0025] S1: Build an HTTP message proxy module, and receive the HTTP request message sent by the crawler module through the HTTP message proxy module.

[0026] For crawler modules that can be configured with a proxy, the HTTP message proxy module provides proxy services over protocols such as HTTP, HTTPS or SOCKS. For crawler modules that do not support proxy configuration, it acts as a system-wide (global) proxy.
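
For a crawler that accepts proxy configuration, pointing it at the proxy module is typically a small settings change. The following is a minimal sketch using Python's standard library. The address `127.0.0.1:8080` and the helper name `build_crawler_opener` are hypothetical; the source does not specify a listening address, a port, or a crawler implementation.

```python
import urllib.request

# Hypothetical address of the HTTP message proxy module (not given in the source).
PROXY_URL = "http://127.0.0.1:8080"


def build_crawler_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Return an opener that routes HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_crawler_opener()
# A call such as opener.open("http://example.com/") would now go through the proxy.
```

A crawler that cannot be configured this way would rely on the system-wide proxy described above instead.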

[0027] S2: After the HTTP message proxy module receives the HTTP request message, it determines whether the HTTP message database already contains an HTTP request message identical to the received one. If so, proceed to S4; otherwise, proceed to S3.
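
The visible text does not say how two request messages are judged to be identical. One plausible approach, shown here purely as an assumption, is to derive a fingerprint from the method, a normalised URL and the body, and to use that fingerprint as the lookup key in the message database. The function name `request_fingerprint` and the choice of fields are illustrative only; a real system might also need to include selected headers or cookies.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def request_fingerprint(method: str, url: str, body: bytes = b"") -> str:
    """Build a lookup key for a request (an assumed notion of 'identical')."""
    parts = urlsplit(url)
    # Scheme and host are case-insensitive, and the fragment is never sent to
    # the server. Lower-casing netloc assumes the URL carries no userinfo.
    normalised = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )
    digest = hashlib.sha256()
    for piece in (method.upper().encode(), normalised.encode(), body):
        # A length prefix keeps field boundaries unambiguous.
        digest.update(len(piece).to_bytes(8, "big"))
        digest.update(piece)
    return digest.hexdigest()
```

With this key, the S2 check reduces to a single lookup: if the fingerprint is already present, the proxy goes to S4, otherwise to S3.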

[0028] When the HTTP request message is transmitted using the HTTPS protocol, the crawler module uses the TLS certificate issuing authority corresponding to t...

Embodiment 2

[0042] The present invention also provides a proxy-based HTTP message collection terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the method in Embodiment 1 of the present invention are implemented.

Abstract

The invention relates to a proxy-based HTTP message collection method, a terminal device and a storage medium. The method comprises the following steps: S1, constructing an HTTP message proxy module, and receiving, through the HTTP message proxy module, an HTTP request message sent by a crawler module; S2, determining whether an HTTP request message identical to the received HTTP request message exists in an HTTP message library, and if so, proceeding to S4, otherwise proceeding to S3; S3, forwarding the HTTP request message to the corresponding crawl target server, receiving the HTTP response message, forwarding the HTTP response message to the crawler module, storing the HTTP request message and the HTTP response message in the HTTP message library, and recording the association between them; and S4, obtaining from the HTTP message library the HTTP response message associated with the HTTP request message, and forwarding it to the crawler module. The method prevents the crawler module from re-collecting content that has not been updated on the crawl target server, reduces the load on the crawl target server, and avoids possible loss of network virtual property.
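
The S2 to S4 flow can be sketched in a few lines. This is an illustrative model under stated assumptions, not the patented implementation: the class name `MessageProxy`, the in-memory dictionary standing in for the HTTP message library, the `(method, url, body)` identity key, and the injected `fetch` callable are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

RequestKey = Tuple[str, str, bytes]          # (method, url, body): assumed identity
Fetch = Callable[[str, str, bytes], bytes]   # stands in for the target server


@dataclass
class MessageProxy:
    fetch: Fetch                                        # used in S3 to reach the server
    library: Dict[RequestKey, bytes] = field(default_factory=dict)

    def handle(self, method: str, url: str, body: bytes = b"") -> bytes:
        key = (method.upper(), url, body)
        if key in self.library:                         # S2: identical request stored?
            return self.library[key]                    # S4: replay the stored response
        response = self.fetch(method, url, body)        # S3: forward to the target server
        self.library[key] = response                    # S3: store the pair and its association
        return response


# Usage with a fake server that records how often it is contacted.
calls: List[str] = []


def fake_fetch(method: str, url: str, body: bytes) -> bytes:
    calls.append(url)
    return b"<html>page</html>"


proxy = MessageProxy(fetch=fake_fetch)
first = proxy.handle("GET", "http://example.com/")
second = proxy.handle("GET", "http://example.com/")   # served from the library
```

Because the full response is stored rather than only the fields a parsing rule extracted, a crawler with corrected rules can replay the same requests and be answered from the library without contacting the target server again.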

Description

Technical field

[0001] The invention relates to the technical field of the Internet, in particular to a proxy-based HTTP message collection method, a terminal device and a storage medium.

Background

[0002] Since its birth, crawler technology has been widely used to collect Internet information. Some popular sites on the Internet have publicly released large amounts of open data and allow crawlers limited access to it.

[0003] After downloading an HTTP response message, a traditional crawler parses the response body with a regular-expression or XPath rule base, keeps only the content that matches the collection rules, and discards the rest. When the rules are wrong or the requirements change, the web pages that have already been collected must be collected again. This operation increases the load on the collected site's server. When accessing the collected content needs to consume the network virtual...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/08, H04L29/06, G06F16/951
CPC: H04L67/02, G06F16/951, H04L63/0428, H04L63/145, G06F2216/03, H04L67/56, H04L67/60
Inventors: 赖子琪, 王博, 朱振水
Owner: XIAMEN MEIYA PICO INFORMATION