Malicious crawler defense strategy selection method for Web server

A defense strategy selection method for Web servers facing malicious crawlers, applied in the field of network information security. It addresses the server resource consumption caused by anti-crawler mechanisms, the difficulty Web servers have in deciding when to defend, and the misjudgment of normal users, with the aim of making defense more effective and widely applicable.

Active Publication Date: 2017-12-26
FUDAN UNIV

AI Technical Summary

Problems solved by technology

(1) Masquerading the User-Agent. Each browser sends a regular, fixed User-Agent field to identify itself to the server. A malicious crawler can send the same field to pose as a regular browser and thereby evade detection by the Web server.
(2) Using IP proxies. The crawler accesses the Web server through multiple different IP proxies in turn. Although the number of requests the server receives surges within a short period, the client IP addresses of these requests differ, so it is difficult for the Web server to take countermeasures against any specific IP.
[0005] Web servers have corresponding countermeasures against these common crawler techniques [5-8], mainly: (1) Restricting IP addresses. The server back end counts access requests and sets a threshold on the number of accesses from a single IP address within a specific period; if the threshold is exceeded, the IP can be temporarily blocked. (2) Verification code pop-ups. Because crawlers simulate human access habits, acting against them easily penalizes normal users by mistake. Blocking some crawlers by requiring a verification code is widely used today, but it comes at the expense of user experience.
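The IP restriction in countermeasure (1) can be sketched as a per-IP sliding-window counter. This is a minimal illustration, not the patent's implementation: the class name `IpRateLimiter`, its parameters (`max_requests`, `window_seconds`, `block_seconds`), and the choice of a sliding window rather than fixed time buckets are all assumptions made for the example.

```python
from collections import defaultdict, deque


class IpRateLimiter:
    """Sliding-window request counter per client IP (hypothetical sketch)."""

    def __init__(self, max_requests, window_seconds, block_seconds):
        self.max_requests = max_requests      # threshold of accesses per window
        self.window_seconds = window_seconds  # length of the counting period
        self.block_seconds = block_seconds    # duration of the temporary block
        self._hits = defaultdict(deque)       # ip -> timestamps of recent requests
        self._blocked_until = {}              # ip -> time at which the block expires

    def allow(self, ip, now):
        """Return True if the request arriving at time `now` (seconds) is served."""
        if now < self._blocked_until.get(ip, 0.0):
            return False
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the counting window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            # Threshold exceeded: block this IP temporarily.
            self._blocked_until[ip] = now + self.block_seconds
            hits.clear()
            return False
        return True


limiter = IpRateLimiter(max_requests=3, window_seconds=10.0, block_seconds=60.0)
decisions = [limiter.allow("203.0.113.5", t) for t in (0.0, 1.0, 2.0, 3.0, 30.0, 64.0)]
```

Because the counter is keyed on the client IP, a crawler that rotates through proxies spreads its requests over many keys and stays under the threshold on each, which is the weakness described in item (2) of the crawler techniques.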
[0006] Although there are many mechanisms for detecting and blocking crawlers, it is still difficult for a Web server to decide whether to use these technologies and under what conditions to use them to prevent malicious crawler access.
The decision involves the manpower, capital, and time required to implement the technology, as well as the difficulties brought by the continuous improvement of crawler technology.
Because of this complexity, Web servers tend to keep the anti-crawler mechanism in effect once it is deployed, which consumes server resources and makes it easy to misjudge the behavior of normal users.
[0007] Current defense technologies lack a formalized and reliable model, and most rely on human decisions and settings.


Embodiment Construction

[0029] The main technologies involved in the three steps of the invention are described in detail below, including the formalization and calculation of benefits and costs, the game's profit matrix, the strategy selection method based on dynamic games of incomplete information, and the strategy selection method based on repeated games of incomplete information.

[0030] 1. Formalization and calculation of benefits and costs

[0031] Benefits and costs are calculated separately for the Web server and the crawler. The Web server can adopt one of two strategies, defense or non-defense, while the crawler can adopt normal crawling or malicious crawling. Under each strategy, the cost paid and the benefit obtained depend on many different factors, so they must be calculated separately for each strategy.

[0032] (1) Web server

[0033] When the web server chooses a defense strategy, it...


Abstract

The invention belongs to the technical field of network information security and specifically relates to a malicious crawler defense strategy selection method for a Web server. The logical architecture of the invention comprises the Web server and a crawler, and the method comprises the following steps. First, the benefit and cost of the Web server and of the crawler are calculated. Second, based on a dynamic game of incomplete information, the equilibrium solution of the game model is calculated: the expected revenue of the Web server under the defense strategy and under the non-defense strategy are computed, and the point at which the two are equal is the critical point at which the server chooses between defense and non-defense. Third, based on a repeated game of incomplete information, the equilibrium solution of the game model is calculated: the revenue the crawler obtains by behaving normally before a certain moment is computed, together with the revenue it obtains thereafter under malicious access behavior and under normal access behavior respectively. When the latter revenue is larger than the former, the crawler does not adopt the malicious access strategy, and the game parameters satisfying this condition are the optimal choice for the server strategy.
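The critical point in the dynamic-game step can be illustrated with a small calculation. The sketch below assumes that the server's uncertainty is summarized by a single belief p that the visitor is malicious, and that each server strategy has one payoff against a malicious crawler and one against a normal visitor. The function names and all payoff numbers are hypothetical; the patent's actual benefit and cost formulas are not reproduced here.

```python
from fractions import Fraction


def expected_payoff(p_malicious, payoff_vs_malicious, payoff_vs_normal):
    """Server's expected payoff given its belief that the visitor is malicious."""
    return p_malicious * payoff_vs_malicious + (1 - p_malicious) * payoff_vs_normal


def critical_belief(defend, no_defend):
    """Belief p* at which defending and not defending pay the same.

    Each argument is a (payoff_vs_malicious, payoff_vs_normal) pair.
    Returns None when the two expected payoffs do not cross within [0, 1].
    """
    gain_vs_malicious = defend[0] - no_defend[0]  # benefit of defending against a malicious crawler
    loss_vs_normal = no_defend[1] - defend[1]     # cost of defending against a normal visitor
    denom = gain_vs_malicious + loss_vs_normal
    if denom == 0:
        return None
    p_star = Fraction(loss_vs_normal) / Fraction(denom)
    return p_star if 0 <= p_star <= 1 else None


# Hypothetical payoffs: (against malicious crawler, against normal visitor).
defend = (-2, -1)      # defense limits damage but costs resources and annoys users
no_defend = (-10, 0)   # no defense is free unless the visitor is malicious
p_star = critical_belief(defend, no_defend)
```

Setting the two expected payoffs equal and solving for p gives p* = loss_vs_normal / (gain_vs_malicious + loss_vs_normal). With these made-up numbers, defense pays off only when the server believes the visitor is malicious with probability above p*.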

Description

Technical field

[0001] The invention belongs to the technical field of network information security, and in particular relates to a method for selecting a malicious crawler defense strategy for a Web server.

Background

[0002] With the development of big data analysis applications, Internet data has attracted attention because of its openness. Crawler technology, which automatically collects web page data, has become indispensable to big data analysis applications, and various crawlers have come into being [1-4].

[0003] However, there is a significant conflict between crawlers and Web servers. Because the amount of page data is large, malicious crawlers usually use various means to speed up collection, but these techniques degrade the performance of the Web server system and make it difficult to serve normal users. Therefore, in o...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/06, H04L29/08, H04L12/26
CPC: H04L43/04, H04L63/1416, H04L63/20, H04L67/02
Inventor: 曾剑平, 张晓惠
Owner: FUDAN UNIV