A method and device for anti-crawling and related equipment thereof

By dividing continuous parameters into multiple intervals and calculating the access ratio, and using a threshold to determine whether a user is a web crawler, the problem of inaccurate identification of low-speed, long-cycle web crawlers and high storage costs in existing technologies is solved, and lightweight web crawler identification and interception is achieved.

CN122247652A · Pending · Publication Date: 2026-06-19 · PEOPLE'S INSURANCE COMPANY OF CHINA

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: PEOPLE'S INSURANCE COMPANY OF CHINA
Filing Date: 2026-02-06
Publication Date: 2026-06-19

IPC: H04L9/40

AI Tagging

Application Domain

Securing communication

AI Technical Summary

Technical Problem

Existing technologies cannot effectively identify and block low-speed, long-cycle data traversal crawlers without relying on full data storage. Traditional rate limiting and short-term behavior pattern analysis are ineffective against crawlers that deliberately evade detection, and the full trajectory comparison method incurs excessively high storage costs and computational overhead.

Method used

By dividing the numerical range of continuous parameters into multiple continuous intervals, recording and calculating the user's access ratio in each interval, and using preset ratio thresholds and quantity thresholds to determine whether a user is a web crawler, the system can identify and block low-speed web crawlers.

Benefits of technology

Without relying on full data storage, it can accurately identify and block slow, long-cycle data traversal behavior, reduce storage costs and computing overhead, and improve the detection efficiency of malicious crawlers.

Abstract

This application provides an anti-crawling method, apparatus, and related equipment. In an embodiment of this application, an access request is received; the access request points to a target interface and contains continuous parameters. The numerical range of the continuous parameters is divided into multiple continuous intervals according to a preset interval length. Within a preset statistical period, each time the user who issued the access request accesses the target interface, the interval to which the continuous parameter belongs is recorded and the access count of that interval is incremented. At the end of the statistical period, the user's access ratio in each interval is calculated, the access ratio being the ratio of the access count of the interval to the interval length. The number of intervals in which the user's access ratio exceeds a preset first ratio threshold is counted. If that number exceeds a preset second number threshold, the user is determined to be a crawler, and anti-crawling measures are taken against the user.

Description

Technical Field

[0001] This application relates to the field of data processing, and in particular to an anti-crawling method, apparatus and related equipment.

Background Technology

[0002] In modern internet applications, various business platforms interact with each other through API interfaces. Data query interfaces based on pagination numbers or auto-incrementing IDs are particularly common, such as policy queries in insurance systems and order lists on e-commerce platforms. Because their parameters (pagination numbers or IDs) are continuous and traversable, these interfaces are highly susceptible to automated data crawling. Crawling not only consumes server resources and increases bandwidth costs, but it can also lead to the leakage of core business data or user privacy information, posing a serious threat to data security. Therefore, effectively identifying and preventing malicious crawlers is one of the core technical challenges in ensuring the stability of web services and data security.

[0003] To combat web crawlers, the industry has developed various technical solutions. The most basic is interface rate limiting, which sets a limit on the number of requests a single IP address or user can make within a given time period to resist high-concurrency, fast-paced crawling attacks. Another common approach is based on short-term behavioral pattern analysis, such as detecting whether request parameters exhibit a monotonically increasing pattern within a short time window (e.g., 5 minutes) to identify traversal behavior. Furthermore, there is a theoretically more accurate method: full-trajectory comparison. This involves recording and storing all parameter values requested by each visitor and comparing them with historical records to determine if their behavior is abnormal.

[0004] However, the aforementioned existing technologies all have significant shortcomings when dealing with slow-speed crawlers that deliberately evade detection. First, interface rate limiting is completely ineffective against slow-speed crawlers that actively reduce their request frequency to stay below the rate limiting threshold. Second, for behavioral pattern analysis based on a short time window, once the crawler extends its traversal period beyond the analysis window, the regularity of its requests is split across windows and detection fails. Finally, while the full trajectory comparison method has high detection accuracy, it requires storing massive amounts of historical request parameter data and performing efficient comparison calculations. In a large-scale production environment, this would result in unbearable storage costs and computational overhead, making it impractical.

[0005] Therefore, there is an urgent need for a lightweight solution that can effectively identify slow, long-cycle data crawlers and discover their collaborative operation patterns without relying on massive data storage, in order to address the aforementioned problems of existing technologies.

Summary of the Invention

[0006] Aspects of this application provide an anti-crawling method, apparatus, and device to address the technical challenge of effectively identifying and preventing slow, long-cycle data traversal when crawlers deliberately reduce their request frequency to evade detection, without relying on storage of the full amount of data.

[0007] This application provides an anti-crawling method, including:
receiving an access request, the access request being directed to a target interface and containing continuous parameters, and dividing the numerical range of the continuous parameters into multiple continuous intervals according to a preset interval length;
recording the interval to which the continuous parameter belongs each time the user who issued the access request accesses the target interface within a preset statistical period, and accumulating the access count of the corresponding interval;
at the end of the statistical period, calculating the access ratio of the user in each interval, the access ratio being the ratio of the number of accesses in the corresponding interval to the length of the interval;
counting the number of intervals in which the user's access ratio exceeds a preset first ratio threshold, and, if the number exceeds a preset second number threshold, determining that the user is a web crawler and taking anti-crawling measures against the user.
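The final judgment step of the method can be illustrated with a minimal Python sketch. The function name `is_crawler`, the threshold values, and the sample counts below are hypothetical and chosen only for illustration; the application does not prescribe concrete values.

```python
def is_crawler(interval_counts, interval_length, ratio_threshold, count_threshold):
    """Return True when the number of intervals whose access ratio
    (access count / interval length) exceeds ratio_threshold is itself
    greater than count_threshold."""
    hot_intervals = sum(
        1
        for count in interval_counts.values()
        if count / interval_length > ratio_threshold
    )
    return hot_intervals > count_threshold


# Hypothetical per-user counters: interval number -> access count.
slow_crawler = {i: 300 for i in range(20)}  # 300 accesses in each of 20 intervals
normal_user = {0: 40, 1: 5}                 # activity confined to the first pages

print(is_crawler(slow_crawler, 1000, ratio_threshold=0.2, count_threshold=10))
print(is_crawler(normal_user, 1000, ratio_threshold=0.2, count_threshold=10))
```

In this sketch the first user has 20 intervals with a ratio of 0.3, which exceeds both assumed thresholds, while the second user has no interval above the ratio threshold.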

[0008] This application also provides an anti-crawling device, including:
a partitioning module, used to receive access requests, which point to the target interface and contain continuous parameters, and to divide the numerical range of the continuous parameters into multiple continuous intervals according to a preset interval length;
a recording module, used to record the interval to which the continuous parameter belongs each time the user who issued the access request accesses the target interface within a preset statistical period, and to accumulate the number of accesses for the corresponding interval;
a calculation module, used to calculate the user's access ratio in each interval at the end of the statistical period, wherein the access ratio is the ratio of the number of accesses in the corresponding interval to the length of the interval;
a processing module, used to count the number of intervals in which the user's access ratio exceeds a preset first ratio threshold and, if the number exceeds a preset second number threshold, to determine that the user is a web crawler and perform anti-crawling processing on the user.

[0009] This application also provides an electronic device, including: a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. The processor communicates with the memory via the bus. When the machine-readable instructions are executed by the processor, they perform the steps in the anti-crawling method provided in this application.

[0010] This application also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, causes the processor to implement the steps in the anti-crawling method provided in this application.

[0011] This application also provides a computer program product that stores instructions that, when executed by a computer, cause the computer to perform the steps in the anti-crawling method provided in this application.

[0012] The anti-crawling method provided in this application divides the numerical range of continuous parameters into multiple continuous intervals according to a preset interval length, records and counts the user's access ratio in each interval, and determines that the user is a crawler when the number of intervals where the access ratio exceeds a first ratio threshold exceeds a second number threshold. This effectively identifies low-speed crawlers that circumvent traditional rate limiting and short-term pattern detection by reducing access frequency and dispersing access intervals, achieving accurate identification and interception of malicious crawlers employing low-speed traversal strategies. This solution solves the technical problems of inaccurate detection or excessively high storage costs in existing technologies when dealing with such low-speed crawlers.

Attached Figure Description

[0013] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
Figure 1 is a flowchart of an anti-crawling method provided by an exemplary embodiment of this application;
Figure 2 is a flowchart of the anti-crawling method provided by an exemplary embodiment of this application as applied in a real-world scenario;
Figure 3 is a schematic diagram of an anti-crawling device provided by an exemplary embodiment of this application;
Figure 4 is a schematic diagram of the structure of an electronic device provided by an exemplary embodiment of this application.

Detailed Implementation

[0014] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0015] The following is a description of the terms used in this application: Low-speed web crawling (also known as slow-paced crawling) refers to a type of malicious web crawler that deliberately reduces the request frequency and extends the request interval to avoid triggering traditional anti-crawling mechanisms (such as limits on the number of requests per unit time). It gradually traverses and crawls system data in a low-noise manner by making long, dispersed requests to interfaces with continuous or auto-incrementing parameters (such as paginated queries or queries by ID). Its characteristic is that the request rhythm is close to that of a normal user, making it difficult to detect by pattern detection within a short time window.

[0016] Continuous parameters (also referred to as request parameters in the claims) are parameters in the target interface (such as paginated query or query by ID) whose values are continuous or can be traversed sequentially. Typical examples include page numbers, auto-incrementing IDs, or IDs containing auto-incrementing parts. These types of parameters are the main crawling path for data traversal crawlers.

[0017] Interval length (also known as single interval length L) refers to the preset numerical span of a single interval for segmenting and statistically analyzing the numerical range of a continuous parameter. For example, if the page number range is 1-100000 and the preset interval length is 1000, then this range can be divided into 100 intervals (1-1000, 1001-2000, etc.). Interval length is a core configuration parameter that determines the statistical granularity and recognition accuracy.

[0018] The statistical period (also known as the statistical time C) refers to the time range set for analyzing user access behavior, which is usually relatively long (e.g., 30 days). The system will statistically analyze user access behavior within each defined continuous parameter interval during this period to calculate the interval access ratio, thus adapting to the long-term behavioral characteristics of slow-speed crawlers.

[0019] As described in the background section, to address the data security risks and server resource pressures posed by automated web crawlers, existing technologies have proposed various anti-crawler solutions. Interface rate limiting is a common method, which limits the number of requests per unit time to resist high-concurrency, rapid crawling behavior. However, this solution has limited effectiveness against "slow-speed crawlers" or "slow-traversal crawlers" that intentionally reduce request frequency to evade threshold detection. These crawlers extend request intervals, making their behavior indistinguishable from normal users for a brief moment or short time window, making them difficult to identify with simple rate-limiting strategies.

[0020] To identify such slow traversal behavior, existing technologies have further proposed methods based on request pattern analysis. For example, within a preset time window (e.g., 5 minutes), the pagination parameters are analyzed to determine whether they exhibit a monotonically increasing or decreasing pattern, thus identifying whether it is a crawler traversal. The drawback of this method is that if the crawler extends its traversal period beyond the preset analysis time window, its regularity will be fragmented by the time window, leading to identification failure. In other words, this approach is insufficient to protect against crawlers with longer cycles and lower speeds.

[0021] Another, more precise approach is to record the entire crawling process, storing all page numbers or IDs visited by the crawler. By comparing historical records, it's possible to accurately determine whether the current request is a new traversal. While this method offers high accuracy, it requires storing and processing massive amounts of request parameter data, leading to a sharp increase in storage costs and computational overhead, making it difficult to implement in large-scale, high-concurrency production systems.

[0022] Therefore, how to effectively identify data traversal crawlers with long cycles and low speeds without relying on full data storage, and how to discover collaborative crawler groups, has become a pressing technical problem in this field.

[0023] To address the aforementioned problems in related technologies, this application provides a meteorological data processing method. It designs a radar echo extrapolation prediction model comprising a parallel spatial feature extraction subnetwork and a temporal feature extraction subnetwork. The extracted spatiotemporal features are cross-fused using interleaved connection units, and spatial resolution is further enhanced by a pixel recombination module. This method specifically overcomes the technical shortcomings of traditional methods, which suffer from feature difference cancellation and severe loss of spatiotemporal information due to the use of a unified network architecture for spatiotemporal feature extraction. This series of techniques works synergistically, enabling the model to capture the spatial distribution and continuous temporal evolution of radar echoes more precisely and robustly, significantly extending the effective prediction duration and overcoming the problem of a sharp drop in accuracy after the third or fourth prediction time in existing technologies. Finally, based on the generated high-precision, long-term radar echo extrapolation prediction results, the system can automatically generate and push meteorological risk warning information, directly triggering various business operations in the insurance business system from underwriting risk assessment to claims decision-making. This effectively solves the problems of insufficient meteorological risk prediction capabilities and lagging integration of insurance and meteorology in related technologies.

[0024] The technical solutions provided by the various embodiments of this application are described in detail below with reference to the accompanying drawings.

[0025] Figure 1 is a flowchart of an anti-crawling method provided by an exemplary embodiment of this application. As shown in Figure 1, the method includes:
Step 110: Receive an access request. The access request points to the target interface and contains continuous parameters. Based on a preset interval length, the numerical range of the continuous parameters is divided into multiple continuous intervals.

[0026] In some exemplary embodiments, the target interface is a pagination query interface, and the continuous parameter is the page number; or, the target interface is a query interface by ID, and the continuous parameter is an auto-incrementing ID.

[0027] Here, the target interface refers to the application programming interface (API) monitored and protected in the embodiments of this application, which is typically a service endpoint that allows users to query data. More specifically, in this application, it refers to query interfaces whose request parameters can be traversed.

[0028] Continuous parameters refer to request parameters in the target interface whose values form a continuous, ordered sequence. The two most common types are: (1) Page number: In paginated queries, it is used to specify which page of data to return, usually starting from 1 and incrementing page by page; (2) Auto-incrementing ID: A unique identifier assigned to a record in the database, whose ID value automatically increments each time a new record is added. Continuous parameters provide a clear crawling path for data traversal crawlers.

[0029] The interval length (L) is a preset configuration parameter of the core algorithm in this application embodiment, representing the numerical span of a single statistical interval. It determines how many statistical units the continuous parameter range is divided into, directly affecting the granularity of statistics and the sensitivity of detection. For example, L=1000 means that every 1000 consecutive parameter values are grouped into the same statistical interval.

[0030] The total number of parameters refers to the total number of possible values for consecutive parameters in the target interface. For example, if the system has a total of 100,000 pages of data, then the total number of page number parameters is 100,000.

[0031] In practice, the request URL or parameter body can be parsed to identify the interface type and extract key parameters.

[0032] For paginated query interfaces: the key is to extract parameter names and their integer values, such as page=5 and pageNo=12, from the request.

[0033] For the ID-based query interface: the key implementation is to extract parameters such as id=1000345 and policyNo='BJ20240012345' from the request. For the latter, although the policy number may contain letters and province codes, its last part is often a continuous auto-incrementing number sequence. Appropriate parsing rules (such as regular expressions) need to be implemented to extract the auto-incrementing number part for range calculation.
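The parameter extraction described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the parameter names come from the examples in the text, while the policy number layout (two letters, a four-digit year, then the auto-incrementing sequence) is a hypothetical format assumed for the regular expression; a real system would need rules matching its own numbering scheme.

```python
import re
from urllib.parse import parse_qs, urlparse

# Parameter names taken from the examples in the text.
NUMERIC_PARAMS = ("page", "pageNo", "id")

# ASSUMPTION: policy numbers look like "BJ" + 4-digit year + sequence number.
POLICY_NO = re.compile(r"^[A-Z]{2}\d{4}(\d+)$")


def _to_int(text):
    """Convert a plain ASCII digit string to int, otherwise return None."""
    return int(text) if text.isascii() and text.isdigit() else None


def extract_continuous_parameter(url):
    """Return the integer value of the continuous parameter in url, or None."""
    query = parse_qs(urlparse(url).query)
    for name in NUMERIC_PARAMS:
        if name in query:
            return _to_int(query[name][0])
    if "policyNo" in query:
        match = POLICY_NO.match(query["policyNo"][0])
        if match:
            return int(match.group(1))  # auto-incrementing part only
    return None


print(extract_continuous_parameter("https://example.invalid/list?page=5"))
print(extract_continuous_parameter("https://example.invalid/detail?policyNo=BJ20240012345"))
```

Only the trailing sequence of the policy number is used for range calculation, so the province code and year do not distort the interval mapping.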

[0034] First, an access request is captured. The core operation is parameter range intervalization. During implementation, the system needs to preload or configure two key values: the maximum value (or total number) of continuous parameters for the target interface, and the interval length L. Through a simple division operation (total number of parameters / L), the system logically creates an interval map covering the entire parameter range. This map does not physically store all parameters, but rather defines the framework for subsequent statistics. For example, for a product list interface with product IDs ranging from 1 to 1,000,000, setting L=10,000 will create 100 logical intervals. This step lays the foundation for subsequent interval-based low-frequency behavior statistics.

[0035] For example, the system receives access requests and divides the range of continuous parameters into multiple continuous intervals based on a preset interval length. The specific implementation process may include: receiving an access request from a user pointing to a target interface. The target interface is typically a query interface containing continuous parameters, such as a pagination query interface (where the continuous parameter is the page number) or an ID-based query interface (where the continuous parameter is an auto-incrementing ID such as the policy number). After receiving the request, an optional verification sub-step can be executed: determining whether the access path of the request (i.e., the order of page or interface calls the user goes through from login to initiating the current request) conforms to the preset normal operation order (e.g., the normal policy query path is "Home -> Personal Center -> Policy List -> Policy Details"). If the access path does not conform to the normal order (e.g., the user directly requests the pagination interface or details interface of the policy list), the system can initially determine that the user is a web crawler and directly execute anti-crawling measures, ending the process. If the access path conforms to the normal order, or the system has not enabled this path verification, the core processing flow is entered. The system divides the entire range of values of the continuous parameters into multiple continuous, non-overlapping value intervals according to the preset interval length (e.g., L=1000) and the formula: number of intervals = total number of continuous parameters / interval length L. For example, 100,000 pages are divided into interval 0: 1-1000, interval 1: 1001-2000, ..., interval 99: 99001-100000.
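The interval division can be sketched as below. The function name `build_intervals` is hypothetical, and rounding the interval count up when the range is not an exact multiple of L is an assumption, since the text only gives the plain division. Only the interval bounds are generated, not the individual parameter values.

```python
import math


def build_intervals(first_value, last_value, interval_length):
    """Return (low, high) bounds of consecutive, non-overlapping intervals."""
    total = last_value - first_value + 1
    # ASSUMPTION: a partial final interval is kept (ceiling division).
    number_of_intervals = math.ceil(total / interval_length)
    return [
        (
            first_value + i * interval_length,
            min(first_value + (i + 1) * interval_length - 1, last_value),
        )
        for i in range(number_of_intervals)
    ]


intervals = build_intervals(1, 100000, 1000)
print(len(intervals), intervals[0], intervals[1], intervals[99])
```

For the page range 1-100000 and L=1000 this yields 100 intervals whose bounds match the example in the text.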

[0036] In some exemplary embodiments, after receiving the access request, the method further includes: determining whether the access path of the access request conforms to the preset normal operation sequence; if it does not conform, directly identifying the user as a web crawler and taking anti-crawling measures; if it conforms, continuing with the step of dividing the intervals according to the preset interval length.

[0037] The access path refers to the logical order in which a user accesses different pages or calls different interfaces within a complete session. It reflects the user's operational logic, and normal users typically follow the business processes designed in the product.

[0038] The normal operational sequence is a series of steps that legitimate users are most likely to follow, defined based on business logic and user experience. For example, before querying policy details, users typically need to log in, enter their personal center, and view the policy list. This sequence serves as the benchmark for identifying direct and abnormal access.
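A minimal sketch of such a path check is given below. The path prefixes and the rule that every earlier stage must already appear in the session history are assumptions made for illustration; an actual deployment would define its own sequence and matching rules.

```python
# ASSUMPTION: hypothetical path prefixes for Home -> Personal Center
# -> Policy List -> Policy Details.
NORMAL_SEQUENCE = ["/index", "/profile", "/list", "/detail"]


def stage_of(path):
    """Return the position of path in the normal sequence, or None."""
    for index, prefix in enumerate(NORMAL_SEQUENCE):
        if path.startswith(prefix):
            return index
    return None


def path_is_normal(visited_paths, requested_path):
    """A request is normal if every earlier stage was already visited."""
    target = stage_of(requested_path)
    if target is None:
        return True  # path is not part of the monitored sequence
    reached = {stage_of(path) for path in visited_paths}
    return all(stage in reached for stage in range(target))


print(path_is_normal(["/index", "/profile"], "/list?page=3"))
print(path_is_normal([], "/list?page=3"))
```

A request that jumps straight to the list or detail interface without the preceding stages fails the check and can be handled before any interval statistics are computed.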

[0039] Specifically, upon receiving a request, this embodiment does not immediately perform interval division, but instead first invokes a path analysis module. This module can check the Referer field in the HTTP request header, historical access nodes recorded in the session, or specially maintained user behavior event logs. The system internally predefines a valid path state machine or regular expression rule set (e.g., /index -> /profile -> /list* -> /detail*). If the source of the current request (e.g., the source URL or the previous operation) is not within the allowed state set, a path exception is triggered. This embodiment can efficiently intercept lazy crawlers that do not simulate user interaction and directly attack data interfaces, reducing the computational pressure of subsequent complex statistics. Therefore, before entering time-consuming interval statistics, it uses common sense in business logic for rapid filtering, immediately blocking a batch of the most primitive crawler attacks, improving the overall system protection efficiency and response speed.

Step 120: Record the interval to which the continuous parameters belong each time the user who issued the access request accessed the target interface within the preset statistical period, and accumulate the access count of the corresponding interval.

[0040] Upon receiving an access request, the first step is to determine the entity dimension for behavior tracking. In this embodiment, the request source IP address is used as the user's identifier dimension (of course, in other embodiments, a logged-in user account can also be used). Subsequently, based on a preset interval length L, the numerical range of the pagination parameter [1, 100000] is divided into 100 consecutive intervals. For this IP address, the interval to which the pagination parameter belongs in each access request can be recorded and accumulated on the corresponding interval access count counter. At the end of the statistical period C, for this IP address, its access ratio in each interval is calculated... If the judgment condition is met, the entity corresponding to the IP address is determined to be a crawler, and corresponding anti-crawling measures are executed.

[0041] Specifically, firstly, request listening and parameter extraction are performed. This can be done through an API gateway, application middleware, or server log analysis module to listen for or process requests to the target interface in real time or near real time. After capturing the request, the specific values of the continuous parameters are parsed out according to predefined rules (such as from the query string, request body, or URL path).

[0042] Secondly, user identifier extraction is performed, specifically extracting information as an identifier dimension from the same request. If the dimension is an IP address, it is extracted from network packets or request headers (such as X-Forwarded-For); if the dimension is a user account, it is parsed from the session or token.
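The identifier extraction can be sketched as follows. The function signature and the plain dictionary of headers are hypothetical. Taking the leftmost X-Forwarded-For entry as the client address is a common convention, but the header can be set by the client itself, so this sketch assumes it is populated by a trusted proxy.

```python
def extract_user_identifier(headers, remote_addr, account=None, dimension="ip"):
    """Return the identifier used to key per-user statistics."""
    if dimension == "account":
        return account  # parsed elsewhere from the session or token
    # ASSUMPTION: a trusted proxy populates X-Forwarded-For.
    forwarded = headers.get("X-Forwarded-For", "")
    first_hop = forwarded.split(",")[0].strip()
    return first_hop or remote_addr


print(extract_user_identifier({"X-Forwarded-For": "203.0.113.7, 10.0.0.2"}, "10.0.0.2"))
print(extract_user_identifier({}, "198.51.100.9"))
```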

[0043] Next, perform rapid interval positioning. Specifically, utilize the interval mapping relationship pre-calculated in step 110, and quickly determine the interval number to which the parameter value belongs through an efficient mathematical operation (usually an integer division operation: interval number = floor((parameter value - parameter initial value) / interval length L)). For example, if page = 1500, the initial value is 1, and L = 1000, then floor((1500-1) / 1000) = 1.
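The interval positioning formula translates directly into integer division. The function name `interval_index` and the range check are illustrative additions, not part of the described method.

```python
def interval_index(value, initial_value, interval_length):
    """interval number = floor((value - initial_value) / interval_length)"""
    if value < initial_value:
        raise ValueError("parameter value is below the start of the range")
    return (value - initial_value) // interval_length


print(interval_index(1500, 1, 1000))  # the example from the text
```

With an initial value of 1 and L=1000, pages 1 to 1000 map to interval 0 and pages 1001 to 2000 map to interval 1, consistent with the interval bounds defined in step 110.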

[0044] Finally, the count is accumulated and stored, specifically using a composite key of user ID and interval number. An atomic read-increment-write-back operation is performed in the storage system. To improve performance, in high-concurrency scenarios, a memory cache (such as a Redis hash) can be used for real-time accumulation, followed by periodic persistence to the database. All counts are valid only within the preset statistical period C. The system needs to periodically (e.g., daily) clean up old data exceeding the time window to control storage growth.
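The accumulation and cleanup can be sketched with an in-memory stand-in for the cache. The class `IntervalCounter` is hypothetical, a lock replaces the atomic increment a store such as Redis would provide, and adding the day to the key is an assumption introduced here so that expired data can be purged; the text itself specifies only a composite key of user ID and interval number.

```python
import threading
from collections import defaultdict
from datetime import date, timedelta


class IntervalCounter:
    """In-memory stand-in for a cache keyed by (day, user, interval)."""

    def __init__(self, interval_length, initial_value=1, period_days=30):
        self.interval_length = interval_length
        self.initial_value = initial_value
        self.period_days = period_days
        self._counts = defaultdict(int)
        self._lock = threading.Lock()  # makes read-increment-write atomic

    def record(self, user_id, value, day):
        interval = (value - self.initial_value) // self.interval_length
        with self._lock:
            self._counts[(day, user_id, interval)] += 1

    def purge(self, today):
        """Drop counters that fall outside the statistical period."""
        cutoff = today - timedelta(days=self.period_days)
        with self._lock:
            for key in [k for k in self._counts if k[0] <= cutoff]:
                del self._counts[key]

    def totals(self, user_id):
        """Sum a user's counts per interval over all retained days."""
        result = defaultdict(int)
        with self._lock:
            for (_, uid, interval), count in self._counts.items():
                if uid == user_id:
                    result[interval] += count
        return dict(result)


counter = IntervalCounter(interval_length=1000)
counter.record("203.0.113.7", 1500, date(2026, 1, 1))
counter.record("203.0.113.7", 1499, date(2026, 1, 2))
counter.record("203.0.113.7", 5, date(2026, 1, 2))
print(counter.totals("203.0.113.7"))
```

Storage grows with the number of users, intervals, and retained days rather than with the number of distinct parameter values requested.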

[0045] In some exemplary embodiments, the user identification dimension is either an IP address or a user account.

[0046] As an example, the behavior of users who issue access requests can be continuously tracked and recorded within a preset statistical period (e.g., 30 days). The user's identifier can be their IP address from the network or their logged-in user account in the system. Whenever a user initiates an access to the target interface, the following operations are performed: First, extract the specific value of the continuous parameters from the request (e.g., page number page=1500); then, based on the interval rules defined in step 110, quickly calculate the interval number to which the parameter value belongs (e.g., for an interval length L=1000, the value 1500 falls within interval number 1, corresponding to the parameter range 1001-2000); finally, find the counter corresponding to the interval number in the user's behavior record table and increment its access count by 1. This process continues within the statistical period C, dynamically maintaining a data snapshot reflecting the distribution of each user's access preferences.

[0047] By transforming records of accessing specific parameter values into counts of access parameter intervals, significant data compression is achieved. Storage overhead is reduced from being proportional to the number of requests multiplied by parameter precision to being proportional to the number of users multiplied by the number of intervals, which is crucial for handling massive data query scenarios. Furthermore, this step doesn't concern itself with whether a user accesses page=1501 or page=1499, only that they belong to the same interval. This ensures that even if the crawler introduces minor random perturbations in its crawling order (such as skipping a few numbers), as long as its activity remains concentrated within certain intervals, it cannot escape being recorded and counted, enhancing the method's robustness. The interval-distributed list of access counts maintained for each user (IP or account) constitutes a lightweight behavioral profile. This profile characterizes the concentration and range of a user's exploration in the parameter space, serving as the data basis for distinguishing normal users (concentrated in the first few intervals) from traversing crawlers (dispersed across multiple intervals). Finally, by introducing a statistical period C, this method can accumulate and analyze user behavior data spanning several days or even weeks. This makes it possible to identify extremely slow crawlers that crawl only a small number of pages per day but persist for a long time, filling the blind spot of traditional short-window detection schemes.

[0048] Step 130: At the end of the statistical period, calculate the user's access ratio in each interval. The access ratio is the ratio of the number of visits in the corresponding interval to the length of the interval.

[0049] Specifically, at the end of a preset statistical period (C, e.g., 30 days), or during a scheduled statistical analysis task executed at a fixed frequency (e.g., daily), for each tracked user (identified by IP address or account), the system reads the accumulated access count data for each interval in step 120. For each interval visited by the user, the system calculates an access ratio. The formula for calculating the access ratio is: Access Ratio = Number of Accesses in the Interval / Interval Length L. For example, for an interval with a length L = 1000, if the system records 500 accesses from a certain IP address within that interval, the calculated access ratio for that interval is 500 / 1000 = 0.5. This ratio quantifies the intensity of a user's accesses within a specific parameter interval.

[0050] The access ratio (a_i / L) is the core metric defined in this invention and is used to normalize and quantify the intensity of user access behavior. Its numerator a_i is the total number of visits by a user within the i-th interval, and L is the length of that interval (i.e., the number of consecutive parameter values it contains). This ratio removes the influence of interval size on the absolute visit count, making visit density comparable between intervals of different lengths. For example, the same 100 visits give a ratio of 1.0 (very dense) in an interval of length L=100, but only 0.01 (very sparse) in an interval of length L=10000.

[0051] For example, step 130 can be triggered by a task scheduler, such as a scheduled task (cron job) set to execute daily at 1 AM. After the task starts, its logical statistical period is fixed (e.g., "the last 30 days"), and the system automatically calculates the start and end times of the period from the current time. For all target interfaces configured in the system and their corresponding interval division rules, the task iterates through all user identifiers (IPs or accounts) with activity records within the statistical period C. For each user, the number of accesses a_i in each interval is read from the storage system (e.g., a database or cache), where i is the interval number. Finally, for each user, every interval i is visited (from 0 to N-1, where N is the total number of intervals) and a floating-point or high-precision division is performed: ratio_i = a_i / L, where L is a pre-configured constant. The calculated ratio_i is the user's access ratio in the i-th interval.

[0052] For example, with configuration parameters total pages = 100000, L = 1000, and N = 100: for user IP_A, if the number of visits in interval 0 is a_0 = 1200, then ratio_0 = 1200 / 1000 = 1.2; and if the number of visits in interval 50 is a_50 = 50, then ratio_50 = 50 / 1000 = 0.05.
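
As a non-limiting illustration, the ratio calculation of step 130 can be sketched in Python as follows. This is only a sketch under stated assumptions, not the applicant's implementation: the function name `access_ratios` and the dictionary layout (interval number mapped to accumulated count) are hypothetical; only L, N, a_i and ratio_i come from the description.

```python
def access_ratios(counts, interval_length, num_intervals):
    """Return ratio_i = a_i / L for every interval i in 0..N-1.

    counts maps interval number -> accumulated access count a_i
    (hypothetical layout); intervals without a record count as 0.
    """
    if interval_length <= 0:
        raise ValueError("interval_length must be positive")
    return [counts.get(i, 0) / interval_length for i in range(num_intervals)]


# Figures from the example in the text: L = 1000, N = 100, a_0 = 1200, a_50 = 50.
ratios = access_ratios({0: 1200, 50: 50}, interval_length=1000, num_intervals=100)
print(ratios[0], ratios[50])  # 1.2 0.05
```

Storing only the intervals that were actually visited keeps the per-user record sparse; unvisited intervals are treated as having a ratio of 0.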

[0053] The calculated set of access ratios {ratio_0, ratio_1, ..., ratio_{N-1}} can be temporarily stored in memory for further judgment, or selectively persisted to the database as a derived feature of user behavior profile for subsequent analysis or auditing.

[0054] For slow-speed crawlers, although the access frequency per unit time is low, the accesses must spread across many intervals, evenly or in a scattered pattern, in order to traverse the data. After the calculation in step 130, the access ratio of each accessed interval reveals the intention to cover that interval far more clearly than the sparse absolute access frequency does. Even if only 30% of the parameters in a certain interval are accessed (ratio 0.3), this density is too high for a normal user who aims to browse rather than traverse. Moreover, since this calculation is based on cumulative data over a long statistical period C, the result reflects the user's long-term behavioral pattern rather than instantaneous bursts. This filters out the occasional intensive operations of normal users (such as rapid page turning while searching for information), while ensuring that the long-term, slow traversal behavior of extremely slow crawlers is captured through the cumulative ratio, improving both the comprehensiveness of detection and the resistance to evasion.

[0055] Step 140: Count the number of intervals in which the user's access ratio exceeds the preset first ratio threshold. If that number exceeds the preset second number threshold, the user is determined to be a web crawler, and anti-crawling measures are taken against the user.

[0056] The first ratio threshold (A) serves as a baseline value for determining whether access behavior within a single interval is abnormally concentrated. When a user's access ratio_i ≥ A within a certain interval, their behavior in that interval is considered to have a potential traversal intention. The threshold A is typically set to a value much higher than the proportion that a normal user might accidentally reach when accessing that interval.

[0057] The second number threshold (B) serves as a comprehensive quantitative benchmark for determining whether the overall user behavior constitutes crawler activity. A final determination is made only when the number of intervals exhibiting abnormally intensive user activity (access ratio ≥ A) reaches or exceeds B. This threshold distinguishes normal users who are active in a few places from traversing crawlers that are active in many places.

[0058] In some exemplary embodiments, in order to avoid misjudging normal active users (such as customer service staff, operations personnel, or enthusiastic users) who frequently access the homepage or the latest data as web crawlers, the first interval is excluded when counting the number of intervals in which the user's access ratio exceeds the preset first ratio threshold.

[0059] In a preferred embodiment, to improve accuracy, the first interval (usually corresponding to the homepage or starting data) can be excluded from this statistic because its visit volume may also be high among normal users. The determination formula is as follows:

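[0060] $$\mathrm{count} = \sum_{i=1}^{N-1} \mathbf{1}\left(\mathrm{ratio}_i \ge A\right)$$ where ratio_i is the user's access ratio in the i-th interval, N is the total number of intervals, and the sum starts at i = 1 because the first interval (interval 0) is excluded. If count ≥ B, then the user is considered a web crawler.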

[0061] In some exemplary embodiments, to achieve comprehensive detection of and countermeasures against organized web crawler attacks, after determining that a user is a web crawler, the method further includes: when at least two users are determined to be crawlers, determining whether there is a complementary relationship between the sets of intervals in which each user's access ratio exceeds the first ratio threshold. If a complementary relationship exists, and the proportion of the union of all complementary intervals relative to the total number of intervals exceeds a preset third ratio threshold, then the at least two users are determined to constitute a crawler group, and collaborative anti-crawling measures are taken against the crawler group.

[0062] The complementary relationship refers to a relationship between two or more sets, where no two sets have any common elements (the intersection is empty). In this invention, it specifically refers to the fact that the abnormal access interval sets of different crawler users do not overlap, and each crawls different parameter intervals.

[0063] The third proportion threshold (D): This threshold is used to determine coordinated attacks. It represents the minimum proportion of the total range of parameters that multiple crawlers cover together in a complementary manner. Exceeding this threshold means that the attackers, through division of labor and cooperation, have posed a systematic threat of traversing a significant portion of the target data.

[0064] Crawler swarm: refers to multiple users independently identified as crawlers whose attack behaviors exhibit cooperative and complementary characteristics in the parameter space. It can be reasonably inferred that they are directed by the same controller to jointly complete a large-scale data crawling task.

[0065] Collaborative anti-crawling measures: Unlike the independent handling of a single crawler, this refers to coordinated, upgraded, or scoped protective measures taken against multiple crawlers identified as belonging to the same group.

[0066] For users identified as web crawlers, the system will take anti-crawling measures, such as limiting their access frequency, returning a verification code, or directly blocking their requests.

[0067] Furthermore, after at least two independent users have been identified as web crawlers, a sub-step for detecting coordinated attacks by a web crawler group can be executed: analyze the set of intervals in which the access ratio of each of these crawler users exceeds the threshold A, and determine whether there is a complementary relationship between these sets (i.e., the sets are mutually exclusive and have no overlapping intervals). If a complementary relationship exists, calculate the total number of intervals contained in the union of all these complementary sets, and calculate the proportion of this total to the total number of intervals N. If this proportion exceeds a preset third proportion threshold D (e.g., 25%), these users are determined to constitute a web crawler group. The system then implements coordinated anti-crawling measures against the group, for example, adding the identifiers (IP or account) of all users in the group to a high-level blacklist, or triggering a protection upgrade for the entire IP range to which the group belongs in response to any subsequent attack attempt by any user in the group.

[0068] Figure 2 is a schematic diagram illustrating the application of the anti-crawling method provided in the exemplary embodiments of this application in a real-world scenario. The technical solution provided in the embodiments of this application is described in detail below in conjunction with Figure 2, taking the policy query scenario of an insurance business system as an example. This embodiment addresses an anti-crawling method targeting slow-speed crawler data traversal. As shown in Figure 2, the method mainly includes the steps of parameter range division, access behavior recording, access ratio calculation, crawler determination, and crawler group identification. Each step is detailed below in a specific scenario.

[0069] The target system is an online policy query system of an insurance company. The protected interface is the policy list pagination query interface (e.g., /api/policy/list). Interface parameters: the page number parameter `page` is a continuously incrementing integer. The system has a total of 100,000 pages of data, meaning the value of the `page` parameter ranges from 1 to 100,000. Attack simulation: an attacker might use multiple proxy IP addresses or user accounts to access this interface at an extremely low frequency (e.g., only a few dozen requests per hour), attempting to slowly crawl the entire policy list data without triggering regular anti-crawling rules based on request frequency.

[0070] To implement this method, the system is preset with the following key parameters: length of a single interval L = 1000; first ratio threshold A = 0.4 (threshold for judging abnormal density in a single interval); second number threshold B = 5 (threshold for the number of abnormally dense intervals); statistical period C = 30 days; third ratio threshold D = 25% (threshold for identifying crawler groups).

Step 1: Parameter range division. First, the system logically divides the continuous parameter of the target interface that needs protection into intervals. In this embodiment, this means processing the page parameter of the paginated query interface.

[0071] Based on a preset interval length L=1000, the total number of page parameters (100,000) is divided into 100 continuous and non-overlapping numerical intervals. The division follows the formula: Number of intervals = Total number of parameters / Interval length L. The specific division results are as follows: Interval 0: page values 1 ~ 1000, Interval 1: page values 1001 ~ 2000, Interval 2: page values 2001 ~ 3000..., Interval 99: page values 99001 ~ 100000. This maps a massive number of discrete parameter values (100,000) to a limited number of logical units (100 intervals), laying the foundation for low-cost analysis that does not require storing the full request trajectory.
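
As a non-limiting illustration, the division of step 1 can be sketched in Python as follows. The function name `interval_bounds` is hypothetical, and the handling of a final, shorter interval when the total is not a multiple of L is an assumption; the description only covers the evenly divisible case.

```python
def interval_bounds(total_values, interval_length):
    """Split parameter values 1..total_values into consecutive intervals.

    Returns a list of (first_value, last_value) pairs; index i is interval i.
    If total_values is not a multiple of L, the last interval is shortened
    (an assumption not stated in the description).
    """
    if total_values < 1 or interval_length < 1:
        raise ValueError("total_values and interval_length must be >= 1")
    # Integer ceiling division gives the number of intervals.
    count = (total_values + interval_length - 1) // interval_length
    return [
        (i * interval_length + 1, min((i + 1) * interval_length, total_values))
        for i in range(count)
    ]


bounds = interval_bounds(100_000, 1000)
print(len(bounds), bounds[0], bounds[1], bounds[99])
# 100 (1, 1000) (1001, 2000) (99001, 100000)
```

In practice the bounds need not be materialized at all, since the interval number of any value can be computed directly from L; the list is shown here only to make the mapping explicit.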

[0072] Step 2: Interval access recording. When a user (identified by IP address in this example) initiates an access request to the policy list interface, the system records it in real time.

[0073] For each access request, the following operations are performed. First, extract the identifier and parameter: extract the source IP address (e.g., 192.168.1.100) and the page value (e.g., page=1500) from the request. Second, calculate the interval: calculate the interval number to which the page value belongs according to the partitioning rules of step 110. The calculation formula is: interval number = floor((page value - 1) / L). Taking page=1500 as an example, floor((1500-1) / 1000) = 1, so the request belongs to interval 1. Finally, accumulate the access count: in the system's storage structure (e.g., a hash table with IP:interval number as the key), perform an atomic increment on the access counter of the interval that was hit (interval 1). This recording process continues within the preset statistical period C (30 days), dynamically maintaining an access count distribution for each IP address that reflects its activity in each parameter interval.
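
As a non-limiting illustration, the recording of step 2 can be sketched in Python as follows. The names `interval_number`, `record_access` and `access_counts` are hypothetical, and the in-memory dictionary is an assumption standing in for the hash table mentioned above; a deployed system would more likely use a shared cache or database that offers an atomic increment.

```python
from collections import defaultdict

INTERVAL_LENGTH = 1000  # L from the embodiment

# Hypothetical in-memory store: (user identifier, interval number) -> count.
access_counts = defaultdict(int)


def interval_number(page, interval_length=INTERVAL_LENGTH):
    """Interval number = floor((page - 1) / L) for 1-based page values."""
    if page < 1:
        raise ValueError("page must be >= 1")
    return (page - 1) // interval_length


def record_access(user_id, page):
    """Increment the counter of the interval hit by this request."""
    idx = interval_number(page)
    access_counts[(user_id, idx)] += 1
    return idx


record_access("192.168.1.100", 1500)  # interval 1
record_access("192.168.1.100", 1501)  # interval 1 again
record_access("192.168.1.100", 1000)  # interval 0 (upper boundary)
print(dict(access_counts))
# {('192.168.1.100', 1): 2, ('192.168.1.100', 0): 1}
```

Subtracting 1 before the integer division is what places page 1000 in interval 0 and page 1001 in interval 1, matching the 1-based interval boundaries of the embodiment.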

[0074] Step 3: Access ratio calculation. An analysis task is started periodically (e.g., daily at midnight) to process the data recorded within the statistical period and calculate the core metric, the access ratio. For each recorded IP address (e.g., 192.168.1.100), the cumulative access counts for the past 30 days in intervals 0 to 99 are first read, denoted a_0, a_1, ..., a_99. Then, for each interval i, the access ratio is calculated as ratio_i = a_i / L. This ratio reflects the average access density of the IP within the corresponding parameter interval (containing L consecutive parameter values). For example, if the cumulative access count of IP 192.168.1.100 in interval 1 is a_1 = 420, then its access ratio in that interval is ratio_1 = 420 / 1000 = 0.42. If the cumulative access count in interval 50 is a_50 = 15, then ratio_50 = 15 / 1000 = 0.015. Calculating the access ratio yields a normalized measure of access behavior within intervals of different lengths, making the behavior density across intervals comparable and providing a standardized input for the subsequent threshold determination.

[0075] Step 4: Crawler identification and handling. Based on the calculated access ratios, a dual-threshold rule is applied to identify crawlers. This includes single-interval density screening, which compares the access ratio ratio_i of the IP address in each interval i with the first ratio threshold A (0.4) and marks the interval as abnormally dense if ratio_i >= 0.4; and multi-interval breadth determination, which counts the total number of intervals marked as abnormally dense for the IP address, denoted count. Finally, count is compared with the second number threshold B (5). If count >= B, the entity behind the IP address is determined to be a low-speed data traversal crawler.

[0076] For example, consider normal user behavior: customer service personnel frequently query the latest policies, so their access is highly concentrated in interval 0 (the homepage), where ratio_0 might reach as high as 2.0, far exceeding the threshold of 0.4. However, their access ratio in all other intervals is extremely low. Therefore, their count value is usually 1 (assuming only interval 0 exceeds the threshold), which is less than 5, and the system classifies them as normal users.

[0077] Slow-speed crawling behavior: The attacking IP 192.168.1.100 slowly crawled data over 30 days. Analysis revealed that its access ratios in five intervals (1, 15, 28, 42, and 56) were 0.42, 0.45, 0.41, 0.47, and 0.40 respectively, all reaching or exceeding 0.4. Therefore, count = 5, satisfying the condition count>= B, and the system determined that this IP was a crawler.
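
As a non-limiting illustration, the dual-threshold rule of step 4 can be sketched in Python as follows. The function names and the sparse dictionary of ratios are hypothetical, and the value 0.02 for the customer service user is an illustrative stand-in for an "extremely low" ratio; the thresholds A = 0.4 and B = 5 and the crawler's five ratios are those of the embodiment.

```python
def abnormal_intervals(ratios, ratio_threshold, exclude_first=False):
    """Interval numbers whose access ratio reaches the first threshold A.

    ratios maps interval number -> ratio_i (hypothetical sparse layout).
    With exclude_first=True, interval 0 is ignored.
    """
    start = 1 if exclude_first else 0
    return {i for i, r in ratios.items() if i >= start and r >= ratio_threshold}


def is_crawler(ratios, ratio_threshold, count_threshold, exclude_first=False):
    """Dual-threshold rule: at least B intervals with ratio_i >= A."""
    count = len(abnormal_intervals(ratios, ratio_threshold, exclude_first))
    return count >= count_threshold


A, B = 0.4, 5
service_agent = {0: 2.0, 3: 0.02}  # 0.02 is an illustrative value
slow_crawler = {1: 0.42, 15: 0.45, 28: 0.41, 42: 0.47, 56: 0.40}

print(is_crawler(service_agent, A, B))  # False (count = 1)
print(is_crawler(slow_crawler, A, B))   # True  (count = 5)
```

Passing `exclude_first=True` drops interval 0 from the count, which lowers the customer service user's count from 1 to 0 while leaving the crawler's count unchanged.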

[0078] For IP addresses identified as web crawlers, anti-crawling strategies are automatically triggered, such as limiting the access frequency of subsequent requests, requiring human verification (such as entering a verification code), or directly denying service.

[0079] Preferably, when calculating the count, the first interval (interval 0) can be excluded. This is because the homepage interval carries a large number of repeated visits from normal users, and excluding it can effectively reduce the false positive rate, allowing the model to focus more on identifying abnormal traversal behavior in non-popular data areas.

[0080] Step 5: Crawler group identification. To counter organized, distributed web crawler attacks, after multiple independent crawlers have been identified, further correlation analysis can be performed to identify crawler groups. First, a candidate set is obtained: the system collects the IP addresses (e.g., IP_A, IP_B, IP_C) identified as crawlers in step 140 within a time window (e.g., the past hour), together with their corresponding sets of abnormally dense intervals. Second, complementarity analysis is performed: the relationships between these sets are analyzed. If the sets have no overlap, i.e., they satisfy a complementary relationship, the next calculation step is performed. For example: abnormal intervals for IP_A: intervals 11-20; abnormal intervals for IP_B: intervals 21-35; abnormal intervals for IP_C: intervals 36-50.

[0081] Next, coverage is calculated: the total number of intervals covered by the union of all these complementary intervals is calculated. Taking the above example, the union covers intervals 11 to 50, a total of 40 intervals. The total number of intervals in the system is 100. Then, the crawler group is determined: the union coverage rate is calculated: Coverage rate = Number of union intervals / Total number of intervals = 40 / 100 = 40%. This coverage rate is compared with a preset third proportion threshold D (25%). If the coverage rate exceeds D, IP_A, IP_B, and IP_C are determined to constitute a collaborative crawler group. Finally, coordinated action is taken: for the identified crawler group, upgraded protection measures are implemented, such as adding all IP addresses within the group and their potentially associated IP ranges to a higher-priority blacklist, or implementing a unified and stricter access restriction policy for the group, thereby dismantling distributed crawling attacks from a higher level.
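
As a non-limiting illustration, the complementarity and coverage checks of step 5 can be sketched in Python as follows. The function name `is_crawler_group` is hypothetical, and the sketch makes a simplifying assumption: it requires every pair of candidates to be non-overlapping, whereas a deployed system might instead search for complementary subsets among the candidates.

```python
from itertools import combinations


def is_crawler_group(abnormal_sets, total_intervals, coverage_threshold):
    """Apply the complementary-coverage rule to users already flagged as crawlers.

    abnormal_sets maps user identifier -> set of abnormally dense intervals.
    Returns (is_group, coverage).
    """
    if len(abnormal_sets) < 2:
        return False, 0.0
    # Complementary relationship: no two users share an abnormal interval.
    for a, b in combinations(abnormal_sets.values(), 2):
        if a & b:
            return False, 0.0
    union = set().union(*abnormal_sets.values())
    coverage = len(union) / total_intervals
    return coverage > coverage_threshold, coverage


candidates = {
    "IP_A": set(range(11, 21)),  # intervals 11-20
    "IP_B": set(range(21, 36)),  # intervals 21-35
    "IP_C": set(range(36, 51)),  # intervals 36-50
}
print(is_crawler_group(candidates, total_intervals=100, coverage_threshold=0.25))
# (True, 0.4)
```

With the figures of the embodiment, the union covers 40 of 100 intervals, so the coverage of 40% exceeds D = 25% and the three IP addresses are treated as one group.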

[0082] Through the above steps, this invention provides a lightweight anti-crawling solution that can effectively identify and defend against low-speed, long-cycle data traversal crawlers and their coordinated attacks.

[0084] Figure 3 is a schematic diagram of the structure of an anti-crawling device 300 provided as an exemplary embodiment of this application. As shown in Figure 3, the device 300 includes: a partitioning module 310, used to receive an access request, the access request pointing to a target interface and containing continuous parameters, and to divide the numerical range of the continuous parameters into multiple continuous intervals according to a preset interval length; a recording module 320, used to record the interval to which the continuous parameter belongs each time the user who issued the access request accesses the target interface within a preset statistical period, and to accumulate the number of accesses for the corresponding interval; a calculation module 330, used to calculate the access ratio of the user in each interval at the end of the statistical period, wherein the access ratio is the ratio of the number of accesses in the corresponding interval to the length of the interval; and a processing module 340, used to count the number of intervals in which the user's access ratio exceeds a preset first ratio threshold and, if the number exceeds a preset second number threshold, to determine that the user is a web crawler and perform anti-crawling processing on the user.

[0085] The anti-crawling device 300 provided in this application divides the numerical range of continuous parameters into multiple continuous intervals according to a preset interval length, records and calculates the user's access ratio in each interval, and determines the user to be a crawler when the number of intervals in which the access ratio exceeds a first ratio threshold exceeds a second number threshold. This effectively identifies low-speed crawlers that circumvent traditional rate limiting and short-term pattern detection by reducing access frequency and dispersing access intervals, achieving accurate identification and interception of malicious crawlers employing low-speed traversal strategies. This solution solves the technical problems of inaccurate detection or excessively high storage costs in existing technologies when dealing with such low-speed crawlers.

[0086] Optionally, when the number of intervals in which the user's access ratio exceeds a preset first ratio threshold is counted, the first interval is excluded.

[0087] Optionally, the device further includes a determination module, configured to, after the access request is received: determine whether the access path of the access request conforms to a preset normal operation sequence; if it does not conform, directly identify the user as a web crawler and take anti-crawling measures; if it conforms, continue with the step of dividing the intervals according to the preset interval length.

[0088] Optionally, the device further includes a collaborative anti-crawling processing module, used, after the processing module 340 determines that the user is a web crawler, for: when at least two users are determined to be crawlers, determining whether there is a complementary relationship between the sets of intervals in which each user's access ratio exceeds the first ratio threshold. If a complementary relationship exists, and the proportion of the union of all complementary intervals exceeds a preset third ratio threshold, then the at least two users are determined to constitute a crawler group, and collaborative anti-crawling measures are taken against the crawler group.

[0089] Optionally, the target interface is a paginated query interface, and the continuous parameter is the pagination number; or, the target interface is a query interface by ID, and the continuous parameter is an auto-incrementing ID.

[0090] Optionally, the user's identification dimension can be an IP address or a user account.

[0091] The anti-crawling device 300 can implement the anti-crawling method shown in the method embodiments of Figures 1-2. For implementation details, refer to those embodiments; they are not repeated here.

[0092] Figure 4 is a schematic diagram of the structure of an electronic device provided as an exemplary embodiment of this application. As shown in Figure 4, the device includes a memory 41 and a processor 42.

[0093] Memory 41 is used to store computer programs and can be configured to store various other data to support operation on the computing device. Examples of this data include instructions for any application or method used to operate on the computing device, contact data, phone book data, messages, images, videos, etc.

[0094] The processor 42, coupled to the memory 41, is used to execute a computer program in the memory 41 for: receiving an access request, the access request pointing to a target interface and containing continuous parameters, and dividing the numerical range of the continuous parameters into multiple continuous intervals according to a preset interval length; recording the interval to which the continuous parameters belong each time the user who issued the access request accesses the target interface within a preset statistical period, and accumulating the access count of the corresponding interval; at the end of the statistical period, calculating the access ratio of the user in each interval, the access ratio being the ratio of the access count of the corresponding interval to the interval length; counting the number of intervals in which the user's access ratio exceeds a preset first ratio threshold, and if the number exceeds a preset second number threshold, determining that the user is a web crawler and performing anti-crawling measures on the user.

[0095] The electronic device provided in this application divides the numerical range of continuous parameters into multiple continuous intervals according to a preset interval length, records and counts the user's access ratio in each interval, and determines it as a web crawler when the number of intervals with access ratios exceeding a first ratio threshold exceeds a second number threshold. This effectively identifies low-speed web crawlers that circumvent traditional rate limiting and short-term pattern detection by reducing access frequency and dispersing access intervals, achieving accurate identification and interception of malicious web crawlers employing low-speed traversal strategies. This solution solves the technical problems of inaccurate detection or excessively high storage costs in existing technologies when dealing with such low-speed web crawlers.

[0096] Furthermore, as shown in Figure 4, the electronic device also includes other components such as a communication component 43, a display 44, a power supply component 45, and an audio component 46. Figure 4 shows only some components schematically, which does not mean that the electronic device includes only the components shown in Figure 4. Additionally, depending on the implementation form of the electronic device, the components within the dashed box in Figure 4 are optional rather than mandatory. For example, when the electronic device is implemented as a terminal device such as a smartphone, tablet, or desktop computer, it may include the components within the dashed box in Figure 4; when the electronic device is implemented as a server-side device such as a conventional server, cloud server, data center, or server array, it may omit the components within the dashed box in Figure 4.

[0097] The communication component in Figure 4 above is configured to facilitate wired or wireless communication between the device containing the communication component and other devices. The device containing the communication component can access wireless networks based on communication standards, such as WiFi, 2G, or 3G, or combinations thereof. In one exemplary embodiment, the communication component receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component may further include a Near Field Communication (NFC) module, Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, etc.

[0098] The memory in Figure 4 above can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk or optical disk.

[0099] The display in Figure 4 above includes a screen, which may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors can sense not only the boundaries of the touch or swipe action, but also the duration and pressure associated with the touch or swipe operation.

[0100] The power supply component in Figure 4 above provides power to the various components of the device in which it resides. The power supply component may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the device in which it resides.

[0101] The audio component in Figure 4 above can be configured to output and/or input audio signals. For example, the audio component includes a microphone (MIC) configured to receive external audio signals when the device containing the audio component is in an operating mode, such as call mode, recording mode, or voice recognition mode. The received audio signals can be further stored in memory or transmitted via a communication component. In some embodiments, the audio component also includes a speaker for outputting audio signals.

[0102] Accordingly, embodiments of this application also provide a computer-readable storage medium storing a computer program, which, when executed by a processor, enables the processor to implement the steps in the above-described anti-crawling method embodiments.

[0103] Accordingly, this application also provides a computer program product, which stores instructions that, when executed by a computer, cause the computer to implement the steps in the anti-crawling method embodiments provided in this application.

[0104] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0105] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.

[0106] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.

[0107] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.

[0108] In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

[0109] Memory may include non-permanent memory in computer-readable media, in the form of random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0110] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0111] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0112] The above description is merely an embodiment of this application and is not intended to limit the scope of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of the claims of this application.

Claims

1. An anti-crawling method, characterized in that it comprises: receiving an access request, the access request being directed to a target interface and containing a continuous parameter, and dividing the numerical range of the continuous parameter into multiple continuous intervals according to a preset interval length; recording, each time the user who issued the access request accesses the target interface within a preset statistical period, the interval to which the continuous parameter belongs, and accumulating the access count of the corresponding interval; at the end of the statistical period, calculating the access ratio of the user in each interval, the access ratio being the ratio of the number of accesses in the corresponding interval to the length of the interval; and counting the number of intervals in which the user's access ratio exceeds a preset first ratio threshold, and, if the number exceeds a preset second number threshold, determining that the user is a web crawler and taking anti-crawling measures against the user.

2. The method according to claim 1, characterized in that, when counting the number of intervals in which the user's access ratio exceeds the preset first ratio threshold, the first interval is excluded.

3. The method according to claim 1 or 2, characterized in that, after receiving the access request, the method further includes: determining whether the access path of the access request conforms to a preset normal operation sequence; if it does not, directly determining that the user is a web crawler and taking anti-crawling measures against the user; and if it does, continuing with the step of dividing the intervals according to the preset interval length.

4. The method according to claim 1, characterized in that, after determining that the user is a web crawler, the method further includes: when at least two users are determined to be crawlers, determining whether a complementary relationship exists between the sets of intervals in which each user's access ratio exceeds the first ratio threshold; and if a complementary relationship exists and the proportion accounted for by the union of all complementary intervals exceeds a preset third ratio threshold, determining that the at least two users constitute a crawler group and taking collaborative anti-crawling measures against the crawler group.

5. The method according to claim 1, characterized in that the target interface is a paginated query interface and the continuous parameter is the page number; or the target interface is a query-by-ID interface and the continuous parameter is an auto-incrementing ID.

6. The method according to claim 1, characterized in that the user's identification dimension is either an IP address or a user account.

7. An anti-crawling device, characterized in that it comprises: a partitioning module, used to receive an access request, the access request being directed to a target interface and containing a continuous parameter, and to divide the numerical range of the continuous parameter into multiple continuous intervals according to a preset interval length; a recording module, used to record, each time the user who issued the access request accesses the target interface within a preset statistical period, the interval to which the continuous parameter belongs, and to accumulate the access count of the corresponding interval; a calculation module, used to calculate the user's access ratio in each interval at the end of the statistical period, wherein the access ratio is the ratio of the number of accesses in the corresponding interval to the length of the interval; and a processing module, used to count the number of intervals in which the user's access ratio exceeds a preset first ratio threshold and, if the number exceeds a preset second number threshold, to determine that the user is a web crawler and perform anti-crawling processing on the user.

8. An electronic device, characterized in that it comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method as claimed in any one of claims 1 to 6.

9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, causes the processor to perform the steps of the method as claimed in any one of claims 1 to 6.

10. A computer program product, characterized in that it includes a computer program that, when executed by a processor, implements the steps of the method as claimed in any one of claims 1 to 6.
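
As a non-limiting illustration, the interval counting of claim 1 and the first-interval exclusion of claim 2 could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the class name `IntervalTracker`, its method names, and the three threshold values are hypothetical and chosen only for the example, and the continuous parameter is assumed to be a non-negative integer such as a page number or an auto-incrementing ID.

```python
from collections import Counter, defaultdict

# Assumed example values; the claims only say these are "preset".
INTERVAL_LENGTH = 100   # preset interval length
RATIO_THRESHOLD = 0.5   # first ratio threshold
COUNT_THRESHOLD = 3     # second number threshold


class IntervalTracker:
    """Hypothetical per-user interval counters for one statistical period."""

    def __init__(self, interval_length=INTERVAL_LENGTH):
        self.interval_length = interval_length
        # user identifier -> Counter mapping interval index -> access count
        self.counts = defaultdict(Counter)

    def record(self, user, value):
        """Accumulate one access whose continuous parameter equals `value`."""
        self.counts[user][value // self.interval_length] += 1

    def hot_intervals(self, user, ratio_threshold=RATIO_THRESHOLD,
                      skip_first=True):
        """Return interval indices whose access ratio exceeds the threshold."""
        hot = set()
        for index, count in self.counts[user].items():
            if skip_first and index == 0:
                continue  # claim 2: the first interval is excluded
            # Access ratio = accesses in the interval / interval length.
            if count / self.interval_length > ratio_threshold:
                hot.add(index)
        return hot

    def is_crawler(self, user, count_threshold=COUNT_THRESHOLD):
        """Evaluate the claim 1 decision at the end of the period."""
        return len(self.hot_intervals(user)) > count_threshold


tracker = IntervalTracker()
for page in range(500):          # walks pages 0..499 one by one
    tracker.record("crawler", page)
for page in (0, 1, 2, 150):      # a few scattered page views
    tracker.record("visitor", page)

print(tracker.is_crawler("crawler"))  # True
print(tracker.is_crawler("visitor"))  # False
```

In this sketch the state kept per user is one counter per interval touched, rather than a full record of every request, and the decision does not depend on how fast the requests arrive within the statistical period.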