An adaptive user throttling method and system for an online platform

By adopting an adaptive user rate limiting method and system, multi-dimensional performance indicators are collected in real time, and load rate and user level quotas are dynamically adjusted. This solves the problems of resource waste and service instability in existing rate limiting strategies, and achieves efficient utilization of system resources and high-quality service for high-value users.

CN122247934A · Pending · Publication Date: 2026-06-19 · BEIJING SILICONFLOW TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING SILICONFLOW TECHNOLOGY CO LTD
Filing Date
2026-04-21
Publication Date
2026-06-19

Abstract

This invention discloses an adaptive user rate limiting method and system for online platforms, belonging to the field of internet technology. Through dynamic coefficient adaptive adjustment, it relaxes the rate limiting threshold under low load to improve resource utilization, and quickly tightens the rate limiting under high load to ensure system stability, achieving a dual balance between resource utilization and service stability. The 5xx status code ratio is introduced as a key indicator, weighted and integrated with multi-dimensional indicators such as CPU, memory, and bandwidth to avoid misjudgment based on a single indicator. Mechanisms such as smoothing and noise reduction, variation amplitude limitation, lag compensation, and mutation suppression ensure a smooth and controllable rate limiting threshold adjustment process. Based on dynamic clustering of user levels, secondary allocation of baseline quotas, differentiated configuration of protection coefficients, and a priority queue mechanism, resource allocation is tilted towards high-priority users to ensure that critical business operations are not affected. Combined with periodic level reassessment and quota rebalancing triggered by success rate, the timeliness and fairness of user level classification are ensured.

Description

Technical Field

[0001] This invention belongs to the field of Internet technology, specifically relating to an adaptive user rate limiting method and system for online platforms.

Background Technology

[0002] Currently, most internet platforms employ fixed-level rate limiting strategies, such as setting fixed bandwidth limits of 1Gbps or 1Mbps, or QPS rate limiting based on a single threshold. While this static rate limiting method is simple to implement, it has significant drawbacks in practical applications. First, even when system resources are sufficient, the rate limiting threshold remains strictly limited, resulting in a large amount of idle resources not being effectively utilized and causing resource waste. Second, during peak system load periods, the fixed rate limiting threshold cannot be dynamically adjusted according to the actual load, making it difficult to effectively prevent system overload, and may even trigger service avalanche due to sudden traffic surges.

[0003] With the widespread adoption of cloud computing and microservice architectures, load fluctuations on service nodes have become more frequent and severe. The resource utilization of a single node can rapidly escalate from idle to overloaded within a short period. Traditional fixed rate limiting strategies fail to improve resource utilization under low loads or guarantee service stability under high loads, becoming a major bottleneck restricting system performance optimization and service quality assurance. Furthermore, existing technologies lack the ability to differentiate rate limiting for users with different priorities, failing to guarantee service quality for high-value users when resources are limited.

[0004] While some solutions have proposed dynamic rate limiting based on single metrics such as CPU or memory, a single metric cannot fully reflect the overall health of the system. Instantaneous fluctuations in a particular metric can easily lead to misadjustments of the rate limiting threshold, causing system instability. Furthermore, existing dynamic rate limiting solutions generally lack smooth control over the rate limiting threshold adjustment process; excessively large adjustments may cause secondary impacts on the system.

Summary of the Invention

[0005] To address the shortcomings of the existing technology, this application provides an adaptive user rate limiting method and system for an online platform.

[0006] Firstly, this application proposes an adaptive user rate limiting method for online platforms, comprising the following steps:

[0007] Collect multi-dimensional performance indicators of the system in real time, including CPU utilization, memory usage, network bandwidth utilization, and the 5xx HTTP status code ratio, which characterizes the frequency of server-side processing failures.

[0008] The overall load rate is calculated based on the multi-dimensional performance indicators, and the load state of the system is determined based on the overall load rate. The load state includes at least an idle state and an overload state.

[0009] The dynamic coefficient is dynamically adjusted according to the load state, wherein the dynamic coefficient is adjusted to a value greater than 1 in the idle state and to a value less than 1 in the overload state, and the adjustment response speed of the dynamic coefficient in the overload state is faster than the adjustment response speed in the idle state.

[0010] Obtain the user level and the corresponding baseline quota, and combine the dynamic coefficient with the baseline quota to generate the real-time rate limiting threshold for each user level. Different user levels are configured with differentiated protection coefficients, which ensure that the quota of higher-level users changes less than that of lower-level users when the system load changes.

[0011] Traffic control is performed on user access requests based on the real-time rate limiting threshold.

[0012] In some embodiments, the step of calculating the overall load rate based on the multi-dimensional performance indicators and determining the load state of the system based on the overall load rate further includes:

[0013] The collected multi-dimensional performance indicators are smoothed and denoised to generate stable indicator estimates.

[0014] The stable indicator estimates are weighted and fused to generate the overall load rate.

[0015] When the overall load rate is lower than the first threshold, it is determined to be in an idle state; when the overall load rate is higher than the second threshold, it is determined to be in an overload state; otherwise, it is determined to be in a normal state.

[0016] In some embodiments, in the weighted fusion, each metric is configured with a preset weight, wherein CPU utilization and 5xx HTTP status code ratio are configured with higher weights, and memory usage and network bandwidth utilization are configured with lower weights.

[0017] In some embodiments, the adjustment response speed of the dynamic coefficient under overload conditions is faster than that under idle conditions, and the implementation steps are as follows:

[0018] In the idle state, the dynamic coefficient is gradually increased with the first adjustment step size until the preset upper limit threshold is reached;

[0019] In the overload state, the dynamic coefficient is gradually reduced with the second adjustment step size until the preset lower threshold is reached;

[0020] Wherein, the absolute value of the second adjustment step size is greater than the absolute value of the first adjustment step size.

[0021] In some embodiments, obtaining the user level and corresponding baseline quota further includes:

[0022] A clustering algorithm is used to dynamically classify users, taking into account the user's historical request frequency, service level agreement weight, and average resource consumption per request.

[0023] Set up a total baseline quota pool for the system, perform an initial allocation based on the weight of each user level, and then perform a secondary allocation based on the number of users in each level to generate the baseline quota for each user level.

[0024] In some embodiments, it also includes:

[0025] Real-time monitoring of request success rates for users at all levels;

[0026] When the success rate of a certain level continues to fall below the preset target, the quota rebalancing mechanism is triggered to increase the baseline quota for that level.

[0027] In some embodiments, the step of calculating the dynamic coefficient with the baseline quota to generate real-time traffic limiting thresholds for users of different levels further includes:

[0028] Multiply the dynamic coefficient by the base quota for each user level to obtain the initially adjusted quota;

[0029] Multiply the initially adjusted quota by the protection coefficient corresponding to the level to obtain the real-time flow limiting threshold.

[0030] The calculated real-time threshold is smoothed to control the magnitude of threshold variation between adjacent periods.

[0031] In some embodiments, the step of performing traffic control on user access requests based on the real-time rate limiting threshold further includes:

[0032] Rate limiting is performed using the token bucket algorithm;

[0033] When the number of requests exceeds the quota, high-priority requests are placed in a priority queue to wait, while low-priority requests are returned a rate-limited response directly.

[0034] Secondly, this application proposes an adaptive user rate limiting system for an online platform, comprising:

[0035] The system monitoring module is used to collect multi-dimensional performance indicators of the system in real time. These multi-dimensional performance indicators include CPU utilization, memory usage, network bandwidth utilization, and the ratio of 5xx HTTP status codes used to characterize the frequency of server-side processing failures. The module calculates the overall load rate based on these multi-dimensional performance indicators and determines the load state of the system based on the overall load rate. The load state includes at least an idle state and an overload state.

[0036] The dynamic coefficient calculation module is used to dynamically adjust the dynamic coefficient according to the load state, wherein the dynamic coefficient is adjusted to a value greater than 1 in the idle state and to a value less than 1 in the overload state, and the adjustment response speed of the dynamic coefficient in the overload state is faster than the adjustment response speed in the idle state.

[0037] The tiered quota management module is used to obtain user levels and their corresponding baseline quotas.

[0038] The real-time rate limiting adjustment module is used to calculate the dynamic coefficient and the baseline quota to generate real-time rate limiting thresholds for users of different levels. Different levels of users are configured with differentiated protection coefficients, which are used to ensure that the quota change of higher-level users is less than that of lower-level users when the system load changes.

[0039] The flow control execution module is used to perform flow control on user access requests based on the real-time flow limiting threshold.

[0040] Thirdly, this application proposes an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method described above.

[0041] Fourthly, this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method described above.

[0042] The beneficial effects of this invention are:

[0043] By adaptively adjusting the dynamic coefficients, the rate limiting threshold is automatically relaxed when the system load is low, making full use of idle resources and significantly improving system throughput; during peak load periods, the rate limiting is quickly tightened to effectively prevent overload and avalanche effects and ensure the stability of core services.

[0044] Introducing the 5xx HTTP status code ratio as a key indicator indirectly reflects the degree of server-side processing failures and dependency failures. Combined with multi-dimensional weighted integration of CPU, memory, and bandwidth, the overall load assessment is more comprehensive and accurate, avoiding misjudgments caused by a single indicator.

[0045] Through multiple mechanisms such as smoothing and noise reduction, exponentially weighted moving averages, variation range limitation, lag compensation, and mutation suppression, changes in the dynamic coefficient β and in the rate limiting threshold are kept stable and controllable, avoiding secondary impact on the system due to an excessive adjustment range.

[0046] By dynamically clustering users by level, redistributing baseline quotas, configuring protection coefficients differently, and establishing a high-level request priority queue mechanism, the system achieves precise resource allocation to high-priority users, ensuring that critical business operations are not affected under system pressure. Through regular user level reassessment and quota rebalancing mechanisms triggered by success rate monitoring, the system ensures the timeliness of user level classification and fairness among users within the same level, enabling the system to adaptively respond to changes in business models and user behavior.

Attached Figure Description

[0047] Figure 1 is a flowchart of the present invention.

[0048] Figure 2 is a schematic diagram of the three-level state machine transition.

[0049] Figure 3 is a diagram illustrating the tiered quota system.

[0050] Figure 4 is a system structure block diagram of the present invention.

Detailed Implementation

[0051] Exemplary embodiments of the invention will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be implemented in various forms and should not be limited to the embodiments set forth herein; rather, these embodiments are provided so that a more thorough understanding of the invention can be achieved and that the full scope of the invention can be conveyed to those skilled in the art.

[0052] Firstly, this application proposes an adaptive user rate limiting method for online platforms, comprising the following steps:

[0053] S100: Collect multi-dimensional performance indicators of the system in real time, including CPU utilization, memory usage, network bandwidth utilization, and the 5xx HTTP status code ratio used to characterize the frequency of server-side processing failures.

[0054] Specifically, lightweight probes are deployed on each server, using non-blocking data acquisition technology:

[0055] CPU utilization: Calculated by parsing system files to determine the ratio of CPU non-idle time to total time per unit of time.

[0056] Memory usage: Calculates the ratio of used memory to total memory by combining system memory information.

[0057] Network bandwidth: Calculate the number of bytes sent and received per unit time through network interface statistics, and compare it with the maximum bandwidth to obtain the utilization rate.

[0058] 5xx HTTP status code ratio: This is a real-time measure of the proportion of 5xx responses to the total number of requests within a given time period, collected through API gateway logs or server-side statistics. It reflects the frequency of server-side processing failures.

[0059] The 5xx HTTP status code ratio refers to the ratio of the number of responses with HTTP status codes of 5xx (such as 500, 502, 503, etc.) returned by the server within a preset time window to the total number of requests within the same time window. It is used to quantitatively characterize the frequency of request failures caused by internal server errors or dependent service failures.
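As a minimal sketch of two of the probes described above, the following assumes a Linux host where the "system files" are the aggregate `cpu` line of `/proc/stat`, and a hypothetical `ErrorRatioWindow` helper fed by gateway log events. The names, the 10-second default window, and the choice to count only the `idle` field as idle time are illustrative assumptions, not details given in the source.

```python
from collections import deque


def cpu_utilization(prev_line: str, curr_line: str) -> float:
    """Non-idle share of CPU time between two aggregate 'cpu' lines of /proc/stat."""
    def parse(line: str):
        # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
        # Only the first eight are summed, because guest time is already in user time.
        fields = [int(x) for x in line.split()[1:9]]
        return fields[3], sum(fields)  # (idle ticks, total ticks)

    idle0, total0 = parse(prev_line)
    idle1, total1 = parse(curr_line)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0  # no time elapsed between samples
    return 1.0 - (idle1 - idle0) / delta_total


class ErrorRatioWindow:
    """Ratio of 5xx responses to all requests within the last window_s seconds."""

    def __init__(self, window_s: float = 10.0) -> None:
        self.window_s = window_s
        self._events = deque()  # (timestamp, is_5xx) in arrival order

    def record(self, timestamp: float, status: int) -> None:
        self._events.append((timestamp, 500 <= status <= 599))

    def ratio(self, now: float) -> float:
        # Drop events that have fallen out of the time window.
        while self._events and self._events[0][0] <= now - self.window_s:
            self._events.popleft()
        if not self._events:
            return 0.0
        return sum(is_5xx for _, is_5xx in self._events) / len(self._events)
```

Sampling `/proc/stat` twice and differencing keeps the probe non-blocking: each read is a small file read, and the ratio is computed from counters rather than by busy measurement.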

[0060] S200: Calculate the overall load rate based on the multi-dimensional performance indicators, and determine the load state of the system based on the overall load rate. The load state includes at least an idle state and an overload state.

[0061] In some embodiments, the step of calculating the overall load rate based on the multi-dimensional performance indicators and determining the load state of the system based on the overall load rate further includes:

[0062] The collected multi-dimensional performance indicators are smoothed and denoised to generate stable indicator estimates.

[0063] The stable indicator estimates are weighted and fused to generate the overall load rate.

[0064] As shown in Figure 2, when the overall load rate is lower than the first threshold, the system is determined to be in an idle state; when the overall load rate is higher than the second threshold, it is determined to be in an overload state; otherwise, it is determined to be in a normal state, thus realizing a three-level state machine transition.

[0065] In some embodiments, in the weighted fusion, each metric is configured with a preset weight, wherein CPU utilization and 5xx HTTP status code ratio are configured with higher weights, and memory usage and network bandwidth utilization are configured with lower weights.

[0066] A smoothing algorithm is used to denoise the collected indicators, eliminate the influence of instantaneous fluctuations, and generate stable indicator estimates.

[0067] The comprehensive load calculation introduces a non-linear weight allocation, weighting and fusing different indicators to generate a comprehensive load rate that represents the overall system workload. The weights of each indicator can be dynamically configured according to system characteristics; in a typical configuration, CPU, memory, bandwidth, and 5xx HTTP status codes each account for a certain proportion.
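The S200 pipeline (smooth, fuse, classify) can be sketched as follows. This is an illustration under stated assumptions: the smoothing is an exponentially weighted moving average with an assumed factor `alpha`, the fusion is a plain linear weighted sum standing in for the non-linear allocation the text mentions but does not specify, and the weights and the 80% overload threshold are hypothetical values chosen only to respect the stated ordering.

```python
from dataclasses import dataclass, field

# Assumed weights: the source only states that CPU and the 5xx ratio weigh more.
WEIGHTS = {"cpu": 0.4, "err5xx": 0.3, "mem": 0.2, "bw": 0.1}


@dataclass
class LoadEstimator:
    alpha: float = 0.3            # EWMA smoothing factor (assumed)
    idle_below: float = 0.30      # first threshold
    overload_above: float = 0.80  # second threshold (assumed)
    _smoothed: dict = field(default_factory=dict)

    def update(self, sample: dict) -> tuple:
        """Smooth each indicator, fuse them, and return (load_rate, state)."""
        for name, value in sample.items():
            prev = self._smoothed.get(name, value)  # first sample seeds the average
            self._smoothed[name] = self.alpha * value + (1 - self.alpha) * prev
        load = sum(WEIGHTS[k] * self._smoothed[k] for k in WEIGHTS)
        if load < self.idle_below:
            state = "idle"
        elif load > self.overload_above:
            state = "overload"
        else:
            state = "normal"
        return load, state
```

Because each indicator is smoothed before fusion, a single spike moves the load rate only by the fraction `alpha` of its size, which is what keeps one noisy sample from flipping the state.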

[0068] S300: Dynamically adjust the dynamic coefficient according to the load state, wherein the dynamic coefficient is adjusted toward a value greater than 1 in the idle state and toward a value less than 1 in the overload state, and the adjustment response speed of the dynamic coefficient in the overload state is faster than the adjustment response speed in the idle state.

[0069] In some embodiments, the adjustment response speed of the dynamic coefficient under overload conditions is faster than that under idle conditions, and the implementation steps are as follows:

[0070] In the idle state, the dynamic coefficient is gradually increased with the first adjustment step size until the preset upper limit threshold is reached;

[0071] In the overload state, the dynamic coefficient is gradually reduced with the second adjustment step size until the preset lower threshold is reached;

[0072] Wherein, the absolute value of the second adjustment step size is greater than the absolute value of the first adjustment step size.

[0073] Specifically, the dynamic adjustment coefficient β is calculated based on monitoring data. When the system idle rate is higher than the threshold, β>1; when the load exceeds the warning value, β<1; and under normal conditions, β=1.
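The asymmetric stepping rule above can be illustrated with a short sketch. The step sizes (0.1 up, 0.2 down) and the bounds (0.5 to 1.5) are the values used in this document's e-commerce example; the function name and the policy of drifting back toward 1.0 in the normal state are assumptions made for illustration.

```python
def step_beta(beta: float, state: str, up_step: float = 0.1,
              down_step: float = 0.2, lower: float = 0.5,
              upper: float = 1.5) -> float:
    """Advance beta by one monitoring cycle according to the load state."""
    if state == "idle":
        # Relax slowly: small step up, capped at the upper limit.
        return min(upper, round(beta + up_step, 10))
    if state == "overload":
        # Tighten quickly: larger step down, floored at the lower limit.
        return max(lower, round(beta - down_step, 10))
    # Normal state: drift back toward the neutral value 1.0 (assumed policy).
    if beta > 1.0:
        return max(1.0, round(beta - up_step, 10))
    return min(1.0, round(beta + up_step, 10))
```

Rounding to ten decimals only suppresses floating-point drift across many cycles; it does not change the step sizes.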

[0074] Load status quantification: Based on the overall load rate, a nonlinear mapping function is used to compress it to the (0,1) interval, defining a system load index. The closer the index is to 1, the closer the system is to overload; the closer it is to 0, the closer the system is to idle. The mapping function is designed to be sensitive to changes in the load in the intermediate region, while changing smoothly under extreme loads, simulating the actual response characteristics of the system to load changes.
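The source does not name the mapping function. One function with the described shape (values in (0,1), steepest in the middle, flat at the extremes) is the logistic curve, sketched here as an assumption; `midpoint` and `steepness` are hypothetical tuning parameters.

```python
import math


def load_index(load_rate: float, midpoint: float = 0.55,
               steepness: float = 10.0) -> float:
    """Map an overall load rate to a load index in (0, 1) with a logistic curve."""
    return 1.0 / (1.0 + math.exp(-steepness * (load_rate - midpoint)))
```

With these assumed parameters, a 10-point change in load around the midpoint moves the index far more than the same change near 100% load, matching the sensitivity profile described above.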

[0075] Dynamic Coefficient Generation: A dynamic adjustment algorithm based on control theory is introduced to calculate the coefficient β. This algorithm generates an adjustment value by comparing the deviation between the current load index and the target load, considering the historical accumulation of the deviation and the trend of its change, and then corrects it based on the baseline value of 1. The algorithm can adaptively adjust the β value according to the real-time changes, historical trends, and rate of change of the load, enabling the system to respond quickly to load changes and maintain stability. To prevent the β value from exceeding a reasonable range in extreme cases, its output is truncated to ensure that the current limiting threshold does not expand or shrink indefinitely, thus guaranteeing the basic stability of the system.
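The description (current deviation, historical accumulation of the deviation, and its trend) reads like a PID-style controller, although the source does not name one. The sketch below is written under that assumption; the gains, the target load index, and the anti-windup bound on the accumulated error are hypothetical values.

```python
from dataclasses import dataclass


@dataclass
class BetaController:
    target: float = 0.5   # target load index (assumed)
    kp: float = 0.8       # gain on the current deviation (assumed)
    ki: float = 0.05      # gain on the accumulated deviation (assumed)
    kd: float = 0.2       # gain on the deviation trend (assumed)
    lower: float = 0.5    # truncation bounds for beta
    upper: float = 1.5
    _integral: float = 0.0
    _prev_error: float = 0.0

    def update(self, load_index: float) -> float:
        """Return the truncated beta for the current load index."""
        error = self.target - load_index  # positive when there is spare capacity
        # Bound the accumulated error so a long overload cannot wind it up.
        self._integral = max(-5.0, min(5.0, self._integral + error))
        derivative = error - self._prev_error
        self._prev_error = error
        raw = 1.0 + self.kp * error + self.ki * self._integral + self.kd * derivative
        return min(self.upper, max(self.lower, raw))  # output truncation
```

A load index below the target yields β above 1, one above the target yields β below 1, and the final `min`/`max` implements the truncation that keeps the threshold from expanding or shrinking without bound.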

[0076] Stability optimization: Add a hysteresis compensation mechanism: When the system state changes, the newly generated β value is smoothed and weighted by combining it with the historical β value to avoid drastic jumps in β value due to sudden state changes.

[0077] Introducing a mutation suppression function: When the β value is detected to change too much, linear interpolation is used to gradually adjust it to the target value, so as to avoid the impact on the system caused by a single large adjustment.
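The two stability mechanisms can be combined in a few lines. In this sketch the blend factor and the maximum per-cycle change are assumed parameters, and the linear interpolation is realized as a fixed-size ramp: when the blended value would still jump too far, β advances by at most `max_step` per cycle toward the target.

```python
import math


def stabilize_beta(prev_beta: float, new_beta: float, blend: float = 0.5,
                   max_step: float = 0.2) -> float:
    """Apply hysteresis compensation, then mutation suppression, to a new beta."""
    # Hysteresis compensation: weight the new value against the historical one.
    blended = blend * new_beta + (1 - blend) * prev_beta
    # Mutation suppression: cap the per-cycle change, ramping linearly instead.
    delta = blended - prev_beta
    if abs(delta) > max_step:
        blended = prev_beta + math.copysign(max_step, delta)
    return blended
```

Calling the function once per monitoring cycle with the same target produces a linear ramp while the gap is large, followed by a geometric approach once the gap is within `max_step`.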

[0078] S400: Obtain the user level and the corresponding baseline quota, and combine the dynamic coefficient with the baseline quota to generate the real-time rate limiting threshold for each user level, wherein different user levels are configured with differentiated protection coefficients, which ensure that the quota of higher-level users changes less than that of lower-level users when the system load changes.

[0079] In some embodiments, obtaining the user level and corresponding baseline quota further includes:

[0080] A clustering algorithm is used to dynamically classify users, taking into account the user's historical request frequency, service level agreement weight, and average resource consumption per request.

[0081] Set up a total baseline quota pool for the system, perform an initial allocation based on the weight of each user level, and then perform a secondary allocation based on the number of users in each level to generate the baseline quota for each user level.

[0082] In some embodiments, it also includes:

[0083] Real-time monitoring of request success rates for users at all levels;

[0084] When the success rate of a certain level continues to fall below the preset target, the quota rebalancing mechanism is triggered to increase the baseline quota for that level.

[0085] In some embodiments, the step of calculating the dynamic coefficient with the baseline quota to generate real-time traffic limiting thresholds for users of different levels further includes:

[0086] Multiply the dynamic coefficient by the base quota for each user level to obtain the initially adjusted quota;

[0087] Multiply the initially adjusted quota by the protection coefficient corresponding to the level to obtain the real-time flow limiting threshold.

[0088] The calculated real-time threshold is smoothed to control the magnitude of threshold variation between adjacent periods.

[0089] User level classification: As shown in Figure 3, an improved clustering algorithm is used to dynamically classify users. This algorithm comprehensively considers multiple features such as the user's historical request frequency, service level agreement weight, and average resource consumption per request. By calculating the similarity between user characteristics and preset level centroids, users are divided into high, medium, and low levels. The clustering is not performed once and for all; it is periodically reassessed and dynamically adjusted based on the user's recent behavioral characteristics to ensure the timeliness and fairness of the level classification. A smoothing mechanism is introduced during centroid updates to avoid frequent level changes caused by a single fluctuation.
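A minimal sketch of centroid-based level assignment with smoothed centroid updates follows. The source does not specify the similarity measure, so Euclidean distance is assumed here; the centroid coordinates, the normalized three-feature layout (request frequency, SLA weight, per-request cost), and the `smoothing` factor are hypothetical.

```python
import math

# Hypothetical preset centroids in a normalized
# (request frequency, SLA weight, per-request cost) feature space.
CENTROIDS = {
    "high": [0.9, 0.9, 0.8],
    "medium": [0.5, 0.5, 0.5],
    "low": [0.1, 0.2, 0.2],
}


def classify(features, centroids=CENTROIDS) -> str:
    """Assign the level whose centroid is nearest (Euclidean distance, assumed)."""
    return min(centroids, key=lambda level: math.dist(features, centroids[level]))


def update_centroid(old, members, smoothing: float = 0.2):
    """Move a centroid only part of the way toward the mean of its members."""
    if not members:
        return list(old)
    mean = [sum(col) / len(members) for col in zip(*members)]
    return [(1 - smoothing) * o + smoothing * m for o, m in zip(old, mean)]
```

Moving the centroid by only a fraction of the gap per reassessment is one way to realize the smoothing the text calls for, so that one unusual day does not reshuffle levels.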

[0090] Baseline quota calculation: A total baseline quota pool is set up for the system, and initial allocation is performed based on the weight of each user level (e.g., high, medium, and low levels account for 0.5, 0.3, and 0.2 respectively). Subsequently, the quota for each level is further allocated based on the number of users within that level to ensure fairness among users within the same level. A smoothing algorithm can be used for the secondary allocation to avoid drastic fluctuations in quotas when the number of users changes.
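The two-stage allocation can be sketched as below. The level weights 0.5, 0.3, and 0.2 come from the text; the function name, the equal split within a level, and the pool size and user counts in the usage note are assumptions, and the optional smoothing of the secondary allocation is omitted.

```python
def allocate_baseline(total_pool: float, level_weights: dict,
                      user_counts: dict) -> dict:
    """Split a quota pool by level weight, then evenly among users of each level."""
    quotas = {}
    for level, weight in level_weights.items():
        level_quota = total_pool * weight            # initial allocation by weight
        n = user_counts.get(level, 0)
        per_user = level_quota / n if n else 0.0     # secondary allocation by headcount
        quotas[level] = {"level_quota": level_quota, "per_user": per_user}
    return quotas
```

For example, with a hypothetical pool of 2000 QPS and the weights above, the level quotas come out at 1000, 600, and 400 QPS, and each is then divided by the number of users currently in that level.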

[0091] Dynamic adjustment mechanism: The system monitors the request success rate of users at each level in real time. When the success rate of a certain level consistently falls below a preset target, a quota rebalancing mechanism is triggered. This mechanism will moderately increase the baseline quota for that level without severely impacting other levels, ensuring service quality for users at that level. Furthermore, the system performs daily level reassessments, using a sliding window to statistically analyze recent user behavior characteristics and combining this with historical characteristics for weighted updates, ensuring that the level classification promptly reflects changes in user behavior.
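The success-rate trigger can be illustrated as follows. The source gives no numbers for the target, for how long the shortfall must persist, or for the size of the increase, so `target`, `patience`, and `boost` are hypothetical parameters, and `Rebalancer` is an illustrative name.

```python
from collections import deque


class Rebalancer:
    """Raise a level's baseline quota after a sustained success-rate shortfall."""

    def __init__(self, target: float = 0.95, patience: int = 3,
                 boost: float = 0.05) -> None:
        self.target = target      # preset success-rate target (assumed value)
        self.patience = patience  # consecutive low readings required (assumed)
        self.boost = boost        # moderate relative increase (assumed)
        self._history = {}

    def observe(self, level: str, success_rate: float, baseline: float) -> float:
        """Record one reading and return the (possibly increased) baseline quota."""
        hist = self._history.setdefault(level, deque(maxlen=self.patience))
        hist.append(success_rate)
        if len(hist) == self.patience and all(r < self.target for r in hist):
            hist.clear()  # require a fresh run of low readings before the next boost
            return baseline * (1 + self.boost)
        return baseline
```

Requiring several consecutive low readings is what distinguishes a level that "consistently falls below" the target from one that dips once.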

[0092] S500: Perform traffic control on user access requests based on the real-time rate limiting threshold.

[0093] In some embodiments, the step of performing traffic control on user access requests based on the real-time rate limiting threshold further includes:

[0094] Rate limiting is performed using the token bucket algorithm;

[0095] When the number of requests exceeds the quota, high-priority requests are placed in a priority queue to wait, while low-priority requests are directly returned with a rate-limited response.
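A compact sketch of token-bucket limiting with a priority queue for over-quota requests is shown below. Class names, the integer priority scale (0 is highest), the queue size limit, and the decision to reject everything below the highest priority are assumptions; timestamps are passed in explicitly so the behavior is deterministic.

```python
import heapq
import itertools


class TokenBucket:
    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate          # tokens per second; set from the real-time threshold
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def try_acquire(self, now: float) -> bool:
        # Refill according to elapsed time, then spend one token if available.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Limiter:
    def __init__(self, bucket: TokenBucket, max_queue: int = 100) -> None:
        self.bucket = bucket
        self.max_queue = max_queue
        self._queue = []                 # heap of (priority, arrival order, request id)
        self._seq = itertools.count()

    def handle(self, request_id: str, priority: int, now: float) -> str:
        if self.bucket.try_acquire(now):
            return "accepted"
        if priority == 0 and len(self._queue) < self.max_queue:
            heapq.heappush(self._queue, (priority, next(self._seq), request_id))
            return "queued"
        return "rejected"                # e.g. an HTTP 429 response

    def drain(self, now: float) -> list:
        """Release queued requests for as long as tokens are available."""
        released = []
        while self._queue and self.bucket.try_acquire(now):
            released.append(heapq.heappop(self._queue)[2])
        return released
```

When the adaptive threshold changes, updating `bucket.rate` is enough for the new limit to take effect on the next refill.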

[0096] The system accurately calculates real-time rate limiting thresholds for users at different tiers by multiplying a dynamic coefficient β by the user's baseline quota. The specific implementation process includes:

[0097] Dynamic coefficient acquisition: The β value corresponding to the current system load state is obtained from the dynamic coefficient calculation module. The logic for determining this value is as follows: when the system resource idle rate is higher than a certain threshold, β is greater than 1, allowing the quota to be increased; when the resource idle rate is lower than a certain threshold (i.e., the system is overloaded), β is less than 1, and the quota is reduced; under normal conditions, β remains at 1. The formula design ensures that the quota adjustment range is positively correlated with the resource idle rate and has upper and lower limits, guaranteeing the rationality and controllability of the adjustment.

[0098] Threshold calculation engine: Performs parallel calculations for each user level, multiplying the base quota for each level by the current β value, and then multiplying it by the protection coefficient corresponding to that level. The level protection coefficient (e.g., 1.0 for high level, 0.8 for medium level, and 0.6 for low level) is used to ensure that high-level users still receive relatively more resources when the system is overloaded and quotas are uniformly reduced; and that the quota advantage of high-level users is maintained when quotas are uniformly increased during periods of system idleness.

[0099] Smoothing mechanism: The calculated real-time threshold is processed using a smoothing algorithm. The calculation result of the current period is weighted and fused with the final threshold of the previous period to avoid drastic changes in the threshold due to sudden changes in β value or calculation fluctuations.

[0100] Setting limits on the variation of thresholds between adjacent periods ensures that the threshold adjustment process is smooth and controllable, and will not cause any impact on the system.
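The threshold calculation, smoothing, and variation limit described above can be put together in one function. The product of base quota, β, and protection coefficient follows the text; the smoothing weight `alpha` and the 20% per-period cap `max_change_ratio` are assumed values, and the function name is illustrative.

```python
def next_threshold(base_quota: float, beta: float, protection: float,
                   prev_threshold: float, alpha: float = 0.5,
                   max_change_ratio: float = 0.2) -> float:
    """Compute this period's rate limiting threshold for one user level."""
    raw = base_quota * beta * protection
    # Smoothing: fuse the new result with the previous period's final threshold.
    smoothed = alpha * raw + (1 - alpha) * prev_threshold
    # Variation limit: cap the change relative to the previous threshold.
    max_delta = max_change_ratio * prev_threshold
    delta = max(-max_delta, min(max_delta, smoothed - prev_threshold))
    return prev_threshold + delta
```

For instance, with a previous threshold of 1000, a base quota of 1000, protection 1.0, and β dropping to 0.5, the raw value is 500, the smoothed value is 750, and the cap limits this period's result to 800.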

[0101] Secondly, this application proposes an adaptive user rate limiting system for an online platform, comprising:

[0102] The system monitoring module is used to collect multi-dimensional performance indicators of the system in real time. These multi-dimensional performance indicators include CPU utilization, memory usage, network bandwidth utilization, and the ratio of 5xx HTTP status codes used to characterize the frequency of server-side processing failures. The module calculates the overall load rate based on these multi-dimensional performance indicators and determines the load state of the system based on the overall load rate. The load state includes at least an idle state and an overload state.

[0103] The dynamic coefficient calculation module is used to dynamically adjust the dynamic coefficient according to the load state, wherein the dynamic coefficient is adjusted to a value greater than 1 in the idle state and to a value less than 1 in the overload state, and the adjustment response speed of the dynamic coefficient in the overload state is faster than the adjustment response speed in the idle state.

[0104] The tiered quota management module is used to obtain user levels and their corresponding baseline quotas.

[0105] The real-time rate limiting adjustment module is used to calculate the dynamic coefficient and the baseline quota to generate real-time rate limiting thresholds for users of different levels. Different levels of users are configured with differentiated protection coefficients, which are used to ensure that the quota change of higher-level users is less than that of lower-level users when the system load changes.

[0106] The flow control execution module is used to perform flow control on user access requests based on the real-time flow limiting threshold.

[0107] The following will provide a detailed description of an example using a specific e-commerce platform application scenario.

[0108] Scenario Setting: A large e-commerce platform deploys a microservice architecture, and its API gateway integrates the adaptive rate limiting system of this invention. Users in the system are divided into three categories: high-level (VIP members), medium-level (regular members), and low-level (non-members). In the initial configuration, the baseline quotas for high-level, medium-level, and low-level users are 1000 QPS, 600 QPS, and 400 QPS, respectively, with protection coefficients of 1.0, 0.9, and 0.8. The dynamic coefficient β is initially set to 1.0. The monitoring cycle is 10 seconds, and the stability confirmation cycle is 3 monitoring cycles (30 seconds). The overload state decrease step size is 0.2, the idle state increase step size is 0.1, the upper limit of the β value is 1.5, and the lower limit is 0.5.
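For reference, the scenario's configuration can be written down and the unsmoothed thresholds computed directly. The numbers are those given above; the `Tier` type and `thresholds` helper are hypothetical names, and the smoothing and variation limits are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    baseline_qps: float
    protection: float


# Baseline quotas and protection coefficients from the scenario.
TIERS = {
    "high": Tier(1000, 1.0),
    "medium": Tier(600, 0.9),
    "low": Tier(400, 0.8),
}
BETA_MIN, BETA_MAX = 0.5, 1.5


def thresholds(beta: float) -> dict:
    """Unsmoothed per-level threshold: baseline x beta x protection coefficient."""
    beta = min(BETA_MAX, max(BETA_MIN, beta))
    return {name: round(t.baseline_qps * beta * t.protection, 6)
            for name, t in TIERS.items()}
```

At the initial β of 1.0 this gives 1000, 540, and 320 QPS for the high, medium, and low levels respectively.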

[0109] Step 1: System monitoring and load status determination.

[0110] At 10:00 AM, the system monitoring module collected key performance indicators using lightweight probes deployed on the server. The collection process employed non-blocking asynchronous I/O to avoid blocking business threads. Raw data: CPU utilization 40%, memory usage 50%, network bandwidth usage 30%, and the 5xx HTTP status code ratio was 1% as observed in the API gateway logs.

[0111] The monitoring module first performs smoothing and noise reduction on the raw data. Taking CPU utilization as an example, the system uses a weighted average method to merge the currently collected instantaneous value with historical smoothed values according to a certain ratio, making the CPU utilization curve smoother and effectively suppressing the risk of misjudgment caused by instantaneous spikes or occasional fluctuations. Similarly, memory usage, network bandwidth utilization, and 5xx HTTP status code ratios all undergo the same smoothing mechanism to generate their respective stable estimates.

[0112] Subsequently, the system performs weighted fusion according to a preset weight allocation strategy. In this strategy, CPU utilization is given the highest weight, followed by the 5xx HTTP status code ratio, while memory usage and network bandwidth utilization have relatively lower weights. The logic behind this weight configuration is as follows: CPU utilization directly reflects the stress on the system's computing resources and is the core basis for judging overload; the 5xx HTTP status code ratio indirectly reflects the degree of server-side processing failures or dependency faults and provides an early warning before the system is fully overloaded; memory and bandwidth serve as auxiliary indicators that help judge the overall health of the system. By fusing each indicator with its corresponding weight, the system generates an overall load rate that represents how busy the system currently is. The calculated overall load rate is 29.3%. Since this value is lower than the preset idle state threshold of 30%, the system determines that it is currently in an "idle state." This result is written to shared memory so that other modules can read it in real time.
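
The fusion and state determination can be sketched as follows. The weights 0.4, 0.3, 0.2, and 0.1 are assumptions: the text gives only their ordering, and these values happen to reproduce the stated 29.3% when the smoothed values equal the raw readings and the 5xx ratio enters as a percentage. The overload threshold of 80% is also assumed, since only the idle threshold of 30% is stated.

```python
# Hypothetical weights that respect the stated ordering: CPU > 5xx ratio > memory, bandwidth.
WEIGHTS = {"cpu": 0.4, "http_5xx": 0.3, "memory": 0.2, "bandwidth": 0.1}
IDLE_THRESHOLD = 30.0       # from the scenario
OVERLOAD_THRESHOLD = 80.0   # assumed; the text gives no value


def overall_load_rate(metrics: dict) -> float:
    """Weighted sum of the smoothed indicators, all expressed in percent."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)


def classify(load_rate: float) -> str:
    """Map the overall load rate to idle / normal / overload."""
    if load_rate < IDLE_THRESHOLD:
        return "idle"
    if load_rate > OVERLOAD_THRESHOLD:
        return "overload"
    return "normal"


sample = {"cpu": 40.0, "http_5xx": 1.0, "memory": 50.0, "bandwidth": 30.0}
load = overall_load_rate(sample)
state = classify(load)
print(round(load, 1), state)
```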

[0113] Step 2: Adjust the dynamic coefficient.

[0114] The dynamic coefficient calculation module reads the system load status every 10 seconds. At 10:00:10, the module detects that the system is idle and executes the increment logic: it adds the idle-state step of 0.1 to the previous cycle's value, raising β from 1.0 to 1.1. To keep β from changing frequently because of instantaneous load fluctuations, the module applies a state stability confirmation mechanism. Over the next three monitoring cycles (10:00:20, 10:00:30, and 10:00:40), the system continues to detect the idle state. Once the state is confirmed as stable, β increases by the same step to 1.2 and then to 1.3, reaching 1.3 at 10:01:00 and stabilizing around that value.
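
The stepping behavior can be sketched as below, using the step sizes and limits from the scenario. The exact confirmation rule is an interpretation: here the first idle cycle steps immediately, the next three cycles hold while the state is confirmed, and later idle cycles step once each. Applying no confirmation delay to the overload state is likewise an assumption, as is the class name `BetaController`.

```python
from dataclasses import dataclass


@dataclass
class BetaController:
    beta: float = 1.0
    idle_step: float = 0.1       # from the scenario
    overload_step: float = 0.2   # from the scenario
    lower: float = 0.5           # from the scenario
    upper: float = 1.5           # from the scenario
    confirm_cycles: int = 3      # stability confirmation window, in monitoring cycles
    _idle_streak: int = 0        # consecutive idle observations (internal state)

    def update(self, state: str) -> float:
        """Advance one monitoring cycle; state is 'idle', 'normal' or 'overload'."""
        if state == "idle":
            self._idle_streak += 1
            # Assumed rule: step on the first idle cycle, hold during
            # confirmation, then step once per cycle.
            if self._idle_streak == 1 or self._idle_streak > 1 + self.confirm_cycles:
                self.beta += self.idle_step
        else:
            self._idle_streak = 0
            if state == "overload":
                self.beta -= self.overload_step   # no confirmation delay (assumption)
        # Clamp to the limits and round away float drift such as 1.2000000000000002.
        self.beta = round(min(self.upper, max(self.lower, self.beta)), 10)
        return self.beta


ctrl = BetaController()
idle_trace = [ctrl.update("idle") for _ in range(6)]          # 10:00:10 through 10:01:00
overload_trace = [ctrl.update("overload") for _ in range(5)]  # a later overload episode
print(idle_trace, overload_trace)
```

Under these assumptions the idle trace follows the timeline in the text (1.1, held for three cycles, then 1.2 and 1.3), while the overload trace falls by 0.2 per cycle and stops at the lower limit.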

[0115] During the adjustment process, the dynamic coefficient calculation module also incorporates a dynamic adjustment mechanism based on control theory. This mechanism takes the difference between the current overall load rate and the preset target load rate, together with the accumulation of this difference over time and its rate of change, and generates a correction factor that is added to the β value. The system can thus adapt β to the real-time load, its historical trend, and its rate of change, approaching the ideal load range more quickly while avoiding oscillations caused by over-adjustment. The final output β value is strictly limited to the preset upper and lower limits, so that the rate limiting threshold neither exceeds the system's capacity through excessive relaxation nor impairs the availability of basic services through excessive tightening.
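
The three ingredients named above (the difference, its accumulation, and its rate of change) match the structure of a proportional-integral-derivative controller, so a minimal sketch might look like this. The target load of 50%, the gains `kp`, `ki`, `kd`, and the names `PidCorrector` and `corrected_beta` are all assumptions; the text specifies none of them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PidCorrector:
    target_load: float = 50.0   # assumed target load rate, in percent
    kp: float = 0.005           # assumed gains; the text gives none
    ki: float = 0.0001
    kd: float = 0.01
    dt: float = 10.0            # monitoring cycle in seconds (from the scenario)
    _integral: float = field(default=0.0, init=False)
    _prev_error: Optional[float] = field(default=None, init=False)

    def correction(self, load: float) -> float:
        # Positive error means the system is below target, so beta may be relaxed.
        error = self.target_load - load
        self._integral += error * self.dt
        derivative = 0.0 if self._prev_error is None else (error - self._prev_error) / self.dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


def corrected_beta(beta: float, correction: float, lower: float = 0.5, upper: float = 1.5) -> float:
    """Add the correction to beta and clamp to the configured limits."""
    return min(upper, max(lower, beta + correction))


pid = PidCorrector()
print(corrected_beta(1.3, pid.correction(29.3)))
```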

[0116] Step 3: Tiered quota management and dynamic reassessment.

[0117] The tiered quota management module maintains a user tier database. Every day at 2:00 AM, the system reassesses user tiers. Taking user A as an example, their characteristics are a high historical request frequency, a high service level agreement weight, and a low average resource consumption per request. The system uses a clustering algorithm to tier users dynamically: first, it constructs a user feature vector that quantifies the user's behavior across multiple dimensions; then, it calculates the similarity between this feature vector and the centroids of the preset high, medium, and low tiers; finally, it assigns the user to the tier with the highest similarity. When updating the tier centroids, the system applies a smoothing mechanism that fuses the historical centroid data with recent user behavior statistics by weighted averaging, which prevents fluctuations in a single user's behavior from causing frequent tier changes and keeps the tier classification stable and continuous.
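
A minimal sketch of the tier assignment and centroid smoothing follows. It assumes features normalized to the range 0 to 1, Euclidean distance as the (inverse) similarity measure, and a smoothing weight `gamma`; the centroid values and the feature vector for user A are invented for illustration only.

```python
import math
from typing import Dict, Sequence, Tuple

# Feature order: (request frequency, SLA weight, resource consumption per request).
# Hypothetical tier centroids; the text gives no numeric values.
CENTROIDS: Dict[str, Tuple[float, ...]] = {
    "high": (0.8, 0.9, 0.3),
    "medium": (0.5, 0.5, 0.5),
    "low": (0.2, 0.1, 0.7),
}


def assign_tier(features: Sequence[float], centroids: Dict[str, Tuple[float, ...]]) -> str:
    """Pick the tier whose centroid is closest, i.e. most similar, to the user."""
    return min(centroids, key=lambda tier: math.dist(features, centroids[tier]))


def smooth_centroid(old: Sequence[float], recent_mean: Sequence[float], gamma: float = 0.2) -> Tuple[float, ...]:
    """Blend the historical centroid with recent statistics; gamma is an assumed weight."""
    return tuple((1.0 - gamma) * o + gamma * r for o, r in zip(old, recent_mean))


user_a = (0.9, 0.9, 0.2)   # high frequency, high SLA weight, low resource consumption
print(assign_tier(user_a, CENTROIDS))
print(smooth_centroid(CENTROIDS["high"], (0.9, 0.9, 0.2)))
```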

[0118] The system's total baseline quota pool is set at 2000 QPS. During quota allocation, the system first performs an initial allocation based on the overall weights of the high, medium, and low levels to determine the total quota for each level. It then performs a secondary allocation within each level based on the actual number of users in that level, so that users in the same level receive a fair baseline quota. Since the number of users in each level is stable, the baseline quota for each user remains unchanged after the secondary allocation.
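
The two-stage allocation can be sketched as below. The level weights 5:3:2 are inferred from the stated baseline quotas (1000, 600, and 400 QPS out of a 2000 QPS pool); the user counts and the even split within a level are assumptions, as the text does not say how the secondary allocation divides a level's total.

```python
from typing import Dict, Tuple


def allocate(total_pool: float,
             level_weights: Dict[str, float],
             user_counts: Dict[str, int]) -> Dict[str, Tuple[float, float]]:
    """Return {level: (level_total_qps, per_user_qps)}."""
    weight_sum = sum(level_weights.values())
    result = {}
    for level, weight in level_weights.items():
        # Stage 1: split the pool across levels by weight.
        level_total = total_pool * weight / weight_sum
        # Stage 2: split the level total evenly across its users (assumed rule).
        count = user_counts.get(level, 0)
        per_user = level_total / count if count else 0.0
        result[level] = (level_total, per_user)
    return result


quotas = allocate(
    total_pool=2000,
    level_weights={"high": 5, "medium": 3, "low": 2},   # inferred from 1000/600/400 QPS
    user_counts={"high": 10, "medium": 30, "low": 100}, # hypothetical head counts
)
print(quotas)
```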

[0119] Step 4: Real-time rate limiting threshold calculation and smoothing.

[0120] The real-time rate limiting adjustment module obtains the current β value from the dynamic coefficient calculation module and the baseline quota and protection coefficient for each user level from the tiered quota management module. The configuration logic for the protection coefficient is as follows: the highest protection coefficient is for high-level users, followed by medium-level users, and the lowest for low-level users. This design aims to ensure that when the system needs to uniformly increase or decrease quotas, high-level users receive a larger quota increase or experience a smaller quota reduction, thus maintaining the service quality differences between levels under different load conditions.

[0121] When calculating the real-time threshold, the system first multiplies the baseline quota for each user level by the dynamic coefficient β, which adjusts the quota for system load. It then multiplies the result by the protection coefficient for that level, which differentiates service quality by user level. With a current β value of 1.3, for example, the real-time threshold for high-level users is 1000 QPS × 1.3 × 1.0 = 1300 QPS; by the same calculation, low-level users receive 400 QPS × 1.3 × 0.8 = 416 QPS.
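
The calculation can be written out directly with the baseline quotas and protection coefficients from the scenario. The function name `realtime_thresholds` is illustrative, and the values are taken before any smoothing is applied.

```python
BASELINE_QPS = {"high": 1000, "medium": 600, "low": 400}   # from the scenario
PROTECTION = {"high": 1.0, "medium": 0.9, "low": 0.8}      # from the scenario


def realtime_thresholds(beta: float) -> dict:
    """threshold = beta x baseline quota x protection coefficient, per level."""
    return {
        level: beta * BASELINE_QPS[level] * PROTECTION[level]
        for level in BASELINE_QPS
    }


thresholds = realtime_thresholds(1.3)
print({level: round(qps, 1) for level, qps in thresholds.items()})
```

At the lower limit β = 0.5 the same function yields 500, 270, and 160 QPS, which shows how far the thresholds can tighten under overload.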

[0122] To avoid the impact of sudden threshold changes on the system, the module introduces a smoothing mechanism. This mechanism merges the real-time threshold calculated in the current cycle with the threshold that took effect in the previous cycle by weighted averaging, so that the threshold changes more smoothly. In addition, the system sets a maximum allowable change in the threshold between adjacent cycles. When the calculated change exceeds this limit, the system does not apply the new threshold directly but adjusts it gradually over multiple cycles, using the maximum allowable change as the step size, until the target threshold is reached. Together, these two mechanisms keep the adjustment of the rate limiting threshold smooth and controllable.
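
Both mechanisms can be combined in one update function, sketched below. The blending weight `alpha = 0.5` and the per-cycle limit `max_delta = 100` QPS are assumed values chosen only to make the behavior visible.

```python
def next_threshold(previous: float, target: float,
                   alpha: float = 0.5, max_delta: float = 100.0) -> float:
    """Blend the new target with the previous threshold, then cap the per-cycle change."""
    smoothed = alpha * target + (1.0 - alpha) * previous
    delta = smoothed - previous
    # Limit the change between adjacent cycles to +/- max_delta.
    delta = max(-max_delta, min(max_delta, delta))
    return previous + delta


threshold = 1000.0
trace = []
for _ in range(4):                       # four monitoring cycles toward a 1300 QPS target
    threshold = next_threshold(threshold, 1300.0)
    trace.append(threshold)
print(trace)
```

In this sketch the first two cycles are bounded by the step limit and the later ones by the weighted average, so the threshold approaches the target without a jump.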

[0123] Step 5: Flow control execution and differential processing.

[0124] At 11:00 AM, the API gateway's traffic control execution module began applying the new rate-limiting threshold. The module implements traffic control based on the token bucket algorithm; each user level maintains an independent token bucket, and the token generation rate is equal to its real-time rate-limiting threshold.

[0125] User A (high-level) sent 1200 requests per second, below the real-time threshold of 1270 QPS. Because the token bucket held enough tokens, the requests were allowed through normally and the gateway logged them. User B (low-level) sent 450 requests per second, exceeding the real-time threshold of 416 QPS. Once the token bucket was exhausted, subsequent low-level requests were rejected directly by the gateway, which returned a specific status code and included the rate limit reset time in the response, telling the client when it could retry.
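
A token bucket of this kind can be sketched as follows. Time is passed in explicitly so the example is deterministic; a real gateway would more likely read a monotonic clock. The bucket capacity (one second of tokens) is an assumption, and the text does not name the status code returned on rejection; HTTP 429 with a `Retry-After` header is one common choice, so the sketch returns only the retry delay.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TokenBucket:
    rate: float          # tokens added per second = real-time rate limiting threshold
    capacity: float      # maximum burst size (assumed: one second of tokens)
    tokens: float = 0.0  # tokens currently available
    last: float = 0.0    # time of the last refill, in seconds

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self, now: float, cost: float = 1.0) -> Tuple[bool, float]:
        """Return (allowed, retry_after_seconds)."""
        self._refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0
        # Time until enough tokens have accumulated for this request.
        return False, (cost - self.tokens) / self.rate


# User B: 450 requests arriving within the same second against a 416 QPS bucket.
bucket = TokenBucket(rate=416.0, capacity=416.0, tokens=416.0)
results = [bucket.try_acquire(now=0.0) for _ in range(450)]
allowed = sum(1 for ok, _ in results if ok)
print(allowed, len(results) - allowed)
```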

[0126] If user A's requests suddenly surge to 1400 QPS, exceeding the real-time threshold, the excess requests are not rejected directly because of the user's high-level status; instead, they are placed in a priority queue. The priority queue uses a fair scheduling mechanism based on waiting time. When a new token is added to the token bucket, the system first processes the high-level requests waiting in the priority queue, so that high-value users still receive relatively high-quality service even when the system is under pressure.
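
One way to sketch such a queue is a binary heap ordered by tier and then by enqueue time, so that within a tier the longest-waiting request is served first. The class name `OverflowQueue`, the numeric priority encoding, and the size cap are assumptions for illustration.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class OverflowQueue:
    """Holds over-quota requests; lower priority number means higher tier."""

    def __init__(self, max_size: int = 1000) -> None:
        self._heap: List[Tuple[int, float, int, str]] = []
        self._seq = itertools.count()   # tie-breaker for identical timestamps
        self._max_size = max_size       # assumed cap so the backlog stays bounded

    def push(self, request_id: str, priority: int, enqueued_at: float) -> bool:
        if len(self._heap) >= self._max_size:
            return False                # queue full: caller falls back to rejection
        heapq.heappush(self._heap, (priority, enqueued_at, next(self._seq), request_id))
        return True

    def pop(self) -> Optional[str]:
        """Called when a token becomes available; returns the next request to serve."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[3]


queue = OverflowQueue()
queue.push("a", priority=0, enqueued_at=2.0)
queue.push("b", priority=0, enqueued_at=1.0)   # same tier, waited longer
queue.push("c", priority=1, enqueued_at=0.5)   # lower tier, served last
order = [queue.pop(), queue.pop(), queue.pop()]
print(order)
```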

[0127] Step 6: Dynamic quota rebalancing.

[0128] The system monitoring module continuously tracks the request success rate for users at each level. In this scenario, over multiple monitoring cycles the request success rate for low-level users stays below the preset target success rate. On receiving the success rate alarm, the tiered quota management module triggers the quota rebalancing mechanism.

[0129] The rebalancing mechanism first assesses the overall system load to confirm that there is room for adjustment. Then, the system gradually increases the baseline quota for low-level users in small increments, while continuously monitoring the success rate changes for high- and medium-level users. The adjustment process terminates when the baseline quota for low-level users reaches a certain level and their success rate returns to the target level. Throughout the adjustment process, the system ensures the concurrency safety of quota data through atomic operations and synchronizes the adjusted quotas to the flow control execution module in real time, allowing the adjustment effect to be quickly reflected in actual rate limiting decisions.
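
The rebalancing loop can be sketched as below. The `observe` callback, the guard on higher-level success rates, the load headroom check, and all numeric values are hypothetical; atomic updates and synchronization to the flow control module are not modeled.

```python
from typing import Callable, Dict


def rebalance_low_quota(low_quota: float, step: float, ceiling: float,
                        target: float, guard: float,
                        load_rate: float, max_load: float,
                        observe: Callable[[float], Dict[str, float]]) -> float:
    """Raise the low-level baseline quota in small steps until its success rate recovers."""
    if load_rate >= max_load:
        return low_quota                 # no headroom: leave the quota unchanged
    while low_quota < ceiling:
        rates = observe(low_quota)       # success rate per level at this quota
        if rates["low"] >= target:
            break                        # low-level success rate is back on target
        if min(rates["high"], rates["medium"]) < guard:
            break                        # higher levels are starting to suffer
        low_quota = min(ceiling, low_quota + step)
    return low_quota


def fake_observe(quota: float) -> Dict[str, float]:
    # Toy model for the demo: low-level success grows linearly with quota.
    return {"high": 0.999, "medium": 0.999, "low": min(1.0, quota / 500.0)}


new_quota = rebalance_low_quota(400.0, step=20.0, ceiling=600.0, target=0.95,
                                guard=0.99, load_rate=45.0, max_load=70.0,
                                observe=fake_observe)
print(new_quota)
```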

[0130] Step 7: Rapid response and protection under overload conditions.

[0131] At 2:00 PM, due to a sudden flash sale, the system monitoring module detected a continuous deterioration in several performance indicators: CPU utilization climbed to a high level, memory usage approached saturation, network bandwidth utilization was nearing its limit, and the 5xx HTTP status code rate increased significantly. After weighted calculation, the overall load rate exceeded the overload threshold, and the system was determined to be in an "overload state."

[0132] The dynamic coefficient calculation module responded immediately, reducing β by the larger overload step of 0.2 per monitoring cycle until it reached the preset lower limit of 0.5. The real-time rate limiting thresholds fell accordingly, and the quotas for users at all levels were reduced in step. The flow control execution module then tightened the release rate. Requests in the priority queue backed up as a result, but the pressure on system resources gradually eased and the 5xx HTTP status code ratio dropped significantly within a short period. By tightening the rate limits quickly, the system avoided an overload crash and preserved the basic availability of the core transaction chain.

[0133] Thirdly, this application proposes an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method described above.

[0134] Fourthly, this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method described above.

[0135] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this application. The specific working process of the units and modules in the above system can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0136] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0137] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this disclosure.

[0138] In the embodiments provided in this disclosure, it should be understood that the disclosed apparatus/computer devices and methods can be implemented in other ways. For example, the apparatus/computer device embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation there may be other division methods. Multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or in other forms.

[0139] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0140] Furthermore, the functional units in the various embodiments of this disclosure can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0141] If an integrated module/unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program may include computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. A computer-readable medium may include: any entity or device capable of carrying computer program code, recording media, USB flash drives, portable hard drives, magnetic disks, optical disks, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and the like. It should be noted that the content included in a computer-readable medium may be expanded or reduced as appropriate according to the requirements of legislation and patent practice in a jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.

[0142] The above are merely preferred embodiments of the present invention. It should be noted that any modifications and improvements made by those skilled in the art without departing from the present technical solution should also be considered to fall within the scope of protection claimed by the present solution.

Claims

1. An adaptive user rate limiting method for an online platform, characterized in that it comprises the following steps: collecting multi-dimensional performance indicators of the system in real time, the indicators including CPU utilization, memory usage, network bandwidth utilization, and a 5xx HTTP status code ratio that characterizes the frequency of server-side processing failures; calculating an overall load rate from the multi-dimensional performance indicators and determining the load state of the system from the overall load rate, the load state including at least an idle state and an overload state; dynamically adjusting a dynamic coefficient according to the load state, wherein the dynamic coefficient is adjusted to be greater than 1 in the idle state and less than 1 in the overload state, and the adjustment response speed of the dynamic coefficient is faster in the overload state than in the idle state; obtaining user levels and the corresponding baseline quotas, and computing, from the dynamic coefficient and the baseline quotas, a real-time rate limiting threshold for each user level, wherein different user levels are configured with differentiated protection coefficients, which ensure that the quota change of higher-level users is smaller than that of lower-level users when the system load changes; and performing traffic control on user access requests based on the real-time rate limiting thresholds.

2. The method according to claim 1, characterized in that calculating the overall load rate from the multi-dimensional performance indicators and determining the load state of the system from the overall load rate further includes: smoothing and denoising the collected multi-dimensional performance indicators to generate stable indicator estimates; performing weighted fusion on the stable indicator estimates to generate the overall load rate; and determining an idle state when the overall load rate is lower than a first threshold, an overload state when the overall load rate is higher than a second threshold, and a normal state otherwise.

3. The method according to claim 2, characterized in that, in the weighted fusion, each indicator is assigned a preset weight, with CPU utilization and the 5xx HTTP status code ratio assigned higher weights and memory usage and network bandwidth utilization assigned lower weights.

4. The method according to claim 3, characterized in that the faster adjustment response of the dynamic coefficient in the overload state than in the idle state is implemented as follows: in the idle state, the dynamic coefficient is gradually increased by a first adjustment step until a preset upper limit threshold is reached; in the overload state, the dynamic coefficient is gradually reduced by a second adjustment step until a preset lower limit threshold is reached; and the absolute value of the second adjustment step is greater than the absolute value of the first adjustment step.

5. The method according to claim 4, characterized in that obtaining the user levels and the corresponding baseline quotas further includes: using a clustering algorithm to classify users dynamically, taking into account each user's historical request frequency, service level agreement weight, and average resource consumption per request; and setting a total baseline quota pool for the system, performing an initial allocation based on the weight of each user level, and then performing a secondary allocation based on the number of users in each level to generate the baseline quota for each user level.

6. The method according to claim 5, characterized in that it further includes: monitoring in real time the request success rate of users at each level; and, when the success rate of a level remains below a preset target, triggering a quota rebalancing mechanism that increases the baseline quota for that level.

7. The method according to claim 6, characterized in that computing the real-time rate limiting threshold for each user level from the dynamic coefficient and the baseline quotas further includes: multiplying the dynamic coefficient by the baseline quota of each user level to obtain an initially adjusted quota; multiplying the initially adjusted quota by the protection coefficient corresponding to the level to obtain the real-time rate limiting threshold; and smoothing the calculated real-time threshold to control the magnitude of the threshold change between adjacent cycles.

8. The method according to claim 7, characterized in that performing traffic control on user access requests based on the real-time rate limiting thresholds further includes: performing rate limiting with a token bucket algorithm; and, when the number of requests exceeds the quota, placing high-priority requests in a priority queue to wait while returning a rate-limited response directly for low-priority requests.

9. An adaptive user rate limiting system for an online platform, characterized in that it includes: a system monitoring module, used to collect multi-dimensional performance indicators of the system in real time, the indicators including CPU utilization, memory usage, network bandwidth utilization, and a 5xx HTTP status code ratio that characterizes the frequency of server-side processing failures, to calculate an overall load rate from the multi-dimensional performance indicators, and to determine the load state of the system from the overall load rate, the load state including at least an idle state and an overload state; a dynamic coefficient calculation module, used to dynamically adjust a dynamic coefficient according to the load state, wherein the dynamic coefficient is adjusted to be greater than 1 in the idle state and less than 1 in the overload state, and the adjustment response speed of the dynamic coefficient is faster in the overload state than in the idle state; a tiered quota management module, used to obtain user levels and their corresponding baseline quotas; a real-time rate limiting adjustment module, used to compute, from the dynamic coefficient and the baseline quotas, a real-time rate limiting threshold for each user level, wherein different user levels are configured with differentiated protection coefficients, which ensure that the quota change of higher-level users is smaller than that of lower-level users when the system load changes; and a flow control execution module, used to perform flow control on user access requests based on the real-time rate limiting thresholds.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.