A CDN edge node collaborative optimization method, system, device and medium
By constructing a dynamic resource transfer graph and a distributed reputation mechanism among CDN edge nodes, collaborative optimization of CDN edge nodes is achieved, solving the problem of load imbalance among traditional CDN nodes and improving resource utilization and user access stability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHENGDU TIANCHENG TECH CO LTD
- Filing Date
- 2026-05-18
- Publication Date
- 2026-06-19
AI Technical Summary
Traditional CDN edge nodes lack effective information exchange and collaboration mechanisms, so some nodes are overloaded while others sit idle. This fails to meet service quality requirements in complex scenarios and causes increased cross-node traffic scheduling latency, higher bandwidth consumption, and service interruptions.
By establishing a state synchronization cycle among CDN edge nodes, constructing a dynamic resource transfer graph, performing local predictions and generating early warning events, and combining distributed reputation mechanisms and multi-objective collaborative scheduling decisions, collaborative optimization among nodes can be achieved.
It achieves proactive predictive resource scheduling, reduces cross-node scheduling latency and bandwidth consumption, improves resource utilization, avoids service lag, provides a stable user access experience, and promotes node self-optimization through a reputation scoring mechanism.
Figure CN122248033A_ABST
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to a CDN edge node collaborative optimization method, system, device and medium.
Background Technology
[0002] With the explosive growth in demand for digital content distribution and the widespread adoption of services such as high-definition video, live streaming, cloud gaming, and IoT device interaction, Content Delivery Networks (CDNs) have become a core infrastructure for ensuring network service quality and reducing access latency. By deploying edge nodes globally, CDNs cache content from the origin server to nodes closest to the user, enabling "nearest access," effectively alleviating bandwidth pressure on the origin server, shortening data transmission paths, and improving the user experience.
[0003] However, with the increasing heterogeneity and dynamism of network traffic, the randomness of user access behavior and the rapid migration of hot content (such as traffic fluctuations in scenarios like viral videos, live broadcasts of major events, and e-commerce promotions) place higher demands on the collaborative scheduling capabilities of CDN edge nodes.
[0004] Traditional CDN edge nodes mostly operate independently, lacking effective information exchange and collaboration mechanisms between nodes. This often leads to an imbalance where some nodes are overloaded while others are idle, resulting in increased cross-node traffic scheduling latency, higher bandwidth consumption, and even service lag and interruption issues, failing to meet service quality requirements in complex scenarios.
Summary of the Invention
[0005] To meet the service quality requirements in complex scenarios, this application provides a CDN edge node collaborative optimization method, system, device, and medium.
[0006] Firstly, this application provides a CDN edge node collaborative optimization method, which adopts the following technical solution: A CDN edge node collaborative optimization method includes: Each edge node in the target area periodically collects its own state data according to the set state synchronization period and constructs a dynamic resource transfer graph, which serves as the basic input for local prediction and global collaboration; Based on its own state data and the dynamic resource transfer graph, each edge node performs local prediction in parallel. When a high-confidence resource hotspot transfer event is predicted, an early warning event is generated and broadcast to neighboring nodes. The resource hotspot transfer event represents, for a specific resource object, a significant and predictable migration of the corresponding request load from the current edge node to one or more other edge nodes within an adjacent time window; Neighboring nodes perform consensus verification on the received early warning events based on a distributed reputation mechanism to form a trusted early warning consensus; Each edge node integrates its own state, the state of its neighboring nodes, local prediction results, and the trusted early warning consensus to make multi-objective collaborative scheduling decisions; The collaborative scheduling decision is executed, the decision effect is audited within a preset verification period, and the reputation scores between edge nodes are dynamically updated based on the audit results.
[0007] By adopting the above technical solution, each edge node constructs a dynamic resource transfer graph based on periodically collected status data. Combined with local parallel prediction, it proactively captures high-confidence resource hotspot transfer events, breaking the limitations of traditional CDNs that rely on static configuration or post-event adjustments. This transforms resource scheduling from passive response to proactive prediction, enabling pre-scheduling and deployment of resources before hotspot traffic migration occurs. This reduces latency and bandwidth consumption in cross-node traffic scheduling, improving overall resource utilization. The introduction of a distributed reputation mechanism ensures the authenticity and reliability of early warning events. Consensus verification by neighboring nodes filters out false or low-quality early warnings, avoiding scheduling chaos caused by erroneous warnings. Furthermore, multi-objective collaborative scheduling decisions integrate node status, neighbor status, local prediction, and trusted early warning consensus, achieving precise load balancing in complex and ever-changing network environments. This reduces service lag or interruption caused by single-node overload, providing users with a more stable access experience. The audit of decision-making effectiveness and dynamic updates of reputation scores during the preset verification period form a closed-loop mechanism of self-iteration and self-optimization. The reputation status of a node directly reflects its prediction accuracy and scheduling decision-making ability. This combination of positive incentives and negative constraints can drive the edge node group to continuously optimize its prediction model and scheduling strategy, allowing the entire CDN network to evolve continuously with the increase of running time and adapt to traffic characteristics and business needs in different scenarios.
[0008] Optionally, based on their own state data and the dynamic resource transfer graph, each edge node performs local prediction in parallel, specifically including: The collected time-series state data is input into the pre-trained lightweight time-series prediction model, which outputs the predicted node load risk value and the list of local hot resources in the near future. The dynamic resource transfer graph is input into a pre-trained lightweight graph neural network model, which outputs the transfer path and probability from the current hot resource to the next possible hot resource, along with the model confidence score. When both the transfer probability and confidence level exceed the set threshold, and the time-series prediction model detects that the real-time request rate of the corresponding source resource exceeds the dynamic threshold, it is determined as a high-confidence resource hotspot transfer event prediction.
[0009] By employing the aforementioned technical solutions, the lightweight time-series prediction model analyzes the collected time-series status data, enabling it to quickly output predicted node load risks and a list of local hot resources in the near future. This prediction method based on historical time-series data allows edge nodes to perceive their own load changes in advance and promptly grasp the distribution of local resource popularity, avoiding passive scheduling caused by a lag in their own status awareness. The lightweight graph neural network model's processing of dynamic resource transfer graphs breaks through the limitations of traditional predictions that only focus on the status of a single node. By mining the transfer paths and probabilities between resources, it can capture the migration direction and probability of hot resources. By combining multi-dimensional threshold judgments of transfer probability, confidence level, and real-time request rate, low-reliability prediction results are filtered out, ensuring that only resource hotspot transfer events with truly high confidence are triggered with early warnings. This not only reduces the network communication overhead and consensus verification pressure on neighboring nodes caused by invalid early warnings but also greatly improves the accuracy and value of early warning events, ensuring the efficiency and effectiveness of the entire CDN edge node collaborative optimization from the source.
[0010] Optionally, the steps of generating an early warning event and broadcasting it to neighboring nodes specifically include: The predicted resource hotspot transfer event is encapsulated into a standard early warning event, whose content includes at least: event type, source resource identifier, target resource identifier, prediction intensity, sending node identifier, and timestamp; The early warning event is digitally signed using the private key of the sending edge node; The early warning event, carrying the digital signature, is broadcast to the set of neighboring nodes via the secure event bus.
[0011] By adopting the above technical solution, resource hotspot transfer events are encapsulated into a standard format containing event type, source and target resource identifiers, prediction intensity, sending node identifier, and timestamp. This standardizes and unifies the warning event information, allowing different edge nodes to quickly parse the warning content without additional adaptation, reducing communication adaptation costs between nodes, ensuring the integrity and consistency of warning information during transmission, and enabling neighboring nodes to clearly and accurately obtain the core information of hotspot transfer. Using the sending node's private key to digitally sign the warning event provides reliable assurance of its authenticity and immutability. Neighboring nodes can verify the validity of the signature using the sending node's public key, eliminating the possibility of malicious nodes forging warning events or tampering with warning information, and preventing false warnings from interfering with the entire collaborative scheduling system.
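As a non-limiting illustration of the encapsulation and signing steps in [0010], the following Python sketch packs the listed fields into one event and serializes them deterministically before signing. The class and function names are hypothetical, and the JSON wire format is an assumption, since the text does not specify one. The signing primitive is deliberately left as an injected callable: in a deployment it would wrap an asymmetric signature scheme using the sending node's private key, with verification using the matching public key.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable


@dataclass(frozen=True)
class WarningEvent:
    # Minimum field set listed in [0010]; field names are hypothetical.
    event_type: str
    source_resource_id: str
    target_resource_id: str
    prediction_intensity: float
    sender_node_id: str
    timestamp_ms: int

    def canonical_bytes(self) -> bytes:
        # Sorted keys and fixed separators give every node the same bytes,
        # so a signature computed by the sender can be checked by any neighbor.
        return json.dumps(asdict(self), sort_keys=True,
                          separators=(",", ":")).encode("utf-8")


@dataclass(frozen=True)
class SignedEvent:
    event: WarningEvent
    signature: bytes


def sign_event(event: WarningEvent,
               sign: Callable[[bytes], bytes]) -> SignedEvent:
    """Attach a signature produced by the sender's (injected) signing function."""
    return SignedEvent(event, sign(event.canonical_bytes()))


def verify_event(signed: SignedEvent,
                 verify: Callable[[bytes, bytes], bool]) -> bool:
    """Check the signature with the (injected) public-key verification function."""
    return verify(signed.event.canonical_bytes(), signed.signature)
```

Because the signature covers the canonical byte form of every field, changing any field after signing causes verification to fail at the neighboring node.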
[0012] Optionally, the steps in which neighboring nodes perform consensus verification on the received early warning events based on a distributed reputation mechanism to form a trusted early warning consensus specifically include: After receiving the early warning event, the neighboring node uses the public key of the sending node to verify the corresponding digital signature; After the signature is verified, each neighboring node casts a weighted vote on the early warning event based on the historical reputation score that it maintains for the sending node; If the weighted sum of all neighboring nodes' votes on the early warning event exceeds a preset consensus threshold, a trusted consensus is reached on the early warning event; otherwise, the early warning event is marked as an untrusted or low-trust event.
[0013] By adopting the above technical solution, neighboring nodes verify the digital signature using the sending node's public key, directly confirming the authenticity of the source and the integrity of the early warning event, and filtering out invalid warnings that come from unknown sources or have been tampered with. Each neighboring node casts a weighted vote based on the historical reputation score that it maintains for the sending node, binding the node's historical performance to the credibility of the current warning. Warnings issued by nodes with consistently reliable performance and high accuracy receive higher weight, while the influence of warnings from nodes with poor reputation is weakened. This reflects the actual credibility of the early warning event and also creates a positive incentive, encouraging edge nodes to maintain their own reputation and improve the quality of their warning information. When the weighted sum of votes exceeds the threshold, the early warning event is accepted as trusted by the participating neighboring nodes, while warnings that do not reach the threshold are marked as low-trust or untrusted, preventing erroneous information from misleading decisions. Warning verification in a distributed environment therefore no longer depends on a central node, and nodes reach consensus autonomously and efficiently. This improves the transparency and fairness of the verification process and provides reliable information support for the entire CDN edge node collaborative optimization system.
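The weighted vote in [0012] can be illustrated with the following Python sketch, given as one possible reading rather than a definitive implementation. It assumes that each vote is a simple approve or reject, that the weight is the reputation score (taken to lie in 0 to 1) that the voting neighbor holds for the sending node, and that votes from neighbors whose signature check failed are ignored. All names and the example threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter_id: str
    signature_ok: bool        # result of the public-key signature check
    sender_reputation: float  # this voter's score for the sender (assumed 0..1)
    approve: bool             # assumed binary vote


def weighted_vote_sum(votes: Iterable[Vote]) -> float:
    """Sum the reputation weights of all valid approving votes."""
    return sum(v.sender_reputation
               for v in votes if v.signature_ok and v.approve)


def classify_warning(votes: Iterable[Vote], consensus_threshold: float) -> str:
    """Return 'trusted' if the weighted sum exceeds the preset threshold."""
    if weighted_vote_sum(votes) > consensus_threshold:
        return "trusted"
    return "low_trust"


votes = [
    Vote("node-b", True, 0.9, True),
    Vote("node-c", True, 0.8, True),
    Vote("node-d", True, 0.4, False),
]
result = classify_warning(votes, consensus_threshold=1.5)
```

With the example votes the weighted sum is about 1.7, so the warning is classified as trusted under a threshold of 1.5 and as low-trust under a threshold of 2.0.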
[0014] Optionally, the steps for executing multi-objective collaborative scheduling decisions specifically include: For high-priority resources indicated in the scheduling decision factor, if they are not cached locally, they are added to the high-priority prefetch queue, and the prefetch bandwidth is determined according to the prediction intensity and the local load. The scheduling decision factor is defined as the comprehensive information set obtained by combining the local state, the state of neighboring nodes, the local prediction results, and the early warning events verified by consensus. The high-priority resources include resources in the local hot resource list and the transfer target resources in the early warning events; The real-time load risk value of the node itself is compared with those of each neighboring node. If the node itself is at high risk and feasible low-load neighboring nodes exist, the diversion ratio is calculated and part of the newly arriving requests is diverted to the feasible neighboring nodes that have low load and reachable links; For a cache-miss request, based on the resource cache status and link latency of neighboring nodes in the scheduling decision factor, the origin-fetch path with the optimal overall response latency is selected, where the path passes through a neighboring node that has cached the target resource.
[0015] By adopting the above technical solution, the prefetching strategy places local hot resources and the transfer target resources named in early warning events into a high-priority prefetch queue, and adjusts the prefetch bandwidth according to the prediction intensity and the node's own load. Resources that are about to become hotspots are thus cached locally in advance, avoiding cache misses and origin-fetch pressure when requests surge, while the prefetch bandwidth is kept within what the node's own load allows, so that prefetching does not consume excessive resources or disrupt normal service. The dynamic load balancing mechanism compares the load risk values of the node and its neighboring nodes in real time. When the node's own load is too high, it calculates the diversion ratio and diverts newly arriving requests to reachable, low-load neighboring nodes. This removes the limitations of traditional static load balancing, responds quickly to changes in node load, and avoids service delays or interruptions caused by single-node overload. Taking link reachability into account ensures that diverted requests can actually be served, improving the load balance of the entire CDN network. For cache-miss requests, the node combines the resource cache status of neighboring nodes with link latency to choose the origin-fetch path with the best overall response latency. By reusing resources already cached by neighboring nodes, it avoids the high bandwidth cost and long response time of fetching directly from the origin, reducing origin-fetch traffic and improving the response speed of user requests. These three scheduling strategies work together, enabling edge nodes to make globally optimal scheduling decisions based on multi-dimensional information: their own status, neighboring node status, local predictions, and trusted early warning events. This improves the resource utilization and service stability of the CDN network, provides users with a lower-latency and smoother access experience, and drives the transformation of CDN edge nodes from standalone resource storage nodes into intelligent, collaborative service nodes.
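The diversion and origin-fetch steps of [0014] can be sketched in Python as follows, purely for illustration. The text does not give the diversion-ratio formula or the risk thresholds, so `diversion_ratio` and the `high` and `low` values below are hypothetical placeholders. Falling back to the origin server when no neighboring node holds the resource is likewise an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Neighbor:
    node_id: str
    load_risk: float           # 0-100 load risk value
    link_latency_ms: float
    reachable: bool
    cached: FrozenSet[str] = frozenset()


def feasible_offload_targets(own_risk: float, neighbors: List[Neighbor],
                             high: float = 70.0,
                             low: float = 40.0) -> List[Neighbor]:
    """Low-load, reachable neighbors, returned only when this node is at high risk."""
    if own_risk < high:
        return []
    return [n for n in neighbors if n.reachable and n.load_risk <= low]


def diversion_ratio(own_risk: float, high: float = 70.0) -> float:
    # Hypothetical rule: divert the share of load that exceeds the threshold.
    if own_risk <= high:
        return 0.0
    return min(1.0, (own_risk - high) / own_risk)


def pick_fetch_path(resource_id: str, neighbors: List[Neighbor],
                    origin_latency_ms: float) -> str:
    """Choose the lowest-latency reachable neighbor that caches the resource."""
    holders = [n for n in neighbors
               if n.reachable and resource_id in n.cached]
    best = min(holders, key=lambda n: n.link_latency_ms, default=None)
    if best is None or best.link_latency_ms >= origin_latency_ms:
        return "origin"  # assumed fallback when no neighbor is a better source
    return best.node_id
```

A fuller scheduler would also weigh the neighbor's own load when choosing a fetch path. The sketch uses only cache status and link latency, the two inputs named in the text.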
[0016] Optionally, the steps of auditing the effectiveness of decisions and updating reputation scores specifically include: During the verification period following the execution of the collaborative scheduling decision, actual traffic patterns and system performance indicators are collected; The accuracy of early warning events is assessed by comparing the actual increase in requests for the transfer target resource with the predicted intensity, the consistency between the predicted transfer path and the actual resource transfer path, and the difference between the early warning event initiation time and the actual resource transfer initiation time; The effectiveness of scheduling decisions is evaluated by analyzing the hit rate improvement from prefetching operations, the effect of traffic diversion operations on the node's own load, and the improvement in user response speed from path optimization; Based on all evaluation results, the reputation scores of relevant nodes are updated according to preset rules: nodes that provide accurate warnings or reliable status information have their reputation scores increased; nodes that provide false warnings or invalid information have their reputation scores decreased.
[0017] By employing the aforementioned technical solution, the accuracy of early warning events is assessed by comparing the actual increase in requests for the transfer target resource with the predicted intensity. This directly measures the accuracy of the edge node's prediction model, so that nodes with accurate predictions are recognized and nodes with significant prediction deviations are identified. The effectiveness of scheduling decisions is evaluated along three dimensions: the hit rate of prefetching operations, the load relief achieved by traffic diversion, and the improvement in response speed from path optimization. These cover the core scenarios of multi-objective collaborative scheduling and show the actual value of the scheduling strategies. Furthermore, dynamically updating node reputation scores based on the evaluation results binds a node's performance to its reputation. Nodes providing accurate early warnings or reliable status information have their reputation scores increased, while those providing erroneous early warnings or invalid information have their reputation scores decreased. This combination of positive incentives and negative constraints motivates each edge node to proactively optimize its prediction model and scheduling strategy, improving the quality of early warning information and the soundness of scheduling decisions.
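As a hedged illustration of the audit in [0016], the sketch below checks one early warning against the three comparisons named in the text and then adjusts a reputation score. The "preset rules" are not disclosed, so the tolerances, the reward and penalty sizes, and the clamping of scores to the range 0 to 1 are all hypothetical choices, as are the names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarningAudit:
    predicted_intensity: float  # predicted increase in requests
    observed_increase: float    # actual increase during the verification period
    path_matched: bool          # predicted vs. actual transfer path
    start_time_error_s: float   # warning time minus actual transfer start


def warning_was_accurate(audit: WarningAudit,
                         max_rel_error: float = 0.3,
                         max_time_error_s: float = 30.0) -> bool:
    """Apply the three accuracy checks; tolerances are assumed values."""
    if audit.predicted_intensity <= 0:
        return False
    rel_error = (abs(audit.observed_increase - audit.predicted_intensity)
                 / audit.predicted_intensity)
    return (audit.path_matched
            and rel_error <= max_rel_error
            and abs(audit.start_time_error_s) <= max_time_error_s)


def update_reputation(score: float, accurate: bool,
                      reward: float = 0.05, penalty: float = 0.10) -> float:
    """Raise the score for accurate warnings, lower it otherwise, clamp to 0..1."""
    new_score = score + reward if accurate else score - penalty
    return max(0.0, min(1.0, new_score))
```

Making the penalty larger than the reward is one way to discourage low-quality warnings. The text itself only requires that accurate nodes gain reputation and inaccurate nodes lose it.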
[0018] Optionally, the collaborative optimization method further includes: Each edge node periodically reports a de-identified operation package to the central cloud platform. The de-identified operation package includes state sequences, predicted events, decision actions, and effect audit data; The central cloud platform aggregates data from the entire network, retrains the time-series prediction and graph neural network models, and optimizes the parameters of the collaborative strategy; The central cloud platform calculates the parameter differences between the new model and the old version, generates a lightweight incremental update package, and distributes it to the edge nodes; Each edge node silently applies the incremental update package during periods of low load.
[0019] By adopting the above technical solution, the anonymized operation packages periodically reported by edge nodes include state sequences, predicted events, decision actions, and effect audit data. This protects user data privacy while providing the central cloud platform with real-world operational data covering the entire network. This massive amount of data from different scenarios and nodes provides rich sample support for the global training of the model, enabling the central cloud platform to perceive global traffic patterns and optimization opportunities that edge nodes cannot capture. The central cloud platform uses the aggregated network data to retrain time-series prediction and graph neural network models and optimize collaborative strategy parameters. This allows for iterative upgrades of the model from a global perspective, compensating for the model accuracy bottleneck caused by limited data and insufficient computing power at edge nodes, and generating prediction models and collaborative strategies that better fit the characteristics of network traffic. The lightweight incremental update package design avoids the huge bandwidth consumption and storage pressure caused by full model updates, only distributing the parameter differences between the old and new models to edge nodes, reducing update costs and allowing edge nodes to efficiently obtain the latest models and strategies. Each edge node silently applies incremental update packages during low-load periods, ensuring that the update process does not affect normal service operation and that the results of global optimization are applied locally in a timely manner, thereby continuously improving the prediction and scheduling capabilities of the edge nodes.
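The incremental update in [0018] can be illustrated with the following Python sketch, which treats a model as a flat mapping from parameter name to value. This representation, the function names, and the package layout are assumptions made for illustration. The package here carries the new values of changed parameters rather than arithmetic differences, so that applying it reproduces the new model exactly.

```python
from typing import Dict, List, TypedDict


class DeltaPackage(TypedDict):
    changed: Dict[str, float]  # parameters that are new or have a new value
    removed: List[str]         # parameters dropped from the new model


def make_delta(old: Dict[str, float], new: Dict[str, float],
               eps: float = 0.0) -> DeltaPackage:
    """Build an update package containing only what differs from the old model."""
    changed = {k: v for k, v in new.items()
               if k not in old or abs(v - old[k]) > eps}
    removed = [k for k in old if k not in new]
    return {"changed": changed, "removed": removed}


def apply_delta(params: Dict[str, float],
                delta: DeltaPackage) -> Dict[str, float]:
    """Return the updated parameters without mutating the running model."""
    dropped = set(delta["removed"])
    updated = {k: v for k, v in params.items() if k not in dropped}
    updated.update(delta["changed"])
    return updated


old_model = {"w1": 1.0, "w2": 2.0, "w3": 3.0}
new_model = {"w1": 1.0, "w2": 2.5, "w4": 4.0}
package = make_delta(old_model, new_model)
```

In this example the package carries two changed values and one removal instead of the whole model. An edge node would call `apply_delta` during a low-load period and switch to the returned parameters afterwards.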
[0020] Secondly, this application provides a CDN edge node collaborative optimization system, which adopts the following technical solution: A CDN edge node collaborative optimization system includes: The edge data acquisition module, which is used to periodically collect the state data of each edge node in the target area according to the set state synchronization period, and to construct a dynamic resource transfer graph based on the state data, which serves as the basic input for local prediction and global collaboration; The lightweight AI inference module, which is deployed on each edge node to perform local prediction in parallel based on the node's own state data and the dynamic resource transfer graph. When a high-confidence resource hotspot transfer event is predicted, an early warning event is generated. The resource hotspot transfer event represents, for a specific resource object, a significant and predictable migration of the corresponding request load from the current edge node to one or more other edge nodes within an adjacent time window; The edge collaborative communication module, which is used to realize lightweight information synchronization and decision consensus among nodes, including broadcasting the early warning event to neighboring nodes and performing consensus verification on the received early warning event based on a distributed reputation mechanism to form a trusted early warning consensus; The collaborative scheduling execution module, which is used by each edge node to make multi-objective collaborative scheduling decisions by integrating its own state, the state of neighboring nodes, local prediction results, and the trusted early warning consensus; The central monitoring and global optimization module, which is used to audit the effectiveness of decisions within a preset verification period and dynamically update the reputation scores between edge nodes based on the audit results.
[0021] Thirdly, this application provides a computer device that adopts the following technical solution: A computer device includes a memory, a processor, and a computer program stored in the memory, the processor executing the computer program to implement the CDN edge node collaborative optimization method as described in the first aspect.
[0022] Fourthly, this application provides a computer-readable storage medium, which adopts the following technical solution: A computer-readable storage medium storing a computer program that can be loaded by a processor to execute the CDN edge node collaborative optimization method as described in the first aspect.
[0023] In summary, this application includes at least one of the following beneficial technical effects: 1. It achieves proactive, predictive resource scheduling, breaking away from the passive mode of static configuration and post-event adjustment in traditional CDNs. Resource pre-scheduling is completed before hot traffic migrates, which reduces cross-node scheduling latency and bandwidth consumption and improves overall network resource utilization.
[0024] 2. It constructs a distributed trusted early warning and multi-objective collaborative scheduling mechanism. False early warnings are filtered out through reputation consensus verification, and precise load balancing is achieved by combining multi-dimensional status information, which avoids service lag or interruption caused by single-node overload and thus ensures access stability.
[0025] 3. It forms a closed-loop self-optimization and global iterative evolution system. Node reputation scores are dynamically updated through decision auditing, and, together with cloud-based lightweight incremental model updates, this drives the CDN network to continuously adapt to traffic characteristics and business needs and achieve long-term performance optimization.
Attached Figure Description
[0026] Figure 1 is a first flowchart of an embodiment of the method of this application; Figure 2 is a second flowchart of an embodiment of the method of this application; Figure 3 is a third flowchart of an embodiment of the method of this application; Figure 4 is a fourth flowchart of an embodiment of the method of this application; Figure 5 is a fifth flowchart of an embodiment of the method of this application; Figure 6 is a sixth flowchart of an embodiment of the method of this application; Figure 7 is a seventh flowchart of an embodiment of the method of this application.
Detailed Implementation
[0027] To make the purpose, technical solution, and advantages of this application clearer, this application is described in further detail below with reference to Figures 1-7 and the embodiments. It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of the application.
[0028] The first embodiment of this application discloses a CDN edge node collaborative optimization method. Referring to Figure 1, the CDN edge node collaborative optimization method includes S110-S150: S110, each edge node in the target area periodically collects its own state data according to the set state synchronization period and constructs a dynamic resource transfer graph as the basic input for local prediction and global collaboration; S120, based on its own state data and the dynamic resource transfer graph, each edge node performs local prediction in parallel. When a high-confidence resource hotspot transfer event is predicted, an early warning event is generated and broadcast to neighboring nodes. The resource hotspot transfer event represents, for a specific resource object, a significant and predictable migration of the corresponding request load from the current edge node to one or more other edge nodes within an adjacent time window; S130, neighboring nodes perform consensus verification on the received early warning events based on a distributed reputation mechanism to form a trusted early warning consensus; S140, each edge node integrates its own state, the state of neighboring nodes, local prediction results, and the trusted early warning consensus to make multi-objective collaborative scheduling decisions; S150, the collaborative scheduling decisions are executed, their effectiveness is audited within a preset verification period, and the reputation scores between edge nodes are dynamically updated based on the audit results.
[0029] Specifically, for step S110, the target area is determined according to geographical administrative divisions (such as provinces and cities) or network topology partitions (such as the coverage area of backbone network nodes). Each edge node within the target area is pre-configured with a unified status synchronization period. This period adopts a dynamically adjustable mechanism, flexibly adjusting according to the node's current load. When the load is low (CPU utilization ≤ 30%, bandwidth utilization ≤ 20%), the period is set to 30 seconds; when the load is high (CPU utilization ≥ 70% or bandwidth utilization ≥ 80%), the period is shortened to 10 seconds, ensuring a balance between the real-time nature of status data and resource consumption. Each edge node periodically collects its own core status data through a built-in resource monitoring module. The collection scope includes hardware resources (CPU utilization, memory usage, disk cache usage, network card bandwidth usage), network resources (link latency, packet loss rate, number of connections), and service resources (request rate of various resources, cache hit rate, number of cache misses). All collected data is accompanied by a timestamp accurate to milliseconds to avoid data time sequence disorder.
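As a non-limiting illustration of the period adjustment described in [0029], the following Python sketch selects the state synchronization period from CPU and bandwidth utilization. The function name and parameters are hypothetical. Two assumptions are made: the low-load condition requires both thresholds to hold, and the intermediate band, which the text leaves undefined, keeps the previous period.

```python
LOW_LOAD_PERIOD_S = 30   # CPU <= 30% and bandwidth <= 20%
HIGH_LOAD_PERIOD_S = 10  # CPU >= 70% or bandwidth >= 80%


def sync_period_seconds(cpu_pct: float, bandwidth_pct: float,
                        previous_period_s: int = LOW_LOAD_PERIOD_S) -> int:
    """Return the state synchronization period for the current load."""
    if cpu_pct >= 70.0 or bandwidth_pct >= 80.0:
        return HIGH_LOAD_PERIOD_S
    if cpu_pct <= 30.0 and bandwidth_pct <= 20.0:
        return LOW_LOAD_PERIOD_S
    # Assumption: between the two bands the previous period is kept,
    # which also avoids switching back and forth near a threshold.
    return previous_period_s
```

For example, `sync_period_seconds(75, 10)` returns 10 because the CPU condition alone is enough to enter the high-load band.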
[0030] Then, a dynamic resource transfer graph is constructed, with specific resource objects as vertices. Vertex attributes include the unique resource ID, resource size, current request rate, and cache node distribution. A directed edge is formed when a user consecutively requests two different resources within the same session or within a very short time window. The very short time window can be set to 100ms (to ensure the correlation of the request sequence and avoid interference from irrelevant requests). The same session is identified by the user session ID to ensure that consecutive requests come from the same user. Edge weights accumulate with request frequency and decay over time. Specifically, the initial weight is set to 0. Each time a qualifying consecutive request sequence is detected, the weight is incremented by 1. At the same time, a time decay coefficient of 0.95 is set, with a fixed time base (e.g., 10 seconds) as the decay step size. After each time base step, the weight is multiplied by the decay coefficient (if no request is detected within the state synchronization period, the weight continues to decay step by step). This captures the cumulative effect of request frequency while preventing historical requests from unduly influencing the current graph state, so that the dynamic resource transfer graph reflects the request association patterns between resources in real time and provides accurate and comprehensive input for subsequent local prediction and global collaboration. For example, if a user requests resources X and Y in the same session with an interval of 80ms between the two requests, a directed edge from X to Y is established in the dynamic resource transfer graph with resources X and Y as vertices, and its weight is incremented from 0 to 1. If the consecutive request sequence is not detected again in the next state synchronization period (taken here as one 10-second decay step), the edge weight becomes 1 × 0.95 = 0.95. If it is then detected again, the weight is incremented to 1.95.
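The accumulate-and-decay rule for edge weights can be sketched as follows. The class and method names are hypothetical, and the sketch assumes that an edge requires both the same session ID and a gap of at most 100ms, which matches the worked example (the text itself says "same session or a very short time window"). Calling `decay_step` once per time base is left to the caller.

```python
from collections import defaultdict

DECAY = 0.95      # decay coefficient applied once per time base step
WINDOW_MS = 100   # "very short time window" from the description


class TransferGraph:
    def __init__(self):
        self.weights = defaultdict(float)  # (source, target) -> weight, starts at 0
        self.last_request = {}             # session_id -> (resource, timestamp_ms)

    def observe(self, session_id, resource, ts_ms):
        """Record one request and reinforce the edge from the previous resource."""
        prev = self.last_request.get(session_id)
        if prev is not None:
            prev_resource, prev_ts = prev
            if prev_resource != resource and ts_ms - prev_ts <= WINDOW_MS:
                self.weights[(prev_resource, resource)] += 1.0
        self.last_request[session_id] = (resource, ts_ms)

    def decay_step(self):
        """Multiply every edge weight by the decay coefficient."""
        for edge in self.weights:
            self.weights[edge] *= DECAY
```

Replaying the example: observing X at 0ms and Y at 80ms in one session gives the edge X→Y a weight of 1; one `decay_step` brings it to 0.95; a second qualifying X, Y pair raises it to 1.95.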
[0031] Referring to Figure 2, in S120, each edge node performs the local prediction step in parallel based on its own state data and the dynamic resource transfer graph. This step specifically includes S210-S230: S210, input the collected time-series state data into the pre-trained lightweight time-series prediction model and output the predicted near-term node load risk value and the local hot resource list. S220, input the dynamic resource transfer graph into the pre-trained lightweight graph neural network model and output the transfer path and transfer probability from the current hot resource to the next possible hot resource, along with the model confidence score. S230, when both the transfer probability and the model confidence exceed their set thresholds, and the time-series prediction model detects that the real-time request rate of the corresponding source resource exceeds the dynamic threshold, determine the prediction to be a high-confidence resource hotspot transfer event.
[0032] Specifically, for step S210, the time-series state data collected in S110 is preprocessed to remove outliers (such as negative CPU utilization due to sensor failure, or data where the request rate suddenly increases to more than 10 times the normal peak value), missing data is supplemented by linear interpolation, and then the data is mapped to the [0,1] interval by Min-Max normalization to eliminate the influence of different units on the model.
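The three preprocessing stages can be illustrated with a short sketch for a single metric series. The function name and the `normal_peak` parameter are hypothetical. The sketch assumes that outliers are treated as missing samples and then filled by the same linear interpolation, and that gaps at either end of the series take the nearest known value.

```python
def preprocess(series, normal_peak):
    """Clean one metric series; None marks a missing sample."""
    # 1) Outliers become missing: negative values or more than 10x the normal peak.
    cleaned = [
        None if v is None or v < 0 or v > 10 * normal_peak else v
        for v in series
    ]
    known = [i for i, v in enumerate(cleaned) if v is not None]
    if not known:
        raise ValueError("series contains no usable samples")

    # 2) Fill missing samples by linear interpolation between known neighbours.
    filled = []
    for i, v in enumerate(cleaned):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(cleaned[right]))   # leading gap: nearest value
        elif right is None:
            filled.append(float(cleaned[left]))    # trailing gap: nearest value
        else:
            t = (i - left) / (right - left)
            filled.append(cleaned[left] + t * (cleaned[right] - cleaned[left]))

    # 3) Min-Max normalization to [0, 1].
    lo, hi = min(filled), max(filled)
    if hi == lo:
        return [0.0] * len(filled)
    return [(v - lo) / (hi - lo) for v in filled]
```

For instance, `preprocess([10, -5, 30, None, 50], normal_peak=100)` drops the negative reading, interpolates the two gaps to 20 and 40, and returns `[0.0, 0.25, 0.5, 0.75, 1.0]`.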
[0033] The pre-trained lightweight time-series prediction model employs an improved LSTM (Long Short-Term Memory) network. Lightweighting is achieved through model pruning and quantization. Pruning removes redundant neural connections (retaining core connections and reducing model parameters by over 30%). Quantization converts model parameters from 32-bit floating-point numbers to 8-bit integers, reducing node computational consumption and enabling efficient operation on embedded hardware at edge nodes. The model input consists of time-series data from the past 10 state synchronization cycles (i.e., historical data of 300 seconds during off-peak hours and 100 seconds during peak hours). Input features include CPU utilization, memory usage, various resource request rates, and link latency. The output consists of two parts: first, a predicted node load risk for the next five state synchronization cycles (i.e., 150 seconds during off-peak hours and 50 seconds during peak hours). This value uses a 0-100 scoring system, with higher scores indicating higher load risk. For example, a predicted value of 85 indicates that the node is about to enter a high-load state and requires advance scheduling. Second, a local hot resource list is created by calculating the request frequency and growth rate of various resources over the past three state synchronization cycles. A hot resource threshold is set (e.g., request frequency ≥ 50 times / second and growth rate ≥ 10%), and resources meeting these criteria are filtered out and sorted in descending order of request frequency. For example, if a node's request frequency for resource X over the past three cycles was 60, 68, and 75 times / second, with a growth rate of 12.5%, and the request frequency for resource Y was 55, 58, and 60 times / second, with a growth rate of 8%, then the local hot resource list will only include resource X, ensuring the accuracy of hot resource identification.
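The hot resource filter at the end of this paragraph can be sketched as below. The description does not state how the growth rate is computed, so this sketch assumes growth from the first to the last of the three cycles and compares the most recent request frequency against the 50 requests/second limit. Under that assumption the percentages differ from the 12.5% and 8% quoted in the example, but the selection is the same: resource X qualifies and resource Y does not. The function name and parameter names are hypothetical.

```python
def hot_resources(history, min_rate=50.0, min_growth=0.10):
    """history: resource_id -> request rates (per second) for the last 3 cycles."""
    hot = []
    for resource_id, rates in history.items():
        first, last = rates[0], rates[-1]
        # Assumed definition: relative growth from the first to the last cycle.
        growth = (last - first) / first if first > 0 else float("inf")
        if last >= min_rate and growth >= min_growth:
            hot.append((resource_id, last))
    # Sort in descending order of request frequency.
    hot.sort(key=lambda item: item[1], reverse=True)
    return [resource_id for resource_id, _ in hot]
```

With `{"X": [60, 68, 75], "Y": [55, 58, 60]}` as input, the function returns `["X"]`.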
[0034] For step S220, the dynamic resource transfer graph constructed in S110 is used as input to the pre-trained lightweight graph neural network model. This model can be either a lightweight graph convolutional network (GCN) or a graph attention network (GAT). Both models have been optimized for lightweight operation, simplifying the computation process, reducing the amount of computation, and adapting to the computing power limitations of edge nodes. GCN reduces computational complexity by simplifying graph convolutional kernel operations, while GAT reduces the number of parameters through sparsification of the attention mechanism. Nodes can flexibly choose the model type according to their own computing power.
[0035] During the model training phase, historical resource hotspot transfer data and dynamic resource transfer graph data are used as samples to label the transfer paths and corresponding probabilities of different hotspot resources. Backpropagation is used to optimize model parameters, enabling the model to accurately learn the request association patterns and transfer characteristics between resources. The graph neural network aggregates features of adjacent resource nodes through message passing on the topology. Specifically, each resource node receives feature information from its neighboring nodes (resources connected by directed edges), combines it with its own features, and performs weighted fusion. The weights are automatically learned during model training, thereby uncovering potential associations between resources and outputting the transfer path from the currently identified hotspot resource to the next potential hotspot resource and its corresponding transfer probability. The transfer path is presented in the form of "current hotspot resource - potential hotspot resource," and can contain multiple paths, arranged in descending order of transfer probability. For example, if the current hotspot resource is resource X, the model predicts that its next potential hotspot resources are resource Y and resource Z, with corresponding transfer probabilities of 75% and 20%, respectively. Simultaneously, the model outputs the model confidence score based on the Dropout mechanism to characterize the reliability of the prediction result. A lightweight Monte Carlo Dropout mechanism is employed, randomly discarding some neurons during inference and performing repeated inference only a set number of times (e.g., 3 times). The model confidence is estimated by calculating the variance of these few predictions (range 0-1). 
The smaller the variance, the more stable the prediction and the higher the confidence, thus providing a reliability assessment while maintaining lightweight design. If there are no clear local hotspot resources, the model output is empty, waiting for the next state synchronization cycle to re-input data for prediction.
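A small sketch of the variance-to-confidence step follows. The description only states that lower variance means higher confidence within a 0-1 range; the specific mapping used here (one minus the variance divided by 0.25, the largest possible variance of values in [0, 1]) is an assumption, as are the function and parameter names. `stochastic_predict` stands in for one forward pass of the model with dropout left active.

```python
from statistics import pvariance

MAX_VARIANCE = 0.25  # largest possible population variance of values in [0, 1]


def mc_dropout_confidence(stochastic_predict, runs=3):
    """Run a few dropout-enabled passes; return (mean probability, confidence)."""
    samples = [stochastic_predict() for _ in range(runs)]
    mean = sum(samples) / len(samples)
    # Assumed mapping: zero variance gives 1.0, maximal variance gives 0.0.
    confidence = 1.0 - min(pvariance(samples) / MAX_VARIANCE, 1.0)
    return mean, confidence
```

Three passes that return 0.74, 0.75, and 0.76 give a mean of 0.75 and a confidence very close to 1, while widely scattered outputs push the confidence toward 0.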
[0036] For step S230, a preset transition probability threshold and model confidence threshold are established. These thresholds can be dynamically adjusted based on actual business needs. Typically, the transition probability threshold is set to 70%, and the model confidence threshold is set to 0.85, ensuring that only highly reliable prediction results are considered valid events. The time-series prediction model monitors the real-time request rate of the source resource (i.e., the resource experiencing a surge in requests, selected from the local hotspot resource list, and whose request rate increases by ≥ the growth threshold within the current period compared to the previous period). Simultaneously, it calculates the dynamic threshold for this resource. The dynamic threshold is based on the average request rate over the past 7 state synchronization periods, adjusted in conjunction with the node's current load. The formula is: Dynamic Threshold = Historical Average Request Rate × (1 + Current Load Coefficient), where the current load coefficient = Current CPU Utilization / Preset CPU High Load Warning Threshold, with a range of 0.2-2.0. A high-confidence resource hotspot transition event prediction is determined when three conditions are met: 1) the model output transition probability ≥ 70%; 2) the model confidence ≥ 0.85; and 3) the real-time request rate of the source resource exceeds the calculated dynamic threshold. For example, if the real-time request rate of source resource X is 180 times / second, exceeding the dynamic threshold of 176 times / second, and the model predicts that the probability of it being transferred to node B is 78% with a confidence level of 0.90, both of which exceed the preset threshold, then it is determined to be a resource hotspot transfer event. 
This event is specifically characterized by the fact that in the next 5 state synchronization cycles, the request load of resource X will undergo a significant and predictable migration from the current node A to node B, with the migration range expected to reach more than 70% of the current request volume.
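The dynamic threshold formula and the three-condition test can be sketched as follows. The function names are hypothetical, and the sketch assumes that the stated 0.2-2.0 range is enforced by clamping the load coefficient.

```python
def load_coefficient(cpu_utilization, cpu_warning_threshold):
    """Current CPU utilization / high-load warning threshold, clamped to 0.2-2.0."""
    return min(2.0, max(0.2, cpu_utilization / cpu_warning_threshold))


def dynamic_threshold(past_rates, cpu_utilization, cpu_warning_threshold):
    """Historical average request rate x (1 + current load coefficient)."""
    average = sum(past_rates) / len(past_rates)
    return average * (1 + load_coefficient(cpu_utilization, cpu_warning_threshold))


def is_hotspot_transfer(probability, confidence, request_rate, threshold,
                        min_probability=0.70, min_confidence=0.85):
    """All three conditions of S230 must hold."""
    return (probability >= min_probability
            and confidence >= min_confidence
            and request_rate > threshold)
```

With the figures from the example, `is_hotspot_transfer(0.78, 0.90, 180, 176)` evaluates to `True`, whereas lowering the confidence to 0.80 makes it `False`.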
[0037] Referring to Figure 3, in S120, the steps of generating the warning event and broadcasting it to neighboring nodes specifically include S310-S330: S310, encapsulate the predicted resource hotspot transfer events into standard early warning events. The event content includes at least: event type, source resource identifier, target resource identifier, prediction intensity, sending node identifier, and timestamp. S320, digitally sign the warning event using the private key of the sending edge node. S330, broadcast the warning event carrying the digital signature to the set of neighboring nodes via the security event bus.
[0038] Specifically, for step S310, the high-confidence resource hotspot transfer events are encapsulated as standard early warning events in JSON format to ensure that the event content is standardized and parsable, facilitating subsequent transmission and processing between nodes. The event content includes the following core elements. The event type is represented by an enumeration value, with the resource hotspot transfer event corresponding to the enumeration value "0x01" for quick identification of the event category. The source resource identifier adopts the format "node identifier-resource unique ID", where the node identifier is the MAC address (unique identifier) of the edge node, and the resource unique ID is the hash value of the resource URL, written as 32 hexadecimal characters in the example below. The target resource identifier has the same format as the source resource identifier; if there are multiple target resources, all target resource identifiers are listed sequentially, separated by commas. The prediction intensity adopts a 0-10 scoring system, with the score calculated based on the transfer probability and request growth rate, using the formula: Prediction Intensity = Transfer Probability × 10 (rounded to the nearest integer). For example, a transfer probability of 78% results in a prediction intensity of 8 points, indicating a high-intensity transfer event. The sending node identifier is the MAC address of the edge node that initiated the warning event. The timestamp is a UTC timestamp accurate to milliseconds, ensuring that the event is traceable in time. For example, a complete standard early warning event in JSON format is as follows: {"Event Type":"0x01","Transfer Source Resource Identifier":"00:1A:2B:3C:4D:5E-8a7b6c5d4e3f2a1b0c9d8e7f6a5b4c3d","Transfer Target Resource Identifier":"00:1A:2B:3C:4D:5F-8a7b6c5d4e3f2a1b0c9d8e7f6a5b4c3d","Prediction Intensity":8,"Sending Node Identifier":"00:1A:2B:3C:4D:5E","Timestamp":1713000000123}.
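Building such an event can be sketched with Python's standard `json` module. The function names and parameter names are hypothetical; the field names are copied from the example above. "Rounded to the nearest integer" is implemented as round-half-up, which is an assumption (Python's built-in `round` rounds halves to the nearest even number instead).

```python
import json
import math


def prediction_intensity(transfer_probability: float) -> int:
    """Transfer probability (0-1) x 10, rounded half up (assumed rounding rule)."""
    return math.floor(transfer_probability * 10 + 0.5)


def build_warning_event(sender_mac, source_hash, target_mac, target_hash,
                        transfer_probability, timestamp_ms):
    """Return the warning event as a compact JSON string."""
    event = {
        "Event Type": "0x01",  # resource hotspot transfer event
        "Transfer Source Resource Identifier": f"{sender_mac}-{source_hash}",
        "Transfer Target Resource Identifier": f"{target_mac}-{target_hash}",
        "Prediction Intensity": prediction_intensity(transfer_probability),
        "Sending Node Identifier": sender_mac,
        "Timestamp": timestamp_ms,  # UTC, milliseconds
    }
    return json.dumps(event, separators=(",", ":"))
```

Called with the MAC addresses, hash, probability 0.78, and timestamp from the example, the function reproduces the same six fields with a prediction intensity of 8.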
[0039] For step S320, an asymmetric encryption algorithm (RSA-2048) is used to digitally sign the encapsulated standard warning event, ensuring the integrity and immutability of the warning event while verifying the legitimacy of the sending node. The sending edge node pre-stores its private key in its local hardware security module (HSM). The private key is stored encrypted and has a length of 2048 bits. The private key is changed periodically (e.g., every 7 days) to prevent leakage. The signing process consists of three steps: First, the JSON string of the standard warning event is hashed using SHA-256 to obtain a 256-bit hash value, eliminating the impact of the event content length on the signature. Then, the hash value is encrypted using the sending node's private key to obtain a digital signature (2048 bits long). Finally, the digital signature is concatenated with the JSON string of the standard warning event to form a warning event carrying the digital signature. The concatenation format is "warning event JSON string + separator + digital signature," with the separator "|" for easy verification by neighboring nodes.
[0040] For step S330, a secure event bus based on the MQTT (Message Queuing Telemetry Transport Protocol) is established. This bus adopts a publish-subscribe model, supporting efficient and secure communication between edge nodes, while adapting to low-bandwidth, high-latency scenarios for edge nodes. The sending edge node, acting as the publisher, publishes alert events with digital signatures to a designated topic on the event bus (topic format: "CDN / Edge Node Collaboration / Alert Event / Target Region ID"). The target region ID is a unique identifier for the region where the current node is located, ensuring that only neighboring nodes within that region can subscribe to the corresponding event. The neighbor node set is defined as: edge nodes located in the same target region as the sending node, with link latency ≤ 50ms and link packet loss rate ≤ 1%. The neighbor node list is dynamically updated by the sending node through periodic link probing (e.g., every 60 seconds) to ensure the validity of the neighbor nodes. To ensure transmission security, the event bus uses the TLS 1.3 encryption protocol for data transmission, and performs end-to-end encryption on the transmitted warning events to prevent data theft or tampering. At the same time, a retransmission mechanism is set up. If the sending node does not receive an acknowledgment message from the event bus within 500ms, it will retransmit the warning event, up to 3 times. If it still fails, it will be logged and reported to the central cloud platform to ensure that the warning event can be successfully broadcast to all neighboring nodes.
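The neighbor selection rule and the retransmission policy can be sketched without committing to a particular MQTT client. Both function names are hypothetical, and `publish` is a placeholder for whatever client call publishes the message and reports whether an acknowledgment arrived within the timeout. The sketch reads "retransmit up to 3 times" as one initial attempt plus at most three retries.

```python
def neighbor_set(links, max_latency_ms=50.0, max_loss=0.01):
    """links: node_id -> (latency_ms, packet_loss_rate) for nodes in the same region."""
    return {
        node for node, (latency, loss) in links.items()
        if latency <= max_latency_ms and loss <= max_loss
    }


def publish_with_retry(publish, payload, max_retries=3, ack_timeout_s=0.5):
    """Publish once, then retry until acknowledged or retries are exhausted.

    publish(payload, ack_timeout_s) must return True when the bus acknowledges
    the message within the timeout, and False otherwise.
    """
    for _ in range(1 + max_retries):
        if publish(payload, ack_timeout_s):
            return True
    return False  # caller logs the failure and reports it to the central platform
```

Passing the client call in as a parameter keeps the retry logic testable in isolation and independent of the bus implementation.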
[0041] Referring to Figure 4, in S130, the steps by which neighboring nodes verify the received warning events based on a distributed reputation mechanism and form a trusted warning consensus specifically include S410-S430: S410, after receiving the warning event, the neighboring node uses the public key of the sending node to verify the corresponding digital signature. S420, after the signature is verified, each neighboring node casts a weighted vote on the warning event based on the historical reputation score it maintains for the sending node. S430, if the weighted sum of all neighboring nodes' votes on the warning event exceeds the preset consensus threshold, a credible consensus on the warning event is determined to have been reached; otherwise, the warning event is marked as an untrustworthy or low-trustworthy event.
[0042] Specifically, for step S410, neighboring nodes, as subscribers to the event bus, listen for messages on the corresponding topic in real time. Upon receiving an alert event carrying a digital signature, they separate the alert event JSON string and the digital signature using the separator "|" according to the agreed-upon concatenation format to avoid confusion between the two. Then, the neighboring node retrieves the public key of the sending node from its local cache. This public key is uniformly distributed by the central cloud platform. Each edge node retrieves the public keys of all nodes within its region from the central cloud platform during initialization and updates them periodically (e.g., every 24 hours) to ensure the validity and timeliness of the public keys. The verification process consists of three steps: First, the SHA-256 hash operation is performed on the split JSON string of the warning event to obtain the same 256-bit hash value as the sending node; then, the digital signature is decrypted using the sending node's public key to obtain the hash value of the sending node before encryption; finally, the locally calculated hash value is compared with the decrypted hash value. If the two are completely consistent, the digital signature verification passes, indicating that the warning event has not been tampered with and the sending node's identity is legitimate; if the two are inconsistent, the verification fails, the neighboring node directly discards the warning event, and records the sending node identifier, event content, and reason for verification failure for subsequent node reputation score updates.
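The hash, sign, concatenate, split, and compare flow of S320 and S410 can be illustrated with a deliberately tiny, textbook RSA key. This is a teaching sketch only: the key is built from two small Mersenne primes, there is no padding, and the SHA-256 digest is reduced modulo the toy modulus, none of which is acceptable in practice. A real deployment would use RSA-2048 with a standard padding scheme through a vetted cryptographic library. All names below are hypothetical.

```python
import hashlib

# Toy key material (far too small for real use).
P, Q = 2**61 - 1, 2**89 - 1          # two Mersenne primes
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (Python 3.8+)


def digest_int(message: str) -> int:
    """SHA-256 of the message as an integer, reduced to fit the toy modulus."""
    raw = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % N


def sign_and_frame(message: str) -> str:
    """Sender side: 'JSON string + separator + signature'."""
    signature = pow(digest_int(message), D, N)
    return message + "|" + format(signature, "x")


def verify(framed: str) -> bool:
    """Receiver side: split on the last '|', recover the hash, compare."""
    message, _, signature_hex = framed.rpartition("|")
    try:
        recovered = pow(int(signature_hex, 16), E, N)
    except ValueError:
        return False
    return recovered == digest_int(message)
```

Splitting on the last separator keeps the check working even if the JSON body itself happens to contain a "|" character, since the hexadecimal signature never does.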
[0043] For step S420, after the digital signature verification is successful, each neighboring node performs a weighted vote on the warning event based on its own maintained historical reputation score for the sending node. The core purpose of the vote is to further verify the credibility of the warning event and avoid a single node providing a false warning. Each neighboring node maintains a local reputation score table that records the reputation scores of all nodes it interacts with. The score ranges from 0 to 100, with an initial score of 80. The score is dynamically updated based on subsequent evaluation results. The specific rules for weighted voting are as follows: each neighboring node's voting weight equals its own reputation score divided by the sum of the reputation scores of all participating neighboring nodes. Voting values are divided into 1 (support, believing the warning event is credible) and 0 (oppose, believing the warning event is unreliable). Neighboring nodes give their votes based on their own judgment of the warning event (combining their own state data and local prediction results). For example, if a neighboring node believes that the target resource in the warning event matches its own locally predicted hotspot resource, it casts 1 vote; otherwise, it casts 0 votes. The weighted vote total is calculated as the sum of (voting weight × voting value) of all neighboring nodes. For example, if the reputation scores of three neighboring nodes are 90, 80, and 70, and their voting values are all 1, then their voting weights are 90 / (90+80+70)=0.375, 80 / 240≈0.333, and 70 / 240≈0.292, respectively. The weighted vote total is 0.375+0.333+0.292≈1.0.
[0044] For step S430, a consensus threshold is preset. This threshold is dynamically adjusted based on the number of neighboring nodes participating in the vote to ensure the rationality of the consensus result. The specific rules are as follows: when the number of neighboring nodes is 3-5, the consensus threshold is set to 0.7; when the number of neighboring nodes is 6-10, the consensus threshold is set to 0.6; when the number of neighboring nodes exceeds 10, the consensus threshold is set to 0.55. The core principle of threshold setting is to balance the rigor and efficiency of consensus, and to avoid the inability to reach consensus due to an excessively high threshold, or the acceptance of false warnings due to an excessively low threshold. After calculating the weighted sum of votes from all neighboring nodes, it is compared with a preset consensus threshold: if the weighted sum exceeds the consensus threshold, a credible consensus is reached on the warning event; if the weighted sum does not exceed the consensus threshold, the warning event is marked as an untrustworthy or low-trustworthy event. The specific marking rules are as follows: when the weighted sum is ≥0.5 and < the consensus threshold, it is marked as a low-trustworthy event and is not used for scheduling decisions, but the event record is retained for subsequent re-evaluation; when the weighted sum is <0.5, it is marked as an untrustworthy event and is discarded directly. At the same time, the result is fed back to the sending node for subsequent reputation score adjustment.
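The weighted vote of S420 and the threshold rules of S430 can be combined into one sketch. The function names and result labels are hypothetical. Two points are assumptions because the text leaves them open: fewer than three voters use the 0.7 threshold, and a weighted sum exactly equal to the threshold counts as low trust, since the text requires the sum to exceed the threshold for consensus.

```python
def weighted_vote_sum(votes):
    """votes: list of (reputation_score, vote) pairs, with vote equal to 0 or 1."""
    total = sum(score for score, _ in votes)
    if total == 0:
        return 0.0
    # Equivalent to summing (score / total) * vote over all voters.
    return sum(score * vote for score, vote in votes) / total


def consensus_threshold(voter_count):
    if voter_count > 10:
        return 0.55
    if voter_count >= 6:
        return 0.6
    return 0.7  # 3-5 voters per the text; assumed for fewer than 3 as well


def classify(votes):
    weighted_sum = weighted_vote_sum(votes)
    if weighted_sum > consensus_threshold(len(votes)):
        return "trusted"
    if weighted_sum >= 0.5:
        return "low_trust"    # kept for later re-evaluation, not used for scheduling
    return "untrusted"        # discarded
```

For the three voters in the example (scores 90, 80, and 70, all voting 1) the weighted sum is 1.0 and the event is classified as trusted; if the voter with score 80 votes 0, the sum drops to about 0.667 and the event becomes low trust.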
[0045] Referring to Figure 5, in S140, the steps for executing multi-objective cooperative scheduling decisions specifically include S510-S530: S510, for high-priority resources indicated in the scheduling decision factor, if they are not cached locally, add them to the high-priority prefetch queue and determine the prefetch bandwidth according to the prediction intensity and the node's own load. The scheduling decision factor is defined as the comprehensive information set obtained by combining the node's own state, the state of neighboring nodes, local prediction results, and consensus-verified early warning events; high-priority resources include resources in the local hot resource list and transfer target resources in the early warning events. S520, compare the node's own real-time load risk value with that of each neighboring node; if its own risk is too high and feasible low-load neighboring nodes exist, calculate the diversion ratio and divert part of the newly arriving requests to feasible neighboring nodes that have low load and reachable links. S530, for cache miss requests, select the back-to-origin path with the optimal overall response latency based on the resource cache status and link latency of neighboring nodes in the scheduling decision factor, where the back-to-origin path passes through a neighboring node that has cached the target resource.
[0046] Specifically, for step S510, the scheduling decision factor is a comprehensive set of information obtained by integrating its own status (real-time status data collected by S110), neighbor node status (real-time status of neighbor nodes obtained synchronously through the security event bus), local prediction results (load risk prediction value and local hot resource list output by S120), and early warning events verified by consensus. The information of each part is quantified into a score of 0-10 by weighted summation, which serves as the basis for scheduling decisions.
[0047] High-priority resources are clearly categorized into two types: first, all resources in the local hotspot resource list output by S120; and second, the transfer target resources in the consensus-reached early warning events. These two types of resources are directly related to user request experience and node load, and must be processed first. For high-priority resources, neighboring nodes first query their own local cache index. If there is no record of the resource in the cache index (i.e., not cached locally), the resource is added to the high-priority prefetch queue. The prefetch queue is managed using a FIFO (First-In, First-Out) mechanism, and the queue capacity is set according to the node's cache size, typically 10% of the cache capacity, to ensure that the prefetch operation does not consume too many cache resources. The prefetch bandwidth is determined based on the prediction intensity and the node's own load. The specific calculation formula is: Prefetch bandwidth = Remaining bandwidth × (Prediction intensity / 10), where remaining bandwidth = Total node bandwidth - Current bandwidth usage.
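The prefetch bandwidth formula and the capacity-limited FIFO queue can be sketched as below. The names are hypothetical, and rejecting a resource when the queue is full is an assumption; the description does not say how overflow is handled.

```python
from collections import deque


def prefetch_bandwidth(total_bandwidth, used_bandwidth, prediction_intensity):
    """Remaining bandwidth x (prediction intensity / 10)."""
    remaining = max(0.0, total_bandwidth - used_bandwidth)
    return remaining * (prediction_intensity / 10)


class PrefetchQueue:
    """FIFO queue whose total size is capped at a share of the cache capacity."""

    def __init__(self, cache_capacity_bytes, share=0.10):
        self.capacity = cache_capacity_bytes * share
        self.used = 0
        self.items = deque()

    def add(self, resource_id, size_bytes):
        if self.used + size_bytes > self.capacity:
            return False  # assumption: reject when the queue budget is exhausted
        self.items.append((resource_id, size_bytes))
        self.used += size_bytes
        return True

    def pop(self):
        resource_id, size_bytes = self.items.popleft()
        self.used -= size_bytes
        return resource_id
```

For a node with 1000 Mbps total bandwidth, 400 Mbps in use, and a prediction intensity of 8, the prefetch bandwidth is 600 × 0.8 = 480 Mbps.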
[0048] For step S520, each edge node calculates its own real-time load risk value in real time. The calculation method is consistent with the scoring system of the load risk prediction value in S120 (0-100 points), based on the weighted calculation of indicators such as CPU utilization, memory utilization, bandwidth utilization, and number of connections, with weights of 40%, 30%, 20%, and 10%, respectively. For example, if the CPU utilization is 60%, memory utilization is 50%, bandwidth utilization is 70%, and the number of connections is saturated at 40%, then the real-time load risk value = 60 × 0.4 + 50 × 0.3 + 70 × 0.2 + 40 × 0.1 = 24 + 15 + 14 + 4 = 57 points. Then, the node synchronously obtains the real-time load risk values of all neighboring nodes through the security event bus, compares the differences between its own load risk and those of each neighboring node, and selects feasible neighboring nodes. The criteria for determining feasible neighboring nodes are: real-time load risk value ≤ 60 points (low load state), link latency ≤ 50ms, link packet loss rate ≤ 1%, and link bandwidth ≥ 100Mbps, ensuring that the node has the ability to handle traffic offloading requests. If the real-time load risk value of the node itself is ≥80 points (high load state) and there is at least one feasible neighbor node, then the diversion ratio is calculated. The formula for calculating the diversion ratio is: Diversion ratio = (self-load risk value - feasible neighbor node load risk value) / self-load risk value × diversion coefficient. The diversion coefficient is preset to 0.2-0.5 and is adjusted according to the urgency of the self-load. The more urgent the load, the larger the diversion coefficient. 
For example, if the self-load risk value is 85 points, the feasible neighbor node load risk value is 50 points, and the diversion coefficient is 0.3, then the diversion ratio = (85-50) / 85×0.3≈12.35%, that is, the node will divert 12.35% of the newly arriving requests to the feasible neighbor node.
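The load risk score and the diversion ratio can be checked with a few lines of Python. The function names and dictionary keys are hypothetical; the weights and limits are those given above. Link latency, packet loss, and bandwidth checks for feasibility are omitted to keep the sketch short.

```python
WEIGHTS = {"cpu": 0.4, "memory": 0.3, "bandwidth": 0.2, "connections": 0.1}


def load_risk(metrics):
    """Weighted sum of utilization percentages, giving a 0-100 score."""
    return sum(metrics[name] * weight for name, weight in WEIGHTS.items())


def diversion_ratio(self_risk, neighbor_risk, coefficient):
    """Share of newly arriving requests to divert to one feasible neighbor."""
    if self_risk < 80 or neighbor_risk > 60:
        return 0.0  # node is not in high load, or neighbor is not low-load
    return (self_risk - neighbor_risk) / self_risk * coefficient
```

The two worked examples are reproduced by `load_risk({"cpu": 60, "memory": 50, "bandwidth": 70, "connections": 40})`, which gives 57, and `diversion_ratio(85, 50, 0.3)`, which gives approximately 0.1235.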
[0049] For step S530, when a user initiates a request for a specific resource to an edge node, the node first queries its local cache index. If the cache index does not contain a record for the resource, and the cache does not store the complete data for that resource, the request is considered a cache miss. At this point, based on the resource cache status of neighboring nodes and link latency in the scheduling decision factors, the node selects the origin-back path with the optimal overall response latency. The core requirement for the origin-back path is that it passes through a neighboring node that has cached the target resource, avoiding a direct return to the origin cloud platform and reducing response latency. First, the node obtains the resource cache status of all neighboring nodes through the security event bus (synchronizing every 500ms), filters out neighboring nodes that have cached the target resource, and forms a candidate neighboring node list. Then, it measures the real-time link latency between itself and each candidate neighboring node (using the ping command, measuring every 100ms, and taking the average of 3 measurements). Finally, it calculates the overall response latency of each candidate neighboring node, where overall response latency = link latency + neighboring node cache read latency, and the neighboring node cache read latency is usually 1-5ms (adjusted according to the neighboring node cache type, SSD cache read latency is 1-2ms, HDD cache read latency is 3-5ms). The candidate neighboring node with the lowest overall response latency is selected as the intermediate node of the back-to-origin path. The node initiates a resource request to this neighboring node, obtains the resource, and then forwards it to the user.
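Choosing the intermediate node comes down to minimizing link latency plus cache read latency over the candidate list. A minimal sketch, with a hypothetical function name and input layout:

```python
def pick_relay(candidates):
    """candidates: node_id -> (latency samples in ms, cache read latency in ms).

    Returns (node_id, overall_latency_ms), or None when no neighbor caches
    the resource and the node must fall back to the origin server.
    """
    best = None
    for node_id, (samples, read_ms) in candidates.items():
        overall = sum(samples) / len(samples) + read_ms  # average of the probes
        if best is None or overall < best[1]:
            best = (node_id, overall)
    return best
```

Given a neighbor B with probes of 10, 12, and 14 ms and a 2 ms SSD read, and a neighbor C with probes of 8, 9, and 10 ms and a 4 ms HDD read, the overall latencies are 14 ms and 13 ms, so C is selected despite its slower cache.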
[0050] If no neighboring nodes caching the target resource are found after querying via the security event bus, the node will send a resource cache query request to all neighboring nodes again via the security event bus (shortening the query cycle) to confirm that no neighboring nodes have been missed. Simultaneously, it checks its own cache for resource fragments (if fragments exist, fragment splicing is immediately initiated, and a response to the user is given immediately after splicing). If, after the second confirmation, no neighboring node still caches the resource, the node abandons the attempt to retrieve the resource from the neighboring node and directly adopts the direct connection path from the edge node to the central cloud platform origin server. At this point, based on the link latency factor in the scheduling decision, the direct connection path with the minimum link latency and the most sufficient bandwidth to the central cloud platform origin server will be selected. After retrieving the target resource from the origin server, the resource is immediately added to the local high-priority cache (the cache duration is set to 2 hours, which can be dynamically adjusted according to the frequency of subsequent requests). Simultaneously, the information that "no neighboring node caches this resource" is reported to the central cloud platform via the security event bus. The central cloud platform will subsequently optimize the pre-caching strategy for this resource, instructing surrounding edge nodes to pre-fetch the resource in advance to avoid the situation of no neighboring node caching occurring again.
[0051] Referring to Figure 6, in S150, the steps for auditing the effectiveness of decisions and updating reputation scores specifically include S610-S640: S610, collect actual traffic patterns and system performance indicators during the verification period after the collaborative scheduling decision is executed. S620, assess the accuracy of the early warning event prediction: compare the actual request growth of the predicted transfer target resource with the prediction intensity, check the consistency between the predicted transfer path and the actual resource transfer path, and measure the difference between the early warning event initiation time and the actual resource transfer start time. S630, evaluate the effectiveness of scheduling decisions: analyze the hit rate improvement from prefetching operations, the effect of traffic offloading operations on the node's own load, and the improvement in user response speed from path optimization. S640, update the reputation scores of the relevant nodes according to preset rules based on all evaluation results: increase the reputation score of nodes that provide accurate warnings or reliable status information, and decrease the reputation score of nodes that provide false warnings or invalid information.
[0052] Specifically, for step S610, after the collaborative scheduling decision is executed, a fixed verification period is set. The duration of the verification period is determined based on the prediction time window of the warning event, typically 50% of the prediction time window. For example, if the predicted resource transfer time window for the warning event is 50 seconds (during peak hours), the verification period is set to 25 seconds; if the prediction time window is 150 seconds (during off-peak hours), the verification period is set to 75 seconds, ensuring sufficient data collection on the actual effects of the scheduling decision. During the verification period, the node collects two types of data in real time through the resource monitoring module: first, actual traffic patterns, including the actual request rate of various resources, request distribution (the proportion of requests from different user IPs), and the actual path and magnitude of resource transfers, collected every 10 seconds to ensure real-time data; second, system performance indicators, including CPU utilization, memory usage, bandwidth utilization, average user response latency, cache hit rate, and traffic splitting success rate (the proportion of normal responses to split requests), collected every 5 seconds to comprehensively evaluate the impact of the scheduling decision on system performance. All collected data is timestamped and associated with the execution time of scheduling decisions. It is stored in a dedicated directory on the local disk and backed up to the central cloud platform (after being anonymized).
[0053] For step S620, key prediction information is extracted from the early warning events, including the predicted source resources, target resources, transfer paths, prediction intensity (corresponding to the increase in requests), and prediction time window (i.e., the predicted duration of resource transfer). This information is then compared item by item with the actual data from the verification period to ensure a comprehensive assessment. First, the prediction accuracy is calculated based on the match between the prediction intensity and the actual request growth. The calculation method is: Prediction accuracy = (Number of early warning events where the actual request growth is ≥ 80% of the predicted growth) / Total number of early warning events × 100%, where the predicted growth = prediction intensity × 10% (e.g., a prediction intensity of 8 points corresponds to a predicted growth of 80%). Second, the consistency between the predicted transfer path and the actual resource transfer path is assessed. The calculation method is: Transfer path accuracy rate = (Number of early warning events where the actual transfer path is completely consistent with the predicted transfer path) / Total number of early warning events × 100%. If the actual transfer path includes the core nodes of the predicted path (≥80% of the target nodes), it is judged as partially consistent and counted as 50% in the statistics. Third, the difference between the early warning event initiation time and the actual resource transfer start time is assessed against a timeliness threshold (30 seconds during off-peak hours and 10 seconds during peak hours). If the difference is ≤ the timeliness threshold, the event is judged timely and qualified; otherwise, it is unqualified. Timeliness qualification rate = Number of timely and qualified early warning events / Total number of early warning events × 100%. 
The three indicators are then combined into a comprehensive evaluation result with three levels: Excellent (prediction accuracy rate ≥85%, transfer path accuracy rate ≥80%, timeliness qualification rate ≥90%), Qualified (prediction accuracy rate ≥70% and <85%, transfer path accuracy rate ≥60% and <80%, timeliness qualification rate ≥75%), and Unqualified (any one indicator is below the qualified line). The individual indicator scores and the comprehensive evaluation level of each early warning event are recorded, associated with the sending node identifier and the early warning event ID, and stored synchronously in local logs and on the central cloud platform. This provides a direct basis for updating node reputation scores and also provides sample annotations for model retraining by the central cloud platform in step S720, facilitating the optimization of model prediction accuracy. Example: For an early warning event with a prediction intensity of 8 points (corresponding to a predicted growth of 80%), a transfer path of resource X → resource Y, and a prediction time window of 50 seconds (at peak times), if the actual growth rate during the verification period is 75%, the actual transfer path is resource X → resource Y, and the difference between the actual start time and the early warning initiation time is 8 seconds (≤10 seconds), then the prediction accuracy rate for this early warning event is 75% / 80% × 100% = 93.75%, the transfer path accuracy rate is 100%, the timeliness qualification rate is 100%, and the comprehensive evaluation is excellent. If instead the actual transfer path is resource X → resource Z and the actual growth rate is 40%, then the prediction accuracy rate is 50%, the transfer path accuracy rate is 0%, and the comprehensive evaluation is unqualified.
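The three audit indicators and the level assignment can be sketched as below. This is an illustrative reading under stated assumptions: the class and function names are hypothetical; the code follows the count-based formulas (an event counts as accurate when actual growth is at least 80% of predicted growth) rather than the per-event ratio used in the worked example; and combinations not spelled out in the text (for instance, one indicator in the excellent band and another in the qualified band) are treated as qualified.

```python
from dataclasses import dataclass


@dataclass
class WarningAudit:
    prediction_intensity: float  # points; predicted growth (%) = intensity * 10
    actual_growth_pct: float     # request growth measured in the verification period
    path_score: float            # 1.0 fully consistent, 0.5 partially, 0.0 otherwise
    start_delay_s: float         # gap between warning initiation and actual transfer start
    peak: bool                   # True: 10 s timeliness threshold; False: 30 s


def growth_accurate(a: WarningAudit) -> bool:
    predicted_growth_pct = a.prediction_intensity * 10.0
    return a.actual_growth_pct >= 0.8 * predicted_growth_pct


def timely(a: WarningAudit) -> bool:
    return a.start_delay_s <= (10.0 if a.peak else 30.0)


def audit_rates(audits: list[WarningAudit]) -> tuple[float, float, float]:
    """Return (prediction accuracy, path accuracy, timeliness rate) in percent."""
    n = len(audits)
    if n == 0:
        raise ValueError("no early warning events to audit")
    accuracy = 100.0 * sum(growth_accurate(a) for a in audits) / n
    path = 100.0 * sum(a.path_score for a in audits) / n
    timeliness = 100.0 * sum(timely(a) for a in audits) / n
    return accuracy, path, timeliness


def evaluation_level(accuracy: float, path: float, timeliness: float) -> str:
    if accuracy >= 85 and path >= 80 and timeliness >= 90:
        return "excellent"
    # Assumption: anything meeting all three qualified lines is "qualified".
    if accuracy >= 70 and path >= 60 and timeliness >= 75:
        return "qualified"
    return "unqualified"


# The two cases from the example paragraph (intensity 8, peak hours, 8 s delay).
matched = WarningAudit(8, 75, 1.0, 8, True)
missed = WarningAudit(8, 40, 0.0, 8, True)
```

Passing `audit_rates([matched])` to `evaluation_level` yields the excellent level, and the `missed` event falls below the qualified line on two indicators.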
[0054] For step S630, the effectiveness of the scheduling decision is evaluated along three dimensions to comprehensively assess its impact on system performance and provide a basis for subsequent strategy optimization. The first dimension is the increase in prefetch hit rate, calculated as follows: Hit rate increase = (Cache hit rate after prefetching - Cache hit rate before prefetching) / Cache hit rate before prefetching × 100%, where the cache hit rate before prefetching is the hit rate of the state synchronization cycle immediately before the scheduling decision is executed, and the cache hit rate after prefetching is the average hit rate during the verification period. For example, if the hit rate before prefetching is 65% and the hit rate after prefetching is 78%, then the hit rate increase = (78% - 65%) / 65% × 100% = 20%, indicating that the prefetch operation effectively improves cache utilization. The second dimension is the effect of traffic offloading on the node's own load, calculated as: Load mitigation rate = (Load risk value before offloading - Load risk value after offloading) / Load risk value before offloading × 100%. For example, if the load risk value is 85 points before offloading and 68 points after offloading, then the load mitigation rate = (85 - 68) / 85 × 100% = 20%, indicating that the traffic offloading operation effectively reduced the node load. The third dimension is the improvement of user response speed by path optimization, calculated as: Response latency improvement rate = (Average response latency before path optimization - Average response latency after path optimization) / Average response latency before path optimization × 100%. For example, if the average response latency is 120 ms before optimization and 85 ms after optimization, then the improvement rate = (120 - 85) / 120 × 100% ≈ 29.17%, indicating that path optimization effectively improved the user experience. 
After independent evaluation across the three dimensions, an overall result for the effectiveness of the scheduling decision is given, with the following judgment rules: If all three dimensions are satisfactory, the overall decision is "effective"; if two dimensions are satisfactory and one dimension is unsatisfactory, the overall decision is "basically effective"; if one dimension is satisfactory and two dimensions are unsatisfactory, the overall decision is "inefficient"; if all three dimensions are unsatisfactory, the overall decision is "ineffective". Upon completion of the evaluation, a scheduling decision effectiveness report is generated, detailing the individual evaluation results, satisfactory status, and overall effectiveness judgment for each of the three dimensions. This report is simultaneously stored in local logs and on the central cloud platform, providing direct evidence for updating node reputation scores and optimizing subsequent collaborative strategies.
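The three dimension formulas and the overall judgment rule can be sketched as follows. The function names are hypothetical, and the per-dimension "satisfactory" thresholds are left as caller-supplied booleans because this passage does not state them.

```python
def hit_rate_increase(before_pct: float, after_pct: float) -> float:
    """Relative increase of the cache hit rate after prefetching, in percent."""
    return (after_pct - before_pct) / before_pct * 100.0


def load_mitigation_rate(risk_before: float, risk_after: float) -> float:
    """Relative reduction of the load risk value after offloading, in percent."""
    return (risk_before - risk_after) / risk_before * 100.0


def latency_improvement_rate(before_ms: float, after_ms: float) -> float:
    """Relative reduction of average response latency after path optimization."""
    return (before_ms - after_ms) / before_ms * 100.0


def overall_effectiveness(satisfied: list[bool]) -> str:
    """Map the number of satisfactory dimensions (out of three) to a verdict."""
    if len(satisfied) != 3:
        raise ValueError("exactly three dimensions are evaluated")
    verdicts = {3: "effective", 2: "basically effective", 1: "inefficient", 0: "ineffective"}
    return verdicts[sum(satisfied)]


# Worked numbers from the description.
hit = hit_rate_increase(65, 78)             # about 20
load = load_mitigation_rate(85, 68)         # about 20
latency = latency_improvement_rate(120, 85) # about 29.17
```

Keeping the verdict as a function of a count makes the four-way rule explicit and independent of whichever thresholds a deployment chooses.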
[0055] For step S640, the reputation adjustment rules are set as follows (scores range from 0 to 100 points; a score that would exceed 100 points is capped at 100, and a score that would fall below 0 points is floored at 0): 1. Early warning "excellent" + scheduling "effective": add 15 points; if all three individual indicators from S630 simultaneously exceed their thresholds by 10% (for example, hit rate increase ≥20%, load mitigation rate ≥25%, response latency improvement rate ≥30%), add an extra 3 points (a maximum of 18 points per instance).
[0056] 2. Early warning "excellent" + scheduling "basically effective": add 8 points.
[0057] 3. Early warning "excellent" + scheduling "inefficient": add 2 points (acknowledging the accuracy of the early warning without penalizing the inefficient scheduling).
[0058] 4. Early warning "excellent" + scheduling "ineffective": deduct 4 points (crediting the accurate early warning while applying a moderate penalty for the scheduling failure).
[0059] 5. Early warning "qualified" + scheduling "effective": add 10 points.
[0060] 6. Early warning "qualified" + scheduling "basically effective": add 3 points.
[0061] 7. Early warning "qualified" + scheduling "inefficient": deduct 3 points.
[0062] 8. Early warning "qualified" + scheduling "ineffective": deduct 8 points.
[0063] 9. Early warning "unqualified" + scheduling "effective": deduct 2 points (acknowledging the effectiveness of the scheduling while applying a moderate penalty for the inaccurate early warning).
[0064] 10. Early warning "unqualified" + scheduling "basically effective": deduct 7 points.
[0065] 11. Early warning "unqualified" + scheduling "inefficient": deduct 12 points.
[0066] 12. Early warning "unqualified" + scheduling "ineffective": deduct 20 points (a double failure, penalized more heavily to strengthen the constraint).
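The twelve base rules above amount to a lookup table followed by clamping to the 0-100 range. A minimal sketch, with hypothetical names and lowercase level labels chosen for illustration:

```python
# (early warning level, scheduling result) -> reputation delta in points
BASE_DELTA = {
    ("excellent", "effective"): 15,
    ("excellent", "basically effective"): 8,
    ("excellent", "inefficient"): 2,
    ("excellent", "ineffective"): -4,
    ("qualified", "effective"): 10,
    ("qualified", "basically effective"): 3,
    ("qualified", "inefficient"): -3,
    ("qualified", "ineffective"): -8,
    ("unqualified", "effective"): -2,
    ("unqualified", "basically effective"): -7,
    ("unqualified", "inefficient"): -12,
    ("unqualified", "ineffective"): -20,
}


def updated_reputation(score: int, warning_level: str, scheduling_result: str,
                       all_indicators_exceed: bool = False) -> int:
    """Apply one base rule and clamp the result to [0, 100]."""
    key = (warning_level, scheduling_result)
    delta = BASE_DELTA[key]
    # The extra 3 points apply only to rule 1 (excellent + effective).
    if all_indicators_exceed and key == ("excellent", "effective"):
        delta += 3
    return max(0, min(100, score + delta))
```

For example, a node at 50 points that issues an excellent warning followed by effective scheduling with all three indicators above the bonus line moves to 68 points, while a node at 90 points is capped at 100.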
[0067] In addition, special scenario rules are added (these do not conflict with the above rules and are applied as additional adjustments): 1. Consecutive ratings: if a node receives the "excellent early warning + effective scheduling" rating 3 times in a row, add 15 points; if a node receives the "unqualified early warning + ineffective scheduling" rating 3 times in a row, deduct 25 points.
[0068] 2. Retrospective evaluation and rewards / penalties for low-reliability warnings: For low-reliability warning events that were not globally scheduled because they did not reach the consensus threshold, a retrospective evaluation is conducted after the verification period. If the verification shows that the warning was accurate (excellent / qualified), and the sending node is observed to have performed prefetching or autonomous traffic splitting locally based on the warning and achieved the effective / basically effective standard, then 5 points are added to the sending node, and 3 points are deducted from each neighboring node that cast an opposing vote. If the retrospective verification shows that the warning was erroneous (unqualified), and the sending node still blindly performed invalid local actions (manifesting as inefficient / ineffective), then 10 points are deducted from the sending node, and 5 points are deducted from each neighboring node that cast a supporting vote.
[0069] 3. Status data reliability: If the deviation between status data and actual data is ≤10% (reliable), add 3 points; if the deviation is ≥30% (invalid), deduct 8 points (unrelated to early warning and scheduling results, added separately).
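Two of the special scenario rules, the consecutive-rating rule and the state data reliability rule, can be sketched as additive adjustments. The function names are hypothetical, and the treatment of deviations strictly between 10% and 30% as "no adjustment" is an assumption, since the text does not cover that band.

```python
def streak_adjustment(history: list[tuple[str, str]]) -> int:
    """history holds (warning level, scheduling result) pairs, newest last."""
    last_three = history[-3:]
    if len(last_three) < 3:
        return 0
    if all(h == ("excellent", "effective") for h in last_three):
        return 15
    if all(h == ("unqualified", "ineffective") for h in last_three):
        return -25
    return 0


def state_reliability_adjustment(deviation_pct: float) -> int:
    """Adjustment based on the deviation between reported and actual state data."""
    if deviation_pct <= 10:
        return 3      # reliable
    if deviation_pct >= 30:
        return -8     # invalid
    return 0          # assumption: the intermediate band leaves the score unchanged
```

Both functions return a delta that would be added on top of the base rule result before the score is clamped to the 0-100 range.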
[0070] After the reputation score is updated, it is synchronized to the local reputation score table in real time, and then synchronized to all neighboring nodes through the security event bus to ensure that the reputation information among nodes is consistent.
[0071] Furthermore, this collaborative optimization method also includes S710-S740: S710: Each edge node periodically reports a de-identified operation package to the central cloud platform. The de-identified operation package includes state sequences, predicted events, decision actions, and effect audit data. S720: The central cloud platform aggregates data from the entire network, retrains the time-series prediction and graph neural network models, and optimizes the parameters of the collaborative strategy. S730: The central cloud platform calculates the parameter differences between the new model and the old version, generates a lightweight incremental update package, and distributes it to the edge nodes. S740: Each edge node silently applies the incremental update package during periods of low load.
[0072] Specifically, for step S710, each edge node periodically reports the de-identified operation package to the central cloud platform according to a preset reporting cycle. The reporting cycle is set to 2 hours. To avoid network congestion caused by simultaneous reporting, a staggered reporting mechanism is adopted, with the reporting times of nodes in different regions staggered by 10 minutes (for example, node 1 in region 1 reports at 0 minutes past the hour, and node 2 in region 2 reports at 10 minutes past the hour). The core of the de-identified operation package is the anonymization of sensitive data, which ensures data privacy and security while preserving data usability. The anonymization process specifically includes: anonymizing user IP addresses in state sequences (keeping the first two segments of the IP address and replacing the last two segments with "*", e.g., "192.168.*.*"); anonymizing resource identifiers in predicted events (keeping the first 8 characters of the resource URL hash and replacing the last 24 characters with "*", e.g., "8a7b6c5d**********"); anonymizing node MAC addresses in decision action and effect audit data (keeping the first 4 segments and replacing the last 2 segments with "*", e.g., "00:1A:2B:3C:**:**"); and deleting users' personal information (such as phone numbers and usernames) from the audit data, so that the anonymized data contains no identifiable personal or device-sensitive information. The de-identified operation package uses the ZIP compression format and contains the de-identified state sequences, predicted events, decision actions, and effect audit data. It also carries a data verification code (MD5 hash value) that the central cloud platform uses to verify the integrity of the data. Reporting uses the HTTPS protocol to ensure data security during transmission. If reporting fails, the node re-reports after 30 minutes, with a maximum of 3 retries. 
If reporting still fails, the node logs the data and waits for the next reporting cycle.
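The masking rules of step S710 can be sketched as below. All function names are hypothetical. The description does not name the hash applied to resource URLs; this sketch assumes an MD5 hex digest, whose 32 hex characters match the "keep 8, mask 24" split, and only IPv4 addresses in dotted form are handled.

```python
import hashlib


def mask_ipv4(ip: str) -> str:
    """Keep the first two segments and replace the last two with '*'."""
    parts = ip.split(".")
    if len(parts) != 4:
        raise ValueError(f"not a dotted IPv4 address: {ip!r}")
    return ".".join(parts[:2] + ["*", "*"])


def mask_resource_id(url: str) -> str:
    """Keep the first 8 hex characters of the URL hash and mask the remaining 24.

    Assumption: the resource URL hash is an MD5 hex digest (32 characters).
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return digest[:8] + "*" * 24


def mask_mac(mac: str) -> str:
    """Keep the first four segments and replace the last two with '**'."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a colon-separated MAC address: {mac!r}")
    return ":".join(parts[:4] + ["**", "**"])


def package_checksum(payload: bytes) -> str:
    """MD5 verification code attached to the package for integrity checking."""
    return hashlib.md5(payload).hexdigest()
```

The platform side would recompute `package_checksum` over the received bytes and compare it with the reported value before aggregating the data.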
[0073] For step S720, after receiving the de-identified operation packages reported by all edge nodes, the central cloud platform first verifies each package by comparing MD5 hash values to confirm that the data is complete and has not been altered. If verification fails, the corresponding edge node is notified to re-report; if verification succeeds, the de-identified data is aggregated. Aggregation is performed by region and node type, grouping edge node data from the same region and of the same type (e.g., metropolitan area nodes, county area nodes) together. Outlier data (e.g., data more than three standard deviations from the mean) is removed, and missing data is filled using mean imputation to form a unified dataset for the entire network. The central cloud platform then uses this dataset to retrain the time-series prediction model and the graph neural network model. Training is distributed across the cloud platform's GPU cluster to improve training efficiency. The training objective is to reduce the load risk prediction error of the time-series prediction model and to improve the resource transfer path prediction accuracy of the graph neural network model. Meanwhile, based on the aggregated data, the parameters of the collaborative strategy are optimized, including the adjustment threshold of the state synchronization cycle, the thresholds of the transfer probability and confidence, the consensus threshold, the traffic splitting coefficient, and the prefetch bandwidth calculation ratio. The optimization principle is to adjust these parameters according to the average performance indicators of the entire network. 
For example, if the average cache hit rate of the entire network is less than 70%, the calculation ratio of prefetch bandwidth is reduced and the number of prefetch resources is increased; if the average response latency of the entire network is higher than 100ms, the selection strategy of the back-to-origin path is optimized and the link latency weight is reduced.
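The data cleaning described for step S720 (three-standard-deviation outlier removal followed by mean imputation) can be sketched for a single metric series as follows. The function name is hypothetical, and two details are assumptions: the population standard deviation is used, and missing values are filled with the mean of the values that survive outlier removal.

```python
from statistics import mean, pstdev
from typing import Optional


def clean_series(values: list[Optional[float]]) -> list[float]:
    """Drop values beyond three standard deviations, then fill gaps with the mean."""
    observed = [v for v in values if v is not None]
    if not observed:
        return []
    mu = mean(observed)
    sigma = pstdev(observed)

    def is_outlier(v: float) -> bool:
        return sigma > 0 and abs(v - mu) > 3 * sigma

    kept = [v for v in observed if not is_outlier(v)]
    fill = mean(kept)
    # Preserve the original order; None entries become the imputed mean.
    return [fill if v is None else v
            for v in values
            if v is None or not is_outlier(v)]


# 19 ordinary latency samples, one extreme reading, and one missing sample.
raw = [10.0] * 19 + [1000.0, None]
cleaned = clean_series(raw)
```

In this example the extreme reading is discarded and the missing sample is replaced by the mean of the remaining values.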
[0074] For step S730, after the central cloud platform completes model retraining and policy parameter optimization, it calculates the parameter differences between the new and old models. Using a parameter comparison algorithm (cosine similarity comparison), it extracts parameters with significant differences between the new and old models (parameters with cosine similarity < 0.9), eliminates redundant parameter differences, and generates a lightweight incremental update package. This avoids the bandwidth consumption and node computing power requirements caused by a full update. The incremental update package is compressed using the LZ4 compression algorithm, achieving a compression ratio of ≥ 80%, ensuring a small update package size for easy download by edge nodes. The update package is named in the format "model type_version number_update time.zip", for example, "Time Series Prediction Model_V2.1_202604141000.zip", facilitating edge nodes' identification of the update package type and version. The central cloud platform distributes incremental update packages to each edge node via HTTPS protocol, using a batch distribution mechanism. The packages are first distributed to the core nodes (nodes with low load and high computing power) within the region. After the core nodes receive and verify the update packages, they are then forwarded to the ordinary nodes within the region. This avoids network congestion caused by the central cloud platform directly distributing the packages. At the same time, a distribution timeout mechanism is set. If a node does not receive the update package within 10 minutes, the central cloud platform re-distributes the package to ensure that all edge nodes can obtain the update package.
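The difference extraction in step S730 can be sketched as a per-parameter cosine similarity filter. This is one possible reading: the description does not specify the comparison granularity, so the sketch assumes each named parameter group is compared as a flat vector, and all names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Two zero vectors are treated as identical; otherwise as fully different.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def build_incremental_update(old: dict[str, list[float]],
                             new: dict[str, list[float]],
                             threshold: float = 0.9) -> dict[str, list[float]]:
    """Keep only parameter groups that are new or whose similarity is below the threshold."""
    return {
        name: vector
        for name, vector in new.items()
        if name not in old or cosine_similarity(old[name], vector) < threshold
    }


old_model = {"layer1": [1.0, 0.0], "layer2": [1.0, 1.0]}
new_model = {"layer1": [1.0, 0.1], "layer2": [0.0, 1.0]}
delta = build_incremental_update(old_model, new_model)
```

Here `layer1` barely changes (similarity about 0.995) and is omitted, while `layer2` (similarity about 0.707) is included in the update package.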
[0075] For step S740, after receiving the incremental update package from the central cloud platform, each edge node first decompresses and verifies the update package. It verifies the integrity and correctness of the update package by comparing the MD5 hash value. If the verification fails, it requests a re-delivery from the central cloud platform; if the verification succeeds, it temporarily stores the update package in a local temporary directory, waiting for a low-load period for silent application. The criteria for determining a low-load period are: node CPU utilization ≤ 30%, bandwidth utilization ≤ 20%, and this state lasting for more than 10 minutes. Nodes automatically identify low-load periods by monitoring their own load status in real time. For example, 2-4 AM is typically a low-peak period for user requests, and most nodes will be in a low-load state. During silent application, nodes first suspend non-core services (such as log backup and data statistics) while retaining core request processing services to ensure user requests are unaffected. Then, they load incremental update packages to update the parameters of the time-series prediction model and graph neural network model, as well as the collaborative strategy parameters. Logs are recorded during the update process, and if an update fails, it immediately rolls back to the old version of the parameters to prevent the node from malfunctioning. After the update is complete, the core services are restarted to verify the validity of the model and parameters, ensuring the updated model functions correctly. The entire application process does not affect the response to user requests, achieving seamless model and parameter updates. After the update is complete, the node sends an update confirmation message to the central cloud platform, which records the update status of each node to ensure that the models and parameters of all nodes across the network remain consistent.
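The low-load criterion of step S740 (CPU utilization ≤ 30% and bandwidth utilization ≤ 20%, sustained for more than 10 minutes) can be sketched over a list of monitoring samples. The function name and the sample layout are hypothetical.

```python
def low_load_window_reached(samples: list[tuple[float, float, float]],
                            cpu_max: float = 30.0,
                            bandwidth_max: float = 20.0,
                            min_duration_s: float = 600.0) -> bool:
    """samples: (timestamp_s, cpu_pct, bandwidth_pct) tuples in ascending time order.

    Returns True when the most recent uninterrupted low-load run has lasted
    longer than min_duration_s.
    """
    run_start = None
    for timestamp, cpu, bandwidth in samples:
        if cpu <= cpu_max and bandwidth <= bandwidth_max:
            if run_start is None:
                run_start = timestamp
        else:
            run_start = None  # any busy sample resets the run
    return run_start is not None and samples[-1][0] - run_start > min_duration_s


# One sample per minute for 11 minutes, all below both limits.
quiet = [(60.0 * i, 12.0, 8.0) for i in range(12)]
# Same series with a CPU spike at the 5-minute mark.
spiky = [(t, 95.0 if t == 300.0 else c, b) for t, c, b in quiet]
```

A node would apply the staged update package only once this check passes, and roll back to the previous parameters if the update fails, as described above.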
[0076] Based on the above method embodiments, the second embodiment of this application discloses a CDN edge node collaborative optimization system. The CDN edge node collaborative optimization system of this embodiment can implement any of the above-described CDN edge node collaborative optimization methods, and the specific working process of each module in the CDN edge node collaborative optimization system can be referred to the corresponding process in the above method embodiments.
[0077] For ease of understanding, an example is given as follows. A CDN edge node collaborative optimization system includes: The edge data acquisition module is used to periodically collect the status data of each edge node in the target area according to the set state synchronization period, and to construct a dynamic resource transfer graph based on the status data, which serves as the basic input for local prediction and global collaboration. A lightweight AI inference module is deployed on each edge node to perform local predictions in parallel based on the node's own state data and the dynamic resource transfer graph. When a high-confidence resource hotspot transfer event is predicted, an early warning event is generated. A resource hotspot transfer event represents, for a specific resource object, a significant and predictable migration of the corresponding request load from the current edge node to one or more other edge nodes within an adjacent time window. The edge collaborative communication module is used to realize lightweight information synchronization and decision consensus among nodes, including broadcasting early warning events to neighboring nodes and verifying the received early warning events based on a distributed reputation mechanism to form a trusted early warning consensus. The collaborative scheduling execution module is used by each edge node to perform multi-objective collaborative scheduling decisions by integrating its own status, the status of neighboring nodes, local prediction results, and the trusted early warning consensus. The central monitoring and global optimization module is used to audit the effectiveness of decisions within a preset verification period and dynamically update the reputation scores between edge nodes based on the audit results.
[0078] The third embodiment of this application provides a computer device, which may include a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement a CDN edge node collaborative optimization method.
[0079] The memory can communicate with the processor via a communication bus, which can be an address bus, a data bus, a control bus, etc.
[0080] Additionally, the memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk storage device.
[0081] Furthermore, the processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
[0082] The fourth embodiment of this application provides a computer-readable storage medium storing a computer program that can be loaded by a processor and executed as a CDN edge node collaborative optimization method.
[0083] The computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device; the program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.
[0084] The above are all preferred embodiments of this application and are not intended to limit the scope of protection of this application. Any feature disclosed in this specification (including the abstract and drawings) may be replaced by other equivalent or similar features unless specifically stated otherwise. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
Claims
1. A CDN edge node collaborative optimization method, characterized in that it comprises: Each edge node in the target area periodically collects its own state data according to the set state synchronization period and constructs a dynamic resource transfer graph, which serves as the basic input for local prediction and global collaboration. Based on its own state data and the dynamic resource transfer graph, each edge node performs local prediction in parallel. When a high-confidence resource hotspot transfer event is predicted, an early warning event is generated and broadcast to neighboring nodes. The resource hotspot transfer event represents, for a specific resource object, a significant and predictable migration of the corresponding request load from the current edge node to one or more other edge nodes within an adjacent time window. Neighboring nodes verify the received early warning events based on a distributed reputation mechanism to form a trusted early warning consensus. Each edge node integrates its own state, the state of its neighboring nodes, local prediction results, and the aforementioned trusted early warning consensus to execute multi-objective collaborative scheduling decisions. The collaborative scheduling decision is executed, and the decision effect is audited within a preset verification period. The reputation scores between edge nodes are dynamically updated based on the audit results.
2. The CDN edge node collaborative optimization method according to claim 1, characterized in that, Based on their own state data and the dynamic resource transfer graph, each edge node performs local prediction steps in parallel, specifically including: The collected time-series state data is input into the pre-trained lightweight time-series prediction model, which outputs the predicted node load risk value and the list of local hot resources in the near future. The dynamic resource transfer graph is input into a pre-trained lightweight graph neural network model, which outputs the transfer path and probability from the current hot resource to the next possible hot resource, along with the model confidence score. When both the transfer probability and confidence level exceed the set threshold, and the time-series prediction model detects that the real-time request rate of the corresponding source resource exceeds the dynamic threshold, it is determined as a high-confidence resource hotspot transfer event prediction.
3. The CDN edge node collaborative optimization method according to claim 1, characterized in that, The specific steps for generating and broadcasting alert events to neighboring nodes include: The predicted resource hotspot transfer events are encapsulated into standard early warning events. The event content includes at least: event type, source resource identifier, target resource identifier, prediction intensity, sending node identifier, and timestamp. The warning event is digitally signed using the private key of the sending edge node; The warning event, carrying a digital signature, is broadcast to the set of neighboring nodes via the security event bus.
4. The CDN edge node collaborative optimization method according to claim 3, characterized in that, The steps for neighboring nodes to verify the received early warning events based on a distributed reputation mechanism and form a trusted early warning consensus specifically include: After receiving the warning event, the neighboring node uses the public key of the sending node to verify the corresponding digital signature; After verification, each neighboring node performs a weighted vote on the warning event based on its own maintained historical reputation score for the sending node; If the weighted sum of all neighboring nodes' votes on the warning event exceeds a preset consensus threshold, then a credible consensus is reached on the warning event; otherwise, the warning event is marked as an untrustworthy or low-trustworthy event.
5. The CDN edge node collaborative optimization method according to claim 4, characterized in that, The specific steps for executing multi-objective collaborative scheduling decisions include: For high-priority resources indicated in the scheduling decision factor, if they are not cached locally, they are added to the high-priority prefetch queue, and the prefetch bandwidth is determined according to the prediction intensity and the node's own load. The scheduling decision factor is defined as a comprehensive information set obtained by combining the node's own state, the state of neighboring nodes, the local prediction results, and the early warning events verified by consensus. The high-priority resources include resources in the local hot resource list and the transfer target resources in the early warning events. The real-time load risk value of the node itself is compared with those of each neighboring node. If the node itself is at high risk and feasible low-load neighboring nodes exist, a diversion ratio is calculated and some of the newly arriving requests are diverted to the feasible neighboring nodes that have low load and reachable links. For a cache miss request, based on the resource cache status and link latency of neighboring nodes in the scheduling decision factor, the back-to-origin path with the optimal overall response latency is selected, and the back-to-origin path passes through a neighboring node that has cached the target resource.
6. The CDN edge node collaborative optimization method according to claim 5, characterized in that, The steps for auditing the effectiveness of decisions and updating reputation scores include: During the verification period following the execution of the collaborative scheduling decision, actual traffic patterns and system performance indicators are collected. The accuracy of early warning events is assessed by comparing the actual increase in requests for the target resources with the prediction intensity, the consistency between the predicted transfer path and the actual resource transfer path, and the difference between the early warning event initiation time and the actual resource transfer start time. The effectiveness of scheduling decisions is evaluated by analyzing the hit rate improvement of prefetching operations, the effect of traffic offloading operations on the node's own load, and the improvement of user response speed by path optimization; Based on all evaluation results, the reputation scores of relevant nodes are updated according to preset rules: nodes that provide accurate warnings or reliable status information have their reputation scores increased; nodes that provide false warnings or invalid information have their reputation scores decreased.
7. The CDN edge node collaborative optimization method according to claim 1, characterized in that, The collaborative optimization method further includes: Each edge node periodically reports a de-identified operation package to the central cloud platform. The de-identified operation package includes state sequences, predicted events, decision actions, and effect audit data. The central cloud platform aggregates data from the entire network, retrains the time-series prediction and graph neural network models, and optimizes the parameters of the collaborative strategy. The central cloud platform calculates the parameter differences between the new model and the old version, generates a lightweight incremental update package, and distributes it to the edge nodes; Each edge node silently applies the incremental update package during periods of low load.
8. A CDN edge node collaborative optimization system, characterized in that the system performs the CDN edge node collaborative optimization method according to any one of claims 1 to 7 and comprises: The edge data acquisition module is used to periodically collect the status data of each edge node in the target area according to the set state synchronization period, and to construct a dynamic resource transfer graph based on the status data, which serves as the basic input for local prediction and global collaboration. A lightweight AI inference module is deployed on each edge node to perform local predictions in parallel based on the node's own state data and the dynamic resource transfer graph. When a high-confidence resource hotspot transfer event is predicted, an early warning event is generated. The resource hotspot transfer event represents, for a specific resource object, a significant and predictable migration of the corresponding request load from the current edge node to one or more other edge nodes within an adjacent time window. The edge collaborative communication module is used to realize lightweight information synchronization and decision consensus among nodes, including broadcasting the early warning event to neighboring nodes and performing consensus verification on the received early warning event based on a distributed reputation mechanism to form a trusted early warning consensus. The collaborative scheduling execution module is used by each edge node to perform multi-objective collaborative scheduling decisions by integrating its own status, the status of neighboring nodes, local prediction results, and the trusted early warning consensus. The central monitoring and global optimization module is used to audit the effectiveness of decisions within a preset verification period and dynamically update the reputation scores between edge nodes based on the audit results.
9. A computer device, characterized in that it includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the CDN edge node collaborative optimization method as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program capable of being loaded by a processor to execute the CDN edge node collaborative optimization method as described in any one of claims 1 to 7.