Method and system for dynamically tracking the service effectiveness of internet infrastructure based on multi-indicator collaboration and intelligent measurement reuse
By using dynamic association graphs and reinforcement learning algorithms, an adaptive system cognitive model is constructed, which solves the problems of fragmented multi-source data and rigid analysis models in Internet infrastructure. This enables efficient and accurate perception and decision-making of service effects, while reducing measurement overhead and system invasiveness.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA INTERNET NETWORK INFORMATION CENTER
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-19
Figure CN122247883A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of information technology, and specifically relates to a method and system for dynamically tracking the service effectiveness of Internet infrastructure based on multi-indicator collaboration and intelligent measurement reuse.
Background Technology
[0002] With the deepening of global digital transformation, internet infrastructure (including components such as networks, computing, storage, and platform services) has become the digital cornerstone of society, and its service efficiency directly determines the quality of user experience and business continuity. Against this backdrop, building an observable system capable of continuously, accurately, and efficiently tracking service effectiveness has become a core challenge in ensuring service quality and improving operational efficiency.
[0003] Currently, mainstream service monitoring systems generally adopt a layered and domain-specific isolation strategy. Network, system, and business teams focus on different dimensions of metrics, such as infrastructure performance, resource utilization, and application availability, resulting in data that comprehensively reflects the final service effect being fragmented into different "islands." More critically, the analytical models used to correlate these multi-source metrics often employ fixed weights or static correlation graphs, whose correlation patterns cannot adaptively adjust with the system's operational status and cannot adapt to the dynamically changing operating environment of internet infrastructure. When service experience deteriorates, operations personnel not only struggle to quickly pinpoint the root cause from a global perspective but also lack a "living" knowledge network that can reflect the system's true state in real time as a basis for analysis.
[0004] To achieve end-to-end service observability, systems typically need to deploy a large number of proactive probing tasks. However, these tasks lack a unified "intelligent brain" for global coordination and planning, often resulting in repeated measurements on the same service path, which leads to significant data redundancy and resource overhead. More seriously, under high-frequency, intensive measurement, the monitoring traffic itself may affect critical business links, creating a "measurement intrusion" problem. This directly interferes with the evaluation of actual service effectiveness and contradicts the original purpose of monitoring.
Summary of the Invention
[0005] The purpose of this invention is to overcome the technical shortcomings of existing internet infrastructure service effectiveness evaluation systems, such as fragmented multi-source data, rigid analytical models, and a lack of coordination in proactive detection mechanisms. It provides a dynamic tracking method for internet infrastructure service effectiveness based on multi-indicator coordination and intelligent measurement reuse. Specifically, this invention aims to achieve the following objectives: 1) By automatically analyzing historical data, an initial correlation graph is established between indicators related to service effectiveness. A graph neural network is introduced to enable it to learn continuously, allowing the semantics and strength of the correlation represented by the graph to adapt to the dynamic changes of the system under different operating conditions, thus obtaining a dynamic correlation graph. This completely solves the problems of indicator fragmentation and static model rigidity, forming a unified and dynamic system cognitive model.
[0006] 2) Based on this dynamically evolving graph, reinforcement learning agents are used to plan the optimal sparse measurement set under resource constraints. Through the semantic propagation mechanism of the graph, the limited measured data is reused for collaborative inference of the global service state, achieving the effect of "measuring a few, knowing the whole", fundamentally solving the problems of measurement task redundancy and strong system invasiveness.
[0007] The technical solution adopted in this invention is as follows: A method for dynamically tracking the service effectiveness of internet infrastructure based on multi-indicator collaboration and intelligent measurement reuse includes the following steps:
Collecting multi-source heterogeneous indicators of Internet infrastructure and preprocessing them to obtain standardized time-series data;
Using the standardized time-series data, constructing a dynamic association graph representing the dependencies between the multi-source heterogeneous indicators by means of a graph neural network;
Based on the dynamic association graph, obtaining the optimal active measurement action vector under resource constraints through a reinforcement learning algorithm;
Using the optimal active measurement action vector, inferring the global service status of the Internet infrastructure through the semantic propagation mechanism of the dynamic association graph.
[0008] Furthermore, the multi-source heterogeneous indicators include service performance indicators passively collected by the server and user experience indicators actively synthesized by the client.
[0009] Furthermore, the step of constructing a dynamic correlation graph representing the dependencies between multi-source heterogeneous indicators using a graph neural network includes: A potential causal relationship discovery algorithm is used to learn the dependencies between multi-source heterogeneous indicators from historical data and construct a weighted directed initial association graph with service effect transmission paths. Based on the initial association graph, a temporal graph neural network is used to perform online incremental updates on the semantic representation of nodes and the relational strength of edges, so as to capture and adapt to the dynamic association patterns under different operating conditions, thus obtaining a dynamic association graph.
[0010] Furthermore, the step of using a temporal graph neural network to perform online incremental updates of the semantic representations of nodes and the relational strengths of edges based on the initial association graph includes: The weight parameters of the graph attention network are dynamically adjusted by a recurrent neural network. At each time step, the graph embedding representation of the current time step and the weight of the previous time step are received, and the weight of the current time step is evolved. At each time step, the evolved weights are used to propagate and aggregate the currently input indicator data on the graph structure determined by the initial association graph, generating new features for core nodes that integrate their domain semantic information.
[0011] Furthermore, the step of obtaining the optimal active measurement action vector under resource constraints based on a dynamic association graph and a reinforcement learning algorithm includes: The embedding vector of the dynamic association graph, the real-time resource status, and the state confidence of each indicator node are jointly encoded into the state space; Construct a comprehensive reward function that considers the accuracy of inferring the effectiveness of integrated services, the coverage of key indicators, and the measurement cost; Under the constraint that the measurement cost is less than the dynamic resource budget, the comprehensive reward function is maximized to obtain the optimal active measurement action vector.
[0012] Furthermore, by monitoring the average load of the business link and the CPU utilization of key nodes, the dynamic maximum number of concurrent measurement tasks is calculated as the dynamic resource budget, and the dynamic resource budget is dynamically adjusted according to the real-time load of the business link.
[0013] Furthermore, the step of inferring the global service status of the Internet infrastructure using the optimal active measurement action vector through the semantic propagation mechanism of the dynamic association graph includes: Utilize the optimal active measurement action vector to obtain the measured data of the index; By utilizing the graph semantic propagation mechanism of dynamic association graphs, and taking measured data as information anchors, the service status of all associated nodes can be inferred through a single forward propagation.
[0014] A dynamic tracking system for the service effectiveness of internet infrastructure based on multi-indicator collaboration and intelligent measurement reuse, comprising: The unified data acquisition and preprocessing module is used to collect multi-source heterogeneous indicators of Internet infrastructure and preprocess them to obtain standardized time-series data. The dynamic correlation graph construction module is used to construct a dynamic correlation graph representing the dependencies between multi-source heterogeneous indicators using standardized time-series data and graph neural networks. The intelligent measurement planning module is used to obtain the optimal active measurement action vector under resource constraints based on the dynamic correlation graph and through reinforcement learning algorithm. The collaborative perception module is used to infer the global service status of Internet infrastructure by utilizing the optimal active measurement action vectors and the semantic propagation mechanism of dynamic association graphs.
[0015] Furthermore, the system also includes a resource adaptation and control module, which calculates the dynamic maximum number of concurrent measurement tasks as a dynamic resource budget by monitoring the average load of the business link and the CPU utilization of key nodes, and dynamically adjusts the dynamic resource budget according to the real-time load of the business link, and applies the dynamic resource budget to the intelligent measurement planning module to obtain the optimal active measurement action vector.
[0016] The present invention also provides a computer device including a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program including instructions for performing the methods described above.
[0017] The present invention also provides a computer-readable storage medium storing a computer program that, when executed by a computer, implements the above-described method.
[0018] The beneficial effects of this invention are as follows: 1) This invention introduces a temporal graph neural network to construct and continuously evolve a dynamic association graph, which is no longer fixed. The semantics of its nodes and the strength of their relationships can be updated online and incrementally based on the latest system data, thereby accurately capturing changes in association patterns as the system transitions from "daily" to "major promotions" and then to "faults." This fundamentally solves the problem of rigidity in traditional static association models, providing a real and reliable dynamic context for service effect perception and decision-making.
[0019] 2) The core of this invention, intelligent measurement planning and collaborative perception, achieves the unification of decision-making and execution. Specifically, the decision engine, based on reinforcement learning (such as Proximal Policy Optimization, PPO), uses the dynamic association graph as the environmental state and outputs the optimal active measurement action vector. The measurement result is then reused to infer the global state through the semantic propagation mechanism of a graph neural network (such as the graph attention network GAT). This "planning-inference" closed loop enables the system to proactively and efficiently acquire the most comprehensive understanding of service effects with minimal measurement cost, achieving the strategic goal of "measuring a few things to know the whole picture." Compared to static association models, this invention, with its dynamic association capabilities, significantly improves root cause localization accuracy in simulated fault scenarios; compared to measurement strategies without collaboration, it can greatly reduce measurement overhead while achieving the same inference accuracy.
Attached Figure Description
[0020] Figure 1 is a flowchart illustrating the steps of a method for dynamically tracking the service effectiveness of Internet infrastructure based on multi-indicator collaboration and intelligent measurement reuse, according to an embodiment of the present invention.
[0021] Figure 2 is a flowchart illustrating the steps of a method for dynamically tracking the service effectiveness of Internet infrastructure based on multi-indicator collaboration and intelligent measurement reuse, according to another embodiment of the present invention.
[0022] Figure 3 is a module composition diagram of a system for dynamically tracking the service effectiveness of Internet infrastructure based on multi-indicator collaboration and intelligent measurement reuse, according to an embodiment of the present invention.
[0023] Figure 4 is a module composition diagram of a system for dynamically tracking the service effectiveness of Internet infrastructure based on multi-indicator collaboration and intelligent measurement reuse, according to another embodiment of the present invention.
Detailed Implementation
[0024] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to specific embodiments and accompanying drawings.
[0025] This invention provides a method for dynamically tracking the service effectiveness of internet infrastructure based on multi-indicator collaboration and intelligent measurement reuse. The overall technical process of the method includes: first, uniformly accessing multi-source indicator data and constructing an initial correlation graph for service effectiveness evaluation; then, using a dynamic graph neural network model to enable the initial correlation graph to continuously evolve in order to adapt to changes in system state, resulting in a dynamic correlation graph; based on this dynamic correlation graph, planning the optimal measurement strategy under resource constraints through a reinforcement learning agent; finally, performing sparse measurement and utilizing the semantic propagation capability of the graph to perform collaborative inference of the service effectiveness across the entire system, forming a complete closed loop from service effectiveness perception to decision-making.
[0026] In one embodiment, the present invention provides a method for dynamically tracking the service effectiveness of internet infrastructure based on multi-indicator collaboration and intelligent measurement reuse. As shown in Figure 1, it includes the following steps:
Step S11: Collect multi-source heterogeneous indicators of Internet infrastructure and preprocess them to obtain standardized time-series data;
Step S12: Using the standardized time-series data, construct a dynamic association graph representing the dependencies between the multi-source heterogeneous indicators by means of a graph neural network;
Step S13: Based on the dynamic association graph, obtain the optimal active measurement action vector under resource constraints through a reinforcement learning algorithm;
Step S14: Use the optimal active measurement action vector to infer the global service status of the Internet infrastructure through the semantic propagation mechanism of the dynamic association graph.
[0027] Step S11 of this invention implements unified data collection and preprocessing, serving as the data foundation. It is responsible for collecting multi-source heterogeneous indicators from the distributed service environment, including but not limited to business performance indicators passively collected by the server and user experience indicators actively synthesized by the client. This step cleans, aligns, and aggregates the raw data within time windows and outputs a unified time-series data matrix, providing structured input for the subsequent construction of the service effect association model.
[0028] Step S12 of this invention, which realizes the construction and evolution of the dynamic association graph, is the core of solving the problems of "fragmented service effect evaluation data" and "static models". Step S12 receives the standardized time-series data and performs the following two main tasks:
Association discovery: The NOTEARS algorithm, a latent causal relationship discovery algorithm based on continuous optimization, is used to learn the dependencies between indicators from historical data and construct a weighted directed initial association graph with service effect transmission paths.
[0029] Dynamic graph evolution: By introducing a temporal graph neural network, based on the structure of the initial association graph, the semantic representation of nodes and the relational strength of edges are updated online incrementally, so that it can capture and adapt to the dynamic association patterns of the system under different operating states, thus obtaining a dynamic association graph.
[0030] Steps S13 and S14 of this invention realize intelligent measurement planning and collaborative perception, which are the decision-making and execution center for realizing "measurement reuse" and overcoming the problems of "task redundancy and intrusion". As a unified intelligent agent, it completes the complete closed loop from decision-making to perception.
[0031] In one embodiment, step S13, which implements intelligent measurement planning (decision-making), involves encoding the current state of the dynamic association graph, node confidence, and system resource constraints into a state space, and then using a reinforcement learning algorithm for sequence decision-making. Its core output is an optimal active measurement action vector that maximizes a comprehensive reward that integrates service effect inference accuracy, critical link coverage, and measurement cost, while satisfying the dynamic resource budget.
[0032] In one embodiment, step S14 achieves collaborative perception (execution and inference) by: scheduling probes to execute the optimal measurement action (optimal active measurement action vector) of decision planning to obtain high-confidence measured data. Subsequently, using the graph semantic propagation mechanism of the dynamic association graph, the measured data is used as "information anchors" to infer the service status of all associated nodes through a single forward propagation, thereby achieving the collaborative perception effect of "measuring a few to know the whole".
[0033] In one embodiment, the method of the present invention may further include step S15, as shown in Figure 2. This step implements resource adaptation and control, acting as a "smart regulator" that keeps system invasiveness low. By monitoring business load in real time, this step calculates and outputs a dynamic resource budget (such as the maximum number of concurrent measurement tasks) for the intelligent measurement planning step. It ensures that the measurement scale is automatically tightened under high system load and expanded under low load, so that the entire service effectiveness tracking process remains efficient and safe under resource constraints, forming a complete autonomous closed loop from data perception through intelligent decision-making to resource control.
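The description names the inputs of the resource budget (business link load and CPU utilization of key nodes) but does not give a formula. The following is a minimal sketch of one possible mapping, not the method of the invention: the function name `dynamic_budget`, the parameters `max_tasks` and `min_tasks`, and the rule of scaling by the larger of the two utilization figures are all assumptions made for illustration.

```python
import math


def dynamic_budget(link_load, cpu_util, max_tasks=20, min_tasks=1):
    """Hypothetical budget rule: fewer concurrent measurements under higher load.

    link_load and cpu_util are utilization ratios in [0, 1]. The busier of the
    two determines how much of max_tasks is released (an assumed rule).
    """
    pressure = min(1.0, max(0.0, max(link_load, cpu_util)))
    return max(min_tasks, math.floor(max_tasks * (1.0 - pressure)))


idle_budget = dynamic_budget(0.0, 0.0)    # lightly loaded system
busy_budget = dynamic_budget(0.5, 0.2)    # moderate link load
peak_budget = dynamic_budget(0.99, 0.1)   # near saturation
```

Keeping a floor of `min_tasks` is a design choice in this sketch so that some measurement always remains possible; a deployment could equally allow the budget to drop to zero.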
[0034] The key points of this invention are as follows: 1) Adaptive association graph construction mechanism based on dynamic graph neural networks: Its core lies in employing a dynamic graph neural network architecture. On the static graph topology determined by initial association graph discovery, the node semantics and association strengths of the service effectiveness indicator association graph evolve continuously. Specifically, the weight parameters of the graph attention network (GAT) are dynamically adjusted by a recurrent neural network (such as a GRU). The evolutionary process can be characterized as $W_t = \mathrm{GRU}(z_t, W_{t-1})$: at each time step, the mechanism receives the graph embedding representation $z_t$ of the current time step and the GAT weights $W_{t-1}$ of the previous time step, and evolves the weights $W_t$ of the current time step. This enables the model to learn and adjust online, so that it automatically captures and adapts to different operational states of the system. As an extension of this mechanism, the static graph topology can also be updated periodically by re-executing the association discovery algorithm.
[0035] 2) Constrained reinforcement learning decision mechanism integrating dynamic graph semantics: The active measurement planning problem is formulated as a sequential decision-making process under resource constraints. Its innovation lies in embedding the structure of the dynamic association graph as a core component of the reinforcement learning agent's state space. The state $s_t$ also includes the state confidence $c_i(t)$ of each node, calculated from its information freshness, and the dynamic resource budget $B_t$. The decision-maker uses a reinforcement learning algorithm (such as Proximal Policy Optimization, PPO) and, under the hard constraint that the number of '1's in the action vector $a_t$ does not exceed $B_t$, learns to output the optimal active measurement action vector. With this mechanism, decisions are no longer made in isolation but with an understanding of the global association topology between indicators. The agent can therefore prioritize "pivotal indicators" that have the greatest impact on the final service outcome, carry the highest information value, and yield the largest inference gain through graph propagation. This lays the strategic decision-making foundation for achieving efficient "measurement reuse."
[0036] 3) Collaborative perception and state inference based on graph semantic propagation: An efficient closed loop of "decision-measurement-inference" is established. Specifically, the small number of measured indicator values selected by intelligent decision-making serve as high-confidence "information anchors." On the dynamic association graph $G_t$, a single forward propagation is performed using the dynamically evolved graph attention weights $W_t$. This process uses the weighted aggregation operation of the graph attention layer to spread the information of the measured nodes along the association edges to the entire graph, thereby collaboratively inferring the state of all associated indicators. This achieves the collaborative perception effect that "a single sparse measurement can perceive the global service effect," delivering high-precision insight with low intrusion.
[0037] In one embodiment of the present invention, a method for dynamically tracking the service effectiveness of Internet infrastructure based on multi-indicator collaboration and intelligent measurement reuse is provided. The key steps of the overall process of this method are as follows:
1) Unified data collection and preprocessing: This step collects multi-source, heterogeneous indicator data directly related to service effectiveness at each key level of the end-to-end internet service path. Its technical process is as follows:
1.1) Data Collection:
Server-side / Instrumentation Layer: Through probes deployed in various locations and a centralized control plane, core metrics are automatically collected at a high sampling rate (e.g., 100%) to comprehensively depict the health status of the underlying support components of the service. Examples of core metrics collected include DNS resolution latency, CDN node response time, load balancer throughput, API gateway latency, authentication service error rate, and database query time.
Client-side / User Experience Layer: Synthetic monitoring is performed at a given user-session sampling rate to collect metrics that directly reflect service effectiveness, for example page load time and settlement success rate.
[0038] 1.2) Data Alignment: The preprocessing module is responsible for tagging all data with a unified time-series label (timestamp, service, endpoint, user_id, trace_id) and aggregating them based on a time window (e.g., one data point per minute). Here, timestamp represents the timestamp, service represents the service identifier, endpoint represents the endpoint, user_id represents the user identifier ID, and trace_id represents the trace ID.
[0039] 1.3) Matrix Construction: A unified time-series data matrix $X \in \mathbb{R}^{T \times M}$ is formed, where $T$ is the number of time points and $M$ is the number of indicators (the eight specific indicators mentioned above).
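Steps 1.2) and 1.3) can be illustrated with a short sketch. It assumes raw samples arrive as `(timestamp, indicator, value)` tuples and that each window is aggregated by the arithmetic mean; the function name `build_matrix`, the indicator names, and the choice of mean aggregation are assumptions for illustration only, since the description does not fix the aggregation function.

```python
from collections import defaultdict


def build_matrix(samples, indicators, window_s=60):
    """Aggregate (timestamp, indicator, value) samples into a T x M matrix.

    Each row is one time window and each column one indicator. A cell holds
    the mean of the samples in that window, or None when nothing was observed.
    """
    buckets = defaultdict(list)
    for ts, name, value in samples:
        buckets[(int(ts // window_s), name)].append(value)
    windows = sorted({w for w, _ in buckets})
    matrix = []
    for w in windows:
        row = []
        for name in indicators:
            vals = buckets.get((w, name))
            row.append(sum(vals) / len(vals) if vals else None)
        matrix.append(row)
    return windows, matrix


samples = [
    (0, "dns_latency_ms", 20.0),
    (30, "dns_latency_ms", 40.0),
    (10, "page_load_ms", 900.0),
    (70, "dns_latency_ms", 25.0),
]
windows, X = build_matrix(samples, ["dns_latency_ms", "page_load_ms"])
```

Missing cells are left as `None` here; how gaps are filled (interpolation, carry-forward, or exclusion) is a preprocessing decision the description leaves open.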
[0040] 2) Initial association graph construction: The historical time-series data matrix $X \in \mathbb{R}^{T \times M}$ of the M indicators provided in step 1) is taken as input, and the NOTEARS algorithm is used to learn the latent causal relationships between indicators, transforming association discovery into a continuous optimization problem. The goal is to find a weighted adjacency matrix $A \in \mathbb{R}^{M \times M}$, whose element $A_{ij}$ represents the dependence strength of indicator i on indicator j. It is obtained by minimizing the following loss function:
$$\min_{A} \; \frac{1}{2T}\,\lVert X - XA \rVert_F^2 + \lambda_1 \lVert A \rVert_1$$
where $\lambda_1$ is the first regularization strength coefficient, used to control the sparsity of the weighted adjacency matrix $A$.
[0041] The weighted adjacency matrix $A$ satisfies the acyclicity constraint of the NOTEARS algorithm:
$$h(A) = \operatorname{tr}\!\left(e^{A \circ A}\right) - M = 0$$
where $h(\cdot)$ is the acyclicity constraint function; $\circ$ denotes the Hadamard product, that is, the element-wise product of two matrices, which removes the influence of edge sign and retains only edge strength; $e^{A \circ A}$ is the matrix exponential; and $\operatorname{tr}(\cdot)$ is the trace operation. The trace of the matrix exponential reflects the total strength of all directed cycles in the directed graph. When the graph is acyclic, this value equals the number of nodes $M$.
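To make the acyclicity constraint concrete, the sketch below evaluates the standard NOTEARS function $h(A) = \operatorname{tr}(e^{A \circ A}) - M$ for small matrices, assuming that is the form intended here. The matrix exponential is approximated with a truncated Taylor series so the example needs no third-party library; the function names and the two example matrices are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(A, terms=30):
    """h(A) = tr(exp(A o A)) - M, using a truncated Taylor series for exp."""
    m = len(A)
    # Hadamard square: keeps edge strength, discards edge sign.
    S = [[A[i][j] * A[i][j] for j in range(m)] for i in range(m)]
    power = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    trace = float(m)  # trace of the identity term S^0
    factorial = 1.0
    for k in range(1, terms):
        power = matmul(power, S)
        factorial *= k
        trace += sum(power[i][i] for i in range(m)) / factorial
    return trace - m


# Chain 0 -> 1 -> 2: no directed cycle.
dag = [[0.0, 0.8, 0.0],
       [0.0, 0.0, -0.5],
       [0.0, 0.0, 0.0]]
# Same chain plus an edge 2 -> 0, which closes a cycle.
cyclic = [[0.0, 0.8, 0.0],
          [0.0, 0.0, -0.5],
          [0.3, 0.0, 0.0]]

h_dag = acyclicity(dag)
h_cyclic = acyclicity(cyclic)
```

For the acyclic chain every power of the squared matrix has a zero diagonal, so the function returns zero; adding the back edge makes it strictly positive, which is what the optimizer penalizes.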
[0042] The output is an initial weighted directed graph $G_0 = (V, E, A)$, where V is the set of nodes (corresponding to the M indicators), E is the set of edges, and A is the weighted adjacency matrix. This initial association graph $G_0$, serving as a static relational skeleton, quantifies for the first time the fundamental driving relationships between underlying infrastructure performance (such as DNS resolution latency) and top-level user experience (such as page load time), providing an initial structure for subsequent dynamic graph evolution.
[0043] 3) Graph Evolution and Semantic Enhancement: Based on the initial association graph $G_0$, this module uses the EvolveGCN model to dynamically evolve the semantics of graph nodes and the association strengths, accurately capturing changes in the paths and strengths by which infrastructure affects service effectiveness under different operational states, and forming a dynamic association graph. Its core mechanism is to dynamically evolve the weight parameters of the graph attention network (GAT) through a GRU, enabling the graph attention network to adapt to the dynamic environment and adjust the weights of information propagation between nodes. The specific process is as follows:
Graph attention propagation (node feature update): At each time step t, the dynamically evolved GAT weights $W_t$ are used to propagate and aggregate the currently input indicator data $X_t$ on the graph structure defined by the initial association graph $G_0$, generating new features for core nodes that incorporate their domain semantic information.
[0044] Attention coefficient calculation: This quantifies the importance of neighbor node j to the current center node i. First, the features of the two nodes are linearly transformed by the weight matrix $W_t$ and concatenated. The result is then passed through a shared attention mechanism (based on a learnable vector $\mathbf{a}$) with the LeakyReLU activation function to compute the unnormalized attention score, which is finally normalized with the softmax function over all neighbors of node i:
$$\alpha_{ij} = \frac{\exp\!\left(\mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\,[\,W_t h_i \,\Vert\, W_t h_j\,]\right)\right)}{\sum_{k \in \mathcal{N}(i)} \exp\!\left(\mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\,[\,W_t h_i \,\Vert\, W_t h_k\,]\right)\right)}$$
where $\alpha_{ij}$ is the attention coefficient, $\mathbf{a}$ is the learnable parameter vector of the attention mechanism, $h_i$ represents the original features of node i, $h_j$ represents the original features of node j, $\Vert$ denotes concatenation, and $\mathcal{N}(i)$ is the set of neighbors of node i.
[0045] Feature weighted aggregation and update: Based on the calculated attention coefficients $\alpha_{ij}$, the transformed features of all neighbors $j \in \mathcal{N}(i)$ of node i are weighted and summed, then passed through a nonlinear activation function $\sigma$ (e.g., ReLU) to obtain the final updated feature representation of node i:
$$h_i' = \sigma\!\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, W_t h_j\right)$$
This feature incorporates the dynamic relational semantics of the current graph structure and can be used for subsequent tasks such as metric prediction.
[0046] Dynamic evolution of GAT weights (parameter update): To enable the model to adapt to dynamic association patterns, a GRU is used to recursively update the GAT weight matrix. The GRU takes the graph embedding $z_t$ at the current time step and the GAT weights $W_{t-1}$ at the previous time step as input and derives the new weight parameters:
$$W_t = \mathrm{GRU}(z_t, W_{t-1})$$
Here, the graph embedding $z_t$ is obtained by performing a global pooling operation on the node features $H_t$ at the current time step, and is used to capture the overall structural information of the graph. Through the above process, the model outputs a dynamically evolving graph semantic sequence, namely the dynamic association graph $\{G_t\}$. Each state $G_t$ encodes the most critical dynamic association pattern between the infrastructure and user experience at the corresponding moment. By adjusting the information propagation intensity on a fixed topology, this mechanism enables the model to accurately capture changes in system state. For example, it can reveal that during peak sales periods, the impact of database query time on settlement success rate is significantly enhanced.
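The two graph attention steps described above (attention coefficient calculation, then weighted aggregation) can be sketched for a single center node as follows. This is a minimal illustration assuming the standard GAT formulation: scores are LeakyReLU of a learnable vector applied to the concatenated transformed features, softmax-normalized over the neighbors, and the update is a ReLU of the attention-weighted sum. The function names, the toy features `H`, the weight matrix `W`, and the attention vector `a` are made-up values for illustration.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def linear(W, h):
    """Multiply weight matrix W (list of rows) by feature vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_update(i, neighbors, H, W, a):
    """Return (attention coefficients, updated feature) for center node i."""
    Wh = {j: linear(W, H[j]) for j in set(neighbors) | {i}}
    # Unnormalized score: LeakyReLU(a . [W h_i || W h_j])
    scores = {j: leaky_relu(sum(p * q for p, q in zip(a, Wh[i] + Wh[j])))
              for j in neighbors}
    # Softmax over the neighbors of i (shifted by the max for stability).
    top = max(scores.values())
    weights = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(weights.values())
    alpha = {j: w / total for j, w in weights.items()}
    # Attention-weighted sum of transformed neighbor features, then ReLU.
    agg = [sum(alpha[j] * Wh[j][d] for j in neighbors)
           for d in range(len(Wh[i]))]
    return alpha, [max(0.0, v) for v in agg]


H = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}  # toy node features
W = [[1.0, 0.0], [0.0, 1.0]]                        # toy weight matrix
a = [0.1, 0.2, 0.3, 0.4]                            # toy attention vector
alpha, h0_new = gat_update(0, [1, 2], H, W, a)
```

In the full mechanism the matrix `W` would be the time-evolved GAT weight produced by the GRU at each step, rather than a constant as in this sketch.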
[0047] 4) Intelligent Measurement Planning and Collaborative Sensing: Under resource constraints, an optimal proactive measurement plan is formulated, and the state of the entire system is collaboratively inferred based on a small amount of measured data. Its core process consists of the following two steps.
[0048] 4.1) Intelligent measurement planning (decision-making).
[0049] Measurement planning is modeled as a sequential decision-making process executed by a reinforcement learning agent. Its core elements are the state $s_t$, the action $a_t$, and the reward function $R_t$.
[0050] State $s_t$: composed of the embedding vector of the dynamic association graph $G_t$, the real-time resource status of the system, and the current state confidence of each indicator node.
[0051] The state confidence $c_i(t)$ of each node is obtained by calculating its information freshness. Information freshness describes how much an indicator node's data has decayed since its last "reliable state update". The higher the information freshness, the better the data represents the current system state. This invention uses an exponential model that conforms to the natural decay law:
$$c_i(t) = e^{-\mu\,\Delta t_i}$$
where $c_i(t)$ represents the state confidence of indicator i at time t, which is equivalent to its information freshness at that time and takes a value in $(0, 1]$; $\mu$ is the attenuation coefficient, a positive real number that controls the rate at which confidence decays over time (a higher value indicates that the system is more sensitive to data aging) and can be preset according to the business's requirements for data timeliness; and $\Delta t_i$ represents the time interval elapsed from the most recent update of indicator i to the current time t.
[0052] The above-described method for calculating information freshness and state confidence is a preferred embodiment of the present invention. In practical systems, other equivalent methods can also be used, such as a linear decay model for information freshness, $F_i(t) = \max(0,\ 1 - \Delta t_i / T_{\max})$, where $T_{\max}$ is the upper limit of the data's validity period; beyond this time, the data is considered to have completely lost its reference value. The state confidence is then obtained by applying exponential smoothing to the information freshness.
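The two freshness models can be compared with a short sketch. The function names, the smoothing coefficient `alpha`, and the specific recursive form of the exponential smoothing are assumptions made for illustration; the specification only states that exponential smoothing is applied to the information freshness.

```python
import math


def exponential_confidence(elapsed, decay_rate):
    """State confidence under exponential decay: exp(-decay_rate * elapsed)."""
    if elapsed < 0 or decay_rate <= 0:
        raise ValueError("elapsed must be >= 0 and decay_rate must be > 0")
    return math.exp(-decay_rate * elapsed)


def linear_freshness(elapsed, max_validity):
    """Information freshness under linear decay, clipped to 0 after max_validity."""
    if elapsed < 0 or max_validity <= 0:
        raise ValueError("elapsed must be >= 0 and max_validity must be > 0")
    return max(0.0, 1.0 - elapsed / max_validity)


def smoothed_confidence(freshness, prev_confidence, alpha=0.5):
    """Assumed smoothing form: alpha * freshness + (1 - alpha) * previous confidence."""
    return alpha * freshness + (1.0 - alpha) * prev_confidence


# Usage: an indicator last updated 30 s ago, with a 60 s validity period.
fresh = linear_freshness(elapsed=30.0, max_validity=60.0)
conf = smoothed_confidence(fresh, prev_confidence=1.0, alpha=0.5)
```

With exponential decay a stale indicator never reaches exactly zero confidence, whereas the linear model discards it completely once the validity period has passed.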
[0053] Action $a_t$: an M-dimensional 0/1 vector that specifies the set of indicators to be actively measured in the current round.
[0054] Reward function $R_t$: evaluates the quality of an action. Its design jointly takes into account the accuracy of service effect inference, the coverage of key indicators, and the measurement cost: $R_t = w_1 \cdot Acc_t + w_2 \cdot Cov_t - w_3 \cdot Cost_t$
[0055] where $Acc_t$ is the accuracy of service effect inference, calculated by comparing the values inferred from the dynamic association graph with the available real values (the smaller the error, the higher the score); $Cov_t$ is the key indicator coverage, calculated by accumulating the importance weights of the measured indicators in the dynamic association graph (which can be set according to a graph centrality algorithm or business criticality); and $Cost_t$ is the measurement cost, calculated by summing the preset unit costs of the measured indicators. The weighting coefficients $w_1$, $w_2$, and $w_3$ are positive real numbers used to adjust the system's emphasis on inference accuracy, key indicator coverage, and measurement cost, guiding the agent under resource constraints to prioritize indicators that contribute significantly to global inference, are of high business criticality, and have controllable costs.
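The weighted combination of accuracy, coverage, and cost described above can be sketched as follows. The accuracy score `1 / (1 + mean absolute error)`, the function name `reward`, and the default weights are assumptions chosen for illustration; the specification only requires that a smaller error give a higher score and that the three weights be positive.

```python
import numpy as np


def reward(inferred, actual, action, importance, unit_cost,
           w_acc=1.0, w_cov=0.5, w_cost=0.2):
    """Hypothetical reward: weighted accuracy plus coverage minus measurement cost."""
    inferred = np.asarray(inferred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    action = np.asarray(action, dtype=float)
    # Assumed accuracy score: equals 1 for a perfect inference, shrinks with error.
    accuracy = 1.0 / (1.0 + np.mean(np.abs(inferred - actual)))
    # Coverage: summed importance weights of the indicators selected for measurement.
    coverage = float(np.dot(importance, action))
    # Cost: summed preset unit costs of the indicators selected for measurement.
    cost = float(np.dot(unit_cost, action))
    return w_acc * accuracy + w_cov * coverage - w_cost * cost


# Usage: measure indicators 0 and 2 out of three, with a perfect inference.
r = reward(inferred=[0.9, 0.8], actual=[0.9, 0.8], action=[1, 0, 1],
           importance=[0.5, 0.3, 0.2], unit_cost=[1.0, 2.0, 3.0])
```

Raising `w_cost` relative to `w_cov` makes the same action less attractive, which is how the weights shift the agent toward cheaper measurement sets.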
[0056] The decision objective is to maximize the comprehensive reward function $R_t$ under the constraint that the measurement cost is less than the dynamic resource budget $B_t$: $\max_{a_t} R_t$ subject to $\sum_{i=1}^{M} c_i\, a_{t,i} < B_t$, where $c_i$ is the preset unit cost of performing one active measurement on indicator $i$, $a_t$ is the measurement action vector at time $t$, and $a_{t,i}$ is its decision component for indicator $i$. The component $a_{t,i}$ is a binary variable: $a_{t,i} = 1$ indicates that the agent decides to actively measure indicator $i$, and $a_{t,i} = 0$ indicates that it does not.
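To make the constrained objective concrete, the sketch below enumerates every 0/1 action vector whose total cost is strictly below the budget and keeps the best-scoring one. This exhaustive search is a stand-in for the reinforcement learning agent, not the agent itself, and is only practical for a small number of indicators. The function `best_action`, the scoring callback, and the example numbers are hypothetical.

```python
from itertools import product


def best_action(unit_cost, budget, score):
    """Return the feasible 0/1 action with the highest score, or (None, -inf)."""
    best, best_score = None, float("-inf")
    for action in product((0, 1), repeat=len(unit_cost)):
        cost = sum(c * a for c, a in zip(unit_cost, action))
        if cost >= budget:
            # Infeasible: the text requires the cost to be less than the budget.
            continue
        s = score(action)
        if s > best_score:
            best, best_score = action, s
    return best, best_score


# Usage with a simplified score: importance coverage minus 0.1 * cost.
importance = [0.5, 0.3, 0.2]
unit_cost = [1.0, 2.0, 3.0]


def simple_score(action):
    coverage = sum(w * a for w, a in zip(importance, action))
    cost = sum(c * a for c, a in zip(unit_cost, action))
    return coverage - 0.1 * cost


action, value = best_action(unit_cost, budget=4.0, score=simple_score)
```

Here measuring indicators 0 and 1 costs 3, which stays below the budget of 4, while adding indicator 2 to either of them would not; a learned policy is needed once the number of indicators makes enumeration infeasible.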
[0057] 4.2) Collaborative sensing (measurement multiplexing).
[0058] This step leverages the semantic propagation capability of the dynamic association graph to achieve the effect of "measuring a few to understand the whole." Its execution flow is as follows. First, under the control center's scheduling, the probes perform the synthetic monitoring specified by action $a_t$ and obtain the set of measured data. For example, the agent might decide to measure only "DNS resolution latency" and "settlement success rate".
[0059] Then, the state fusion unit uses the measured values to replace or merge the features of the corresponding indicators in the feature matrix $X_t$, generating the updated feature matrix $X'_t$.
[0060] Next, the graph inference engine performs one round of graph attention propagation on the dynamic association graph $G_t$, spreading the high-confidence measured information along the association edges to the entire graph. For each node $i$, its final state $\hat{x}_i$ is derived from the weighted aggregation of its neighbors, where the attention coefficients $\alpha_{ij}$ are calculated by the GAT model: $\hat{x}_i = \sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, x'_j$, with $x'_j$ denoting the fused feature of neighbor $j$ and $\mathcal{N}(i)$ the neighbor set of node $i$. Finally, the dynamic association graph updates its feature matrix with the inferred global state view $\{\hat{x}_1, \dots, \hat{x}_M\}$, and the updated system state prepares the next perception and decision-making cycle.
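The fusion and propagation steps can be sketched in NumPy as follows. The sketch assumes that attention logits are already available (in the described system they would come from the GAT model), adds self-loops so that every node has at least one neighbor, and omits any learned linear transform or nonlinearity. The function names and the three-node example are hypothetical.

```python
import numpy as np


def fuse_measurements(features, measured):
    """Overwrite the feature rows of measured indicators with their measured values."""
    fused = features.copy()
    for index, value in measured.items():
        fused[index] = value
    return fused


def attention_propagate(features, adjacency, attention_logits):
    """One round of softmax-weighted aggregation over each node's neighbors.

    adjacency[i, j] > 0 means node j is a neighbor of node i (information
    flows from j to i). Self-loops are added here, which is an assumption.
    """
    n = adjacency.shape[0]
    mask = (adjacency + np.eye(n)) > 0
    logits = np.where(mask, attention_logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)  # non-neighbors get weight exp(-inf) = 0
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ features


# Usage: a chain 0 -> 1 -> 2 where only indicator 0 is actively measured.
X_t = np.zeros((3, 1))
adjacency = np.array([[0, 0, 0],
                      [1, 0, 0],
                      [0, 1, 0]], dtype=float)
X_fused = fuse_measurements(X_t, {0: [1.0]})
X_hat = attention_propagate(X_fused, adjacency, attention_logits=np.zeros((3, 3)))
```

A single propagation round only reaches direct neighbors: node 1 picks up half of the measured value from node 0, while node 2 is still unaffected in this example.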
[0061] 5) Resource Adaptive Adjustment: This step acts as the system's "intelligent regulator." By monitoring the average load of the business links and the CPU utilization of key nodes, it calculates a dynamic maximum number of concurrent measurement tasks, which serves as the dynamic resource budget. The measurement budget in step 4) is then adjusted dynamically according to the real-time load of the business links. This ensures that measurement traffic is automatically reduced during peak business periods, avoiding impact on the actual service experience and keeping the tracking process minimally intrusive.
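The specification does not give a formula for the budget, so the following is only one possible sketch. It assumes that load and CPU utilization are normalized to the range 0 to 1, takes the larger of the two as the current pressure, and scales a configured ceiling linearly. The function `dynamic_budget` and all of its parameters are hypothetical.

```python
import math


def dynamic_budget(avg_link_load, key_node_cpu, max_tasks, min_tasks=0):
    """Assumed rule: scale the task ceiling by the remaining headroom (1 - pressure)."""
    # Pressure is the worst of the average link load and any key node's CPU usage.
    pressure = max(avg_link_load, max(key_node_cpu, default=0.0))
    pressure = min(max(pressure, 0.0), 1.0)
    # The small epsilon guards against floating-point results such as 0.9999999.
    budget = math.floor(max_tasks * (1.0 - pressure) + 1e-9)
    return max(min_tasks, budget)


# Usage: moderate load leaves half of the ceiling, full load leaves none.
normal = dynamic_budget(avg_link_load=0.25, key_node_cpu=[0.5, 0.25], max_tasks=8)
peak = dynamic_budget(avg_link_load=1.0, key_node_cpu=[0.2], max_tasks=8)
```

Setting `min_tasks` above zero would keep a minimal amount of active measurement running even at peak load, at the price of some residual intrusion.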
[0062] Another embodiment of the present invention provides a system for dynamically tracking the service effectiveness of Internet infrastructure based on multi-indicator collaboration and intelligent measurement multiplexing. As shown in Figure 3, the system includes: a unified data acquisition and preprocessing module 21, used to collect multi-source heterogeneous indicators of the Internet infrastructure and preprocess them to obtain standardized time-series data; a dynamic association graph construction module 22, used to construct, from the standardized time-series data and with a graph neural network, a dynamic association graph representing the dependencies between the multi-source heterogeneous indicators; an intelligent measurement planning module 23, used to obtain the optimal active measurement action vector under resource constraints based on the dynamic association graph and a reinforcement learning algorithm; and a collaborative perception module 24, used to infer the global service status of the Internet infrastructure by using the optimal active measurement action vector and the semantic propagation mechanism of the dynamic association graph.
[0063] Another embodiment of the present invention provides a system for dynamically tracking the service effectiveness of Internet infrastructure based on multi-indicator collaboration and intelligent measurement multiplexing. As shown in Figure 4, in addition to the modules mentioned above, the system also includes a resource adaptation and control module 25. This module calculates the dynamic maximum number of concurrent measurement tasks as a dynamic resource budget by monitoring the average load of the business links and the CPU utilization of key nodes, adjusts the dynamic resource budget according to the real-time load of the business links, and applies it to the intelligent measurement planning module to obtain the optimal active measurement action vector.
[0064] The above division of modules is merely illustrative. In practical applications, the functions described above can be assigned to different functional modules as needed to complete all or part of the functions described in the aforementioned method. The specific working process of each module can be found in the corresponding process in the aforementioned method embodiments, and will not be repeated here.
[0065] It should be understood that the methods and systems disclosed in the above embodiments of the present invention can be implemented in other ways. For example, the modules can be divided differently, multiple modules can be combined or integrated into another subsystem or system, and some features can be omitted or not executed. For example, in one embodiment, the system of the present invention for dynamically tracking the service effectiveness of Internet infrastructure based on multi-indicator collaboration and intelligent measurement multiplexing may include: a unified data acquisition and preprocessing subsystem, a dynamic association graph construction and evolution subsystem, an intelligent measurement planning and collaborative perception subsystem, and a resource adaptation and control subsystem. For the specific working process of each of these subsystems, refer to the corresponding process in the foregoing method embodiments; it is not repeated here.
[0066] The various steps, modules, or subsystems in this invention can be implemented as software functional units and stored in a computer-readable storage medium, including several instructions to cause a computer device to execute some or all of the steps of the method described in this invention. For example, one embodiment of this invention provides a computer device (computer, server, etc.) including a memory and a processor. The memory stores a computer program configured to be executed by the processor, and the computer program includes instructions for performing the steps of the method of this invention. For example, another embodiment of this invention provides a computer-readable storage medium (such as ROM / RAM, disk, optical disk, etc.) storing a computer program. When the computer program is executed by a computer, it implements the steps of the method of this invention. For example, another embodiment of this invention provides a computer program product including a computer program. When the computer program is executed by a computer, it implements the steps of the method of this invention.
[0067] The specific embodiments of the present invention disclosed above are intended to help understand the content of the present invention and to implement it accordingly. Those skilled in the art will understand that various substitutions, changes, and modifications are possible without departing from the spirit and scope of the present invention. The present invention should not be limited to the content disclosed in the embodiments of this specification; the scope of protection of the present invention is defined by the claims.
Claims
1. A method for dynamically tracking the service effectiveness of Internet infrastructure based on multi-indicator collaboration and intelligent measurement multiplexing, characterized in that it comprises the following steps: collecting multi-source heterogeneous indicators of the Internet infrastructure and preprocessing them to obtain standardized time-series data; constructing, from the standardized time-series data and with a graph neural network, a dynamic association graph representing the dependencies between the multi-source heterogeneous indicators; obtaining, based on the dynamic association graph and through a reinforcement learning algorithm, the optimal active measurement action vector under resource constraints; and inferring the global service status of the Internet infrastructure by using the optimal active measurement action vector and the semantic propagation mechanism of the dynamic association graph.
2. The method according to claim 1, characterized in that the multi-source heterogeneous indicators include business performance indicators passively collected on the server side and user experience indicators obtained by active synthetic monitoring on the client side.
3. The method according to claim 1, characterized in that constructing, with a graph neural network, the dynamic association graph representing the dependencies between the multi-source heterogeneous indicators includes: using a potential causal relationship discovery algorithm to learn the dependencies between the multi-source heterogeneous indicators from historical data and to construct a weighted, directed initial association graph containing service effect transmission paths; and, based on the initial association graph, using a temporal graph neural network to perform online incremental updates of the semantic representations of nodes and the relational strengths of edges, so as to capture and adapt to the dynamic association patterns under different operating conditions, thereby obtaining the dynamic association graph.
4. The method according to claim 3, characterized in that using a temporal graph neural network to perform online incremental updates of the semantic representations of nodes and the relational strengths of edges based on the initial association graph includes: dynamically adjusting the weight parameters of a graph attention network with a recurrent neural network which, at each time step, receives the graph embedding representation of the current time step and the weights of the previous time step and evolves the weights of the current time step; and, at each time step, using the evolved weights to propagate and aggregate the currently input indicator data on the graph structure determined by the initial association graph, generating new features for core nodes that integrate their domain semantic information.
5. The method according to claim 1, characterized in that obtaining the optimal active measurement action vector under resource constraints based on the dynamic association graph and through a reinforcement learning algorithm includes: jointly encoding the embedding vector of the dynamic association graph, the real-time resource status, and the state confidence of each indicator node into the state space; constructing a comprehensive reward function that jointly considers the accuracy of service effect inference, the coverage of key indicators, and the measurement cost; and maximizing the comprehensive reward function under the constraint that the measurement cost is less than the dynamic resource budget, to obtain the optimal active measurement action vector.
6. The method according to claim 5, characterized in that the dynamic maximum number of concurrent measurement tasks is calculated as the dynamic resource budget by monitoring the average load of the business links and the CPU utilization of key nodes, and the dynamic resource budget is adjusted dynamically according to the real-time load of the business links.
7. The method according to claim 1, characterized in that inferring the global service status of the Internet infrastructure by using the optimal active measurement action vector and the semantic propagation mechanism of the dynamic association graph includes: using the optimal active measurement action vector to obtain measured data for the selected indicators; and, using the graph semantic propagation mechanism of the dynamic association graph with the measured data as information anchors, inferring the service status of all associated nodes through a single forward propagation.
8. A system for dynamically tracking the service effectiveness of Internet infrastructure based on multi-indicator collaboration and intelligent measurement multiplexing, characterized in that it includes: a unified data acquisition and preprocessing module, used to collect multi-source heterogeneous indicators of the Internet infrastructure and preprocess them to obtain standardized time-series data; a dynamic association graph construction module, used to construct, from the standardized time-series data and with a graph neural network, a dynamic association graph representing the dependencies between the multi-source heterogeneous indicators; an intelligent measurement planning module, used to obtain the optimal active measurement action vector under resource constraints based on the dynamic association graph and through a reinforcement learning algorithm; and a collaborative perception module, used to infer the global service status of the Internet infrastructure by using the optimal active measurement action vector and the semantic propagation mechanism of the dynamic association graph.
9. The system according to claim 8, characterized in that it also includes a resource adaptation and control module, which calculates the dynamic maximum number of concurrent measurement tasks as a dynamic resource budget by monitoring the average load of the business links and the CPU utilization of key nodes, adjusts the dynamic resource budget according to the real-time load of the business links, and applies the dynamic resource budget to the intelligent measurement planning module to obtain the optimal active measurement action vector.
10. A computer device, characterized in that it includes a memory and a processor, the memory storing a computer program configured to be executed by the processor, and the computer program including instructions for performing the method of any one of claims 1 to 7.