Method and device for predicting latent faults in distributed communication nodes based on front-end browser caching

By storing network performance metrics in the front-end browser cache, the health of distributed communication nodes can be evaluated in real time, which addresses the difficulty existing technologies have in identifying hidden faults. This enables high-risk nodes to be blocked before an operation is submitted and user drafts to be protected, improving system stability and user experience.

CN122247828A · Pending · Publication Date: 2026-06-19 · XIAMEN XINGZONG DIGITAL TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XIAMEN XINGZONG DIGITAL TECH CO LTD
Filing Date
2026-03-27
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to identify and predict hidden faults in distributed communication nodes during the operation and maintenance of cloud PBXs, voice gateways, and unified communication platforms. As a result, administrators discover failures only after submitting batch operations, proactive backend detection is costly, and complex configurations lack local protection.

Method used

Network performance metrics are stored in the front-end browser cache. The health heat value of each node is calculated from the jitter, timeout rate, server error rate, and reconnection penalty metrics, so that node health is assessed in real time and operations on high-risk nodes are blocked before the user submits them.

Benefits of technology

It improves the fault tolerance and user experience of distributed systems, reduces backend detection costs, reduces the delay in fault exposure, and avoids the losses caused by complex configuration failures.

✦ Generated by Eureka AI based on patent content.

Abstract

This application provides a method and apparatus for predicting latent faults in distributed communication nodes based on front-end browser caching. The method includes: using a front-end browser to determine at least one target node to be operated on in a distributed communication node cluster; obtaining, from the front-end browser cache, network performance indicators of each target node within a preset time window, wherein the network performance indicators include at least a jitter indicator, a timeout rate, a server error rate, and a reconnection penalty indicator, the reconnection penalty indicator being determined based on the number of reconnections and a preset maximum reconnection number threshold; using preset weights to determine the health heat value of each target node based on the jitter indicator, the timeout rate, the server error rate, and the reconnection penalty indicator; and identifying target nodes whose health heat value is less than the heat threshold as high-risk nodes, so as to block operations on the high-risk nodes.

Description

Technical Field

[0001] This application relates to the field of distributed communication node management technology, specifically to a method and apparatus for predicting latent faults in distributed communication nodes based on front-end browser caching.

Background Technology

[0002] In the daily operation and maintenance of cloud private branch exchanges (PBXs), voice gateways, edge communication nodes, and unified communication platforms, administrators typically perform high-risk operations such as batch configuration, firmware upgrades, reboots, and route adjustments on nodes distributed across different regions and carrier networks through a unified network (Web) console. Existing solutions usually only display a binary "online / offline" status in the device list, which is insufficient to reflect "hidden faults" such as link jitter, intermittent timeouts, and unstable WebSocket heartbeats. Although the nodes may still be connected at the Transmission Control Protocol (TCP) layer, significant latency, occasional packet loss, or frequent reconnections may have already occurred at the application layer. When administrators initiate batch operations based on the "online" status, failures are likely to only be exposed after submission.

Summary of the Invention

[0003] The purpose of this application is to provide a method and apparatus for predicting latent faults in distributed communication nodes based on front-end browser caching. The specific technical solution adopted is as follows. In a first aspect, a method for predicting latent faults in distributed communication nodes based on front-end browser caching is provided, the method comprising: using the front-end browser to determine at least one target node to be operated on in a distributed communication node cluster; obtaining, from the front-end browser cache, the network performance indicators of each target node within a preset time window, wherein the network performance indicators include at least a jitter indicator, a timeout rate, a server error rate, and a reconnection penalty indicator, the reconnection penalty indicator being determined based on the number of reconnections and a preset maximum reconnection number threshold; determining the health heat value of each target node using preset weights based on the jitter indicator, the timeout rate, the server error rate, and the reconnection penalty indicator; and identifying target nodes whose health heat value is below the heat threshold as high-risk nodes, so as to block operations on these high-risk nodes.

[0004] In a second aspect, a device for predicting latent faults in distributed communication nodes based on front-end browser caching is provided, the device comprising: a first determining module, used to determine at least one target node to be operated on in a distributed communication node cluster using the front-end browser; an acquisition module, used to acquire, from the front-end browser cache, network performance indicators of each target node within a preset time window, wherein the network performance indicators include at least a jitter indicator, a timeout rate, a server error rate, and a reconnection penalty indicator, the reconnection penalty indicator being determined based on the number of reconnections and a preset maximum reconnection number threshold; a second determining module, used to determine the health heat value of each target node using preset weights based on the jitter indicator, the timeout rate, the server error rate, and the reconnection penalty indicator; and a third determining module, used to identify target nodes whose health heat value is less than the heat threshold as high-risk nodes, so as to block operations on the high-risk nodes.

[0005] Thirdly, an electronic device is provided, comprising: a memory and at least one processor, wherein the memory stores instructions; the at least one processor invokes the instructions in the memory to cause the electronic device to execute the aforementioned method for predicting latent faults in distributed communication nodes based on front-end browser caching.

[0006] Fourthly, a computer program product is provided, comprising: computer program code, which, when run on a computer, causes the computer to perform the methods described in the first aspect or any possible implementation thereof.

[0007] Fifthly, a computer-readable storage medium is provided that stores computer program code, which, when executed on a computer, causes the computer to perform the methods described in the first aspect or any possible implementation thereof.

[0008] This application offers the following advantages: it stores network performance metrics in the front-end browser cache, dynamically assesses node health using time windows and a sliding window mechanism, and quantifies risk through heat values in order to block operations on high-risk nodes. Its core advantage lies in proactive front-end prediction and real-time blocking, which improves the fault tolerance of distributed systems and the user experience.

Description of the Drawings

[0009] To more clearly illustrate the technical solutions and advantages in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0010] Figure 1 is a schematic diagram of a distributed node heat prediction and early warning architecture based on front-end link caching provided in an embodiment of this application; Figure 2 is a schematic diagram of the implementation process of a distributed communication node latent fault prediction method based on front-end browser caching provided in an embodiment of this application; Figure 3 is a timing diagram for pre-warning and interception of batch operations provided in an embodiment of this application; Figure 4 is a schematic diagram of the implementation process of heat value calculation and draft protection provided in an embodiment of this application; Figure 5 is a schematic diagram of the structure of a distributed communication node latent fault prediction device based on front-end browser caching provided in an embodiment of this application; Figure 6 is a schematic diagram of the structure of a computer device provided in an embodiment of this application.

Detailed Implementation

[0011] To further illustrate the technical means and effects adopted by this application to achieve the intended inventive objective, the following, in conjunction with the accompanying drawings and preferred embodiments, details the specific implementation, structure, features, and effects of a distributed communication node latent fault prediction method based on front-end browser caching proposed in this application. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, specific features, structures, or characteristics in one or more embodiments can be combined in any suitable form.

[0012] In the description of the embodiments of this application, unless otherwise stated, " / " means "or". For example, A / B can mean A or B. The "and / or" in the text is merely a description of the relationship between related objects, indicating that there can be three relationships. For example, A and / or B can mean: A exists alone, A and B exist simultaneously, and B exists alone. In addition, in the description of the embodiments of this application, "multiple" means two or more.

[0013] Hereinafter, the terms "first" and "second" are used for descriptive purposes only and should not be construed as implying or suggesting relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.

[0014] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains.

[0015] The following is a standardized explanation of the terminology: Heat Score: A continuous health value from 0 to 100 calculated from recent link interaction data of the node.

[0016] Hidden faults: Nodes are still accessible at the basic connectivity level, but at the application interaction level, they exhibit unstable states such as high latency, intermittent timeouts, frequent reconnections, or high error rates.

[0017] Front-end passive probe: This refers to a browser not initiating additional special probe requests, but instead using data generated from normal business interactions as input for link evaluation.

[0018] Sliding window: refers to retaining only the most recent fixed time range, or a fixed number, of valid link records for real-time calculation of heat values, and automatically discarding old data.

[0019] Pre-emptive warning: This refers to the front-end issuing warnings, blocking, or recommending execution strategies in advance based on heat map assessment results before a user actually submits a high-risk operation.

[0020] Draft protection: In high-risk network environments, the system automatically persists user input to the local machine periodically so that it can be recovered after a failed submission.

[0021] "Under observation state" refers to an intermediate state in which a high-risk judgment is not made temporarily due to insufficient sample size or too short data window. It is used to reduce the probability of false judgment during cold start.

[0022] In the daily operation and maintenance of cloud PBXs, voice gateways, edge communication nodes, and unified communication platforms, administrators typically perform high-risk operations such as batch configuration, firmware upgrades, restarts, and route adjustments on nodes distributed across different regions and carrier networks through a unified web console. Existing solutions usually only display a binary "online / offline" status in the device list, which is insufficient to reflect "hidden fault" states such as link jitter, intermittent timeouts, and unstable WebSocket heartbeats. The main problems are as follows: 1. The binary online status is highly misleading: Although nodes may still be connected at the Transmission Control Protocol (TCP) layer, significant latency, occasional packet loss, or frequent reconnections may be occurring at the application layer. Administrators initiating batch operations based on the "online" status are prone to discovering failures only after submission.

[0023] 2. Delayed fault detection: The existing platform often only executes the requests one by one and returns timeout or failure results after the user clicks "save" or "batch distribute", which leads to inconsistent configurations, interrupted upgrades or partial node disconnection.

[0024] 3. Lack of protection for complex input: When users edit long forms, complex routing rules, or interactive voice response (IVR) processes on nodes with poor network quality, the final submission failure may result in the loss of a large amount of input, causing significant operational frustration.

[0025] 4. High cost of high-frequency active probing at the backend: If the backend is relied upon to continuously and actively probe a large number of nodes, it will introduce additional bandwidth, signaling and server computing overhead, and it will be difficult to accurately reflect the true end-to-end link quality from the current administrator’s network to the target node.

[0026] Figure 1 shows an embodiment of a distributed node heat prediction and early warning architecture based on front-end link caching provided by this application. As shown in Figure 1, the architecture includes a front-end browser and distributed communication nodes. The "Distributed Communication Nodes" box contains three small rectangles labeled "Node A Stable", "Node B Latent Jitter" and "Node C Offline". These three nodes are each connected to the front-end browser by arrows, representing the business request / heartbeat RTT between the nodes and the front-end browser.

[0027] The front-end browser includes a "Global Requests and WebSocket Interceptors" box. Three arrows connect it to the three nodes in the distributed communication nodes, indicating that traffic for the different types of nodes passes through the interceptor.

[0028] The "Global Requests and WebSocket Interceptors" box has an arrow pointing to the "IndexedDB / Memory Sliding Window Link Cache" rectangle below it, indicating that requests processed by the interceptor are cached.

[0029] Below that is the rectangle labeled "Local Thermal Prediction Engine J / T / R / E Weighted Calculation," which is connected to the cache above by an arrow, indicating that the cached data will be input into the thermal prediction engine for calculation.

[0030] The thermal prediction engine has two branches: The left branch points to the "List / Topology Thermal Rendering" rectangle, indicating that the calculation results can be used for thermal rendering of lists or topologies.

[0031] The right branch points to the "High-Risk Operation Pre-detection and Interception" rectangle, which then connects to the "Draft Auto-save and Restore Prompt" rectangle. This indicates that high-risk operations are pre-detected and intercepted, and a prompt function for automatic saving and restoring of drafts is provided.

[0032] Here, this embodiment does not require the backend to launch continuous active probing to a massive number of nodes. Instead, it treats each browser session (frontend browser) logged into the console as a lightweight edge probe. The frontend browser only collects request latency, timeout, and heartbeat data naturally generated during normal business interactions and performs heat value assessment locally. This approach avoids additional network overhead and obtains end-to-end quality conclusions that are closer to the current administrator's access path.

[0033] In a preferred embodiment, after a user clicks the "Batch Upgrade" or "Batch Distribute" button, the system does not immediately send a request to the backend. Instead, it first performs local clustering and risk percentage statistics on the selected node set. If the number of nodes in the high-risk range exceeds a preset proportion, the front-end browser performs the following actions: 1. Block the original commit event; 2. Output a list of high-risk nodes and their corresponding thermal values; 3. Automatically generate two sets: "Healthy Node Batch" and "Observation Node Batch"; 4. Allow users to choose to execute only on healthy batches, or retain all nodes but require secondary confirmation.

[0034] This mechanism changes the previous "send all and then check the results" interaction mode to "evaluate first and then execute" interaction mode.
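The "evaluate first, then execute" pre-check described above can be sketched as a pure function. This is a minimal sketch, not the applicant's implementation: the names `NodeHeat`, `PreCheckResult`, and `preCheckBatch`, and the default values 60 and 0.2, are assumptions introduced for illustration, since the text only speaks of a preset threshold and a preset proportion.

```typescript
// Hypothetical types and defaults; the source does not fix concrete values.
type NodeHeat = { nodeId: string; heat: number };

interface PreCheckResult {
  blocked: boolean;           // true when the original submit event should be intercepted
  highRisk: NodeHeat[];       // high-risk nodes together with their heat values
  healthyBatch: string[];     // "Healthy Node Batch"
  observationBatch: string[]; // "Observation Node Batch"
}

function preCheckBatch(
  selected: NodeHeat[],
  heatThreshold = 60,     // assumed heat threshold
  maxHighRiskRatio = 0.2, // assumed preset proportion
): PreCheckResult {
  const highRisk = selected.filter((n) => n.heat < heatThreshold);
  const healthy = selected.filter((n) => n.heat >= heatThreshold);
  const ratio = selected.length === 0 ? 0 : highRisk.length / selected.length;
  return {
    blocked: ratio > maxHighRiskRatio,
    highRisk,
    healthyBatch: healthy.map((n) => n.nodeId),
    observationBatch: highRisk.map((n) => n.nodeId),
  };
}
```

A console could call `preCheckBatch` in the click handler of the batch button and send the request only when `blocked` is false or the user confirms a second time.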

[0035] Figure 1 thus shows the complete flow, from request traffic exchanged with the distributed communication nodes through interception, caching, calculation, rendering, and safeguarding within the front-end browser.

[0036] This application's embodiments overcome the shortcomings of existing distributed communication node management, such as "blindly operating based solely on online status, faults only being exposed after submission, lack of local protection for complex configurations, and high costs for real-time backend detection," by providing a heat prediction and early warning mechanism based on front-end historical link data caching. This mechanism reuses request latency, timeout, error, and heartbeat time-series data generated during normal browser business interactions to calculate node health heat values locally on the front end. Furthermore, it intervenes proactively before users perform high-risk operations through heat maps, early warning pop-ups, batch strategy recommendations, and draft protection.

[0037] The beneficial effects are as follows: 1. Move fault interception forward to before clicking: By performing a pre-emptive health assessment of the target node set on the front end, the scope of incidents caused by batch configuration failures and inconsistent cluster configurations can be reduced.

[0038] 2. Identify hidden faults that are “online but unstable”: By using multi-dimensional indicators such as heartbeat delay variance, timeout rate, and reconnection frequency, the node status is upgraded from binary online / offline to continuous heat value, improving the lead time for anomaly detection.

[0039] 3. Reduce losses from complex operation failures: When a high-risk link environment is detected, the local draft saving and restoration reminder mechanism is automatically enabled to avoid the loss of form and configuration inputs due to final submission failure.

[0040] 4. Avoid additional backend probing overhead: Use the browser as a passive edge probe and leverage existing business interaction byproducts for local computation, without relying on large-scale backend active polling.

[0041] 5. Reflects the true end-to-end perspective: The calculated heat values directly correspond to the actual link experience between the current administrator's access environment and the target node, making them more suitable for guiding operation and maintenance decisions in the current session.

[0042] The system proposed in this application mainly consists of the following core modules: 1. Front-end link data acquisition and node attribution module: wraps fetch, XMLHttpRequest, axios interceptors, or equivalent request mechanisms in the front-end global request layer. fetch, XMLHttpRequest (XHR), and axios are three commonly used HTTP request tools; interceptors are a core axios feature for applying uniform logic before a request is sent or after a response is returned. The module also listens on WebSocket, Server-Sent Events (SSE), or equivalent long-lived connection objects. WebSocket is a standalone TCP-based protocol that establishes a persistent connection through a single handshake and allows free bidirectional communication between client and server. SSE is a lightweight HTTP-based mechanism that lets the server push a stream of text data to the client over a one-way persistent connection.

[0043] Each request or heartbeat interaction is associated with the target node identifier (node_id), and the request start and end times, timeout flag, status code, disconnection time, number of reconnections, and heartbeat delay (Round-Trip Time, RTT) are recorded.

[0044] This front-end link data acquisition and node attribution module may correspond to the "Global Requests and WebSocket Interceptors" in Figure 2, which implements the above functionality.
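The per-interaction record described for this module can be sketched as a small recorder that an interceptor calls before dispatch and again on completion. This is a minimal sketch under stated assumptions: the field names in `LinkRecord` and the helper `createRecorder` are hypothetical, since the text lists the recorded quantities but not a schema, and the clock is injected only to keep the example deterministic.

```typescript
// Illustrative schema; the source lists the quantities but not the field names.
interface LinkRecord {
  nodeId: string;        // target node identifier (node_id)
  startedAt: number;     // request start time, ms
  endedAt: number;       // request end time, ms
  timedOut: boolean;     // timeout flag
  status: number | null; // status code, null when no response arrived
}

// Returns a `begin` function; calling it yields a `finish` callback for that request.
function createRecorder(sink: LinkRecord[], now: () => number = Date.now) {
  return function begin(nodeId: string) {
    const startedAt = now();
    return function finish(status: number | null, timedOut: boolean): LinkRecord {
      const record: LinkRecord = { nodeId, startedAt, endedAt: now(), timedOut, status };
      sink.push(record);
      return record;
    };
  };
}
```

In a request interceptor, `begin(nodeId)` would run just before the request is sent, and the returned `finish` would run in the response, error, or timeout handler.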

[0045] 2. Sliding window local cache and clearing module: Maintain local link time-series cache at the node level, preferably write to IndexedDB, and combine with memory cache when necessary to improve read efficiency. IndexedDB is a client storage solution provided by browsers, which allows developers to store large amounts of structured data (including files, binary objects, etc.) in users' browsers and supports transactions, indexes and complex queries. Each node retains only data within a preset time window, preferably within the last 10 minutes or the last N valid interaction records; Expired data is subject to rolling eviction, limiting local storage usage and maintaining the real-time nature of heat values.

[0046] This sliding window local cache and clearing module may correspond to the "IndexedDB / Memory Sliding Window Link Cache" in Figure 2, which implements the above functionality.
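Rolling eviction for one node's records can be sketched as follows. This is a minimal in-memory sketch: the function name `evictExpired`, the `at` timestamp field, and the default cap of 200 records are assumptions, and applying both the time window and the record cap together is an illustrative choice (the text offers them as alternatives). Persisting the result to IndexedDB is left out.

```typescript
interface Timestamped {
  at: number; // time the record was captured, ms
}

// Keeps records no older than windowMs, then at most the newest maxRecords of those.
function evictExpired<T extends Timestamped>(
  records: T[],
  nowMs: number,
  windowMs: number = 10 * 60 * 1000, // "last 10 minutes" from the text
  maxRecords: number = 200,          // assumed value for "last N records"
): T[] {
  const fresh = records.filter((r) => nowMs - r.at <= windowMs);
  fresh.sort((a, b) => a.at - b.at); // oldest first
  return fresh.slice(Math.max(0, fresh.length - maxRecords));
}
```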

[0047] 3. Multidimensional thermal prediction calculation module: When a user opens a node details page, edits complex configurations, or prepares to initiate batch operations, the heat value calculation for the target node or set of target nodes is triggered. Let the RTTs of the most recent N heartbeats be $l_1, l_2, \ldots, l_N$, where $l_i$ is the RTT of the i-th heartbeat and N, the number of heartbeat samples, is the total number of valid heartbeat records within the time window. With the mean $\mu$ and the variance of these RTTs calculated, the jitter index $J$ can be expressed by formula (1) [equation not reproduced in this text], where $\theta_j$ is the normalization threshold of the heartbeat delay variance, which can also be understood as the maximum heartbeat delay variance that the system can tolerate.
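The jitter computation can be sketched as below. Formula (1) is not legible in this text, so the form used here is an assumption: the population variance of the RTT sequence divided by the tolerance threshold and clamped to the range 0 to 1. The function name `jitterIndex` and the parameter `thetaJ` are likewise illustrative.

```typescript
// ASSUMPTION: J = min(1, variance / thetaJ); the source's formula (1) is not shown.
function jitterIndex(rttMs: number[], thetaJ: number): number {
  if (rttMs.length === 0 || thetaJ <= 0) return 0; // no samples or invalid threshold
  const mean = rttMs.reduce((sum, v) => sum + v, 0) / rttMs.length;
  const variance =
    rttMs.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / rttMs.length;
  return Math.min(1, variance / thetaJ);
}
```

Clamping keeps a single extreme outlier from pushing the indicator beyond the range the other indicators use, which makes a later weighted sum easier to reason about.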

[0048] Let the total number of requests within the time window be $Q$ and the number of timed-out requests be $Q_t$. The timeout rate $T$ can then be expressed by formula (2): $T = Q_t / Q$ (2). The system iterates through all requests recorded within the sliding time window; each request carrying a "timeout flag" adds 1 to $Q_t$, and $Q$ is the count of all request records within the time window.

[0049] Let the number of server errors be $Q_e$. The server error rate $E$ can then be expressed by formula (3): $E = Q_e / Q$ (3). The system checks the status code (such as an HTTP status code or a WebSocket close code) recorded for each request within the time window; if the status code indicates a server-side exception (e.g., the HTTP 5xx series), 1 is added to $Q_e$.
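The two ratios can be computed in a single pass over the window. This is a minimal sketch: the record shape `RequestRecord` and the function name `windowRates` are assumptions, only HTTP 5xx codes are treated as server errors, and returning zero for an empty window is an illustrative choice that the text does not specify.

```typescript
interface RequestRecord {
  timedOut: boolean;     // timeout flag
  status: number | null; // HTTP status code, null when no response arrived
}

// Returns the timeout rate T = Qt / Q and the server error rate E = Qe / Q.
function windowRates(records: RequestRecord[]): { T: number; E: number } {
  const Q = records.length;
  if (Q === 0) return { T: 0, E: 0 }; // assumed behaviour for an empty window
  let Qt = 0;
  let Qe = 0;
  for (const r of records) {
    if (r.timedOut) Qt += 1;
    if (r.status !== null && r.status >= 500 && r.status <= 599) Qe += 1;
  }
  return { T: Qt / Q, E: Qe / Q };
}
```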

[0050] Let the number of reconnections be $C_r$. The reconnection penalty index $R$ can then be expressed by formula (4) [equation not reproduced in this text], where $\theta_r$ is the tolerable reconnection threshold (the maximum reconnection number threshold). The "number of reconnections" recorded by the acquisition module above (usually the action of re-establishing a connection after the underlying WebSocket or SSE connection drops) is accumulated within the specified time window and used directly as $C_r$ in the formula.

[0051] Each indicator is normalized and then weighted according to a preset weight, and the node health heat value $H$ is calculated using formula (5) [equation and its symbol definitions not reproduced in this text]. The lower the value of $H$, the higher the risk of the node link. Preferably, when the number of valid samples is lower than a preset threshold, the node is not directly judged as high-risk but is marked as "to be observed" to avoid false judgments during cold start.
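The weighting step and the "to be observed" guard can be sketched as below. Formulas (4) and (5) are not legible in this text, so two forms are assumed here and should be read as illustrations only: the reconnection penalty is taken as the reconnection count divided by the threshold, clamped to 1, and the heat value as 100 times one minus the weighted sum of the four indicators. All names (`reconnectPenalty`, `assess`, `Indicators`) are hypothetical.

```typescript
interface Indicators { J: number; T: number; E: number; R: number } // each in [0, 1]
type Weights = Indicators; // one preset weight per indicator, assumed to sum to 1

// ASSUMPTION: R = min(1, Cr / thetaR); the source's formula (4) is not shown.
function reconnectPenalty(reconnects: number, thetaR: number): number {
  if (thetaR <= 0) return reconnects > 0 ? 1 : 0;
  return Math.min(1, reconnects / thetaR);
}

type Assessment =
  | { state: "observing" } // too few samples, no high-risk verdict yet
  | { state: "scored"; heat: number; highRisk: boolean };

// ASSUMPTION: H = 100 * (1 - weighted sum); the source's formula (5) is not shown.
function assess(
  ind: Indicators,
  w: Weights,
  samples: number,
  minSamples: number,
  heatThreshold: number,
): Assessment {
  if (samples < minSamples) return { state: "observing" };
  const penalty = w.J * ind.J + w.T * ind.T + w.E * ind.E + w.R * ind.R;
  const heat = 100 * (1 - Math.min(1, Math.max(0, penalty)));
  return { state: "scored", heat, highRisk: heat < heatThreshold };
}
```

Under these assumptions a node with no jitter, timeouts, errors, or reconnections scores 100, matching the 0 to 100 range given in the terminology, and lower scores mean higher risk.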

[0052] This multidimensional thermal prediction calculation module may correspond to the "Local Thermal Prediction Engine J / T / R / E Weighted Calculation" in Figure 2, which implements the above functions.

[0053] Traditional online status assessments typically rely solely on whether the last heartbeat has arrived, failing to identify sub-optimal states caused by network jitter. This application incorporates heartbeat delay variance into the thermal evaluation. When node A's heartbeat delay sequence is [50ms, 52ms, 49ms, 51ms, 50ms], the variance is small, indicating stable link connectivity. When node B's heartbeat delay sequence is [20ms, 800ms, 30ms, 1500ms, 40ms], although the heartbeat is not completely interrupted, its variance is significantly increased, indicating a high probability of hidden fault risk at this node. Based on this, the system prioritizes reducing node B's thermal value and issues a warning before users perform high-risk operations.
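The contrast between the two sequences can be made concrete by working out their means and variances. The population form of the variance (dividing by N = 5) is an assumption here; the text does not state which form is used.

```latex
\begin{aligned}
\mu_A &= \frac{50+52+49+51+50}{5} = 50.4\ \text{ms}, &
\sigma_A^2 &= \frac{0.16+2.56+1.96+0.36+0.16}{5} = 1.04\ \text{ms}^2, \\
\mu_B &= \frac{20+800+30+1500+40}{5} = 478\ \text{ms}, &
\sigma_B^2 &= \frac{209764+103684+200704+1044484+191844}{5} = 350096\ \text{ms}^2.
\end{aligned}
```

Node B's variance is roughly $3.4 \times 10^{5}$ times that of node A even though every heartbeat arrived, which is exactly the situation a binary online / offline indicator cannot distinguish.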

[0054] 4. Thermal rendering and batch operation pre-check module: maps heat values to multi-level colors or animated cues, preferably green, yellow, orange, and red; displays node thermal status in the device list, topology map, geographical distribution map, or batch selection panel; when a user initiates operations such as batch upgrades, batch distributions, or batch restarts, the front end first assesses the proportion of nodes in the selected node set that are below the safety threshold. If the proportion exceeds the threshold, direct submission is blocked, and suggestions such as "remove high-risk nodes," "execute in batches," and "retry later" are provided.

[0055] This thermal rendering and batch operation pre-check module may correspond to the "List / Topology Heat Rendering" in Figure 2, which implements the above functions.
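The mapping from heat value to display color can be sketched as a simple banding function. The four colors come from the text; the band boundaries 80, 60, and 40 and the function name `heatColor` are assumptions chosen only for illustration.

```typescript
type HeatColor = "green" | "yellow" | "orange" | "red";

// Band boundaries are assumed values, not taken from the source.
function heatColor(heat: number): HeatColor {
  if (heat >= 80) return "green";
  if (heat >= 60) return "yellow";
  if (heat >= 40) return "orange";
  return "red";
}
```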

[0056] 5. High-risk environment adaptive degradation and draft protection module: When the thermal value of the target node is in the warning or danger range, an environmental warning banner will be displayed at the top of the page; For complex configuration forms, enable a local draft auto-save mechanism, preferably triggering serialization at time intervals or field changes, and writing to IndexedDB or equivalent local storage; When the final submission fails, the front-end intercepts the error and prompts the user that the draft has been saved locally to prevent the page from being cleared or the input from being lost.

[0057] This high-risk environment adaptive degradation and draft protection module may correspond to the "High-Risk Operation Pre-detection and Interception" and "Automatic Draft Saving" functions in Figure 2.
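Heat-gated draft saving can be sketched with an injected key-value store, so that the same logic works against IndexedDB, LocalStorage, or an in-memory map. This is a minimal sketch: `DraftStore`, `createDraftGuard`, the warning threshold parameter, and the minimum save interval are all hypothetical names and knobs, not taken from the source.

```typescript
// Minimal storage abstraction; a Map<string, string> satisfies it for testing.
interface DraftStore {
  set(key: string, value: string): unknown;
  get(key: string): string | undefined;
}

function createDraftGuard<T>(store: DraftStore, key: string, minIntervalMs: number) {
  let lastSavedAt = Number.NEGATIVE_INFINITY;
  return {
    // Call on a timer or on field change; saves only while heat is below warnThreshold.
    maybeSave(form: T, heat: number, warnThreshold: number, nowMs: number): boolean {
      if (heat >= warnThreshold) return false;                 // link is healthy
      if (nowMs - lastSavedAt < minIntervalMs) return false;   // saved too recently
      store.set(key, JSON.stringify(form));
      lastSavedAt = nowMs;
      return true;
    },
    // Returns the last saved draft, e.g. after a failed submission.
    restore(): T | undefined {
      const raw = store.get(key);
      return raw === undefined ? undefined : (JSON.parse(raw) as T);
    },
  };
}
```

On a failed submission the error handler would call `restore()` and tell the user that the draft is still available locally.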

[0058] 6. Recovery detection and one-click resubmission module: the system continuously monitors changes in the heat values of the target nodes in the background; when the heat value is detected to have returned to the healthy range for several consecutive evaluation cycles, the user is asked whether to restore the previous draft and resubmit it. For high-risk nodes that have been removed, a retry queue can be generated so that batch operations can be re-executed later.

[0059] This recovery detection and one-click resubmission module may correspond to the "Restore Prompt" function in Figure 2.
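The "several consecutive evaluation cycles" condition can be sketched as a check over the most recent heat values. The function name `hasRecovered` and its parameters are assumptions; the text does not say how many cycles are required or where the healthy range begins.

```typescript
// recentHeats is ordered oldest to newest, one entry per evaluation cycle.
function hasRecovered(
  recentHeats: number[],
  healthyThreshold: number, // assumed lower bound of the healthy range
  requiredCycles: number,   // assumed number of consecutive healthy cycles
): boolean {
  if (requiredCycles <= 0 || recentHeats.length < requiredCycles) return false;
  return recentHeats.slice(-requiredCycles).every((h) => h >= healthyThreshold);
}
```

Requiring several consecutive healthy cycles avoids prompting the user to resubmit on a single good sample from a link that is still fluctuating.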

[0060] Here, the disconnection time recorded by the front-end link data acquisition and node attribution module can serve as the "time coordinate" for underlying data filtering and status control, supporting the operation of the calculation module and business logic, and playing a role in three key stages.

[0061] 1. Supporting data cleaning for "sliding windows" (core function).

[0062] In the formulas above, quantities such as the total number of requests $Q$ and the number of reconnections $C_r$ are counted only within the set time window (e.g., the last 10 minutes). Accordingly, the system needs to record the "disconnection time" of each disconnection.

[0063] When the "sliding window local cache and clearing module" runs, the "disconnection time" is subtracted from the current time. If the disconnection occurred more than 10 minutes ago, it is discarded as expired data. The "disconnection time" thus serves as the "scale" of the time window, determining whether a given reconnection is counted in the current $C_r$.

[0064] 2. Trigger the immediate switch for "Pre-warning and Draft Protection".

[0065] In the "High-Risk Environment Adaptive Degradation and Draft Protection Module" above, local draft saving is enabled during high-risk network conditions. Relying solely on periodic (e.g., every minute) calculation of heat values using the formulas may introduce delays. Recording precise "disconnection timestamps" allows the front end to trigger the interception mechanism or increase the draft saving frequency the moment a WebSocket or other long-lived connection breaks, without waiting for the next formula calculation.

[0066] 3. Assisting the calculation of implicit indicators (scalability). The current formula uses only the number of reconnections $C_r$. In an actual engineering implementation, combining the "disconnection time point" with the subsequent "successful reconnection time point" yields the "single disconnection duration", and the "disconnection duration percentage" can then be added as a penalty weight to the calculation of the heat value $H$. For this, the "disconnection time point" is essential basic data.
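The "disconnection duration percentage" can be sketched as the share of the time window spent disconnected. This is a minimal sketch: the `Outage` shape and the function name `disconnectRatio` are assumptions, outages are assumed not to overlap, and an outage that is still open is counted up to the current time.

```typescript
interface Outage {
  disconnectedAt: number;      // disconnection time point, ms
  reconnectedAt: number | null; // successful reconnection time point, null if still down
}

// Fraction of the window [nowMs - windowMs, nowMs] during which the link was down.
function disconnectRatio(outages: Outage[], nowMs: number, windowMs: number): number {
  if (windowMs <= 0) return 0;
  const windowStart = nowMs - windowMs;
  let downMs = 0;
  for (const o of outages) {
    const end = Math.min(o.reconnectedAt ?? nowMs, nowMs);
    const start = Math.max(o.disconnectedAt, windowStart);
    if (end > start) downMs += end - start; // only the part inside the window counts
  }
  return Math.min(1, downMs / windowMs);
}
```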

[0067] This application reuses front-end business interaction byproducts, performing heatmap calculations locally in the browser, reducing back-end costs and more closely reflecting the actual user experience of current administrators. Continuous health heatmap values ​​are constructed using multi-dimensional indicators such as heartbeat variance, timeout rate, reconnection frequency, and error rate, improving anomaly identification accuracy. Before a user clicks a high-risk operation, a node set assessment is performed, and blocking, batch splitting, and strategy recommendations are executed based on thresholds, significantly reducing the impact of potential incidents. The heatmap values, operational risks, and draft saving strategies are linked, giving complex inputs in high-risk node scenarios inherent local fault tolerance. Not only are drafts saved upon failure, but subsequent heatmap recovery monitoring triggers one-click recovery and resubmission, forming a complete interactive closed loop from warning, degradation, fallback to recovery.

[0068] This application provides a method for predicting latent faults in distributed communication nodes based on front-end browser caching. As shown in Figure 2, the method can be implemented through the following steps: Step S210: Use the front-end browser to determine at least one target node to be operated on in the distributed communication node cluster. Here, the distributed communication nodes can be as shown in Figure 1: the box labeled "Distributed Communication Nodes" contains three small rectangles labeled "Node A Stable", "Node B Latent Jitter", and "Node C Offline", and these three nodes are connected to the front-end browser via arrows.

[0069] During implementation, administrators can select multiple nodes and click "Batch Deploy" in the front-end console (front-end browser). An arrow pointing from the administrator to the front-end console indicates that this operation has been initiated. Operations to be performed include batch configuration, firmware upgrade, reboot, and route adjustment.

[0070] Step S220: Obtain the network performance indicators of each target node within a preset time window from the front-end browser cache, wherein the network performance indicators include at least jitter indicators, timeout rate, server error rate, and reconnection penalty indicators; the reconnection penalty indicators are determined based on the number of reconnections and a preset maximum reconnection number threshold. During implementation, four key metrics of the target node within a preset time window (such as the last 10 minutes) can be extracted from the browser cache (such as IndexedDB, LocalStorage).

[0071] Jitter index $J$: the variance calculated from the heartbeat RTT sequence, reflecting the degree of network latency fluctuation.

[0072] Timeout rate $T$: the ratio of the number of timed-out requests $Q_t$ to the total number of requests $Q$, that is, $T = Q_t / Q$, indicating the probability of a request timeout.

[0073] Server error rate $E$: the ratio of the server-side error count $Q_e$ to the total number of requests $Q$, that is, $E = Q_e / Q$, reflecting abnormal situations on the server side.

[0074] Reconnection penalty index $R$: the ratio of the number of reconnections $C_r$ to the tolerance threshold $\theta_r$, capped at 1, that is, $R = \min(1, C_r / \theta_r)$, quantifying the impact of disconnection and reconnection on the system.

[0075] Step S230: Determine the health heat value of each target node based on the jitter index, the timeout rate, the server error rate, and the reconnection penalty index using preset weights.

[0076] During implementation, a comprehensive health score for nodes can be generated by weighted aggregation of four types of indicators. The preset weights can be adjusted based on business priorities. For example, in scenarios with high real-time requirements, the jitter indicator may have a higher weight; in scenarios with high stability requirements, the reconnection penalty indicator may have a higher weight.
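The weighted aggregation can be sketched as follows. This is a minimal sketch, assuming that all four indicators are normalized to [0, 1], that the weights sum to 1, and that the heat value is scaled to 0-100 with higher values meaning healthier (consistent with nodes below the heat threshold being high-risk). The penalty form used here and the names `NodeMetrics`, `Weights`, and `healthHeat` are illustrative assumptions, not the formula of this application.

```typescript
// Illustrative only: the weighting form and the 0-100 scale are assumptions.
interface NodeMetrics {
  jitter: number;           // J, normalized to [0, 1]
  timeoutRate: number;      // T = Q_t / Q
  serverErrorRate: number;  // E = Q_e / Q
  reconnectPenalty: number; // R, capped at 1
}

// One weight per indicator; the weights are expected to sum to 1.
type Weights = NodeMetrics;

function healthHeat(m: NodeMetrics, w: Weights): number {
  const penalty =
    w.jitter * m.jitter +
    w.timeoutRate * m.timeoutRate +
    w.serverErrorRate * m.serverErrorRate +
    w.reconnectPenalty * m.reconnectPenalty;
  // Clamp so that out-of-range inputs cannot push H outside [0, 100].
  return Math.max(0, Math.min(100, 100 * (1 - penalty)));
}

// Hypothetical weights for a latency-sensitive scenario (jitter weighted highest).
const realtimeWeights: Weights = {
  jitter: 0.4,
  timeoutRate: 0.3,
  serverErrorRate: 0.2,
  reconnectPenalty: 0.1,
};

const exampleHeat = healthHeat(
  { jitter: 0.5, timeoutRate: 0.1, serverErrorRate: 0, reconnectPenalty: 0.2 },
  realtimeWeights,
);
console.log(exampleHeat.toFixed(1));
```

With these hypothetical weights the weighted penalty is 0.2 + 0.03 + 0 + 0.02 = 0.25, so the example node scores roughly 75 on the assumed scale.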

[0077] In some embodiments, the health thermal value is displayed in a node list or node map on the front-end browser page in the form of multi-level colors or animation.

[0078] Here, heat values can be mapped to multi-level colors or animated cues, preferably green, yellow, orange, and red; the node heat status can be displayed in the device list, topology map, geographical distribution map, or batch selection panel.

[0079] Step S240: Identify the target node whose health thermal value is less than the thermal threshold as a high-risk node, so as to block the operation against the high-risk node.

[0080] During implementation, before sending a request, the system checks whether the target node is a high-risk node (using an axios interceptor). If so, the request is intercepted and the user is notified.
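The pre-send check can be sketched as below. To stay self-contained, the sketch uses a plain wrapper function rather than an actual axios interceptor; the names `HighRiskNodeError` and `guardedSend`, and the idea of passing the high-risk set as a parameter, are assumptions made for illustration.

```typescript
// Hypothetical pre-send guard. A real deployment might place the same check
// inside an axios request interceptor or a fetch wrapper.
class HighRiskNodeError extends Error {
  readonly nodeId: string;

  constructor(nodeId: string) {
    super(`Operation blocked: node ${nodeId} is currently high-risk`);
    this.name = "HighRiskNodeError";
    this.nodeId = nodeId;
  }
}

// Runs `send` only if the target node is not in the high-risk set.
// `send` may return a Promise; the guard itself is synchronous.
function guardedSend<T>(
  nodeId: string,
  highRiskNodes: ReadonlySet<string>,
  send: () => T,
): T {
  if (highRiskNodes.has(nodeId)) {
    // Reject before any network traffic is generated, so the caller
    // can notify the user instead of waiting for a timeout.
    throw new HighRiskNodeError(nodeId);
  }
  return send();
}
```

A caller would catch `HighRiskNodeError` and surface its message to the user, while other errors continue to propagate as ordinary request failures.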

[0081] In some embodiments, the preset weights and the heat threshold corresponding to the current operation type can be determined from a preset mapping between operation types and their weights and heat thresholds. The operation types include at least: firmware upgrade, large file distribution, configuration saving, and read-only inspection.

[0082] Here, different thermal calculation weights and interception thresholds can be switched for different operations such as firmware upgrades, large file distribution, configuration saving, and read-only inspections, in order to match the sensitivity of different operations to bandwidth, latency, and stability.
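One way to hold such a mapping is a lookup table keyed by operation type. The sketch below is illustrative: every weight and threshold in it is a hypothetical placeholder, and the names `OperationType`, `OperationProfile`, and `profileFor` are assumptions.

```typescript
type OperationType =
  | "firmwareUpgrade"
  | "largeFileDistribution"
  | "configSave"
  | "readOnlyInspection";

interface OperationProfile {
  weights: {
    jitter: number;
    timeoutRate: number;
    serverErrorRate: number;
    reconnectPenalty: number;
  };
  heatThreshold: number; // nodes scoring below this are treated as high-risk
}

// All numbers below are hypothetical placeholders chosen for illustration.
const OPERATION_PROFILES: Record<OperationType, OperationProfile> = {
  firmwareUpgrade: {
    weights: { jitter: 0.2, timeoutRate: 0.3, serverErrorRate: 0.2, reconnectPenalty: 0.3 },
    heatThreshold: 80,
  },
  largeFileDistribution: {
    weights: { jitter: 0.2, timeoutRate: 0.4, serverErrorRate: 0.2, reconnectPenalty: 0.2 },
    heatThreshold: 75,
  },
  configSave: {
    weights: { jitter: 0.3, timeoutRate: 0.3, serverErrorRate: 0.3, reconnectPenalty: 0.1 },
    heatThreshold: 70,
  },
  readOnlyInspection: {
    weights: { jitter: 0.25, timeoutRate: 0.25, serverErrorRate: 0.25, reconnectPenalty: 0.25 },
    heatThreshold: 40,
  },
};

// Looks up the weights and interception threshold for the current operation.
function profileFor(op: OperationType): OperationProfile {
  return OPERATION_PROFILES[op];
}
```

Keeping the profiles in one table means a stricter threshold for risky operations such as firmware upgrades can be tuned without touching the scoring code.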

[0083] In this embodiment, network performance metrics are stored in the front-end browser cache, and node health is dynamically assessed using time windows and sliding window mechanisms. Finally, risk is quantified through heatmap values, and operations of high-risk nodes are blocked. Its core advantage lies in proactive front-end prediction and real-time blocking, effectively improving the fault tolerance of the distributed system and the user experience.

[0084] In some embodiments, before performing step S210 "using the front-end browser to determine at least one target node to be operated on in the distributed communication node cluster", this application embodiment provides a method for obtaining and storing network performance indicators, which can be implemented through the following steps: Step A: Configure a front-end network request interception mechanism in the front-end browser; During implementation, the front-end link data collection and node attribution module can be used to encapsulate fetch, XMLHttpRequest, axios interceptors or equivalent request mechanisms at the front-end global request layer, and set up a front-end network request interception mechanism.

[0085] Step B: Use the aforementioned front-end network request interception mechanism to monitor the front-end long connection data transmission channel and obtain the request start and end time, timeout flag, status code, number of reconnections, and heartbeat delay. During implementation, the front-end link data collection and node attribution module can be used to monitor WebSocket, SSE or equivalent long connection objects, associate each request or heartbeat interaction with the target node identifier (node_id), and record the request start and end time, timeout flag, status code, disconnection time, number of reconnections and heartbeat delay.

[0086] Here, the start and end times of a request can be accurately calculated by recording startTime when the request is sent and endTime when the response arrives through an interceptor.

[0087] Timeout flag: If the request takes longer than a preset threshold (e.g., 3 seconds), the timeout flag is set to true and the timeout counter $Q_t$ is incremented.

[0088] Status code capture: The response interceptor parses HTTP status codes (such as 200, 404, 500) or WebSocket close codes (such as 1000 for a normal close and 1006 for an abnormal disconnect); a server-side error (5xx) increments the counter $Q_e$.

[0089] Reconnection count statistics: When WebSocket / SSE automatically reconnects after a disconnection, the number of reconnections $C_r$ is accumulated by listening to the onclose and onopen events. If the reconnection is successful, the time of success is recorded for subsequent calculation of the disconnection duration.

[0090] Heartbeat delay (RTT): In a long connection, heartbeat packets are sent periodically (e.g., every 2 seconds). An RTT sequence $l_1, l_2, \ldots, l_N$ is generated by recording the difference between the heartbeat sending and receiving timestamps, and is used to calculate the network jitter index $J$.
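The collection rules above can be sketched independently of any particular request library. In the sketch below, `toSample` classifies one completed request and `ReconnectCounter` accumulates reconnections from close/open notifications; both names, the record shapes, and the choice to pass timestamps in explicitly are assumptions made for illustration.

```typescript
interface RequestSample {
  nodeId: string;
  startTime: number; // ms
  endTime: number;   // ms
  timedOut: boolean;
  serverError: boolean;
}

const TIMEOUT_MS = 3000; // the 3-second example threshold from the text

// Builds one sample from the interceptor's start/end timestamps and status code.
function toSample(
  nodeId: string,
  startTime: number,
  endTime: number,
  status: number,
): RequestSample {
  return {
    nodeId,
    startTime,
    endTime,
    timedOut: endTime - startTime > TIMEOUT_MS,
    serverError: status >= 500 && status <= 599, // HTTP 5xx only
  };
}

// Counts reconnections and records each disconnect/reconnect time pair.
class ReconnectCounter {
  reconnects = 0;
  readonly outages: Array<{ closedAt: number; reopenedAt: number }> = [];
  private lastCloseAt: number | null = null;

  // Call from the connection's close handler.
  onClose(now: number): void {
    if (this.lastCloseAt === null) this.lastCloseAt = now;
  }

  // Call from the open handler; the first open is not a reconnection.
  onOpen(now: number): void {
    if (this.lastCloseAt !== null) {
      this.reconnects += 1;
      this.outages.push({ closedAt: this.lastCloseAt, reopenedAt: now });
      this.lastCloseAt = null;
    }
  }
}
```

In a browser these handlers would typically be wired to the `close` and `open` events of a WebSocket or the `error` and `open` events of an EventSource, with timestamps taken at the moment each event fires.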

[0091] Step C: In response to triggering operation on at least one of the target nodes, determine the network performance metrics based on the request start and end times, the timeout flag, the status code, the number of reconnections, and the heartbeat delay, and store the network performance metrics in the front-end browser cache.

[0092] When a user opens a node details page, edits complex configurations, or prepares to initiate batch operations, a heat value calculation is triggered for the target node or set of target nodes. This calculation determines network performance metrics based on request start and end times, timeout flags, status codes, reconnection counts, and heartbeat latency, and stores these network performance metrics in the front-end browser cache.

[0093] In this embodiment, a complete distributed node health assessment system is constructed through front-end interception, indicator collection, calculation and storage, which provides real-time and accurate data support for blocking high-risk nodes in subsequent steps, and ultimately realizes the active fault tolerance and intelligent degradation operation of the distributed system.

[0094] In some embodiments, step C above, "determining the network performance metrics based on the request start and end times, the timeout flag, the status code, the number of reconnections, and the heartbeat delay," can be achieved through the following steps: Step C1: Within the preset time window, determine the jitter index based on N heartbeat delays, the average of the N heartbeat delays, and the preset maximum heartbeat delay variance, where N is an integer greater than or equal to 1; During implementation, let the delays of the most recent N heartbeats be denoted $l_1, l_2, \ldots, l_N$ and their mean be $\mu$. The mean and variance are calculated from the RTT sequence, and the jitter index $J$ can then be expressed by formula (1) above.

[0095] Step C2: Determine the total number of requests based on the start and end times of the requests within the preset time window; Here, the start and end times of the requests can be used to accurately calculate the time taken for a single request. Dividing the total duration of the preset window by the time taken for a single request gives the total number of requests.

[0096] Step C3: Determine the number of timeout requests within the preset window based on the timeout flag; Here, the system iterates through all recorded requests within a sliding time window. If a request carries a timeout flag, the timeout count $Q_t$ is increased by 1. The total number of requests $Q$ is the number of request records within the time window.

[0097] Step C4: Determine the number of server error requests based on the server error status codes in the preset window; Here, the system checks the status code (such as an HTTP status code or WebSocket close code) recorded for each request within the time window. If the status code indicates a server-side exception (e.g., the HTTP 5xx series), the server error count $Q_e$ is increased by 1.

[0098] Step C5: Determine the timeout rate based on the ratio of the number of timeout requests to the total number of requests; During implementation, let the total number of requests within the time window be $Q$ and the number of timeouts be $Q_t$. The timeout rate $T$ can then be expressed by formula (2) above.

[0099] Step C6: Determine the server error rate based on the ratio of the number of server-side erroneous requests to the total number of requests; During implementation, the server error rate $E$ can be expressed by formula (3) above.

[0100] Step C7: Determine the reconnection penalty index based on the number of reconnections and the preset maximum reconnection threshold.

[0101] During implementation, the reconnection penalty index $R$ can be expressed by formula (4) above.
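Steps C1 to C7 can be sketched as three small functions. Formulas (1) to (4) are defined outside this passage, so the sketch makes its assumptions explicit: the jitter index is taken as the population variance divided by the preset maximum variance and capped at 1, the total $Q$ is counted as the number of records in the window, an empty window yields rates of 0, and the reconnection penalty is $\min(1, C_r / \theta_r)$. The function and field names are hypothetical.

```typescript
interface WindowSample {
  timestamp: number; // ms, when the request completed
  timedOut: boolean;
  serverError: boolean;
}

// Jitter index J: variance of heartbeat delays, normalized by a preset
// maximum variance and capped at 1 (the normalization form is an assumption).
function jitterIndex(delaysMs: number[], maxVariance: number): number {
  const n = delaysMs.length;
  if (n === 0 || maxVariance <= 0) return 0;
  const mean = delaysMs.reduce((sum, l) => sum + l, 0) / n;
  const variance =
    delaysMs.reduce((sum, l) => sum + (l - mean) * (l - mean), 0) / n;
  return Math.min(1, variance / maxVariance);
}

// T = Q_t / Q and E = Q_e / Q over the samples inside the sliding window.
function windowRates(samples: WindowSample[], now: number, windowMs: number) {
  const inWindow = samples.filter(
    (s) => s.timestamp > now - windowMs && s.timestamp <= now,
  );
  const q = inWindow.length;
  const qt = inWindow.filter((s) => s.timedOut).length;
  const qe = inWindow.filter((s) => s.serverError).length;
  return {
    total: q,
    timeoutRate: q === 0 ? 0 : qt / q,
    serverErrorRate: q === 0 ? 0 : qe / q,
  };
}

// R = min(1, C_r / theta_r).
function reconnectPenalty(reconnects: number, maxReconnects: number): number {
  if (maxReconnects <= 0) return reconnects > 0 ? 1 : 0;
  return Math.min(1, reconnects / maxReconnects);
}
```

For example, heartbeat delays of 40, 60, 50, and 50 ms have a mean of 50 and a variance of 50; with an assumed maximum variance of 200 the jitter index is 0.25.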

[0102] In this embodiment, by determining multiple network performance metrics (jitter, timeout rate, server error rate, reconnection penalty, etc.), the network performance and health status of distributed communication nodes can be comprehensively evaluated from different perspectives. These metrics complement each other, more accurately reflecting the actual operating status of the nodes. This provides accurate data for subsequently determining the health heatmap value of target nodes and identifying high-risk nodes. Based on these precise network performance metrics, it is possible to more scientifically decide whether to block operations on certain nodes, thereby improving the stability and reliability of the distributed system and reducing problems such as service interruption and data loss caused by node failures.

[0103] In some embodiments, this application also provides a method for obtaining the percentage of disconnection time, which can be achieved through the following steps: Step D: Use the aforementioned front-end network request interception mechanism to monitor the front-end long connection data transmission channel and obtain the disconnection time and reconnection success time. During implementation, the WebSocket constructor or EventSource listener can be overridden to capture the timestamps of connection disconnection (onclose event) and successful reconnection (onopen event) to obtain the time of disconnection and successful reconnection.

[0104] In some embodiments, the PerformanceTiming API or performance.now() can be used to ensure that the timestamp accuracy is at the millisecond level, avoiding calculation deviations caused by system time errors.

[0105] Step E: In response to triggering operation on at least one of the target nodes, determine the duration of a single disconnection within the preset window based on the disconnection time and the reconnection success time. Here, within a preset time window (e.g., 10 minutes), the disconnection-reconnection event pairs of the target node are traversed. For example, the duration of a single disconnection can be calculated using the following formula (5): $d = t_2 - t_1$ (5), where $t_1$ is the disconnection time and $t_2$ is the successful reconnection time.

[0106] During implementation, a sliding window mechanism is used to filter out expired data (such as event pairs that are more than 10 minutes old) to ensure that only the disconnection duration within the current window is calculated.

[0107] Step F: Determine the percentage of disconnection time based on the duration of a single disconnection and the total duration of the preset window, so as to use the percentage of disconnection time as a network performance indicator; Let the total duration of the preset window be $T_{total}$ (e.g., 600 seconds) and the duration of a single disconnection be $d$. The percentage of disconnection duration is then the ratio of the cumulative disconnection duration (the sum of the single disconnection durations $d$ within the window) to $T_{total}$. For example, if the cumulative disconnection duration within the window is 60 seconds, the percentage of disconnection duration is 60 / 600 = 0.1.
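Steps D to F can be sketched as a single function over recorded disconnect/reconnect pairs. The names `Outage` and `disconnectionRatio` are hypothetical, and clipping an outage that straddles the window boundary (rather than discarding the whole pair) is an assumption of this sketch.

```typescript
interface Outage {
  closedAt: number;   // t1, ms
  reopenedAt: number; // t2, ms
}

// Returns the share of the window during which the connection was down.
function disconnectionRatio(
  outages: Outage[],
  now: number,
  windowMs: number,
): number {
  if (windowMs <= 0) return 0;
  const windowStart = now - windowMs;
  let totalDown = 0;
  for (const o of outages) {
    // Clip each outage to the window so expired portions are not counted.
    const start = Math.max(o.closedAt, windowStart);
    const end = Math.min(o.reopenedAt, now);
    if (end > start) totalDown += end - start; // d = t2 - t1, formula (5)
  }
  return totalDown / windowMs;
}

// Two outages of 20 s and 40 s inside a 600 s window.
const exampleRatio = disconnectionRatio(
  [
    { closedAt: 100000, reopenedAt: 120000 },
    { closedAt: 300000, reopenedAt: 340000 },
  ],
  600000,
  600000,
);
console.log(exampleRatio);
```

The example reproduces the figure in the text: 60 seconds of cumulative disconnection in a 600-second window gives a ratio of 0.1.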

[0108] Correspondingly, step S230 above, "determining the health heat value of each target node based on the jitter index, the timeout rate, the server error rate, and the reconnection penalty index using preset weights," can be implemented through the following process: The health heat value of each target node is determined by using preset weights based on the jitter index, the timeout rate, the server error rate, the reconnection penalty index, and the proportion of disconnection duration.

[0109] During implementation, a weighting factor for the percentage of disconnection time can be added to the formula for calculating the health heat value to quantify the impact of the percentage of disconnection time on node health. For example, in real-time audio and video scenarios, the weighting factor for the percentage of disconnection time can be set relatively high (e.g., 0.3), because the duration of disconnection time directly affects the user experience.

[0110] In this embodiment, the introduction of the disconnection duration percentage not only enhances the accuracy of node health assessment, but also improves user experience and system reliability through dynamic weight configuration.

[0111] In some embodiments, step S240 above, "identifying the target node whose health thermal value is less than the thermal threshold as a high-risk node, so as to block operations against the high-risk node," can be achieved through the following steps: Step 241: Identify target nodes whose health thermal values are less than the thermal threshold as high-risk nodes; During implementation, the health heat value H is compared with a preset threshold using JavaScript on the front end. If the health heat value is less than the heat threshold, the node is marked as a high-risk node and stored in the "High-Risk Node List" object store in IndexedDB.

[0112] Step 242: If the proportion of high-risk nodes exceeds the proportion threshold, block operations on all target nodes. Here, blocking all target nodes only once the share of high-risk nodes exceeds a threshold prevents excessive system degradation caused by misjudging a small number of nodes. For example, if the threshold is expressed as a node count of 5, blocking of all target nodes is triggered only when more than 5 nodes are high-risk, which prevents system oscillation caused by fluctuations in individual nodes.

[0113] During implementation, if the proportion of high-risk nodes exceeds the system tolerance (e.g., 30%), all requests from target nodes will be blocked to avoid systemic risks. If the proportion of high-risk nodes does not exceed the threshold, only the operation of that high-risk node will be blocked, while other target nodes will operate normally.

[0114] Step 243: The output includes at least one of the following first prompt messages: remove high-risk nodes, perform operations in batches, and retry the operation later.
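Steps 241 to 243 can be sketched as a single decision function. The names `ScoredNode`, `BlockDecision`, and `decideBlocking` are hypothetical, and the thresholds in the usage note are example values rather than values fixed by this application.

```typescript
interface ScoredNode {
  nodeId: string;
  heat: number; // health heat value H
}

interface BlockDecision {
  blockAll: boolean;
  blocked: string[];  // node ids whose operation is blocked
  allowed: string[];  // node ids that may proceed
  highRiskRatio: number;
  prompts: string[];  // first prompt messages for the user
}

function decideBlocking(
  nodes: ScoredNode[],
  heatThreshold: number,
  ratioThreshold: number,
): BlockDecision {
  // Step 241: nodes below the heat threshold are high-risk.
  const highRisk = nodes.filter((n) => n.heat < heatThreshold).map((n) => n.nodeId);
  const healthy = nodes.filter((n) => n.heat >= heatThreshold).map((n) => n.nodeId);
  const ratio = nodes.length === 0 ? 0 : highRisk.length / nodes.length;

  // Step 243: prompts are only produced when something is blocked.
  const prompts =
    highRisk.length === 0
      ? []
      : ["Remove high-risk nodes", "Perform the operation in batches", "Retry later"];

  // Step 242: block everything once the high-risk share exceeds the threshold.
  if (ratio > ratioThreshold) {
    return {
      blockAll: true,
      blocked: nodes.map((n) => n.nodeId),
      allowed: [],
      highRiskRatio: ratio,
      prompts,
    };
  }
  return { blockAll: false, blocked: highRisk, allowed: healthy, highRiskRatio: ratio, prompts };
}
```

With an assumed heat threshold of 60 and a proportion threshold of 0.3, two high-risk nodes out of ten block only those two, while four out of ten block the whole batch.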

[0115] In some embodiments, a list of high-risk nodes can be displayed on the front-end interface, prompting the user to manually remove or automatically isolate them. For example, an axios interceptor can check the node status before sending a request; if a node is high-risk, it can be directly intercepted and a message "This node is unavailable" can be displayed.

[0116] In some embodiments, when there are many high-risk nodes, it is recommended to process requests in batches. For example, large requests can be split into multiple smaller requests and sent to different nodes in batches to reduce the load on a single node.

[0117] In some embodiments, for non-urgent operations, users are advised to retry later. For example, based on the percentage of disconnection time, if the percentage of disconnection time for a node is greater than 10%, the message "Network is unstable, please try again later" will be displayed.

[0118] In some embodiments, the prompt information can be displayed intuitively through front-end UI components (such as pop-ups and Toast notifications) and operation guidance can be provided (such as "one-click retry" and "switch node").

[0119] In this embodiment, high-risk nodes are accurately identified and isolated through dual screening using health heat value thresholds and quantity thresholds, avoiding service degradation caused by a "one-size-fits-all" approach. Flexible strategies such as phased execution and later retries are employed to maximize business continuity while blocking high-risk nodes.

[0120] In some embodiments, this application also provides a method for batch operation of healthy nodes, which can be implemented through the following steps: Step S250: Determine the target nodes whose health thermal value is greater than or equal to the thermal threshold as healthy nodes; Step S260: In response to the instruction to perform batch operations on the healthy nodes, perform batch operations on all the healthy nodes.

[0121] In this embodiment, the entire process management from accurate screening of healthy nodes to efficient batch operations is realized, and the efficient utilization of resources is achieved through the healthy node batch operation engine.

[0122] Figure 3 is a sequence diagram of pre-warning and interception of batch operations provided in the embodiments of this application. As shown in Figure 3, the execution entities in this sequence diagram include the administrator, the front-end console, the heat prediction engine, the local link cache, and the back-end / gateway cluster. The interaction flow based on these execution entities is as follows: 1. Administrator initiates operation: The administrator selects multiple nodes and clicks "batch distribution" in the front-end console. The arrow points from the administrator to the front-end console, indicating that this operation is initiated.

[0123] 2. The front-end console requests the set of target nodes to be evaluated: The front-end console sends a "request to evaluate the target node set" to the heat prediction engine, with the arrow pointing from the front-end console to the heat prediction engine.

[0124] 3. Thermal prediction engine data processing: The heat prediction engine requests "read the most recent window heartbeat / timeout / reset data" from the local link cache.

[0125] The thermal prediction engine retrieves "return node link time series samples" from the local link cache.

[0126] The thermal prediction engine calculates the node thermal values ​​and the proportion of high-risk nodes based on the information obtained above.

[0127] The heat prediction engine sends "return warning results and suggested strategies" to the front-end console.

[0128] 4. Condition Judgment and Operation Selection: There is a conditional check that "the proportion of high-risk nodes exceeds the threshold". If the condition is met, the front-end console will "block the original commit event".

[0129] The front-end console can generate prompt messages to remind administrators to "delete high-risk nodes or execute in batches".

[0130] Based on the above prompts, administrators can send the command "Select to execute only healthy node batches" to the front-end console.

[0131] 5. Request execution: The front-end console receives the instruction and sends a batch request for healthy nodes to the back-end / gateway cluster.

[0132] The backend / gateway cluster "returns the execution result" to the frontend console.

[0133] Risk acceptable level: Another condition is that "the risk is within an acceptable range." If this condition is met: The front-end console can "directly send batch requests" to the back-end / gateway cluster.

[0134] The backend / gateway cluster "returns the execution result" to the frontend console.

[0135] Figure 3 demonstrates the interaction flow and condition judgment logic between various components when an administrator performs a batch distribution operation of nodes in the system.

[0136] Figure 4 is a schematic diagram illustrating the implementation process of heat value calculation and draft protection in an embodiment of this application. As shown in Figure 4, the flowchart includes: S401: Open the complex configuration page; The user opens the complex configuration page, which starts the entire process.

[0137] S402: Identify the target node; The front-end console identifies the target node that the user wants to operate on.

[0138] S403: Read local link window data; The thermal prediction engine reads relevant data from the local link window to prepare for subsequent calculations.

[0139] S404: Calculate the nodal thermal value H; The thermal prediction engine calculates the thermal value H of the target node based on the read data. The thermal value can reflect the node's load, activity frequency, or other important indicators.

[0140] S405: Determine the interval where H is located; Determine the range of the heat value H and take different measures according to different ranges.

[0141] S406: Normal form rendering (the normal process continues in the branch where H>=80); If the heat value H is greater than or equal to 80 (that is, H is not in a warning range), the form is rendered normally and the user can continue to operate.

[0142] S407: Display a red environmental warning (H<60); If the heat value H is less than 60, a red environmental warning is displayed, indicating to the user that the node is in a high-load or high-risk state.

[0143] S408: Display a yellow environmental warning (60<=H<80); If the heat value H is at least 60 and less than 80, a yellow environmental warning is displayed, indicating to the user that the node is in a medium-load or at-risk state.

[0144] S409: Enable automatic saving of high-frequency drafts (after branches with 60<=H<80 and H<60). When the heat value H is less than 80, in addition to displaying a warning (red or yellow), the high-frequency draft auto-save function can also be enabled to prevent data loss.

[0145] In some embodiments, when a user edits a complex form on a node with an orange or red heat value, the system can automatically increase the frequency of saving drafts and write each draft to a local draft record bound to the node identifier, page type, and timestamp. If the formal submission fails, the system retains the current draft version and prompts the user to restore and resubmit in a non-blocking manner after the heat value recovers. By linking "Network Recovery Detection" with "Draft Recovery Entry," the cost for users to repeatedly enter complex configurations can be reduced.

[0146] S410: The user clicks submit; After the user completes the operation in the form and clicks the submit button, the front-end console begins to process the request.

[0147] S411: Determine if the request timed out / failed; If the request is determined not to have timed out or failed, proceed to step S412; if the request is determined to have timed out or failed, proceed to step S413.

[0148] S412: Clear the draft and indicate that saving was successful; If the request does not time out and does not fail, the draft is cleared and the user is notified that the save was successful.

[0149] S413: Keep the local draft and indicate that a fallback has been provided; If the request times out or fails, the local draft is retained, and the user is notified that a fallback has been taken.

[0150] S414: Background monitoring of heat value recovery; If the request times out or fails, monitor whether the heat value has returned to the healthy range.

[0151] S415: Determine whether H has continuously recovered to the healthy range; Determine whether the heat value H has continuously recovered to the healthy range, i.e.

[0152] S416: Prompt for one-click recovery and resubmission (H continuously recovers to the healthy range H>=80) If the heat value H continuously returns to the healthy range, the user can be prompted to restore with one click and resubmit the request.

[0153] Figure 4 describes the complete process from when a user opens the configuration page to request processing and subsequent heat value monitoring, ensuring that corresponding feedback and safeguards are provided in different states.
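The interval judgment of S405 to S409 and the recovery check of S415 can be sketched as two small functions. The thresholds 60 and 80 come from the flowchart; the draft-saving interval, the number of consecutive samples that counts as "continuously recovered", and the names `formPolicy` and `hasSustainedRecovery` are assumptions of this sketch.

```typescript
type HeatLevel = "normal" | "yellow" | "red";

interface FormPolicy {
  level: HeatLevel;
  showWarning: boolean;
  draftIntervalMs: number | null; // null: high-frequency auto-save not enabled
}

// Hypothetical high-frequency draft interval; the text does not fix a value.
const HIGH_FREQUENCY_DRAFT_MS = 5000;

// S405-S409: choose rendering and draft behaviour from the heat value H.
function formPolicy(h: number): FormPolicy {
  if (h >= 80) {
    return { level: "normal", showWarning: false, draftIntervalMs: null };
  }
  if (h >= 60) {
    return { level: "yellow", showWarning: true, draftIntervalMs: HIGH_FREQUENCY_DRAFT_MS };
  }
  return { level: "red", showWarning: true, draftIntervalMs: HIGH_FREQUENCY_DRAFT_MS };
}

// S415: treat recovery as sustained only if the most recent `requiredSamples`
// heat values are all in the healthy range (the sample count is an assumption).
function hasSustainedRecovery(
  recentHeat: number[],
  healthyThreshold: number,
  requiredSamples: number,
): boolean {
  if (requiredSamples <= 0 || recentHeat.length < requiredSamples) return false;
  return recentHeat.slice(-requiredSamples).every((h) => h >= healthyThreshold);
}
```

Requiring several consecutive healthy samples before prompting one-click recovery avoids prompting on a single momentary spike in the heat value.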

[0154] This application provides a solution for implementing a heat prediction and early warning mechanism based on front-end historical link data caching in a scenario of batch upgrades of cross-regional voice gateways for large retail chain enterprises, as follows: Scenario Description: A large retail enterprise has deployed a large number of voice gateways in multiple countries and regions. The headquarters administrator needs to perform a batch firmware upgrade on approximately 100 gateways in one region through a unified web console. Due to cross-carrier and cross-regional access, some nodes, although still showing as online in the device list, are experiencing significant link jitter.

[0155] The implementation steps are as follows: Step 1: Visualizing latent faults; After the administrator logs into the console, the front end calculates the heat value for each store gateway based on the heartbeat RTT, request timeout, and reconnection data in the most recent time window, and displays the corresponding color status on a list or map.

[0156] Step 2: Pre-emptive interception and batch-based recommendations; The administrator selects 100 nodes in the target area and clicks the upgrade button. The front-end heat prediction engine first evaluates the selected node set and finds that several nodes are in a high-risk state. It then blocks the direct submission and prompts the administrator to prioritize the removal of high-risk nodes or to execute the upgrade in batches according to their health status.

[0157] Step 3: Prioritize executing healthy batches; When the administrator selects the "intelligent batching" strategy, the system will immediately upgrade healthy nodes in the first batch, while transferring high-risk nodes to the observation or retry queue.

[0158] Step 4: Draft protection as a fallback; The administrator then opened the advanced routing configuration page of one of the high-risk nodes. The system automatically displayed an environment warning and enabled local draft saving. If the final submission failed due to link timeout, the front end indicated that the draft had been safely saved locally. When the node subsequently recovered to a healthy heat value range, the system prompted the administrator to restore and resubmit the draft with one click.

[0159] This application also provides a multi-administrator front-end status sharing implementation method: in the scenario where multiple administrators are online at the same time, each browser can share node heat information through WebRTC or an equivalent data channel, so that administrators who have not yet directly accessed a certain node can also obtain early warning information in advance.
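Independently of the transport, each browser needs a rule for combining its own observations with those received from peers. The sketch below assumes one simple rule, keeping the most recent report per node; the report shape, the name `mergeHeatReports`, and the rule itself are illustrative assumptions, and the WebRTC data channel plumbing is deliberately left out.

```typescript
interface HeatReport {
  nodeId: string;
  heat: number;
  observedAt: number; // ms timestamp of the observation
  reporter: string;   // hypothetical identifier of the reporting browser
}

// Merges local and peer reports, keeping the newest report for each node.
function mergeHeatReports(
  local: HeatReport[],
  remote: HeatReport[],
): Map<string, HeatReport> {
  const merged = new Map<string, HeatReport>();
  for (const report of local.concat(remote)) {
    const current = merged.get(report.nodeId);
    if (current === undefined || report.observedAt > current.observedAt) {
      merged.set(report.nodeId, report);
    }
  }
  return merged;
}
```

Under this rule an administrator who has never contacted a node still sees a peer's recent warning for it, while stale peer data never overrides a fresher local observation.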

[0160] This application also provides an implementation method for dynamically switching weights based on operation type: for different operations such as firmware upgrade, large file distribution, configuration saving and read-only inspection, different thermal calculation weights and interception thresholds can be switched to match the sensitivity of different operations to bandwidth, latency and stability.

[0161] This application also provides an implementation method that combines intelligent diagnostic suggestions: In a system with background analysis capabilities, a summary of local thermal time series characteristics can be uploaded to the intelligent analysis module, which can then combine historical work orders, regional topology, or time patterns to provide the administrator with a more suitable operation window or processing suggestions.

[0162] This application also provides an offline retry queue implementation method: for operations that fail to submit but are idempotent, the request digest can be added to the local retry queue after user confirmation, and the user is automatically prompted to re-execute the operation after the node heat value recovers.
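A minimal in-memory sketch of such a queue is shown below. The class name `LocalRetryQueue`, the entry shape, and the in-memory storage are assumptions; a browser implementation would more likely persist entries in IndexedDB so that they survive a page reload.

```typescript
interface RetryEntry {
  nodeId: string;
  digest: string;      // summary of the failed request
  idempotent: boolean; // only idempotent operations may be queued
}

class LocalRetryQueue {
  private entries: RetryEntry[] = [];

  // Queues the entry only if it is idempotent and the user confirmed.
  // Returns true when the entry was accepted.
  enqueue(entry: RetryEntry, userConfirmed: boolean): boolean {
    if (!entry.idempotent || !userConfirmed) return false;
    this.entries.push(entry);
    return true;
  }

  // Removes and returns the entries for a node once its heat value has
  // recovered, so that the UI can prompt the user to re-execute them.
  takeReady(nodeId: string, heat: number, healthyThreshold: number): RetryEntry[] {
    if (heat < healthyThreshold) return [];
    const ready = this.entries.filter((e) => e.nodeId === nodeId);
    this.entries = this.entries.filter((e) => e.nodeId !== nodeId);
    return ready;
  }

  size(): number {
    return this.entries.length;
  }
}
```

Entries stay queued while the node remains below the healthy threshold, and `takeReady` hands them back exactly once, which keeps the re-execution prompt from firing repeatedly for the same request.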

[0163] This application provides a distributed communication node latent fault prediction device based on front-end browser caching. Referring to Figure 5, the device 500 includes: The first determining module 510 is used to determine at least one target node to be operated on in a distributed communication node cluster using the front-end browser. The acquisition module 520 is used to acquire network performance indicators of each target node within a preset time window from the front-end browser cache. The network performance indicators include at least jitter indicators, timeout rate, server error rate, and reconnection penalty indicators. The reconnection penalty indicators are determined based on the number of reconnections and a preset maximum reconnection number threshold. The second determining module 530 is used to determine the health heat value of each target node based on the jitter index, the timeout rate, the server error rate and the reconnection penalty index using preset weights. The third determination module 540 is used to determine the target node whose health thermal value is less than the thermal threshold as a high-risk node, so as to block the operation against the high-risk node.

[0164] Figure 6 is a schematic diagram of the structure of a computer device provided in an embodiment of this application. As shown in Figure 6, the computer device 600 includes: a memory 601, a processor 602, and a computer program 603 stored in the memory 601 and executable on the processor 602. When the processor 602 executes the computer program 603, the computer device can execute any of the aforementioned methods for predicting latent faults in distributed communication nodes based on front-end browser caching.

[0165] Furthermore, this application also covers a control device, which may include a memory and a processor. The memory stores executable program code, and the processor is used to call and execute the executable program code to perform the method for predicting latent faults in distributed communication nodes based on front-end browser caching provided in this application. The control device can be divided into functional modules based on the above method examples. For example, each module can correspond to a specific function, or two or more functions can be integrated into one processing module. The integrated module can be implemented in hardware. It should be noted that the module division in this application is illustrative and only represents a logical functional division; other division methods may exist in actual implementation. For all relevant content of each step in the above method embodiments, refer to the functional description of the corresponding functional module; it is not repeated here. The control device provided in this application is used to execute the above-mentioned method for predicting latent faults in distributed communication nodes based on front-end browser caching, and can therefore achieve the same effect as the method described above. When integrated units are used, the control device may include a processing module and a storage module. When the control device is applied to a block device, the processing module can be used to control and manage the actions of the block device, and the storage module can be used to support the block device in executing stored program code, etc. The processing module can be a processor or a controller, which can implement or execute the various exemplary logic blocks, modules, and circuits described in conjunction with the disclosure of this application. The processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, or a combination of a digital signal processor (DSP) and a microprocessor. The storage module can be a memory.

[0166] Furthermore, the control device provided in the embodiments of this application may specifically be a chip, component, or module. The chip may include a connected processor and a memory. The memory stores instructions, and when the processor calls and executes the instructions, the chip can execute the distributed communication node latent fault prediction method based on front-end browser caching provided in the above embodiments. The embodiments of this application also provide a computer-readable storage medium storing computer program code. When the computer program code is run on a computer, it causes the computer to execute the aforementioned method steps to implement the distributed communication node latent fault prediction method based on front-end browser caching provided in the above embodiments.

[0167] This application also provides a computer program product. When the computer program product is run on a computer, it causes the computer to perform the above-mentioned steps to implement the method for predicting latent faults in distributed communication nodes based on front-end browser caching provided in the above embodiments. The control device, computer-readable storage medium, computer program product, and chip provided in the embodiments of this application are all used to execute the corresponding methods provided above. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods; they are not repeated here. From the description of the above embodiments, those skilled in the art will understand that the division into the above functional modules is used only as an example, for convenience and brevity of description. In actual applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the control device can be divided into different functional modules to complete all or part of the functions described above. It should be understood that the control device and method disclosed in the embodiments of this application can be implemented in other ways. For example, the control device embodiments described above are merely illustrative: the division into modules or units is only a logical functional division, and other divisions are possible in actual implementation. Multiple units or components can be combined or integrated into another control device, or some features can be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed can be an indirect coupling or communication connection through some interface, control device, or unit, and can be electrical, mechanical, or in another form.

[0168] It should be noted that the order of the embodiments described above is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. The processes depicted in the accompanying drawings do not necessarily require a specific or sequential order to achieve the desired results. In some embodiments, multiple task processing and parallel processing are possible or may be advantageous. The various embodiments in this specification are described in a progressive manner, and the same or similar parts between the various embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. The above content is only a specific implementation of this application, but the protection scope of this application is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the protection scope of this application.

Claims

1. A method for predicting latent faults in distributed communication nodes based on front-end browser caching, characterized in that the method includes: determining, using a front-end browser, at least one target node to be operated on in a distributed communication node cluster; obtaining, from the front-end browser cache, network performance indicators of each target node within a preset time window, wherein the network performance indicators include at least a jitter indicator, a timeout rate, a server error rate, and a reconnection penalty indicator, and the reconnection penalty indicator is determined based on the number of reconnections and a preset maximum reconnection number threshold; determining, using preset weights, a health heat value of each target node based on the jitter indicator, the timeout rate, the server error rate, and the reconnection penalty indicator; and identifying target nodes whose health heat value is less than a heat threshold as high-risk nodes, so as to block operations targeting the high-risk nodes.

2. The method as described in claim 1, characterized in that, before determining at least one target node to be operated on in the distributed communication node cluster using the front-end browser, the method further includes: configuring a front-end network request interception mechanism in the front-end browser; monitoring a front-end long-connection data transmission channel using the front-end network request interception mechanism to obtain request start and end times, a timeout flag, a status code, the number of reconnections, and heartbeat delays; and in response to an operation on at least one of the target nodes being triggered, determining the network performance indicators based on the request start and end times, the timeout flag, the status code, the number of reconnections, and the heartbeat delays, and storing the network performance indicators in the front-end browser cache.

3. The method as described in claim 2, characterized in that determining the network performance indicators based on the request start and end times, the timeout flag, the status code, the number of reconnections, and the heartbeat delays includes: within the preset time window, determining the jitter indicator $J$ based on N heartbeat delays, the average of the N heartbeat delays, and a preset maximum heartbeat delay variance, according to the following expression, where N is an integer greater than or equal to 1: [expression for $J$ not reproduced in this text], in which $l_1, l_2, \ldots, l_N$ denote the N heartbeat delays, the mean term denotes the average of the N heartbeat delays, and $\theta_j$ denotes the preset maximum heartbeat delay variance; determining the total number of requests based on the request start and end times within the preset time window; determining the number of timeout requests based on the timeout flag within the preset time window; determining the number of server error requests based on the server error status codes among the status codes within the preset time window; determining the timeout rate as the ratio of the number of timeout requests to the total number of requests; determining the server error rate as the ratio of the number of server error requests to the total number of requests; and determining the reconnection penalty indicator $R$ based on the number of reconnections and the preset maximum reconnection number threshold, according to the following expression: [expression for $R$ not reproduced in this text], in which $\theta_r$ denotes the preset maximum reconnection number threshold and the remaining term denotes the number of reconnections.

4. The method as described in claim 2, characterized in that the method further includes: monitoring the front-end long-connection data transmission channel using the front-end network request interception mechanism to obtain the disconnection time and the reconnection success time; in response to an operation on at least one of the target nodes being triggered, determining, within the preset time window, the duration of a single disconnection based on the disconnection time and the reconnection success time; and determining a disconnection duration proportion based on the duration of a single disconnection and the total duration of the preset time window, and using the disconnection duration proportion as one of the network performance indicators; correspondingly, determining the health heat value of each target node based on the jitter indicator, the timeout rate, the server error rate, and the reconnection penalty indicator using preset weights includes: determining, using preset weights, the health heat value of each target node based on the jitter indicator, the timeout rate, the server error rate, the reconnection penalty indicator, and the disconnection duration proportion.

5. The method according to any one of claims 1 to 4, characterized in that identifying target nodes whose health heat value is less than the heat threshold as high-risk nodes, so as to block operations targeting the high-risk nodes, includes: identifying target nodes whose health heat value is less than the heat threshold as high-risk nodes; if the proportion of high-risk nodes exceeds a proportion threshold, blocking operations targeting all of the target nodes; and outputting a first prompt message that includes at least one of the following: removing the high-risk nodes, performing the operation in batches, and retrying the operation later.

6. The method as described in claim 5, characterized in that the method further includes: identifying target nodes whose health heat value is greater than or equal to the heat threshold as healthy nodes; and in response to an instruction to perform a batch operation on the healthy nodes, performing the batch operation on all of the healthy nodes.

7. The method as described in claim 5, characterized in that the method further includes: storing the data to be operated on for the high-risk nodes as a draft in the local storage space corresponding to the front-end browser; continuously monitoring the health heat values of the high-risk nodes; and if a health heat value is determined to be greater than or equal to the heat threshold, outputting a second prompt message to remind the user that the draft can be restored and the operation data resubmitted.

8. The method according to any one of claims 1 to 4, characterized in that the method further includes: determining the preset weights and the heat threshold corresponding to the current operation type based on a preset mapping between operation types and preset weights and heat thresholds, wherein the operation types include at least: firmware upgrade, large file distribution, configuration saving, and read-only inspection.

9. The method according to any one of claims 1 to 4, characterized in that the method further includes: displaying the health heat value in multi-level color or animation form in the node list or node map of the front-end browser page.

10. A distributed communication node latent fault prediction device based on front-end browser caching, characterized in that the device includes: a first determining module, used to determine at least one target node to be operated on in a distributed communication node cluster using the front-end browser; an acquisition module, used to acquire the network performance indicators of each target node within a preset time window from the front-end browser cache, wherein the network performance indicators include at least a jitter indicator, a timeout rate, a server error rate, and a reconnection penalty indicator; a second determining module, used to determine the health heat value of each target node based on the jitter indicator, the timeout rate, the server error rate, and the reconnection penalty indicator using preset weights; and a third determining module, used to identify target nodes whose health heat value is less than the heat threshold as high-risk nodes, so as to block operations targeting the high-risk nodes.