Anti-interference Diagnosis Method for Dynamic Networking of GIS Sensors Based on K-Nearest Neighbor Algorithm
By adopting a dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm, the problems of inaccurate link quality assessment and insufficient self-healing ability of GIS wireless sensor networks in strong electromagnetic interference environments are solved, and high-reliability and low-cost adaptive networking of communication is realized.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GLOBAL SCI & TECH (SHANGHAI) CO LTD
- Filing Date
- 2026-04-29
- Publication Date
- 2026-06-30
AI Technical Summary
Existing GIS wireless sensor networks suffer from inaccurate link quality assessment under strong electromagnetic interference and multipath effects, making it difficult to distinguish between environmental interference and node hardware failures, resulting in low communication reliability and an inability to effectively self-heal.
A dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm is adopted. By collecting multi-dimensional communication link feature data, performing Z-score standardization processing, and using a dynamic anti-interference training sample library and an improved K-nearest neighbor classification model, combined with time decay factor and Gaussian kernel function, the link status is determined, triggering adaptive frequency selection networking strategy and hierarchical fault early warning.
It significantly improves the accuracy of link quality assessment and communication stability, can accurately distinguish between environmental interference and hardware failures, realizes dynamic adaptive anti-interference networking, reduces operation and maintenance costs and extends network life cycle.
Figure CN122131216B_ABST
Description
Technical Field
[0001] This invention relates to the field of GIS sensors, and in particular to an anti-interference diagnosis method for dynamic networking of GIS sensors based on the K-nearest neighbor algorithm.
Background Technology
[0002] Gas-insulated switchgear (GIS) is a core component of smart substations, and its operational status directly impacts the safety and stability of the power grid. Therefore, real-time monitoring of GIS conditions such as partial discharge, temperature rise, and mechanical characteristics is crucial. With the development of Internet of Things (IoT) technology, wireless sensor networks (WSNs) are widely used to replace traditional wired monitoring methods due to their flexible wiring and low maintenance costs. However, GIS equipment typically operates under high voltage and high current conditions, and its metal casing creates a complex waveguide environment. Furthermore, switching operations generate extremely strong transient electromagnetic interference (EMI), posing a significant challenge to the communication reliability of wireless sensor networks.
[0003] In the prior art, CN107742922A discloses a GIS system and method with remote self-diagnosis function. This scheme integrates multiple sensors such as current, stroke, and SF6, and uses the DS evidence theory of multi-sensor information fusion to make decisions and diagnose the equipment status. Although this technology effectively solves the problem of the accuracy of fault diagnosis of GIS equipment itself and realizes the comprehensive evaluation of equipment status, it focuses on the application layer fusion of sensor data and the health diagnosis of the equipment itself. It lacks targeted solutions for the communication link stability, network reliability and fault diagnosis of the underlying wireless sensor network in the strong interference environment, and is difficult to deal with the problem of wireless signal transmission interruption in the complex electromagnetic environment of GIS.
[0004] To address the challenges of wireless networking, existing technology CN101909345B discloses a multi-hop dynamic self-organizing network method for wide-area sensor networks. This method generates a multi-hop routing tree by broadcasting network-type data packets and utilizing a step-by-step scheduling algorithm and link quality estimation (based on packet reception rate (PRR) and received signal strength (RSS)). This achieves node self-organization and energy-efficient operation in wide-area environments. However, the GIS environment differs significantly from the wide-area spatial environment. The multipath effect within the metal cavity of a GIS is extremely severe. Simply relying on a linear combination of RSS and PRR often fails to accurately reflect link quality, easily leading to "false strong signals." Furthermore, this method lacks an intelligent mechanism for identifying interference types. When faced with sudden, strong electromagnetic interference unique to GIS, it cannot distinguish between communication congestion, environmental interference, or hardware failure, resulting in delayed or ineffective route switching and affecting the real-time uploading of monitoring data.
[0005] Furthermore, for anti-interference communication, existing technology CN121356715A discloses an intelligent spectrum prediction system and method with dynamic anti-interference function. This scheme uses a lightweight AI prediction model (such as GRU or LSTM) to predict the probability of spectrum holes and combines it with an interference identification and classification module to achieve adaptive modulation and channel decision-making. Although this technology introduces artificial intelligence for interference prediction and spectrum management, improving anti-interference capability, its model is relatively complex, requires high computing resources for embedded nodes, and mainly focuses on spectrum resource allocation and modulation mode switching, without fully considering the topological characteristics of GIS sensor networks. In GIS monitoring scenarios, node locations are relatively fixed, but link status fluctuates drastically with the environment. Existing spectrum prediction technologies are difficult to directly apply to dynamic route reorganization based on link feature classification, and lack a joint diagnostic mechanism that associates communication link characteristics with node hardware status.
[0006] In summary, existing technologies in the field of GIS wireless monitoring mainly suffer from insufficient anti-interference networking capabilities, a single dimension for link quality assessment, and an inability to effectively distinguish between environmental interference and hardware failures. Consequently, they struggle to achieve highly reliable data transmission and network self-healing while ensuring low power consumption.
Summary of the Invention
[0007] The main objective of this invention is to provide a dynamic networking anti-interference diagnosis method for GIS sensors based on the K-nearest neighbor algorithm. It addresses the technical problems of existing GIS wireless sensor networks in environments with strong electromagnetic interference and multipath effects: inaccurate link quality assessment and a lack of interference identification capability lead to low communication reliability, an inability to distinguish environmental interference from node hardware failures, and poor routing self-healing.
[0008] To solve the above-mentioned technical problems, the technical solution adopted by this invention is: a method for anti-interference diagnosis of dynamic networking of GIS sensors based on the K-nearest neighbor algorithm, the method comprising:
[0009] S1. Within a time sliding window with a set size and sliding step, collect raw data of multidimensional communication link characteristics between target sensor nodes and neighboring nodes within the GIS monitoring area, and perform Z-score standardization on the raw data of multidimensional communication link characteristics to map the data to a unified dimension space.
[0010] S2. Call the dynamic anti-interference training sample library pre-stored in the local storage unit. The dynamic anti-interference training sample library contains historical feature vector data with clear environmental state labels and has an online incremental learning and forgetting mechanism.
[0011] S3. Input the standardized multidimensional communication link feature raw data into the improved K-nearest neighbor classification model. By introducing the weighted Euclidean distance calculation formula with time decay factor, calculate the weighted distance between the data and the samples in the dynamic anti-interference training sample library. Use an adaptive strategy based on sample density to select K nearest neighbor samples, and use Gaussian kernel function weighted voting to determine the current communication link state category.
[0012] S4. Based on the determined communication link status category, trigger the corresponding adaptive frequency selection networking strategy or hierarchical fault early warning operation, achieving anti-interference diagnosis and adaptive adjustment of GIS sensor dynamic networking.
[0013] In the preferred scheme, the raw data of multidimensional communication link characteristics collected in S1 includes parameters in 5 dimensions: mean value of physical layer received signal strength, variance of physical layer received signal strength, volatility of link quality indicator, bit error rate of forward error correction coding, and channel idle assessment count.
[0014] The specific steps of Z-score standardization in S1 are as follows: each dimension of feature data $x$ is converted using the formula $x' = (x - \mu)/\sigma$, where $\mu$ is the mean of that dimension's feature within the time sliding window and $\sigma$ is the standard deviation of that dimension's feature; if $\sigma = 0$, the original data remains unchanged. The size of the time sliding window is set to 5 to 10 seconds, and the sliding step is set to 2 to 3 seconds.
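The windowed standardization described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of the population standard deviation, and the reading that data is left unchanged when the standard deviation is zero are assumptions.

```python
import math

def zscore_window(values):
    """Standardize one feature dimension over a single sliding window."""
    n = len(values)
    if n == 0:
        return []
    mean = sum(values) / n
    # Population standard deviation (assumption: the text does not say
    # whether the population or the sample form is intended).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        # Degenerate window: keep the original data unchanged.
        return list(values)
    return [(v - mean) / std for v in values]

def sliding_windows(samples, window_s=5.0, step_s=2.0):
    """Yield the values of each window over time-ordered (timestamp, value) pairs."""
    if not samples:
        return
    start, end = samples[0][0], samples[-1][0]
    while start + window_s <= end + 1e-9:
        yield [v for t, v in samples if start <= t < start + window_s]
        start += step_s
```

The defaults of 5 s and 2 s sit at the lower end of the ranges given in the text (5 to 10 s window, 2 to 3 s step).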
[0015] In the preferred scheme, the improved K-nearest neighbor classification model in S3 executes the following algorithm logic when calculating the weighted distance:
[0016] Define the current feature vector as $X$, the feature vector of the $i$-th sample in the sample library as $S_i$, the sampling timestamp of sample $S_i$ as $t_i$, and the current timestamp as $t_{now}$;
[0017] First, calculate the time decay weight [formula missing], where $\lambda$ is the time decay coefficient, with a value ranging from 0.01 to 0.1;
[0018] Then, calculate the weighted Euclidean distance $d(X, S_i)$ with the time decay factor introduced [formula missing], where $w_j$ is the weight of the $j$-th feature dimension and $\sum_j w_j = 1$;
[0019] The weights $w_j$ are allocated as follows: mean physical layer received signal strength 0.25, variance of physical layer received signal strength 0.2, link quality indicator volatility 0.2, forward error correction coding bit error rate 0.2, and channel idle assessment count 0.15.
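The exact expressions for the time decay weight and the distance are not legible in this text, so the sketch below substitutes one plausible form and should be read as an assumption throughout: an exponential decay weight, and a weighted Euclidean distance divided by that weight so that older samples appear farther away. Only the five feature weights and the range of the decay coefficient come from the text; the function names and the default coefficient of 0.05 are hypothetical, and the time unit is unspecified.

```python
import math

# Feature order and weights as listed in the text (they sum to 1):
# RSSI mean, RSSI variance, LQI volatility, FEC bit error rate, CCA count.
FEATURE_WEIGHTS = (0.25, 0.20, 0.20, 0.20, 0.15)

def time_decay_weight(t_now, t_sample, lam=0.05):
    """Assumed exponential form: 1 for a fresh sample, tending to 0 with age."""
    return math.exp(-lam * (t_now - t_sample))

def weighted_distance(x, s, t_now, t_sample, lam=0.05, weights=FEATURE_WEIGHTS):
    """Weighted Euclidean distance, inflated for old samples (assumed combination)."""
    base = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, s)))
    # Dividing by the decay weight penalizes stale samples, so they are less
    # likely to be selected as nearest neighbours.
    return base / time_decay_weight(t_now, t_sample, lam)
```

With this form, a larger decay coefficient increases the penalty on old samples, which matches the qualitative behaviour the description attributes to the coefficient.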
[0020] In the preferred scheme, the selection number K of the K nearest neighbor samples in S3 adopts an adaptive selection strategy:
[0021] Calculate the sample density $\rho$ within a preset radius $r$ of the sample to be classified in the sample space [formula missing], where $N_r$ is the number of samples within that radius;
[0022] Set a high density threshold $\rho_{high}$ and a low density threshold $\rho_{low}$;
[0023] When [condition missing], the formula for calculating the K value is: [formula missing], where [definition missing];
[0024] When [condition missing], the formula for calculating the K value is: [formula missing], where [definition missing];
[0025] When [condition missing], the formula for calculating the K value is: [formula missing], where the symbol $\lfloor \cdot \rfloor$ indicates rounding down to the nearest integer.
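Because the three K formulas and their conditions are not legible here, the following is only an illustrative stand-in for a density-based choice of K. Every numeric default, the density measure (fraction of library samples inside the radius), and the linear interpolation between a minimum and a maximum K are hypothetical; the only behaviour taken from the document is that sparse regions get a larger K, dense regions a smaller K, and one case rounds down.

```python
import math

def adaptive_k(distances, radius=1.0, rho_low=0.1, rho_high=0.5, k_min=3, k_max=15):
    """Pick K from local sample density (all defaults are hypothetical)."""
    if not distances:
        return 0
    n_r = sum(1 for d in distances if d <= radius)   # samples inside the radius
    rho = n_r / len(distances)                        # assumed density measure
    if rho >= rho_high:
        k = k_min                                     # dense region: fewer neighbours
    elif rho <= rho_low:
        k = k_max                                     # sparse region: more neighbours
    else:
        frac = (rho_high - rho) / (rho_high - rho_low)
        k = math.floor(k_min + frac * (k_max - k_min))  # round down in between
    return min(k, len(distances))
```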
[0026] In the preferred scheme, the weighted voting based on the Gaussian kernel function in S3 includes:
[0027] For the selected K nearest neighbor samples, the Gaussian kernel function formula $v_i = \exp\left(-d_i^2 / (2h^2)\right)$ is used to calculate the voting weight $v_i$ of the $i$-th nearest neighbor sample, where $d_i$ is its weighted distance and the Gaussian kernel bandwidth $h$ is set to 1.0;
[0028] Calculate the cumulative weight of each category $W_c = \sum_{i \in N_c} v_i$, where $N_c$ is the set of nearest neighbor samples belonging to the $c$-th category;
[0029] If the cumulative weight $W_c$ of a certain category is greater than 1.2 times the cumulative weight of all other categories, that category is determined to be the final communication link status category; otherwise, 5 more samples are added and the calculation is repeated until the threshold is met.
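The voting rule can be sketched as below, using the standard Gaussian kernel with the stated bandwidth of 1.0, the 1.2 margin, and the 5-sample extension. Two points are assumptions rather than something the text specifies: the margin is compared against the combined weight of all other categories, and the loop returns no decision once the library is exhausted instead of repeating indefinitely. The function name and return convention are hypothetical.

```python
import math
from collections import defaultdict

def gaussian_vote(neighbours, k, bandwidth=1.0, margin=1.2, step=5):
    """neighbours: list of (distance, label) sorted by ascending distance.

    Returns (label, k_used), or (None, k_used) if the margin is never met.
    """
    if not neighbours:
        return None, 0
    k = min(k, len(neighbours))
    while True:
        totals = defaultdict(float)
        for dist, label in neighbours[:k]:
            totals[label] += math.exp(-dist ** 2 / (2 * bandwidth ** 2))
        best = max(totals, key=totals.get)
        rest = sum(w for lab, w in totals.items() if lab != best)
        if totals[best] > margin * rest:
            return best, k
        if k >= len(neighbours):
            return None, k            # no more samples to add: stop without a decision
        k = min(k + step, len(neighbours))
```

A caller would typically treat a `None` result as "undetermined" and wait for the next sliding window.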
[0030] In the preferred scheme, the initial construction and maintenance mechanism of the dynamic anti-interference training sample library in S2 includes:
[0031] Initial construction process: Collect link feature data continuously for 7 to 15 days within the GIS monitoring area, process it through Z-score standardization, manually label the environmental status, and remove outliers that are more than 3 times the standard deviation.
[0032] The environmental status labels include five categories: label 1 for normal communication, label 2 for narrowband interference in a specific frequency band, label 3 for GIS switch action impact interference, label 4 for persistent multipath fading and blocking, and label 5 for RF front-end hardware saturation.
[0033] Online incremental learning and forgetting mechanism: when the category determined in S3 is confirmed to be correct, the new sample is stored in the library; when storage reaches its limit, samples are removed according to their timestamp and citation frequency. The citation frequency is the number of times a sample has been selected as a nearest neighbor sample across determinations; this count is cleared and recalibrated at the end of each month.
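A bounded sample library with this kind of incremental learning and forgetting might look like the sketch below. The text says removal depends on timestamp and citation frequency but not how the two are combined, so the eviction order used here (least cited first, oldest first on ties) is an assumption, as are the class and method names.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    features: tuple
    label: int          # environmental status label, 1 to 5
    timestamp: float
    citations: int = 0  # times chosen as a nearest neighbour

class SampleLibrary:
    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = []

    def add_confirmed(self, sample):
        """Store a sample whose classification was confirmed correct."""
        if len(self.samples) >= self.capacity:
            self._evict_one()
        self.samples.append(sample)

    def _evict_one(self):
        # Assumed policy: drop the least-cited sample, oldest first on ties.
        victim = min(self.samples, key=lambda s: (s.citations, s.timestamp))
        self.samples.remove(victim)

    def cite(self, sample):
        """Record that a sample was selected as a nearest neighbour."""
        sample.citations += 1

    def monthly_reset(self):
        """Clear citation counts at the end of each month."""
        for s in self.samples:
            s.citations = 0
```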
[0034] In the preferred scheme, the adaptive frequency selection networking strategy triggered in S4 includes:
[0035] When the communication link status category is determined to be label 2, i.e., narrowband interference in a specific frequency band, the current wireless channel number will be added to the temporary blacklist.
[0036] The radio frequency driver is invoked to switch to a backup channel not in the temporary blacklist according to a pseudo-random frequency hopping sequence.
[0037] The temporary blacklist has an automatic removal mechanism. If a channel that has been added to the list does not detect narrowband interference again within 5 consecutive time sliding windows, it will be automatically removed from the temporary blacklist.
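The blacklist behaviour described above can be sketched as a small state holder. The 5-window release rule comes from the text; the class and method names are hypothetical, the seeded `random.Random` is only a stand-in for whatever pseudo-random hopping sequence the radio driver uses, and the channel numbers in the usage note are illustrative.

```python
import random

class ChannelBlacklist:
    def __init__(self, channels, clean_windows_to_release=5, seed=0):
        self.channels = list(channels)
        self.release_after = clean_windows_to_release
        self.clean_counts = {}          # blacklisted channel -> consecutive clean windows
        self.rng = random.Random(seed)  # stand-in for the hopping sequence

    def report_interference(self, channel):
        """Blacklist a channel, or restart its clean-window counter."""
        self.clean_counts[channel] = 0

    def end_of_window(self, interfered=()):
        """Call once per sliding window with the channels where interference was seen."""
        for ch in list(self.clean_counts):
            if ch in interfered:
                self.clean_counts[ch] = 0
            else:
                self.clean_counts[ch] += 1
                if self.clean_counts[ch] >= self.release_after:
                    del self.clean_counts[ch]   # automatic removal

    def next_channel(self):
        """Pick a backup channel that is not on the temporary blacklist."""
        candidates = [c for c in self.channels if c not in self.clean_counts]
        if not candidates:
            raise RuntimeError("all channels are blacklisted")
        return self.rng.choice(candidates)
```

For example, `ChannelBlacklist(range(11, 27))` would model sixteen numbered channels; after `report_interference(15)`, channel 15 is skipped until five consecutive clean windows have passed.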
[0038] In the preferred scheme, the operation to trigger a graded fault early warning in S4 includes:
[0039] If the communication link status category is determined to be label 5, i.e., the RF front-end hardware is saturated, it is defined as a level 1 fault. A fault code is immediately generated and an emergency maintenance request is sent through the out-of-band low-power Bluetooth channel.
[0040] If the communication link status category is determined to be label 4, i.e., persistent multipath fading and blocking, it is defined as a level 2 fault. The backup routing table is activated, and the link recovery status is checked every 3 time sliding windows.
[0041] If the communication link status category is determined to be label 3, i.e., GIS switch action impact interference, it is defined as a level 3 transient event. Only the log is recorded and no alarm is triggered. However, if the event is detected in 3 consecutive sliding windows, it is upgraded to a level 2 fault.
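The graded warning rules can be read as a small state machine driven by one classification per sliding window. The sketch below follows the three levels and the 3-window escalation stated above; the action strings and class names are hypothetical, and treating an escalated event with the same action as an ordinary level 2 fault is an assumption.

```python
from enum import IntEnum

class Label(IntEnum):
    NORMAL = 1
    NARROWBAND = 2
    SWITCH_IMPACT = 3
    MULTIPATH = 4
    RF_SATURATION = 5

class FaultGrader:
    def __init__(self, escalate_after=3):
        self.escalate_after = escalate_after
        self.switch_streak = 0   # consecutive windows classified as label 3

    def handle(self, label):
        """Return (level, action) for one sliding-window classification."""
        if label == Label.SWITCH_IMPACT:
            self.switch_streak += 1
            if self.switch_streak >= self.escalate_after:
                return 2, "activate_backup_route"   # escalated transient event
            return 3, "log_only"
        self.switch_streak = 0
        if label == Label.RF_SATURATION:
            return 1, "send_ble_maintenance_request"
        if label == Label.MULTIPATH:
            return 2, "activate_backup_route"
        if label == Label.NARROWBAND:
            return None, "adaptive_frequency_selection"
        return None, "none"
```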
[0042] In the preferred scheme, the step of collecting raw data on multidimensional communication link characteristics involves modifications to the RPL routing protocol stack:
[0043] The last 8 bytes of the DAO and DIO control messages in the RPL protocol are selected as reserved fields;
[0044] The 5-dimensional feature data after Z-score standardization are written into the reserved field in the following order: mean physical layer received signal strength, variance physical layer received signal strength, link quality indicator volatility, forward error correction coding bit error rate, and channel idle assessment count.
[0045] Each feature dimension occupies 1.6 bytes, and the receiving node extracts feature data strictly according to this byte order when parsing the message.
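Note that 1.6 bytes is 12.8 bits, which is not a whole number of bits, so an implementation has to choose a concrete field width. The sketch below assumes 12 bits per feature (60 bits of data plus 4 padding bits in the 8-byte field) and a signed fixed-point encoding with 8 fractional bits; neither choice is stated in the text, and the function names are hypothetical.

```python
def pack_features(zscores, frac_bits=8):
    """Pack five z-scores into 8 bytes as signed 12-bit fixed-point values."""
    if len(zscores) != 5:
        raise ValueError("expected exactly 5 features")
    word = 0
    for z in zscores:
        q = round(z * (1 << frac_bits))
        q = max(-2048, min(2047, q))        # clamp to the signed 12-bit range
        word = (word << 12) | (q & 0xFFF)   # two's complement, first feature in the top bits
    return (word << 4).to_bytes(8, "big")   # 60 data bits followed by 4 zero padding bits

def unpack_features(payload, frac_bits=8):
    """Recover the five values in the same order they were packed."""
    word = int.from_bytes(payload, "big") >> 4
    out = []
    for shift in range(48, -1, -12):
        q = (word >> shift) & 0xFFF
        if q >= 0x800:
            q -= 0x1000                     # sign-extend the 12-bit value
        out.append(q / (1 << frac_bits))
    return out
```

With 8 fractional bits the representable range is roughly -8 to +8 standard deviations at a resolution of 1/256, and values outside that range are clamped.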
[0046] In the preferred embodiment, the method runs on an embedded microcontroller, and the software deployment steps include:
[0047] Run the FreeRTOS real-time operating system on the embedded microcontroller and create a high-priority link diagnostic task;
[0048] The link diagnostic task subscribes to the wireless driver's receive interrupts via a message queue;
[0049] When the wireless driver triggers a receive interrupt, it packages the physical layer status word and writes it into the message queue, waking up the link diagnostic task to perform classification calculations in S3.
[0050] In the preferred embodiment, the hardware-level data acquisition steps include:
[0051] Access the RSSI and LQI registers of the wireless RF chip via the SPI bus, read them N times consecutively, and calculate the average value.
[0052] Access the automatic gain control (AGC) status register of the RF chip to obtain the front-end gain level, and combine it with the RSSI mean for analysis to distinguish between weak signal and increased noise floor.
[0053] In the preferred embodiment, the method also includes a containerized deployment step at the edge computing gateway:
[0054] Install the Docker container engine on the edge computing gateway and pull an image containing a Python environment and the Scikit-learn library;
[0055] A global monitoring model is run within a Docker container, and diagnostic results from each sensor node are received via the MQTT protocol.
[0056] The hyperparameters of the K-nearest neighbor algorithm are periodically optimized using global data, and the optimized hyperparameters are then sent to the sensor nodes via OTA (Over-The-Air) upgrade technology.
[0057] In the preferred embodiment, the data storage and visualization steps include:
[0058] Deploy an InfluxDB time-series database on the edge computing gateway to store link characteristic data with timestamps;
[0059] Grafana is used to connect to InfluxDB to render a network topology heatmap of the GIS monitoring area, displaying the dynamic trajectory of the disturbed area.
[0060] In the preferred scheme, to address the waveguide effect of the GIS metal cavity, the weighted distance calculation formula in S3 adds a sixth dimension, "multipath delay spread":
[0061] Multipath delay spread characteristics are obtained by measuring the full width at half maximum (FWHM) of the correlation peak of the preamble in the wireless signal;
[0062] When calculating the weighted distance, a weight of 0.15 is assigned to the multipath delay spread feature. Meanwhile, the weights of the original 5-dimensional features are each reduced by 0.03 to keep the total weight sum at 1.
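As a check on the arithmetic, reducing each of the five original weights by 0.03 frees 0.15 of the total, which is what the sixth dimension must receive for the weights to sum to 1. The symbol $w_6$ for the multipath delay spread weight is introduced here only for illustration.

```latex
\begin{align*}
(0.25-0.03) + 3\,(0.20-0.03) + (0.15-0.03) + w_6 &= 1 \\
0.22 + 0.51 + 0.12 + w_6 &= 1 \\
w_6 &= 1 - 0.85 = 0.15
\end{align*}
```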
[0063] In the preferred scheme, the adaptive frequency selection networking strategy incorporates an energy balancing algorithm:
[0064] Let the remaining battery voltage of the candidate next-hop node be $V_i$ and the average remaining battery voltage of all network nodes be $\bar{V}$;
[0065] When [condition missing], a correction factor $\alpha$ is introduced and the voting weight of this node is adjusted to [formula missing];
[0066] When [condition missing], the correction factor $\alpha$ is [value missing] and the weights remain unchanged; network energy holes are prevented by reducing the probability of low-power nodes being selected as relays.
[0067] This invention provides a method for anti-interference diagnosis of dynamic networking of GIS sensors based on the K-nearest neighbor algorithm, which has the following advantages compared with the prior art:
[0068] This invention introduces an improved K-Nearest Neighbors (KNN) algorithm for multi-dimensional feature classification of communication links, significantly improving the accuracy of link quality assessment. Unlike traditional assessment methods that rely solely on signal strength or packet loss rate, this method integrates multi-dimensional features such as signal-to-noise ratio, link quality indication, and energy state, and combines them with a historical interference sample database. This effectively identifies multipath reflections and sudden interference characteristics in GIS metal cavity environments, thereby intelligently avoiding inferior links and significantly improving the stability and reliability of data transmission.
[0069] This invention possesses intelligent fault diagnosis and differentiation capabilities, effectively reducing operation and maintenance costs. The system utilizes algorithms to classify link characteristics, accurately distinguishing between temporary blockages caused by environmental electromagnetic interference and permanent failures caused by sensor node hardware malfunctions or logical deadlocks. This differentiated diagnostic mechanism avoids false alarms caused by environmental interference while ensuring that genuine hardware faults are detected and located promptly, significantly improving the system's intelligent operation and maintenance level.
[0070] This invention achieves dynamic adaptive anti-interference networking, ensuring communication continuity. Based on an adaptive frequency selection networking strategy and energy balancing mechanism triggered by classification results, the sensor network can quickly switch channels or reorganize routing paths when encountering strong interference. Furthermore, the selection of relay nodes balances link quality and remaining energy, avoiding the generation of "energy holes" in the network. While ensuring anti-interference performance, it effectively extends the lifespan of the entire wireless sensor network.
Attached Figure Description
[0071] The present invention will be further described below with reference to the accompanying drawings and embodiments:
[0072] Figure 1 is the overall flowchart of the GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm of this invention;
[0073] Figure 2 is a diagram of the hardware and software deployment and edge collaboration architecture of the system of this invention.
Detailed Implementation
[0074] Example 1
[0075] As shown in Figures 1-2, a method for anti-interference diagnosis of dynamic networking of GIS sensors based on the K-nearest neighbor algorithm includes:
[0076] S1. Within a time sliding window with a set size and sliding step, collect raw data of multidimensional communication link characteristics between target sensor nodes and neighboring nodes within the GIS monitoring area, and perform Z-score standardization on the raw data of multidimensional communication link characteristics to map the data to a unified dimension space.
[0077] S2. Call the dynamic anti-interference training sample library pre-stored in the local storage unit. The dynamic anti-interference training sample library contains historical feature vector data with clear environmental state labels and has an online incremental learning and forgetting mechanism.
[0078] S3. Input the standardized multidimensional communication link feature raw data into the improved K-nearest neighbor classification model. By introducing the weighted Euclidean distance calculation formula with time decay factor, calculate the weighted distance between the data and the samples in the dynamic anti-interference training sample library. Use an adaptive strategy based on sample density to select K nearest neighbor samples, and use Gaussian kernel function weighted voting to determine the current communication link state category.
[0079] S4. Based on the determined communication link status category, trigger the corresponding adaptive frequency selection networking strategy or hierarchical fault early warning operation, achieving anti-interference diagnosis and adaptive adjustment of GIS sensor dynamic networking.
[0080] This embodiment provides a dynamic networking anti-interference diagnosis method for GIS sensors based on the K-nearest neighbor algorithm. This method first introduces a time-sliding window mechanism during the data acquisition phase. Specifically, the system sets a fixed time length as the window size and an update frequency as the sliding step size. Within the GIS monitoring area, the target sensor node continuously listens for and records the communication interaction data between itself and its neighboring nodes. This data is not a single-dimensional numerical value, but rather constitutes multi-dimensional communication link characteristic raw data, including, but not limited to, physical layer and link layer parameters such as received signal strength indication, link quality indication, signal-to-noise ratio, and data packet arrival rate. To eliminate differences in the dimensions and orders of magnitude of different dimensional feature data and to avoid features with larger numerical values dominating subsequent distance calculations, this embodiment performs Z-score normalization on the acquired raw data. This process is calculated according to the following formula:
[0081] $x' = \dfrac{x - \mu}{\sigma}$
[0082] In this formula, $x'$ represents the standardized feature value, $x$ represents the original collected feature data, $\mu$ represents the arithmetic mean of the feature data within the current time sliding window, and $\sigma$ represents the standard deviation of the feature data within the current time sliding window. Through this process, feature data of all dimensions are mapped to a unified dimensionless space with a mean of 0 and a standard deviation of 1, so that no single feature dominates the subsequent distance-based classification.
[0083] After data preprocessing, the system enters the sample retrieval and database maintenance phase. This method pre-configures a dynamic anti-interference training sample database in the local storage unit. Unlike traditional static databases, this sample database not only stores historical feature vector data but also associates each sample with a specific environmental state label, such as normal communication state, narrowband interference state, pulse interference state, or equipment failure state. To adapt to the complex and time-varying electromagnetic environment within GIS, the sample database employs an online incremental learning and forgetting mechanism. Online incremental learning refers to adding samples that are encountered and confirmed as new types of interference or typical states during actual operation to the database to enrich the training set. The forgetting mechanism involves the system periodically checking the timestamps or usage frequency of samples, removing outdated samples that have exceeded a set threshold or have not been matched for a long time from the database. This mechanism ensures that the sample database remains lightweight and reflects the latest environmental characteristics, preventing storage overflow and improving retrieval efficiency.
[0084] Subsequently, this embodiment employs an improved K-nearest neighbor classification model to accurately determine the link status. The standardized multidimensional communication link feature raw data is used as the input vector to be classified into the model. When calculating the distance between the vector to be classified and samples in the sample library, this method abandons the traditional Euclidean distance and instead adopts a weighted Euclidean distance calculation formula that incorporates a time decay factor. This improvement aims to assign higher weights to recent samples because in time-varying systems, historical data closer to the current moment has greater reference value. The specific distance calculation formula is as follows:
[0085] [formula missing];
[0086] In this formula, $d(X, S_i)$ represents the weighted distance between the current feature vector to be classified $X$ and sample $S_i$ in the sample library. $t_{now}$ represents the current moment and $t_i$ represents the time at which sample $S_i$ was collected; the difference between the two reflects the timeliness of the sample. $\lambda$ is the time decay coefficient, used to adjust how strongly the time dimension influences the distance: the larger the value, the greater the penalty for old samples. $n$ is the total number of feature dimensions and $j$ indexes the $j$-th feature dimension. $w_j$ is the weight coefficient of the $j$-th feature dimension, used to distinguish the contribution of different physical features to link status determination. $x_j$ and $s_{ij}$ are the values of the current vector and of sample $S_i$ in the $j$-th dimension.
[0087] After calculating the distance, the algorithm needs to select K nearest neighbor samples. This embodiment does not use a fixed value for K, but rather an adaptive strategy based on sample density. The system calculates the sample distribution density around the sample to be classified, automatically increasing the K value in sparsely distributed areas to obtain sufficient reference information, and automatically decreasing the K value in densely distributed areas to reduce noise interference. After selecting K neighbors, the system performs weighted voting using a Gaussian kernel function, rather than a simple majority vote. The Gaussian kernel function maps distance to voting weights, with closer samples having greater weight. The Gaussian kernel weighting formula is as follows:
[0088] $v_i = \exp\left(-\dfrac{d_i^2}{2h^2}\right)$;
[0089] In this formula, $v_i$ represents the voting weight of the $i$-th nearest neighbor sample, $d_i$ is the weighted distance calculated above, and $h$ is the bandwidth parameter of the Gaussian kernel function, used to control the rate at which the weights decay with increasing distance. The system accumulates the weights of the samples from each class, and the category with the highest total weight is determined as the current state category of the communication link.
[0090] Finally, the system executes decision-making operations based on the determined communication link status category. If the determination result indicates that the current link is subject to interference at a specific frequency, the system will trigger an adaptive frequency selection networking strategy, automatically controlling the radio frequency module to switch to a preset backup clean channel and re-establishing the network topology connection, thereby avoiding the interference frequency band. If the determination result indicates that the link anomaly is caused by a hardware failure of the sensor node itself or a depleted battery, a tiered fault warning operation is triggered, sending a specific fault code to the control center to instruct maintenance personnel to perform repairs. Through the above steps, this method achieves closed-loop anti-interference diagnosis and self-healing of GIS sensor networks in complex electromagnetic environments.
[0091] The beneficial effects of this invention are as follows: First, by using Z-score standardization and multi-dimensional feature fusion, the differences between features of different dimensions are eliminated, making the link quality assessment more comprehensive and objective. Second, by introducing the distance formula of the time decay factor and the dynamic forgetting mechanism of the sample library, the algorithm becomes time-sensitive, adaptable to the rapidly changing electromagnetic environment inside GIS, and improves the response speed to sudden interference. Third, the adaptive K-value selection based on sample density and the Gaussian kernel weighted voting strategy effectively solves the classification bias problem caused by uneven sample distribution, significantly improving the accuracy of fault diagnosis and interference identification. Fourth, by directly linking the diagnostic results to the specific actions of frequency selection networking or fault early warning, an automated closed loop from perception, diagnosis to execution is realized, greatly improving the reliability and maintenance-free capability of the GIS wireless monitoring system, and solving the technical problem of frequent communication interruptions and difficulty in self-healing under strong interference environments in existing technologies.
[0092] Example 2
[0093] To further illustrate with reference to Example 1, the raw data of multidimensional communication link characteristics collected in S1 includes parameters in 5 dimensions: mean physical layer received signal strength, variance of physical layer received signal strength, link quality indicator volatility, forward error correction coding bit error rate, and channel idle assessment count.
[0094] The specific steps of Z-score normalization in S1 are as follows: for each dimension of feature data $x$, the conversion is performed using the formula $x' = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean of this dimension's features within the sliding time window and $\sigma$ is the standard deviation of this dimension's features; if $\sigma = 0$, the original data remains unchanged. The size of the time sliding window is set to 5 to 10 seconds, and the sliding step size is set to 2 to 3 seconds.
[0095] In step S1 of this embodiment, in order to comprehensively and accurately characterize the communication link status in the complex electromagnetic environment of GIS, the system does not randomly select parameters when collecting raw data of multi-dimensional communication link characteristics. Instead, it specifically selects five highly complementary physical layer and link layer parameters. These five parameters include the mean physical layer received signal strength, the variance of the physical layer received signal strength, the volatility of the link quality indicator, the bit error rate of forward error correction coding, and the channel idle assessment count. Among them, the mean physical layer received signal strength is mainly used to characterize the basic signal energy level of the link, reflecting the distance attenuation between nodes; the variance of the physical layer received signal strength is a key indicator for measuring signal stability, and high variance usually means that there is a severe multipath effect or dynamic obstacle blockage; the volatility of the link quality indicator reflects the dynamic changes in the data demodulation success rate; the bit error rate of forward error correction coding is an indicator that intuitively reflects the impact of the current channel interference level on data bit transmission; the channel idle assessment count focuses on reflecting the degree of spectrum congestion, that is, the frequency of the channel being occupied by other signals per unit time, which is particularly crucial for identifying external co-channel interference.
[0096] The parameters in each dimension of the multidimensional raw data collected above differ significantly in physical units, value ranges, and orders of magnitude. For example, received signal strength is typically a negative decibel-milliwatt value, while the bit error rate is a decimal between 0 and 1. If such data were input directly into the K-nearest neighbor model for distance calculation, features with larger values would mask the effect of features with smaller values. Therefore, this embodiment introduces a Z-score normalization mechanism in step S1 to eliminate the influence of differing units and scales. This processing is applied to each dimension of feature data, and every value is converted using the following standard score formula: $x' = \frac{x - \mu}{\sigma}$;
[0097] In this formula, $x'$ represents the standardized feature data value, $x$ represents the original feature data value collected, $\mu$ represents the arithmetic mean of the feature data in this dimension within the currently defined sliding time window, and $\sigma$ represents the standard deviation of the feature data in this dimension within the same window. Through this formula, the feature data of every dimension are rescaled to a mean of 0 and a standard deviation of 1, thus ensuring the fairness of the Euclidean distance calculation. Furthermore, in some extremely stable ideal communication environments, the feature data of a certain dimension may remain constant within the window period, so that the standard deviation is $\sigma = 0$. The denominator of the above formula is then zero, leading to a calculation error. Therefore, this embodiment includes a protection logic: when the calculated standard deviation is $\sigma = 0$, the original data is either left unchanged or mapped to 0, ensuring the robustness of the algorithm.
[0098] To balance the responsiveness to environmental changes with the stability of statistical features, this embodiment finely divides the data acquisition timeline and employs a time-sliding window mechanism. The size of this sliding window is set to 5 to 10 seconds. This range is chosen because if the window is too short, insufficient data sample size leads to significant random errors in statistical features such as the mean and variance; if the window is too long, it smooths out short-term, sudden interference features, causing a lag in the system's perception of environmental changes. Simultaneously, the sliding step size is set to 2 to 3 seconds, meaning that every 2 to 3 seconds, the system updates the statistical features within the window using the latest acquired data. This overlapping sliding mechanism ensures the continuity of link status monitoring and avoids missing critical interference events due to window jumps.
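The standardization and windowing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the 1e-12 tolerance, the choice of the population standard deviation, and the sample readings are all assumptions, and the zero-deviation guard uses the "map to 0" option.

```python
import math
from typing import List, Sequence


def zscore_window(values: Sequence[float]) -> List[float]:
    """Standardize one feature dimension over a single sliding window."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (assumption: the text does not say
    # whether the population or the sample form is intended).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std < 1e-12:
        # Protection logic for a constant window: map to 0 (the text also
        # allows keeping the raw values unchanged).
        return [0.0] * n
    return [(v - mean) / std for v in values]


def sliding_windows(times: Sequence[float], values: Sequence[float],
                    window_s: float = 10.0, step_s: float = 2.0) -> List[List[float]]:
    """Split timestamped readings into overlapping windows [start, start + window_s)."""
    windows: List[List[float]] = []
    if not times:
        return windows
    start = times[0]
    while start + window_s <= times[-1] + 1e-9:
        windows.append([v for t, v in zip(times, values)
                        if start <= t < start + window_s])
        start += step_s
    return windows


# Hypothetical RSSI readings in dBm, one per second for 20 seconds.
times = [float(t) for t in range(20)]
rssi = [-70.0 - (t % 3) for t in range(20)]
normalized = [zscore_window(w) for w in sliding_windows(times, rssi)]
```

With a 10-second window and a 2-second step, consecutive windows share 8 seconds of data, which is the overlap the paragraph relies on to avoid missing short interference events.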
[0099] The beneficial effects of this embodiment are as follows: First, by constructing a five-dimensional feature space including the mean and variance of received signal strength, bit error rate, and channel idle count, this method breaks through the limitations of traditional link evaluation that relies solely on a single signal strength index. It can holistically reconstruct the link status from multiple perspectives, such as signal energy, signal stability, demodulation quality, bit error rate, and spectrum occupancy, significantly improving the distinguishability of complex interference types within GIS, namely narrowband interference, impulse interference, and multipath fading. Second, the Z-score standardization effectively solves the problem of fusion of multi-source heterogeneous data, preventing classifier weight bias caused by different units of measurement and ensuring the accuracy of distance calculation in the K-nearest neighbor algorithm. Third, the specific time sliding window and step size settings find the best balance between statistical significance and system real-time performance, enabling the system to filter out occasional noise fluctuations while keenly capturing the starting point of environmental interference, providing high-quality data support for subsequent dynamic networking and anti-interference decisions.
[0100] In the preferred scheme, the improved K-nearest neighbor classification model in S3 executes the following algorithm logic when calculating the weighted distance:
[0101] Define the current feature vector as $X$, the feature vector of the $i$-th sample in the sample library as $X_i$, the sampling timestamp of sample $i$ as $t_i$, and the current timestamp as $t_{now}$;
[0102] First, calculate the time decay weight $w_i = e^{-\lambda (t_{now} - t_i)}$, where $\lambda$ is the time decay coefficient, with a value ranging from 0.01 to 0.1;
[0103] Then, calculate the weighted Euclidean distance with the time decay factor introduced, $d_i = w_i \sqrt{\sum_{j=1}^{5} \omega_j (x_j - x_{i,j})^2}$, where $\omega_j$ is the weight of the $j$-th feature dimension and $\sum_{j=1}^{5} \omega_j = 1$;
[0104] The weights $\omega_j$ are allocated as follows: mean physical layer received signal strength 0.25, variance of physical layer received signal strength 0.2, link quality indicator volatility 0.2, forward error correction coding bit error rate 0.2, and channel idle assessment count 0.15.
[0105] The following is a detailed explanation and description of the algorithm logic for calculating the weighted distance in the improved K-nearest neighbor classification model in step S3 above.
[0106] In step S3 of this embodiment, the improved K-nearest neighbor classification model determines the communication link status through specific algorithmic logic. This process integrates time-dimensional features with multi-dimensional physical layer spatial features. To quantify the correlation between the current network status and historical experience data, the system first defines the necessary mathematical variables. The currently collected and standardized communication link feature vector is denoted $X$; this vector represents the real-time communication status of sensor nodes within the GIS monitoring area at the current moment. The feature vector of the $i$-th sample stored in the dynamic anti-interference training sample library is denoted $X_i$; this historical sample records the environmental state characteristics at a particular moment in the past. To incorporate the time dimension, the system reads the sampling timestamp $t_i$ recorded when sample $i$ was collected and stored in the library, and the current timestamp $t_{now}$ at which the system executes the current diagnostic task.
[0107] In the first stage of distance calculation, the algorithm introduces a time decay mechanism to address the problem that traditional static K-nearest neighbor algorithms cannot adapt to the time-varying electromagnetic environment within GIS. The system calculates the time decay weights based on an exponential decay model. The calculation formula is as follows:
[0108] $w_i = e^{-\lambda (t_{now} - t_i)}$;
[0109] In this formula, the base of the exponential function is the natural constant $e$, which ensures that the weight changes smoothly and non-linearly as the time difference $t_{now} - t_i$ increases. The parameter $\lambda$ is defined as the time decay coefficient, and its value is limited to between 0.01 and 0.1. This coefficient is a key hyperparameter used to adjust the rate at which the model forgets historical data. When $\lambda$ takes a larger value, the time decay weight decreases rapidly with increasing time difference, meaning the model relies more on recent sample data and is more sensitive to environmental changes; conversely, when $\lambda$ takes a smaller value, the model retains historical memory over a longer period, which helps improve the robustness of judgments when the environment is relatively stable. Through this calculation, each historical sample obtains a time weight value that is positively correlated with its "freshness," thus establishing at the algorithm level the criterion that recent data takes precedence over older data.
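The effect of the decay coefficient can be checked numerically with a short sketch. The function name and the assumption that timestamps are expressed in seconds are illustrative; the text specifies only the exponential form and the 0.01 to 0.1 range.

```python
import math


def time_decay_weight(t_now: float, t_sample: float, lam: float = 0.05) -> float:
    """Exponential time decay weight exp(-lam * (t_now - t_sample))."""
    if not 0.01 <= lam <= 0.1:
        raise ValueError("decay coefficient is expected in [0.01, 0.1]")
    age = t_now - t_sample  # assumed to be in seconds
    if age < 0:
        raise ValueError("sample timestamp lies in the future")
    return math.exp(-lam * age)


# Compare how fast a 60-second-old sample is forgotten at both ends of the range.
slow = time_decay_weight(60.0, 0.0, lam=0.01)  # exp(-0.6)
fast = time_decay_weight(60.0, 0.0, lam=0.1)   # exp(-6.0)
```

At the lower end of the range a one-minute-old sample still keeps more than half of its weight, while at the upper end it is almost entirely discounted, which matches the sensitivity trade-off described above.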
[0110] In the second stage of distance calculation, the system performs a weighted Euclidean distance calculation incorporating a time decay factor. This step not only measures the geometric distance in the feature space but also integrates the weight of the time dimension, forming a comprehensive spatiotemporal fusion metric. The specific calculation formula is as follows:
[0111] $d_i = w_i \sqrt{\sum_{j=1}^{5} \omega_j (x_j - x_{i,j})^2}$;
[0112] In this formula, $d_i$ represents the combined distance metric after incorporating the effect of time. The part within the square root is the standard weighted Euclidean distance, where $x_j$ represents the value of the current feature vector $X$ in the $j$-th dimension and $x_{i,j}$ represents the value of the historical sample vector $X_i$ in the $j$-th dimension. $\omega_j$ is the weight coefficient of the $j$-th feature dimension, used to distinguish the importance of different physical layer parameters in the interference recognition task. The summation symbol $\sum$ denotes summation over all five feature dimensions, and the feature weights must sum to 1, i.e. $\sum_{j=1}^{5} \omega_j = 1$, to ensure the normalization property of the distance calculation. The formula multiplies the time decay weight $w_i$ by the spatial feature distance, which enables dynamic adjustment of the influence of historical samples, so that the role of a sample in the classification decision depends not only on its feature similarity but also on the time at which it was generated.
[0113] The specific allocation of the feature weights $\omega_j$ in this embodiment is optimized based on the signal transmission characteristics in a GIS environment to maximize the accuracy of fault identification. The specific allocation scheme is as follows:
[0114] The average received signal strength at the physical layer is assigned a weight of 0.25. This parameter directly reflects the signal's energy level and is a fundamental indicator for judging link connectivity, therefore it is given the highest weight.
[0115] The variance of the received signal strength at the physical layer is assigned a weight of 0.2. This parameter characterizes the stability of the signal and plays an important indicative role in identifying multipath effects and dynamic obstacle occlusion within GIS metal cavities.
[0116] The link quality indicator volatility is assigned a weight of 0.2. This parameter reflects the stability of the data demodulation process and can effectively distinguish between steady-state interference and transient impulse interference.
[0117] The forward error correction coding bit error rate is assigned a weight of 0.2. This parameter is a direct indicator of data transmission reliability and is crucial for determining whether bit flips are caused by strong electromagnetic interference.
[0118] The channel idle assessment count is assigned a weight of 0.15. This parameter reflects the degree of congestion in the frequency band and is used to help identify the presence of co-channel interference sources.
[0119] By differentiating the weights mentioned above, the algorithm can focus on feature dimensions that are more discriminative of interference types, thereby achieving accurate clustering of link states in a multidimensional space.
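The weight allocation listed above can be exercised with a minimal sketch. The original formula is not reproduced in the text, so the way the time decay weight is combined with the spatial distance (a plain product, following the wording of the description) is an assumption, as are the function names and the example vectors.

```python
import math
from typing import Sequence

# Feature order: RSSI mean, RSSI variance, LQI volatility, FEC bit error rate,
# channel idle assessment count (weights as listed in the embodiment).
FEATURE_WEIGHTS = (0.25, 0.20, 0.20, 0.20, 0.15)


def weighted_distance(x: Sequence[float], x_i: Sequence[float],
                      weights: Sequence[float] = FEATURE_WEIGHTS) -> float:
    """Weighted Euclidean distance between two standardized feature vectors."""
    if not (len(x) == len(x_i) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, x_i)))


def time_weighted_distance(x: Sequence[float], x_i: Sequence[float],
                           t_now: float, t_i: float, lam: float = 0.05) -> float:
    """Spatial distance multiplied by the time decay weight (assumed form)."""
    decay = math.exp(-lam * (t_now - t_i))
    return decay * weighted_distance(x, x_i)


current = (1.0, 0.0, 0.0, 0.0, 0.0)
stored = (0.0, 0.0, 0.0, 0.0, 0.0)
d_fresh = time_weighted_distance(current, stored, t_now=100.0, t_i=100.0)
d_aged = time_weighted_distance(current, stored, t_now=100.0, t_i=80.0)
```

A unit difference in the RSSI mean alone gives a spatial distance of sqrt(0.25) = 0.5, whereas the same difference in the channel idle assessment count would give sqrt(0.15), which shows how the weights change the contribution of each dimension.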
[0120] The algorithm logic employed in this embodiment has significant beneficial effects. First, by introducing time decay weights based on an exponential function, the model possesses the ability to process time-varying data streams, automatically reducing the interference of outdated samples on current decisions and effectively solving the "concept drift" problem, enabling diagnostic results to keep pace with the dynamic changes in the electromagnetic environment within the GIS in real time. Second, by refining the weight allocation of the five-dimensional feature parameters, the algorithm overcomes the recognition accuracy bottleneck caused by the equal weighting of features in traditional methods, accurately capturing subtle feature differences of different types of interference (such as narrowband interference, impulse interference, and multipath fading), significantly improving the accuracy of fault classification. Finally, this distance calculation method has moderate computational complexity, making it suitable for real-time operation on embedded sensor nodes or edge gateways with limited computing resources, meeting the real-time requirements of online monitoring systems in industrial settings.
[0121] In the preferred scheme, the selection number K of the K nearest neighbor samples in S3 adopts an adaptive selection strategy:
[0122] Calculate the sample density $\rho = \frac{N}{\pi r^2}$ within a preset radius $r$ centered on the sample to be classified in the sample space, where $N$ is the number of samples within that radius;
[0123] Set a high density threshold $\rho_{high}$ and a low density threshold $\rho_{low}$;
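[0124] When $\rho > \rho_{high}$, the K value is calculated by a first formula [formula not reproduced];
[0125] When $\rho < \rho_{low}$, the K value is calculated by a second formula [formula not reproduced];
[0126] When $\rho_{low} \le \rho \le \rho_{high}$, the K value is calculated by a third formula [formula not reproduced], in which the symbol $\lfloor \cdot \rfloor$ indicates rounding down to the nearest integer.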
[0127] The following is a detailed explanation and description of the adaptive selection strategy for the K nearest neighbor samples in step S3 above.
[0128] In step S3 of this embodiment, to address the problem that a fixed K value in the traditional K-nearest neighbor algorithm cannot adequately account for uneven sample distribution, the system employs an adaptive K-value selection strategy based on sample density. Before making a classification decision, the system first evaluates the local data distribution around the sample to be classified in the feature space. Specifically, the system uses the current sample to be classified as the center and a preset radius $r$ to construct a local hypersphere region. Then, the system counts the number of training samples falling within this region, denoted $N$. Based on these statistics, the system calculates the sample density $\rho$ of the region using the following formula: $\rho = \frac{N}{\pi r^2}$;
[0129] In this formula, $\pi$ is the circle constant, $r$ is the preset radius, and $N$ is the total number of samples within that radius. This formula quantifies the density of the feature distribution around the sample to be classified by calculating the number of samples per unit area or unit projected surface. The density value $\rho$ directly reflects how frequently the current communication link characteristics appear in the historical database. A higher density value means that the current state is a typical state that frequently appears in the historical sample library, while a lower density value means that the current state may be a sparse boundary state or an occasional anomalous state.
[0130] Based on the calculated sample density $\rho$, this embodiment sets two key density thresholds to divide different decision intervals, namely the high density threshold $\rho_{high}$ and the low density threshold $\rho_{low}$. In this embodiment, $\rho_{high}$ is set to 0.8 to define the central region where the sample distribution is extremely dense; $\rho_{low}$ is set to 0.3 to define sparse edge regions or noisy regions of the sample distribution. The system compares the sample density $\rho$ with these two thresholds and dynamically adjusts the selection of the K value in three cases to optimize classifier performance. The system presets an upper limit $K_{max}$ of 30 and a lower limit $K_{min}$ of 5 for the K value.
[0131] In the first case, when the calculated sample density $\rho$ is greater than the high density threshold $\rho_{high}$, it indicates that the sample to be classified is located in the core cluster region of a category, surrounded by a large number of densely packed samples of the same category. In this case, to obtain a smoother classification boundary and effectively suppress the interference of individual noise points, the system tends to choose a larger K value. The formula for calculating the K value is as follows:
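The density estimate can be illustrated with a short sketch. It assumes that the density is the neighbor count divided by the area of a circle with the preset radius, as the reference to pi and to "samples per unit area" suggests, and that neighbors are counted with plain Euclidean distance; the text does not state which distance is used for counting, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def local_density(query: Sequence[float],
                  samples: Sequence[Sequence[float]],
                  radius: float) -> Tuple[float, int]:
    """Return (density, count) for the samples within `radius` of `query`."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    # Assumption: plain Euclidean distance decides membership in the region.
    count = sum(1 for s in samples if math.dist(query, s) <= radius)
    return count / (math.pi * radius ** 2), count


library = [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0)]
density, count = local_density((0.0, 0.0), library, radius=1.0)
```

Here two of the three stored samples fall inside the unit radius, so the density is 2 divided by pi, roughly 0.64, which would lie between the two thresholds of the embodiment.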
[0132] ;
[0133] This formula shows that the K value is directly proportional to the density: the higher the density, the larger the K value, subject to the upper limit $K_{max}$. A larger K value allows full use of the rich surrounding sample information and eliminates the impact of minor local fluctuations on classification results through a broad voting mechanism, thereby improving robustness in identifying typical link states.
[0134] In the second case, when the calculated sample density $\rho$ is less than the low density threshold $\rho_{low}$, it indicates that the sample to be classified is located in a sparse region of the feature space, possibly at the edge of a class or at an outlier. In this case, blindly using a large K value can easily introduce distant, unrelated samples into the voting process, leading to classification errors. Therefore, the system chooses a smaller K value, focusing only on the few closest samples. The formula for calculating the K value is as follows:
[0135] ;
[0136] This formula ensures that in low-density regions the K value stays near the lower limit $K_{min}$ and increases only slightly as the density increases. A small K value keeps the classifier highly sensitive to local features, so that fault signals or interference patterns that are not obvious but have unique characteristics are captured accurately instead of being overwhelmed by majority class samples.
[0137] In the third case, when the calculated sample density $\rho$ lies between the low density threshold $\rho_{low}$ and the high density threshold $\rho_{high}$, it indicates that the sample to be classified is in a transitional region of the feature space. To ensure a smooth transition of the K value from the sparse region to the dense region and to avoid instability in the classification results due to abrupt changes in the K value, the system uses linear interpolation to calculate the K value. The specific calculation formula is as follows:
[0138] ;
[0139] In this formula, the symbol $\lfloor \cdot \rfloor$ denotes the floor function, i.e. rounding down to the largest integer not exceeding the value within the brackets. Based on the relative position of the density $\rho$ within the threshold interval $[\rho_{low}, \rho_{high}]$ and on the limits $K_{min}$ and $K_{max}$, the K value is adjusted to balance local sensitivity and noise resistance, thus achieving a dynamic balance in the classification strategy.
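The three-case selection can be sketched as below. The three formulas themselves are not reproduced in the text, so this sketch substitutes one hypothetical simplification: a floored linear interpolation between the lower and upper K limits across the threshold interval, clamped to the lower limit in sparse regions and to the upper limit in dense regions. The thresholds 0.3 and 0.8 and the limits 5 and 30 are the values of the embodiment; everything else is an assumption.

```python
import math

RHO_LOW, RHO_HIGH = 0.3, 0.8   # density thresholds from the embodiment
K_MIN, K_MAX = 5, 30           # K limits from the embodiment


def adaptive_k(rho: float) -> int:
    """Pick K from the local sample density (hypothetical piecewise form)."""
    if rho > RHO_HIGH:
        # Dense core region: a large K smooths out individual noise points.
        # Assumed simplification: use the upper limit directly.
        return K_MAX
    if rho < RHO_LOW:
        # Sparse region: only the few closest samples should vote.
        # Assumed simplification: use the lower limit directly.
        return K_MIN
    # Transitional region: floor of a linear interpolation (assumed form).
    position = (rho - RHO_LOW) / (RHO_HIGH - RHO_LOW)
    return math.floor(K_MIN + position * (K_MAX - K_MIN))


k_values = {rho: adaptive_k(rho) for rho in (0.1, 0.55, 0.9)}
```

Clamping at both ends keeps K continuous at the two thresholds, which is one way to obtain the smooth transition the description asks for; the original formulas may differ in the sparse and dense branches.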
[0140] This embodiment employs the aforementioned adaptive K-value selection strategy, which yields significant benefits. First, this method effectively addresses the uneven sample distribution problem caused by the randomness of environmental interference in GIS sensor networks. In GIS monitoring data, data under normal conditions are often highly clustered, while data from specific faults or interference may be sparse. By adaptively adjusting the K-value, this method utilizes a large K-value in dense areas to improve the model's tolerance to noise, and a small K-value in sparse areas to improve the recall rate of abnormal states, significantly reducing false alarm and false negative rates. Second, the piecewise function calculation method retains the specificity of processing different density regions while ensuring the continuity of model parameters through the design of the transition region, avoiding system oscillations caused by parameter mutations and improving the overall stability of the diagnostic system. Finally, the algorithm has clear logic, controllable computational overhead, and is suitable for efficient operation on embedded devices, meeting the real-time requirements of GIS online monitoring.
[0141] In the preferred scheme, the weighted voting based on the Gaussian kernel function in S3 includes:
[0142] For the selected K nearest neighbor samples, use the Gaussian kernel function formula $v_k = \exp\left(-\frac{d_k^2}{2h^2}\right)$ to calculate the voting weight $v_k$ of the $k$-th nearest neighbor sample, where $d_k$ is its weighted distance and $h$ is the Gaussian kernel bandwidth, set to 1.0;
[0143] Calculate the cumulative weight of each category, $W_c = \sum_{k \in S_c} v_k$, where $S_c$ is the set of nearest neighbor samples belonging to the $c$-th category;
[0144] If the cumulative weight $W_c$ of a certain category is greater than 1.2 times the sum of the cumulative weights of all other categories, that category is determined to be the final communication link status category; if the condition is not satisfied, 5 more samples are added and the calculation is repeated until the threshold is met.
[0145] The following is a detailed explanation and description of the weighted voting strategy based on the Gaussian kernel function in step S3 above.
[0146] In step S3 of this embodiment, after selecting K nearest neighbor samples, to overcome the shortcomings of traditional majority voting methods that only consider the number of samples and ignore the differences in sample similarity, the system introduces a weighted voting mechanism based on the Gaussian kernel function. For each selected nearest neighbor sample, the system uses the Gaussian kernel function to convert its distance with the sample to be classified into a voting weight. The specific calculation formula is as follows:
[0147] $v_k = \exp\left(-\frac{d_k^2}{2h^2}\right)$;
[0148] In this formula, $v_k$ represents the contribution weight of the $k$-th nearest neighbor sample to the classification result, with a value ranging from 0 to 1. $d_k$ is the weighted Euclidean distance calculated in the previous steps, incorporating the time decay factor; the smaller this distance, the higher the similarity between the samples. $h$ is the bandwidth parameter of the Gaussian kernel function, fixed at 1.0 in this embodiment. This parameter determines the rate at which the weights decay with increasing distance. Setting it to 1.0 aims to provide a standardized radial baseline for the normalized feature space, so that samples that are closer in distance obtain high weights close to 1, while the weights of samples that are farther away decay rapidly. This ensures that the classification results are mainly determined by highly similar samples and suppresses the interference of marginal samples.
[0149] After calculating the independent weights of all K nearest neighbor samples, the system performs a statistical operation to accumulate the category weights. For each existing communication link state category, the system iterates through the K nearest neighbor samples and sums the weights of all samples belonging to that category. The calculation formula is as follows:
[0150] $W_c = \sum_{k \in S_c} v_k$;
[0151] In this formula, $W_c$ indicates the total score of the $c$-th category, i.e. its cumulative weight, and $S_c$ represents the set of samples among the K nearest neighbors that belong to the $c$-th category. Through this accumulation, the system transforms discrete, count-based voting into continuous intensity scoring based on similarity, which reflects more precisely the confidence that a sample belongs to a specific state. For example, even if a category has fewer samples, if these samples are very close to the data to be classified, their cumulative weight may still exceed that of another category that has more samples but is farther away, yielding a judgment that is more consistent with physical reality.
[0152] To further enhance the robustness of fault diagnosis and anti-interference decision-making and prevent misjudgments when classification boundaries are ambiguous, this embodiment sets a strict classification decision threshold mechanism. When determining the final communication link state category, the system requires that the advantage of the leading category be significant. The specific decision logic is as follows: if the cumulative weight $W_c$ of a certain category $c$ is greater than 1.2 times the sum of the weights of all other categories, the system considers the classification result to have sufficient confidence and directly determines this category as the final communication link state category. The coefficient of 1.2 is a safety margin factor used to ensure that the dominant category has a clear advantage in weight.
[0153] If the current calculation result fails to meet the above threshold condition, i.e., the maximum cumulative weight does not reach 1.2 times the sum of the weights of other categories, it indicates that the current distribution of the K samples is relatively messy or at the decision boundary, and the classification confidence is insufficient. In this case, the system will not force a result output, but will trigger an iterative optimization mechanism. The system will automatically increase the number of nearest neighbor samples, specifically by adding 5 samples to the current K value. Then, based on the updated sample set, the above weight calculation and threshold determination process is re-executed. This iterative process continues until the threshold condition is met or the preset iteration limit is reached.
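The voting and iterative expansion described above can be sketched as follows. The Gaussian kernel is taken in its common form exp(-d^2 / (2 h^2)), which is an assumption because the original formula is not reproduced in the text. The iteration limit `max_rounds`, the behavior of returning `None` when no category reaches the margin, and all function names are likewise hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Neighbor = Tuple[float, int]  # (distance to the query, category label)


def class_scores(neighbors: List[Neighbor], bandwidth: float = 1.0) -> Dict[int, float]:
    """Sum Gaussian kernel weights per category."""
    scores: Dict[int, float] = defaultdict(float)
    for distance, label in neighbors:
        scores[label] += math.exp(-distance ** 2 / (2 * bandwidth ** 2))
    return dict(scores)


def decide(sorted_neighbors: List[Neighbor], k: int, margin: float = 1.2,
           step: int = 5, max_rounds: int = 4) -> Tuple[Optional[int], int]:
    """Return (label, k_used); label is None if no category is dominant.

    `sorted_neighbors` must be non-empty and sorted by ascending distance.
    """
    for _ in range(max_rounds):
        scores = class_scores(sorted_neighbors[:k])
        best = max(scores, key=scores.get)
        others = sum(v for label, v in scores.items() if label != best)
        if scores[best] > margin * others:
            return best, k
        if k >= len(sorted_neighbors):
            break  # nothing left to add
        k += step  # add 5 more neighbors and vote again
    return None, k


# Two very close samples of category 1 outweigh three samples of category 2.
neighbors = [(0.1, 1), (0.2, 1), (0.3, 2), (1.5, 2), (1.6, 2)]
label, k_used = decide(neighbors, k=5)
```

In this example category 1 wins with a score of about 1.98 against about 1.56 for category 2, even though category 2 has more neighbors, which is the situation the cumulative weight paragraph describes.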
[0154] This embodiment employs the aforementioned weighted voting and dynamic iteration strategy, which yields significant benefits. First, by utilizing a Gaussian kernel function for nonlinear mapping, linear distances in Euclidean space are transformed into nonlinear similarity probabilities, better aligning with the nonlinear attenuation characteristics of signal features in complex electromagnetic environments like GIS, effectively enhancing the identification of atypical fault samples. Second, setting a relative threshold of 1.2 times constructs a "firewall" for anti-interference diagnosis, significantly reducing the false positive rate caused by noise fluctuations and ensuring that subsequent network switching or alarm operations are only triggered when features are highly consistent. Finally, the iterative mechanism of dynamically increasing samples provides the algorithm with adaptive resolution in ambiguous regions. By expanding the search range and introducing more contextual information, it solves the problem of fixed K values easily getting trapped in local optima at complex boundaries, ensuring the high reliability and stability of the diagnostic system under various extreme conditions.
[0155] In the preferred scheme, the initial construction and maintenance mechanism of the dynamic anti-interference training sample library in S2 includes:
[0156] Initial construction process: Collect link feature data continuously for 7 to 15 days within the GIS monitoring area, process it through Z-score standardization, manually label the environmental status, and remove outliers that deviate from the mean by more than 3 standard deviations.
[0157] The environmental status labels include five categories: label 1 for normal communication, label 2 for narrowband interference in a specific frequency band, label 3 for GIS switch action impact interference, label 4 for persistent multipath fading and blocking, and label 5 for RF front-end hardware saturation.
[0158] Online incremental learning and forgetting mechanism: When the category determined by S3 is confirmed to be correct, the new sample is stored in the database; when the storage reaches the limit, the sample is removed according to the timestamp and the frequency of citation; the frequency of citation refers to the number of times the sample is selected as the nearest neighbor sample in each determination, and this number is cleared and calibrated at the end of each month.
[0159] The following is a detailed explanation and description of the mechanism for the initial construction and maintenance of the dynamic anti-interference training sample library in step S2 of the above technical solution.
[0160] In step S2 of this embodiment, the initial construction of the dynamic anti-interference training sample library is a crucial step in ensuring the cold start performance of the algorithm. To ensure that the sample library covers the complete cyclical characteristics of GIS equipment operation, the system executes a strict data acquisition process during the initial construction phase. Specifically, the system continuously collects link feature data within the GIS monitoring area, with the collection duration set to 7 to 15 days. The technical basis for this duration is that the operation of substation GIS equipment typically exhibits pronounced weekly cycles, including weekday load fluctuations and routine weekend inspections. The minimum collection duration of 7 days ensures that the samples cover a complete weekly operating cycle, preventing model overfitting due to incomplete samples; the upper limit of 15 days provides sufficient redundant data while avoiding an excessively large initial sample set that could cause storage overflow on embedded devices or excessively long training times. The collected data is first processed by Z-score standardization, followed by outlier removal. The system adopts the Pauta criterion, i.e. the 3-standard-deviation rule, and discards data that deviates from the mean by more than 3 standard deviations as gross errors or measurement noise. This step removes outliers caused by momentary sensor malfunctions or extremely abnormal electromagnetic pulses, ensuring the purity and statistical representativeness of the initial training sample set.
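The outlier removal step can be sketched as follows. Applying the 3-standard-deviation test independently to each feature dimension and dropping a row if any dimension fails is an assumption; the text does not say how the rule is applied to multi-dimensional samples. The function name and the synthetic data are hypothetical.

```python
import math
from typing import List, Sequence


def remove_outliers_3sigma(rows: Sequence[Sequence[float]],
                           k: float = 3.0) -> List[Sequence[float]]:
    """Drop rows in which any dimension deviates from its mean by more than k std."""
    if not rows:
        return []
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(dims)]
    return [r for r in rows
            if all(abs(r[j] - means[j]) <= k * stds[j] for j in range(dims))]


# 19 ordinary rows plus one gross error in the first dimension.
data = [(0.0, 1.0)] * 19 + [(100.0, 1.0)]
cleaned = remove_outliers_3sigma(data)
```

Note that with very small data sets a single outlier inflates the standard deviation so much that it can never exceed the 3-standard-deviation limit, so the rule is only meaningful on collections of the size gathered over several days.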
[0161] After sample cleaning, the data needs to be manually labeled to give it physical meaning. The environmental status labeling system constructed in this embodiment includes five mutually exclusive categories, covering the most typical communication states in a GIS environment. Label 1 is defined as normal communication, corresponding to a stable electromagnetic environment and good link quality; Label 2 is defined as narrowband interference in a specific frequency band, corresponding to persistent external co-channel or adjacent-channel interference sources, such as walkie-talkie signal crosstalk; Label 3 is defined as GIS switch action impulse interference, corresponding to ultra-fast transient overvoltage electromagnetic radiation generated when disconnecting switches or circuit breakers operate, manifested as broadband pulses over a very short time; Label 4 is defined as persistent multipath fading blocking, corresponding to deep fading caused by multiple reflections and superpositions of signals inside the GIS metal cavity, manifested as a long-term low signal-to-noise ratio; Label 5 is defined as RF front-end hardware saturation, corresponding to excessively high received signal strength causing the low-noise amplifier to enter the nonlinear region. At this time, although the signal strength is high, the bit error rate is extremely high, which is a hardware adaptability problem rather than external interference. Through these five refined labels, the system can not only determine the quality of communication but also identify the root cause of the problem, providing a basis for subsequent targeted decision-making.
[0162] To address the evolution of the electromagnetic environment throughout the GIS lifecycle, the sample library incorporates an online incremental learning and forgetting mechanism. For online incremental learning, not all real-time data is added to the library. The system only stores samples that have been classified in step S3 and confirmed through subsequent data transmission (such as ACK confirmation or manual verification) as new samples. This verification mechanism prevents misclassified samples from contaminating the sample library, ensuring the cumulative effect of model accuracy. When the local storage unit reaches its preset capacity limit, the system triggers the forgetting mechanism, removing the lowest-value samples to free up space. The removal strategy is based on two dimensions: the sample's timestamp and its citation frequency. The system prioritizes removing samples with the oldest timestamp and the lowest citation frequency. Citation frequency refers to the cumulative number of times the sample has been selected as the nearest neighbor in all K-nearest neighbor determinations, reflecting the sample's representativeness to the current environment.
[0163] It is particularly important to note that, to prevent historically common but no longer prevalent "outdated typical samples" from occupying high citation frequencies and remaining unremoved, the system introduces a periodic frequency reset calibration mechanism. Specifically, at the end of each month, the citation frequency counters for all samples in the database are reset to zero. This mechanism forces the weighting evaluation system of the sample database to be updated over a rolling time window, ensuring that high-frequency samples are always active samples reflecting recent (within the past month) environmental characteristics, rather than historically accumulated outdated samples. This gives the system the ability to adaptively track long-term, slow environmental changes.
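The incremental learning, forgetting, and monthly reset mechanisms described above can be illustrated by the following minimal sketch. The class and method names are hypothetical. The embodiment states that both timestamp and citation frequency are considered but does not define how they are combined; the sketch assumes the lowest citation count is evicted first, with the oldest timestamp as the tie-breaker.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    features: tuple
    label: int
    timestamp: float
    citations: int = 0  # times selected as a nearest neighbor since last reset


class SampleLibrary:
    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = []

    def add_verified(self, sample):
        """Store a sample that was classified in S3 and later verified."""
        if len(self.samples) >= self.capacity:
            self._forget_one()
        self.samples.append(sample)

    def _forget_one(self):
        # Assumed ordering: fewest citations first, then oldest timestamp.
        idx = min(
            range(len(self.samples)),
            key=lambda i: (self.samples[i].citations, self.samples[i].timestamp),
        )
        del self.samples[idx]

    def record_neighbors(self, neighbors):
        """Increment the citation counter of each sample used as a neighbor."""
        for s in neighbors:
            s.citations += 1

    def monthly_reset(self):
        """Periodic calibration: clear all citation counters."""
        for s in self.samples:
            s.citations = 0
```

After `monthly_reset()` all counters are equal, so eviction falls back to age until new citations accumulate, which matches the rolling-window behavior described in the text.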
[0164] The beneficial effects of this embodiment are as follows: First, the full-cycle data collection of 7 to 15 days, combined with a cleaning strategy of 3 times the standard deviation, constructs an initial knowledge base that is both statistically representative and highly pure, solving the problem of misjudgment caused by data bias in the cold start phase. Second, the five-category labeling system not only covers environmental interference but also incorporates multipath effects and hardware saturation, making the diagnostic results physically interpretable and supporting subsequent differentiated anti-interference strategies (such as reducing transmission power for hardware saturation and frequency hopping for narrowband interference). Third, the forgetting mechanism based on monthly clearing of citation frequency cleverly solves the contradiction between "catastrophic forgetting" and "historical inertia" in incremental learning, enabling embedded devices to maintain a dynamic sample library that is highly matched to the current electromagnetic environment under limited storage resources, ensuring the continuous robustness of the GIS monitoring system over a long operating cycle of several years.
[0165] In the preferred scheme, the adaptive frequency selection networking strategy triggered in S4 includes:
[0166] When the communication link status category is determined to be label 2, i.e., narrowband interference in a specific frequency band, the current wireless channel number will be added to the temporary blacklist.
[0167] The radio frequency driver is invoked to switch to a backup channel not in the temporary blacklist according to a pseudo-random frequency hopping sequence.
[0168] The temporary blacklist has an automatic removal mechanism. If a channel that has been added to the list does not detect narrowband interference again within 5 consecutive time sliding windows, it will be automatically removed from the temporary blacklist.
[0169] The following is a detailed explanation and description of triggering the adaptive frequency selection networking strategy in step S4 above.
[0170] In step S4 of this embodiment, the system designs an adaptive frequency selection networking strategy that combines immediate response and dynamic recovery for communication link states classified as label 2, i.e., specific frequency band narrowband interference. When the classification result of step S3 confirms that the current link is subjected to continuous narrowband interference in a specific frequency band, the system determines that the current working channel no longer has the reliability to continue communication and then triggers the channel locking mechanism. The embedded microcontroller first reads the wireless channel number currently being used by the wireless radio frequency module and writes the number into a temporary blacklist data structure located in the random access memory. The purpose of this temporary blacklist is to logically isolate the interfered frequency band. In the subsequent route discovery and next-hop node selection process, any channel in the blacklist will be marked as unavailable, thereby preventing the network protocol stack from blindly attempting to retransmit data on the interfered channel and avoiding the ineffective consumption of node energy due to continuous packet loss.
[0171] After channel locking is completed, the system immediately initiates a channel switching procedure to restore network connectivity. The system calls the underlying RF driver interface to perform the frequency switching operation. To avoid co-channel interference with neighboring networks and improve anti-interception capabilities, the backup channel is not selected sequentially, but rather generated based on a preset pseudo-random frequency hopping sequence. Before performing the frequency hopping action, the algorithm module performs a crucial filtering operation, comparing the candidate target channels generated by the pseudo-random sequence with records in a temporary blacklist. If a candidate channel exists in the blacklist, the system automatically skips that channel and generates the next candidate channel in the sequence until a clean channel not listed in the blacklist is found. After confirming the target channel, the system writes new frequency division coefficients to the frequency synthesizer of the RF chip, switches the carrier frequency to the backup channel, and sends a synchronization beacon frame to notify neighboring nodes to complete the synchronization switch.
[0172] To maximize the use of limited spectrum resources and prevent the permanent blocking of all channels due to transient interference, the temporary blacklist in this embodiment has an automatic removal and resource recovery mechanism. This mechanism is based on a time-sliding window counter. For each channel added to the blacklist, the system continuously monitors it with a low duty cycle in the background using idle time slots. If, within five consecutive time-sliding window monitoring periods, the characteristic data on the channel no longer contains narrowband interference characteristics as determined by step S3, it is determined that the interference source has disappeared or the frequency has drifted. At this time, the system triggers an unlock command, automatically removing the channel number from the temporary blacklist and returning it to the available channel resource pool. Setting five consecutive time-sliding windows as the recovery threshold is to strike a balance between the certainty of confirming the disappearance of interference and the timeliness of resource recovery, preventing repeated channel oscillations caused by intermittent interruptions of the interference source.
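The blacklist, pseudo-random hopping, and automatic removal logic described above can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's firmware: `ChannelManager` and its methods are hypothetical names, a seeded `random.Random` stands in for the preset pseudo-random hopping sequence, and raising an error when every channel is blacklisted is a choice made for the sketch only.

```python
import random


class ChannelManager:
    CLEAN_WINDOWS_TO_UNLOCK = 5  # recovery threshold from the embodiment

    def __init__(self, channels, seed):
        self.channels = list(channels)
        self.rng = random.Random(seed)  # stand-in for the hopping sequence
        self.current = self.channels[0]
        self.blacklist = {}  # channel -> consecutive clean windows observed

    def on_narrowband_interference(self):
        """Label 2 on the working channel: blacklist it, then hop."""
        self.blacklist[self.current] = 0
        self.current = self._next_clean_channel()
        return self.current

    def _next_clean_channel(self):
        if len(self.blacklist) >= len(self.channels):
            # Not covered by the source text; handled here as an error.
            raise RuntimeError("all channels are blacklisted")
        while True:
            candidate = self.rng.choice(self.channels)
            if candidate not in self.blacklist:  # skip blacklisted candidates
                return candidate

    def on_background_window(self, channel, interfered):
        """Record one low-duty-cycle monitoring window for a blacklisted channel."""
        if channel not in self.blacklist:
            return
        if interfered:
            self.blacklist[channel] = 0  # the streak must be consecutive
            return
        self.blacklist[channel] += 1
        if self.blacklist[channel] >= self.CLEAN_WINDOWS_TO_UNLOCK:
            del self.blacklist[channel]  # return channel to the available pool
```

For example, `ChannelManager(range(11, 27), seed=1)` would model sixteen channels numbered 11 to 26; the channel numbering is an assumption chosen for illustration.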
[0173] This embodiment employs the aforementioned adaptive frequency selection and blacklist maintenance strategy, which yields significant benefits. First, by combining the blacklist mechanism with pseudo-random frequency hopping, the system can avoid common fixed-frequency interference devices in industrial environments within milliseconds, significantly improving the survivability of the communication link. Second, the automatic clearing mechanism based on a continuous sliding window endows the system with self-repair and dynamic resource management capabilities, avoiding the waste of spectrum resources and ensuring that the GIS wireless sensor network can always maintain optimal channel combinations during long-term operation, adapting to dynamic changes in the electromagnetic environment without manual intervention.
[0174] In the preferred scheme, the operation to trigger a graded fault early warning in S4 includes:
[0175] If the communication link status category is determined to be label 5, i.e., the RF front-end hardware is saturated, it is defined as a level 1 fault. A fault code is immediately generated and an emergency maintenance request is sent through the out-of-band Bluetooth Low Energy channel.
[0176] If the communication link status category is determined to be label 4, i.e., persistent multipath fading and blocking, it is defined as a level 2 fault. The backup routing table is activated, and the link recovery status is checked every 3 time sliding windows.
[0177] If the communication link status category is determined to be label 3, i.e., GIS switch action impulse interference, it is defined as a level 3 transient event. Only the log is recorded and no alarm is triggered. However, if the event is detected in 3 consecutive time sliding windows, it is upgraded to a level 2 fault.
[0178] The following is a detailed explanation and description of the graded fault warning operation in step S4 above. In step S4 of this embodiment, the system does not stop at simple state classification, but constructs a refined graded response mechanism based on the classification results to deal with communication anomalies of different natures within the GIS. The system classifies the abnormal situation into three levels according to the determined communication link state category, and executes targeted processing strategies for each level.
[0179] First, for cases identified as label 5, indicating RF front-end hardware saturation, the system defines it as a Level 1 fault, the highest-priority critical fault state. This situation typically occurs when the sensor node is too close to a high-voltage discharge point, or when the strength of external interference signals exceeds the linear operating range of the RF chip's low-noise amplifier, driving the receiver circuit into saturation. At this point, the main communication link is completely paralyzed, and no valid data can be demodulated. To ensure that alarm information can still be issued in the event of a main link failure, this embodiment employs out-of-band communication technology. The system immediately generates a Level 1 fault code containing the fault type and node ID, and activates the pre-set Bluetooth Low Energy communication module. Utilizing the differences in frequency band and modulation method between Bluetooth Low Energy and the main communication link (such as ZigBee or Wi-Fi), an independent emergency data channel is established to send an emergency maintenance request to the handheld terminal or aggregation gateway. This design ensures that critical hardware failure information can still be transmitted even in extreme electromagnetic environments, avoiding the creation of monitoring blind spots.
[0180] Secondly, for cases identified as Label 4, i.e., persistent multipath fading and blocking, the system defines it as a Level 2 fault, which is a functional impairment at the network topology level. Signal reflection within the GIS metal casing may cause deep signal nulls at certain locations, or equipment movement may obstruct the original line-of-sight path. For such issues, the system focuses on the network's self-healing capabilities. Once the fault is confirmed, the system immediately activates the backup routing table, abandoning the currently unavailable next-hop node and switching to an alternative path with suboptimal link quality but guaranteed connectivity for data transmission. To conserve system resources and monitor environmental recovery, the system employs a periodic detection mechanism: every three time sliding windows, the system briefly attempts to probe the connectivity of the original faulty link. If the multipath fading disappears, the system evaluates whether to switch back to the original path, thereby achieving dynamic optimization and self-repair of the network topology.
[0181] Finally, for cases identified as label 3, i.e., GIS switch action impulse interference, the system defines it as a Level 3 transient event. This is an electromagnetic phenomenon accompanying normal operation of GIS equipment, rather than a fault in the communication system itself. To reduce false alarm rates and alleviate the burden on maintenance personnel, the system adopts a silent logging strategy for such events, recording only the timestamp and characteristic values of the event in local non-volatile memory without triggering audible and visual alarms or remote push notifications. However, if the "transient" interference persists, it may indicate a persistent partial discharge or arcing phenomenon within the GIS rather than a normal switching action, so the system introduces an event escalation mechanism. If the impulse interference characteristic is detected within three consecutive time sliding windows, the system logic determines that the interference no longer has transient characteristics, escalates its nature to a Level 2 fault, and initiates backup routing or reports an anomaly according to the Level 2 fault handling procedure, thereby achieving in-depth mining and intelligent assessment of potential hidden dangers.
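The graded response and the escalation rule described above can be illustrated by the following minimal sketch. The names `Label`, `FaultGrader`, and the action strings are hypothetical placeholders; label 2 is handled by the adaptive frequency selection strategy and therefore maps to no fault level here.

```python
from enum import IntEnum


class Label(IntEnum):
    NORMAL = 1
    NARROWBAND = 2
    SWITCH_IMPULSE = 3
    MULTIPATH = 4
    HW_SATURATION = 5


class FaultGrader:
    ESCALATE_AFTER = 3  # consecutive sliding windows, per the embodiment

    def __init__(self):
        self.impulse_streak = 0

    def grade(self, label):
        """Return (level, action) for the classification of one sliding window."""
        if label == Label.SWITCH_IMPULSE:
            self.impulse_streak += 1
            if self.impulse_streak >= self.ESCALATE_AFTER:
                return 2, "activate_backup_route"  # escalated to Level 2
            return 3, "log_only"
        self.impulse_streak = 0  # any other label breaks the streak
        if label == Label.HW_SATURATION:
            return 1, "send_ble_maintenance_request"
        if label == Label.MULTIPATH:
            return 2, "activate_backup_route"
        return None, "none"
```

Keeping the streak counter inside the grader means the escalation depends only on consecutive windows, so an isolated switching operation never raises an alarm.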
[0182] This embodiment employs the aforementioned tiered fault early warning operation, which yields significant beneficial effects. First, by introducing an out-of-band low-power Bluetooth channel to handle Level 1 faults, the "silent" problem when the main channel is suppressed by strong interference or hardware saturation is resolved, significantly improving the system's survivability and alarm reliability. Second, the mechanism combining a backup routing table with periodic recovery detection endows the sensor network with strong self-healing capabilities, enabling it to maintain data transmission continuity in the complex dynamic topology of GIS and cope with multipath effects without manual intervention. Third, the transient event filtering and escalation mechanism cleverly balances the contradiction between missed and false alarms, filtering out a large number of meaningless switching interferences, avoiding "crying wolf" alarm fatigue, while ensuring timely detection of continuous abnormal discharges, greatly improving the operational efficiency and diagnostic accuracy of the GIS intelligent monitoring system.
[0183] In the preferred scheme, the step of collecting raw data on multidimensional communication link characteristics involves modifications to the RPL routing protocol stack:
[0184] The last 8 bytes of the DAO and DIO control messages in the RPL protocol are selected as reserved fields;
[0185] The 5-dimensional feature data after Z-score standardization are written into the reserved field in the following order: mean of the physical layer received signal strength, variance of the physical layer received signal strength, link quality indicator volatility, forward error correction coding bit error rate, and channel idle assessment count.
[0186] Each feature dimension occupies 1.6 bytes, and the receiving node extracts feature data strictly according to this byte order when parsing the message.
[0187] The following is a detailed explanation and description of the steps involved in modifying the RPL routing protocol stack and data encapsulation in this technical solution.
[0188] In this embodiment, to achieve real-time transmission of multi-dimensional communication link feature data without increasing additional network load, the system makes targeted low-level modifications to the standard IPv6 Routing Protocol for Low-Power and Lossy Networks (RPL) stack. Specifically, this method does not define new application layer messages, but instead reuses the DAO (Destination Advertisement Object) and DIO (DODAG Information Object) control messages that the RPL protocol uses to maintain network topology. When constructing these two control messages, the system locks the reserved space or padding area at the end of the message, designating the last 8 bytes (64 bits in total) as a dedicated feature carrier field. DAO and DIO messages are chosen as carriers because these two messages are used throughout the entire process of wireless sensor network deployment and maintenance, ensuring that the update frequency of feature data is synchronized with the refresh frequency of the routing topology, thereby guaranteeing the timeliness of anti-interference diagnosis.
[0189] For the 8-byte reserved field, the system performs strict data encapsulation operations. Since the data after Z-score standardization in step S1 consists of five floating-point feature values, standard 32-bit floating-point storage would consume 20 bytes, far exceeding the available payload margin of the message. Therefore, this embodiment employs a high-compression bit-level mapping mechanism. The system evenly distributes the total 8 bytes (64 bits) of space across the five feature dimensions, meaning each feature dimension occupies 1.6 bytes (12.8 bits). In the actual encoding process, the system first maps the standardized floating-point values to fixed-point integers and then compresses them into their respective allocated bit segments using a quantization algorithm. The data writing order strictly follows a preset protocol, in the following order: mean of the physical layer received signal strength, variance of the physical layer received signal strength, link quality indicator volatility, forward error correction coding bit error rate, and channel idle assessment count. This compact, continuous bit stream arrangement maximizes space utilization, ensuring that all key features can be transmitted within a single control message cycle.
[0190] At the receiving node, the system deploys a corresponding reverse parsing program. When the receiving node captures a DAO or DIO message, it doesn't stop at standard routing information processing. Instead, through bitmasking and bit shifting operations, it strictly separates five binary fragments of feature data from the last eight bytes of the message in the aforementioned order. Subsequently, the system performs dequantization calculations according to the quantization rules of the sending end, restoring these binary fragments to standardized feature values for use in the subsequent K-nearest neighbor classification model. This process requires strict consistency in byte order and bit definitions between the receiving and sending ends; any misalignment will result in complete distortion of the feature data.
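The quantization, packing, and reverse parsing described above can be sketched as follows. Because 12.8 bits is not a whole number of bits, the sketch assumes 12 bits per feature (60 bits) with the 4 least-significant bits left as zero padding; the clipping range of ±4 for the standardized values, the big-endian byte order, and all function names are likewise assumptions made for illustration, not details given by the embodiment.

```python
BITS = 12                      # assumed width per feature (5 x 12 = 60 bits)
PAD_BITS = 4                   # assumed zero padding to fill 64 bits
LEVELS = (1 << BITS) - 1       # 4095 quantization steps
Z_MIN, Z_MAX = -4.0, 4.0       # assumed clipping range for Z-scores


def quantize(z):
    """Map a standardized value to an unsigned fixed-point integer."""
    z = min(max(z, Z_MIN), Z_MAX)
    return round((z - Z_MIN) / (Z_MAX - Z_MIN) * LEVELS)


def dequantize(q):
    """Inverse of quantize, up to half a quantization step."""
    return Z_MIN + q / LEVELS * (Z_MAX - Z_MIN)


def pack_features(features):
    """Pack five standardized features into the 8-byte carrier field."""
    if len(features) != 5:
        raise ValueError("exactly five features are expected")
    word = 0
    for z in features:             # fixed order: first feature in the top bits
        word = (word << BITS) | quantize(z)
    word <<= PAD_BITS
    return word.to_bytes(8, "big")


def unpack_features(payload):
    """Recover the five features using shifts and masks."""
    word = int.from_bytes(payload, "big") >> PAD_BITS
    return [
        dequantize((word >> (BITS * (4 - i))) & LEVELS)
        for i in range(5)
    ]
```

Under these assumptions the round-trip error per feature is at most half a step, i.e. (Z_MAX − Z_MIN) / (2 × 4095), and both ends must agree on every constant above for the decoded values to be meaningful.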
[0191] This embodiment employs the aforementioned RPL protocol modification and data encapsulation strategy, which yields significant benefits. First, it achieves "zero-overhead" data acquisition by leveraging existing routing control messages, avoiding the additional energy consumption and channel occupancy associated with sending independent probe messages—crucial for battery-powered GIS sensor nodes. Second, the 1.6-byte high compression ratio encoding mechanism cleverly resolves the conflict between massive feature data and limited wireless bandwidth, reducing the transmission load by over 60% while maintaining feature data accuracy, significantly improving the network's effective throughput. Third, binding link feature data with routing control information ensures strong coupling between network topology updates and link status diagnosis, guaranteeing that subsequent anti-interference routing decisions are always based on the latest link status, avoiding routing oscillations or suboptimal path selections caused by information asynchrony.
[0192] In the preferred embodiment, the method runs on an embedded microcontroller, and the software deployment steps include:
[0193] Run the FreeRTOS real-time operating system on the embedded microcontroller and create a high-priority link diagnostic task;
[0194] The link diagnostic task subscribes to the wireless driver's receive interrupts via a message queue;
[0195] When the wireless driver triggers a receive interrupt, it packages the physical layer status word and writes it into the message queue, waking up the link diagnostic task to perform classification calculations in S3.
[0196] The following is a detailed explanation and description of the embedded software deployment steps in this technical solution.
[0197] In this embodiment, to ensure the efficient operation of the K-nearest neighbor-based anti-interference diagnostic method on embedded nodes with limited computing resources, the system employs a task scheduling architecture based on the FreeRTOS real-time operating system. The first step in software deployment is to port and run the FreeRTOS real-time operating system on the embedded microcontroller. Leveraging the preemptive scheduling feature of this operating system, a core task specifically designed for link quality analysis and status determination is created, named the link diagnosis task. To ensure the system can react quickly in strong interference environments, this link diagnosis task is assigned an extremely high execution priority, higher than that of regular data acquisition, route maintenance, and human-computer interaction tasks. This priority configuration ensures that when a wireless signal arrives or the communication link status changes abruptly, the microcontroller can immediately suspend processing other low-priority tasks and prioritize CPU resources to execute the anti-interference diagnostic logic, thereby controlling the diagnostic latency to the millisecond level.
[0198] To resolve the conflict between the high-frequency triggering of underlying hardware interrupts and the time-consuming computation of complex upper-layer algorithms, this embodiment designs an asynchronous decoupling mechanism based on a message queue. During the initialization phase, the link diagnostic task establishes and subscribes to a dedicated message queue through the API interface provided by the operating system. This message queue acts as a buffer bridge between the underlying driver and the upper-layer application. At the system's underlying level, the wireless driver is configured in interrupt-driven mode. When the wireless RF chip detects an over-the-air signal and completes physical layer decoding, it triggers a hardware receive interrupt.
[0199] In the receive interrupt service routine of the wireless driver, the system does not directly perform complex K-nearest neighbor classification calculations to avoid prolonged occupation of the interrupt context, which could lead to system crashes or data loss. Instead, the interrupt service routine performs only a lightweight data retrieval operation, quickly reading the physical layer status word from the RF chip registers. This physical layer status word is a compact data structure that encapsulates key physical layer indicators of the currently received data packet, including received signal strength indication, link quality indication, preamble correlation value, and automatic gain control gain level. Once the retrieval is complete, the interrupt service routine packages the physical layer status word and writes it to the aforementioned message queue in a non-blocking manner.
[0200] When a new physical layer status word is written to the message queue, the operating system kernel detects this event and immediately wakes up the link diagnostic task that is in a blocked waiting state. At this time, the link diagnostic task switches from the blocked state to the ready state and is immediately put into operation by the scheduler due to its high priority. The woken-up task retrieves the physical layer status word from the queue, parses it back to the original data of the multi-dimensional communication link features described in step S1, and then calls the improved K-nearest neighbor classification model in step S3 to perform real-time distance calculation and status classification.
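The interrupt-to-task decoupling described above can be illustrated by the following host-side analogue. It is not FreeRTOS code: a Python `queue.Queue` and a worker thread stand in for the RTOS message queue and the link diagnostic task, Python threads have no priorities, and `classify` is a hypothetical placeholder with an assumed threshold. On FreeRTOS the corresponding queue calls would typically be `xQueueSendFromISR` in the interrupt service routine and `xQueueReceive` in the task.

```python
import queue
import threading

status_queue = queue.Queue(maxsize=16)  # queue depth is an assumption
results = []


def receive_isr(status_word):
    """Stand-in for the receive ISR: enqueue only, never classify here."""
    try:
        status_queue.put_nowait(status_word)  # non-blocking write
        return True
    except queue.Full:
        return False  # drop the word rather than block the "interrupt"


def classify(status_word):
    # Placeholder for the S3 classifier; the -85 dBm threshold is hypothetical.
    return 1 if status_word["rssi"] > -85 else 4


def link_diagnostic_task():
    """Blocks on the queue and wakes only when a status word arrives."""
    while True:
        word = status_queue.get()
        if word is None:  # sentinel used to stop the demo
            break
        results.append(classify(word))


def run_demo(words):
    results.clear()
    worker = threading.Thread(target=link_diagnostic_task)
    worker.start()
    for w in words:
        receive_isr(w)
    status_queue.put(None)
    worker.join()
    return list(results)
```

The design point carried over from the text is that the producer side does constant-time work and never waits, while all classification cost is paid in the consumer.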
[0201] The software deployment strategy employed in this embodiment offers significant advantages. First, the use of an interrupt-triggered bottom-half processing mechanism (an "interrupt triggering + task processing" model) effectively decouples hardware event response from complex algorithm processing. This ensures that the underlying interrupts can respond quickly to each wireless data packet, preventing packet loss due to slow processing, while also guaranteeing that the K-nearest neighbor algorithm has a full CPU time slice for accurate calculation. Second, the high-priority preemptive scheduling based on FreeRTOS ensures real-time anti-interference diagnostics, enabling the system to trigger frequency selection or alarm operations with deterministic low latency when encountering sudden interference, meeting the stringent timeliness requirements of industrial-grade control systems. Third, the message queue mechanism provides a natural buffering function, smoothly handling peak loads when sudden high-volume data arrives, enhancing the robustness and stability of the embedded system under extreme conditions.
[0202] In the preferred embodiment, the hardware-level data acquisition steps include:
[0203] Access the RSSI and LQI registers of the wireless RF chip via the SPI bus, read them N times consecutively, and calculate the average value.
[0204] Access the automatic gain control (AGC) status register of the RF chip to obtain the front-end gain level, and combine it with the RSSI mean for analysis to distinguish between weak signal and increased noise floor.
[0205] The following is a detailed explanation and description of the hardware-level data acquisition steps in the above technical solution.
[0206] In this embodiment, hardware-level data acquisition is the cornerstone of the entire anti-interference diagnostic system, and its execution relies on high-speed serial communication between the embedded microcontroller and the wireless RF transceiver chip. To obtain physical layer parameters that accurately reflect the current electromagnetic environment, the system does not rely on snapshot data at a single moment but employs a statistically based continuous sampling mechanism. Specifically, the embedded microcontroller, acting as the master device, initiates register access requests to the wireless RF chip, acting as the slave device, via the Serial Peripheral Interface (SPI) bus. The system targets the Received Signal Strength Indicator (RSSI) register and Link Quality Indicator (LQI) register mapped internally by the RF chip. These two registers store the power level of the currently received RF signal and the estimated symbol error rate during demodulation, respectively. To eliminate the instantaneous numerical jitter caused by fast fading, multipath effects, and random Gaussian white noise prevalent in wireless channels, the system executes a continuous reading strategy, i.e., it initiates N consecutive read operations within an extremely short time interval.
[0207] The N raw readings are then fed into the microcontroller's internal computing unit for mean filtering. For the Received Signal Strength Indication (RSSI), the system uses the following arithmetic mean formula for calculation:
[0208] $$\overline{RSSI} = \frac{1}{N}\sum_{i=1}^{N} RSSI_i$$
[0209] In this formula, $\overline{RSSI}$ represents the average signal strength after smoothing, $RSSI_i$ represents the instantaneous signal strength value read in the $i$-th reading, and $N$ is the number of samples, typically set to 5 to 10 to strike a balance between smoothing effect and acquisition latency. The same averaging logic is also applied to the calculation of the Link Quality Indicator (LQI). Through this hardware-level oversampling and software-level averaging filtering, the system effectively filters out high-frequency random noise, obtaining a statistically more stable link state representation value, providing high signal-to-noise ratio input data for subsequent upper-layer algorithms.
[0210] More importantly, this embodiment introduces an analysis mechanism using the Automatic Gain Control (AGC) status register in its hardware data acquisition. This is a core innovation that addresses the problem that traditional methods, relying on RSSI values alone, cannot accurately distinguish between signal attenuation and a rise in ambient noise. The AGC circuit in the wireless RF chip automatically adjusts the gain stage of the front-end low-noise amplifier based on the strength of the input signal to ensure the signal falls within the optimal dynamic range of the analog-to-digital converter. The system reads the current front-end gain level recorded in the AGC status register via the SPI bus and analyzes it jointly with the calculated RSSI mean. This joint analysis is based on the following physical logic: if the RSSI mean is low while the front-end gain level is at its maximum, the RF front-end is operating at full gain but the signal is still weak. The system interprets this as a "weak signal," usually caused by excessive communication distance or physical obstruction. Conversely, if the RSSI mean is low or medium but the front-end gain level is in a low to medium state, this usually means that the noise floor in the environment is abnormally raised, causing the AGC circuit to actively reduce the gain to prevent saturation by the raised noise floor. In this case, the system judges it as "noise floor rise," which is usually caused by strong external electromagnetic interference sources.
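The mean filtering and the joint RSSI/AGC analysis described above can be sketched as follows. The function names, the dBm thresholds for "low" and "medium" signal strength, and the representation of the gain level as an integer index are all assumptions for illustration; actual values depend on the RF chip in use.

```python
def mean_rssi(readings):
    """Arithmetic mean of N consecutive RSSI register readings (in dBm)."""
    if not readings:
        raise ValueError("at least one reading is required")
    return sum(readings) / len(readings)


def diagnose(readings, agc_level, agc_max, low_dbm=-85.0, medium_dbm=-70.0):
    """Combine the RSSI mean with the AGC gain level.

    low_dbm and medium_dbm are hypothetical thresholds; agc_level and
    agc_max are the gain index read from the AGC status register and its
    maximum value.
    """
    avg = mean_rssi(readings)
    at_max_gain = agc_level >= agc_max
    if avg < low_dbm and at_max_gain:
        return "weak_signal"          # full gain, yet the signal stays weak
    if avg < medium_dbm and not at_max_gain:
        return "noise_floor_rise"     # gain backed off despite modest RSSI
    return "undetermined"
```

Returning "undetermined" for combinations the text does not describe keeps the sketch from asserting a diagnosis the embodiment does not make.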
[0211] This embodiment employs the aforementioned hardware-level data acquisition strategy, which yields significant advantages. First, by continuously reading and averaging via the SPI bus, the impact of random fluctuations in the wireless channel on the measurement results is suppressed at the source. This ensures that the collected feature data better reflects the long-term trend of the link rather than transient jitter, improving data reliability. Second, the introduction of AGC status as an auxiliary judgment dimension breaks the ambiguity of the traditional single-dimensional RSSI, enabling precise differentiation between the two distinct physical phenomena of "weak signal" and "strong noise." This differentiation capability is crucial for anti-interference diagnosis, as it directly determines subsequent decision-making: for weak signals, transmission power should be increased or routing topology optimized; for increased noise floor, frequency hopping should be implemented to avoid interference bands, thereby greatly improving the system's intelligence in coping with complex electromagnetic environments.
[0212] In the preferred embodiment, the method also includes a containerized deployment step at the edge computing gateway:
[0213] Install the Docker container engine on the edge computing gateway and pull an image containing a Python environment and the Scikit-learn library;
[0214] A global monitoring model is run within a Docker container, and diagnostic results from each sensor node are received via the MQTT protocol.
[0215] The hyperparameters of the K-nearest neighbor algorithm are periodically optimized using global data, and the optimized hyperparameters are then sent to the sensor nodes via OTA (Over-The-Air) upgrade technology.
[0216] The following is a detailed explanation and description of the containerized deployment and global optimization steps of the edge computing gateway in the above technical solution.
[0217] In this embodiment, to compensate for the limitations of the underlying sensor nodes' limited computing resources and field of view, an edge computing gateway is introduced as the "brain" of the local network, and advanced containerization technology is used for software deployment. Specifically, a Docker container engine is first installed on the edge computing gateway's operating system. Docker technology achieves lightweight isolation of the application runtime environment through namespaces and control groups. The system pulls a pre-built Docker image. The base layer of this image contains a simplified Linux file system, while the application layer fully encapsulates the Python interpreter runtime environment and the Scikit-learn machine learning algorithm library. This deployment method not only solves the complex dependency library compatibility issues but also decouples the algorithm environment from the underlying hardware, allowing the gateway software to be seamlessly ported to hardware platforms with different architectures, greatly reducing the maintenance costs of on-site deployment and subsequent version iterations.
[0218] Inside the Docker container, a global monitoring model runs, establishing communication connections with various sensor nodes distributed across the GIS monitoring area via the MQTT (Message Queuing Telemetry Transport) protocol. The MQTT protocol employs a publish-subscribe mechanism, making it well-suited for use in low-bandwidth, unstable network industrial environments. The edge gateway, acting as an MQTT broker or subscriber, receives real-time link status diagnostic results, characteristic statistics, and some difficult-to-determine sample data uploaded by each sensor node. This data converges at the gateway, forming a global dataset covering the entire network's spatiotemporal dimensions. This gives the gateway a more macroscopic global perspective than a single node, enabling it to capture regional interference drift or systemic fault trends that are difficult to detect with single-point monitoring.
[0219] Based on the aggregated global data, a Python program running within the container periodically triggers the hyperparameter optimization process of the K-nearest neighbor algorithm. The K-nearest neighbor model on the sensor nodes typically uses preset, fixed hyperparameters, such as the number of neighbors K or the time decay coefficient, and these parameters may no longer be optimal after drastic environmental changes. The edge gateway uses the grid search or random search algorithms of the Scikit-learn library to retrain and validate these hyperparameters on the global dataset, seeking the parameter combination that maximizes the current global classification accuracy. For example, when high-density noise is detected across the entire network, the algorithm might select a larger K to enhance noise resistance; when the pace of environmental change accelerates, the algorithm adjusts the time decay coefficient to a larger value, which accelerates the forgetting of historical data.
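A minimal sketch of such a search over K and the time decay coefficient follows. To stay dependency-free, a plain-Python exhaustive search with leave-one-out validation stands in for the Scikit-learn grid search utilities named above. The way the decay enters the distance (older samples appear farther away) and the `(features, label, timestamp)` sample layout are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import product


def knn_predict(train, query, query_time, k, decay):
    """Majority vote among the k nearest samples.

    Assumption: each distance is divided by exp(-decay * age), so older samples
    look farther away. The source's exact formula is not reproduced here.
    """
    scored = []
    for features, label, timestamp in train:
        age = abs(query_time - timestamp)
        scored.append((math.dist(features, query) / math.exp(-decay * age), label))
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def grid_search(samples, k_grid, decay_grid):
    """Return (accuracy, k, decay) with the best leave-one-out accuracy."""
    best = None
    for k, decay in product(k_grid, decay_grid):
        hits = 0
        for i, (features, label, timestamp) in enumerate(samples):
            rest = samples[:i] + samples[i + 1:]
            hits += knn_predict(rest, features, timestamp, k, decay) == label
        accuracy = hits / len(samples)
        if best is None or accuracy > best[0]:
            best = (accuracy, k, decay)
    return best
```

Ties are resolved in favor of the first combination tried, so the order of `k_grid` and `decay_grid` acts as a preference for smaller values when accuracies are equal.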
[0220] Finally, the optimized hyperparameters are sent to the sensor nodes via OTA (Over-The-Air) updates, completing the closed-loop control of "perception-upload-optimization-deployment". The edge gateway broadcasts control messages containing the new parameter configurations to all relevant nodes via the downlink channel of the MQTT protocol. Upon receiving the update command, the sensor nodes do not need to be reset or restarted; they directly update the K-nearest neighbor algorithm configuration variables in memory and immediately apply the new hyperparameters for subsequent link diagnostics.
[0221] In the preferred embodiment, the data storage and visualization steps include:
[0222] Deploy an InfluxDB time-series database on the edge computing gateway to store link characteristic data with timestamps;
[0223] Grafana is used to connect to InfluxDB to render a network topology heatmap of the GIS monitoring area, displaying the dynamic trajectory of the disturbed area.
[0224] In the preferred scheme, to address the waveguide effect of the GIS metal cavity, the weighted distance calculation formula in S3 adds a sixth dimension, "multipath delay spread":
[0225] Multipath delay spread characteristics are obtained by measuring the full width at half maximum (FWHM) of the correlation peak of the preamble in the wireless signal;
[0226] When calculating the weighted distance, a weight of 0.1 is assigned to the multipath delay spread feature, while the weights of the original five features are each reduced by 0.03 to keep the total weight sum at 1.
[0227] The following is a detailed explanation and description of the data storage visualization and optimization schemes for specific GIS scenarios in the above technical solutions.
[0228] In this embodiment, to achieve long-term storage and intuitive display of massive communication link characteristic data, the system deploys a complete data storage and visualization component at the edge computing gateway. Specifically, the system selects InfluxDB time-series database as the core storage engine. InfluxDB is designed specifically for processing timestamped measurement data, possessing extremely high data write throughput and efficient compression storage algorithms. The edge gateway writes the link characteristic data, diagnostic results, and corresponding sampling timestamps uploaded by each sensor node into the InfluxDB database in a time-series format. This storage method not only preserves the instantaneous state of the data but also comprehensively records the dynamic history of communication quality evolution over time within the GIS monitoring area, providing a solid data foundation for subsequent trend analysis and fault backtracking.
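As an illustration of how a timestamped link record might be prepared for InfluxDB, the sketch below formats one point in InfluxDB line protocol (measurement, optional tags, fields, timestamp). The measurement name, tag keys, and field keys are hypothetical; the embodiment does not specify a schema.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp."""

    def esc_key(text):
        # Commas, spaces and equals signs are backslash-escaped in tag/field keys
        # and tag values.
        return str(text).replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")

    def fmt_field(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"  # integer fields carry an "i" suffix
        if isinstance(value, float):
            return repr(value)
        return '"' + str(value).replace('"', r'\"') + '"'  # string field

    name = str(measurement).replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(f",{esc_key(k)}={esc_key(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{esc_key(k)}={fmt_field(v)}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical record: node n07 on channel 15, diagnosed as state label 2.
line = to_line_protocol(
    "link_quality",
    {"node": "n07", "channel": "15"},
    {"rssi_mean": -71.5, "state": 2},
    1700000000000000000,
)
```

Storing the node and channel as tags rather than fields is a design choice in this sketch: it would let a dashboard group or filter by node when building a per-area view.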
[0229] To transform abstract database records into a monitoring view that operations and maintenance personnel can intuitively understand, this embodiment utilizes the Grafana visualization platform connected to InfluxDB. Grafana extracts link status data for a specific time period from the database using its query language and uses its rendering engine to create a network topology heatmap of the GIS monitoring area. In this heatmap, different colored blocks represent different areas of communication quality or interference intensity; for example, red represents areas with strong interference, and green represents areas with good communication. The heatmap dynamically updates over time, thus forming a dynamic trajectory of the interference area. This dynamic trajectory visually demonstrates the movement path or diffusion trend of the interference source within the GIS substation. For example, it clearly shows whether electromagnetic interference generated by partial discharge spreads to adjacent chambers over time, thereby assisting operations and maintenance personnel in quickly locating the source of interference.
[0230] Furthermore, considering the unique physical structure of GIS equipment, namely its enclosed metal cavity structure, this embodiment proposes a preferred algorithm improvement scheme. The metal casing of GIS acts as a natural waveguide for radio waves, causing severe reflection and refraction of signals during transmission within the cavity, resulting in significant multipath effects. To quantify the impact of this physical phenomenon on communication quality, the improved K-nearest neighbor classification model in step S3 adds a sixth feature, multipath delay spread, to the original five-dimensional features. Multipath delay spread reflects the dispersion of the time difference between the arrival of a wireless signal at the receiver via different paths, and is a key physical quantity for distinguishing between direct signals and multipath reflected signals.
[0231] This embodiment employs a method based on physical layer preamble analysis to obtain the multipath delay spread feature. The physical layer header of a wireless communication frame typically contains a fixed preamble sequence for synchronization. After receiving the signal, the receiving node performs a correlation operation on the preamble to obtain the correlation peak. In an ideal multipath-free environment, the correlation peak should be a sharp pulse; in a GIS cavity with severe multipath effects, however, the correlation peak broadens because signals with different delays are superimposed. The system numerically characterizes the degree of multipath delay spread by measuring the full width at half maximum (FWHM) of the correlation peak of the wireless signal preamble, i.e., the width of the peak at half its height. A larger FWHM value indicates a more severe multipath effect.
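The width measurement can be sketched as follows, assuming the correlator output is available as a list of non-negative magnitudes sampled at a fixed interval `dt` and containing a single dominant peak. The function name and the use of linear interpolation at the half-height crossings are illustrative choices, not details taken from the embodiment.

```python
def fwhm(samples, dt=1.0):
    """Full width at half maximum of a single-peaked sequence, in units of dt."""
    peak = max(range(len(samples)), key=samples.__getitem__)
    half = samples[peak] / 2.0

    # Walk outward from the peak while the samples stay at or above half height.
    left = peak
    while left > 0 and samples[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(samples) - 1 and samples[right + 1] >= half:
        right += 1

    def crossing(inside, outside):
        # Fractional index where the straight line between two samples hits `half`.
        y_in, y_out = samples[inside], samples[outside]
        return inside + (outside - inside) * (y_in - half) / (y_in - y_out)

    x_left = crossing(left, left - 1) if left > 0 else float(left)
    x_right = crossing(right, right + 1) if right < len(samples) - 1 else float(right)
    return (x_right - x_left) * dt
```

A sharp peak such as `[0, 0, 1, 2, 1, 0, 0]` yields a smaller width than a broadened one such as `[0, 1, 2, 2, 2, 1, 0]`, which matches the interpretation that a wider peak indicates stronger multipath.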
[0232] To integrate this new feature into the classification model, the system adjusts the weighted distance calculation formula and reallocates the feature weights. The new distance formula is expanded to a weighted sum over six feature dimensions. In the weight allocation, the multipath delay spread feature is assigned a weight of 0.1, meaning that it accounts for 10% of the decision weight when determining the link state. To maintain the normalization condition that all feature weights sum to 1, the system reduces the weights of the original five features accordingly, lowering each by 0.03. After the adjustment, the weight of the mean physical layer received signal strength becomes 0.22; the weights of the variance of physical layer received signal strength, the link quality indicator volatility, and the forward error correction coding bit error rate each become 0.17; and the weight of the channel idle assessment count becomes 0.12. The adjusted weight combination introduces a new physical dimension while maintaining the relative balance of the original feature system.
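The weight arithmetic can be checked directly. The sketch below applies the stated reduction of 0.03 to the five original weights and appends the 0.1 weight for multipath delay spread. With these figures the six weights total 0.95 rather than 1, so the sketch rescales them by their sum; that rescaling step is an assumption added here, and the dictionary keys are hypothetical names.

```python
BASE_WEIGHTS = {
    "rssi_mean": 0.25,
    "rssi_variance": 0.20,
    "lqi_volatility": 0.20,
    "fec_bit_error_rate": 0.20,
    "channel_idle_count": 0.15,
}


def extend_weights(base, new_name, new_weight, reduction):
    """Subtract `reduction` from every base weight and append the new feature."""
    extended = {name: round(w - reduction, 2) for name, w in base.items()}
    extended[new_name] = new_weight
    return extended


six_dim = extend_weights(BASE_WEIGHTS, "multipath_delay_spread", 0.10, 0.03)
total = sum(six_dim.values())  # 0.22 + 3 * 0.17 + 0.12 + 0.10

# Assumed step: rescale so that the six weights sum to 1 exactly.
normalized = {name: w / total for name, w in six_dim.items()}
```

Rescaling preserves the ratios between the stated weights; an alternative that also reaches a sum of 1 would be a 0.15 weight for the new feature, but the source states 0.1.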
[0233] This embodiment employs the aforementioned data storage visualization and targeted optimization strategies, yielding significant benefits. First, the introduction of InfluxDB and Grafana creates a visualized operation and maintenance interface, making hidden electromagnetic interference visible to the naked eye. Operation and maintenance personnel can intuitively determine whether the interference originates from a fixed or mobile source through the dynamic trajectory of the heatmap, greatly improving operational efficiency. Second, the addition of multipath delay spread characteristics to address the GIS waveguide effect effectively solves the problem of inaccurately judging link quality based solely on signal strength within a metal cavity. In GIS environments, a special phenomenon often occurs where the received signal strength is very high (because the metal reflection prevents signal leakage) but the bit error rate is extremely high (due to severe multipath self-interference). The introduction of multipath delay spread characteristics can accurately identify this "pseudo-strong signal" state, distinguishing communication blockage caused by multipath effects from external noise floor rise interference. This guides the system to adopt more appropriate adaptive strategies, such as reducing the transmission rate to combat multipath fading, rather than incorrectly increasing the transmission power.
[0234] In the preferred scheme, the adaptive frequency selection networking strategy incorporates an energy balancing algorithm:
[0235] Let the remaining battery voltage of the candidate next-hop node be $V_i$ and the average remaining battery voltage of all network nodes be $V_{\text{avg}}$;
[0236] When $V_i < 0.8\,V_{\text{avg}}$, a correction coefficient $\alpha = V_i / V_{\text{avg}}$ is introduced, and the voting weight of this node is adjusted to $w' = \alpha \cdot w$;
[0237] When $V_i \ge 0.8\,V_{\text{avg}}$, the correction coefficient is $\alpha = 1$ and the weight remains unchanged; network energy holes are prevented by reducing the probability that low-power nodes are selected as relays.
[0238] The following is a detailed explanation and description of the adaptive frequency selection networking strategy combined with the energy balancing algorithm in the above technical solution.
[0239] In this embodiment, the adaptive frequency selection networking strategy does not route solely on the physical quality of the communication link; it also integrates an energy balancing algorithm to address the "hotspot effect" common in wireless sensor networks. When selecting the next-hop relay node, the system introduces the node's remaining energy as a key constraint variable. Specifically, the system acquires the remaining battery voltage of each candidate next-hop node in real time, denoted $V_i$. It also obtains the average remaining battery voltage of all surviving nodes in the current network, through network-wide broadcasting or gateway statistics, denoted $V_{\text{avg}}$.
[0240] To determine whether a node is in a "low battery" state, this embodiment sets a relative threshold coefficient of 0.8. The system compares the candidate node's remaining battery voltage $V_i$ with 0.8 times the network-wide average voltage. When $V_i < 0.8\,V_{\text{avg}}$, the node's remaining energy is significantly below the network average, and continuing to load it with heavy data relay tasks could easily deplete its power prematurely and cause it to fail. The algorithm therefore introduces a correction coefficient $\alpha$ to penalize the probability of this node being selected. The correction coefficient is calculated as $\alpha = V_i / V_{\text{avg}}$;
[0241] Because $V_i < 0.8\,V_{\text{avg}}$, the calculated $\alpha$ must be a positive decimal less than 0.8. The system then uses this correction coefficient to adjust the node's original voting weight $w$ in the routing election, according to $w' = \alpha \cdot w$. In this formula, $w$ represents the raw score calculated solely from link quality (such as RSSI and SNR), while $w'$ represents the final score after energy weighting. Because $\alpha < 1$, the final score $w'$ is significantly lowered, putting the low-power node at a disadvantage when competing to be the best relay and steering the routing algorithm toward neighboring nodes with slightly worse link quality but more energy.
[0242] Conversely, when $V_i \ge 0.8\,V_{\text{avg}}$, the node's energy reserves are at a normal or superior level, sufficient to handle routine relay loads. The system then sets the correction coefficient $\alpha$ to the constant 1, leaving the node's original voting weight unchanged, that is, $w' = w$. This ensures that, when energy is sufficient, routing still strictly follows the principle of prioritizing communication link quality, thereby guaranteeing reliable data transmission and low latency.
[0243] This embodiment employs the aforementioned energy balancing algorithm, which offers significant advantages. Firstly, by introducing a dynamic threshold based on the average voltage of the entire network, the algorithm adapts to different power levels throughout the network's lifecycle, avoiding the failure of fixed voltage thresholds in the early or late stages of the network. Secondly, the correction coefficient adjusts the weights of low-power nodes softly rather than removing them outright, which preserves the possibility of maintaining connectivity through low-power nodes in extreme cases (such as when no other path is available) and thus improves the network's resilience. Thirdly, and most importantly, it effectively prevents the formation of "energy holes." Energy holes are usually caused by nodes near the sink node or on critical paths dying prematurely due to excessive data forwarding. By actively reducing the load on low-power nodes, the algorithm distributes energy consumption more uniformly across the entire network, thereby significantly extending the effective lifespan of the entire GIS wireless monitoring system.
[0244] Example 3
[0245] This example is further described in conjunction with Example 1. As shown in Figures 1-2, this embodiment provides a sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm for monitoring gas-insulated fully enclosed switchgear (GIS). The system mainly consists of several embedded wireless sensor nodes deployed in the GIS metal cavity environment and an edge computing gateway.
[0246] At the sensor node, the hardware consists of an embedded microcontroller paired with a wireless RF chip, connected via a high-speed SPI bus. The software runs the FreeRTOS real-time operating system and creates a high-priority link diagnostic task. When the wireless RF chip receives a signal, it triggers a hardware receive interrupt. The interrupt service routine reads the RF chip's RSSI and LQI registers several times in succession via the SPI bus and calculates the average. It also reads the AGC status register to obtain the front-end gain level and measures the full width at half maximum (FWHM) of the correlation peak of the signal preamble to obtain the multipath delay spread feature. These physical layer status words are then packaged and written to a message queue, waking up the link diagnostic task.
[0247] The link diagnostic task processes the collected data based on a set time sliding window (window size 5 to 10 seconds, sliding step size 2 to 3 seconds), extracting 6-dimensional raw data of communication link characteristics, including the mean of physical layer received signal strength, variance of physical layer received signal strength, link quality indicator volatility, forward error correction coding bit error rate, channel idle assessment count, and multipath delay spread. Each dimension of the data is then Z-score normalized and mapped to a unified dimension space. To achieve efficient transmission, this normalized feature data is encoded with a high compression ratio and written into the 8-byte reserved field at the end of the DAO or DIO control message of the RPL routing protocol.
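The windowing and standardization step can be sketched as below for a single feature dimension. The sketch assumes time-sorted `(timestamp, value)` pairs, uses the population standard deviation, and keeps the data unchanged when the standard deviation is zero; the function names and default window parameters (5 s window, 2 s step) are illustrative.

```python
from statistics import fmean, pstdev


def sliding_windows(samples, window_s=5.0, step_s=2.0):
    """Yield the values falling in each window over time-sorted (t, value) pairs."""
    if not samples:
        return
    start, last = samples[0][0], samples[-1][0]
    while start + window_s <= last:
        yield [v for t, v in samples if start <= t < start + window_s]
        start += step_s


def zscore_window(values):
    """Z-score one feature dimension within a window; unchanged if the std is zero."""
    mu = fmean(values)
    sigma = pstdev(values, mu)  # assumption: population standard deviation
    if sigma == 0:
        return list(values)
    return [(v - mu) / sigma for v in values]
```

After this step every dimension has zero mean and unit variance within the window, so features with different physical units contribute on a comparable scale to the distance calculation.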
[0248] The node has a pre-built dynamic anti-interference training sample library. When the library is initially built, it is necessary to collect 7 to 15 days of data in the GIS monitoring area and manually label it as five types of environmental status labels: normal communication, specific frequency band narrowband interference, GIS switch action impact interference, continuous multipath fading blockage, and RF front-end hardware saturation. The library maintenance adopts an online incremental learning and forgetting mechanism based on timestamps and reference frequency.
[0249] The link diagnostic task inputs the standardized current feature vector into the improved K-nearest neighbor classification model for real-time computation. First, the distance between the current vector and each historical sample in the sample library is calculated using a weighted Euclidean distance formula with a time decay coefficient (0.01 to 0.1). The weights of the six feature dimensions are distributed as follows: the multipath delay spread feature has a weight of 0.1, and the weights of the other five features are each reduced by 0.03 to maintain a total weight of 1. Then, based on the sample density within a preset radius around the sample to be classified, and according to a high-density threshold of 0.8 and a low-density threshold of 0.3, the optimal K value is adaptively selected between 5 and 30. Next, a Gaussian kernel function with a bandwidth of 1.0 is used to perform weighted voting over the selected K nearest neighbor samples. If the cumulative weight of the dominant category exceeds 1.2 times the sum of the cumulative weights of all other categories, that category is taken as the final state; otherwise, five more samples are added and the calculation is repeated.
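The distance and voting steps above can be sketched as follows. Several details are assumptions made for illustration: the time decay is applied by dividing the weighted Euclidean distance by exp(-decay * age), the Gaussian kernel takes the form exp(-d^2 / (2 * bandwidth^2)), K is passed in rather than derived from sample density, and the loop stops once all samples are used even if the 1.2 margin is never reached.

```python
import math
from collections import defaultdict


def decayed_distance(x, sample, weights, now, decay):
    """Weighted Euclidean distance, inflated for older samples (assumed form)."""
    features, _, timestamp = sample
    base = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, features)))
    return base / math.exp(-decay * (now - timestamp))


def classify(x, library, weights, now, k, decay=0.05, bandwidth=1.0,
             margin=1.2, step=5):
    """Gaussian-kernel vote; K grows by `step` until one class clears the margin.

    library: list of (features, label, timestamp). Returns (label, k_used).
    """
    ranked = sorted(
        (decayed_distance(x, s, weights, now, decay), s[1]) for s in library
    )
    while True:
        k = min(k, len(ranked))
        totals = defaultdict(float)
        for dist, label in ranked[:k]:
            totals[label] += math.exp(-dist ** 2 / (2 * bandwidth ** 2))
        best = max(totals, key=totals.get)
        others = sum(totals.values()) - totals[best]
        # Assumed stopping rule: accept the leader once every sample has voted.
        if totals[best] > margin * others or k == len(ranked):
            return best, k
        k += step
```

Returning the value of K that was actually used makes it visible when the margin rule forced the neighborhood to grow, which may be useful when tuning the margin or the step.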
[0250] Based on the determined link state category, the node triggers a tiered response strategy. If it is determined to be narrowband interference in a specific frequency band, an adaptive frequency selection networking strategy is triggered, adding the current channel to a temporary blacklist with an automatic clearing mechanism, and invoking the RF driver to switch to the backup channel in a pseudo-random sequence. During routing or switching, combined with an energy balancing algorithm, if the remaining battery voltage of a candidate node is less than 0.8 times the average of the entire network, its voting weight is reduced proportionally to prevent energy depletion. If it is determined to be RF front-end hardware saturation, it is defined as a Level 1 fault, and an emergency maintenance request is sent through the out-of-band low-power Bluetooth channel. If it is determined to be persistent multipath fading blocking, it is defined as a Level 2 fault, and the backup routing table is activated, with recovery status checked every 3 sliding windows. If it is determined to be GIS switch action impact interference, it is defined as a Level 3 transient event, only logged, but if detected 3 times consecutively, it is upgraded to a Level 2 fault. If it is determined to be normal communication, the current connection is maintained and correct samples are fed back to the sample database.
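The tiered response can be sketched as a small state-to-action mapping with a counter for consecutive switching transients. The label numbering follows the five environmental status labels; the class name, the action strings, and the choice to report escalation as a distinct action are hypothetical, and the actions themselves are left as stubs.

```python
from enum import IntEnum


class LinkState(IntEnum):
    NORMAL = 1
    NARROWBAND = 2
    SWITCH_TRANSIENT = 3
    MULTIPATH_BLOCK = 4
    FRONTEND_SATURATION = 5


class ResponsePlanner:
    """Maps each diagnosed state to an action name (actions are stubs)."""

    def __init__(self):
        self.transient_streak = 0

    def plan(self, state):
        # Count consecutive windows classified as switching transients.
        if state == LinkState.SWITCH_TRANSIENT:
            self.transient_streak += 1
        else:
            self.transient_streak = 0

        if state == LinkState.FRONTEND_SATURATION:
            return "level1_ble_maintenance_request"
        if state == LinkState.MULTIPATH_BLOCK:
            return "level2_activate_backup_route"
        if state == LinkState.SWITCH_TRANSIENT:
            if self.transient_streak >= 3:
                return "level2_escalated_from_transient"
            return "level3_log_only"
        if state == LinkState.NARROWBAND:
            return "blacklist_channel_and_hop"
        return "keep_link_and_store_sample"
```

Feeding three consecutive `SWITCH_TRANSIENT` results into one planner yields two log-only actions followed by the escalation, and any other state in between resets the streak.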
[0251] At the edge computing gateway, Docker containerization is employed. The gateway receives the diagnostic results and feature data uploaded by each sensor node via the MQTT protocol and stores the timestamped data in the InfluxDB time-series database. Grafana is then connected to the database to render a network topology heatmap of the GIS monitoring area that displays the dynamic trajectory of interference. At the same time, the Python global monitoring model running within the container periodically optimizes the hyperparameters of the K-nearest neighbor algorithm using the aggregated global data and distributes the optimized parameters to each sensor node via OTA (Over-The-Air) updates, achieving system-level collaborative optimization.
[0252] The above embodiments are merely preferred technical solutions of the present invention and should not be considered as limitations on the present invention. The scope of protection of the present invention should be limited to the technical solutions described in the claims, including equivalent substitutions of the technical features described in the claims. That is, equivalent substitutions and improvements within this scope are also within the scope of protection of the present invention.
Claims
1. A method for anti-interference diagnosis of dynamic networking of GIS sensors based on the K-nearest neighbor algorithm, characterized in that the method includes:
S1. Within a time sliding window with a set size and sliding step, collect raw data of multidimensional communication link characteristics between target sensor nodes and neighboring nodes within the GIS monitoring area, and perform Z-score standardization on the raw data of multidimensional communication link characteristics to map the data to a unified dimension space. The raw data of multidimensional communication link characteristics collected in S1 includes parameters in 5 dimensions: mean physical layer received signal strength, variance of physical layer received signal strength, link quality indicator volatility, forward error correction coding bit error rate, and channel idle assessment count. The specific steps of Z-score normalization in S1 are as follows: for each dimension of feature data $x$, perform the conversion using the formula $z = (x - \mu)/\sigma$, where $\mu$ is the mean of that dimension's feature within the time sliding window and $\sigma$ is the standard deviation of that dimension's feature; if $\sigma = 0$, keep the original data unchanged;
S2. Call the dynamic anti-interference training sample library pre-stored in the local storage unit. The dynamic anti-interference training sample library contains historical feature vector data with environmental state labels and has an online incremental learning and forgetting mechanism;
S3. Input the standardized multidimensional communication link feature data into the improved K-nearest neighbor classification model. By introducing the weighted Euclidean distance calculation formula with a time decay factor, calculate the weighted distance between the data and the samples in the dynamic anti-interference training sample library. Use an adaptive strategy based on sample density to select the K nearest neighbor samples, and use Gaussian kernel function weighted voting to determine the current communication link state category. The sample density is calculated within a circle of preset radius centered on the sample to be classified in the sample space, and is determined from the number of samples within the radius;
S4. Based on the determined communication link status category, trigger targeted adaptive frequency selection networking strategies or hierarchical fault early warning operations to achieve anti-interference diagnosis and adaptive adjustment of GIS sensor dynamic networking.
2. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 1, characterized in that: The size of the time sliding window is set to 5 to 10 seconds, and the sliding step size is set to 2 to 3 seconds.
3. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 2, characterized in that: in S3, the improved K-nearest neighbor classification model executes the following algorithm logic when calculating the weighted distance: Define the current feature vector and the feature vector of the $i$-th sample in the sample library, together with the sampling timestamp of sample $i$ and the current timestamp; First, calculate the time decay weight of sample $i$ using the time decay coefficient, which takes a value from 0.01 to 0.1; Then, calculate the weighted Euclidean distance with the time decay factor introduced, in which each feature dimension has its own weight and the weights sum to 1; The weights are allocated as follows: mean physical layer received signal strength 0.25, variance of physical layer received signal strength 0.2, link quality indicator volatility 0.2, forward error correction coding bit error rate 0.2, and channel idle assessment count 0.15.
4. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 1, characterized in that: The selection of the K nearest neighbor samples in S3 adopts an adaptive selection strategy: Set a high density threshold of 0.8 and a low density threshold of 0.3; the K value is calculated by a separate formula for each of three cases, namely when the sample density exceeds the high density threshold, when it lies between the two thresholds, and when it falls below the low density threshold, where the symbol $\lfloor \cdot \rfloor$ indicates rounding down to the nearest integer.
5. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 3, characterized in that: S3 includes weighted voting based on the Gaussian kernel function, which includes: For the selected K nearest neighbor samples, the Gaussian kernel function is used to calculate the voting weight of the $i$-th nearest neighbor sample from its weighted distance, where the Gaussian kernel bandwidth is set to 1.0. Calculate the cumulative weight of each category as the sum of the voting weights over the set of nearest neighbor samples belonging to that category; If the cumulative weight of a certain category is greater than 1.2 times the cumulative weight of all other categories, then that category is determined to be the final communication link status category; if the condition is not satisfied, then 5 more samples are added and the calculation is repeated until the threshold is met.
6. The GIS sensor dynamic networking anti-interference diagnosis method based on K-nearest neighbor algorithm according to claim 1, characterized in that: S2 also includes an initial construction and maintenance mechanism for the dynamic anti-interference training sample library, the steps of which are: Initial construction process: Collect link feature data continuously for 7 to 15 days within the GIS monitoring area, process it through Z-score standardization, manually label the environmental status, and remove outliers that deviate by more than 3 standard deviations. The environmental status labels include 5 categories: label 1 is normal communication, label 2 is narrowband interference, label 3 is GIS switch action impact interference, label 4 is persistent multipath fading blockage, and label 5 is RF front-end hardware saturation. Online incremental learning and forgetting mechanism: When the category determined by S3 is confirmed to be correct, the new sample is stored in the sample library; when the storage reaches its limit, samples are removed according to their timestamp and reference frequency; the reference frequency is the number of times a sample is selected as a nearest neighbor sample across determinations, and this count is cleared and recalibrated at the end of each month.
7. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 1, characterized in that: The adaptive frequency selection networking strategy triggered in S4 includes: When the communication link status category is determined to be label 2, i.e., narrowband interference, the current wireless channel number will be added to the temporary blacklist. The radio frequency driver is invoked to switch to a backup channel not in the temporary blacklist according to a pseudo-random frequency hopping sequence. The temporary blacklist has an automatic removal mechanism. If a channel that has been added to the list does not detect narrowband interference again within 5 consecutive time sliding windows, it will be automatically removed from the temporary blacklist.
8. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 1, characterized in that: The actions that trigger graded fault warnings in S4 include: If the communication link status category is determined to be label 5, i.e., RF front-end hardware saturation, it is defined as a level 1 fault. A fault code is immediately generated and an emergency maintenance request is sent through the out-of-band low-power Bluetooth channel. If the communication link status category is determined to be label 4, i.e., persistent multipath fading blockage, it is defined as a level 2 fault. The backup routing table is activated, and the link recovery status is checked every 3 time sliding windows. If the communication link status category is determined to be label 3, i.e., GIS switch action impact interference, it is defined as a level 3 transient event. Only a log entry is recorded and no alarm is triggered. However, if the event is detected in 3 consecutive sliding windows, it is upgraded to a level 2 fault.
9. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 2, characterized in that: The steps for collecting raw data on multidimensional communication link characteristics involve modifications to the RPL routing protocol stack: The last 8 bytes of the DAO and DIO control messages in the RPL protocol are selected as reserved fields; The 5-dimensional feature data after Z-score standardization are written into the reserved field in the following order: "mean value of physical layer received signal strength, variance of physical layer received signal strength, volatility of link quality indicator, bit error rate of forward error correction coding, and channel idle assessment count". Each feature dimension occupies 1.6 bytes, and the receiving node extracts feature data strictly according to this byte order when parsing the message.
10. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 1, characterized in that: This method runs on an embedded microcontroller, and the deployment steps include: Run the FreeRTOS real-time operating system on the embedded microcontroller and create a high-priority link diagnostic task; The link diagnostic task subscribes to the wireless driver's receive interrupts via a message queue; When the wireless driver triggers a receive interrupt, it packages the physical layer status word and writes it into the message queue, waking up the link diagnostic task to perform classification calculations in S3.
11. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 10, characterized in that: the method further includes hardware-level data acquisition steps: accessing the RSSI and LQI registers of the wireless RF chip via the SPI bus, reading them N times consecutively, and calculating the average value; accessing the automatic gain control (AGC) status register of the RF chip to obtain the front-end gain level, and analyzing it together with the RSSI mean to distinguish a weak signal from a raised noise floor.
12. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 1, characterized in that: the method further includes containerized deployment steps at the edge computing gateway: installing the Docker container engine on the edge computing gateway and pulling an image containing a Python environment and the Scikit-learn library; running a global monitoring model within a Docker container and receiving the diagnostic results from each sensor node via the MQTT protocol; periodically optimizing the hyperparameters of the K-nearest neighbor algorithm using the global data, and sending the optimized hyperparameters to the sensor nodes via OTA (Over-The-Air) upgrade.
13. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 12, characterized in that: the method further includes data storage and visualization steps: deploying an InfluxDB time-series database on the edge computing gateway to store timestamped link feature data; connecting Grafana to InfluxDB to render a network topology heatmap of the GIS monitoring area, displaying the dynamic trajectory of the interference-affected area.
14. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 3, characterized in that: to account for the waveguide effect of GIS metal cavities, a sixth feature dimension, "multipath delay spread", is added to the weighted distance calculation formula in S3: the multipath delay spread feature is obtained by measuring the full width at half maximum (FWHM) of the correlation peak of the preamble in the wireless signal; when calculating the weighted distance, a weight is assigned to the multipath delay spread feature, while the weights of the original 5 feature dimensions are each reduced by 0.03 so that the weights still sum to 1.
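The weight adjustment in claim 14 is simple arithmetic: if the five original weights sum to 1, reducing each by 0.03 frees 5 × 0.03 = 0.15, which is what remains for the multipath delay spread feature. The Python sketch below illustrates this under stated assumptions: the base weights are made-up example values (the actual weights come from the formula in S3), and a weighted Euclidean distance is assumed as the distance form. The function names are hypothetical.

```python
import math


def extend_weights(base_weights, reduction=0.03):
    """Reduce each of the 5 original weights and give the remainder to a 6th."""
    if len(base_weights) != 5:
        raise ValueError("expected exactly 5 base weights")
    reduced = [w - reduction for w in base_weights]
    if min(reduced) < 0:
        raise ValueError("reduction would make a weight negative")
    # Whatever is freed by the reduction goes to the multipath delay spread.
    sixth = 1.0 - sum(reduced)
    return reduced + [sixth]


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance (an assumed form, not the patent's S3 formula)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


if __name__ == "__main__":
    base = [0.30, 0.20, 0.20, 0.15, 0.15]   # hypothetical values summing to 1
    weights = extend_weights(base)
    sample = [0.1, -0.4, 1.2, 0.0, 0.7, 2.0]   # 6th entry: delay spread (Z-score)
    neighbor = [0.0, -0.5, 1.0, 0.2, 0.5, 0.5]
    print(weights, weighted_distance(sample, neighbor, weights))
```

Giving the sixth feature the full freed share keeps the distance on the same scale as before, so thresholds tuned for the 5-dimensional model do not need to be rescaled solely because of the added dimension.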
15. The GIS sensor dynamic networking anti-interference diagnosis method based on the K-nearest neighbor algorithm according to claim 7, characterized in that: the adaptive frequency-selective networking strategy incorporates an energy balancing algorithm: let the remaining battery voltage of the candidate next-hop node be … and the average remaining battery voltage of all network nodes be …; when …, a correction factor … is introduced and the voting weight of this node is adjusted to …; when …, the correction factor is … and the weight remains unchanged; network energy holes are prevented by reducing the probability that low-power nodes are selected as relays.