IoT sensing data diagnosis method based on cloud-edge-device collaboration

By employing a cloud-edge-device collaborative IoT data diagnostic method, and utilizing hierarchical computing and self-supervised SVM and XGBoost algorithms, the problems of false triggering and false alarms in IoT systems are solved, achieving efficient, real-time data diagnostics and adaptive updates.

CN116842452B · Active · Publication Date: 2026-06-12 · ZHEJIANG E VISION ELECTRONICS TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHEJIANG E VISION ELECTRONICS TECH
Filing Date
2022-12-07
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing IoT systems are prone to false triggers and false alarms during data diagnostics, and the large amount of data transmission places high demands on server hardware resources and network bandwidth, making it difficult to maintain diagnostic quality and efficiency.

Method used

By adopting a cloud-edge-device collaborative approach, the system performs layered computing through bottom-level data preprocessing and filtering, middle-level data diagnosis, and upper-level diagnostic quality supervision. It utilizes embedded microcontrollers, ARM/X86 architecture chips, and servers for hierarchical computing, and combines SVM models and XGBoost algorithms for data filtering and diagnostic quality supervision, thereby achieving self-supervision and model retraining.

Benefits of technology

It effectively reduces false triggering and false alarms in IoT systems, reduces the computational pressure on servers, ensures diagnostic quality, and adapts in real time to changes in the monitored objects.

✦ Generated by Eureka AI based on patent content.

Abstract

An embodiment of the present application discloses an IoT sensing data diagnosis method based on cloud-edge-device collaboration. The method includes: screening the IoT sensing data on a bottom-layer circuit board based on a screening surface; diagnosing the screening results on a middle-layer edge host using an SVM model, to judge whether the state of the detected object corresponding to the IoT sensing data is normal; judging, on an upper-layer server, whether the diagnostic quality meets the requirements by using the XGBoost algorithm, whose mathematical principle differs from that of the SVM model; and, if the diagnostic quality does not meet the requirements, retraining the SVM model and re-determining the bottom-layer screening surface on the upper-layer server, so that the diagnostic system adapts to the latest state of the detected object in time. The application offers automatic diagnosis, self-supervision, self-evaluation, and automatic adjustment. While guaranteeing data diagnosis quality, it greatly reduces false triggering and false alarms in IoT systems and prevents dense, massive acquired data from putting heavy pressure on any layer.

Description

Technical Field

[0001] This invention relates to data diagnostic methods, and more specifically to IoT sensing data diagnostic methods based on cloud-edge-device collaboration.

Background Technology

[0002] The Internet of Things (IoT) system is used to monitor machines, equipment, and processes online, and data is uploaded to the cloud via 4G / 5G networks. This enables functions such as operational status diagnosis, predictive maintenance, and rapid fault location. It has been increasingly widely used in various fields and industries, such as automated steel strip rolling systems, electrical safety monitoring systems, water treatment systems, air purification systems, and electrical fire early warning systems. Because these systems are basically in a continuous operation state, any downtime due to a fault will usually cause significant economic losses.

[0003] During online monitoring, IoT systems generate a large amount of real-time sensing data, which often exhibits the following characteristics: First, the vast majority of the data represents normal operating status, with abnormal data often submerged within this large volume of normal data. Second, IoT sensing data often contains significant background noise and interference signals. The background noise is often Gaussian white noise caused by industrial background noise, while the interference signals are often spike-like signals generated by the sensors themselves. Therefore, IoT sensing data is prone to various false triggers and false alarms. For example, in electrical fire early warning systems, even when there is no actual fire hazard, sensor interference signals may trigger an alarm signal to the fire control room, forcing security guards to urgently check the actual situation on site. In practical applications, due to the impact of false triggers and false alarms, many security guards choose to increase the alarm threshold to reduce the number of alarms. However, this behavior actually violates electrical fire safety standards and regulations, constituting a potential safety hazard.

[0004] Diagnosing IoT sensing data typically involves using AI algorithms, such as multi-layer neural networks, to determine whether the data is normal or abnormal. This process involves collecting a large amount of real-world data, manually labeling each data point as normal or abnormal, and then using the labeled dataset to train multi-layer neural networks and other algorithm models. Finally, the algorithm is run on servers, including both on-site servers and cloud servers, to achieve data diagnosis. While this method has proven effective, it still has drawbacks. In practical applications, all data is transmitted to the server, and all AI algorithms are executed on the server. Maintaining the efficiency of data diagnosis therefore places high demands on the server's hardware resources. Furthermore, if cloud servers are used for data diagnosis, sufficiently large network bandwidth is required.

[0005] Therefore, it is necessary to design a new method that significantly reduces false triggering and false alarms in IoT systems, such as electrical fire early warning systems, while ensuring the quality of data diagnosis. The method should avoid putting enormous pressure on any layer due to the dense and massive amount of data collected, and should ensure that the diagnostic system adapts to the latest status of the monitored objects in a timely manner to guarantee diagnostic quality.

Summary of the Invention

[0006] The purpose of this invention is to overcome the shortcomings of the prior art and provide a method, device, computer equipment and storage medium for diagnosing IoT sensing data.

[0007] To achieve the above objectives, the present invention adopts the following technical solution: a cloud-edge-device collaborative IoT sensing data diagnosis method, including bottom-level data preprocessing and data filtering, middle-level data diagnosis, upper-level diagnostic quality supervision, and upper-level model retraining;

[0008] The bottom-level data preprocessing and data filtering are performed by a bottom-layer circuit board with an embedded microcontroller as its core; the middle-level data diagnosis is performed by a middle-layer edge host with an ARM or X86 architecture chip as its core; and the upper-level diagnostic quality supervision and upper-level model retraining are performed by an upper-layer server in the form of a single unit or a cluster.

[0009] The further technical solution is as follows: the underlying data preprocessing and data filtering process includes:

[0010] First, the underlying circuit board collects IoT sensing data and handles missing items within a single collection cycle; records with missing items are deleted.

[0011] Next, singularities are removed from the remaining IoT sensing data: the data is bubble sorted from largest to smallest, and then some of the maximum and minimum values are removed.

[0012] The remaining IoT sensing data is then subjected to noise filtering: Kalman filtering or median filtering is performed on the data, and the data after noise filtering is stored in the storage module of the underlying circuit board.

[0013] Finally, normal data is removed from the remaining IoT sensing data. Using the filtering line or surface issued by the upper-layer server, which is derived from the hyperplane of the middle-layer edge host SVM model, normal data is removed and suspected abnormal data is uploaded to the edge host. The parameter dimension of the filtering line or surface is one lower than the parameter dimension of the middle-layer edge host SVM model hyperplane.

[0014] The further technical solution is as follows: the intermediate layer data diagnosis process includes:

[0015] First, the intermediate layer edge host executes the SVM model to diagnose the screening results; the diagnosed IoT data is classified into two categories: normal and abnormal, to determine whether the corresponding detection object is in a normal state; during the execution of the SVM model by the intermediate layer edge host, it only performs calculations according to the predetermined SVM model mathematical formula, and does not undertake the training process of the SVM model; it does not undertake the derivation and optimization process of the SVM model mathematical formula.

[0016] Next, the middle-layer edge host randomly selects a portion of its diagnosed data and diagnostic results and uploads it to the upper-layer server according to the dynamic sampling ratio specified by the upper-layer server.

[0017] The further technical solution is as follows: the specific process of the upper-level diagnostic quality supervision is as follows:

[0018] First, using the XGBoost classification algorithm, which has a completely different mathematical principle from the SVM model, the data uploaded by the intermediate layer edge host is classified as normal or abnormal.

[0019] Secondly, the XGBoost calculation results are compared with the SVM model diagnostic results: taking the XGBoost calculation results as the standard values, the number of diagnostic results of the intermediate layer edge host SVM model that differ from the standard values is determined.

[0020] Next, the error rate is calculated based on the stated quantity to determine the diagnostic quality;

[0021] The diagnostic quality is divided into four intervals according to the error rate from small to large: down, maintain, up, and recall. If the error rate is in the down interval, the dynamic sampling ratio of the intermediate layer edge host is reduced by a certain value. If the error rate is in the maintain interval, the dynamic sampling ratio is maintained. If the error rate is in the up interval, the dynamic sampling ratio is increased by a certain value. If the error rate is in the recall interval, the upper layer model is retrained.

[0022] Finally, if the error rate falls within one of the three intervals (down, maintain, or up), the intermediate layer edge host is instructed to output the diagnostic results.

A further technical solution is as follows: the upper-layer model retraining includes two parts: SVM model retraining and re-determining the screening line / screening surface. The specific process is as follows:

[0023] First, configure the raw data, that is, send a command to the underlying circuit board to upload all the data stored therein, after missing item processing, singular point removal and noise filtering, to the cloud platform via communication.

[0024] Secondly, the retrieved original data is clustered into two categories using the K-Means clustering algorithm. The category with more data is labeled as normal, and the category with fewer data is labeled as abnormal. These two categories of data are then used as the sample set for retraining the SVM model.

[0025] Next, the SVM model is trained using the training sample set, that is, the hyperplane of the SVM model is re-found in order to obtain the core parameters of the SVM model.

[0026] Then, the latest SVM model hyperplane is traversed to find its limit value in each parameter dimension, and a screening line / screening surface whose parameter dimension is one lower than that of the hyperplane is constructed from these limit values.

[0027] Finally, the core parameters of the SVM model are sent to the middle-layer edge host to update its internal SVM model; the core parameters of the screening line / screening surface are sent to the bottom-layer circuit board to update its internal screening line / screening surface.
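The K-Means labeling step of the retraining process can be illustrated with a minimal sketch. It assumes each record is a tuple of numeric parameters and uses a plain two-cluster K-Means with a deterministic start (the first point and the point farthest from it); the function name `two_means_labels`, the initialization, and the iteration cap are illustrative assumptions, not details given in the patent.

```python
def two_means_labels(points, iters=20):
    """Cluster points into two groups; label the larger group +1 (normal)
    and the smaller group -1 (abnormal)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Assumed initialization: first point and the point farthest from it.
    c0 = tuple(points[0])
    c1 = tuple(max(points, key=lambda p: dist2(p, c0)))
    centers = [c0, c1]
    assign = []
    for _ in range(iters):
        # Assign every point to its nearest center.
        assign = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
                  for p in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    # The cluster with more data is treated as normal.
    normal = 0 if assign.count(0) >= assign.count(1) else 1
    return [1 if a == normal else -1 for a in assign]
```

The resulting labels, paired with the original records, would form the sample set used to retrain the SVM model.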

[0028] The beneficial effects of this invention compared to existing technologies are as follows: This invention filters IoT sensing data, uses an SVM model to diagnose the filtering results, and uses the XGBoost classification algorithm to calculate the diagnostic quality of the diagnostic results. Based on the diagnostic quality, it determines whether the SVM model needs to be retrained. This significantly reduces false triggering and false alarms in IoT systems, such as electrical fire early warning systems, while ensuring the quality of data diagnosis. It avoids the huge pressure that dense and massive data collection puts on any layer, and it keeps the diagnostic system able to adapt to the latest status of the detected objects in a timely manner, thus ensuring diagnostic quality.

[0029] The present invention will be further described below with reference to the accompanying drawings and specific embodiments.

Attached Figure Description

[0030] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0031] Figure 1 is a schematic diagram illustrating an application scenario of the IoT sensing data diagnostic method provided in an embodiment of the present invention;

[0032] Figure 2 is a full flowchart of the IoT sensing data diagnosis method provided in an embodiment of the present invention;

[0033] Figure 3 is a flowchart illustrating the software implementation of the IoT sensing data diagnostic method provided in this embodiment of the invention;

[0034] Figure 4 is a software implementation flowchart of the IoT sensing data diagnostic method provided in this embodiment of the invention;

[0035] Figure 5 is a software implementation flowchart of the IoT sensing data diagnostic method provided in this embodiment of the invention;

[0036] Figure 6 is a software implementation flowchart of the IoT sensing data diagnostic method provided in this embodiment of the invention;

[0037] Figure 7 is a schematic diagram of the four-interval division of diagnostic quality provided in an embodiment of the present invention;

[0038] Figure 8 is the first schematic diagram of normal data removal provided in an embodiment of the present invention;

[0039] Figure 9 is the second schematic diagram of normal data removal provided in an embodiment of the present invention.

Detailed Implementation

[0040] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0041] It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and / or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and / or collections thereof.

[0042] It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.

[0043] It should also be further understood that the term "and / or" as used in this specification and the appended claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.

[0044] Referring to Figure 1, which illustrates an application scenario of the IoT sensing data diagnostic method provided in this invention, the method involves three hardware layers: a bottom-layer circuit board, a middle-layer edge host, and an upper-layer server. Specifically: the bottom-layer circuit board uses an embedded microcontroller as its core and accepts connections from various IoT sensors or devices, serving as the entry point for IoT sensing data; the middle-layer edge host uses an ARM or x86 architecture chip as its core; and the upper-layer server can be a single server, a server cluster, or a virtual server cluster deployed in the cloud. Through the collaborative work of the cloud, edge, and terminal layers, this invention distributes the computational pressure layer by layer, avoiding excessive pressure on any layer, especially the upper-layer server, caused by dense and massive data collection.

[0045] Referring to Figure 2, which is a flowchart illustrating the IoT sensing data diagnostic method provided in this embodiment of the invention: in this method, data preprocessing and filtering are performed by the bottom-layer circuit board, data diagnosis is performed by the middle-layer edge host, and diagnostic quality supervision and model retraining are performed by the upper-layer server. The cloud, edge, and terminal layers work together to capture abnormal states of the IoT system in real time without omission, and are particularly adept at capturing minute amounts of abnormal data from a large amount of normal data in real time.

[0046] Figure 3 is a flowchart illustrating the software implementation of the IoT sensing data diagnostic method provided in an embodiment of the present invention. As shown in Figure 3, the method includes the following steps S110 to S170.

[0047] S110, Acquire IoT sensing data.

[0048] In this embodiment, the sensing data refers to the data acquired by the IoT sensors.

[0049] For example, building safety power monitoring mainly determines whether the power usage is normal or abnormal by detecting three parameters: operating current, leakage current, and cable temperature. In other words, the sensing data includes data on these three parameters.

[0050] S120. Filter the IoT sensing data to obtain the filtering results.

[0051] In this embodiment, the filtering result refers to the sensing data that remains after missing data handling, singularity removal, noise filtering, and removal of normal data.

[0052] In one embodiment, referring to Figure 3, step S120 above may include steps S121 to S122.

[0053] S121. Perform missing data processing, singularity removal, and noise filtering on the IoT sensing data to obtain preprocessing results.

[0054] In this embodiment, the preprocessing result refers to the sensing data after missing data handling, singularity removal, and noise filtering.

[0055] In this embodiment, missing item handling addresses incomplete records. In building safety power monitoring, each test should have three parameters: operating current, leakage current, and cable temperature. If a test has only two parameters, for example operating current and cable temperature, and the leakage current parameter is missing, then that test record has missing data. Such records can either be deleted directly or have the missing values filled in.

[0056] Singularity removal refers to eliminating a few data points that are occasionally exceptionally large or small. This can be achieved by performing a bubble sort on all the data and then removing the first and last elements, or by using wavelet algorithms to remove singularities. It should be noted that if multiple consecutive data points have very large or very small values, these points are not considered singularities, and the bubble sort method will not remove these data points.

[0057] Noise filtering refers to removing Gaussian white noise and other noise from the data. Various filtering methods can be used, such as median filtering and Kalman filtering, as well as data smoothing methods such as the moving average.
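The three preprocessing operations of step S121 can be sketched as follows. This is a minimal illustration under stated assumptions: missing values are represented as `None`, trimming and filtering are applied to the series of one parameter at a time, Python's built-in sort stands in for the bubble sort named in the text, and the function names, the trim count `k`, and the window size are hypothetical.

```python
from statistics import median

def drop_incomplete(records, n_params=3):
    """Delete any record that lacks a parameter (wrong length or None)."""
    return [r for r in records
            if len(r) == n_params and all(v is not None for v in r)]

def trim_extremes(values, k=1):
    """Remove the k largest and k smallest values, keeping original order."""
    if len(values) <= 2 * k:
        return list(values)
    order = sorted(range(len(values)), key=lambda i: values[i])
    dropped = set(order[:k]) | set(order[-k:])
    return [v for i, v in enumerate(values) if i not in dropped]

def median_filter(values, window=3):
    """Sliding-window median; the window shrinks at the two ends."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]
```

On a microcontroller the same logic would normally be written in C with fixed-size buffers; the sketch only shows the order and effect of the operations.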

[0058] S122. Remove normal data from the preprocessing results to obtain the screening results.

[0059] Specifically, the filtering and removal of normal data involves eliminating data that is "clearly identifiable as normal," while sending "substantially abnormal data" and "seemingly abnormal data that may also be normal" to the middle-layer edge host for further diagnosis. Therefore, the underlying data filtering essentially reduces the computational burden on the middle-layer edge host. Although it cannot completely distinguish between normal and abnormal data, it can still remove a large amount of normal data that could cause unnecessary computation for the middle-layer edge host as much as possible.

[0060] The process and mathematical principle of the normal data removal step are shown in Figure 8 and Figure 9; the step is mainly achieved through iteration. Each set of input parameters x of the SVM model contains n values, also called n dimensions. To remove normal data, the limit values of the hyperplane in each dimension are found, and filtering lines or surfaces are created based on these limit values. For example, as shown in Figure 8, the input parameters have two dimensions. First, the limit value of the hyperplane is found in the z2 dimension, and filter line 1 is drawn parallel to the z1 dimension. All data exceeding filter line 1 is removed; in this embodiment, this is the data above filter line 1, i.e., data exceeding the limit value in the z2 dimension. Then, the limit value of the hyperplane is found in the z1 dimension, and filter line 2 is drawn parallel to the z2 dimension. All data exceeding filter line 2 is removed; in this embodiment, this is the data to the left of filter line 2. The data removed by filter lines 1 and 2 is definitely normal data, while the data that is not removed includes not only abnormal data but also a small portion of normal data, which requires further diagnosis by the SVM model in the intermediate-layer data diagnosis part.

[0061] As another example, as shown in Figure 9, the SVM input parameters have three dimensions. The limit value of the hyperplane cannot be obtained in the z1 and z2 dimensions; it can only be obtained in the z3 dimension. Therefore, a filtering surface parallel to the z1z2 plane is drawn through this limit value, and data exceeding this filtering surface (in this example, data below the filtering surface) is discarded. As Figure 8 and Figure 9 show, the parameter dimension of the filtering line or filtering surface is one lower than that of the hyperplane of the intermediate-layer edge host SVM model. For example, the hyperplane in Figure 8 is 2-dimensional, while the corresponding filtering line is 1-dimensional; the hyperplane in Figure 9 is 3-dimensional, while the corresponding filtering surface is 2-dimensional.
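Because every filtering line or surface is parallel to the coordinate axes, the bottom-layer check reduces to per-dimension threshold comparisons, which is cheap enough for a microcontroller. The sketch below assumes the upper-layer server sends, for each usable dimension, a limit value and the side on which data is clearly normal; the `limits` format and the function name `screen` are illustrative assumptions.

```python
def screen(points, limits):
    """Return the points that survive every filtering line or surface.

    limits maps a dimension index to (limit_value, normal_side), where
    normal_side is "above" or "below". A point strictly on the normal side
    of any limit is treated as clearly normal and dropped; everything else
    is kept for diagnosis by the SVM model on the edge host.
    """
    kept = []
    for p in points:
        clearly_normal = False
        for dim, (limit, side) in limits.items():
            if (side == "above" and p[dim] > limit) or \
               (side == "below" and p[dim] < limit):
                clearly_normal = True
                break
        if not clearly_normal:
            kept.append(p)
    return kept

# Two-dimensional case in the style of Figure 8 (limit values are made up):
# drop data above filter line 1 (z2 > 4.0) and left of filter line 2 (z1 < 1.0).
points = [(0.5, 1.0), (3.0, 5.0), (3.0, 1.0), (2.0, 2.0)]
suspects = screen(points, {1: (4.0, "above"), 0: (1.0, "below")})
```

A dimension in which no limit value can be obtained, as with z1 and z2 in the three-dimensional example, is simply left out of `limits`.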

[0062] In addition, the remaining data is uploaded and sent to the middle-layer edge host via various wired or wireless communication methods.

[0063] S130. Use an SVM model to diagnose the screening results to determine whether the status of the detected object corresponding to the IoT sensing data is normal, so as to obtain a diagnostic result.

[0064] In this embodiment, the diagnostic result refers to the judgment result of whether the state of the detected object corresponding to the IoT sensing data is normal.

[0065] The essence of the SVM model is to find a hyperplane in a multidimensional space that separates the data (especially nonlinear data) with the maximum margin, thereby achieving data classification. The equation of this hyperplane can be written as $w^T x + b = 0$, where $x$ represents the input parameters of the SVM model, $w$ represents the normal vector of the classification hyperplane, $T$ denotes the transpose of the vector, and $b$ represents the offset of the hyperplane relative to the origin. The input parameters $x$ typically include multiple parameters. For example, in a building safety power monitoring system, the input parameters $x$ include three parameters: operating current, leakage current, and cable temperature. Therefore, the input parameters $x$ are usually written in the form $(z_1, z_2, z_3, \ldots, z_n)$.

[0066] Let the output parameter of the SVM model be $y$. In this invention, $y$ can take only two values: normal or abnormal. For example, in a building safety electricity monitoring system, $y = +1$ can represent a normal electricity status, and $y = -1$ can represent an abnormal electricity status. Therefore, the sample set of the SVM model can be represented as $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i \in R^d$, $y_i \in \{+1, -1\}$ is the category label, and $i = 1, 2, \ldots, n$.

[0067] In the SVM model diagnosis step, the data that passed the bottom-layer filtering is used as $x$ and, based on the determined parameters $w$ and $b$, whether $w^T x + b$ is greater than 0 or less than 0 determines whether the data is normal or abnormal. The specific values of parameters such as $w$ and $b$ are derived from the SVM model retraining step in the diagnostic model retraining part.
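Since the edge host only evaluates a fixed formula and never trains, the diagnosis amounts to one dot product and a sign test. A minimal sketch, assuming a linear decision function and the convention that +1 means normal; the function name `svm_diagnose` and the treatment of a score of exactly 0 as abnormal are assumptions, as the text does not say how that boundary case is handled.

```python
def svm_diagnose(x, w, b):
    """Return +1 (normal) or -1 (abnormal) from the sign of w^T x + b.

    A score of exactly 0 is reported as abnormal here, which is an
    assumed, conservative choice.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1
```

The parameters `w` and `b` would be the values pushed down by the upper-layer server after each retraining.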

[0068] S140. Calculate the diagnostic quality based on the diagnostic results and the screening results.

[0069] In this embodiment, diagnostic quality refers to the accuracy or error rate of diagnosing the screening results using an SVM model.

[0070] In one embodiment, referring to Figure 5, step S140 above may include steps S141 to S143.

[0071] S141. Sample the diagnostic results and the screening results according to the sampling ratio to obtain the sampling results.

[0072] In this embodiment, the sampling results refer to partial diagnostic results and corresponding screening results.

[0073] Specifically, the input parameters x and their corresponding diagnostic results y, which have undergone the SVM model diagnosis described above, are randomly sampled according to a dynamic sampling ratio (e.g., 10%) specified by the upper-layer server and uploaded to the upper-layer server. The remaining data that is not sampled is stored in the middle-layer edge host for later use by the upper-layer server. To enhance the supervision of the SVM model's diagnostic quality, the sampling ratio can be increased, and vice versa.
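The sampling step can be sketched with the standard library alone. The function name `sample_for_supervision`, the rounding rule for the sample size, and the optional `rng` argument (used only to make runs repeatable) are illustrative assumptions.

```python
import random

def sample_for_supervision(diagnosed, ratio, rng=None):
    """Randomly pick about ratio * len(diagnosed) (x, y) pairs to upload.

    Returns (uploaded, retained); the retained pairs stay on the edge host
    for later use by the upper-layer server.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = rng or random.Random()
    k = round(len(diagnosed) * ratio)
    chosen = set(rng.sample(range(len(diagnosed)), k))
    uploaded = [d for i, d in enumerate(diagnosed) if i in chosen]
    retained = [d for i, d in enumerate(diagnosed) if i not in chosen]
    return uploaded, retained
```

With a ratio of 10% and 50 diagnosed records, 5 pairs are uploaded and 45 are retained.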

[0074] S142. The sampling results are verified using the XGBoost classification algorithm to obtain the verification results.

[0075] In this embodiment, the verification result refers to the result of verifying the sampling results uploaded by the intermediate-layer edge host using the XGBoost classification algorithm. That is, for the same input parameter x, the XGBoost model, which differs from the SVM model, performs the calculation. The y calculated by the XGBoost model is then compared with the y calculated by the SVM model to determine whether they are the same, thereby evaluating the diagnostic quality of the SVM model.

[0076] The reason for evaluating the diagnostic quality of the SVM model is that the intermediate-layer SVM model (including parameters w and b) is relatively fixed, while the objects being monitored in an IoT system are changing. For example, in a building safety electricity monitoring system, as cables age, their operating current, leakage current, cable temperature, and other parameters slowly change. In this case, the original SVM model may no longer be suitable for diagnosing whether the parameters are normal. Therefore, an algorithm different from the SVM model is needed to supervise and evaluate the diagnostic quality of the intermediate-layer SVM model, and the XGBoost model, which is built on decision trees, can undertake this task.

[0077] The XGBoost model is characterized by high accuracy and high efficiency, while also achieving overall algorithm optimization. The basic idea of the XGBoost model is as follows: First, an initial decision tree is used to classify the data. However, the prediction results of the initial decision tree are often not completely accurate, leaving some residuals between the prediction results and the true values of the training samples. Therefore, a new decision tree is used to fit the residuals. The new decision tree may produce new residuals, so another decision tree is used. This process is iterated until the pre-defined conditions are met. Finally, the prediction result is a weighted sum of the prediction results of each tree.

[0078] The XGBoost classification algorithm in the upper-layer diagnostic quality supervision part is executed as follows: the historical data sampled and uploaded from the middle-layer edge hosts is integrated into a training sample set, which also has the form $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i \in R^d$, $y_i \in \{+1, -1\}$ is the class label, and $i = 1, 2, \ldots, n$. After the XGBoost model is trained on the historical sample set, it classifies the most recently sampled and uploaded data as normal / abnormal, so that the diagnostic quality of the SVM model can be supervised in subsequent steps.

[0079] The XGBoost model trains its classifier by fitting the weak classifier generated in each round to the residuals left by the previous rounds. Through repeated training iterations, and by using different weight assignments to reduce bias, the weak classifiers are eventually aggregated into a single classifier with the desired accuracy. The following details how the XGBoost model is built through continuous learning and stacking of learners.

[0080] First, a weak learner is initialized:

$$f_0(x) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)$$

where $f_0(x)$ is the initial learner; at initialization, $c$ is the average of the labels of all training samples; $\arg\min_c$ denotes the value of $c$ for which the loss function attains its minimum; $L(y_i, c)$ is the loss function; and $y_i$ is the label of the $i$-th sample in the training set. To iteratively train $m = 1, 2, \ldots, M$ regression trees, the residual $r_{mi}$ is first calculated for each sample:

$$r_{mi} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = f_{m-1}}$$

The residuals $r_{mi}$ obtained in the previous step are used as the new target values of the samples to train the next regression tree, and the best-fit value $c_{mj}$ is calculated for each leaf node:

$$c_{mj} = \arg\min_{c} \sum_{x_i \in R_{mj}} L(y_i, f_{m-1}(x_i) + c)$$

where $j = 1, 2, \ldots, J$ indexes the leaf nodes of the regression tree, $J$ is the number of leaf nodes, and $R_{mj}$ is the region of the $j$-th leaf node. The enhanced learner is $f_m(x)$:

$$f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J} c_{mj} I(x \in R_{mj})$$

where $I$ is an indicator function taking the value 0 or 1: $I = 1$ when the condition $x \in R_{mj}$ is satisfied, and $I = 0$ otherwise.

[0081] The final XGBoost model is obtained:

$$f(x) = f_M(x) = f_0(x) + \sum_{m=1}^{M} \sum_{j=1}^{J} c_{mj} I(x \in R_{mj})$$
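The residual-fitting loop described above can be illustrated with a small, dependency-free sketch. It is plain gradient boosting with squared-error loss and depth-1 trees (two leaf regions per tree), not the XGBoost library itself, which adds regularization and second-order terms; all function names and the number of rounds are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single-feature split that best fits the residuals.

    Returns (feature, threshold, left_value, right_value); the two leaf
    values are the mean residuals of the two leaf regions.
    """
    best = None
    for f in range(len(xs[0])):
        for t in sorted({x[f] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[f] <= t]
            right = [r for x, r in zip(xs, residuals) if x[f] > t]
            if not left or not right:
                continue
            cl, cr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - cl) ** 2 for r in left)
                   + sum((r - cr) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, f, t, cl, cr)
    return best[1:] if best else None

def fit_boosting(xs, ys, rounds=10):
    """Return (f0, stumps): an initial value plus a list of fitted stumps."""
    f0 = sum(ys) / len(ys)            # initial learner: mean of the labels
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(rounds):
        # For squared-error loss the residual is simply y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        if stump is None:
            break
        f, t, cl, cr = stump
        stumps.append(stump)
        preds = [p + (cl if x[f] <= t else cr) for x, p in zip(xs, preds)]
    return f0, stumps

def predict(model, x):
    """Classify x as +1 or -1 from the sign of the summed tree outputs."""
    f0, stumps = model
    score = f0 + sum(cl if x[f] <= t else cr for f, t, cl, cr in stumps)
    return 1 if score > 0 else -1
```

In a deployment the upper-layer server would more likely call an existing XGBoost implementation; the sketch only makes the initialize, compute residuals, fit, and add cycle concrete.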

[0082] S143. Calculate the diagnostic quality based on the verification results and the diagnostic results in the sampling results.

[0083] In one embodiment, referring to Figure 6, step S143 above may include steps S1431 to S1432.

[0084] S1431. Using the verification result as the standard value, determine the number of diagnostic results in the sampling results that are different from the standard value;

[0085] S1432. Determine the error rate based on the quantity to obtain diagnostic quality.

[0086] In this embodiment, the calculation results of the SVM model and the XGBoost model are compared one by one. Specifically, for each set of parameters xi, the judgment result of the XGBoost model is used as the standard value to determine whether the judgment result of the SVM model is consistent with it. The error rate of the SVM model over the data sampled and uploaded by the intermediate layer edge host is then calculated. If the error rate of the SVM model exceeds a certain threshold (e.g., 30%), the method proceeds to the subsequent diagnostic model retraining part. The error rate threshold of the SVM model can be modified according to the actual situation: when the data diagnostic quality requirements are strict, the threshold can be set lower, and vice versa.
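The one-by-one comparison can be sketched as follows, treating the XGBoost labels as the standard value. The function names `svm_error_rate` and `needs_retraining` are hypothetical; the default threshold of 0.30 mirrors the 30% example given above and is meant to be adjustable.

```python
def svm_error_rate(svm_labels, xgb_labels):
    """Fraction of sampled records where the SVM label differs from the XGBoost label."""
    if len(svm_labels) != len(xgb_labels):
        raise ValueError("both label lists must cover the same sampled records")
    if not svm_labels:
        raise ValueError("at least one sampled record is required")
    mismatches = sum(1 for s, x in zip(svm_labels, xgb_labels) if s != x)
    return mismatches / len(svm_labels)


def needs_retraining(error_rate, threshold=0.30):
    """True when the error rate exceeds the configurable threshold."""
    return error_rate > threshold
```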

[0087] S150. Determine whether the diagnostic error rate is within the recall interval.

[0088] Figure 7 is a schematic diagram of the four-interval division of diagnostic quality provided in an embodiment of the present invention. In this embodiment, the diagnostic quality is divided into four intervals according to the error rate from small to large: down, maintain, up, and recall. If the error rate is in the down interval, the dynamic sampling ratio of the intermediate layer edge host is reduced by a certain value. If the error rate is in the maintain interval, the dynamic sampling ratio is maintained. If the error rate is in the up interval, the dynamic sampling ratio is increased by a certain value. If the error rate is in the recall interval, the upper-layer model retraining step is initiated.
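A minimal sketch of the four-interval rule is given below. The interval boundaries (5%, 15%, 30%), the adjustment step, and the clamping limits are illustrative assumptions, since the text only says "a certain value"; the names `quality_interval` and `next_sampling_ratio` are likewise hypothetical.

```python
def quality_interval(error_rate, bounds=(0.05, 0.15, 0.30)):
    """Map an error rate to one of the four intervals (boundaries are assumed)."""
    down_max, maintain_max, up_max = bounds
    if error_rate < down_max:
        return "down"
    if error_rate < maintain_max:
        return "maintain"
    if error_rate <= up_max:
        return "up"
    return "recall"  # triggers upper-layer model retraining


def next_sampling_ratio(ratio, interval, step=0.05, lowest=0.01, highest=1.0):
    """Adjust the edge host's dynamic sampling ratio according to the interval."""
    if interval == "down":
        ratio -= step
    elif interval == "up":
        ratio += step
    # "maintain" and "recall" leave the ratio unchanged in this sketch.
    return min(highest, max(lowest, ratio))
```

The ratio is clamped so that the edge host always uploads at least a small sample, which is a design choice of this sketch rather than something the text specifies.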

[0089] S160. If the diagnostic quality does not meet the requirements, the SVM model is retrained, and step S130 is executed.

[0090] In one embodiment, step S160 described above may include steps S161 to S163.

[0091] S161. Configure raw data.

[0092] In this embodiment, the original dataset configuration for the SVM model retraining part involves sending a command to the intermediate-layer edge host to transmit all IoT detection parameters xi stored on the intermediate-layer edge host after the last diagnostic model retraining to the upper-layer server. It should be noted that only the IoT detection parameters xi need to be uploaded at this point; the SVM model's diagnostic results yi for these parameters do not need to be uploaded, as the SVM model's diagnostic results are now considered unreliable.

[0093] S162. The training sample set is re-determined based on the original data using the K-Means clustering algorithm.

[0094] Specifically, the diagnostic process uses the K-Means clustering algorithm to automatically divide the parameters xi passed in the above steps into two main categories: normal and abnormal. The results of the K-Means clustering algorithm will be used as the training sample set for subsequent retraining of the SVM model.

[0095] The K-Means clustering algorithm operates on the parameters xi as follows. Step one: two samples are selected as the initial cluster centers a = {a1, a2}. It should be noted that since this embodiment only distinguishes between normal and abnormal data, two initial cluster centers are set here, corresponding to the normal class e1 and the abnormal class e2, respectively. Step two: for each sample xi in the dataset, its distance to the two cluster centers is calculated, and the sample xi is assigned to the class corresponding to the cluster center with the smallest distance. Step three: after all samples xi have been classified, the cluster centers of the normal class and the abnormal class (i.e., the centroids of all samples belonging to each class) are recalculated according to the following formula:

$$a_k = \frac{1}{|e_k|} \sum_{x_i \in e_k} x_i, \quad k = 1, 2$$

Step four: steps two and three are repeated until the change in the cluster centers of the normal and abnormal classes is less than a certain threshold.

[0096] After the K-Means clustering algorithm is used, all parameters xi are divided into two categories: normal and abnormal. In other words, each parameter xi is relabeled with yi (a label value of +1 indicates normal, and a label value of -1 indicates abnormal). Thus, the new dataset (xi, yi) can be used as the training sample set for retraining the SVM model.
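The two-cluster relabeling can be sketched as below. The choice of the first and last samples as initial centers, the tolerance, and the iteration cap are assumptions made for this illustration. Marking the larger cluster as normal follows the rule stated in the summary of the method (the category with more data is labeled normal). The names `kmeans_two_clusters` and `relabel` are hypothetical.

```python
import math


def kmeans_two_clusters(samples, tol=1e-6, max_iter=100):
    """Cluster samples into two groups; returns (assignment, centers)."""
    # Assumption: the first and last samples serve as the two initial centers.
    centers = [list(samples[0]), list(samples[-1])]
    assignment = [0] * len(samples)
    for _ in range(max_iter):
        # Assign every sample to the nearest cluster center.
        assignment = [
            0 if math.dist(x, centers[0]) <= math.dist(x, centers[1]) else 1
            for x in samples
        ]
        # Recompute each center as the centroid of its members.
        new_centers = []
        for k in (0, 1):
            members = [x for x, a in zip(samples, assignment) if a == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        shift = max(math.dist(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return assignment, centers


def relabel(samples):
    """Attach +1 (normal) to the larger cluster and -1 (abnormal) to the smaller."""
    assignment, _ = kmeans_two_clusters(samples)
    normal = 0 if assignment.count(0) >= assignment.count(1) else 1
    return [(x, 1 if a == normal else -1) for x, a in zip(samples, assignment)]
```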

[0097] S163. The SVM model is trained using the training sample set to obtain the core parameters of the SVM model.

[0098] In this embodiment, the hyperplane of the SVM model is re-found using the training sample set to obtain the core parameters of the SVM model.

[0099] Retraining the SVM model using a new training sample set is essentially the process of finding a new hyperplane for the SVM model. The optimal hyperplane of the SVM model is the one that maximizes the margin between the two classes of data; in other words, it maximizes the distance from the nearest sample point to the hyperplane. For example, given a sample point P, the distance d from this point to the hyperplane $w^T x + b = 0$ can be described by the following formula:

$$d = \frac{|w^T P + b|}{\|w\|}$$

[0100] In the above formula, $\|w\|$ is the Euclidean norm of the normal vector $w$ of the hyperplane.
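As a small worked illustration of the point-to-hyperplane distance, the following sketch computes d for a hyperplane given by a normal vector `w` and an offset `b`. The function name `distance_to_hyperplane` is hypothetical.

```python
import math


def distance_to_hyperplane(w, b, p):
    """Distance from point p to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))  # Euclidean norm of w
    if norm == 0:
        raise ValueError("w must not be the zero vector")
    return abs(sum(wi * pi for wi, pi in zip(w, p)) + b) / norm
```

For example, with w = (3, 4), b = 0 and P = (3, 4), the numerator is |9 + 16| = 25 and the norm is 5, so the distance is 5.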

[0101] For linearly separable data, SVM model training is a constrained optimization problem, as shown in the following formulas:

$$\min_{w,\,b} \; \frac{1}{2}\|w\|^2$$

$$\text{s.t.} \quad y_i\,(w^T x_i + b) \ge 1, \quad i = 1, \ldots, n$$

In the above two equations, n is the total number of sample points, and s.t. means "subject to," that is, complying with a certain condition. It can be seen from the two equations that, under the constraint $y_i\,(w^T x_i + b) \ge 1$, $i = 1, \ldots, n$, and using the training sample set provided in the previous step, solving for the minimum value of $\frac{1}{2}\|w\|^2$ yields the hyperplane of the SVM model.

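[0102] For linearly inseparable data, by introducing slack variables ξ and a penalty factor C, the constrained optimization problem can be transformed into the following form:

[0103]

$$\min_{w,\,b,\,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

$$\text{s.t.} \quad y_i\,(w^T x_i + b) \ge 1 - \xi_i, \quad i = 1, \ldots, n$$

$$\xi_i \ge 0, \quad i = 1, \ldots, n$$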

[0104] In the above three equations, the slack variable ξ and the penalty factor C are usually taken as numbers greater than 0. To solve equation 10, its dual problem can be obtained through Lagrange multipliers, and a kernel function representing the dot product of two sample data can be introduced for solution. These are all conventional methods for solving SVM models, and will not be elaborated here.
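For illustration only, the soft-margin objective can also be minimized directly in its primal form, where the slack variables reduce to the hinge loss max(0, 1 − y(w·x + b)). The sketch below uses batch subgradient descent on that primal objective; it is a swapped-in technique for a linear model, not the Lagrangian dual and kernel approach mentioned above. The names `train_linear_svm` and `predict`, the fixed step size, and the epoch count are assumptions.

```python
def train_linear_svm(samples, labels, C=1.0, learning_rate=0.1, epochs=200):
    """Minimise 0.5*||w||^2 + C * sum(hinge loss) by batch subgradient descent."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)          # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:        # sample violates the margin: hinge term is active
                for k in range(dim):
                    grad_w[k] -= C * y * x[k]
                grad_b -= C * y
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (normal) or -1 (abnormal) from the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

With a constant step size the parameters oscillate around the optimum rather than converging exactly; a decaying step size or an established SVM solver would be the more usual choice in practice.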

[0105] SVM core parameter distribution refers to distributing the trained w and b parameters to the intermediate layer edge host to update the hyperplane mathematical model within the intermediate layer edge host.
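On the edge host side, applying the distributed parameters amounts to evaluating the sign of w·x + b, with no training involved. A minimal sketch under that assumption (a linear model, with +1 for normal and -1 for abnormal) follows; the class name `EdgeDiagnoser` and its methods are hypothetical.

```python
class EdgeDiagnoser:
    """Holds the hyperplane parameters received from the upper-layer server."""

    def __init__(self, w, b):
        self.w = list(w)
        self.b = b

    def update(self, w, b):
        """Replace the hyperplane with freshly distributed parameters."""
        self.w = list(w)
        self.b = b

    def diagnose(self, x):
        """Return +1 (normal) or -1 (abnormal) for one set of detection parameters."""
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1
```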

[0106] The IoT data diagnostic method in this embodiment has the advantages of layered collaboration, automatic diagnosis, self-monitoring, self-evaluation, and automatic adjustment. It can capture abnormal states of IoT systems in real time without omission, and avoid the excessive pressure on the system caused by dense and massive data collection through the collaborative work of cloud, edge, and terminal layers. Furthermore, it can adapt to the latest status of the monitored objects in a timely manner through retraining of the diagnostic system, thereby significantly reducing false triggers and false alarms. The method proposed in this embodiment is applicable to various IoT monitoring scenarios, such as automated steel strip rolling systems, electrical safety monitoring systems, water treatment systems, air purification systems, and building electrical fire early warning systems.

[0107] S170. If the diagnostic quality meets the requirements, that is, the diagnostic error rate is in one of the three intervals of down, maintain, or up, then the intermediate layer edge host is commanded to output the diagnostic result.

[0108] This embodiment can capture abnormal states of IoT systems in real time without omission, and is particularly good at capturing minute amounts of abnormal data from a large amount of normal data. Through the division of labor and cooperation of cloud, edge and terminal, the computing pressure is distributed layer by layer, avoiding the huge pressure on any level, especially the upper-level server level, caused by dense and large-scale data collection. It has the characteristics of self-monitoring, self-evaluation and self-updating, so that the diagnostic system can keep up with the latest status of the detected object in a timely manner to ensure the quality of diagnosis.

[0109] The entire method can be summarized as comprising bottom-level data preprocessing and data filtering, middle-level data diagnosis, upper-level diagnostic quality supervision, and upper-level model retraining;

[0110] The underlying data preprocessing and data filtering are performed by a circuit board with an embedded microcontroller as its core, the intermediate data diagnosis is performed by an intermediate edge host with an ARM or X86 architecture chip as its core, and the upper-layer diagnostic quality supervision and upper-layer model retraining are performed by an upper-layer server in the form of a single unit or a cluster.

[0111] The underlying data preprocessing and data filtering process includes:

[0112] First, the underlying circuit board collects IoT sensing data and handles missing items within a single collection cycle: data with missing items is deleted.

[0113] Next, singular points (outliers) are removed from the remaining IoT sensing data: the data is bubble sorted from largest to smallest, and then some of the maximum and minimum values are removed.

[0114] The remaining IoT sensing data then undergoes noise filtering using Kalman filtering or median filtering; the data after noise filtering is stored in the storage module of the underlying circuit board.

[0115] Finally, the remaining IoT sensing data is screened. Based on the screening line or surface issued by the upper-layer server, data in a normal state is removed, and data suspected of being abnormal is uploaded to the edge host. The parameter dimension of the screening line or surface is one lower than the parameter dimension of the hyperplane of the middle-layer edge host SVM model.
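The four bottom-layer steps can be sketched for a single sensing channel as follows. This is an illustration in Python rather than microcontroller firmware, and it makes several assumptions: readings are scalar values with `None` marking a missing item, one value is trimmed from each end, median filtering (not Kalman filtering) is used with a window of 3, and the screening line is reduced to a single upper limit. All function names and parameters are hypothetical.

```python
import statistics


def bubble_sort_desc(values):
    """Bubble sort from largest to smallest, as the text describes."""
    data = list(values)
    n = len(data)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if data[j] < data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
        if not swapped:
            break
    return data


def trim_extremes(sorted_desc, k=1):
    """Drop the k largest and k smallest values (assumed singular points)."""
    if len(sorted_desc) <= 2 * k:
        return []
    return sorted_desc[k:len(sorted_desc) - k]


def median_filter(values, window=3):
    """Median filter with a window that shrinks at the borders."""
    half = window // 2
    return [statistics.median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]


def preprocess(readings, k=1, window=3):
    """Missing-item removal, singular point removal, then noise filtering."""
    complete = [r for r in readings if r is not None]
    trimmed = trim_extremes(bubble_sort_desc(complete), k)
    return median_filter(trimmed, window)


def screen(values, upper_limit):
    """Keep only values beyond the assumed limit, i.e. suspected abnormal data."""
    return [v for v in values if v > upper_limit]
```

Only the output of `screen` would be uploaded to the edge host, which is what keeps the upstream data volume small.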

[0116] The intermediate layer data diagnostic process includes:

[0117] First, the intermediate layer edge host executes the SVM model to diagnose the screening results, classifying the diagnosed IoT data as normal or abnormal to determine whether the corresponding detection object is in a normal state. While executing the SVM model, the intermediate layer edge host only performs calculations according to the predetermined SVM model mathematical formula; it undertakes neither the training of the SVM model nor the derivation and optimization of the SVM model mathematical formula.

[0118] Next, the middle-layer edge host randomly selects a portion of its diagnosed data and diagnostic results and uploads it to the upper-layer server according to the dynamic sampling ratio specified by the upper-layer server.
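The random upload step can be sketched as below, assuming each diagnosed record is a pair of detection parameters and the SVM label, and that the number of uploaded records is the rounded product of the record count and the sampling ratio. The function name `sample_for_upload` and the optional `rng` argument are hypothetical.

```python
import random


def sample_for_upload(diagnosed, ratio, rng=None):
    """Randomly pick a share of (parameters, diagnosis) records to upload."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie between 0 and 1")
    rng = rng or random.Random()
    count = round(len(diagnosed) * ratio)
    # Sampling without replacement, so no record is uploaded twice.
    return rng.sample(diagnosed, count)
```

Passing a seeded `random.Random` instance makes the selection reproducible, which is convenient for testing.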

[0119] The aforementioned upper-level diagnostic quality supervision process is as follows:

[0120] First, using the XGBoost classification algorithm, which has a completely different mathematical principle from the SVM model, the data uploaded by the intermediate layer edge host is classified as normal or abnormal.

[0121] Secondly, the XGBoost calculation results are compared with the SVM model diagnostic results; the XGBoost calculation results are used as the standard value to determine the number of diagnostic results of the intermediate layer edge host SVM model that differ from the standard value.

[0122] Next, the error rate is calculated based on that number to determine the diagnostic quality;

[0123] The diagnostic quality is divided into four intervals according to the error rate from small to large: down, maintain, up, and recall. If the error rate is in the down interval, the dynamic sampling ratio of the intermediate layer edge host is reduced by a certain value. If the error rate is in the maintain interval, the dynamic sampling ratio is maintained. If the error rate is in the up interval, the dynamic sampling ratio is increased by a certain value. If the error rate is in the recall interval, the upper layer model is retrained.

[0124] Finally, if the error rate falls within any of the three intervals of down, maintain, or up, the intermediate layer edge host is instructed to output the diagnostic results.

[0125] The aforementioned upper-layer model retraining includes two parts: SVM model retraining and re-determination of the screening line / screening surface. The specific process is as follows:

[0126] First, configure the raw data, that is, send a command to the underlying circuit board to upload all the data stored therein, after missing item processing, singular point removal and noise filtering, to the cloud platform via communication.

[0127] Secondly, the retrieved original data is clustered into two categories using the K-Means clustering algorithm. The category with more data is labeled as normal, and the category with fewer data is labeled as abnormal. These two categories of data are then used as the sample set for retraining the SVM model.

[0128] Next, the SVM model is trained using the training sample set, that is, the hyperplane of the SVM model is re-found in order to obtain the core parameters of the SVM model.

[0129] Then, based on the latest SVM model hyperplane, the limit value of the hyperplane in each parameter dimension is found by traversal, and a screening line / screening surface whose parameter dimension is one lower than that of the hyperplane is constructed based on the limit values.

[0130] Finally, the core parameters of the SVM model are sent to the middle layer edge host to update its internal SVM model; the core parameters of the screening line / screening surface are sent to the bottom board to update its internal screening line / screening surface.

[0131] The aforementioned cloud-edge-device collaborative IoT sensing data diagnostic method screens IoT sensing data, uses an SVM model to diagnose the screening results, and employs the XGBoost classification algorithm to calculate the diagnostic quality of those results. Based on the diagnostic quality, it determines whether the SVM model needs to be retrained. While ensuring data diagnostic quality, this method significantly reduces false triggering and false alarms in IoT systems, for example false electrical fire alarms. It avoids the excessive pressure that dense and massive data collection can place on any level, and ensures that the diagnostic system can adapt to the latest status of the detected objects in a timely manner, thereby guaranteeing diagnostic quality.

[0132] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present invention, and these modifications or substitutions should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A method for diagnosing IoT sensing data based on cloud-edge-device collaboration, characterized in that: the method includes bottom-level data preprocessing and filtering, middle-level data diagnostics, upper-level diagnostic quality supervision, and upper-level model retraining. Among them, the underlying data preprocessing and data filtering are performed by a circuit board with an embedded microcontroller as the core, the intermediate layer data diagnosis is performed by an intermediate layer edge host with an ARM architecture or X86 architecture chip as the core, and the upper layer diagnostic quality supervision and upper layer model retraining are performed by an upper layer server in the form of a single unit or a cluster. The underlying data preprocessing and data filtering process includes: First, the underlying circuit board collects IoT sensing data, and then processes the missing data in a single collection cycle by deleting data with missing items. Next, singular points are removed from the remaining IoT sensing data. The data is sorted by bubble sort from largest to smallest, and then some of the maximum and minimum values are removed. The remaining IoT sensing data is then subjected to noise filtering, and Kalman filtering or median filtering is performed on the data; the data after noise filtering is stored in the storage module of the underlying circuit board. Finally, the remaining IoT sensing data is screened. Based on the screening lines or surfaces issued by the upper-layer server, data in a normal state is removed, and data suspected of being abnormal is uploaded to the edge host. The parameter dimension of the screening lines or surfaces is one lower than the parameter dimension of the hyperplane of the SVM model of the middle-layer edge host. 
The intermediate layer data diagnostic process includes: First, the intermediate layer edge host executes the SVM model to diagnose the screening results, classifying the diagnosed IoT data into two categories: normal and abnormal, in order to determine whether the corresponding detection object is in a normal state. During the execution of the SVM model by the intermediate layer edge host, it only performs calculations according to the established SVM model mathematical formula, without undertaking the training process of the SVM model, nor undertaking the derivation and optimization process of the SVM model mathematical formula. Next, the middle-layer edge host randomly selects a portion of its diagnosed data and diagnostic results and uploads it to the upper-layer server according to the dynamic sampling ratio specified by the upper-layer server.

2. The IoT sensing data diagnosis method based on cloud-edge-device collaboration according to claim 1, characterized in that, the aforementioned upper-level diagnostic quality supervision process is as follows: First, using the XGBoost classification algorithm, which has a completely different mathematical principle from the SVM model, the data uploaded by the intermediate layer edge host is classified as normal or abnormal. Secondly, the XGBoost calculation results are compared with the SVM model diagnostic results; the XGBoost calculation results are used as the standard value to determine the number of diagnostic results of the intermediate layer edge host SVM model that differ from the standard value. Next, the error rate is calculated based on that number to determine the diagnostic quality; The diagnostic quality is divided into four intervals according to the error rate from small to large: down, maintain, up, and recall. If the error rate is in the down interval, the dynamic sampling ratio of the intermediate layer edge host is reduced by a certain value. If the error rate is in the maintain interval, the dynamic sampling ratio is maintained. If the error rate is in the up interval, the dynamic sampling ratio is increased by a certain value. If the error rate is in the recall interval, the upper layer model is retrained. Finally, if the error rate falls within any of the three intervals of down, maintain, or up, the intermediate layer edge host is instructed to output the diagnostic results.

3. The IoT sensing data diagnosis method based on cloud-edge-device collaboration according to claim 1, characterized in that, the aforementioned upper-layer model retraining includes two parts: SVM model retraining and re-determination of the screening line / screening surface. The specific process is as follows: First, configure the raw data, that is, send a command to the underlying circuit board to upload all the data stored therein, after missing item processing, singular point removal and noise filtering, to the cloud platform via communication. Secondly, the retrieved original data is clustered into two categories using the K-Means clustering algorithm. The category with more data is labeled as normal, and the category with fewer data is labeled as abnormal. These two categories of data are then used as the sample set for retraining the SVM model. Next, the SVM model is trained using the training sample set, that is, the hyperplane of the SVM model is re-found in order to obtain the core parameters of the SVM model. Then, based on the latest SVM model hyperplane, the limit value of the hyperplane in each parameter dimension is found by traversal, and a screening line / screening surface whose parameter dimension is one lower than that of the hyperplane is constructed based on the limit values; Finally, the core parameters of the SVM model are sent to the middle-layer edge host to update its internal SVM model; the core parameters of the screening line / screening surface are sent down to the underlying circuit board to update the screening line / screening surface inside it.