Robust multi-modal network operation and maintenance fault detection method, system and product
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- XI AN JIAOTONG UNIV
- Filing Date
- 2023-04-21
- Publication Date
- 2026-06-19
Figure CN116743555B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of network operation and maintenance technology, and in particular to a robust multimodal network operation and maintenance fault detection method, system and product.
Background Technology
[0002] In recent years, network operations and maintenance (O&M) technology has become increasingly important across various industries. Due to increasing digitalization, larger system scale, finer-grained component monitoring, massive amounts of monitoring data, and the continuous introduction of new technologies and components, network O&M has become increasingly challenging, overwhelming O&M engineers with massive amounts of high-speed monitoring data. Traditional network fault diagnosis methods rely primarily on human experience and expertise to diagnose and resolve problems. This approach has inherent subjectivity and limitations, and cannot meet the needs of large-scale networks.
[0003] In recent years, with the development of machine learning technology, some data-driven fault diagnosis methods have been proposed. These methods mainly rely on machine learning algorithms to automatically diagnose network faults by analyzing data from network devices. However, traditional data-driven methods typically only utilize single-modality data for diagnosis and cannot leverage multi-modal data from the operations and maintenance field for fault detection and diagnosis. Furthermore, single-type data may suffer from incompleteness or noise, affecting the accuracy and reliability of fault diagnosis.
[0004] Therefore, to improve the accuracy and reliability of network fault diagnosis, a method capable of simultaneously utilizing multimodal data is needed. However, in the field of network operation and maintenance, factors such as the low signal-to-noise ratio of the data often cause multimodal models to become biased towards one of the modalities, which degrades model performance. Existing debiasing methods, in turn, often sacrifice performance on data that is independent and identically distributed (i.i.d.) with respect to the training data in order to improve performance on non-i.i.d. data. Both of these problems are detrimental to the robustness of the system.
Summary of the Invention
[0005] This invention provides a robust multimodal network operation and maintenance fault detection method, system, and product to solve the problem that existing multimodal learning models learn only the distributional characteristics of log data rather than the information the data carries, which leaves the system insufficiently robust.
[0006] In a first aspect, embodiments of the present invention provide a robust multimodal network operation and maintenance fault detection method, comprising the following steps:
[0007] Step 1: Input the log file into the corresponding unimodal model and extract the feature representation of the log file;
[0008] Step 2: Input the time series of KPI indicators into the corresponding single-modal model and extract the feature representation of the KPI time series;
[0009] Step 3: Integrate the feature representations of different modalities extracted in Step 1 and Step 2 to obtain a multimodal feature vector;
[0010] Step 4: Input the multimodal feature vector from Step 3 into the first classifier to obtain the corresponding output;
[0011] Step 5: Input the features from Step 1 into the single-modality feature extractor to obtain the corresponding output;
[0012] Step 6: Combine the outputs of Step 4 and Step 5 to obtain the fused information result. Perform calculations on the fused information result to obtain the final prediction result.
[0013] Based on the first aspect, in Step 3, fusing the feature representations of the different modalities extracted in Steps 1 and 2 to obtain a multimodal feature vector includes:
[0014] The feature representation of the log file and the feature representations of the KPI time series are each passed through a linear layer, summed, and then passed through an activation function to obtain the corresponding representations;
[0015] The corresponding representations are passed through a linear layer to obtain attention weights, which are then normalized with softmax to obtain the final attention weights;
[0016] The feature representations of the multiple KPI time series are weighted by the final attention weights to obtain their fused representation;
[0017] The Hadamard product of the fused representation and the feature representation of the log file is computed to obtain a multimodal feature vector that incorporates a top-down attention mechanism.
[0018] Based on the first aspect, the multimodal feature vector $E_m$ is defined as follows:
[0019] $$E_m = E_{mk} \odot e_l(l),$$
[0020] where $E_{mk}$ is the fused representation of the feature representations of the multiple KPI time series, $\odot$ denotes the Hadamard product, and $e_l(l)$ is the feature representation of the log file;
[0021] $E_{mk}$ is defined as follows:
[0022] $$E_{mk} = \sum_{j=1}^{n_k} \hat{\alpha}_j\, e_k(k_j), \qquad \hat{\alpha} = \mathrm{softmax}(\alpha), \qquad \alpha = W_\alpha^{T} E_Q,$$
[0023] where $j$ indexes the $j$-th group of KPI performance data, $j \in [1, n_k]$; $\hat{\alpha}_j$ is the final attention weight of the $j$-th group; $k_j$ is the $j$-th of the $n_k$ groups of KPI performance data; $e_k(k)$ is the feature representation of the KPI time series; $\alpha$ is the attention weight output by the linear layer $W_\alpha$, and $T$ denotes the transpose of $W_\alpha$; $E_Q$ collects the representations obtained by passing the feature representations of the log file and of the KPI time series through linear layers, summing them, and applying the ReLU(·) activation, namely
$$E_{Q,i} = \mathrm{ReLU}\big(W_l\, e_l(l) + W_k\, [e_{k,i}(k)]^{T}\big), \qquad 1 \le i \le n_k,$$
where $E_{Q,i}$ is the representation obtained by combining the log-file feature representation with the feature representation of the $i$-th of the $n_k$ groups of KPI time series; $e_l(l)$ is the feature representation of the log file; $e_{k,i}(k)$ is the feature representation of the $i$-th KPI time series, i.e. the $i$-th component of $e_k(k)$; $T$ in $[e_{k,i}(k)]^{T}$ denotes the transpose of $e_{k,i}(k)$; and $W_l$ and $W_k$ are linear layers.
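The fusion defined above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: vectors are plain lists, $W_l$ and $W_k$ are bias-free matrices given as lists of rows, $W_\alpha$ is a single weight vector that produces one scalar score per KPI group, and the transpose on $[e_{k,i}(k)]^T$ is treated as a layout detail. The function and parameter names (`fuse_top_down`, `w_alpha`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Apply a bias-free linear layer; W is given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(v: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_top_down(e_l: Sequence[float], e_k: Sequence[Sequence[float]],
                  W_l: Matrix, W_k: Matrix,
                  w_alpha: Sequence[float]) -> Tuple[Vector, Vector]:
    """Return (E_m, alpha_hat) for one log feature e_l and n_k KPI group features e_k."""
    proj_l = matvec(W_l, e_l)  # W_l e_l(l), shared by every KPI group
    scores = []
    for e_ki in e_k:
        proj_k = matvec(W_k, e_ki)  # W_k e_{k,i}(k)
        e_q = [max(0.0, a + b) for a, b in zip(proj_l, proj_k)]  # E_{Q,i} = ReLU(...)
        scores.append(sum(w * q for w, q in zip(w_alpha, e_q)))  # alpha_i = W_alpha^T E_{Q,i}
    alpha_hat = softmax(scores)  # final attention weights over the n_k groups
    d_z = len(e_l)
    # E_mk: attention-weighted sum of the KPI group features
    E_mk = [sum(alpha_hat[j] * e_k[j][t] for j in range(len(e_k))) for t in range(d_z)]
    E_m = [a * b for a, b in zip(E_mk, e_l)]  # Hadamard product with e_l(l)
    return E_m, alpha_hat


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    E_m, weights = fuse_top_down([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]],
                                 identity, identity, [1.0, 1.0])
    print(E_m, weights)  # [0.5, 1.0] [0.5, 0.5]
```

Because the log feature enters every score $E_{Q,i}$, the log modality decides how much each KPI group contributes, which is the top-down aspect of the mechanism.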
[0024] Based on the first aspect, in Step 4, the output is the non-debiased classification result $z_{nd}$, defined as follows:
[0025] $$z_{nd} = \mathrm{softmax}(\mathrm{FCN}(E_m)),$$
[0026] where FCN(·) is a fully connected network and $E_m$ is the multimodal feature vector;
[0027] In Step 5, the output is the $n$-dimensional single-modal feature vector $E_n$, defined as follows:
[0028] $$E_n = \mathrm{FCN}(e_l(l)),$$
[0029] where FCN(·) is a fully connected network and $e_l(l)$ is the feature representation of the log file.
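The two output heads of Steps 4 and 5 can be sketched as follows. As a simplifying assumption, each FCN(·) is a single fully connected layer with weights `W` (n rows of length $d_z$) and bias `b`; the real networks may be deeper. The helper names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def linear(W: Sequence[Sequence[float]], b: Sequence[float], x: Sequence[float]) -> Vector:
    """One fully connected layer: W has n rows of length d_z, b has length n."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(v: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def multimodal_head(E_m: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float]) -> Vector:
    """Step 4: z_nd = softmax(FCN(E_m)), an n-way distribution over fault types."""
    return softmax(linear(W, b, E_m))


def unimodal_head(e_l: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float]) -> Vector:
    """Step 5: E_n = FCN(e_l(l)), kept as raw n-dimensional scores (no softmax)."""
    return linear(W, b, e_l)
```

Note the asymmetry: $z_{nd}$ is already a probability vector, whereas $E_n$ stays unnormalized because Step 6 passes it through a sigmoid.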
[0030] Based on the first aspect, in Step 6, fusing the outputs of Steps 4 and 5 to obtain a fused information result, and then performing calculations on the fused information result to obtain the final prediction result, includes:
[0031] The output of Step 5 is passed through the sigmoid activation function and then multiplied element-wise with the output of Step 4 to obtain the fused information result;
[0032] A softmax operation over the $n$-dimensional space is applied to the fused information result to obtain the final prediction result.
[0033] The final prediction result $z_{pred}$ is defined as follows:
[0034] $$z_{pred} = \mathrm{softmax}(z) = \mathrm{softmax}(z_{nd} \odot \sigma(E_n)),$$
[0035] where $z_{nd}$ is the non-debiased classification result from Step 4, $E_n$ is the $n$-dimensional single-modal feature vector from Step 5, and $\sigma(\cdot)$ is the sigmoid activation function.
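A minimal sketch of the Step 6 fusion, assuming $z_{nd}$ and $E_n$ are both plain length-$n$ lists; the function name `debiased_prediction` is hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Overflow-safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(v: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def debiased_prediction(z_nd: Sequence[float], E_n: Sequence[float]) -> List[float]:
    """Step 6: z = z_nd (element-wise) sigmoid(E_n), then z_pred = softmax(z)."""
    if len(z_nd) != len(E_n):
        raise ValueError("z_nd and E_n must both be n-dimensional")
    z = [p * sigmoid(s) for p, s in zip(z_nd, E_n)]  # fused information result
    return softmax(z)
```

The sigmoid acts as a per-class gate in (0, 1). When the single-modal branch assigns a very negative score to a class, that class's fused score is driven towards zero. For example, `debiased_prediction([0.6, 0.3, 0.1], [-50.0, 0.0, 0.0])` ranks the second class first.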
[0036] Based on the first aspect, the loss function of the robust multimodal network is calculated through the following steps:
[0037] The output of Step 5 is input into the second classifier to perform $n$-label classification, and the single-modal loss function is obtained by calculating the cross-entropy loss between this classification and the true label;
[0038] The cross-entropy loss between the final prediction result of Step 6 and the true result is calculated to obtain the multimodal loss function;
[0039] A relaxation control factor is introduced to optimize the multimodal loss function, yielding the final loss function.
[0040] The final loss function $L_O$ is defined as follows:
[0041] $$L_O = \gamma L_1 + L_2 = -\gamma\,[a]\log(z_{pred}) - [a]\log\big(\mathrm{softmax}(c_n(E_n))\big),$$
[0042] The single-modal loss function $L_2$ is defined as follows:
[0043] $$L_2 = -[a]\log\big(\mathrm{softmax}(c_n(E_n))\big),$$
[0044] The multimodal loss function $L_1$ is defined as follows:
[0045] $$L_1 = -[a]\log(z_{pred}),$$
[0046] where $\gamma \in (0,1)$ is the relaxation control factor, $a$ is the true result, $[a]$ is the true label corresponding to the true result, $z_{nd}$ is the non-debiased classification result from Step 4, $\sigma(\cdot)$ is the sigmoid activation function, $E_n$ is the $n$-dimensional single-modal feature vector from Step 5, and $c_n$ is the second classifier.
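The loss above can be sketched as follows. This assumes $[a]$ is a one-hot label, so $-[a]\log(p)$ reduces to minus the log-probability of the true class. `unimodal_logits` is a hypothetical stand-in for the second classifier's output $c_n(E_n)$, and the small `eps` clamp is an added safeguard against `log(0)` that the source does not specify.

```python
import math
from typing import List, Sequence, Tuple


def softmax(v: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], label: int, eps: float = 1e-12) -> float:
    """-[a] log(p) for a one-hot [a]: only the true-class term survives."""
    return -math.log(max(probs[label], eps))


def relaxed_loss(z_pred: Sequence[float], unimodal_logits: Sequence[float],
                 label: int, gamma: float) -> Tuple[float, float, float]:
    """Return (L_O, L1, L2) with L_O = gamma * L1 + L2 and gamma in (0, 1)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    L1 = cross_entropy(z_pred, label)                   # multimodal loss on z_pred
    L2 = cross_entropy(softmax(unimodal_logits), label)  # single-modal loss on c_n(E_n)
    return gamma * L1 + L2, L1, L2
```

Only the multimodal term is scaled by $\gamma$; the single-modal term $L_2$ keeps its full weight.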
[0047] Based on the detection method of the first aspect, the method further includes:
[0048] During training, the loss function calculated for the output of the first classifier in each round is subjected to relaxation optimization, and the relaxation control factor is adjusted as follows:
[0049]
[0050] where $L_1$ is the loss value calculated from the output of the first classifier in the current round, and $L_1^*$ is the loss value calculated from the output of the first classifier in the previous round.
[0051] Based on the same inventive concept described above, in a second aspect, the present invention provides a robust multimodal network operation and maintenance fault detection system, comprising:
[0052] a first extraction module, configured to perform Step 1: inputting the log file into the corresponding single-modal model and extracting the feature representation of the log file;
[0053] a second extraction module, configured to perform Step 2: inputting the time series of KPI indicators into the corresponding single-modal model and extracting the feature representation of the KPI time series;
[0054] a first fusion module, configured to perform Step 3: fusing the features of the different modalities from Steps 1 and 2 to obtain a multimodal feature vector;
[0055] a first classification module, configured to perform Step 4: inputting the multimodal feature vector from Step 3 into the first classifier to obtain the corresponding output;
[0056] a third extraction module, configured to perform Step 5: inputting the features from Step 1 into the single-modal feature extractor to obtain the corresponding output;
[0057] a second fusion module, configured to perform Step 6: fusing the outputs of Steps 4 and 5 to obtain a fused information result, and performing calculations on the fused information result to obtain the final prediction result.
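The module structure of the system can be sketched as a container of six callables, one per module, wired in the order of Steps 1 to 6. The class name `FaultDetectionSystem` and its field names are hypothetical, and each module is reduced to a plain function for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class FaultDetectionSystem:
    """Hypothetical wiring of the six modules; each field is any callable with the stated role."""
    extract_log: Callable[[str], Vector]                              # first extraction module (Step 1)
    extract_kpi: Callable[[Sequence[Sequence[float]]], List[Vector]]  # second extraction module (Step 2)
    fuse_modalities: Callable[[Vector, List[Vector]], Vector]         # first fusion module (Step 3)
    classify_multimodal: Callable[[Vector], Vector]                   # first classification module (Step 4)
    extract_unimodal: Callable[[Vector], Vector]                      # third extraction module (Step 5)
    fuse_outputs: Callable[[Vector, Vector], Vector]                  # second fusion module (Step 6)

    def predict(self, log_text: str, kpi_series: Sequence[Sequence[float]]) -> Vector:
        e_l = self.extract_log(log_text)      # e_l(l)
        e_k = self.extract_kpi(kpi_series)    # n_k KPI features e_k(k)
        E_m = self.fuse_modalities(e_l, e_k)  # multimodal feature vector
        z_nd = self.classify_multimodal(E_m)  # non-debiased classification result
        E_n = self.extract_unimodal(e_l)      # single-modal vector, from log features only
        return self.fuse_outputs(z_nd, E_n)   # final prediction result
```

Keeping each module behind its own callable means the encoders, the fusion module, and the debiasing branch can be replaced independently without changing the order of the steps.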
[0058] In a third aspect, the present invention provides an electronic device, comprising:
[0059] a memory, configured to store one or more programs; and
[0060] a processor;
[0061] wherein, when the processor executes the one or more programs, it implements the robust multimodal network operation and maintenance fault detection method according to any embodiment of the first aspect.
[0062] In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the robust multimodal network operation and maintenance fault detection method according to any embodiment of the first aspect.
[0063] This invention has the following advantages:
[0064] (1) The embodiments of the present invention make full use of multimodal data in network operation and maintenance, which greatly enhances the reliability and accuracy of fault prediction;
[0065] (2) The embodiments of the present invention employ debiasing optimization to solve the problem of bias in the data learned by the model due to the low signal-to-noise ratio of data in network operation and maintenance.
[0066] (3) In the embodiments of the present invention, a relaxed optimization strategy is added on top of the debiasing optimization. The relaxation control factor further safeguards the effectiveness of the debiasing, so that the system maintains high accuracy when predicting faults on data with different distributions, which greatly increases its robustness.
Description of the Drawings
[0067] Figure 1 is a flowchart of a robust multimodal network operation and maintenance fault detection method provided in an embodiment of the present invention;
[0068] Figure 2 is a schematic diagram of the framework flow of a multimodal information fusion module provided in an embodiment of the present invention;
[0069] Figure 3 is a schematic diagram of a robust multimodal operation and maintenance fault detection framework provided in an embodiment of the present invention;
[0070] Figure 4 is an architecture diagram of a robust multimodal network operation and maintenance fault detection system provided in an embodiment of the present invention;
[0071] Figure 5 is a schematic structural block diagram of an electronic device provided in an embodiment of the present invention.
Detailed Description of the Embodiments
[0072] This invention provides a robust multimodal network operation and maintenance fault detection method, system, and product. It uses multimodal information, namely machine KPI performance indicators and log files, to detect faults in network operation and maintenance systems, and keeps its prediction results stable and reliable even on data with different distributions. First, features are extracted from the KPI performance indicators and from the log files. A multimodal information fusion module then fuses the extracted features into a multimodal feature. A classifier performs a preliminary classification on this feature, and a single-modal branch extracts the feature vector needed for debiasing; this vector is combined with the preliminary classification result to obtain the final fault detection result. In addition, a relaxed optimization strategy is introduced during model training to reduce the model's dependence on the data distribution, thereby enhancing the robustness of fault detection.
[0073] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0074] Referring to Figure 1, which is a flowchart of a robust multimodal network operation and maintenance fault detection method provided in an embodiment of the present invention, the method includes the following steps:
[0075] Step 1: Input the log file into the corresponding unimodal model and extract the feature representation of the log file;
[0076] Step 1 obtains a feature representation of the log file by inputting the text-form log file into an encoder, yielding a feature vector of dimension $d_z$. Specifically, the text-form log file within time $t$ is segmented and serialized according to the vocabulary, and the serialized data $l$ is input into a Transformer model to obtain the corresponding feature representation $e_l(l)$.
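The segmentation and serialization of a log window against a vocabulary can be sketched as follows. The Transformer encoder itself is not shown. The whitespace tokenizer, the reserved `<pad>`/`<unk>` ids, and the fixed `max_len` are all assumptions made for illustration; the source only states that the log text is segmented and serialized according to the vocabulary.

```python
from collections import Counter
from typing import Dict, Iterable, List

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def tokenize(line: str) -> List[str]:
    """Assumed tokenizer: lowercase, then split on whitespace."""
    return line.lower().split()


def build_vocab(lines: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Map each token seen at least min_count times to an id (0 and 1 are reserved)."""
    counts = Counter(tok for line in lines for tok in tokenize(line))
    vocab = {PAD: 0, UNK: 1}
    for tok in sorted(counts):  # sorted for deterministic ids
        if counts[tok] >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def serialize(window: Iterable[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Turn the log lines of one time window t into a fixed-length id sequence l."""
    ids = [vocab.get(tok, vocab[UNK]) for line in window for tok in tokenize(line)]
    ids = ids[:max_len]                                # truncate long windows
    return ids + [vocab[PAD]] * (max_len - len(ids))  # pad short windows
```

For example, with a vocabulary built from `["Link down eth0", "link up eth0"]`, the window `["link flap eth0"]` serializes to `[4, 1, 3, 0, 0]` for `max_len=5`, where `flap` maps to the unknown-token id.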
[0077] Step 2: Input the time series of KPI indicators into the corresponding single-modal model and extract the feature representation of the KPI time series;
[0078] Step 2 obtains the feature representations of the KPI performance indicators: the $n_k$ groups of KPI performance indicators, in time-series form $k$, are input into an encoder to obtain $n_k$ feature vectors of dimension $d_z$. Specifically, the KPI performance index data $k$ within time $t$ is input into a Transformer model to obtain the corresponding feature representation $e_k(k)$ of the KPI time series. Here, the KPI indicators are KPI performance indicators such as network latency, bandwidth utilization, packet loss rate, and CPU utilization, which usually exist in the form of time series.
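Preparing the $n_k$ KPI groups of one time window before they reach the encoder could look like the following. The window slicing, the sorted group order, and the per-group z-score normalization are illustrative assumptions, not steps stated in the source; `kpi_window` and `zscore` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def zscore(xs: Sequence[float]) -> List[float]:
    """Per-group standardization (assumed preprocessing); a flat series maps to zeros."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [0.0] * len(xs) if std == 0.0 else [(x - mean) / std for x in xs]


def kpi_window(series: Dict[str, Sequence[float]], end: int, length: int) -> List[List[float]]:
    """Cut each of the n_k KPI series to [end - length, end) and standardize it."""
    if length <= 0 or end < length:
        raise ValueError("window must have positive length and start at index >= 0")
    groups = []
    for name in sorted(series):  # fixed order, so group j is always the same KPI
        values = series[name]
        if end > len(values):
            raise ValueError(f"KPI {name!r} has only {len(values)} samples")
        groups.append(zscore(values[end - length:end]))
    return groups
```

For example, `kpi_window({"cpu": [10.0, 20.0, 30.0, 40.0], "latency": [5.0] * 4}, end=4, length=2)` yields two groups: `[-1.0, 1.0]` for CPU and `[0.0, 0.0]` for the constant latency series.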
[0079] Step 3: Integrate the feature representations of different modalities extracted in Step 1 and Step 2 to obtain a multimodal feature vector;
[0080] In Step 3, fusing the feature representations of the different modalities extracted in Steps 1 and 2 to obtain a multimodal feature vector includes:
[0081] The feature representation of the log file and the feature representations of the KPI time series are each passed through a linear layer, summed, and then passed through an activation function to obtain the corresponding representations;
[0082] The corresponding representations are passed through a linear layer to obtain attention weights, which are then normalized with softmax to obtain the final attention weights;
[0083] The feature representations of the multiple KPI time series are weighted by the final attention weights to obtain their fused representation;
[0084] The Hadamard product of the fused representation and the feature representation of the log file is computed to obtain a multimodal feature vector that incorporates a top-down attention mechanism.
[0085] In this embodiment of the invention, the activation function is ReLU(·).
[0086] The multimodal feature vector $E_m$ is defined as follows:
[0087] $$E_m = E_{mk} \odot e_l(l),$$
[0088] where $E_{mk}$ is the fused representation of the feature representations of the multiple KPI time series, $\odot$ denotes the Hadamard product, and $e_l(l)$ is the feature representation of the log file;
[0089] $E_{mk}$ is defined as follows:
[0090] $$E_{mk} = \sum_{j=1}^{n_k} \hat{\alpha}_j\, e_k(k_j), \qquad \hat{\alpha} = \mathrm{softmax}(\alpha), \qquad \alpha = W_\alpha^{T} E_Q,$$
[0091] where $j$ indexes the $j$-th group of KPI performance data, $j \in [1, n_k]$; $\hat{\alpha}_j$ is the final attention weight of the $j$-th group; $k_j$ is the $j$-th of the $n_k$ groups of KPI performance data; $e_k(k)$ is the feature representation of the KPI time series; $\alpha$ is the attention weight output by the linear layer $W_\alpha$, and $T$ denotes the transpose of $W_\alpha$; $E_Q$ collects the representations obtained by passing the feature representations of the log file and of the KPI time series through linear layers, summing them, and applying the ReLU(·) activation, namely
$$E_{Q,i} = \mathrm{ReLU}\big(W_l\, e_l(l) + W_k\, [e_{k,i}(k)]^{T}\big), \qquad 1 \le i \le n_k,$$
where $E_{Q,i}$ is the representation obtained by combining the log-file feature representation with the feature representation of the $i$-th of the $n_k$ groups of KPI time series; $e_l(l)$ is the feature representation of the log file; $e_{k,i}(k)$ is the feature representation of the $i$-th KPI time series, i.e. the $i$-th component of $e_k(k)$; $T$ in $[e_{k,i}(k)]^{T}$ denotes the transpose of $e_{k,i}(k)$; and $W_l$ and $W_k$ are linear layers.
[0092] Step 3 obtains the multimodal feature vector. The feature representation $e_l(l)$ of the log file extracted in Step 1 and the feature representation $e_k(k)$ of the KPI time series extracted in Step 2 are input into a multimodal encoder, $E_m = e_m(e_l(l), e_k(k))$. By using a top-down attention mechanism to fuse the single-modal features obtained bottom-up, the model can extract finer-grained information from the data. Specifically, the features of the two modalities are each passed through a linear layer, summed, and processed by the ReLU activation function to obtain the corresponding representations $E_{Q,i} = \mathrm{ReLU}(W_l e_l(l) + W_k [e_{k,i}(k)]^{T})$, $1 \le i \le n_k$, where $E_{Q,i}$ combines the log-file feature representation with the feature representation of the $i$-th of the $n_k$ groups of KPI time series. $E_Q$ is then passed through a linear layer to obtain the attention weights $\alpha = W_\alpha^{T} E_Q$, where $T$ denotes the transpose of the linear layer $W_\alpha$, and these weights are normalized with softmax to obtain the final attention weights $\hat{\alpha}$. The feature representations of the $n_k$ groups of KPI performance indicators are weighted by $\hat{\alpha}$ to obtain their fused representation $E_{mk}$. Finally, the Hadamard product of $E_{mk}$ and the log-file feature representation $e_l(l)$ yields the multimodal feature vector incorporating a top-down attention mechanism, $E_m = E_{mk} \odot e_l(l)$.
[0093] For example, Figure 2 is a schematic diagram of the framework flow of a multimodal information fusion module provided in an embodiment of the present invention. The bottom-up single-modal features, namely the log-file feature representation $e_l(l)$ and the KPI time-series features $e_k(k)$, are first passed through the linear layers $W_l$ and $W_k$ respectively. The outputs are summed and passed through the ReLU activation function to obtain the representation $E_Q$. $E_Q$ is then passed through the linear layer $W_\alpha$ to obtain the attention weights $\alpha$, which are normalized with softmax to obtain the final attention weights. The feature representations of the $n_k$ groups of KPI performance indicators are weighted accordingly to obtain their fused representation $E_{mk}$, and the Hadamard product of $E_{mk}$ and $e_l(l)$ yields the multimodal feature vector $E_m$ incorporating the top-down attention mechanism. In this process, the attention weights are obtained through a top-down attention mechanism conditioned on the log file.
[0094] Step 4: Input the multimodal feature vector from Step 3 into the first classifier to obtain the corresponding output;
[0095] The first classifier is a multimodal classifier;
[0096] Step 5: Input the features from Step 1 into the single-modality feature extractor to obtain the corresponding output;
[0097] In Step 4, the output is the non-debiased classification result $z_{nd}$, defined as follows:
[0098] $$z_{nd} = \mathrm{softmax}(\mathrm{FCN}(E_m)),$$
[0099] where FCN(·) is a fully connected network and $E_m$ is the multimodal feature vector;
[0100] In Step 5, the output is the $n$-dimensional single-modal feature vector $E_n$, defined as follows:
[0101] $$E_n = \mathrm{FCN}(e_l(l)),$$
[0102] where FCN(·) is a fully connected network and $e_l(l)$ is the feature representation of the log file.
[0103] Step 4 obtains the non-debiased classification result $z_{nd}$ by passing the multimodal features through a multimodal classifier. Step 5 inputs the log file $l$ into a new single-modal feature extractor to obtain the corresponding $n$-dimensional single-modal feature $E_n$, which is used to obtain the final fault classification result $z_{pred} = \mathrm{softmax}(z_{nd} \odot \sigma(E_n))$. Specifically, in Step 4, the multimodal feature vector $E_m$ obtained in Step 3 is fed into the multimodal classifier: a fully connected network maps $E_m$ from $d_z$ dimensions to $n$ dimensions, where $n$ is the preset number of fault types (including the no-fault case), giving $z_{nd} = \mathrm{softmax}(\mathrm{FCN}(E_m))$. In Step 5, the $d_z$-dimensional log-file feature representation from Step 1 is mapped to $n$ dimensions through a fully connected network, giving the $n$-dimensional single-modal feature vector $E_n = \mathrm{FCN}(e_l(l))$.
[0104] Step 6: Combine the outputs of Step 4 and Step 5 to obtain the fused information result. Perform calculations on the fused information result to obtain the final prediction result.
[0106] In Step 6, fusing the outputs of Steps 4 and 5 to obtain a fused information result, and processing the fused information result to obtain the final prediction result, includes:
[0107] The output of Step 5 is passed through the sigmoid activation function and then multiplied element-wise with the output of Step 4 to obtain the fused information result;
[0108] A softmax operation over the $n$-dimensional space is applied to the fused information result to obtain the final prediction result.
[0109] The final prediction result $z_{pred}$ is defined as follows:
[0110] $$z_{pred} = \mathrm{softmax}(z) = \mathrm{softmax}(z_{nd} \odot \sigma(E_n)),$$
[0111] where $z_{nd}$ is the non-debiased classification result from Step 4, $E_n$ is the $n$-dimensional single-modal feature vector from Step 5, and $\sigma(\cdot)$ is the sigmoid activation function.
[0112] Steps 5 and 6 together aim to eliminate the influence of bias. Data in the operations and maintenance field has a very low signal-to-noise ratio: most of it is normal and only a small portion is abnormal. A deep learning model may therefore learn a data bias during training, strongly associating the normal system state with normal log files while ignoring the contribution of the other modality, the machine KPI performance indicators. Steps 5 and 6 are introduced to eliminate this bias. After Step 6 is completed, the loss function $L_O$ of the entire system is obtained, and backpropagation is performed to update the model parameters. Note that, to prevent the encoder from directly learning the data bias, the debiasing branch does not backpropagate into the log encoder: gradients from the single-modal feature extractor of Step 5 are not propagated to the parameters of Step 1. This corresponds to the part marked with an "X" in Figure 3.
[0113] Specifically, the $n$-dimensional single-modal log-file feature vector $E_n$ from Step 5 is first passed through the sigmoid activation function and then multiplied element-wise with the classification result $z_{nd}$ from Step 4 to obtain the fused information result $z$:
[0114] $$z = z_{nd} \odot \sigma(E_n),$$
[0115] Next, a softmax operation over the $n$-dimensional space is applied to the fused information result $z$ to obtain the final prediction result $z_{pred}$:
[0116] $$z_{pred} = \mathrm{softmax}(z) = \mathrm{softmax}(z_{nd} \odot \sigma(E_n)),$$
[0117] where $z$ is the fused information result, $z_{nd}$ is the non-debiased classification result from Step 4, $E_n$ is the $n$-dimensional single-modal feature vector from Step 5, and $\sigma(\cdot)$ is the sigmoid activation function.
[0118] Furthermore, the loss function of the robust multimodal network is calculated through the following steps:
[0119] The output of Step 5 is input into the second classifier to perform $n$-label classification, and the single-modal loss function is obtained by calculating the cross-entropy loss between this classification and the true label;
[0120] The cross-entropy loss between the final prediction result of Step 6 and the true result is calculated to obtain the multimodal loss function;
[0121] A relaxation control factor is introduced to optimize the multimodal loss function, yielding the final loss function.
[0122] The final loss function $L_O$ is defined as follows:
[0123] $$L_O = \gamma L_1 + L_2 = -\gamma\,[a]\log(z_{pred}) - [a]\log\big(\mathrm{softmax}(c_n(E_n))\big),$$
[0124] The single-modal loss function $L_2$ is defined as follows:
[0125] $$L_2 = -[a]\log\big(\mathrm{softmax}(c_n(E_n))\big),$$
[0126] The multimodal loss function $L_1$ is defined as follows:
[0127] $$L_1 = -[a]\log(z_{pred}),$$
[0128] where $\gamma \in (0,1)$ is the relaxation control factor, $a$ is the true result, $[a]$ is the true label corresponding to the true result, $z_{nd}$ is the non-debiased classification result from Step 4, $\sigma(\cdot)$ is the sigmoid activation function, $E_n$ is the $n$-dimensional single-modal feature vector from Step 5, and $c_n$ is the second classifier, which is a unimodal classifier.
[0129] The features from Step 5 are input into the second classifier, and the losses between the predictions of Steps 5 and 6 and the true result are calculated. The loss between the final prediction result and the true result is then subjected to relaxed optimization to obtain the final loss, and the model is trained iteratively. Specifically, the cross-entropy loss $L_1$ between the classification result of Step 6 and the true label is calculated; the $n$-dimensional single-modal feature $E_n$ from Step 5 is input into the classifier $c_n$ for $n$-label classification, and the cross-entropy loss $L_2$ between this output and the true label is calculated.
[0130] In this embodiment of the invention, the loss function comprises two parts, the cross-entropy loss functions $L_1$ and $L_2$:
[0131] $$L_1 = -[a]\log(z_{pred}),$$
[0132] $$L_2 = -[a]\log\big(\mathrm{softmax}(c_n(E_n))\big),$$
[0133] Adding $L_1$ and $L_2$ gives the original loss function $CE = L_1 + L_2$.
[0134] A relaxation control factor $\gamma$ is introduced to perform relaxed optimization on the $L_1$ part of the loss. The optimized loss function $L_O$ is defined as follows:
[0135] $$L_O = \gamma L_1 + L_2 = -\gamma\,[a]\log(z_{pred}) - [a]\log\big(\mathrm{softmax}(c_n(E_n))\big),$$
[0136] where $a$ is the true result, $[a]$ is the true label corresponding to the true result, $\gamma \in (0,1)$ is the relaxation control factor, $z_{nd}$ is the non-debiased classification result from Step 4, $\sigma(\cdot)$ is the sigmoid activation function, $E_n$ is the $n$-dimensional single-modal feature vector from Step 5, and $c_n$ is the second classifier.
[0137] The principle behind calculating the loss function described above is that, for the cross-entropy loss function, its derivative is:
[0138]
[0139] In the formula, $a \in \{\pm 1\}$ indicates whether the classification result is correct, and $p$ is the predicted probability. Comparing the derivative of the relaxed loss function with the original derivative gives:
[0140]
[0141] In the formula, $p_d$ is the final prediction result obtained in step 6, and $p_b$ is the prediction result obtained in step 4. Given $0 < \gamma < 1$ and $0 \le p_d \le 1$, the magnitude of the relaxed derivative does not exceed that of the original derivative. The method therefore updates the model parameters with smaller gradient steps, which weakens the model's tendency to learn biases (a tendency amplified by the steepness of the log function), further ensures the effectiveness of the bias optimization, and thus improves the robustness of the system.
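The derivative formulas of [0138] and [0140] are not reproduced in this text, so the Python sketch below only illustrates the effect described in [0141] under a simplifying assumption: the relaxed multimodal term is γ·L1 as in [0135], and the derivative is taken with respect to the predicted probability p of the true class. It uses a finite-difference estimate and is not the patent's closed-form derivation; all names are hypothetical.

```python
import math
from typing import Callable, Tuple


def numeric_derivative(f: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def compare_gradients(p: float, gamma: float) -> Tuple[float, float]:
    """Derivatives w.r.t. p of the original (-log p) and relaxed (-gamma*log p) losses."""
    original = numeric_derivative(lambda q: -math.log(q), p)
    relaxed = numeric_derivative(lambda q: -gamma * math.log(q), p)
    return original, relaxed


for p in (0.1, 0.4, 0.9):
    g_orig, g_relax = compare_gradients(p, gamma=0.5)
    print(f"p={p}: original={g_orig:.4f} relaxed={g_relax:.4f}")
```

Under this assumption the relaxed derivative is exactly γ times the original one, so its magnitude is smaller for every p in (0, 1).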
[0142] Furthermore, based on the above robust multimodal network operation and maintenance fault detection method, the method further includes:
[0143] During training, the loss function calculated for the output of the first classifier in each round is subjected to relaxation optimization, and the relaxation control factor is adjusted as follows:
[0144]
[0145] In the formula, $L_1$ is the loss value computed from the output of the first classifier in the current round, and $L_1^*$ is the loss value computed from the output of the first classifier in the previous round; the first classifier is a multimodal classifier.
[0146] For example, refer to Figure 3, which is a schematic diagram of a robust multimodal operation and maintenance fault detection framework provided by an embodiment of the present invention. First, feature representations of the log files and of the KPI performance indicators are extracted with unimodal models. These feature representations are then fused into multimodal information, which is input into a fault classifier (a multimodal classifier) to obtain a non-bias-reduced prediction result. Next, a bias reduction module, trained on log file data, is used to reduce the bias of the model. The bias reduction module performs steps 5 and 6 as follows: the extracted log-file feature representation is input into a unimodal feature extractor to obtain an n-dimensional feature vector under the single-modality condition; this vector is passed through the sigmoid activation function and fused with the non-bias-reduced prediction result by a Hadamard product to obtain the fused information result; applying softmax to the fused information result yields the final prediction result [0.8, 0.1, 0.1]. This final prediction result is thus the classification result of the multimodal branch after fusion. Separately, the log-file feature representation is input into a unimodal fault classifier, which yields the prediction result [0.7, 0.2, 0.1]. A cross-entropy loss is calculated for each of the final prediction result [0.8, 0.1, 0.1] and the prediction result [0.7, 0.2, 0.1]. After relaxed optimization is applied to the cross-entropy loss of the final prediction result [0.8, 0.1, 0.1], the optimized loss function is obtained, and the optimized prediction result [1, 0, 0] is produced.
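The arithmetic of the Figure 3 example can be checked with the short Python sketch below. It assumes the true label is the first class (one-hot [1, 0, 0]), uses the natural logarithm, and takes a hypothetical γ = 0.5, since the source gives no numeric value for the relaxed control factor.

```python
import math
from typing import Sequence


def cross_entropy(one_hot: Sequence[float], probs: Sequence[float]) -> float:
    """-[a] . log(p) with a one-hot label; zero label entries are skipped."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)


label = [1, 0, 0]                # assumed true label (first fault class)
final_pred = [0.8, 0.1, 0.1]     # final prediction result of step 6 (Figure 3)
unimodal_pred = [0.7, 0.2, 0.1]  # log-only fault classifier output (Figure 3)
gamma = 0.5                      # hypothetical relaxed control factor

l1 = cross_entropy(label, final_pred)      # -ln 0.8, about 0.2231
l2 = cross_entropy(label, unimodal_pred)   # -ln 0.7, about 0.3567
lo = gamma * l1 + l2                       # relaxed loss, about 0.4682
print(round(l1, 4), round(l2, 4), round(lo, 4))
```

Without relaxation the combined loss would be L1 + L2 ≈ 0.5798; with γ = 0.5 only the multimodal term is halved.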
[0147] In the above implementation process, step 1 inputs the log file into the corresponding unimodal model to extract the feature representation of the log file; step 2 inputs the time series of KPI indicators into the corresponding unimodal model to extract the feature representation of the KPI time series; step 3 fuses the feature representations of the different modalities extracted in steps 1 and 2 to obtain a multimodal feature vector; step 4 inputs the multimodal feature vector from step 3 into the first classifier to obtain the corresponding output; step 5 inputs the features from step 1 into the unimodal feature extractor to obtain the corresponding output; and step 6 fuses the outputs of steps 4 and 5 to obtain the fused information result and then processes it to obtain the final prediction result. The robust multimodal network operation and maintenance fault detection method provided in this invention makes full use of the multimodal data available in network operation and maintenance, improving the reliability and accuracy of fault prediction. It applies debiasing optimization to address the problem that the low signal-to-noise ratio of network O&M data causes the model to learn biases from the data. On top of the debiasing optimization, a relaxed optimization strategy is added: introducing a relaxed control factor further ensures the effectiveness of the bias optimization, so that the system maintains high accuracy when predicting faults on data with different distributions, which greatly increases its robustness.
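Step 6 as described in [0146] (sigmoid of the step 5 vector, Hadamard product with the step 4 output, then softmax) can be sketched in plain Python as follows. The function names and example vectors are hypothetical, and treating the step 4 output as raw logits is an assumption. The sketch has no automatic differentiation, so the claims' statement that step 6 does not backpropagate into the encoder is only noted in a comment.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_step6(z_nd: Sequence[float], e_n: Sequence[float]) -> List[float]:
    """Sketch of step 6: z_pred = softmax(z_nd ⊙ sigmoid(E_n)).

    z_nd -- step 4 output of the first (multimodal) classifier, treated as logits
    e_n  -- n-dimensional log-only vector from step 5
    With autodiff, the claims say step 6 must not backpropagate into the
    encoder; that gradient blocking is not modelled in this plain sketch.
    """
    if len(z_nd) != len(e_n):
        raise ValueError("z_nd and E_n must have the same length n")
    gated = [z * sigmoid(e) for z, e in zip(z_nd, e_n)]  # Hadamard product
    return softmax(gated)


z_pred = fuse_step6([2.0, 0.5, 0.1], [3.0, -1.0, 0.0])
print([round(p, 3) for p in z_pred])
```

A strongly negative entry of E_n drives its sigmoid gate toward 0, shrinking the corresponding logit before softmax.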
[0148] Based on the same inventive concept, embodiments of the present invention also provide a robust multimodal network operation and maintenance fault detection system. Referring to Figure 4, which is an architecture diagram of the robust multimodal network operation and maintenance fault detection system provided in this embodiment of the invention, the system includes:
[0149] The first extraction module 210, configured to perform step 1: inputting the log file into the corresponding unimodal model and extracting the feature representation of the log file.
[0150] The second extraction module 220, configured to perform step 2: inputting the time series of KPI indicators into the corresponding unimodal model and extracting the feature representation of the KPI time series.
[0151] The first fusion module 230, configured to perform step 3: fusing the feature representations of the different modalities extracted in steps 1 and 2 to obtain a multimodal feature vector.
[0152] The first classification module 240, configured to perform step 4: inputting the multimodal feature vector from step 3 into the first classifier to obtain the corresponding output.
[0153] The third extraction module 250, configured to perform step 5: inputting the features from step 1 into the unimodal feature extractor to obtain the corresponding output.
[0154] The second fusion module 260, configured to perform step 6: fusing the outputs of steps 4 and 5 to obtain a fused information result, and processing the fused information result to obtain the final prediction result.
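One way to picture how modules 210 to 260 are chained is the hypothetical Python wrapper below, in which each module is an injected callable. The class name, field names, and the toy stand-in functions in the usage example are illustrative assumptions, not the patented components; they exist only so that the data flow of steps 1 to 6 can be executed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class FaultDetectionSystem:
    """Hypothetical chaining of modules 210-260; each module is a callable."""
    first_extraction: Callable[[str], Vector]                      # 210: log file -> log features
    second_extraction: Callable[[Sequence[Vector]], List[Vector]]  # 220: KPI series -> group features
    first_fusion: Callable[[Vector, List[Vector]], Vector]         # 230: -> multimodal feature vector
    first_classification: Callable[[Vector], Vector]               # 240: -> step 4 output
    third_extraction: Callable[[Vector], Vector]                   # 250: log features -> step 5 output
    second_fusion: Callable[[Vector, Vector], Vector]              # 260: -> final prediction

    def predict(self, log_file: str, kpi_series: Sequence[Vector]) -> Vector:
        h_log = self.first_extraction(log_file)       # step 1
        h_kpi = self.second_extraction(kpi_series)    # step 2
        v = self.first_fusion(h_log, h_kpi)           # step 3
        z_nd = self.first_classification(v)           # step 4
        e_n = self.third_extraction(h_log)            # step 5 (log features only)
        return self.second_fusion(z_nd, e_n)          # step 6


# Toy stand-ins, chosen only so the wiring can be executed.
system = FaultDetectionSystem(
    first_extraction=lambda text: [float(len(text)), float(text.count("ERROR"))],
    second_extraction=lambda series: [[sum(s) / len(s), max(s)] for s in series],
    first_fusion=lambda h, ks: [a * b for a, b in zip(h, ks[0])],
    first_classification=lambda v: [v[0], v[1], 0.0],
    third_extraction=lambda h: [h[1], 0.0, 0.0],
    second_fusion=lambda z, e: [zi + ei for zi, ei in zip(z, e)],
)
print(system.predict("ERROR disk ERROR", [[1.0, 2.0], [3.0, 4.0]]))
```

Injecting each module keeps the wiring independent of how the individual models are built, mirroring the description's separation into modules 210 to 260.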
[0155] In the above implementation process, this invention proposes a robust multimodal network operation and maintenance fault detection system. The system uses a first extraction module 210 to input log files into a corresponding single-modal model and extract feature representations from the log files. A second extraction module 220 inputs the time series of KPI indicators into the corresponding single-modal model and extracts feature representations from the KPI time series. A first fusion module 230 fuses the features from different modalities extracted by the first and second extraction modules to obtain a multimodal feature vector. A first classification module 240 inputs the multimodal feature vector from the first fusion module into a first classifier to obtain the corresponding output. A third extraction module 250 inputs the features from the first extraction module into a single-modal feature extractor to obtain the corresponding output. A second fusion module 260 fuses the outputs from the first classification module and the third extraction module to obtain a fused information result. The fused information result is then processed to obtain the final prediction result.
[0156] Specifically, first, to make full use of the massive amounts of data in the network operation and maintenance (O&M) field, a first fusion module is introduced. This module fuses the feature representations of the log files extracted by the first extraction module with the feature representations of the KPI time series extracted by the second extraction module using top-down attention weights, extracting data features at a finer granularity and obtaining multimodal feature vectors. Next, because the signal-to-noise ratio of O&M data is very low, with most data being normal and only a small portion abnormal, deep learning models may be affected by data bias during training. For example, for fault types in which the KPIs are abnormal but the log files are normal, since most normal log files correspond to normal results, the model may learn a strong correlation between normal log files and a normal system status during training and ignore the contribution of the other modality, the machine KPI performance indicators. This invention therefore introduces a third extraction module and a second fusion module, training a branch whose only input is log file data in order to reduce this bias in the model.
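The top-down attention fusion performed by the first fusion module can be sketched in plain Python as follows, following the steps recited in claim 1: two input linear layers, a sum, an activation, a scoring linear layer, normalization, a weighted sum of the KPI features, and a Hadamard product with the log features. The activation (tanh), the normalization (softmax), the bias-free linear layers, the weight shapes, and all names are assumptions, since the source does not specify them.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Bias-free linear layer: w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_down_fusion(h_log: Vector, kpi_feats: List[Vector], w_h: Matrix, w_k: Matrix,
                    w_a: Vector, act: Callable[[float], float] = math.tanh
                    ) -> Tuple[Vector, Vector]:
    """Return (multimodal feature vector, final attention weights).

    h_log     -- log-file feature representation, length d
    kpi_feats -- K KPI-group feature representations, each length d
    w_h, w_k  -- d x d weights of the two input linear layers (hypothetical)
    w_a       -- length-d weights of the scoring linear layer (hypothetical)
    act       -- activation function; tanh is an assumption
    """
    proj_h = matvec(w_h, h_log)
    scores = []
    for k in kpi_feats:
        u = [act(a + b) for a, b in zip(proj_h, matvec(w_k, k))]  # act(W_h h + W_k k_i)
        scores.append(sum(wa * ui for wa, ui in zip(w_a, u)))     # scalar score per group
    alphas = softmax(scores)                                      # final attention weights
    d = len(h_log)
    k_hat = [sum(al * k[j] for al, k in zip(alphas, kpi_feats)) for j in range(d)]  # weighted sum
    v = [kh * h for kh, h in zip(k_hat, h_log)]                   # Hadamard product
    return v, alphas


identity = [[1.0, 0.0], [0.0, 1.0]]
v, alphas = top_down_fusion([1.0, 1.0], [[1.0, 0.0], [0.0, 0.0]], identity, identity, [1.0, 1.0])
print(v, alphas)
```

The log features act as the top-down query that decides how much each KPI group contributes before the two modalities are combined element-wise.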
[0157] Furthermore, during the model training phase, the cross-entropy loss function is typically used to optimize the model parameters. When the model's final predicted probability p for the label is small relative to 1, the log function produces a large loss. In practice, however, the fault set is usually large, i.e., the number of labels n is large (e.g., n = 100). A probability that is small relative to 1 (e.g., p = 0.4) is then still large relative to the chance level 1/n = 0.01, yet a loss of −log 0.4 ≈ 0.40 (base-10 logarithm; about 0.92 with the natural logarithm) is still incurred. If the parameters are updated with large gradients based on this loss, the model is pushed to learn a mapping between anomalies and single-modality data, making it more prone to memorizing biases. Therefore, to ensure the robustness of the system, this invention introduces a relaxed optimization strategy: the relaxed control factor γ reduces the loss value, allowing the model to perform better on data with various distributions.
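The numbers in [0157] can be reproduced with the short Python check below; γ = 0.5 is a hypothetical value chosen only for illustration. It shows that the quoted loss of about 0.40 corresponds to a base-10 logarithm, while the natural logarithm usually used for cross-entropy gives about 0.92.

```python
import math

n = 100        # number of fault labels, as in the example of [0157]
p = 0.4        # predicted probability of the true label
gamma = 0.5    # hypothetical relaxed control factor

chance = 1 / n                     # 0.01
loss_log10 = -math.log10(p)        # about 0.398, the "≈ 0.40" quoted above
loss_ln = -math.log(p)             # about 0.916 with the natural logarithm
relaxed_ln = gamma * loss_ln       # relaxed multimodal term gamma * L1

print(f"p is {p / chance:.0f} times the chance level 1/n")
print(f"-log10 p = {loss_log10:.3f}, -ln p = {loss_ln:.3f}, relaxed = {relaxed_ln:.3f}")
```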
[0158] The robust multimodal network operation and maintenance fault detection method provided by this invention makes full use of multimodal data in network operation and maintenance, greatly enhancing the reliability and accuracy of fault prediction. It adopts debiasing processing to solve the problem that the low signal-to-noise ratio of data in network operation and maintenance leads to the model learning biased data. After introducing debiasing processing, a relaxed optimization strategy is added. By introducing a relaxed control factor, the effectiveness of bias optimization is further guaranteed, so that the system can maintain high accuracy when predicting faults on data with different distributions, greatly increasing the robustness of the system.
[0159] Referring to Figure 5, which is a schematic structural block diagram of an electronic device provided in an embodiment of the present invention, the electronic device includes a memory 101, a processor 102, and a communication interface 103. The memory 101, processor 102, and communication interface 103 are electrically connected to each other, directly or indirectly, to realize data transmission or interaction. For example, these components can be electrically connected to each other through one or more communication buses or signal lines. The memory 101 can be used to store software programs and modules, such as the program instructions/modules corresponding to the robust multimodal network operation and maintenance fault detection system provided in this embodiment of the present invention. The processor 102 executes the software programs and modules stored in the memory 101 to perform various functional applications and data processing. The communication interface 103 can be used to communicate signaling or data with other node devices.
[0160] The memory 101 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.
[0161] The processor 102 can be an integrated circuit chip with signal processing capabilities. The processor 102 can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
[0162] It can be understood that the structure shown in Figure 5 is merely illustrative; the electronic device may include more or fewer components than those shown in Figure 5, or have a configuration different from that shown in Figure 5. The components shown in Figure 5 can be implemented in hardware, software, or a combination thereof.
[0163] In the embodiments provided by this invention, it should be understood that the disclosed systems and methods can also be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in a different order than those marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram and / or flowchart, and combinations of blocks in block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0164] In addition, the functional modules in the various embodiments of the present invention can be integrated together to form an independent part, or each module can exist independently, or two or more modules can be integrated to form an independent part.
[0165] If the aforementioned functions are implemented as software functional modules and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, essentially, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0166] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
[0167] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be considered in all respects as exemplary and non-limiting, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be included within the present invention. No reference numerals in the claims should be construed as limiting the scope of the claims.
Claims
1. A robust multi-modal network operation and maintenance fault detection method, characterized in that it comprises the following steps: Step 1: inputting the log file into the corresponding unimodal model and extracting the feature representation of the log file; Step 2: inputting the time series of KPI indicators into the corresponding unimodal model and extracting the feature representation of the KPI time series; Step 3: fusing the feature representations of the different modalities extracted in Step 1 and Step 2 to obtain a multimodal feature vector; Step 4: inputting the multimodal feature vector from Step 3 into the first classifier to obtain the corresponding output; Step 5: inputting the features from Step 1 into the unimodal feature extractor to obtain the corresponding output; Step 6: fusing the outputs of Step 4 and Step 5 to obtain a fused information result, and processing the fused information result to obtain the final prediction result, wherein the process in Step 6 does not perform backpropagation, so as to prevent the encoder from directly learning the bias in the data; wherein, in Step 3, fusing the features of the different modalities extracted in Steps 1 and 2 to obtain the multimodal feature vector comprises: inputting the feature representation of the log file and the feature representation of each KPI time series into respective linear layers and adding the results; inputting the sum into an activation function to obtain the corresponding representation; passing the corresponding representation through a linear layer to obtain attention weights, which are normalized to obtain the final attention weights; weighting the feature representations of the multiple KPI time series by the final attention weights to obtain their fused representation; and combining the fused representation and the feature representation of the log file by a Hadamard product to obtain a multimodal feature vector incorporating a top-down attention mechanism;
the loss function of the robust multimodal network is calculated through the following steps: inputting the output of Step 5 into a second classifier for label classification, and obtaining a single-modal loss function by calculating the cross-entropy loss between the label classification and the true label; calculating the cross-entropy loss between the final prediction result of Step 6 and the true result to obtain a multimodal loss function; and introducing a relaxed control factor to optimize the multimodal loss function, resulting in the final loss function, wherein the final loss function is defined as $L_O = \gamma L_1 + L_2$, the single-modal loss function is defined as $L_2 = -[a]\log(\mathrm{softmax}(c_n(E_n)))$, and the multimodal loss function is defined as $L_1 = -[a]\log(z_{pred})$, where $\gamma$ denotes the relaxed control factor, $a$ denotes the true result, $[a]$ denotes the true label corresponding to the true result, $z_{pred}$ is the final prediction result, $\sigma(\cdot)$ is the sigmoid activation function, $E_n$ is the n-dimensional feature vector under the single-modality condition in Step 5, and $c_n$ denotes the second classifier; and, during training, the loss function calculated for the output of the first classifier in each round is subjected to relaxation optimization, and the relaxation control factor is adjusted according to $L_1$, the loss value computed from the output of the first classifier in the current round, and $L_1^*$, the loss value computed from the output of the first classifier in the previous round.
2. The detection method according to claim 1, characterized in that the multimodal feature vector $v$ is defined as $v = \hat{k} \odot h$, where $\hat{k} \in \mathbb{R}^d$ is the fused representation of the features of the groups of KPI time series, $d$ denotes the dimension, $\odot$ denotes the Hadamard product, and $h$ is the feature representation of the log file; $\hat{k}$ is defined as $\hat{k} = \sum_i \alpha_i k_i$, where $k_i$ is the feature representation of the i-th group of KPI performance data and $\alpha_i$ is the final attention weight corresponding to the i-th group of KPI performance data; $\alpha_i$ is obtained by normalizing the attention weights $s_i = w^{\top} u_i$ produced by a linear layer, where $w^{\top}$ denotes the transpose of that linear layer; and $u_i$ is the corresponding representation obtained by inputting the feature representation $h$ of the log file and the feature representation $k_i$ of the i-th group of KPI time series into the linear layers $W_h$ and $W_k$ respectively, summing the results, and applying an activation function $\phi$, i.e., $u_i = \phi(W_h h + W_k k_i)$.
3. The detection method according to claim 1, characterized in that, in Step 4, the output is the non-bias-reduced classification result $z_{nd}$, defined as $z_{nd} = F_m(v)$, where $F_m$ is a fully connected network and $v$ is the multimodal feature; and, in Step 5, the output is the n-dimensional feature vector $E_n$ under the single-modality condition, defined as $E_n = F_s(h)$, where $F_s$ is a fully connected network and $h$ is the feature representation of the log file.
4. The detection method according to claim 1, characterized in that, in Step 6, fusing the outputs of Steps 4 and 5 to obtain the fused information result and processing the fused information result to obtain the final prediction result comprises: inputting the output of Step 5 into the sigmoid activation function and taking the element-wise (Hadamard) product with the output of Step 4 to obtain the fused information result; and applying softmax to the fused information result in the n-dimensional space to obtain the final prediction result $z_{pred}$, defined as $z_{pred} = \mathrm{softmax}(z_{nd} \odot \sigma(E_n))$, where $z_{nd}$ is the non-bias-reduced classification result of Step 4, $E_n$ is the n-dimensional feature vector under the single-modality condition in Step 5, and $\sigma(\cdot)$ is the sigmoid activation function.
5. A robust multimodal network operation and maintenance fault detection system, characterized in that it comprises: a first extraction module, configured to perform Step 1: inputting the log file into the corresponding unimodal model and extracting the feature representation of the log file; a second extraction module, configured to perform Step 2: inputting the time series of KPI indicators into the corresponding unimodal model and extracting the feature representation of the KPI time series; a first fusion module, configured to perform Step 3: fusing the features of the different modalities from Steps 1 and 2 to obtain a multimodal feature vector; a first classification module, configured to perform Step 4: inputting the multimodal feature vector from Step 3 into the first classifier to obtain the corresponding output; a third extraction module, configured to perform Step 5: inputting the features from Step 1 into the unimodal feature extractor to obtain the corresponding output; and a second fusion module, configured to perform Step 6: fusing the outputs of Steps 4 and 5 to obtain a fused information result and processing the fused information result to obtain the final prediction result, wherein the corresponding process in Step 6 does not perform backpropagation, so as to prevent the encoder from directly learning the bias in the data; wherein the first fusion module is specifically configured to: input the feature representation of the log file and the feature representation of each KPI time series into respective linear layers and add the results; input the sum into an activation function to obtain the corresponding representation; pass the corresponding representation through a linear layer to obtain attention weights, which are normalized to obtain the final attention weights; weight the feature representations of the multiple KPI time series by the final attention weights to obtain their fused representation; and combine the fused representation and the feature representation of the log file by a Hadamard product to obtain a multimodal feature vector incorporating a top-down attention mechanism;
the loss function of the robust multimodal network is calculated by a calculation module, the calculation module being configured to: input the output of Step 5 into the second classifier for label classification and obtain a single-modal loss function by calculating the cross-entropy loss between the label classification and the true label; obtain a multimodal loss function by calculating the cross-entropy loss between the final prediction result of Step 6 and the true result; and introduce a relaxed control factor to optimize the multimodal loss function to obtain the final loss function, wherein the final loss function is defined as $L_O = \gamma L_1 + L_2$, the single-modal loss function is defined as $L_2 = -[a]\log(\mathrm{softmax}(c_n(E_n)))$, and the multimodal loss function is defined as $L_1 = -[a]\log(z_{pred})$, where $\gamma$ denotes the relaxed control factor, $a$ denotes the true result, $[a]$ denotes the true label corresponding to the true result, $z_{pred}$ is the final prediction result, $\sigma(\cdot)$ is the sigmoid activation function, $E_n$ is the n-dimensional feature vector under the single-modality condition in Step 5, and $c_n$ denotes the second classifier; and an adjustment module, configured to perform relaxation optimization, during training, on the loss function calculated for the output of the first classifier in each round, and to adjust the relaxation control factor according to $L_1$, the loss value computed from the output of the first classifier in the current round, and $L_1^*$, the loss value computed from the output of the first classifier in the previous round.
6. An electronic device, characterized in that it comprises: a memory configured to store one or more programs; and a processor, wherein, when the processor executes the one or more programs, the robust multimodal network operation and maintenance fault detection method according to any one of claims 1-4 is implemented.
7. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the robust multimodal network operation and maintenance fault detection method according to any one of claims 1-4.