An extreme few-shot fault diagnosis method for industrial scenarios

By employing a self-adversarial fault diagnosis method based on OCSVM, autoencoders, and twin networks, the challenge of fault diagnosis in industrial scenarios with extremely few samples is solved. This method achieves rapid, low-cost fault diagnosis and strong generalization, making it applicable to various environmental conditions and avoiding economic losses and safety risks caused by equipment failures.

CN117349595B · Active · Publication Date: 2026-06-19 · SHANGHAI JIAOTONG UNIV


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHANGHAI JIAOTONG UNIV
Filing Date: 2023-10-09
Publication Date: 2026-06-19

Application Information

Patent Timeline

09 Oct 2023: Application

19 Jun 2026: Publication (CN117349595B)

IPC: G06F18/10; G06F18/2411; G06F18/214; G06F18/22; G06F17/12; G06N3/0455; G06N3/094

AI Tagging

Application Domain

Biological models; Complex mathematical operations


Similar Technology Patents

A power distribution network voltage support evaluation method, system, device and medium based on generalized regulation resources
CN122225477A · Biological models; AC network voltage adjustment
System(s) and method(s) for generative model processing of image data including object(s) having particular feature(s) and / or classification(s)
WO2026122857A1 · Biological models
Knowledge graph construction method and device, equipment and storage medium
CN119149753B · Improve timing analysis; Improving performance in directional reasoning; Biological models; Knowledge representation
QA system and method
US20260162247A1 · Programme control; Image enhancement
Systems and methods for data collection in an industrial environment
US20260161153A1 · Machine part testing; Receivers monitoring


AI Technical Summary

Technical Problem

Existing technologies are difficult to effectively solve the problem of fault diagnosis with extremely few samples in industrial scenarios, especially in the cold start phase of newly established industrial production lines or equipment put into operation. They cannot quickly build diagnostic models, and existing methods have problems such as poor model generalization, high sample collection costs, and lack of rapid fault screening mechanisms.

Method used

We employ an OCSVM-based fault screening mechanism, combined with a self-adversarial fault diagnosis method using autoencoders and Siamese networks. Through data preprocessing, data calibration, and neural network training, we form a closed loop to achieve automatic sample correction and strong generalization diagnosis.

Benefits of technology

In cases with extremely limited samples, this method can quickly establish fault diagnosis models applicable to various environmental conditions, reduce sample collection costs, improve diagnostic accuracy and model generalization ability, reduce computational resource consumption, and avoid economic losses and safety accidents caused by equipment failures.

✦ Generated by Eureka AI based on patent content.


Figure CN117349595B_ABST


Abstract

This invention discloses an extreme few-sample fault diagnosis method for industrial scenarios, relating to the field of industrial systems. The method first establishes a fault screening mechanism based on a one-class support vector machine (OCSVM), capable of quickly distinguishing faulty samples from normal samples. Then, a self-adversarial fault diagnosis method is established based on autoencoders and Siamese networks, tightly coupling data preprocessing, data calibration, and neural network model training to form a closed loop with a feedback mechanism. This fully mines information from existing samples, achieves automatic sample correction, and establishes an extreme few-sample fault diagnosis model applicable to various environmental conditions. This invention solves the problems of high cost and difficulty in collecting samples in industrial settings, as well as poor model generalization, and is of great significance in avoiding economic losses and major safety accidents caused by untimely detection of equipment faults.


Description

Technical Field

[0001] This invention relates to the field of industrial systems, and more particularly to an extreme few-sample fault diagnosis method for industrial scenarios.

Background Technology

[0002] With the continuous development of technology, the complexity and cost of industrial systems are also increasing, which means that industrial systems require higher reliability and safety. At the same time, health management and fault diagnosis of industrial systems, which help avoid hazards and losses, are becoming increasingly important. In recent years, intelligent fault diagnosis technology has advanced considerably and found wide application. It is mainly divided into traditional machine learning methods and deep learning methods. Traditional machine learning methods usually require background knowledge to select features manually. Moreover, they are mainly suited to scenarios with small differences in data distribution and do not extend to scenarios with larger data volumes and high requirements for model generalization. Deep learning, with its greater intelligence and generalization capability, is therefore becoming increasingly popular in industrial fault diagnosis.

[0003] Compared to traditional machine learning methods, deep learning offers the advantage of automatic feature extraction, which is crucial for early detection of anomalies and identification of actual faults. However, existing models typically fall under the category of supervised learning, requiring a large amount of training data to ensure model effectiveness. In reality, in industrial scenarios, machines mostly operate under normal conditions, and the data collected by massive sensors is largely composed of normal samples. Problems such as low failure probability and high cost of fault data collection often result in insufficient fault data for training deep learning models, hindering existing data-driven deep learning-based fault diagnosis technologies due to the scarcity of historical fault data. Furthermore, only a small portion of monitoring data and corresponding states are known. Most data requires manual labeling, increasing labor costs. More importantly, when sudden catastrophic failures occur, the system is immediately shut down for maintenance. Therefore, it is difficult to collect enough fault data to train deep networks, and data scarcity often leads to overfitting, hindering the training of high-precision diagnostic models. Thus, how to solve fault diagnosis problems with limited data has become a critical issue urgently needing to be addressed in modern industrial systems.

[0004] Existing technologies for few-shot fault diagnosis in industry are mostly based on unsupervised clustering, transfer learning, generative adversarial networks (GANs), and contrastive learning. The main problem with unsupervised clustering is its low model accuracy. Transfer learning requires a large-scale dataset for pre-training, and parameter tuning during domain transfer is highly dependent on experience, making it difficult to implement. GANs, as an effective data augmentation method, can expand the sample size based on known samples; however, current technologies simply use them as a data preprocessing method before model training, without coupling them with subsequent fault diagnosis model training, thus limiting the model's generalization and accuracy. Contrastive learning trains network parameters by constructing a large number of sample pairs, transforming the original fault classification problem into judging the similarity between samples to address the few-shot problem. However, the performance of contrastive learning is greatly limited by the richness of the initial sample data, meaning that this method has high requirements for the quality of the initial data. Existing technologies have not proposed corresponding solutions, often resulting in poor generalization performance in real-world industrial scenarios. In addition, some existing technologies combine the above methods, such as combining clustering methods with contrastive learning, or combining GANs with contrastive learning.

[0005] However, several significant problems remain in existing technologies. Firstly, although they account for the limited number of fault samples, existing technologies still rely on a considerable number of samples for each fault type and therefore do not address fault diagnosis with few samples under extreme conditions. To address this, this invention further considers the cold-start problem of newly established industrial production lines or newly commissioned equipment: there is no time to wait for data on safety-related faults to accumulate, and the data collected for certain specific faults is extremely limited, so the method of this invention works even when only a single sample is available. Secondly, existing technologies do not consider how to enhance model generalization. For example, the technology most similar to this invention also employs autoencoders, generative adversarial networks (GANs), and contrastive learning, but it merely uses GANs to expand the sample pool, variational autoencoders to filter out low-quality data, and contrastive learning to construct a sample-pair prediction network. In that solution, the GAN, autoencoder, and contrastive learning components are completely decoupled and executed sequentially, so they do not form an effective closed feedback loop. Models built in this way typically generalize poorly, being effective only for samples similar to the initial data; once the operating conditions and environment of industrial production change, their accuracy drops rapidly. Furthermore, existing technologies do not consider how to select the sample types used for training in contrastive learning, yet inappropriate sample selection in real-world scenarios can impair training and accuracy while also increasing sample collection costs. Additionally, existing technologies lack a rapid and effective fault screening mechanism. Most existing few-sample techniques in industrial scenarios determine the fault type by comparing the sample under test with multiple known samples one by one; however, since field data is normal in most cases, this one-by-one comparison wastes computational resources and time.

[0006] Therefore, based on the above analysis, fault diagnosis in industrial scenarios with few samples, and especially in cold-start scenarios with extremely few samples, urgently needs an effective neural network training framework. Such a framework should be able to build a diagnostic model quickly when only a single sample of a fault type is available, and should include rapid fault screening and training sample selection mechanisms that significantly reduce sample collection costs. It must also give the model strong generalization ability so that it can be applied under complex working conditions and achieve high-precision fault diagnosis.

[0007] Therefore, those skilled in the art are dedicated to developing a self-adversarial fault diagnosis method based on autoencoders and twin networks. This method can establish a highly generalizable industrial fault diagnosis model, applicable to various environmental conditions, even with extremely few samples. It is of great significance for avoiding economic losses and major safety accidents caused by untimely detection of equipment faults.

Summary of the Invention

[0008] In view of the above-mentioned deficiencies of the prior art, the technical problem to be solved by the present invention is how to ensure the generalization performance of the extremely few-sample model.

[0009] To achieve the above objectives, this invention provides an extreme few-sample fault diagnosis method for industrial scenarios. This method first establishes a fault screening mechanism based on OCSVM, which can quickly distinguish between faulty and normal samples. Then, a self-adversarial fault diagnosis method is established based on autoencoders and Siamese networks, which tightly couples data preprocessing, data calibration, and neural network model training to form a closed loop with a feedback mechanism. This fully mines the information in existing samples, realizes automatic sample correction, and establishes an extreme few-sample fault diagnosis model applicable to various environmental conditions.

[0010] Furthermore, the method includes the following steps:

[0011] Step 1: Data preprocessing;

[0012] Step 2: Establish a fault screening mechanism based on OCSVM;

[0013] Step 3: Establish a training sample type selection mechanism;

[0014] Step 4: Establish a self-adversarial fault diagnosis model based on autoencoders and twin networks;

[0015] Step 5: Method accuracy assessment;

[0016] Step 1 further includes:

[0017] Step 1.1: Data Reconstruction. In order to train the neural network model, it is necessary to ensure that the dimension of the input data meets the dimension required by the network input. Therefore, it is necessary to adjust the dimension of the cleaned original data based on resampling, bilinear interpolation and other methods. At the same time, redundant information is removed without losing fault features, thereby saving computing resources and improving the accuracy of the model.
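As a minimal sketch of Step 1.1, assuming each sample is either a 1-D signal or a 2-D array (for example a time-frequency map) held as a numpy array, the two resizing operations named above can be written as follows. The function names `resample_1d` and `bilinear_resize` are hypothetical, and linear interpolation on a corner-aligned grid is an assumption, since the patent does not specify its resampling scheme.

```python
import numpy as np

def resample_1d(signal, target_len):
    """Linearly interpolate a 1-D signal onto target_len evenly spaced points."""
    signal = np.asarray(signal, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(signal))   # original sample positions
    dst = np.linspace(0.0, 1.0, num=target_len)    # positions required by the network input
    return np.interp(dst, src, signal)

def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D array by bilinear interpolation (corner-aligned grid)."""
    img = np.asarray(img, dtype=float)
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)           # output rows mapped into input coordinates
    xs = np.linspace(0, in_w - 1, out_w)           # output columns mapped into input coordinates
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                        # vertical interpolation weights
    wx = (xs - x0)[None, :]                        # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```

Bilinear interpolation reproduces affine inputs exactly, which makes it a safe default when only the dimension, not the content, needs to change.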

[0018] Step 1.2, Dataset Partitioning: Three parts of this method involve model training: the OCSVM-based fault screening mechanism, autoencoder-based automatic sample correction, and the Siamese-network-based few-shot classification model. For OCSVM-based fault screening, normal samples can be obtained in large quantities at almost no collection cost, and since the accuracy of OCSVM is positively correlated with sample richness, all normal sample data are used directly for unsupervised training of the OCSVM. For autoencoder-based automatic sample correction, the normal samples are first randomly shuffled and then divided into training and test sets in an 8:2 ratio: 80% of the data is used for training, and the remaining 20% is used to test whether the model can correct samples. Shuffling aims to maximize the range of working conditions covered by the training samples, so that the autoencoder-based model generalizes well. For the Siamese-network-based few-shot classification model, once the optimal training sample type has been determined, the samples are first downsampled so that the two types have the same number of samples. The samples of each type are then shuffled separately and divided into a training set and a test set in an 8:2 ratio. Finally, sample pairs are matched across the two types' training sets to obtain the final training set, and likewise across the two types' test sets to obtain the final test set.
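The Siamese-stage partitioning described above (downsample to equal class sizes, shuffle each class separately, then split 8:2) can be sketched as follows. This is an illustration under stated assumptions: samples are rows of numpy arrays, `balanced_split` is a hypothetical helper, and the fixed `seed` is added only for reproducibility. Pair matching is left to a later step.

```python
import numpy as np

def balanced_split(normal, fault, train_frac=0.8, seed=0):
    """Downsample two classes to equal size, shuffle each, and split train/test."""
    rng = np.random.default_rng(seed)
    normal, fault = np.asarray(normal), np.asarray(fault)
    m = min(len(normal), len(fault))               # size of the smaller class
    # downsample both classes (without replacement) to m samples
    normal = normal[rng.choice(len(normal), size=m, replace=False)]
    fault = fault[rng.choice(len(fault), size=m, replace=False)]
    splits = {}
    for name, data in (("normal", normal), ("fault", fault)):
        idx = rng.permutation(len(data))           # shuffle within the class
        cut = int(round(train_frac * len(data)))   # 8:2 split by default
        splits[name] = (data[idx[:cut]], data[idx[cut:]])
    return splits
```

The same shuffle-then-split pattern, applied to normal samples alone, covers the 8:2 partition used for the autoencoder.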

[0019] Step 4 also includes:

[0020] Step 4.1: Build an autoencoder (AE);

[0021] Step 4.2: Construction of the Siamese network training set in extremely low-sample scenarios;

[0022] Step 4.3: Construction of the twin network;

[0023] Step 4.4: Real-time fault diagnosis of industrial field samples.

[0024] Furthermore, in step 2, OCSVM is an unsupervised learning method. Unlike a standard SVM, which constructs a hyperplane that separates two classes with maximum margin, OCSVM constructs a descriptive surface enclosing the region where the data are distributed. The goal of training an OCSVM is therefore to enclose most sample points within this closed surface while leaving a small portion of them outside. Given a set of normal samples {x_1, x_2, …, x_l}, OCSVM first maps the data into a high-dimensional feature space H through a nonlinear mapping and then separates the data from the origin by solving the following objective function:

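[0025] $$\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i - \rho$$

[0026] $$\text{s.t.}\quad (w\cdot\varphi(x_i)) \ge \rho - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,l$$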

[0027] Where w and ρ are the hyperplane parameters; φ(x_i) is the mapping function that maps data into the high-dimensional feature space; ξ_i ≥ 0 is a slack variable, and introducing slack variables relaxes the constraints on the hyperplane so that it classifies better; ν ∈ (0,1) is the control parameter that adjusts the degree of relaxation; and l is the number of samples.

[0028] To solve for the parameters of the OCSVM model more efficiently, the Lagrange multiplier method is further used to obtain the Lagrange function corresponding to the objective function, which has the following form:

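[0029] $$L(w,\xi,\rho,\alpha,\beta) = \frac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i - \rho - \sum_{i=1}^{l}\alpha_i\big((w\cdot\varphi(x_i)) - \rho + \xi_i\big) - \sum_{i=1}^{l}\beta_i\xi_i$$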

[0030] Where α_i and β_i are Lagrange multipliers, α_i ≥ 0, β_i ≥ 0. Differentiating the Lagrange function with respect to w, ξ, and ρ and setting each derivative to zero gives:

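[0031] $$w = \sum_{i=1}^{l}\alpha_i\,\varphi(x_i),\qquad \alpha_i = \frac{1}{\nu l} - \beta_i,\qquad \sum_{i=1}^{l}\alpha_i = 1$$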

[0032] Because of the inequality constraints in the formula, according to the Karush-Kuhn-Tucker (KKT) conditions, we can obtain:

[0033] $$\alpha_i\big((w\cdot\varphi(x_i)) - \rho + \xi_i\big) = 0$$

[0034] $$\beta_i\,\xi_i = 0$$

[0035] Furthermore, the following conclusions can be drawn:

[0036] (1) If 0 < α_i < 1/(νl), then β_i > 0, so ξ_i = 0 and the corresponding vector x_i satisfies (w·φ(x_i)) − ρ + ξ_i = 0, that is, (w·φ(x_i)) = ρ. The vector x_i then lies on the hyperplane and is a boundary support vector;

[0037] (2) If α_i = 1/(νl), the corresponding vector x_i satisfies (w·φ(x_i)) < ρ. The vector x_i then lies in the region between the origin and the hyperplane in the high-dimensional feature space and is a non-boundary support vector;

[0038] (3) If α_i = 0, the corresponding vector x_i satisfies (w·φ(x_i)) > ρ. The vector x_i then lies outside the hyperplane in the high-dimensional feature space and is a non-support vector;

[0039] Substituting these stationarity conditions back into the Lagrange function, the hyperplane optimization problem can be rewritten in the following dual form:

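[0040] $$\min_{\alpha}\ \frac{1}{2}\,\alpha^{\mathsf T}K\alpha$$

[0041] $$\text{s.t.}\quad e^{\mathsf T}\alpha = 1$$

[0042] $$0 \le \alpha_i \le \frac{1}{\nu l},\qquad i = 1,2,\dots,l$$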

[0043] Where K is the kernel function matrix with K_{i,j} = k(x_i, x_j), α = {α_1, α_2, …, α_l}^T, and e is the all-ones vector. According to conclusions (1) and (2), every vector x_i whose Lagrange multiplier satisfies α_i > 0 is a support vector, and the shape of the hyperplane depends only on the support vectors. Here, the set of support vectors is denoted S_SV and the number of support vectors l_sv. The hyperplane can then be represented as:

[0044] $$f(x) = \sum_{i\in S_{SV}}\alpha_i\,k(x_i, x) - \rho$$

[0045] Here, f(x) is called the decision function, and the region enclosed by the hyperplane, where f(x) > 0, is regarded as the target (normal) region. For any new sample z, substitute z into the decision function: if f(z) ≥ 0, z is a target sample; if f(z) < 0, z is a non-target sample, that is, an industrial fault sample point;
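The dual problem and decision function above can be illustrated with a small numpy solver. This is a sketch, not the patent's implementation: it assumes a Gaussian (RBF) kernel with width parameter `gamma`, and it solves the dual with a pairwise (SMO-style) update that moves weight between the most KKT-violating pair of multipliers, a standard technique swapped in here because the patent does not name a solver. `OneClassSVMSketch` is a hypothetical name; in practice scikit-learn's `sklearn.svm.OneClassSVM` (parameters `nu`, `kernel`, `gamma`) solves the same problem.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class OneClassSVMSketch:
    def __init__(self, nu=0.1, gamma=0.1, tol=1e-6, max_iter=20000):
        self.nu, self.gamma, self.tol, self.max_iter = nu, gamma, tol, max_iter

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        l = len(X)
        C = 1.0 / (self.nu * l)               # box bound: 0 <= alpha_i <= 1/(nu*l)
        K = rbf_kernel(X, X, self.gamma)
        alpha = np.full(l, 1.0 / l)           # feasible start: e^T alpha = 1
        g = K @ alpha                         # g_i = (w . phi(x_i)), gradient of 0.5 a^T K a
        for _ in range(self.max_iter):
            up = np.flatnonzero(alpha < C - 1e-12)    # multipliers that may increase
            down = np.flatnonzero(alpha > 1e-12)      # multipliers that may decrease
            i = up[np.argmin(g[up])]
            j = down[np.argmax(g[down])]
            if g[j] - g[i] < self.tol:        # KKT conditions hold within tol
                break
            eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
            step = min((g[j] - g[i]) / max(eta, 1e-12), C - alpha[i], alpha[j])
            alpha[i] += step                  # shift weight from j to i, keeping the sum at 1
            alpha[j] -= step
            g += step * (K[:, i] - K[:, j])
        free = (alpha > 1e-8) & (alpha < C - 1e-8)    # boundary SVs: (w . phi(x_i)) = rho
        if free.any():
            self.rho = float(g[free].mean())
        else:                                 # rho lies between the two KKT bounds
            lo = g[alpha >= C - 1e-8].max()
            hi = g[alpha <= 1e-8].min()
            self.rho = float(0.5 * (lo + hi))
        sv = alpha > 1e-8
        self.sv_, self.alpha_ = X[sv], alpha[sv]
        return self

    def decision_function(self, Z):
        """f(z) = sum over support vectors of alpha_i k(x_i, z), minus rho."""
        K = rbf_kernel(np.asarray(Z, dtype=float), self.sv_, self.gamma)
        return K @ self.alpha_ - self.rho

    def predict(self, Z):
        """+1 for target (normal) samples, -1 for suspected fault samples."""
        return np.where(self.decision_function(Z) >= 0, 1, -1)
```

Screening then amounts to calling `predict` on each new field sample and running the downstream diagnosis only when it returns -1.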

[0046] To reduce classification errors caused by an excessive number of feature vectors, this method selects one support vector in each class as a representative pivot and introduces a weighting mechanism to improve classification accuracy. The distances in the feature space from the unknown sample to the two support-vector pivots are calculated, and subtracting one distance from the other yields the distance difference g(x), which is compared with a preset threshold ε to select the optimal weighted classifier for prediction. The weighting factors are as follows:

[0047] $$S_K = \operatorname{diag}(s_1, s_2, s_3, \dots, s_n)$$

[0048]

[0049] The improved hyperplane decision function can be expressed as:

[0050]

[0051] Where t(x) is the improved decision function, and the region enclosed by the hyperplane, where t(x) > 0, is regarded as the target (normal) region. For any new sample z, substitute z into t(x): if t(z) ≥ 0, z is a target sample; if t(z) < 0, z is a non-target sample, that is, an industrial fault sample point;
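The feature-space distances in [0046] never require φ explicitly: with the kernel trick, ‖φ(x) − φ(p)‖² = k(x,x) − 2k(x,p) + k(p,p). The sketch below computes the distance difference g(x) to two pivots under that identity. It assumes an RBF kernel and hypothetical function names; the patent's weighting factors and the exact rule that uses ε are not reproduced in the text, so they are deliberately left out.

```python
import numpy as np

def rbf(a, b, gamma):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.sum(diff ** 2)))

def feature_space_distance(x, p, gamma):
    """||phi(x) - phi(p)|| computed via the kernel trick."""
    d2 = rbf(x, x, gamma) - 2.0 * rbf(x, p, gamma) + rbf(p, p, gamma)
    return float(np.sqrt(max(d2, 0.0)))   # clip tiny negative round-off

def distance_difference(x, pivot_a, pivot_b, gamma=0.5):
    """g(x) = d(x, pivot_a) - d(x, pivot_b); negative means x is closer to pivot_a."""
    return (feature_space_distance(x, pivot_a, gamma)
            - feature_space_distance(x, pivot_b, gamma))
```

The resulting g(x) would then be compared with the threshold ε as the patent describes.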

[0052] Step 2 allows for real-time and rapid determination of the status of the latest samples collected at the industrial site. If the initial OCSVM screening result is normal, subsequent fault diagnosis procedures are unnecessary, effectively reducing the time required for industrial site sample diagnosis and saving computational resources.

[0053] Further, in step 3, the fault type used for training is selected by measuring the similarity between the single initial sample of each fault type and normal samples under standard conditions. The similarity measurement is based on three components: brightness, contrast, and structure. First, brightness is compared using the mean intensity μ; the brightness comparison function l(x,y) is a function of the means of samples x and y. Second, the standard deviation σ serves as an estimate of contrast, and comparing the standard deviations of x and y gives the contrast comparison function c(x,y). Finally, each sample is normalized by subtracting its mean and dividing by its own standard deviation, so that both samples have unit standard deviation; comparing the normalized samples (x − μ_x)/σ_x and (y − μ_y)/σ_y gives the structure comparison function s(x,y);

[0054] The brightness comparison function l(x,y) is defined as follows:

[0055] $$l(x,y) = \frac{2\mu_x\mu_y + c_1}{\mu_x^2 + \mu_y^2 + c_1}$$

[0056] Where μ_x and μ_y are the means of samples x and y. The constants c_1 and c_2 keep the function values bounded when the means and variances are close to zero. The contrast comparison function c(x,y) is defined as:

[0057] $$c(x,y) = \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2}$$

[0058] The structure comparison function s(x,y) is defined as follows:

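[0059] $$s(x,y) = \frac{\sigma_{xy} + c_3}{\sigma_x\sigma_y + c_3}$$

[0060] Where σ_xy is the covariance of samples x and y, and c_3 = c_2/2;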

[0061] Combining the three equations yields the similarity index R(x,y) between samples x and y:

[0062] $$R(x,y) = [l(x,y)]^{a}\,[c(x,y)]^{b}\,[s(x,y)]^{c}$$

[0063] Where a>0, b>0, and c>0 are parameters used to adjust the relative importance of the three components. To simplify the expression, in this invention, a = b = c = 1, and the similarity index R(x,y) between samples x and y is expressed as:

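[0064] $$R(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$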
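With a = b = c = 1 and c_3 = c_2/2, the similarity index reduces to the single SSIM-style ratio in [0064]. The sketch below evaluates it over whole samples. Assumptions: statistics are computed globally (no sliding window as in image SSIM), samples are numeric arrays, and c_1 = (0.01·L)² and c_2 = (0.03·L)² follow the common SSIM defaults for a data range L, since the patent does not give its constants. `similarity_index` is a hypothetical name.

```python
import numpy as np

def similarity_index(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Global R(x, y) with a = b = c = 1 and c3 = c2 / 2."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    c1 = (k1 * data_range) ** 2                 # stabilizes the brightness term
    c2 = (k2 * data_range) ** 2                 # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))   # sigma_xy
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)
```

Identical samples give R = 1, and anti-correlated samples drive the structure term, and hence R, below zero.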

[0065] Since the similarity metric index R(x,y) represents the similarity between two blocks, this invention further defines dR(x,y) as a distortion metric, and it is given by the following equation:

[0066]

[0067] Considering that the mean squared error (MSE) is used to calculate the difference between the estimated value and the true value of the estimated quantity, and that its corresponding value is the square of the sample difference, this invention further correlates the similarity measure index R(x,y) with MSE, thereby enabling a faster approximation of the similarity measure index R(x,y) and the distortion measure dR(x,y) using MSE:

[0068]

[0069]

[0070] Where c = (c_1 + c_2)/2 keeps the function values bounded when the mean and variance are close to zero;

[0071] Using this training sample type selection mechanism, the single initial sample of each fault type is compared in turn with the normal samples under standard conditions, and the fault type with the lowest similarity, that is, the highest distortion, is selected. A small number of fault samples of this type are then collected; their count M_s is much smaller than the number N of initial normal samples, which keeps the cost of collecting fault samples low. These samples are used for subsequent training of the fault classification model.
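The selection rule in [0071] is an argmax over distortion. The sketch below uses plain mean squared error as a stand-in distortion, which is an assumption: the patent approximates dR(x,y) with an MSE-based formula that is not reproduced in this text. It also assumes a single normal reference sample (for example an averaged standard-condition sample). `select_training_fault_type` and the fault-type names in the usage are hypothetical.

```python
import numpy as np

def select_training_fault_type(normal_reference, fault_samples):
    """Pick the fault type whose initial sample is least similar to normal.

    fault_samples maps a fault-type name to its single initial sample.
    Distortion is approximated by MSE: higher MSE ~ lower similarity.
    """
    ref = np.asarray(normal_reference, dtype=float)
    distortion = {
        name: float(np.mean((np.asarray(sample, dtype=float) - ref) ** 2))
        for name, sample in fault_samples.items()
    }
    best = max(distortion, key=distortion.get)   # highest distortion wins
    return best, distortion
```

Only the winning type then needs M_s further samples, which is where the collection-cost saving comes from.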

[0072] Furthermore, in step 4, instead of introducing a GAN network, the autoencoder is deeply coupled with the Siamese network to form a self-adversarial relationship between the two. This enables automatic calibration of industrial field samples under various environmental conditions and establishes a fault diagnosis model under extremely limited sample conditions.

[0073] Further, in step 4.1, the AE includes three components: an encoder, a decoder, and a loss function; wherein the encoder f is responsible for converting the original sample x into an encoded sample z, and the decoder g is responsible for converting the encoded sample z into an output sample x′.

[0074] $$z = f(x),\qquad x' = g(z) = g(f(x))$$

[0075] Following the dataset partitioning method described in step 1, all given normal sample data are partitioned to obtain the training set D. The AE also has the following functions:

[0076] 1) Achieve automatic correction of industrial field samples; The encoder f in this invention needs to have the function of automatic sample correction, that is, for the original sample x of any type under any environmental conditions, the encoded sample z after being transformed by the encoder f is the sample of the corresponding type under standard environmental conditions. In this way, even if the dataset cannot cover all environmental conditions, the model still has strong generalization and effectively improves the accuracy of the model.

[0077] 2) Forming a self-adversarial fault diagnosis network by coupling with the Siamese network described below. Specifically, the normal samples corrected as in 1) are first used to construct sample pairs, which serve as inputs to the subsequent Siamese network. The AE is equivalent to a generator network: the original sample x under any environmental condition, after transformation by the encoder f, is regenerated as a sample of the corresponding type under standard environmental conditions. In other words, the encoder's automatic correction of normal samples to standard conditions is essentially a process of generating new samples. After the encoder produces the corrected sample pairs, the Siamese network uses a deep network to determine whether the two samples belong to the same type. This judgment can be viewed as the AE, acting as a generative network, attempting to "deceive" the Siamese network with newly generated samples, while the Siamese network acts as an adversarial network that evaluates how effectively the encoder calibrates samples. Ideally, after the AE transforms two samples of the same type to standard operating conditions, they should be very similar. If the Siamese network, acting as the adversarial network, cannot distinguish between them, the trained AE is effective and can automatically correct industrial field samples, which solves the model's generalization problem. If, however, the Siamese network still judges the two normal samples to be of different types, the AE has failed and needs further training.

[0078] 3) Anti-cheating mechanism: Since the data of same-type samples under standard conditions are fixed, the AE may "cheat" during training. "Cheating" means that the AE learns no useful sample transformation parameters and instead always outputs the standard-condition sample, regardless of the working condition of the input sample x. To prevent this, the AE is designed with a decoder g that converts the encoded sample z into the output sample x′. Because x′ must reconstruct the original sample x as closely as possible, and original samples differ because they were collected under different environmental conditions, the AE imposes a constraint: different original samples x must yield different encoded samples z after the encoder f, even if those encodings are similar. Otherwise, the decoder g would map them to the same output sample x′ and could not reconstruct the distinct originals. The AE therefore has a built-in anti-cheating mechanism, which ensures both automatic sample correction and the construction of the self-adversarial network.

[0079] Therefore, f and g are each parameterized by a deep neural network, with parameters θ_f and θ_g respectively. Given the encoder function f: x → z and the decoder function g: z → x′, the parameters are trained with the reconstruction-error criterion; the reconstruction error on a given dataset D is as follows.

[0080]

[0081] By defining g as a Gaussian distribution centered at g(f(x)), the above loss function can be simplified to:

[0082] $$\min_{\theta_f,\,\theta_g}\ \sum_{x\in D}\lVert x - g(f(x))\rVert^2$$

[0083] Given a well-trained AE with parameters θ_f and θ_g, new data are generated as follows. First, obtain the low-dimensional representation z = f(x) of each original sample x in the normal training set. Then, apply a dimensional transformation to z so that it meets the input dimensions of the subsequent Siamese network. Finally, use the decoder to generate new data. To generate new data, the trained AE must satisfy the following conditions:

[0084]

[0085] The present invention can therefore guarantee these conditions, which implements the anti-cheating mechanism of the AE described above. Fully training the AE requires substantial computing resources; to reduce this load, the present invention further trains the AE with early stopping, where the stopping point satisfies:

[0086]

[0087] Here, c and d are hyperparameters that prevent the model from overfitting. An autoencoder trained on normal samples in this way can automatically correct data collected under the various environmental conditions of an industrial site.
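A compact illustration of the AE training loop, minimizing the squared reconstruction error ‖x − g(f(x))‖² with early stopping, is sketched below. For brevity it assumes linear encoder and decoder layers (the patent uses deep networks) and a patience-based stopping rule on validation loss, which stands in for the patent's conditions on c and d that are not reproduced here. `train_linear_autoencoder` and its parameters are hypothetical.

```python
import numpy as np

def train_linear_autoencoder(X_train, X_val, code_dim, lr=0.01,
                             max_epochs=5000, patience=20, seed=0):
    """Fit f(x) = x W_f and g(z) = z W_g by gradient descent on the squared
    reconstruction error, stopping early when validation loss stalls."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W_f = rng.normal(scale=0.1, size=(d, code_dim))   # encoder parameters (theta_f)
    W_g = rng.normal(scale=0.1, size=(code_dim, d))   # decoder parameters (theta_g)

    def recon_loss(X, W_f, W_g):
        # mean over samples of ||x - g(f(x))||^2
        return float(np.mean(np.sum((X @ W_f @ W_g - X) ** 2, axis=1)))

    best_loss, best = np.inf, (W_f.copy(), W_g.copy())
    stale, n = 0, len(X_train)
    for _ in range(max_epochs):
        Z = X_train @ W_f                      # encoded samples z = f(x)
        G = 2.0 * (Z @ W_g - X_train) / n      # d(loss)/d(reconstruction)
        grad_g = Z.T @ G
        grad_f = X_train.T @ (G @ W_g.T)
        W_f -= lr * grad_f
        W_g -= lr * grad_g
        val = recon_loss(X_val, W_f, W_g)
        if val < best_loss - 1e-9:             # improvement: keep these weights
            best_loss, best, stale = val, (W_f.copy(), W_g.copy()), 0
        else:
            stale += 1
            if stale >= patience:              # early stopping
                break
    return best[0], best[1], best_loss
```

Returning the best-so-far weights rather than the last ones is what makes early stopping guard against overfitting.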

[0088] Furthermore, in step 4.2, for an anomaly detection problem in a given industrial scenario, two datasets, D_nor and D_ano, are assumed, representing normal and abnormal samples respectively. D_nor contains N_nor labeled normal samples, each consisting of a data sample and its corresponding class label; similarly, D_ano contains N_ano labeled abnormal samples. The assumption N_nor ≫ N_ano describes the extremely low-sample scenario studied by this method;

[0089] First, the optimal fault type is determined from the initial fault samples using the training sample type selection mechanism established in step 3. A small number of samples of this type are then collected; their count M_s is much smaller than the number N_nor of initial normal samples on site, which keeps the cost of collecting fault samples low. Next, the M_s fault samples of this type are input into the AE trained in step 4.1, and the dimensionality-transformed encoded samples output by the AE form the abnormal sample dataset D_ano, of size M_s. The initial normal samples are likewise input into the AE trained in step 4.1, and the dimensionality-transformed encoded samples it outputs are used as normal samples, N_nor in number. To balance positive and negative samples in the subsequent training of the Siamese network model, these N_nor normal samples are downsampled to M_s, giving the normal sample dataset D_nor;

[0090] A sample pair generation method is designed to build, from the abnormal sample dataset $D_{ano}$ and the normal sample dataset $D_{nor}$, a training set of the sample pairs needed for the subsequent training of the Siamese network. A positive sample pair is a pair whose two samples belong to the same class. A negative sample pair is a pair whose two samples belong to different classes. To build a positive pair, one sample is randomly selected from the training set and a second sample is randomly selected from the same class. The label of a positive pair is set to 1, indicating complete similarity. To build a negative pair, one sample is randomly selected from the training set and a second sample is randomly selected from another class. The label of a negative pair is set to 0, indicating a similarity of 0. Finally, the numbers of positive and negative pairs constructed from the training set are compared. The pair type with fewer pairs is used as the benchmark, and the other type is downsampled. The final training set therefore contains equal numbers of positive and negative pairs, which keeps positive and negative samples balanced during the subsequent training of the Siamese network.
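The pair generation and balancing steps in [0090] might be sketched as follows. This illustration rests on several assumptions. `n_draws` is a hypothetical parameter, since the source does not say how many random draws are made for each pair type. Samples are lists of floats. The second sample of a positive pair is required to be a different sample from the first.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple

Pair = Tuple[List[float], List[float], int]


def make_balanced_pairs(
    samples: Sequence[List[float]],
    labels: Sequence[Hashable],
    n_draws: int,
    seed: int = 0,
) -> List[Pair]:
    """Build (x1, x2, label) pairs; label 1 = same class, 0 = different class."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    positives: List[Pair] = []
    negatives: List[Pair] = []
    for _ in range(n_draws):
        # Positive pair: a random anchor plus a different sample of the same class.
        i = rng.randrange(len(samples))
        same = [j for j in by_class[labels[i]] if j != i]
        if same:
            positives.append((samples[i], samples[rng.choice(same)], 1))
        # Negative pair: a random anchor plus a random sample from any other class.
        i = rng.randrange(len(samples))
        others = [j for j, y in enumerate(labels) if y != labels[i]]
        if others:
            negatives.append((samples[i], samples[rng.choice(others)], 0))

    # Downsample the larger pair type to the size of the smaller one.
    k = min(len(positives), len(negatives))
    pairs = rng.sample(positives, k) + rng.sample(negatives, k)
    rng.shuffle(pairs)
    return pairs
```

Balancing on the smaller pair type, rather than oversampling the larger one, avoids duplicating pairs. That matters when each class holds only $M_s$ samples.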

[0091] Furthermore, in step 4.3, the constructed Siamese network differs from traditional classification models, which take a single sample as input. It does not directly predict the category of the input sample. Instead, it takes two samples as input at a time and computes the distance between them in its optimized feature representation. By judging the similarity between the two samples, it achieves fault classification with extremely small sample sizes. The Siamese network consists of two identical branch sub-networks that extract feature information from the two samples of an input pair. The similarity is then calculated by comparing these features. The Siamese network has two inputs, X1 and X2, which pass through the same neural network W; that is, the two branches share weights. The Siamese network is built on a CNN.

[0092] The twin neural network maps the two inputs into a new feature space, where they are represented as $G_W(X_1)$ and $G_W(X_2)$. To measure the similarity between the two samples of a pair in this target space, the Euclidean distance $d$ is used:

[0093] $$d_i = \lVert G_W(x_{1i}) - G_W(x_{2i}) \rVert_2,\quad i = 1, 2, \ldots, N_{sum}$$

[0094] where $N_{sum}$ is the number of sample pairs in the training set, and $x_{1i}$ and $x_{2i}$ are the samples fed to inputs $X_1$ and $X_2$ in the $i$-th sample pair.

[0095] When the Siamese network is trained, each training input consists of a sample pair and a label. The label value is determined by the type of the pair: it is 0 when the two inputs belong to the same category and 1 otherwise. Contrastive loss is chosen as the loss function of the Siamese neural network. It rests on the principle that the feature distance between a pair of similar objects should be small and the feature distance between a pair of dissimilar objects should be large.

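[0096] $$L(x_1, x_2, y) = (1 - y)\,\frac{1}{2}\,d^{2} + y\,\frac{1}{2}\,\max(\mathrm{margin} - d,\ 0)^{2}$$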

[0097] Here, x1 and x2 are the two input samples, y is the label of the input sample pair, and margin is the threshold for deciding whether the two input samples belong to the same class. By the definition of contrastive loss, when the two samples belong to the same class, the parameters are adjusted to minimize the Euclidean distance between x1 and x2. When the two samples belong to different classes and their Euclidean distance already exceeds the set margin, the loss is 0 and no optimization is performed. Otherwise, training increases the distance between the pair toward the set margin. Contrastive loss therefore enables the Siamese network to distinguish samples effectively.
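A minimal pure-Python sketch of the distance and contrastive loss described in [0092] to [0097] follows. It assumes the common form with a factor of 1/2 and a squared hinge term. It uses the label convention of [0095] (y = 0 for a same-class pair, y = 1 otherwise), which is the complement of the similarity labels in [0090]; a [0090] label converts as y = 1 − label. The `embed` argument is a hypothetical stand-in for the shared branch network G_W. Passing the same function to both inputs is what weight sharing amounts to.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(
    pairs: Sequence[Tuple[Vector, Vector, int]],
    embed: Callable[[Vector], Vector],
    margin: float = 1.0,
) -> float:
    """Mean contrastive loss over (x1, x2, y) pairs; y = 0 same class, y = 1 different."""
    if not pairs:
        raise ValueError("need at least one pair")
    total = 0.0
    for x1, x2, y in pairs:
        # Both inputs pass through the same embedding (shared weights W).
        d = euclidean(embed(x1), embed(x2))
        if y == 0:
            total += 0.5 * d ** 2  # same class: pull the pair together
        else:
            total += 0.5 * max(margin - d, 0.0) ** 2  # different class: push apart up to the margin
    return total / len(pairs)
```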

[0098] The Siamese network is tightly coupled with the AE throughout model training. First, the data used to construct sample pairs in step 4.2 and the data used to train the Siamese network in step 4.3 are both samples automatically corrected by the AE in step 4.1. Second, when the corrected sample pairs are input into the Siamese network, the Siamese network judges the AE's correction result and feeds the correction effect back to the AE in step 4.1. Multiple rounds of joint iterative optimization follow. The AE, acting as a generator, learns to produce new samples that closely resemble samples of the corresponding type under standard working conditions. Meanwhile, the Siamese network, acting as a discriminator, acquires a strong ability to verify sample pairs. In this way, the AE and the Siamese network jointly establish the fault diagnosis model for the extreme few-shot setting through self-adversarial interaction.

[0099] 9. The method as described in claim 8, characterized in that, in step 4.4, in addition to the training set constructed in step 4.2, a support set S and a query set Q are defined for the industrial field fault diagnosis scenario. The support set consists of the normal samples under standard conditions and the single sample of each fault type available at the initial time, $S = \{(x_i, y_i) \mid i = 1, 2, \ldots, N_s\}$. Here, $N_s$ is the total number of possible fault types at the industrial site, i.e., all fault types. The query set is defined as ..., where $j$ denotes the $j$-th fault sample to be detected and $N_q$ is the total number of fault samples to be tested at the industrial site.

[0100] First, the samples to be tested pass through the rapid initial screening of step 2, and the samples identified as faulty form the query set Q described above. Each sample in the query set Q is then paired in turn with each single sample in the support set S to form test sample pairs. The twin network fault diagnosis model established in step S3 then yields a vector of $N_s$ similarity probability values. The support-set type with the largest similarity probability is taken as the fault type of the sample to be detected. This achieves real-time fault diagnosis of samples at the industrial site.
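The query-to-support matching in [0100] reduces to an argmax over similarity scores. The sketch below is illustrative only. `toy_similarity` is a hypothetical placeholder for the trained Siamese network's similarity probability, and the support set is assumed to be a list of (sample, type) tuples.

```python
import math
from typing import Callable, Hashable, List, Sequence, Tuple

Vector = Sequence[float]


def diagnose(
    query: Vector,
    support: Sequence[Tuple[Vector, Hashable]],
    similarity: Callable[[Vector, Vector], float],
) -> Tuple[Hashable, List[float]]:
    """Pair the query with every support sample and return the most similar type."""
    if not support:
        raise ValueError("support set is empty")
    # One similarity score per support-set entry (N_s scores in total).
    scores = [similarity(query, x) for x, _ in support]
    best = max(range(len(scores)), key=scores.__getitem__)
    return support[best][1], scores


def toy_similarity(a: Vector, b: Vector) -> float:
    """Hypothetical stand-in for the trained Siamese network: exp(-distance)."""
    return math.exp(-math.dist(a, b))
```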

[0101] Furthermore, in step 5, the performance of the above method is verified on a fully labeled public dataset. The verification is based on whether the type of each sample tested at the industrial site is correctly identified, and it uses three metrics: precision, recall, and F1 score. Precision is defined with respect to the predictions: it is the proportion of samples predicted as positive that are actually positive, expressed as:

[0102] $$\mathrm{Precision} = \frac{TP}{TP + FP}$$

[0103] where TP is the number of samples predicted as positive that are actually positive, and FP is the number of samples predicted as positive that are actually negative.

[0104] Recall is defined with respect to the original samples: it is the proportion of actually positive samples that are predicted as positive, expressed as:

[0105] $$\mathrm{Recall} = \frac{TP}{TP + FN}$$

[0106] where TP is the number of samples predicted as positive that are actually positive, and FN is the number of samples predicted as negative that are actually positive.

[0107] The F1 score is the harmonic mean of precision and recall:

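[0108] $$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$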
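The three metrics can be computed per fault type by treating that type as the positive class. This is a minimal sketch. How per-type scores would be aggregated across types is left open, and the convention of reporting 0.0 for a zero denominator is an assumption.

```python
from typing import Hashable, Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one class treated as 'positive'."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Assumed convention: a ratio with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```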

[0109] By evaluating the diagnostic accuracy on the samples to be tested, the training parameters of the Siamese network and the sample type selection mechanism can be adjusted.

[0110] The present invention has the following technical effects:

[0111] (1) It effectively solves the problem that traditional machine learning methods cannot support data-driven modeling when extremely few samples are available at the industrial site. It greatly reduces the cost of sample collection and automatically corrects industrial sample data, so that the final model is applicable to various environmental conditions.

[0112] (2) It effectively reduces the training difficulty of the model and improves the accuracy, while also significantly reducing the cost of sample collection;

[0113] (3) By establishing a fault screening mechanism based on one-class support vector machine (OCSVM), it is possible to quickly determine whether the latest samples collected in the industrial field are in a normal state in real time. If the OCSVM screening result is normal, there is no need to go through the subsequent fault diagnosis process, which can effectively reduce the time for industrial field sample diagnosis and save computing resources.

[0114] The concept, specific structure, and technical effects of the present invention are further explained below in conjunction with the accompanying drawings, so that the purpose, features, and effects of the present invention can be fully understood.

Description of the Drawings

[0115] Figure 1 is a general flowchart of a preferred embodiment of the present invention;

[0116] Figure 2 is a diagram of the twin network structure according to a preferred embodiment of the present invention;

[0117] Figure 3 is a flowchart of the extreme few-shot fault diagnosis method according to a preferred embodiment of the present invention.

Detailed Description

[0118] The following description, with reference to the accompanying drawings, illustrates several preferred embodiments of the present invention to make its technical content clearer and easier to understand. The present invention can be embodied in many different forms, and the scope of protection of the present invention is not limited to the embodiments mentioned herein.

[0119] In the accompanying drawings, components with the same structure are indicated by the same numerical designation, and components with similar structures or functions are indicated by similar numerical designations. The dimensions and thicknesses of each component shown in the drawings are arbitrary, and the present invention does not limit the dimensions and thicknesses of each component. To make the illustrations clearer, the thickness of some components has been appropriately exaggerated in the drawings.

[0120] This invention relates to the industrial field, and more particularly to a fault diagnosis method in industrial operation and maintenance. The main objective is to design a novel method for generating extreme few-shot fault diagnosis models. First, this method establishes a fault screening mechanism based on one-class support vector machine (OCSVM), capable of quickly distinguishing between faulty and normal samples. Then, a self-adversarial fault diagnosis method is established based on autoencoders and Siamese networks, tightly coupling data preprocessing, data calibration, and neural network model training to form a closed loop with a feedback mechanism. This fully leverages the information in existing samples, enabling automatic sample correction and establishing extreme few-shot fault diagnosis models applicable to various environmental conditions. This addresses the problems of high cost and difficulty in collecting samples in industrial settings, as well as poor model generalization, and is of great significance in preventing economic losses and major safety accidents caused by untimely equipment fault detection. The extreme few-shot fault diagnosis model generation method proposed in this invention includes the following steps:

[0121] Step 1: Data Preprocessing

[0122] The extreme few-shot fault diagnosis model generation method studied in this invention is based on a data-driven neural network. The collected industrial field data must therefore first be cleaned: outliers are removed and missing values are imputed. Then, so that the data can be used effectively for training the neural network model, this invention further preprocesses the cleaned data, mainly through the following steps:

[0123] S1, Data Reconstruction: To train a neural network model, it's necessary to ensure the input data's dimensions match the network's requirements. Therefore, the dimensions of the cleaned original data need to be adjusted using methods such as resampling and bilinear interpolation. Simultaneously, redundant information is removed with minimal loss of fault characteristics, thus saving computational resources and improving model accuracy.

[0124] S2, Dataset Partitioning. Model training in this invention involves three steps: 1) establishing a fault screening mechanism based on a One-Class Support Vector Machine (OCSVM); 2) implementing automatic sample correction based on an autoencoder; and 3) training a few-shot classification model based on a Siamese network. For step 1), normal samples are readily available at almost no acquisition cost, and the accuracy of OCSVM is positively correlated with sample richness. No test set is therefore needed, and all normal sample data are used directly for unsupervised OCSVM training. For step 2), this invention considers an extreme few-shot scenario in which only normal samples are abundant and each fault type initially has only a single sample. The autoencoder can therefore only be trained on normal samples. The normal samples are first randomly shuffled and then split into training and test sets at an 8:2 ratio. That is, 80% of the data is used for training, and the remaining 20% is used to test whether the model has acquired sample correction capability. Shuffling aims to maximize the coverage of different scenarios in the training samples, which gives the autoencoder-based model strong generalization ability. For step 3), after the optimal training type is determined, the samples are first downsampled so that both types contain the same number of samples. This keeps the samples balanced and helps avoid both overfitting and underfitting. Next, the samples of each type are shuffled and split into training and test sets at an 8:2 ratio. The training sets of the two types are then matched into sample pairs to form the final training set, and the test sets of the two types are matched into sample pairs to form the final test set. The sample pair matching rules are explained in detail in subsequent steps.
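The shuffling, downsampling, and 8:2 partitioning described above can be sketched as follows. This minimal illustration assumes uniform random downsampling and fixed seeds for reproducibility. The function names are hypothetical, and pair matching is left to the pair generation step.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_split(
    samples: Sequence[T], train_ratio: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples and split it into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def balanced_split(
    normal: Sequence[T], fault: Sequence[T], train_ratio: float = 0.8, seed: int = 0
) -> Tuple[Tuple[List[T], List[T]], Tuple[List[T], List[T]]]:
    """Downsample both types to the same size, then split each type 8:2."""
    rng = random.Random(seed)
    n = min(len(normal), len(fault))
    normal_kept = rng.sample(list(normal), n)
    fault_kept = rng.sample(list(fault), n)
    nor_train, nor_test = shuffle_split(normal_kept, train_ratio, seed)
    ano_train, ano_test = shuffle_split(fault_kept, train_ratio, seed + 1)
    return (nor_train, ano_train), (nor_test, ano_test)
```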

[0125] Step 2: Establish a fault screening mechanism based on one-class support vector machine (OCSVM)

[0126] Support Vector Machines (SVMs) are machine learning models based on statistical learning theory and the principle of structural risk minimization. They are used mainly for data classification and regression estimation. Their advantages include strong small-sample learning and generalization capabilities, controllable confidence intervals, and fast convergence. OCSVM is an important branch of SVM that requires only one class of data for training. It is well suited to two-class problems in which one class of data is unknown or difficult to classify, such as anomaly detection. Its main idea is to transform a one-class classification problem into a special two-class classification problem. Kernel functions map objects in the input space into a high-dimensional space, and the origin of the coordinate system is treated as the anomalous sample. OCSVM then seeks the optimal hyperplane in this high-dimensional space that best separates the mapped input samples from the origin. In this invention, the sample size is extremely limited: initially only normal samples are abundant, and each fault type has only a single sample. A discriminative model therefore cannot be trained with fully supervised learning, and the accuracy of clustering models is difficult to guarantee. For these reasons, OCSVM is used to establish a rapid initial screening mechanism for faults.

[0127] Specifically, OCSVM is an unsupervised learning method. A standard SVM constructs a hyperplane that separates two classes with maximum margin. OCSVM instead constructs a descriptive surface that encloses the region of the data distribution. The goal of training an OCSVM is therefore to enclose most of the sample points within this closed surface while leaving a small subset of points outside it. Given a set of normal samples $\{x_1, x_2, \ldots, x_l\}$, OCSVM first maps the data into a high-dimensional feature space $H$ through a nonlinear mapping. It then separates the data from the origin by solving the following objective function:

[0128] $$\min_{w,\xi,\rho}\ \frac{1}{2}\lVert w\rVert^{2} + \frac{1}{vl}\sum_{i=1}^{l}\xi_i - \rho$$

[0129] $$\text{s.t.}\ (w\cdot\varphi(x_i)) \ge \rho - \xi_i,\quad \xi_i \ge 0,\quad i = 1, 2, \ldots, l$$

[0130] where $w$ and $\rho$ are the hyperplane parameters, $\varphi(\cdot)$ is the mapping function that maps data into the high-dimensional space, and $\xi_i \ge 0$ is a slack variable. Introducing slack variables relaxes the constraint on the hyperplane, which gives the hyperplane better classification performance. $v \in (0,1)$ is a control parameter that adjusts the degree of relaxation, and $l$ is the number of samples.

[0131] To solve for the parameters of the OCSVM model more efficiently, the Lagrange multiplier method is further used to obtain the Lagrange function corresponding to the objective function, which has the following form:

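[0132] $$L(w,\xi,\rho,\alpha,\beta) = \frac{1}{2}\lVert w\rVert^{2} + \frac{1}{vl}\sum_{i=1}^{l}\xi_i - \rho - \sum_{i=1}^{l}\alpha_i\big((w\cdot\varphi(x_i)) - \rho + \xi_i\big) - \sum_{i=1}^{l}\beta_i\xi_i$$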

[0133] where $\alpha_i$ and $\beta_i$ are Lagrange multipliers with $\alpha_i \ge 0$ and $\beta_i \ge 0$. Taking the partial derivatives of the Lagrangian with respect to $w$, $\xi$, and $\rho$ and setting them to zero gives:

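[0134] $$w = \sum_{i=1}^{l}\alpha_i\varphi(x_i),\qquad \alpha_i = \frac{1}{vl} - \beta_i,\qquad \sum_{i=1}^{l}\alpha_i = 1$$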

[0135] Due to the inequality constraints in the formula, according to the Karush-Kuhn-Tucker (KKT) conditions, we can obtain:

[0136] $$\alpha_i\big((w\cdot\varphi(x_i)) - \rho + \xi_i\big) = 0$$

[0137] $$\beta_i\xi_i = 0$$

[0138] Furthermore, the following conclusions can be drawn:

[0139] 1) If $\alpha_i \in (0, 1/(vl))$, then $\beta_i = 1/(vl) - \alpha_i > 0$, so $\xi_i = 0$. The corresponding vector $x_i$ therefore satisfies $(w\cdot\varphi(x_i)) - \rho + \xi_i = 0$, that is, $(w\cdot\varphi(x_i)) = \rho$. In this case, $x_i$ lies on the hyperplane and is a boundary support vector.

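[0143] $$\min_{\alpha}\ \frac{1}{2}\alpha^{T}K\alpha$$

[0144] $$\text{s.t.}\ e^{T}\alpha = 1$$

[0145] $$0 \le \alpha_i \le \frac{1}{vl},\quad i = 1, 2, \ldots, l$$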

[0146] where $K$ is the kernel matrix with $K_{i,j} = k(x_i, x_j)$, $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_l)^{T}$, and $e$ is the all-ones vector. According to the first and second conclusions, whenever the Lagrange multiplier $\alpha_i > 0$, the corresponding vector $x_i$ is a support vector. The shape of the hyperplane depends only on the support vectors. Here, the present invention denotes the support vector set as $S_{SV}$ and the number of support vectors as $l_{sv}$. The hyperplane can then be represented as:

[0147] $$f(x) = \sum_{i=1}^{l_{sv}}\alpha_i\,k(x_i, x) - \rho,\quad x_i \in S_{SV}$$

[0148] Here, $f(x)$ is called the decision function. The region enclosed by the hyperplane, where $f(x) > 0$, is regarded as the target-sample region. Any new sample $z$ is substituted into the decision function. If $f(z) \ge 0$, $z$ is a target sample. If $f(z) < 0$, $z$ is a non-target sample, that is, an industrial fault sample point.
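Once a solver has produced the multipliers, the screening rule can be sketched as below. This illustration rests on several assumptions. The source does not specify the Gaussian (RBF) kernel or its `gamma`. The support vectors and multipliers are assumed to come from solving the dual problem. ρ is recovered from a boundary support vector, for which (w·φ(x_i)) = ρ holds by conclusion 1) in [0139].

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf_kernel(u: Vector, v: Vector, gamma: float = 1.0) -> float:
    """Gaussian kernel k(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def kernel_expansion(z: Vector, svs: Sequence[Vector], alphas: Sequence[float], kernel: Kernel) -> float:
    """Compute sum_i alpha_i * k(x_i, z), i.e. (w . phi(z))."""
    return sum(a * kernel(x, z) for x, a in zip(svs, alphas))


def rho_from_boundary_sv(svs, alphas, nu: float, n_train: int,
                         kernel: Kernel = rbf_kernel, tol: float = 1e-9) -> float:
    """rho = (w . phi(x_b)) for a boundary SV x_b with 0 < alpha_b < 1/(nu*l)."""
    upper = 1.0 / (nu * n_train)
    for x_b, a_b in zip(svs, alphas):
        if tol < a_b < upper - tol:
            return kernel_expansion(x_b, svs, alphas, kernel)
    raise ValueError("no boundary support vector found")


def decision(z, svs, alphas, rho: float, kernel: Kernel = rbf_kernel) -> float:
    """f(z) = sum_i alpha_i k(x_i, z) - rho."""
    return kernel_expansion(z, svs, alphas, kernel) - rho


def is_fault(z, svs, alphas, rho: float, kernel: Kernel = rbf_kernel) -> bool:
    """f(z) < 0 marks z as a non-target (fault) sample."""
    return decision(z, svs, alphas, rho, kernel) < 0
```

Only samples flagged by `is_fault` would go on to the Siamese-network diagnosis stage. Normal samples stop here, which is where the time and computing savings of the screening step come from.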

[0149] Furthermore, this invention makes full use of the feature attributes of data samples near the optimal hyperplane to improve classification accuracy. An excessive number of feature vectors causes classification errors. To reduce them, this invention selects one support vector in each class as a representative pivot and introduces a weighting mechanism to improve the model's classification accuracy. The distances from the unknown sample to the support-vector pivots in the feature space are calculated, and subtracting the two distance values yields the distance difference $g(x)$. This difference is compared with a preset threshold $\varepsilon$ to select the optimal weighted classifier for classification prediction, with the following weighting factors:

[0150] $$S_K = \mathrm{diag}(s_1, s_2, s_3, \ldots, s_n)$$

[0151]

[0152] The improved hyperplane decision function can be expressed as:

[0153]

[0154] where $t(x)$ is the improved decision function. The region enclosed by the hyperplane, where $t(x) > 0$, is regarded as the target-sample region. Any new sample $z$ is substituted into the improved decision function. If $t(z) \ge 0$, $z$ is a target sample. If $t(z) < 0$, $z$ is a non-target sample, that is, an industrial fault sample point.

[0155] By establishing a fault screening mechanism based on One Class Support Vector Machine (OCSVM), it is possible to quickly determine in real time whether the latest samples collected in the industrial field are in a normal state. If the OCSVM screening result is normal, there is no need for subsequent fault diagnosis processes, which can effectively reduce the time for industrial field sample diagnosis and save computing resources.

[0156] Step 3: Establish a training sample type selection mechanism

[0157] This invention studies an extremely low-sample scenario, assuming that initially only a single sample exists for each fault type. To successfully train the fault classification model, it is necessary to increase the number of samples for each fault type. However, considering the high cost of collecting samples from actual field conditions, simulating the collection of samples for all fault types is difficult to achieve. Furthermore, existing technologies overlook an important fact: when using a contrastive learning-based method to build a fault diagnosis classification model, the sample type used to construct the sample pairs significantly impacts the model's performance. That is, if the fault type selected for constructing the training set sample pairs is inappropriate, the accuracy of the final model will be greatly reduced. Therefore, this invention proposes a training sample type selection mechanism that automatically selects the most suitable fault type from multiple initial fault types. Then, only a small number of samples of this fault type are collected and combined with normal samples to form sample pairs, which can then be used to train the subsequent Siamese network model, ensuring that the final trained model possesses optimal fault diagnosis accuracy.

[0158] Specifically, this invention designs a similarity comparison mechanism that selects the fault type for training. It measures the similarity between the single initial sample of each fault type and the normal samples under standard conditions. The similarity measurement is based on three evaluation metrics: brightness, contrast, and structure. First, the brightness of the samples is compared based on their mean intensities $\mu$. The brightness comparison function $l(x,y)$ is a function of the means of the two samples $x$ and $y$. Second, the standard deviation $\sigma$ is used as an estimate of sample contrast. The comparison of the standard deviations of $x$ and $y$ is given by the contrast comparison function $c(x,y)$. Third, each sample is centered and divided by its own standard deviation, so that the two compared samples have unit standard deviation. Comparing the structure of the normalized samples $(x-\mu_x)/\sigma_x$ and $(y-\mu_y)/\sigma_y$ gives the structural comparison function $s(x,y)$.

[0159] The brightness comparison function l(x,y) is defined as follows:

[0160] $$l(x,y) = \frac{2\mu_x\mu_y + c_1}{\mu_x^{2} + \mu_y^{2} + c_1}$$

[0161] where $\mu_x$ and $\mu_y$ are the means of samples $x$ and $y$. The constants $c_1$ and $c_2$ keep the function values bounded when the means and variances are close to zero. The contrast comparison function $c(x,y)$ is defined as:

[0162] $$c(x,y) = \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^{2} + \sigma_y^{2} + c_2}$$

[0163] The structural comparison function s(x,y) is defined as follows:

[0164] $$s(x,y) = \frac{\sigma_{xy} + c_3}{\sigma_x\sigma_y + c_3}$$

[0165] where $\sigma_{xy}$ is the covariance of $x$ and $y$, and $c_3 = c_2/2$.

[0166] Combining the three equations yields the similarity index R(x,y) between samples x and y:

[0167] $$R(x,y) = [l(x,y)]^{a}\,[c(x,y)]^{b}\,[s(x,y)]^{c}$$

[0168] Where a>0, b>0, and c>0 are parameters used to adjust the relative importance of the three components. To simplify the expression, in this invention, a = b = c = 1, and the similarity index R(x,y) between samples x and y is expressed as:

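[0169] $$R(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^{2} + \mu_y^{2} + c_1)(\sigma_x^{2} + \sigma_y^{2} + c_2)}$$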
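The combined index with a = b = c = 1 can be computed directly from sample statistics. The sketch below illustrates this under stated assumptions. It treats each sample as one flat block with no sliding window and uses population statistics. The default `c1` and `c2` follow the usual SSIM convention (0.01L)^2 and (0.03L)^2 with dynamic range L = 1, which the source does not specify.

```python
from typing import Sequence


def similarity_index(
    x: Sequence[float], y: Sequence[float], c1: float = 1e-4, c2: float = 9e-4
) -> float:
    """R(x, y) with a = b = c = 1 and c3 = c2 / 2, computed over whole samples."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("samples must be non-empty and of equal length")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Brightness term times the merged contrast-structure term.
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical samples give R = 1, and the value falls as brightness, contrast, or structure diverge.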

[0170] Since the similarity metric index R(x,y) represents the similarity between two blocks, this invention further defines dR(x,y) as a distortion metric, and it is given by the following equation:

[0171]

[0172] The mean squared error (MSE) measures the difference between an estimate and the true value of the estimated quantity, and it is computed as the mean of the squared differences between samples. This invention therefore further relates the similarity index R(x,y) to the MSE, so that R(x,y) and the distortion metric dR(x,y) can be approximated more quickly using the MSE:

[0173]

[0174]

[0175] where $c = (c_1 + c_2)/2$, which limits the range of the function values when the means and variances are close to zero.

[0176] The training sample type selection mechanism compares the single initial sample of each fault type in turn with the normal samples under standard conditions. It selects the fault type with the lowest similarity, i.e., the highest distortion. A small number of fault samples of this type, $M_s$ in total, are then collected, where $M_s$ is much smaller than the number $N_{nor}$ of initial normal samples on site. This keeps the cost of collecting fault samples low, and the collected samples are used for the subsequent training of the fault classification model.
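The selection rule reduces to picking the fault type with the minimum similarity. In this sketch, the `similarity` callable is a placeholder, for example the index R(x, y). A single standard-condition normal sample is assumed to serve as the reference, and the type names used in testing are hypothetical. Only the ordering matters, so ranking by lowest similarity selects the same type as ranking by highest distortion whenever dR decreases as R increases.

```python
from typing import Callable, Mapping, Sequence

Vector = Sequence[float]


def select_training_fault_type(
    initial_faults: Mapping[str, Vector],
    normal_reference: Vector,
    similarity: Callable[[Vector, Vector], float],
) -> str:
    """Return the fault type whose single initial sample is least similar to the normal reference."""
    if not initial_faults:
        raise ValueError("no initial fault samples")
    # Lowest similarity corresponds to highest distortion for a decreasing dR.
    return min(
        initial_faults,
        key=lambda t: similarity(initial_faults[t], normal_reference),
    )
```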

[0177] Step 4: Establish a self-adversarial fault diagnosis model based on autoencoders and twin networks.

[0178] A significant problem with existing few-shot fault diagnosis techniques based on contrastive learning is that they do not consider how to enhance the model's generalization ability. Existing techniques do employ autoencoders, generative adversarial networks (GANs), and contrastive learning. However, they merely use GANs to expand the sample size, variational autoencoders to filter out low-quality data, and contrastive learning to construct a sample pair prediction network. In these existing solutions, GANs, autoencoders, and contrastive learning are completely decoupled: they do not form an effective closed feedback loop and are simply executed in sequence. Models built this way typically generalize poorly; that is, they are effective only for samples similar to the initial data. Once the operating conditions and environment of industrial production change, their accuracy drops rapidly.

[0179] To address this, this invention designs a novel self-adversarial fault diagnosis scheme based on autoencoders and Siamese networks. Instead of introducing GAN networks, it achieves self-adversarial interaction by deeply coupling autoencoders and Siamese networks. This enables automatic calibration of industrial field samples under various environmental conditions and establishes a fault diagnosis model for extremely limited sample sizes. Specifically, it includes the following steps:

[0180] S1, Build an autoencoder (AE):

[0181] An autoencoder (AE) consists of three components: an encoder, a decoder, and a loss function. The encoder f is responsible for converting the input sample x into an encoded sample z, and the decoder g is responsible for converting the encoded sample z into an output sample x′.

[0182] $$z = f(x),\qquad x' = g(z) = g(f(x))$$

[0183] The sample size in the case studied by this invention is extremely limited, and only the normal sample data are sufficient. All given normal sample data are therefore partitioned according to the dataset partitioning method of Step 1 above to obtain the training set D. In the prior art, the AE aims to construct an encoded sample z that represents the original sample x more simply without losing the key information in x. At the same time, it ensures that the AE can recover x from z as closely as possible. That is, it serves only for feature extraction or feature enhancement. In this invention, however, the AE also has the following functions:

[0184] 1) Automatic correction of industrial field samples. The original samples x in this invention come from the partitioned training set of normal samples, which were collected under different environmental conditions. Because of the collection cost, only a small number of samples can be collected for the fault type selected in Step 3. The fault-type training samples therefore cannot cover all environmental conditions as the normal samples do. Using them directly to train the subsequent Siamese network model would leave the model with insufficient generalization. Its accuracy would then drop drastically when new samples differ significantly from the known ones. The encoder f in this invention therefore needs an automatic sample correction function. For an original sample x of any type under any environmental condition, the encoded sample z produced by encoder f is the sample of the corresponding type under standard environmental conditions. In this way, even if the dataset cannot cover all environmental conditions, the model still generalizes well, which effectively improves its accuracy.

[0185] 2) Constructing a self-adversarial fault diagnosis network by coupling with the subsequent Siamese network. Specifically, firstly, the normal samples corrected in step 1) are used to construct sample pairs as input to the subsequent Siamese network. At this point, the AE in this invention acts as a generator network. The original sample x under any environmental condition, after being transformed by the encoder f, can be regenerated into a sample of the corresponding type under standard environmental conditions. That is, the process of the encoder automatically correcting normal samples to standard conditions is essentially a new sample generation process. Simultaneously, after obtaining the corrected sample pairs from the encoder, the Siamese network uses a deep network to determine whether the two samples belong to the same type. This determination process can be seen as the encoder AE, acting as a generator network, attempting to "deceive" the Siamese network with newly generated samples. The Siamese network then acts as an adversarial network, used to evaluate the effectiveness of the encoder's automatic sample calibration. Ideally, after the AE encoder transforms samples of the same type into standard conditions, the two samples should be very similar. If the Siamese network, acting as an adversarial network, cannot distinguish them, it indicates that the encoder of the trained AE is very effective and can successfully achieve automatic correction of industrial field samples, thereby solving the model generalization problem. If the Siamese network still considers two normal samples to be of different types, it indicates that the encoder has failed and further training is needed.

[0186] 3) Anti-cheating mechanism. Samples of the same type under standard conditions are deterministic, so the encoder may "cheat" during training. Here, "cheating" means that the encoder learns no useful sample transformation parameters and simply outputs the standard-condition sample regardless of the operating condition of the input. This situation must be avoided, and the decoder in the designed AE solves the problem. The decoder g converts the encoded sample z into the output sample x′. The output x′ must restore the original sample x as closely as possible, and x varies with the environmental conditions under which it was collected. The decoder therefore imposes a strong constraint: for different original samples x, the encoded samples z produced by the encoder f, even if similar, must still differ. Otherwise, the decoder g would produce the same output sample x′ for all of them, which contradicts the reconstruction requirement of the AE. In other words, the AE designed in this invention has an anti-cheating mechanism that effectively guarantees automatic sample correction and the construction of the self-adversarial network.

[0187] Therefore, this invention uses deep neural networks to parameterize f and g, with parameters $\theta_f$ and $\theta_g$ respectively, and trains the parameters with a reconstruction-error criterion. Given an encoder function f: x→z and a decoder function g: z→x′, the reconstruction error for a given dataset D is:

[0188] $$\mathcal{L}(\theta_f, \theta_g) = -\sum_{x \in D} \log p\big(x \mid g(f(x))\big)$$

[0189] Modeling the decoder output as a Gaussian distribution centered at g(f(x)) with fixed variance, the above loss function simplifies to:

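[0190] $$\mathcal{L}(\theta_f, \theta_g) = \sum_{x \in D} \big\lVert x - g(f(x)) \big\rVert_2^2$$ (up to an additive constant and a positive scale factor)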
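The simplification in [0189] rests on a standard identity: the negative log-likelihood of x under an isotropic Gaussian centered at the reconstruction g(f(x)) equals the squared reconstruction error scaled by 1/(2σ²), plus a constant that does not depend on the network parameters. The short check below illustrates this numerically; the reconstruction vectors are made-up stand-ins for g(f(x)) and the fixed σ is an assumption, since no trained AE is involved.

```python
import math


def gaussian_nll(x, x_hat, sigma=1.0):
    """-log N(x | x_hat, sigma^2 I) for a single sample."""
    n = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + sq / (2 * sigma ** 2)


def squared_error(x, x_hat):
    """Squared reconstruction error ||x - x_hat||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


# Hypothetical (x, g(f(x))) pairs; no real encoder or decoder is trained here.
pairs = [([1.0, 2.0], [0.5, 2.5]), ([0.0, -1.0], [0.2, -0.7])]
sigma = 1.0

nll = sum(gaussian_nll(x, xh, sigma) for x, xh in pairs)
se = sum(squared_error(x, xh) for x, xh in pairs)

# Constant term: depends only on |D|, the sample dimension and sigma.
dim = len(pairs[0][0])
const = len(pairs) * 0.5 * dim * math.log(2 * math.pi * sigma ** 2)

print(nll, se / (2 * sigma ** 2) + const)  # equal up to floating-point rounding
```

Because the constant and the 1/(2σ²) factor do not depend on $\theta_f$ or $\theta_g$, minimizing either loss yields the same parameters.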

[0191] Given a well-trained AE, the codes z = f(x) and reconstructions x′ = g(z) can be obtained. To generate new data, this invention first obtains the low-dimensional representation z = f(x) of each original sample x in the normal-sample training set. It then applies a dimensional transformation to z so that the data match the input dimensions of the subsequent Siamese network; finally, the decoder is used to generate new data. To generate new data, the trained autoencoder (AE) must satisfy the following conditions:

[0192]

[0193] Therefore, the present invention can guarantee the above conditions, which realizes the anti-cheating mechanism of the AE described earlier. Fully training the AE requires substantial computing resources; to reduce this load, this invention further trains the AE with early stopping, subject to:

[0194]

[0195] Here, c and d are hyperparameters that prevent the model from overfitting. Once trained on normal samples, the autoencoder can automatically correct data collected under the various environmental conditions of an industrial site.

[0196] S2, Construction of the Siamese network training set in extremely low-sample scenarios:

[0197] For an anomaly detection problem in a given industrial scenario, assume two general datasets $D_{nor}$ and $D_{ano}$, denoting normal and abnormal samples respectively. $D_{nor} = \{(x_i, y_i)\}_{i=1}^{N_{nor}}$ contains $N_{nor}$ labeled normal samples, where $x_i$ is a data sample and $y_i$ its class label. Similarly, $D_{ano} = \{(x_j, y_j)\}_{j=1}^{N_{ano}}$ contains $N_{ano}$ labeled abnormal samples, where $x_j$ is a data sample and $y_j$ its class label. The assumption $N_{nor} \gg N_{ano}$ characterizes the extreme low-sample scenario studied in this invention.

[0198] Note that, unlike existing technologies, this invention studies a scenario with an extremely small sample size. Therefore, the training sample type selection mechanism of step 3 is first used to determine the optimal fault type from the initial fault samples. For the selected type, a small number $M_s$ of samples are then collected, with $M_s$ much smaller than the number $N_{nor}$ of initial normal samples on site; this keeps the cost of collecting fault samples low. Next, the $M_s$ fault samples of this type are input into the AE trained in step S1, and the dimensionality-transformed encoded samples output by the encoder are used to construct the abnormal sample dataset $D_{ano}$, of size $M_s$. The initial normal samples are likewise input into the trained AE, and the dimensionality-transformed encoded samples output by the encoder serve as normal samples, $N_{nor}$ in number. To keep positive and negative samples balanced during subsequent training of the Siamese network model, these $N_{nor}$ normal samples are downsampled to $M_s$, yielding the normal sample dataset $D_{nor}$.
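The balancing step at the end of [0198] amounts to random downsampling without replacement. The sketch below assumes the encoded samples are already available as Python lists; the helper name `build_balanced_datasets`, the toy data and the fixed seed are hypothetical.

```python
import random


def build_balanced_datasets(normal_codes, fault_codes, seed=0):
    """Hypothetical helper: downsample encoded normal samples to the size Ms of D_ano.

    normal_codes: N_nor encoded normal samples (stand-ins for encoder outputs).
    fault_codes:  Ms encoded fault samples of the selected fault type.
    Returns (D_nor, D_ano), both of size Ms.
    """
    ms = len(fault_codes)
    if ms == 0 or ms > len(normal_codes):
        raise ValueError("expected 0 < Ms <= N_nor")
    rng = random.Random(seed)
    d_nor = rng.sample(normal_codes, ms)  # random downsampling without replacement
    d_ano = list(fault_codes)
    return d_nor, d_ano


# Toy stand-ins for encoder outputs: N_nor = 100 normal codes, Ms = 5 fault codes.
normal_codes = [[float(i), 0.0] for i in range(100)]
fault_codes = [[-1.0, float(k)] for k in range(5)]
d_nor, d_ano = build_balanced_datasets(normal_codes, fault_codes)
print(len(d_nor), len(d_ano))  # 5 5
```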

[0199] Furthermore, based on the abnormal sample dataset $D_{ano}$ and the normal sample dataset $D_{nor}$, this invention designs a sample-pair generation method that draws from the training set the sample pairs required for subsequent Siamese network training. First, a positive sample is defined as a pair whose two samples are of the same class, and a negative sample as a pair whose two samples belong to different classes. Then, each time, a sample is randomly drawn from the training set and a second sample is randomly drawn from the same class, forming a positive pair with label 1 (complete similarity). Similarly, each time, a sample is randomly drawn from the training set and a second sample is randomly drawn from another class, forming a negative pair with label 0 (zero similarity). Finally, the numbers of positive and negative pairs constructed from the training set are compared; taking the less numerous pair type as the benchmark, the other type is downsampled so that the final training set contains equal numbers of positive and negative pairs, which keeps positive and negative samples balanced during subsequent Siamese network training.
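The pair-generation procedure of [0199] can be sketched as follows. It uses the S2 label convention (1 for a same-class pair, 0 for a different-class pair); the function name, the requested pair counts, and the choice to let a positive pair reuse the anchor sample itself (the source does not say either way) are assumptions.

```python
import random


def make_pairs(samples, labels, n_pos, n_neg, seed=0):
    """Hypothetical pair generator: draw positive/negative pairs, then balance them."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    n = len(samples)
    positives, negatives = [], []
    for _ in range(n_pos):
        i = rng.randrange(n)
        # Partner drawn from the same class (may coincide with the anchor itself).
        positives.append((samples[i], rng.choice(by_class[labels[i]]), 1))
    for _ in range(n_neg):
        i = rng.randrange(n)
        others = [c for c in by_class if c != labels[i]]
        if others:  # a negative pair needs at least two classes
            other = rng.choice(others)
            negatives.append((samples[i], rng.choice(by_class[other]), 0))

    # Balance: downsample the more numerous pair type to the size of the other.
    k = min(len(positives), len(negatives))
    pairs = rng.sample(positives, k) + rng.sample(negatives, k)
    rng.shuffle(pairs)
    return pairs


# Toy data: the sample value doubles as its index; class 0 = normal, class 1 = fault.
samples = list(range(10))
labels = [0] * 5 + [1] * 5
pairs = make_pairs(samples, labels, n_pos=30, n_neg=20)
print(len(pairs))  # 40: 20 positive pairs (downsampled from 30) + 20 negative pairs
```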

[0200] S3, Siamese network setup:

[0201] Unlike traditional classification models, which take a single sample as input, the Siamese network constructed in this invention does not directly predict the category of the input sample. Instead, it takes two samples at a time and calculates the distance between them from their optimized feature representations; by judging the similarity of the two samples, it achieves fault classification with extremely few samples. In this invention, the Siamese network consists mainly of two identical branch subnetworks, which learn feature representations of the two input samples; the similarity is then computed by comparing these features. The Siamese network has two inputs, X1 and X2, which are processed by the same neural network W, i.e., the branches share weights. In this invention the Siamese network is built on a CNN, as shown in Figure 2; however, the CNN can be replaced by various other network types.

[0202] The Siamese network maps the two inputs to a new space, where they are represented as $G_W(X_1)$ and $G_W(X_2)$. To measure the similarity of the two samples of a pair in this target space, this invention uses a simple Euclidean distance d:

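[0203] $$d\big(x_{1i}, x_{2i}\big) = \big\lVert G_W(x_{1i}) - G_W(x_{2i}) \big\rVert_2, \qquad i = 1, 2, \ldots, N_{sum}$$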

[0204] Here $N_{sum}$ is the number of all sample pairs in the training set, and $x_{1i}$ and $x_{2i}$ are the samples fed to inputs X1 and X2 in the i-th sample pair.
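The weight sharing of [0201] and the distance of [0202] to [0204] can be sketched as follows. A single linear layer with ReLU stands in for the CNN branch $G_W$, which is an assumption made for brevity; the point is that both inputs pass through the same weights W before the Euclidean distance is taken. Function names are hypothetical.

```python
import math


def embed(x, W):
    """Hypothetical stand-in for the shared branch G_W: one linear layer + ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]


def siamese_distance(x1, x2, W):
    """Euclidean distance between the two branch outputs; both branches use the same W."""
    g1, g2 = embed(x1, W), embed(x2, W)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))


W = [[1.0, 0.0], [0.0, 1.0]]  # toy shared weights (identity map)
print(siamese_distance([3.0, 0.0], [0.0, 4.0], W))  # 5.0
```

Sharing W guarantees that the distance is symmetric and that identical inputs are mapped to distance zero, which is what allows d to act as a similarity measure.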

[0205] When training the Siamese network, each training input consists of two samples (a sample pair) and a label determined by the pair's type: the label y is 0 when the two inputs are of the same class and 1 otherwise. This loss label is the complement of the similarity label assigned to pairs in step S2 (same class = 1), i.e., y = 1 − s for a pair with similarity label s. Contrastive loss is chosen as the loss function of the Siamese network; it is built on the principle that the feature distance between two samples of the same type should be small and the feature distance between samples of different types should be large:

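[0206] $$L = \frac{1}{2N_{sum}} \sum_{i=1}^{N_{sum}} \Big[ (1 - y_i)\, d_i^2 + y_i \max\big(0,\ \text{margin} - d_i\big)^2 \Big], \qquad d_i = d\big(x_{1i}, x_{2i}\big)$$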

[0207] Here x1 and x2 are the two input samples, y is the label of the input pair, and margin is the threshold for deciding whether the two inputs belong to the same class. By the definition of the contrastive loss, when the two samples belong to the same class, the parameters are adjusted to minimize the Euclidean distance between x1 and x2. When they belong to different classes and their distance already exceeds the margin, the loss is 0 and no optimization is performed; otherwise, the loss pushes the pair apart until their distance reaches the margin. The contrastive loss therefore directly serves the Siamese network's purpose of distinguishing samples.
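A minimal sketch of the contrastive loss, written with the [0205] convention (y = 0 for same-class pairs, y = 1 otherwise) and assuming the common form with a factor of 1/2 averaged over pairs; labels produced with the S2 convention would first be converted as y = 1 − s. The function name and the default margin are hypothetical.

```python
def contrastive_loss(distances, labels, margin=1.0):
    """Mean contrastive loss over pairs.

    distances: Euclidean distances d_i between the two branch outputs.
    labels:    y_i = 0 for a same-class pair, 1 for a different-class pair.
    """
    total = 0.0
    for d, y in zip(distances, labels):
        if y == 0:
            total += d ** 2                     # pull same-class pairs together
        else:
            total += max(0.0, margin - d) ** 2  # push apart only while d < margin
    return total / (2 * len(distances))


print(contrastive_loss([0.5], [0]))   # 0.125: same-class pair at distance 0.5
print(contrastive_loss([2.0], [1]))   # 0.0: different-class pair already beyond margin
print(contrastive_loss([0.25], [1]))  # 0.28125: different-class pair inside margin
```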

[0208] Furthermore, unlike existing contrastive learning networks or Siamese networks which are only used for training or diagnosing the similarity of sample pairs, the Siamese network in this invention is tightly coupled with the autoencoder in step S1 throughout the entire model training process. On one hand, the data used to construct sample pairs in step S2 and the data used to train the Siamese network in step S3 are both obtained by automatically correcting the samples from the autoencoder in step S1. On the other hand, when the corrected sample pairs are input into the Siamese network, the Siamese network judges the correction results of the autoencoder and feeds back the correction effect to the encoder in step S1. Through multiple rounds of joint iterative optimization, the autoencoder, as a generator, generates new samples with high similarity to samples under the standard working conditions of this type, while the Siamese network, as a discriminator, also has a strong sample pair verification capability. Thus, the autoencoder and the Siamese network, based on self-adversarial principles, jointly complete the establishment of a fault diagnosis model under extremely few sample conditions.

[0209] S4, Real-time Fault Diagnosis for Industrial Field Samples:

[0210] In addition to the training set constructed in step S2, this invention defines a support set S and a query set Q for the industrial field fault diagnosis scenario. The support set consists of a normal sample under standard conditions and a single sample of each fault type at the initial moment, $S = \{(x_i, y_i) \mid i = 1, 2, \ldots, N_s\}$, where $N_s$ is the total number of possible types in the industrial field, i.e., all fault types, and i denotes the i-th type. Each type thus has exactly one sample in the support set, a design made specifically for the extremely low-sample scenario studied in this invention. All fault samples to be detected in the industrial field form the query set Q, indexed by j = 1, 2, …, $N_q$, where j denotes the j-th fault sample to be detected and $N_q$ is the total number of fault samples to be tested in the industrial field.

[0211] First, the fault screening mechanism of step 2 rapidly pre-screens the samples to be tested, and the samples identified as faulty form the query set Q. Then each sample in Q is paired in turn with each single sample in the support set S to construct test sample pairs. Next, the Siamese network fault diagnosis model established in step S3 yields, for each query sample, a vector of $N_s$ similarity probability values. The support-set type corresponding to the largest similarity value is taken as the fault type of the sample to be detected, realizing real-time fault diagnosis of industrial field samples.
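The diagnosis rule of [0211] reduces to an argmax over the support set. In the sketch below, the function name and the mapping from Siamese distance to a similarity score, exp(−d), are assumptions (the source states only that the model outputs one similarity probability value per support-set type); the `distance` callable stands in for the trained Siamese network.

```python
import math


def diagnose(query, support, distance):
    """Return the support-set label most similar to the query, plus all scores.

    support:  list of (sample, label), exactly one sample per type.
    distance: callable giving the Siamese distance between two samples.
    Similarity = exp(-d) is a hypothetical mapping from distance to score.
    """
    scores = [math.exp(-distance(query, x)) for x, _ in support]
    best = max(range(len(scores)), key=scores.__getitem__)
    return support[best][1], scores


# Toy 1-D example with absolute difference standing in for the trained network.
support = [(0.0, "normal"), (5.0, "fault_A"), (10.0, "fault_B")]
label, scores = diagnose(4.2, support, lambda a, b: abs(a - b))
print(label)  # fault_A
```

Only samples flagged as faulty by the OCSVM screening would reach this step, so the cost of $N_s$ pair evaluations is paid only for suspicious samples.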

[0212] Step 5: Method Accuracy Assessment

[0213] To verify the method on fully labeled public datasets, this invention uses three widely used metrics, precision, recall, and F1 score, to assess whether sample types in the industrial setting are correctly identified. Precision is defined with respect to the predictions: it is the proportion of samples predicted as positive that are actually positive:

[0214] $$\text{Precision} = \frac{TP}{TP + FP}$$

[0215] Here TP is the number of samples predicted as positive that are actually positive, and FP is the number of samples predicted as positive that are actually negative.

[0216] Recall is defined with respect to the actual samples: it is the proportion of actually positive samples that are predicted as positive:

[0217] $$\text{Recall} = \frac{TP}{TP + FN}$$

[0218] Here TP is the number of samples predicted as positive that are actually positive, and FN is the number of samples predicted as negative that are actually positive.

[0219] The F1 score is the harmonic mean of precision and recall:

[0220] $$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
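The metrics of [0213] to [0220] can be computed directly from true and predicted labels. A minimal sketch for one positive class (for example, treating a given fault type as positive) follows; the function name is hypothetical, and returning 0 when a denominator is zero is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# TP = 2, FP = 1, FN = 1, so precision = recall = F1 = 2/3.
p, r, f = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(p, r, f)
```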

[0221] By evaluating accuracy on the samples to be detected, the training parameters of the Siamese network and the sample type selection mechanism can be adjusted. As shown in Figure 3, this invention forms a tight closed loop among model training, testing, and optimization, which improves the model's generalization and accuracy and makes it better suited to industrial fault diagnosis.

[0222] The preferred embodiments of the present invention have been described in detail above. It should be understood that those skilled in the art can make numerous modifications and variations based on the concept of the present invention without creative effort. Therefore, all technical solutions that can be obtained by those skilled in the art based on the concept of the present invention through logical analysis, reasoning, or limited experimentation on the basis of existing technology should be within the scope of protection defined by the claims.

Claims

1. An extreme few-sample fault diagnosis method for industrial scenarios. The method first establishes a fault screening mechanism based on OCSVM that quickly distinguishes faulty samples from normal ones. It then establishes a self-adversarial fault diagnosis method based on an autoencoder and a Siamese network, which tightly couples data preprocessing, data calibration, and neural network training into a closed loop with a feedback mechanism. This fully exploits the information in the existing samples, realizes automatic sample correction, and yields an extreme few-sample fault diagnosis model applicable under various environmental conditions. The method includes the following steps: Step 1: data preprocessing; Step 2: establishing a fault screening mechanism based on OCSVM; Step 3: establishing a training sample type selection mechanism; Step 4: establishing a self-adversarial fault diagnosis model based on an autoencoder and a Siamese network; Step 5: assessing method accuracy. Step 1 further includes: Step 1.1, data reconstruction: to train the neural network models, the dimension of the input data must match the input dimension required by the network, so the dimension of the cleaned original data is adjusted by resampling and bilinear interpolation. This also removes redundant information with almost no loss of fault features, saving computing resources and improving model accuracy. Step 1.2, dataset partitioning: the method involves three parts related to model training, namely the OCSVM-based fault screening mechanism, automatic sample correction based on an autoencoder, and training of a few-shot classification model based on a Siamese network. For the OCSVM-based fault screening mechanism, normal samples can be obtained in large quantities at almost no collection cost. 
Since the accuracy of the OCSVM increases with sample richness, all normal sample data are used directly for unsupervised OCSVM training. For the autoencoder-based automatic sample correction, the normal samples are first randomly shuffled and then split 8:2 into training and test sets: 80% of the data are used for training and the remaining 20% to test whether the model can correct samples. Shuffling maximizes the range of working conditions covered by the training samples, so that the autoencoder-based model generalizes well. For training the Siamese-network few-shot classification model, once the optimal training sample type has been determined, the samples are first downsampled so that the two types contain equal numbers of samples; the samples of each type are then shuffled separately and split 8:2 into a training set and a test set. The two types' training sets are then combined into sample pairs to obtain the final training set, and the two types' test sets are combined into sample pairs to obtain the final test set. Step 4 further includes: Step 4.1, building the autoencoder (AE); Step 4.2, constructing the Siamese network training set in extremely low-sample scenarios; Step 4.3, constructing the Siamese network; Step 4.4, real-time fault diagnosis of industrial field samples. In step 4.4, in addition to the training set constructed in step 4.2, a support set S and a query set Q are defined for the industrial field fault diagnosis scenario, wherein the support set consists of a normal sample under standard conditions and a single sample of each fault type at the initial time. 
That is, $S = \{(x_i, y_i) \mid i = 1, 2, \ldots, N_s\}$, where $N_s$ is the total number of possible failure types in the industrial setting, i.e., all failure types, and i denotes the i-th type; all fault samples to be detected in the industrial field are defined as the query set Q, indexed by j = 1, 2, …, $N_q$, where j denotes the j-th fault sample to be tested and $N_q$ is the total number of fault samples to be tested in the industrial field. First, the screening of step 2 rapidly pre-screens the samples to be tested, and the samples identified as faulty form the query set Q; then each sample in Q is paired in turn with each single sample in the support set S to construct test sample pairs; next, the Siamese network fault diagnosis model established in step 4.3 yields a vector of $N_s$ similarity probability values. The support-set type corresponding to the largest similarity value is taken as the fault type of the sample to be detected, thereby realizing real-time fault diagnosis of industrial field samples.
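Step 1.1 resizes cleaned samples with bilinear interpolation so that they match the network input size. A minimal pure-Python sketch follows, assuming each sample is a 2-D grid (for example a time-frequency map) stored as a list of rows and using the align-corners convention, which preserves corner values; the function name `bilinear_resize` is hypothetical, and a production pipeline would more likely call an image library.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D list of floats with bilinear interpolation (align-corners)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map output row i to a fractional input row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            # Map output column j to a fractional input column.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


img = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear_resize(img, 3, 3))  # centre value is 1.5, corners are unchanged
```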

2. The method as described in claim 1, characterized in that, in step 2, the OCSVM is an unsupervised learning method. Unlike a standard SVM, which constructs a hyperplane that separates two classes with maximum margin, the OCSVM constructs a descriptive surface enclosing the region of the data distribution. The goal of a trained OCSVM is therefore to enclose most sampling points inside the closed surface while leaving a small portion of data points outside it. Given a set of normal samples $\{x_1, x_2, \ldots, x_l\}$, the OCSVM first maps the data into a high-dimensional feature space H through a nonlinear mapping and separates the data from the origin by solving the objective function

$$\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w \rVert^2 + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i - \rho \quad \text{s.t.} \quad \langle w, \phi(x_i) \rangle \ge \rho - \xi_i,\ \ \xi_i \ge 0,$$

where w and ρ are the hyperplane parameters, φ is the mapping function into the high-dimensional space, and $\xi_i \ge 0$ are slack variables, whose introduction constrains the hyperplane so that it classifies better; ν is a control parameter that adjusts the degree of relaxation, and l is the number of samples. To solve for the OCSVM parameters more efficiently, the Lagrange multiplier method gives the Lagrangian of the objective:

$$L(w, \xi, \rho, \alpha, \beta) = \frac{1}{2}\lVert w \rVert^2 + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i - \rho - \sum_{i=1}^{l}\alpha_i\big(\langle w, \phi(x_i) \rangle - \rho + \xi_i\big) - \sum_{i=1}^{l}\beta_i \xi_i,$$

where $\alpha_i \ge 0$ and $\beta_i \ge 0$ are Lagrange multipliers. Differentiating L with respect to w, ξ and ρ and setting the derivatives to zero gives

$$w = \sum_{i=1}^{l}\alpha_i \phi(x_i), \qquad \alpha_i = \frac{1}{\nu l} - \beta_i \le \frac{1}{\nu l}, \qquad \sum_{i=1}^{l}\alpha_i = 1.$$

Because of the inequality constraints, the Karush-Kuhn-Tucker (KKT) conditions give

$$\alpha_i\big(\langle w, \phi(x_i) \rangle - \rho + \xi_i\big) = 0, \qquad \beta_i \xi_i = 0.$$

The following conclusions can then be drawn: (1) if $0 < \alpha_i < \frac{1}{\nu l}$, then $\xi_i = 0$ and $\langle w, \phi(x_i) \rangle = \rho$, so the vector $x_i$ lies on the hyperplane; these vectors are boundary support vectors; (2) if $\alpha_i = \frac{1}{\nu l}$, then $\langle w, \phi(x_i) \rangle = \rho - \xi_i \le \rho$, so the vector $x_i$ lies in the region between the origin and the hyperplane in the high-dimensional feature space; these vectors are non-boundary support vectors; (3) if $\alpha_i = 0$, then $\langle w, \phi(x_i) \rangle \ge \rho$, so the vector $x_i$ lies outside the hyperplane in the high-dimensional feature space; these vectors are non-support vectors. Substituting these results back into the Lagrangian, the hyperplane optimization problem can be rewritten in the dual form

$$\min_{\alpha}\ \frac{1}{2}\alpha^{\top} K \alpha \quad \text{s.t.} \quad 0 \le \alpha_i \le \frac{1}{\nu l},\ \ \sum_{i=1}^{l}\alpha_i = 1,$$

where K is the kernel matrix with $K_{i,j} = k(x_i, x_j)$ and $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_l\}$. According to conclusions (1) and (2), every vector $x_i$ whose Lagrange multiplier satisfies $\alpha_i > 0$ is a support vector, and the shape of the hyperplane depends only on the support vectors. Denoting the set of support vectors by $S_{SV}$ and their number by $l_{sv}$, the hyperplane can be represented by the decision function

$$f(x) = \sum_{x_i \in S_{SV}} \alpha_i\, k(x_i, x) - \rho .$$

The region enclosed by the hyperplane is taken as the target region. Any new sample x is substituted into the decision function: if $f(x) \ge 0$, x is a target sample; if $f(x) < 0$, x is a non-target sample, i.e., an industrial fault sample point. To reduce the classification errors caused by an excessive number of feature vectors, this method selects one support vector in each class as a representative pivot and introduces a weighting mechanism to improve classification accuracy: the two distances from the unknown sample to the support-vector pivots in the feature space are calculated and subtracted, the resulting distance difference is compared with a set threshold, and the best weighted classifier is selected for classification prediction using the corresponding weighting factors. The improved hyperplane decision function is applied to new samples in the same way: samples with a nonnegative decision value are target samples, and samples with a negative decision value are non-target samples, i.e., industrial fault sample points. Step 2 thus determines in real time whether the latest samples collected at the industrial site are normal. If the OCSVM pre-screening result is normal, the subsequent fault diagnosis procedure is unnecessary, which shortens the time needed to diagnose industrial site samples and saves computing resources.
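Once the dual problem of claim 2 has been solved, screening a new sample only needs the support vectors, their multipliers and ρ. The sketch below evaluates the unweighted decision function $f(x) = \sum_{x_i \in S_{SV}} \alpha_i k(x_i, x) - \rho$ with an RBF kernel; the kernel, its gamma, the function names, and the hand-picked α and ρ values are assumptions for illustration (they are not the output of an actual solver), and the pivot-weighting refinement of claim 2 is not modeled.

```python
import math


def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel k(a, b); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def ocsvm_decision(x, support_vectors, alphas, rho, kernel=rbf):
    """f(x) = sum_i alpha_i k(x_i, x) - rho over the support vectors."""
    return sum(a * kernel(sv, x) for sv, a in zip(support_vectors, alphas)) - rho


def screen(x, support_vectors, alphas, rho, kernel=rbf):
    """f(x) >= 0 -> normal (target sample); f(x) < 0 -> fault (non-target)."""
    return "normal" if ocsvm_decision(x, support_vectors, alphas, rho, kernel) >= 0 else "fault"


# Hand-picked values standing in for a solved dual; alphas sum to 1 as required.
support_vectors = [[0.0, 0.0], [1.0, 0.0]]
alphas = [0.5, 0.5]
rho = 0.5

print(screen([0.0, 0.0], support_vectors, alphas, rho))  # normal
print(screen([3.0, 3.0], support_vectors, alphas, rho))  # fault
```

Only samples screened as "fault" would be passed on to the Siamese-network diagnosis, which is the time saving claimed for step 2.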

3. The method as described in claim 2, characterized in that, in step 3, the fault type used for training is selected by measuring the similarity between the single sample of each fault type at the initial time and the normal sample under standard conditions. The similarity measurement is based on three components: brightness, contrast, and structure. First, brightness is compared on the basis of mean intensity, giving the brightness comparison function l(x, y) of the two samples x and y; next, the standard deviation is used as an estimate of the contrast of each sample, and comparing the standard deviations gives the contrast comparison function c(x, y); finally, each mean-removed sample is divided by its own standard deviation so that the two samples being compared have unit standard deviation, and comparing these normalized samples gives the structure comparison function s(x, y). The brightness comparison function is defined as

$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1},$$

where $\mu_x$ and $\mu_y$ are the means of x and y, and the constant $C_1$ limits the range of the function when the means are close to zero; the contrast comparison function is defined as

$$c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2};$$

and the structure comparison function is defined as

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3},$$

where $\sigma_{xy}$ is the covariance of x and y. Combining the three yields the similarity index between x and y:

$$\mathrm{SSIM}(x, y) = [l(x, y)]^{\alpha}\,[c(x, y)]^{\beta}\,[s(x, y)]^{\gamma},$$

where α > 0, β > 0 and γ > 0 adjust the relative importance of the three components. For simplicity, taking α = β = γ = 1 and $C_3 = C_2/2$, the similarity index between x and y becomes

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}.$$

Since SSIM(x, y) expresses the similarity between two blocks, a distortion metric is further defined from it. Considering that the mean squared error (MSE) measures the difference between the estimated and true values of a quantity, being the mean of the squared sample differences, the similarity index is further related to the MSE, so that the similarity index and the distortion measure can be approximated more quickly through the MSE; here too a constant limits the range of the function values when the mean and variance are close to zero. Through the established training sample type selection mechanism, the single samples of the fault types at the initial moment are compared in turn with the normal sample under standard conditions, and the fault type with the lowest similarity, i.e., the highest distortion, is selected. A small number $M_s$ of fault samples of this type are then collected, with $M_s$ much smaller than the number of initial normal samples on site; this keeps the cost of collecting fault samples low, and these samples are then used to train the fault classification model.
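The training-type selection of claim 3 compares each fault type's single initial sample with the standard-condition normal sample and keeps the least similar type. A sketch under stated assumptions: samples are flattened to equal-length lists, SSIM is computed over the whole sample as a single window (rather than with local sliding windows), α = β = γ = 1 and $C_3 = C_2/2$, and $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$ with the commonly used defaults $K_1 = 0.01$, $K_2 = 0.03$ and dynamic range L = 1. Function names are hypothetical. Because any distortion measure that decreases as SSIM increases picks the same type, the code selects the minimum SSIM directly.

```python
def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM with alpha = beta = gamma = 1 and C3 = C2 / 2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def select_training_fault_type(normal_ref, fault_singletons):
    """Pick the fault type whose single initial sample is least similar to the normal reference."""
    scores = {name: ssim_global(normal_ref, s) for name, s in fault_singletons.items()}
    return min(scores, key=scores.get), scores


normal_ref = [0.1, 0.2, 0.3, 0.4]
fault_singletons = {
    "A": [0.1, 0.2, 0.3, 0.45],  # almost identical to the normal reference
    "B": [0.9, 0.1, 0.8, 0.0],   # very different pattern
}
chosen, scores = select_training_fault_type(normal_ref, fault_singletons)
print(chosen)  # B
```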

4. The method as described in claim 3, characterized in that, in step 4, rather than introducing a GAN, the autoencoder is deeply coupled with the Siamese network to form a self-adversarial relationship. This enables automatic calibration of industrial field samples under various environmental conditions and establishes a fault diagnosis model under extremely limited sample conditions.

5. The method as described in claim 4, characterized in that, in step 4.1, the AE comprises three components: an encoder, a decoder, and a loss function; wherein the encoder f converts an original sample x into an encoded sample z, and the decoder g converts the encoded sample z into an output sample x′. All given normal sample data are partitioned following the dataset partitioning method of step 1 to obtain the training set. The AE further has the following functions: 1) automatic calibration of industrial field samples: the encoder f must be able to correct samples automatically, i.e., for an original sample x of any type under any environmental condition, the encoded sample z produced by the encoder f is a sample of the corresponding type under standard environmental conditions. Thus, even if the industrial field dataset is of poor quality and does not cover all environmental conditions, the model still generalizes well, which effectively improves its accuracy. 2) Coupling with the Siamese network described below to construct a self-adversarial fault diagnosis network. Specifically, the normal samples corrected in 1) are first used to construct sample pairs that serve as input to the Siamese network. The AE is equivalent to a generator network: after conversion by the encoder f, an original sample x under any environmental condition is regenerated as a sample of the corresponding type under standard environmental conditions. In other words, the encoder's automatic correction of normal samples to standard conditions is essentially a process of generating new samples. Meanwhile, after acquiring the corrected sample pairs from the encoder, the Siamese network uses a deep network to determine whether the two samples belong to the same type. 
This determination can be seen as the AE, acting as a generator network, attempting to "deceive" the Siamese network with newly generated samples, while the Siamese network acts as an adversarial network that evaluates the effectiveness of the encoder's automatic sample calibration. Ideally, after the AE transforms samples of the same type to standard conditions, the two samples should be very similar. If the Siamese network, acting as the adversarial network, cannot distinguish them, the trained AE is highly effective and successfully corrects industrial field samples automatically, solving the model's generalization problem. If the Siamese network still considers the two normal samples to be of different types, the AE has failed and requires further training. 3) Anti-cheating mechanism: the data for a given sample type under standard conditions are fixed, so the AE may "cheat" during training, i.e., fail to learn useful sample-transformation parameters and instead output the standard-condition sample as the encoded sample z regardless of the operating condition of the input. To avoid this, the AE is designed so that the decoder g converts the encoded sample z into the output sample x′. Because x′ must reproduce the original sample x as closely as possible, and x differs with the environmental conditions under which it was collected, the decoder acts as a constraint: for different original samples x, the encoded samples z produced by the encoder f may be similar but must still differ; otherwise, the output samples x′ that the decoder g produces from the encoded samples z would be identical. That is, the AE has an anti-cheating mechanism, which effectively guarantees automatic sample correction and the construction of the self-adversarial network. Therefore, deep neural networks are used to parameterize f and g, with parameters $\theta_f$ and $\theta_g$ respectively, and the parameters are trained with a reconstruction-error criterion. Given the encoder function f: x → z and the decoder function g: z → x′, the reconstruction error is computed over a given dataset D; modeling the decoder output as a Gaussian distribution centered at g(f(x)) simplifies this loss to the squared reconstruction error. Given a well-trained AE, z = f(x) and x′ = g(z) can be obtained. To generate new data, the low-dimensional representation z = f(x) of the original data x in the normal-sample training set is first obtained; a dimensional transformation is then applied to z so that it meets the input dimensions of the subsequent Siamese network; finally, the decoder is used to generate new data. To generate new data, the trained AE must satisfy certain conditions, and guaranteeing them realizes the anti-cheating mechanism of the AE described above. Fully training the AE requires substantial computing resources; to reduce this load, the AE is trained with early stopping, in which c and d are hyperparameters that prevent the model from overfitting. The autoencoder, trained on normal samples, can automatically correct data from the various environmental conditions of industrial settings.

6. The method as described in claim 5, characterized in that, in step 4.2, for an anomaly detection problem in a given industrial scenario, two general datasets $D_{nor}$ and $D_{ano}$ are assumed, denoting normal and abnormal samples respectively. $D_{nor}$ contains $N_{nor}$ labeled normal samples, each consisting of a data sample and its class label; similarly, $D_{ano}$ contains $N_{ano}$ labeled abnormal samples, each consisting of a data sample and its class label; the assumption $N_{nor} \gg N_{ano}$ describes the extreme low-sample scenario studied by this method. First, the optimal fault type is determined from the initial fault samples using the training sample type selection mechanism established in step 3; a small number $M_s$ of samples are then collected for this type, with $M_s$ much smaller than the number of initial normal samples on site, which keeps the cost of collecting fault samples low. The $M_s$ fault samples of this type are then input into the AE trained in step 4.1, and the dimensionality-transformed encoded samples output by the AE are used to construct the abnormal sample dataset $D_{ano}$, of size $M_s$. Next, the initial normal samples are input into the AE trained in step 4.1, and the dimensionality-transformed encoded samples output by the AE serve as normal samples, $N_{nor}$ in number. To keep positive and negative samples balanced during subsequent training of the Siamese network model, these normal samples are downsampled so that their number is also $M_s$, yielding the normal sample dataset $D_{nor}$. To obtain, from the training set built on the abnormal sample dataset $D_{ano}$ and the normal sample dataset $D_{nor}$, the sample pairs required for subsequent Siamese network training, a sample pair generation method is designed. 
First, a positive sample pair is defined as a pair whose two samples belong to the same class, and a negative sample pair as a pair whose two samples belong to different classes. Then, each time, one sample is randomly selected from the training set and another sample is randomly selected from the same class. Together they form a positive sample pair, whose label is set to 1, indicating complete similarity. Similarly, each time, one sample is randomly selected from the training set and another sample is randomly selected from a different class. Together they form a negative sample pair, whose label is set to 0, indicating a similarity of 0. Finally, the numbers of positive and negative sample pairs constructed from the training set are compared. Taking the less numerous pair type as the benchmark, the other type is downsampled so that the final training set contains equal numbers of positive and negative pairs. This keeps positive and negative samples balanced during subsequent training of the Siamese network.
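The sample balancing and pair-generation steps of step 4.2 can be sketched in Python as follows. This is an illustrative sketch with these assumptions:

- `balance_normal_samples`, `make_pairs` and the number of random draws `n_draws` are hypothetical names and parameters.
- The string samples stand in for the AE's dimension-transformed codes.
- Pair labels follow this claim: 1 for a same-class pair, 0 otherwise.

```python
import random


def balance_normal_samples(normal_codes, fault_codes, seed=0):
    """Downsample the N encoded normal samples to the M encoded fault samples."""
    rng = random.Random(seed)
    return rng.sample(list(normal_codes), len(fault_codes)), list(fault_codes)


def make_pairs(dataset, n_draws, seed=0):
    """Build a pair set with equal numbers of positive and negative pairs.

    dataset: list of (sample, class_name). Positive pairs (same class) are
    labelled 1 and negative pairs (different classes) 0, as in claim 6.
    n_draws (hypothetical) is how many random anchors are drawn.
    """
    rng = random.Random(seed)
    indices_by_class = {}
    for idx, (_, cls) in enumerate(dataset):
        indices_by_class.setdefault(cls, []).append(idx)

    positives, negatives = [], []
    for _ in range(n_draws):
        i = rng.randrange(len(dataset))
        anchor, cls = dataset[i]
        same = [j for j in indices_by_class[cls] if j != i]
        if same:  # a single-member class cannot form a positive pair
            positives.append((anchor, dataset[rng.choice(same)][0], 1))
        other = [j for c, idxs in indices_by_class.items() if c != cls for j in idxs]
        if other:
            negatives.append((anchor, dataset[rng.choice(other)][0], 0))

    # Downsample the more numerous pair type to the size of the smaller one
    k = min(len(positives), len(negatives))
    pairs = rng.sample(positives, k) + rng.sample(negatives, k)
    rng.shuffle(pairs)
    return pairs


# Strings stand in for the AE's dimension-transformed codes
normal_codes = [f"normal_{i}" for i in range(50)]
fault_codes = [f"fault_{i}" for i in range(5)]
normal_bal, fault_bal = balance_normal_samples(normal_codes, fault_codes)
train_set = [(s, "normal") for s in normal_bal] + [(s, "fault") for s in fault_bal]
pairs = make_pairs(train_set, n_draws=40)
```

Both pair types are drawn from the same random anchors, which keeps their counts close. The final downsampling guarantees the counts are equal even when a class is too small to form positive pairs.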

7. The method as described in claim 6, characterized in that, in step 4.3, the constructed Siamese network does not directly predict the category of the input sample, unlike traditional classification models that take a single sample as input each time. Instead, it takes two samples as input each time and computes the distance between them in its learned feature representation. By judging the similarity of the two samples, it achieves fault classification in the extreme few-sample situation. The Siamese network consists of two identical branch subnetworks that extract feature information from the input sample pair, and the similarity is then calculated by comparing these features. The Siamese network has two inputs, $X_1$ and $X_2$, which pass through the same neural network; that is, the branches share weights. The twin network is built on a convolutional neural network (CNN). The twin network maps the two inputs into a new space, where they are represented as $G_W(X_1)$ and $G_W(X_2)$. The simple Euclidean distance measures the similarity between the two samples of a pair in this target space: $D_W^{i} = \lVert G_W(X_1^{i}) - G_W(X_2^{i}) \rVert_2, \; i = 1, \dots, P$. Here $P$ is the number of sample pairs in the training set, and $X_1^{i}$ and $X_2^{i}$ are the first and second samples of the $i$-th pair. When training the Siamese network, each training input consists of a sample pair and a label. The label is determined by the types of the two samples: 0 when they belong to the same category and 1 otherwise. Contrastive loss is chosen as the loss function of the Siamese network. It follows the principle that the feature distance between two samples of the same type should be small and the feature distance between two samples of different types should be large: $\mathcal{L} = \sum_{i=1}^{P} \left[ (1 - Y_i)\,\tfrac{1}{2}\,(D_W^{i})^2 + Y_i\,\tfrac{1}{2}\,\max\left(0,\, m - D_W^{i}\right)^2 \right]$. Here $X_1$ and $X_2$ are the two input samples, and $Y_i$ and $m$ are the label of the $i$-th input sample pair and the margin.
The margin $m$ is the threshold for deciding whether two input samples belong to the same category. By the definition of contrastive loss, when two samples belong to the same category, the parameters are adjusted to minimize the Euclidean distance between their feature representations. If the two samples belong to different classes and the Euclidean distance between their features already exceeds the margin, the loss is 0 and no optimization is performed. Otherwise, the distance between the pair is pushed up toward the margin. Contrastive loss therefore lets the Siamese network achieve its purpose of distinguishing samples. The Siamese network is tightly coupled with the AE throughout model training. On the one hand, the data used to construct sample pairs in step 4.2 and the data used to train the Siamese network in step 4.3 are both obtained through automatic sample correction by the AE of step 4.1. On the other hand, when the corrected sample pairs are input into the Siamese network, the Siamese network judges the AE's correction result and feeds the correction effect back to the AE of step 4.1. Through multiple rounds of joint iterative optimization, the AE, acting as a generator, comes to generate new samples highly similar to samples of the actual type under standard working conditions. Meanwhile, the Siamese network, acting as a discriminator, acquires a strong sample-pair verification capability. The AE and the Siamese network thus jointly establish the fault diagnosis model for the extreme few-sample situation through self-adversarial interaction.
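The Euclidean distance and contrastive loss used in step 4.3 can be sketched with NumPy as follows. This sketch rests on these assumptions:

- A single linear layer with ReLU stands in for the CNN branch.
- The loss uses the common form with a factor of 1/2, summed over pairs.
- The label convention of this claim applies: 0 for same class, 1 for different classes.

Both inputs go through the same `embed` call with the same `weights`, so the sketch reflects weight sharing between the two branches.

```python
import numpy as np


def embed(x, weights):
    # Shared branch G_W: both inputs use the same weights (weight sharing).
    # Assumption: one linear layer + ReLU stands in for the CNN branch.
    return np.maximum(x @ weights, 0.0)


def pair_distance(x1, x2, weights):
    # Euclidean distance D_W between the two embeddings of each pair (row-wise)
    return np.linalg.norm(embed(x1, weights) - embed(x2, weights), axis=1)


def contrastive_loss(distance, label, margin):
    # Claim 7 convention: label 0 = same class, 1 = different classes.
    same_term = (1 - label) * 0.5 * distance ** 2                      # pull together
    diff_term = label * 0.5 * np.maximum(0.0, margin - distance) ** 2  # push to margin
    return float(np.sum(same_term + diff_term))


rng = np.random.default_rng(0)
weights = rng.normal(size=(6, 3))
x1, x2 = rng.normal(size=(4, 6)), rng.normal(size=(4, 6))
labels = np.array([0, 1, 0, 1])
distances = pair_distance(x1, x2, weights)
loss = contrastive_loss(distances, labels, margin=1.0)
```

Consider a different-class pair already farther apart than the margin. Its `np.maximum` term is 0, so it contributes no loss, which matches the "no optimization is performed" case in the claim.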

8. The method as described in claim 7, characterized in that, in step 5, three metrics (precision, recall, and F1 score) verify the performance of the above method on a fully labeled public dataset. They are based on whether the type of each sample to be tested in the industrial field is correctly identified. Precision is defined with respect to the predictions. It is the probability that a sample predicted as positive is actually positive: $\text{Precision} = \frac{TP}{TP + FP}$. Here TP is the number of samples predicted positive that are actually positive, and FP is the number of samples predicted positive that are actually negative. Recall is defined with respect to the original samples. It is the probability that an actually positive sample is predicted as positive: $\text{Recall} = \frac{TP}{TP + FN}$. Here TP is as defined above, and FN is the number of samples predicted negative that are actually positive. The F1 score is the harmonic mean of precision and recall: $F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$. The evaluation results on the samples to be tested are then used to adjust the training parameters of the Siamese network and the sample type selection mechanism.
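The step 5 metrics can be computed from true and predicted labels as in the following sketch. Scoring a zero denominator as 0.0 is an assumption, because the claim does not address that case.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class; 0.0 on a zero denominator."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)  # predicted +, actually -
    fn = sum(1 for t, p in pairs if p != positive and t == positive)  # predicted -, actually +
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

In this example TP = 2, FP = 1 and FN = 1, so precision, recall and F1 all equal 2/3.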