Ship main engine small sample fault diagnosis method based on deep concentric twin network

By using a deep concentric twin network (DCSN) for sample amplification, feature learning, and difference measurement, and by optimizing the objective using concentric loss, the problem of insufficient data on ship main engine failures was solved, and high-accuracy small-sample failure diagnosis was achieved.

CN118312868B · Active · Publication Date: 2026-06-26 · HARBIN INST OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HARBIN INST OF TECH
Filing Date
2024-01-10
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

The limited amount of ship main engine fault data in existing technologies makes traditional deep learning methods prone to overfitting, resulting in insufficient generalization ability and difficulty in achieving highly accurate fault diagnosis.

Method used

By employing a deep concentric twin network (DCSN), through sample amplification, feature learning, and difference measurement, and utilizing concentric loss to optimize the objective, the distinguishability of positive and negative pairs is improved, enabling fault diagnosis in small samples.

Benefits of technology

It significantly improves the accuracy of ship main engine fault diagnosis, effectively overcomes the diagnostic challenges under small sample conditions, and enhances the model's inter-class discrimination ability.



Abstract

The present application relates to the technical field of ship power plant fault diagnosis, and in particular to a small-sample fault diagnosis method for ship main engines based on a deep concentric twin network (DCSN). The method overcomes the limited volume of fault data available for existing ship power plants and significantly improves the accuracy of ship main engine fault diagnosis. DCSN uses sample pairing to expand the training set and balances the number of positive and negative pairs to reduce the risk of overfitting in the deep model. DCSN also adopts concentric loss as its optimization function, which helps it learn inter-class discriminative features automatically during end-to-end training. The effectiveness of the proposed DCSN is verified on a ship main engine fault data set, and the influence of related parameters on its diagnostic performance is discussed, including the values of the inner and outer boundaries and the output feature dimension of the twin network.

Description

Technical Field

[0001] This invention relates to the field of marine propulsion system fault diagnosis technology, specifically a small-sample fault diagnosis method for marine main engines based on deep concentric twin networks that can overcome the limited amount of existing marine propulsion system fault data and significantly improve the accuracy of marine main engine fault diagnosis.

Background Technology

[0002] The main engine, as the core component of the propulsion system, provides power to the ship. Because navigation is greatly affected by natural weather conditions, ships rely on the power provided by the main engine to cope with the risks posed by complex maritime conditions, such as strong winds and large waves. If the main engine malfunctions and causes a shutdown, the ship will lose power to resist threats from the sea, resulting in incalculable damage. Therefore, timely diagnosis of main engine failures is of paramount importance for ensuring navigational safety.

[0003] Data collected from ship main engine monitoring systems typically contains a wealth of information reflecting the operational health of the main engine. Analyzing this monitoring data allows for timely understanding of the engine's health status. Data-driven fault diagnosis methods, especially deep learning methods, have received widespread attention in the field. However, ship main engines cannot operate under fault conditions for extended periods to collect sufficient fault data. In other words, the data collected during ship operation is often normal data, while fault data is scarce. In this context, traditional deep learning methods are prone to overfitting, resulting in insufficient model generalization ability and difficulty in achieving good diagnostic performance.

[0004] Insufficient fault samples are one of the major challenges in diagnosing ship main engine faults. In recent years, scholars have proposed various strategies to overcome this problem, which can be broadly categorized into three types: data augmentation-based strategies, transfer learning-based strategies, and metric learning-based strategies. Data augmentation-based strategies attempt to generate new samples from existing samples to improve model performance. In one representative approach, Empirical Mode Decomposition (EMD) is first used to convert fault samples into a series of intrinsic mode functions (IMFs); one IMF is then selected at random and scaled appropriately; finally, the fault samples are reconstructed from these IMFs. By balancing the inter-class samples, this approach improves the diagnostic performance of CNNs under small-sample conditions. Oversampling methods such as SMOTE, as well as generative adversarial networks, can also increase the number of minority-class samples. However, although data augmentation methods can increase the number of minority-class samples, the diversity and class invariance of the generated samples remain questionable.

[0005] A transfer learning-based strategy attempts to train a classifier using a large amount of source domain data and a small amount of target domain data to address the problem of small-sample fault diagnosis. In one representative approach, a turbocharger simulation model supplies a large amount of data as the source domain, and actual turbocharger measurements collected from a test bench serve as the target domain. The TrAdaBoost weights are then optimized using the large amount of source domain data and the small amount of target domain data, and the optimized TrAdaBoost is applied to marine turbocharger fault diagnosis. Typically, transfer learning requires an auxiliary dataset as the source domain dataset, and the source and target domain datasets must belong to the same distribution. However, marine main engines are complex mechanical devices comprising multiple rotating and reciprocating components that interact with each other, and their failure mechanism remains unknown. It is therefore difficult to build an accurate marine main engine simulation model that yields a large amount of simulation data, and it is also difficult to find suitable auxiliary datasets from other domains to serve as source domain data for the transfer learning task of marine main engine fault diagnosis.

[0006] Metric learning-based strategies address the problem of small-sample fault diagnosis by measuring the similarity between samples. Siamese networks are a typical example of this approach. They utilize two weight-sharing sub-networks to learn features from input sample pairs, distinguishing whether two samples in a pair belong to the same class by measuring the similarity between their feature vectors. Small-sample fault diagnosis is a multi-class classification problem, which Siamese networks transform into a binary classification problem (positive or negative pairs), reducing modeling complexity. In fact, Siamese networks address small-sample fault diagnosis through two approaches. First, since the input to a Siamese network is a pair of samples, pairing training samples expands the number of training samples; this increased number helps deep models learn more inter-class discriminative information. Second, Siamese networks use two weight-sharing sub-networks to map the two samples in a pair to the same feature space, making the feature vectors of the two samples comparable. Thanks to these advantages, Siamese networks have gained widespread attention and are applied in various small-sample fault diagnosis scenarios for mechanical equipment, such as bearings and aero-engines.

[0007] However, improving the distinguishability between positive and negative pairs in metric learning remains a key challenge. As shown in Figure 1(a), on the one hand, in the traditional metric space, if the two samples in a positive pair had identical feature values, their difference would be 0 and could be placed at the origin. In practice, even though the two samples in a positive pair belong to the same class, their values differ, and after passing through the same mapping function (i.e., the same network structure and the same weights) they almost never coincide in the feature space. Most positive pairs are therefore scattered near the origin. On the other hand, since the two samples in a negative pair do not belong to the same class, the difference between them is usually large, so negative pairs are distributed far from the origin. However, some negative pairs composed of samples that are similar across classes may have small differences and will also fall near the origin. In other words, positive and negative pairs partially overlap, which makes it impossible to separate them accurately and degrades fault diagnosis performance.

Summary of the Invention

[0008] This invention addresses the shortcomings and deficiencies of existing technologies by proposing a small-sample fault diagnosis method for ship main engines based on a deep concentric twin network (DCSN). The method overcomes the limited amount of fault data available for existing ship propulsion systems and significantly improves the accuracy of ship main engine fault diagnosis.

[0009] This invention achieves its purpose through the following measures:

[0010] A method for small-sample fault diagnosis of ship main engines based on a deep concentric twin network is characterized by the establishment and application of the deep concentric twin network. The deep concentric twin network comprises sample augmentation, feature learning, and difference measurement. In the sample augmentation stage, samples of different categories are cross-combined to form sample pairs $(x_i, x_j)$, which serve as input to the network. In the feature learning stage, each input sample pair $(x_i, x_j)$ is mapped to a new feature space to obtain the feature vectors $f_i$ and $f_j$ of the pair. In the difference measurement stage, the distance from the feature-vector difference $d_{ij}$ to the origin $O$ is used as the difference measure $D_{ij}$ of the sample pair. This measure is used to distinguish whether $x_i$ and $x_j$ belong to the same category, thereby enabling fault diagnosis with small samples.

[0011] The sample amplification stage described in this invention specifically includes:

[0012] Step 1: Combine all faulty samples in the training set in pairs to form sample pairs. For each sample pair $(x_i, x_j)$, if $x_i$ and $x_j$ belong to the same category, the pair is a positive pair and its label is 0; otherwise, it is a negative pair and its label is 1. Here, $i$ and $j$ denote sample indices, and $n$ is the number of samples participating in the pairing;

[0013] Step 2: Following the sample pairing method described above, form sample pairs from all normal samples in the training set, ensuring that all sample pairs are positive pairs.

[0014] Step 3: Count the number of positive and negative pairs, and randomly select a certain number of normal samples from the training set to pair with all faulty samples in turn, so as to achieve a balance between positive and negative pairs, that is, the number of positive pairs is equal to the number of negative pairs.
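The three pairing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_balanced_pairs`, the label convention (0 for positive pairs, 1 for negative pairs), the use of unordered combinations to avoid duplicate pairs, and the rule of adding normal-fault pairs one at a time until the counts match are all hypothetical choices made for the example.

```python
import itertools
import random


def build_balanced_pairs(fault_samples, normal_samples, seed=0):
    """Return (a, b, label) triples; label 0 = positive pair, 1 = negative pair.

    fault_samples: list of (sample, fault_class) tuples.
    normal_samples: list of samples, all belonging to the single "normal" class.
    """
    rng = random.Random(seed)
    pairs = []

    # Step 1: cross-combine all fault samples; same class -> positive, else negative.
    for (xa, ca), (xb, cb) in itertools.combinations(fault_samples, 2):
        pairs.append((xa, xb, 0 if ca == cb else 1))

    # Step 2: every normal-normal pair is a positive pair.
    for xa, xb in itertools.combinations(normal_samples, 2):
        pairs.append((xa, xb, 0))

    # Step 3: pair randomly ordered normal samples with the fault samples
    # (negative pairs) until positives and negatives are equal in number.
    n_pos = sum(1 for _, _, y in pairs if y == 0)
    shortfall = n_pos - (len(pairs) - n_pos)
    for xn in rng.sample(normal_samples, len(normal_samples)):
        for xf, _ in fault_samples:
            if shortfall <= 0:
                return pairs
            pairs.append((xn, xf, 1))
            shortfall -= 1
    return pairs  # may stay unbalanced if too few normal-fault pairs exist
```

With four fault samples in two classes and six normal samples, Steps 1 and 2 yield 17 positive and 4 negative pairs, so Step 3 adds 13 normal-fault pairs to reach balance.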

[0015] The feature learning stage of this invention uses a deep residual network (DRN) to construct the Siamese sub-network and learn features from the input training sample pairs. The sub-network includes two branches with shared weights, each composed of a DRN. The DRN consists of multiple stacked residual blocks, each containing two batch normalization (BN) layers, two rectified linear unit (ReLU) activation layers, two convolutional (Conv) layers, and an identity connection. BN accelerates training and optimization by forcing samples from different batches to have a similar distribution. ReLU, a non-linear activation function, enhances the network's expressive power. Conv uses convolution operations instead of matrix multiplication, reducing the number of trainable weights and making network optimization easier. The identity connection adds the input directly to the output of subsequent layers through a cross-layer connection. The calculation process for each residual block is as follows:

[0016] $$x_{l+1} = x_l + F(x_l)$$

[0017] In the formula, $x_l$ and $x_{l+1}$ denote the input and output of the $l$-th residual block, and $F(\cdot)$ denotes the residual learning function. The feature learning process is as follows. The two samples of a sample pair $(x_i, x_j)$ are fed into the two branches of the Siamese network, respectively. After passing through a convolutional layer and multiple residual blocks, they are fed in sequence into a global average pooling (GAP) layer and a fully connected (FC) layer to obtain the feature vectors $f_i$ and $f_j$ of the pair. Because the weights of the two branches are shared, both samples are mapped to the same feature space, which makes the feature vectors $f_i$ and $f_j$ comparable and allows the network to distinguish whether the two samples belong to the same category.

[0018] In this invention, the difference measurement stage uses a distance formula to measure the difference between the feature vectors produced by the two branches. To quantify the difference between the two samples in a sample pair, the difference $d_{ij}$ between their feature vectors is calculated first, as shown in the formula below:

[0019] $$d_{ij} = f_i - f_j$$

[0020] In the formula, $f_i$ and $f_j$ denote the feature vectors of $x_i$ and $x_j$, respectively. The Euclidean distance from the feature-vector difference $d_{ij}$ to the origin $O$ is then used as the difference measure $D_{ij}$ of the sample pair, as shown in the formula below. Other distance metrics can also be used to measure the difference between sample pairs.

[0021] $$D_{ij} = \lVert d_{ij} - O \rVert_2$$

[0022] In the formula, $\lVert \cdot \rVert_2$ denotes the Euclidean distance. The smaller the value of $D_{ij}$, the smaller the difference between the two samples and the greater the probability that they belong to the same category.
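The difference measurement described above can be illustrated with a short Python sketch. It is only a toy under stated assumptions: `embed` is a hypothetical linear map standing in for one DRN branch (the real branch is a deep residual network), `weights` is the parameter set shared by both branches, and the function names are invented for this example.

```python
import math


def embed(sample, weights):
    """Toy linear stand-in for one DRN branch; both branches share `weights`."""
    return [sum(w * x for w, x in zip(row, sample)) for row in weights]


def difference_measure(sample_a, sample_b, weights):
    """Euclidean distance from the feature-vector difference to the origin."""
    f_a = embed(sample_a, weights)  # branch 1
    f_b = embed(sample_b, weights)  # branch 2, same weights
    d = [a - b for a, b in zip(f_a, f_b)]  # feature-vector difference
    origin = [0.0] * len(d)
    return math.dist(d, origin)
```

Because both samples pass through the same mapping, two identical samples always yield a difference measure of 0, which is why an ideal positive pair sits at the origin.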

[0023] The deep concentric Siamese network described in this invention uses concentricity loss as the training optimization objective. The concentricity loss is used to improve the discriminability between positive and negative difference measures, as shown in the following equation:

[0024]

[0025] In the formula, each sample pair $(x_i, x_j)$ has a label value $y$, with $y = 0$ for a positive pair and $y = 1$ for a negative pair; the subscript denotes the index of the sample; $O$ is the common center of the inner boundary (radius $r$) and the outer boundary (radius $R$), which this invention places at the origin of the coordinate system; and $D$ denotes the difference measure of the sample pair;

[0026] Consider the case where the sample pair is a positive pair, i.e., $y = 0$. The concentric loss is then:

[0027]

[0028] If the difference measure $D$ of a positive pair is greater than $r$, a large penalty is imposed until $D \le r$. Under this constraint, the difference measures of positive pairs are confined to the region of the feature space inside the inner boundary of radius $r$;

[0029] Consider the case where the sample pair is a negative pair, i.e., $y = 1$. The concentric loss is then:

[0030]

[0031] If the difference measure $D$ of a negative pair is less than $R$, a large penalty is imposed until $D \ge R$. Under this constraint, the difference measures of negative pairs are confined to the region of the feature space outside the outer boundary of radius $R$;

[0032] Given a dataset consisting of $N_p$ positive pairs and $N_n$ negative pairs, the total concentricity loss is calculated as follows:

[0033]

[0034] In the formula, the first term covers the positive pairs and the second term covers the negative pairs.
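The loss formulas themselves are not reproduced in this text, so the following Python sketch shows only one form that is consistent with the behaviour described: positive pairs are penalized when their difference measure exceeds the inner radius, and negative pairs are penalized when theirs falls inside the outer radius. The squared-hinge penalty, the averaging over all pairs, and the name `concentric_loss` are assumptions made for illustration; the patent's exact expression may differ.

```python
def concentric_loss(measures, labels, r, R):
    """Hypothetical squared-hinge form of a concentric loss.

    measures: difference measures D of the sample pairs.
    labels:   0 for a positive pair, 1 for a negative pair.
    r, R:     inner and outer boundary radii, with r < R.
    """
    total = 0.0
    for D, y in zip(measures, labels):
        if y == 0:
            # Positive pair: no penalty once D lies inside the inner boundary.
            total += max(0.0, D - r) ** 2
        else:
            # Negative pair: no penalty once D lies outside the outer boundary.
            total += max(0.0, R - D) ** 2
    return total / len(measures)
```

In this form, pairs that already sit on the correct side of their boundary contribute nothing, so the gradient concentrates on the pairs in the gap between the two circles or on the wrong side of them.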

[0035] In the deep concentric twin network of this invention, $r$ and $R$ (i.e., the inner and outer boundaries) are two important parameters of the concentricity loss. Increasing the interval between the inner and outer boundaries (i.e., $R - r$) improves the distinguishability of positive and negative pairs. $R$ is set as a constant, and $r$ is decreased adaptively, according to the distribution of the sample-pair difference measures $D$ observed during training, so that the interval between the two boundaries widens. The specific steps are as follows:

[0036] First, initialize the inner boundary $r$ and the outer boundary $R$; $T$ denotes the maximum number of iterations, and $N_p$ and $N_n$ denote the numbers of positive and negative pairs in the training set, respectively.

[0037] Second, perform the first round of iterative optimization, $t = 1$;

[0038] Third, calculate the difference measures of all sample pairs and sort them in ascending order;

[0039] Fourth, in the sorted list, find the difference measure of the $N_p$-th sample pair and denote it as $D_{N_p}$;

[0040] Fifth, judge whether $D_{N_p}$ is less than $r$. If so, update $r = D_{N_p}$; otherwise, no update is performed;

[0041] Sixth, update the iteration count, $t = t + 1$;

[0042] Seventh, determine whether the maximum number of iterations $T$ has been reached. If so, end training and output the result; otherwise, repeat steps three through seven.
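One iteration of the inner-boundary update can be sketched as below. Several symbols in the steps above were lost in this text, so the sketch rests on an explicit assumption: the reference value is taken to be the difference measure at position `n_pos` (the number of positive pairs) in the ascending ordering, and the inner radius is only ever allowed to shrink. The function name `update_inner_boundary` is hypothetical.

```python
def update_inner_boundary(measures, r, n_pos):
    """Return the (possibly smaller) inner radius after one training iteration.

    measures: difference measures of all training pairs in this iteration.
    r:        current inner boundary radius.
    n_pos:    number of positive pairs in the training set (assumed selector).
    """
    ordered = sorted(measures)        # ascending order
    candidate = ordered[n_pos - 1]    # the n_pos-th smallest measure
    # Shrink r only when the candidate lies inside the current inner boundary.
    return candidate if candidate < r else r
```

Called once per iteration after the weight update, this keeps the outer radius fixed while the inner radius follows the positive pairs inward, which widens the gap between the two boundaries over time.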

[0043] As the network weights are continuously optimized and r is continuously updated, the difference measure of positive pairs is continuously mapped to a smaller inner boundary. In other words, the separation interval between positive and negative pairs gradually increases, and the separation between the two becomes more obvious.

[0044] The deep concentric twin network described in this invention employs a minimum difference strategy for small-sample fault diagnosis. Specifically, firstly, test samples and known samples are paired; then, the sample pairs are input into the network to obtain the difference measure; finally, the known sample category with the smallest difference measure is selected as the category of the test sample, thus achieving fault diagnosis. There are two strategies for pairing test samples and known samples: one-shot testing and N-shot testing. In one-shot testing, the test sample is first paired with one sample from each known category; then, the sample pairs are input into the network to obtain the difference measure; finally, the known sample category with the smallest difference measure is selected as the category of the test sample. In N-shot testing, the test sample is first paired with N samples from each known category; then, the sample pairs are input into the network to obtain the difference measure; finally, the known sample category with the smallest difference measure is selected as the category of the test sample.

[0045] This invention employs an N-shot testing strategy to form sample pairs, with N=5; if the number of known samples is less than 5, then N equals the actual number of known samples. The N-shot testing strategy is adopted for the following reasons: in a one-shot test, if the selected single sample deviates from the distribution of its category, the resulting diagnosis may be poor; conversely, the N-shot testing strategy can better avoid the problems inherent in one-shot testing.
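The minimum-difference rule with N-shot pairing can be sketched as follows. This is an illustrative assumption-laden example rather than the patented procedure: feature vectors are taken as already computed by the trained network, the first `n_shot` known samples of each class are used (the text does not say how they are chosen), and `math.dist` between two feature vectors stands in for the difference measure, since it equals the Euclidean norm of their difference. The name `diagnose` is hypothetical.

```python
import math


def diagnose(test_feature, known_features, n_shot=5, measure=math.dist):
    """Return (predicted_class, smallest_difference_measure).

    known_features: dict mapping class name -> list of feature vectors.
    n_shot:         number of known samples per class to pair with the test
                    sample; classes with fewer samples use all they have.
    """
    best_class, best_measure = None, math.inf
    for cls, feats in known_features.items():
        for feat in feats[:n_shot]:
            d = measure(test_feature, feat)
            if d < best_measure:
                best_class, best_measure = cls, d
    return best_class, best_measure
```

Setting `n_shot=1` reproduces one-shot testing, where a single unrepresentative known sample can decide the outcome; a larger `n_shot` gives each class several chances to match the test sample.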

[0046] The application of deep concentric twin networks in this invention includes the following steps:

[0047] Step A: Obtain data from the ship's main engine monitoring data and divide the data into training and testing sets;

[0048] Step B: Form sample pairs from the samples in the training set;

[0049] Step C: Perform the first round of optimization, set the relevant optimization parameters, including the learning rate and the maximum number of iterations, and then start the optimization process;

[0050] Step D: Use the network to learn features from the training sample pairs to obtain the feature vectors of the training sample pairs;

[0051] Step E: Calculate the difference between the feature vectors of the sample pairs;

[0052] Step F: Calculate the difference measure for the sample pairs;

[0053] Step G: Calculate the concentricity loss and use gradient descent to optimize the network weights to minimize the concentricity loss;

[0054] Step H: Update the inner boundary radius.

[0055] Step I: Determine whether the number of optimization iterations has reached the set maximum number of iterations. If yes, obtain the optimized DCSN; otherwise, return to step D.

[0056] Step J: Fault diagnosis using the optimized network: First, adopt the N-shot Test strategy to form sample pairs with test samples in the test set and known samples in the training set. Then, input the sample pairs into the optimized DCSN to obtain the difference measure of the sample pairs. Finally, according to the principle of minimum difference value, take the category of the known sample with the smallest difference measure of the sample pair as the category of the test sample to achieve small sample fault diagnosis.

[0057] This invention proposes a Deep Concentric Twin Network (DCSN) to improve inter-class discriminability in metric learning, thereby enhancing the performance of ship main engine fault diagnosis under small-sample conditions. On the one hand, the DCSN expands the training samples by pairing samples and reduces the risk of overfitting by balancing the number of positive and negative pairs. On the other hand, the DCSN uses concentric loss as its optimization function, which explicitly forces the deep model to learn inter-class discriminative features during end-to-end training. Under the supervision of concentric loss, positive pairs are pulled inside the inner boundary and negative pairs are pushed outside the outer boundary, so the interval between the two boundaries effectively separates positive and negative pairs. The difference measure of a sample pair is the distance from the origin to the difference between the feature vectors of its two samples, and small-sample fault diagnosis is performed according to the principle of minimum difference. Specifically, test samples are paired with known samples and input into the DCSN to obtain the feature-vector difference of each pair; the class of the known sample in the pair whose difference is closest to the origin is taken as the class of the test sample. This invention validates the effectiveness of the proposed DCSN on a ship main engine failure dataset and discusses the impact of relevant parameters on its diagnostic performance, including the values of the inner and outer boundaries and the dimension of the Siamese network output features.

Description of the Drawings

[0058] Figure 1 is a schematic diagram of the traditional feature space, the concentric loss, and the ideal feature space.

[0059] Figure 2 is a framework diagram of the deep concentric twin network in this invention.

[0060] Figure 3 is a schematic diagram of the twin network structure in this invention.

[0061] Figure 4 is a schematic diagram of the one-shot and N-shot tests in this invention.

[0062] Figure 5 is a flowchart of the small-sample fault diagnosis process for ship main engines based on DCSN in this invention.

[0063] Figure 6 is a summary histogram of the evaluation indicators in an embodiment of the present invention.

[0064] Figure 7 is a schematic diagram of the confusion matrices of different methods on the test samples when the training set contains 15 fault samples of each type, in an embodiment of the present invention.

[0065] Figure 8 is a schematic diagram of the deep feature visualization results of different methods on the test samples when the training set contains 15 fault samples of each class, in an embodiment of the present invention.

Detailed Description

[0066] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0067] This invention proposes a novel deep metric learning method, the Deep Concentric Twin Network (DCSN), which improves ship main engine fault diagnosis under small-sample conditions by enhancing the distinguishability between positive and negative pairs. The novelty of DCSN lies in its use of a concentric loss to explicitly force the separation of positive and negative pairs. As shown in Figure 1(b), in the feature space the concentric loss pulls positive pairs that lie outside the inner boundary into it. This reflects the fact that the difference between the two samples in a positive pair is relatively small. For example, if the two samples in a positive pair are identical, their difference is 0 and can be placed at the origin; if their difference is greater than 0, it can be quantified by the distance between the sample-pair difference and the origin. It is therefore appropriate to take the region inside the inner boundary as the distribution region of positive pairs. Furthermore, to separate positive and negative pairs, the concentric loss pushes negative pairs that lie inside the outer boundary out of it. The difference between the two samples in a negative pair may be large, and the region outside the outer boundary is unbounded, so it is appropriate to take that region as the distribution region of negative pairs. The interval between the inner and outer boundaries then separates positive and negative pairs, as shown in Figure 1(c). It is worth noting that in an ideal feature space, the smaller the difference between the two samples in a pair, the closer that difference lies to the origin and the greater the probability that the two samples belong to the same class. Therefore, fault diagnosis under small-sample conditions can be achieved by pairing test samples with known samples, inputting the pairs into the DCSN, measuring the distance from each pair's feature-vector difference to the origin, and taking the class of the known sample in the pair closest to the origin as the class of the test sample.

[0068] This invention utilizes deep concentric twin networks to improve the distinguishability between different types of sample pairs (i.e., positive and negative pairs), thereby enhancing the performance of small-sample fault diagnosis. This section first presents the overall framework of the deep concentric twin network, followed by a detailed description of its technical details, including sample amplification strategies, twin sub-network structures, difference metrics, and optimization objectives.

[0069] The deep concentric twin network described in this invention mainly comprises three parts: sample amplification, feature learning, and difference measurement, as shown in Figure 2. The working principle is as follows: (1) In the sample amplification module, samples of different categories are combined to form sample pairs $(x_i, x_j)$, which serve as input to the network. (2) In the feature learning module, the network maps each input sample pair $(x_i, x_j)$ to a new feature space to obtain the feature vectors $f_i$ and $f_j$. (3) In the difference measurement module, the distance from the feature-vector difference $d_{ij}$ to the origin $O$ is used as the difference measure $D_{ij}$ of the sample pair, which is used to distinguish whether the two samples belong to the same category, thereby enabling fault diagnosis with small samples. To improve the distinguishability of positive and negative pairs and thus enhance diagnostic performance, DCSN pulls the difference measures of positive pairs inside the inner boundary and pushes those of negative pairs beyond the outer boundary; the interval between the inner and outer boundaries (i.e., $R - r$) then separates positive and negative pairs.

[0070] The sample augmentation module generates a large number of sample pairs by pairing two samples of the same class into positive pairs and two samples of different classes into negative pairs, thereby achieving the goal of training sample augmentation and enabling deep networks to extract more inter-class discriminative information during metric learning. This invention uses a cross-combination method to generate sample pairs, and the specific process is as follows.

[0071] First, all fault samples in the training set are paired up to form sample pairs, as shown in Table 1.

[0072] Table 1 Sample pairing

[0073]

[0074] In Table 1, the two samples enclosed in each set of square brackets form a sample pair. For each sample pair $[x_i, x_j]$, if $x_i$ and $x_j$ belong to the same category, the pair is a positive pair and its label is 0; otherwise, it is a negative pair and its label is 1. Here, $i$ and $j$ denote sample indices, and $n$ is the number of samples participating in the pairing.

[0075] Second, following the sample pairing method described above, all normal samples in the training set are paired, and all resulting sample pairs are positive pairs. Typically, the number of normal samples far exceeds the number of faulty samples, thus yielding more positive pairs than negative pairs.

[0076] Third, count the number of positive and negative pairs, and randomly select a certain number of normal samples from the training set to pair with all faulty samples in turn, so as to achieve a balance between positive and negative pairs, that is, the number of positive pairs is equal to the number of negative pairs.

[0077] The above sample pairing method brings two benefits: (1) Generating the same number of positive and negative pairs prevents class imbalance from causing overfitting in deep models. (2) Using a carefully designed pairing strategy, rather than iteratively generating pairs from all samples, makes efficient use of every faulty sample while avoiding duplicate pairs such as $[x_i, x_j]$ and $[x_j, x_i]$, which saves time during deep model optimization.

[0078] The feature learning module uses a deep residual network (DRN) to build the twin sub-network and learn features from the input training sample pairs. DRN is a popular neural network widely used in fault diagnosis. It is worth noting that the deep concentric twin network developed in this invention can also use other deep structures to build the twin sub-network, which will be explored in future work.

[0079] The overall structure of the twin network constructed in this invention is shown in Figure 3. It includes two branches with shared weights, each composed of a DRN. A DRN mainly consists of multiple stacked residual blocks. Each residual block includes two batch normalization (BN) layers, two rectified linear unit (ReLU) activation layers, two convolutional (Conv) layers, and one identity connection. BN accelerates training and optimization by forcing samples from different batches to have similar distributions; ReLU is a non-linear activation function that enhances the expressive power of the network; Conv uses convolution operations instead of matrix multiplication, reducing the number of trainable weights and making network optimization easier; the identity connection adds the input directly to the output of the subsequent layer through a shortcut connection, which brings two clear advantages: (1) information flows more smoothly within the network, and (2) the vanishing and exploding gradient problems are alleviated, which increases training stability. For the $l$-th residual block, the calculation process is as follows.

[0080] $$x_{l+1} = x_l + F(x_l)$$

[0081] In the formula, $x_l$ and $x_{l+1}$ denote the input and output of the $l$-th residual block, and $F(\cdot)$ denotes the residual learning function.

[0082] The feature learning process of the DRN-based Siamese network is as follows. The two samples of a sample pair are fed into the two branches of the Siamese network, respectively. After passing through the convolutional layer and multiple residual blocks, they are fed in sequence into a global average pooling (GAP) layer and a fully connected (FC) layer to finally obtain the feature vectors of the sample pair. It is worth noting that the weights of the two branches of the Siamese network are shared, so both samples are mapped into the same feature space. This makes the two feature vectors comparable, so that it can be determined whether the two samples belong to the same category.

[0083] The difference metric module uses a distance formula to calculate the difference between the feature vectors of the two branches. To calculate the difference between the two samples of a sample pair, the difference between their feature vectors is first calculated, as shown in the formula below.

[0084] $$\Delta f = f_1 - f_2$$

[0085] In the formula, $f_1$ and $f_2$ represent the feature vectors of the two samples of the sample pair, respectively.

[0086] This invention uses the Euclidean distance from the feature-vector difference $\Delta f$ to the origin $O$ as the difference measure $d$ of the sample pair, as shown in the formula below. It is worth noting that other distance metrics can also be used to measure the difference between sample pairs.

[0087] $$d = \lVert \Delta f - O \rVert_2$$

[0088] In the formula, $\lVert \cdot \rVert_2$ represents the Euclidean distance. The smaller the value of $d$, the smaller the difference between the two samples, and the greater the probability that they belong to the same category.
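As a minimal sketch of the difference measure described above, the following Python snippet computes the Euclidean distance from the feature-vector difference to the origin. The function name `difference_measure` and the plain-list representation of feature vectors are illustrative assumptions and are not taken from the patent.

```python
import math


def difference_measure(feat_a, feat_b):
    """Euclidean norm of the element-wise difference of two feature vectors.

    With the reference point fixed at the origin, the distance from the
    difference vector to the origin is simply its Euclidean norm.
    (Hypothetical helper for illustration only.)
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))


# Two-dimensional features, matching the 2-neuron FC output used in the example.
print(difference_measure([3.0, 4.0], [0.0, 0.0]))  # prints 5.0
```

A small value suggests that the two samples belong to the same category; a large value suggests different categories.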

[0089] The deep concentric Siamese network employs concentric loss as the training optimization objective. Concentric loss comes from our previous work and is dedicated to improving the discriminability between different classes. Here, concentric loss is used to improve the discriminability between the difference measures of positive and negative pairs, and can be rewritten as follows.

[0090]

[0091] In the formula, each sample pair carries a label value that indicates whether it is a positive pair or a negative pair, and an index number identifies the sample pair. The inner boundary (corresponding to radius r) and the outer boundary (corresponding to radius R) share a common center, which this invention sets at the origin of the coordinate system. The remaining quantity is the difference measure of the sample pair.

[0092] Consider the case where the sample pair is a positive pair. In this case, the concentric loss is:

[0093]

[0094] If the difference measure of a positive pair is greater than r, a large penalty is imposed until it no longer exceeds r. Under this constraint, the difference measure of positive pairs is confined to the region of the feature space inside the inner boundary of radius r.

[0095] Consider the case where the sample pair is a negative pair. In this case, the concentric loss is:

[0096]

[0097] If the difference measure of a negative pair is less than R, a large penalty is imposed until it is no longer less than R. Under this constraint, the difference measure of negative pairs is constrained to the region of the feature space outside the outer boundary of radius R.

[0098] Given a dataset consisting of a number of positive pairs and a number of negative pairs, the total concentric loss is calculated as follows.

[0099]

[0100] In the formula, the first group of sample pairs are the positive pairs, and the remaining sample pairs are the negative pairs.
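The behaviour described in the preceding paragraphs (positive pairs are penalized while their difference measure exceeds r, negative pairs while theirs is below R) can be illustrated with a hinge-style sketch. The exact formula of the concentric loss is not reproduced in this text, so the squared-hinge form, the averaging over pairs, and the names `concentric_loss` and `total_concentric_loss` are assumptions made purely for illustration. The default radii follow the values reported in the example (r = 0.1, R = 2).

```python
def concentric_loss(d, is_positive, r=0.1, R=2.0):
    """Hinge-style penalty for one sample pair (assumed form, not the patent's formula)."""
    if is_positive:
        # Positive pairs are penalized only while d lies outside the inner boundary.
        return max(0.0, d - r) ** 2
    # Negative pairs are penalized only while d lies inside the outer boundary.
    return max(0.0, R - d) ** 2


def total_concentric_loss(pairs, r=0.1, R=2.0):
    """Mean penalty over (difference_measure, is_positive) tuples.

    Averaging rather than summing is an assumption of this sketch.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    return sum(concentric_loss(d, pos, r, R) for d, pos in pairs) / len(pairs)


# A positive pair inside the inner boundary and a negative pair outside the
# outer boundary both contribute zero loss.
print(total_concentric_loss([(0.05, True), (2.5, False)]))
```

Under these assumptions, the loss is zero exactly when every positive pair lies within the inner boundary and every negative pair lies beyond the outer boundary.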

[0101] r and R (i.e., the radii of the inner and outer boundaries) are two important parameters of concentric loss. Increasing the spacing between the inner and outer boundaries (i.e., R − r) improves the distinguishability of positive and negative pairs. This invention sets R as a constant and, by evaluating the distribution of the difference measures of sample pairs during training, adaptively reduces r to increase the spacing between the inner and outer boundaries. The specific procedure is as follows.

[0102] First, initialize: set the inner boundary radius r, the outer boundary radius R, and the maximum number of iterations, and record the numbers of positive and negative pairs in the training set.

[0103] Second, perform the first round of iterative optimization.

[0104] Third, calculate the difference measures of all sample pairs and sort them in ascending order.

[0105] Fourth, in the sorted sequence, find the difference measure of the sample pair at the specified rank and mark it as the candidate value.

[0106] Fifth, judge whether the candidate value is less than r. If so, update r to the candidate value; otherwise, do not update r.

[0107] Sixth, update the iteration count.

[0108] Seventh, determine whether the maximum number of iterations has been reached. If yes, end training and output the result; otherwise, repeat steps three through seven.

[0109] As the network weights are continuously optimized and r is constantly updated, the difference measure of positive pairs is continuously mapped to a smaller inner boundary. In other words, the separation interval between positive and negative pairs gradually increases, and the separation between the two becomes more obvious.
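The adaptive shrinking of the inner boundary can be sketched as follows. Which rank in the sorted list is compared against r is not recoverable from this text, so the 1-based rank `k` is a hypothetical parameter; setting it to the number of positive pairs is one plausible choice, but that is an assumption. The function name `update_inner_radius` is also illustrative.

```python
def update_inner_radius(distances, r, k):
    """Shrink r to the k-th smallest difference measure if that value is below r.

    distances: difference measures of all training sample pairs.
    r: current inner boundary radius.
    k: 1-based rank of the candidate value (hypothetical parameter).
    """
    if not 1 <= k <= len(distances):
        raise ValueError("k must be between 1 and len(distances)")
    candidate = sorted(distances)[k - 1]
    # r is only ever reduced, so the gap R - r can only grow.
    return candidate if candidate < r else r


r = 0.1
r = update_inner_radius([0.02, 2.3, 0.08, 1.9, 0.05], r, k=3)
print(r)  # prints 0.08
```

Because the update never increases r, repeated calls across training iterations can only widen the interval between the inner and outer boundaries.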

[0110] Deep concentric twin networks employ a minimum difference strategy for small-sample fault diagnosis. Specifically, firstly, test samples and known samples are paired; then, the sample pairs are input into the network to obtain the difference measure of the sample pairs; finally, the known sample class with the smallest difference measure of the sample pair is used as the class of the test sample, thus achieving fault diagnosis.

[0111] There are two strategies for pairing test samples with known samples: one-shot testing and N-shot testing. In one-shot testing, the test sample is first paired with one known sample from each class; then, the pairs are fed into the network to obtain their difference measures; finally, the class of the known sample with the smallest difference measure is assigned to the test sample. In N-shot testing, the test sample is first paired with N known samples from each class; then, the pairs are fed into the network to obtain their difference measures; finally, the class of the known sample with the smallest difference measure is assigned to the test sample, as shown in Figure 4(b).

[0112] This invention employs an N-shot testing strategy to form sample pairs, with N=5; if the number of known samples is less than 5, then N equals the actual number of known samples. The N-shot testing strategy is adopted for the following reasons: in a one-shot test, if the selected single sample deviates from the distribution of its category, the resulting diagnosis may be poor; conversely, the N-shot testing strategy can better avoid the problems inherent in one-shot testing.
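The N-shot minimum-difference rule can be sketched in a few lines. This sketch assumes the feature vectors have already been produced by a trained twin branch, and it takes the first N known samples per class rather than a random selection; both choices, along with the names `euclidean` and `n_shot_diagnose`, are assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def n_shot_diagnose(test_feature, known_features, n_shot=5):
    """Return (label, distance) of the closest known sample.

    known_features maps a class label to a list of feature vectors.
    At most n_shot samples per class are used; a class with fewer
    samples contributes all of them.
    """
    best_label, best_distance = None, math.inf
    for label, features in known_features.items():
        for feature in features[:n_shot]:
            distance = euclidean(test_feature, feature)
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label, best_distance


known = {"H": [[0.0, 0.0], [0.1, 0.0]], "F1": [[1.0, 1.0]]}
label, _ = n_shot_diagnose([0.9, 1.1], known)
print(label)  # prints F1
```

Setting `n_shot=1` reduces the same function to the one-shot strategy, which makes the sensitivity to a single unrepresentative known sample easy to observe.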

[0113] This invention employs several techniques to improve diagnostic performance under small sample conditions for DCSN-based ship main engine fault diagnosis. First, the positive and negative pairs are balanced in the sample amplification module. This not only yields a large number of training samples that help the deep model learn more inter-class discriminative information but also effectively avoids overfitting caused by class imbalance. Second, a Siamese network is constructed using a network structure with identity connections. This not only helps retain more useful information but also reduces the risk of gradient vanishing and gradient exploding during model training. Third, concentric loss is used as the optimization objective. By adaptively increasing the gap between the inner and outer boundaries during training, the deep model can automatically learn inter-class discriminative features. Fourth, an N-shot testing strategy is used to form sample pairs, and fault diagnosis is achieved based on the principle of minimum difference, which is beneficial for obtaining good diagnostic performance. The specific steps of the DCSN-based small-sample diagnosis process for ship main engines are shown in Figure 5 and are as follows.

[0114] First, data is obtained from ship main engine monitoring data and divided into training and testing sets.

[0115] Second, the samples in the training set are grouped into sample pairs.

[0116] Third, perform the first round of optimization. Set optimization-related parameters, such as learning rate and maximum number of iterations, and then begin the optimization process.

[0117] Fourth, use the network to learn features from the training sample pairs to obtain the feature vectors of the training sample pairs.

[0118] Fifth, calculate the difference between the feature vectors of the sample pairs.

[0119] Sixth, calculate the difference measure of sample pairs.

[0120] Seventh, calculate the concentric loss.

[0121] Eighth, the gradient descent method is used to optimize the network weights to minimize the concentric loss.

[0122] Ninth, update the inner boundary radius.

[0123] Tenth, determine whether the number of optimization iterations has reached the set maximum number of iterations. If yes, obtain the optimized DCSN; otherwise, return to step four.

[0124] During the testing phase, the optimized network is used for fault diagnosis, and the specific process is as follows: First, an N-shot Test strategy is adopted to form sample pairs between test samples in the test set and known samples in the training set. Then, the sample pairs are input into the optimized DCSN to obtain the difference measure of the sample pairs. Finally, based on the principle of minimum difference value, the class of the known sample with the smallest difference measure of the sample pair is taken as the class of the test sample, thus achieving small-sample fault diagnosis.

[0125] Example:

[0126] This example uses a publicly available marine main engine failure dataset to validate the effectiveness of the proposed method. The dataset was supplemented with Gaussian white noise with a signal-to-noise ratio of 60 dB. As shown in Table 2, the dataset consists of 3500 samples, encompassing eight health states: normal (no fault), cylinders 1 through 6 faulty, and all cylinders faulty. For single-cylinder faults, the fault is caused by a decrease in cylinder compression ratio or a reduction in fuel injection quantity; for all cylinder faults, it is caused by a decrease in intake manifold pressure. Each sample includes an 84-dimensional feature vector, which was obtained through analysis of cylinder pressure curves and main engine flywheel torsional vibration signals.

[0127] Table 2. Health Status Considered

[0128]

| Health status | Label | Sample size | Number of training samples (per class) |
| --- | --- | --- | --- |
| Normal (all cylinders functioning correctly) | H | 250 | 100 |
| Cylinder No. 1 fault | F1 | 500 | 3, 6, 9, 12, 15 |
| Cylinder No. 2 fault | F2 | 500 | 3, 6, 9, 12, 15 |
| Cylinder No. 3 fault | F3 | 500 | 3, 6, 9, 12, 15 |
| Cylinder No. 4 fault | F4 | 500 | 3, 6, 9, 12, 15 |
| Cylinder No. 5 fault | F5 | 500 | 3, 6, 9, 12, 15 |
| Cylinder No. 6 fault | F6 | 500 | 3, 6, 9, 12, 15 |
| All cylinders faulty | F7 | 250 | 3, 6, 9, 12, 15 |

[0129] This example simulates a series of small-sample fault diagnosis scenarios by varying the number of fault samples in the training set. As shown in Table 2, this example first randomly selects 100 samples from the normal samples, and then randomly selects 3, 6, 9, 12, or 15 samples from each type of fault sample to form the training set, simulating five small-sample scenarios. The remaining samples are used as the test set to evaluate model performance.

[0130] This example compares the proposed method with four popular deep learning methods, as follows:

[0131] CNN: Convolutional Neural Networks are a popular deep learning model that uses convolution operations instead of matrix multiplication, greatly reducing the number of trainable weight parameters. Treating small-sample diagnostic problems as multi-class classification tasks, cross-entropy loss is used as the optimization objective.

[0132] DRN (Residual Neural Network): By introducing cross-layer connections, residual neural networks reduce the risk of gradient explosion and vanishing during training, making the model easier to train. Treating the small-sample diagnostic problem as a multi-class classification task, cross-entropy loss is used as the optimization objective.

[0133] DSN-CNN: It uses a CNN to build a twin network and uses contrastive loss as the optimization objective.

[0134] DSN-DRN: A twin network is built using DRN, and contrastive loss is used as the optimization objective.

[0135] Furthermore, to investigate the impact of different network architectures on the diagnostic performance of DCSN, this invention also builds a variant model, abbreviated as DCSN-CNN, which uses a CNN to build the Siamese network and concentric loss as the optimization objective. In the remainder of this description, the model that uses DRN to build the Siamese network and concentric loss as the optimization objective is denoted DCSN-DRN.

[0136] All methods used in this example were developed using PyTorch 1.8.0. All experiments were conducted on a computer equipped with an Intel® Core™ i9-9900K CPU @ 3.60GHz and an NVIDIA GeForce RTX 2080Ti GPU.

[0137] The hyperparameters related to the CNN and DRN network structures used in this example are summarized in Table 3. CBU represents a convolutional block, and RBU represents a residual block. The difference between them is that RBU has cross-layer connections, while CBU does not. Each CBU and RBU consists of two convolutional layers (Conv), two activation function layers (ReLU), and two batch normalization layers (BN), connected in the following order: BN-ReLU-Conv-BN-ReLU-Conv. Considering the different parameters of the convolutional layers in CBU and RBU, the table lists the four main parameters of the two convolutional layers (Conv) in each CBU and RBU in square brackets: number of input channels, number of output channels, kernel size, and kernel stride. For fully connected layers (FC), the two parameters represent the number of input neurons and the number of output neurons, respectively. The output feature map is represented in 3D (number of channels × width × height) or 1D (one-dimensional feature vector). The input feature vector of this invention is a one-dimensional feature vector, and the height of the convolution kernel is 1, that is, the size of the convolution kernel is 1×3. The number of output neurons in the FC layer is 2, so as to intuitively visualize the distribution of deep features of different fault types in 2D feature space.

[0138] Table 3. Structure-related hyperparameters

[0139]

| Quantity | CNN | DRN | Conv / FC parameters | Output size |
| --- | --- | --- | --- | --- |
| 1 | Input | Input | / | 1×84×1 |
| 1 | Conv | Conv | [1,4,1×3,1] | 4×84×1 |
| 1 | CBU_1 | RBU_1 | [4,8,1×3,2][8,8,1×3,1] | 8×42×1 |
| 1 | CBU_2 | RBU_2 | [8,8,1×3,1][8,8,1×3,1] | 8×42×1 |
| 1 | CBU_3 | RBU_3 | [8,16,1×3,2][16,16,1×3,1] | 16×21×1 |
| 1 | CBU_4 | RBU_4 | [16,16,1×3,1][16,16,1×3,1] | 16×21×1 |
| 1 | CBU_5 | RBU_5 | [16,16,1×3,2][16,16,1×3,1] | 16×11×1 |
| 1 | CBU_6 | RBU_6 | [16,16,1×3,1][16,16,1×3,1] | 16×11×1 |
| 1 | CBU_7 | RBU_7 | [16,8,1×3,2][8,8,1×3,1] | 8×6×1 |
| 1 | CBU_8 | RBU_8 | [8,8,1×3,1][8,8,1×3,1] | 8×6×1 |
| 1 | GAP | GAP | / | 8 |
| 1 | FC | FC | [8,2] | 2 |
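The output widths listed in Table 3 (84, 42, 21, 11, 6) can be checked with the standard convolution output-length formula. The padding is not stated in the text; the sketch below assumes a padding of 1 on each side, which is the value that reproduces the tabulated widths for a kernel of width 3. The helper name `conv_output_width` is illustrative.

```python
def conv_output_width(width, kernel=3, stride=1, padding=1):
    """Standard output-length formula for a 1-D convolution.

    padding=1 is an assumption of this sketch; it is not given in the text.
    """
    return (width + 2 * padding - kernel) // stride + 1


width = conv_output_width(84)          # initial Conv layer, stride 1
widths = [width]
for stride in [2, 1, 2, 1, 2, 1, 2, 1]:  # first-Conv stride of units 1 to 8
    width = conv_output_width(width, stride=stride)
    width = conv_output_width(width, stride=1)  # second Conv keeps the width
    widths.append(width)

print(widths)  # prints [84, 42, 42, 21, 21, 11, 11, 6, 6]
```

Each stride-2 unit roughly halves the width (rounding up for odd inputs), which is why 21 becomes 11 and 11 becomes 6.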

[0140] The training hyperparameters are set as follows. The initial learning rate is 0.001 and decays to one-tenth of its value every 60 iterations. The number of training iterations is 200. The batch size is 256. The Adam optimizer is used, with the L2 regularization (weight decay) coefficient set to 0.0001. Dropout is applied to the output of GAP with a dropout rate of 0.5 to reduce the risk of overfitting. The inner boundary radius r in the concentric loss is initialized to 0.1, and the outer boundary radius R is set to 2.
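The learning-rate schedule described above can be written as a simple step decay. This sketch assumes a 0-based iteration index with the decay taking effect at iterations 60, 120, and 180; the exact indexing convention is not stated in the text, and the function name `learning_rate` is illustrative.

```python
def learning_rate(iteration, initial_lr=0.001, step=60, factor=0.1):
    """Step decay: multiply the rate by `factor` after every `step` iterations.

    The 0-based indexing of `iteration` is an assumption of this sketch.
    """
    return initial_lr * factor ** (iteration // step)


# Over 200 iterations the rate passes through four levels.
for it in (0, 59, 60, 120, 180, 199):
    print(it, learning_rate(it))
```

In a PyTorch training loop the same behaviour would typically come from a step-based learning-rate scheduler rather than a hand-written function.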

[0141] Performance comparison of different methods:

[0142] To avoid the influence of random factors, all experiments were repeated 5 times. Three evaluation metrics (Precision, Recall, and F1-score) were used to present the experimental results, as shown in Tables 4 to 6 and Figure 6. Each diagnostic result in the tables is reported as mean ± standard deviation over the five repeated experiments, with the best result marked in bold; "Num" represents the number of fault samples of each class in the training set.
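For readers who want to reproduce such metrics, the sketch below computes precision, recall, and F1-score from a confusion matrix. The text does not state how the per-class values are aggregated over the eight health states; macro-averaging is assumed here, and the function name `macro_metrics` is illustrative.

```python
def macro_metrics(confusion):
    """Macro-averaged precision, recall and F1 from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as
    class j. Macro-averaging is an assumption of this sketch.
    """
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[i][c] for i in range(n))
        actual = sum(confusion[c])
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy two-class example: one sample of class 1 is misclassified as class 0.
precision, recall, f1 = macro_metrics([[2, 0], [1, 1]])
print(precision, recall, f1)
```

With macro-averaging, a class that is never predicted contributes zero precision and zero F1, which is consistent with very low scores when a model assigns nearly every sample to the majority class.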

[0143] Table 4. Precision (%) of different methods

[0144]

| Method | Num = 3 | Num = 6 | Num = 9 | Num = 12 | Num = 15 |
| --- | --- | --- | --- | --- | --- |
| CNN | 18.92 ± 5.93 | 27.78 ± 14.27 | 27.30 ± 14.43 | 30.19 ± 14.71 | 30.26 ± 16.27 |
| DRN | 12.53 ± 0.07 | 21.73 ± 6.99 | 31.79 ± 7.55 | 36.39 ± 6.74 | 43.56 ± 9.53 |
| DSN-CNN | 41.36 ± 5.14 | 70.36 ± 10.93 | 71.48 ± 3.65 | 84.88 ± 1.31 | 88.01 ± 2.31 |
| DSN-DRN | 59.77 ± 4.63 | 83.36 ± 2.67 | 84.45 ± 1.41 | 88.55 ± 0.80 | 92.58 ± 0.63 |
| DCSN-CNN | 44.87 ± 8.74 | 73.01 ± 7.11 | 73.26 ± 13.81 | 84.05 ± 1.39 | 89.53 ± 2.25 |
| DCSN-DRN | **60.52 ± 3.11** | **83.80 ± 1.28** | **85.10 ± 0.77** | **89.39 ± 0.83** | **92.73 ± 0.80** |

[0145] Table 5. Recall (%) of different methods

[0146]

| Method | Num = 3 | Num = 6 | Num = 9 | Num = 12 | Num = 15 |
| --- | --- | --- | --- | --- | --- |
| CNN | 7.32 ± 7.74 | 14.17 ± 15.31 | 15.33 ± 14.03 | 18.43 ± 14.18 | 21.20 ± 14.93 |
| DRN | 3.06 ± 5.59 | 19.90 ± 15.85 | 38.08 ± 19.41 | 40.80 ± 11.51 | 42.60 ± 14.20 |
| DSN-CNN | 42.33 ± 4.36 | 72.02 ± 10.96 | 73.37 ± 3.22 | 86.72 ± 1.17 | 90.44 ± 2.75 |
| DSN-DRN | 62.20 ± 3.69 | 86.49 ± 2.45 | 87.62 ± 1.21 | 92.01 ± 0.76 | 95.05 ± 0.63 |
| DCSN-CNN | 46.09 ± 6.71 | 75.32 ± 6.78 | 75.71 ± 12.45 | 85.86 ± 1.46 | 92.16 ± 2.00 |
| DCSN-DRN | **63.74 ± 3.14** | **87.10 ± 1.55** | **88.37 ± 1.08** | **92.53 ± 0.63** | **95.43 ± 0.80** |

[0147] Table 6. F1-score (%) of different methods

[0148]

| Method | Num = 3 | Num = 6 | Num = 9 | Num = 12 | Num = 15 |
| --- | --- | --- | --- | --- | --- |
| CNN | 6.41 ± 5.45 | 14.66 ± 13.66 | 13.20 ± 11.69 | 17.79 ± 13.8 | 17.47 ± 14.75 |
| DRN | 1.12 ± 0.13 | 11.10 ± 7.76 | 23.20 ± 10.08 | 28.05 ± 8.72 | 34.51 ± 12.52 |
| DSN-CNN | 37.89 ± 4.22 | 68.74 ± 11.57 | 71.25 ± 3.64 | 84.76 ± 1.41 | 88.13 ± 2.62 |
| DSN-DRN | 56.41 ± 4.04 | 82.64 ± 2.72 | 83.63 ± 1.30 | 88.94 ± 0.93 | 92.98 ± 0.63 |
| DCSN-CNN | 41.71 ± 8.16 | 72.47 ± 7.09 | 73.38 ± 13.27 | 84.06 ± 1.37 | 89.95 ± 2.41 |
| DCSN-DRN | **58.18 ± 2.73** | **83.19 ± 1.68** | **84.48 ± 1.30** | **89.79 ± 0.75** | **93.28 ± 0.92** |

[0149] The experimental results in Tables 4 to 6 and Figure 6 show that, with different numbers of fault samples, the proposed DCSN-DRN outperforms the competing methods in terms of precision, recall, and F1-score. Specifically, in terms of average precision, DCSN-DRN achieves improvements of 55.42, 53.11, 11.09, 0.57, and 9.36 percentage points over CNN, DRN, DSN-CNN, DSN-DRN, and DCSN-CNN, respectively; in terms of average recall, the improvements are 70.14, 56.55, 12.46, 0.76, and 10.41 percentage points; and in terms of average F1-score, the improvements are 67.88, 62.19, 11.63, 0.86, and 9.47 percentage points. These improvements in diagnostic performance indicate that DCSN can learn inter-class discriminative information from the input data.
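The average-precision improvements quoted above can be reproduced directly from the mean values in Table 4. The snippet below is a small arithmetic check; the dictionary layout and variable names are illustrative, while the numbers are copied from the table.

```python
# Mean precision (%) per method for Num = 3, 6, 9, 12, 15 (from Table 4).
precision = {
    "CNN": [18.92, 27.78, 27.30, 30.19, 30.26],
    "DRN": [12.53, 21.73, 31.79, 36.39, 43.56],
    "DSN-CNN": [41.36, 70.36, 71.48, 84.88, 88.01],
    "DSN-DRN": [59.77, 83.36, 84.45, 88.55, 92.58],
    "DCSN-CNN": [44.87, 73.01, 73.26, 84.05, 89.53],
    "DCSN-DRN": [60.52, 83.80, 85.10, 89.39, 92.73],
}


def mean(values):
    return sum(values) / len(values)


reference = mean(precision["DCSN-DRN"])
gains = {
    name: round(reference - mean(values), 2)
    for name, values in precision.items()
    if name != "DCSN-DRN"
}
print(gains)
```

The differences are gaps between averaged percentages, so they are best read as percentage points rather than relative improvements.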

[0150] As the number of training samples increases, the diagnostic performance of all considered methods gradually improves. Specifically, when there are 3 training samples, traditional deep models (i.e., CNN, DRN) have a precision of less than 20%, and recall and F1-score of less than 10%, while the proposed DCSN-DRN achieves the best diagnostic performance under this condition, with a precision of 60.52%, a recall of 63.74%, and an F1-score of 58.18%, respectively. When the number of training samples increases to 12, the recall of both DCSN-DRN and DSN-DRN exceeds 90%, and the proposed DCSN-DRN achieves the highest recall of 92.53%.

[0151] It's noteworthy that, compared to traditional deep CNN and DRN methods, DSN-based and DCSN-based methods achieved superior diagnostic results using only 3 training samples, surpassing the highest diagnostic performance of traditional CNN and DRN methods that used 15 training samples. This is primarily because DSN-based and DCSN-based methods increased the number of training samples through pairing. Specifically, although there are only 3 training samples, pairing them together creates a significantly larger number of training samples for DSN-based and DCSN-based methods than just 3. This increase in training samples helps deep models learn more inter-class discriminative features from the input data, thereby improving diagnostic performance and highlighting the necessity of training sample augmentation.
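To give a sense of how much pairing enlarges the training set, the following sketch counts the pairs obtained when all fault samples are combined pairwise, as in Step 1 of the sample amplification stage in claim 2. It assumes every unordered pair of fault samples is formed once and ignores the pairs that involve normal samples; the function name `fault_pair_counts` is illustrative.

```python
from math import comb


def fault_pair_counts(num_fault_classes, samples_per_class):
    """Return (positive_pairs, negative_pairs) among fault samples only.

    Assumes each unordered pair of fault samples is formed exactly once.
    """
    total_samples = num_fault_classes * samples_per_class
    total_pairs = comb(total_samples, 2)
    # A pair is positive when both samples come from the same fault class.
    positive_pairs = num_fault_classes * comb(samples_per_class, 2)
    return positive_pairs, total_pairs - positive_pairs


# Seven fault classes (F1 to F7) with 3 to 15 training samples per class.
for n in (3, 6, 9, 12, 15):
    print(n, fault_pair_counts(7, n))
```

With 3 samples per class, the 21 fault samples already yield 210 pairs under this assumption, of which 21 are positive and 189 are negative, which also shows why the number of positive pairs has to be balanced separately.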

[0152] Figure 7 presents the confusion matrices of the different methods on the test samples when the number of fault samples of each class in the training set is 15. Figures (a) and (d) show that CNN and DRN predict most samples of other classes as class H (i.e., the normal class); CNN in particular predicts almost all classes as class H. This is because there are many class H samples (100 samples) but few fault samples in the other classes (15 samples per class), so CNN and DRN learn insufficient discriminative information and tend to predict the minority classes as the majority class. Figures (b)-(c) and (e)-(f) show that the diagnostic performance of the DSN-based and DCSN-based methods is significantly better than that of the traditional CNN and DRN. This is because the DSN-based and DCSN-based methods increase the number of training samples through sample pairing, helping the deep model learn more discriminative information and thus improving diagnostic performance. Overall, DCSN-DRN exhibits better diagnostic performance, which is attributed to the following two factors: (1) the use of cross-layer connections to construct the Siamese network helps to learn more discriminative information; and (2) the use of concentric loss as the optimization function improves inter-class distinguishability. Therefore, DCSN-DRN can achieve good diagnostic performance.

[0153] Figure 8 shows the visualization of the deep features of the different methods on the test samples when the number of fault samples of each class in the training set is 15. Figures (a) and (d) show severe overlap between samples of some classes. This is because the deep features learned by CNN and DRN lack sufficient discriminative power, making it difficult to distinguish samples of different classes. In Figures (b)-(c) and (e)-(f), class H (i.e., normal samples) is distributed within a relatively small range; classes F1 to F7 extend outwards from the edge of the normal sample distribution, which is related to the different severity of cylinder faults. Specifically, early fault symptoms are weak, and the features of fault samples are relatively similar to those of normal samples; as the severity of cylinder faults increases, the samples gradually exhibit features different from those of normal samples, so the separation between fault samples and normal samples becomes increasingly obvious. In other words, the more severe the fault, the further the sample is from the normal samples. It is noted that the distribution of classes F1 to F6 (corresponding to faults in cylinders 1 to 6) roughly presents two elongated stripes that eventually cluster together. This is because each cylinder fault is caused by either a decrease in cylinder compression ratio or a reduction in fuel injection quantity, and faults with different causes exhibit different variations in sample distribution. Class F7, caused by a decrease in intake manifold pressure, presents a roughly separate elongated strip. Overall, in Figure (f), the separation between samples of different classes is more pronounced, with less inter-class overlap. This indicates that the proposed DCSN-DRN can effectively learn inter-class distinguishability features from the input data, thereby improving diagnostic performance.

[0154] Performance comparison between DCSN-based and DSN-based methods:

[0155] DCSN-based methods, namely DCSN-CNN and DCSN-DRN, use concentric loss as the optimization objective; DSN-based methods, namely DSN-CNN and DSN-DRN, use contrastive loss as the optimization objective. It is worth noting that the DCSN-based and DSN-based methods use the same training samples and generate the same positive and negative pairs through the same sample pairing method.

[0156] The experimental results in Tables 4 to 6 and Figure 6 show that, with different numbers of fault samples, DCSN-DRN outperforms DSN-DRN in Precision, Recall, and F1-score; similarly, DCSN-CNN outperforms DSN-CNN in Precision, Recall, and F1-score in most cases. Furthermore, in terms of average Precision, average Recall, and average F1-score, DCSN-DRN achieves improvements of 0.57, 0.76, and 0.86 percentage points over DSN-DRN, respectively, and DCSN-CNN achieves improvements of 1.73, 2.05, and 2.16 percentage points over DSN-CNN, respectively. Each DCSN-based method achieves better diagnostic performance than its corresponding DSN-based method, mainly because it uses concentric loss as the optimization objective. Specifically, concentric loss pulls the difference measure of positive pairs inside the inner boundary and pushes the difference measure of negative pairs outside the outer boundary, improving the distinguishability between positive and negative pairs. In other words, concentric loss improves diagnostic performance by enhancing inter-class distinguishability. The improved diagnostic performance achieved by the DCSN-based methods also demonstrates the usefulness of concentric loss.

[0157] Performance comparison between DCSN-based methods:

[0158] This example uses two popular network architectures (CNN and DRN) to construct the Siamese network, aiming to investigate the impact of different network architectures on diagnostic performance. There are two DCSN-based methods: DCSN-CNN and DCSN-DRN. They are similar in that both use concentric loss as the optimization objective. The difference is that DCSN-DRN uses a network structure with cross-layer connections to construct the Siamese network, while DCSN-CNN does not.

[0159] The experimental results in Tables 4 to 6 and Figure 6 show that DCSN-DRN outperforms DCSN-CNN in Precision, Recall, and F1-score with different numbers of fault samples. Specifically, in terms of average Precision, average Recall, and average F1-score, DCSN-DRN achieves improvements of 9.36, 10.41, and 9.47 percentage points over DCSN-CNN, respectively. This is mainly because DCSN-DRN adopts a network structure with cross-layer connections, which brings the following advantages: (1) cross-layer connections allow the network to perform identity mappings, which helps information to be transmitted more smoothly within the network; (2) cross-layer connections reduce the risk of gradient explosion and gradient vanishing during model training, making deep models easier to train. Thanks to these benefits, DCSN-DRN can achieve better diagnostic performance than DCSN-CNN. These improvements in diagnostic performance also demonstrate the importance of the feature learning network. In the future, more advanced feature learning networks can be used to construct deep concentric twin networks to further improve diagnostic performance.

[0160] Furthermore, it is noted that when there are 3 or 6 fault samples per class in the training set, CNN outperforms DRN in terms of precision and F1-score, but both are below 30%. A plausible reason is that with so few training samples, deep models struggle to learn discriminative features, and their predictions may be highly random. In this case, comparing the diagnostic performance of CNN and DRN is not representative. As the number of training samples increases, the diagnostic performance of DRN improves significantly relative to CNN. In terms of average precision, average recall, and average F1-score, DRN achieves improvements of 2.31, 13.60, and 5.69 percentage points over CNN, respectively. These improvements indicate that DRN is better than CNN at learning discriminative features.

[0161] This invention proposes a Deep Concentric Twin Network (DCSN-DRN) to improve the diagnostic performance of ship main engine faults under small sample conditions by enhancing inter-class discriminability during metric learning. The novelty lies in using concentric loss as the optimization objective of DCSN-DRN, explicitly forcing the deep model to learn inter-class discriminability features. Furthermore, by evaluating the distribution of sample pair difference measures during training, the inner boundary radius is adaptively set, which helps to adaptively increase the gap between the inner and outer boundaries during end-to-end training, thereby improving the discriminability of positive and negative pairs. In addition, DCSN-DRN employs several techniques to improve diagnostic performance: (1) by balancing the number of positive and negative pairs in training sample pairings, the risk of overfitting in the deep model is reduced; (2) a network structure with identity mapping is used to build twin subnetworks, reducing the risk of gradient vanishing and gradient exploding during training; (3) an N-shot testing strategy is used to form sample pairs, avoiding the problem of individual samples selected in the one-shot testing strategy deviating from their class distribution. Therefore, DCSN-DRN can achieve good diagnostic performance under small sample conditions.

[0162] The usefulness of DCSN-DRN was validated by comparing its performance with popular deep learning methods on a marine main engine failure dataset. With different numbers of fault samples, DCSN-DRN outperformed the five competing methods in terms of Precision, Recall, and F1-score. Specifically, in terms of average Precision, DCSN-DRN achieved improvements of 55.42, 53.11, 11.09, 0.57, and 9.36 percentage points over CNN, DRN, DSN-CNN, DSN-DRN, and DCSN-CNN, respectively; in terms of average Recall, the improvements were 70.14, 56.55, 12.46, 0.76, and 10.41 percentage points; and in terms of average F1-score, the improvements were 67.88, 62.19, 11.63, 0.86, and 9.47 percentage points. These improvements in diagnostic performance demonstrate that the proposed DCSN-DRN can learn strong inter-class discriminative information and achieve good diagnostic performance even with small sample sizes. Finally, the proposed DCSN-DRN is applicable not only to fault diagnosis of ship main engines under small sample conditions, but also to pattern recognition tasks for other mechanical equipment under small sample conditions, such as fault diagnosis of wind turbines and aero engines.

Claims

1. A method for small-sample fault diagnosis of ship main engines based on a deep concentric twin network, characterized in that the method comprises the establishment and application of a deep concentric twin network (DCSN). The deep concentric twin network comprises sample augmentation, feature learning, and difference measurement. In the sample augmentation stage, samples of different categories are combined to form sample pairs $(x_1, x_2)$ as the input to the network; in the feature learning stage, the input sample pairs are mapped to a new feature space to obtain the feature vectors $f(x_1)$ and $f(x_2)$ of each sample pair; in the difference measurement stage, the distance from the difference of the two feature vectors to the origin $O$ is used as the difference measure $d$ of the sample pair, which is used to determine whether the two samples belong to the same category, thereby achieving small-sample fault diagnosis.

In the deep concentric twin network, $r_{in}$ and $r_{out}$, i.e. the inner and outer boundaries, are two important parameters of the concentric loss. Increasing the spacing between the inner and outer boundaries improves the distinguishability of positive and negative pairs. $r_{out}$ is set as a constant, and $r_{in}$ is adaptively reduced by evaluating the distribution of the difference measures of the sample pairs during training, so as to increase the spacing between the inner and outer boundaries. The specific steps are as follows: First, initialize and set the inner boundary $r_{in}$, the outer boundary $r_{out}$, the maximum number of iterations $T$, and the numbers $P$ and $N$ of positive and negative pairs in the training set; Second, perform the first round of iterative optimization; Third, calculate the difference measures of all sample pairs and sort them in ascending order; Fourth, in the sorted sequence, locate the [...] smallest sample-pair difference measure and denote it $d^{*}$; Fifth, determine whether $d^{*}$ is less than $r_{in}$; if so, update $r_{in}$ to $d^{*}$; otherwise, no update is performed; Sixth, update the iteration count; Seventh, determine whether the maximum number of iterations has been reached; if so, end training and output the result; otherwise, repeat steps three through seven. As the network weights are continuously optimized and $r_{in}$ is continuously updated, the difference measures of positive pairs are mapped within an ever smaller inner boundary; that is, the separation interval between positive and negative pairs gradually increases, and the two become more clearly separated.

The deep concentric twin network uses the concentric loss as the training optimization objective. The concentric loss is used to improve the discriminability between the difference measures of positive and negative pairs, as shown in the following equation: [formula missing from the source text]. In the formula, $(x_1^{i}, x_2^{i}, y^{i})$ represents a sample pair and its label value, where $y^{i} = 0$ denotes a positive pair and $y^{i} = 1$ denotes a negative pair; $i$ represents the index of the sample pair; the radius of the inner boundary is $r_{in}$ and the radius of the outer boundary is $r_{out}$; the common center of the inner and outer boundaries is set at the origin of the coordinate system; $d^{i}$ represents the difference measure of the sample pair.
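As an illustration of steps three to five, the inner-boundary update could be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the function name `update_inner_boundary` and its parameters are hypothetical, and it assumes that the value compared with the inner boundary is the P-th smallest difference measure, with P the number of positive pairs, a detail the claim text does not preserve.

```python
def update_inner_boundary(distances, r_in, num_positive):
    """Return the (possibly reduced) inner boundary radius.

    distances    : difference measures of all training pairs in this iteration
    r_in         : current inner boundary radius
    num_positive : number of positive pairs P (ASSUMED selection index)
    """
    if not 1 <= num_positive <= len(distances):
        raise ValueError("num_positive must be between 1 and len(distances)")
    ordered = sorted(distances)             # step three: ascending order
    candidate = ordered[num_positive - 1]   # step four (assumption): P-th smallest
    # Step five: the boundary may only shrink, never grow.
    return candidate if candidate < r_in else r_in


# The second smallest measure (0.4) is below 1.0, so the boundary shrinks.
shrunk = update_inner_boundary([0.2, 1.8, 0.4, 2.1], r_in=1.0, num_positive=2)
# The second smallest measure (1.5) is not below 1.0, so nothing changes.
kept = update_inner_boundary([0.9, 1.5, 2.0, 2.2], r_in=1.0, num_positive=2)
```

Because the outer boundary stays constant, every successful update widens the gap between the two boundaries.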

2. The method for small-sample fault diagnosis of ship main engines based on deep concentric twin networks according to claim 1, characterized in that the sample augmentation stage specifically includes: Step 1: combine all faulty samples in the training set pairwise to form sample pairs; for each sample pair $[x_i, x_j]$, if $x_i$ and $x_j$ belong to the same category, the pair is a positive pair and its label is 0; otherwise, it is a negative pair and its label is 1; where $i$ and $j$ are the sample indices, each ranging from 1 to $n$, and $n$ is the number of samples participating in the pairing; Step 2: form sample pairs from all normal samples in the training set, all of which are positive pairs; Step 3: count the numbers of positive and negative pairs, and randomly select a certain number of normal samples from the training set to pair with all faulty samples in turn, so as to balance the positive and negative pairs, that is, so that the number of positive pairs equals the number of negative pairs.
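The three pairing steps could look roughly like the Python sketch below. The function `build_balanced_pairs`, its arguments, and the fixed random seed are hypothetical names chosen for illustration. The sketch also assumes that balancing stops as soon as the number of negative pairs reaches the number of positive pairs, and that a normal sample paired with a faulty sample always forms a negative pair.

```python
import itertools
import random


def build_balanced_pairs(faulty, normal, seed=0):
    """Build labelled sample pairs (a, b, y) with y=0 positive and y=1 negative.

    faulty : list of (sample, fault_category) tuples
    normal : list of normal samples
    """
    pairs = []
    # Step 1: pair all faulty samples with each other.
    for (xa, ca), (xb, cb) in itertools.combinations(faulty, 2):
        pairs.append((xa, xb, 0 if ca == cb else 1))
    # Step 2: normal-normal pairs are all positive.
    for xa, xb in itertools.combinations(normal, 2):
        pairs.append((xa, xb, 0))
    # Step 3: add normal-faulty negative pairs until the counts are balanced.
    n_pos = sum(1 for p in pairs if p[2] == 0)
    n_neg = len(pairs) - n_pos
    shuffled = list(normal)
    random.Random(seed).shuffle(shuffled)
    for xn in shuffled:
        for xf, _ in faulty:
            if n_neg >= n_pos:
                return pairs
            pairs.append((xn, xf, 1))
            n_neg += 1
    return pairs  # may stay unbalanced if too few normal-faulty pairs exist


faulty = [("f1", "A"), ("f2", "A"), ("f3", "B")]
normal = ["n1", "n2", "n3", "n4"]
pairs = build_balanced_pairs(faulty, normal)
num_pos = sum(1 for p in pairs if p[2] == 0)
num_neg = sum(1 for p in pairs if p[2] == 1)
```

In this toy example Step 1 yields 1 positive and 2 negative pairs, Step 2 adds 6 positive pairs, and Step 3 adds 5 negative pairs to reach 7 of each.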

3. The method for small-sample fault diagnosis of ship main engines based on deep concentric twin networks according to claim 1, characterized in that the feature learning stage uses a deep residual network (DRN) to build a twin (Siamese) sub-network that learns features from the input training sample pairs. The sub-network includes two branches with shared weights, each composed of a DRN. The DRN consists of multiple stacked residual blocks, each containing two batch normalization (BN) layers, two rectified linear unit (ReLU) activation function layers, two convolutional (Conv) layers, and an identity connection. BN accelerates training and optimization by forcing samples from different batches to have a similar distribution. ReLU is a non-linear activation function that enhances the network's expressive power. Conv uses convolution operations instead of matrix multiplication, reducing the number of trainable weights and making network optimization easier. The identity connection adds the input information directly to the output of subsequent layers through a cross-layer connection. The calculation process of each residual block is as follows: $x_{l+1} = x_l + F(x_l)$, where $x_l$ and $x_{l+1}$ represent the input and output of the $l$-th residual block, and $F(\cdot)$ represents the residual learning function. The feature learning process is as follows: the two samples of a sample pair $(x_1, x_2)$ are fed into the two branches of the twin network, respectively. After passing through a convolutional layer and multiple residual blocks, they are fed in sequence into a global average pooling (GAP) layer and a fully connected (FC) layer to finally obtain the feature vectors $f(x_1)$ and $f(x_2)$ of the sample pair. In this twin network, the weights of the two branches are shared, so that sample pairs are mapped to the same feature space and the feature vectors $f(x_1)$ and $f(x_2)$ are comparable, which makes it possible to determine whether the two samples belong to the same category.
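A residual block of the kind described in this claim can be illustrated with a small pure-Python sketch. It is a simplified stand-in under stated assumptions: signals are single-channel 1-D lists, BN has no learned scale or shift, the layer order BN, ReLU, Conv is assumed (the claim lists the layers but not their order), and all function names are hypothetical.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalise a batch of single-channel signals to zero mean, unit variance."""
    values = [v for signal in batch for v in signal]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in signal] for signal in batch]


def relu(batch):
    return [[max(v, 0.0) for v in signal] for signal in batch]


def conv1d_same(batch, kernel):
    """Zero-padded 1-D cross-correlation that keeps the signal length."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    out = []
    for signal in batch:
        padded = [0.0] * half + list(signal) + [0.0] * half
        out.append([sum(k * padded[i + j] for j, k in enumerate(kernel))
                    for i in range(len(signal))])
    return out


def residual_block(batch, kernel1, kernel2):
    """Output = input + F(input), with F = two (BN, ReLU, Conv) stages."""
    h = conv1d_same(relu(batch_norm(batch)), kernel1)
    h = conv1d_same(relu(batch_norm(h)), kernel2)
    # Identity connection: add the block input to the residual branch output.
    return [[a + b for a, b in zip(x_row, h_row)]
            for x_row, h_row in zip(batch, h)]


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# With a zero second kernel the residual branch vanishes and the block
# reduces to the identity mapping.
identity_out = residual_block(x, [0.5, 1.0, 0.5], [0.0, 0.0, 0.0])
general_out = residual_block(x, [0.5, 1.0, 0.5], [0.1, 0.2, 0.1])
```

The identity case shows why stacked residual blocks are easy to optimize: a block can fall back to passing its input through unchanged.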

4. The method for small-sample fault diagnosis of ship main engines based on deep concentric twin networks according to claim 1, characterized in that the difference measurement stage uses a distance formula to calculate the difference between the feature vectors of the two branches. To calculate the difference between the two samples of a sample pair, the difference $\Delta f$ between their feature vectors is first calculated, as shown in the formula below: $\Delta f = f(x_1) - f(x_2)$, where $f(x_1)$ and $f(x_2)$ represent the feature vectors of $x_1$ and $x_2$, respectively. The Euclidean distance from $\Delta f$ to the origin $O$ is then used as the difference measure $d$ of the sample pair, as shown in the formula below; other distance metrics may also be used to measure the difference between sample pairs: $d = \lVert \Delta f - O \rVert_2$, where $\lVert \cdot \rVert_2$ represents the Euclidean distance. The smaller the value of $d$, the smaller the difference between the two samples and the greater the probability that they belong to the same category.
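The difference measure can be written in a few lines of Python. The sketch below is illustrative only: the function name `difference_measure` is hypothetical, and the feature vectors are plain lists standing in for the outputs of the two network branches.

```python
import math


def difference_measure(f1, f2):
    """Euclidean distance from the feature difference (f1 - f2) to the origin."""
    if len(f1) != len(f2):
        raise ValueError("feature vectors must have the same length")
    delta = [a - b for a, b in zip(f1, f2)]   # difference of feature vectors
    origin = [0.0] * len(delta)               # centre of the concentric boundaries
    return math.dist(delta, origin)


d_same = difference_measure([1.0, 2.0], [1.0, 2.0])
d_far = difference_measure([3.0, 4.0], [0.0, 0.0])
```

A pair with identical features has a difference measure of zero, and the measure grows as the features move apart.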

5. The method for small-sample fault diagnosis of ship main engines based on deep concentric twin networks according to claim 1, characterized in that, when the sample pair is a positive pair, i.e. its label is $y = 0$, the concentric loss is: [formula missing from the source text]. If the difference measure $d$ of a positive pair is greater than the inner boundary radius $r_{in}$, it is heavily penalized until it no longer exceeds $r_{in}$. Under this constraint, the difference measure $d$ of a positive pair is confined to the region of the feature space whose radius is smaller than the inner boundary radius $r_{in}$. When the sample pair is a negative pair, i.e. its label is $y = 1$, the concentric loss is: [formula missing from the source text]. If the difference measure $d$ of a negative pair is less than the outer boundary radius $r_{out}$, it is heavily penalized until it is no longer below $r_{out}$. Under this constraint, the difference measure $d$ of a negative pair is confined to the region of the feature space whose radius is greater than the outer boundary radius $r_{out}$. Given a dataset consisting of $P$ positive pairs and $N$ negative pairs, the total concentric loss is calculated as follows: [formula missing from the source text], where the $i$-th pair is a positive pair and the $j$-th pair is a negative pair.
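The behaviour described in this claim can be illustrated with the sketch below. The exact functional form of the concentric loss is not reproduced in this text, so the code assumes a squared hinge penalty and a plain sum over pairs. Both are assumptions made for illustration, as are the names `concentric_loss` and `total_concentric_loss`. Only the zero-loss regions (positive pairs inside the inner boundary, negative pairs outside the outer boundary) follow directly from the claim.

```python
def concentric_loss(d, y, r_in, r_out):
    """Loss of one pair with difference measure d and label y (0 positive, 1 negative).

    ASSUMPTION: squared hinge penalty; the patent's exact formula may differ.
    """
    if not 0.0 <= r_in < r_out:
        raise ValueError("expected 0 <= r_in < r_out")
    if y == 0:
        # Positive pair: penalised only while d lies outside the inner boundary.
        return max(d - r_in, 0.0) ** 2
    # Negative pair: penalised only while d lies inside the outer boundary.
    return max(r_out - d, 0.0) ** 2


def total_concentric_loss(pairs, r_in, r_out):
    """Sum of the per-pair losses; pairs is a list of (d, y) tuples."""
    return sum(concentric_loss(d, y, r_in, r_out) for d, y in pairs)


pairs = [(0.3, 0), (1.5, 0), (1.0, 1), (2.5, 1)]
total = total_concentric_loss(pairs, r_in=0.5, r_out=2.0)
```

In the example only the second and third pairs are penalised: the positive pair at 1.5 lies outside the inner boundary and the negative pair at 1.0 lies inside the outer boundary.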

6. The method for small-sample fault diagnosis of ship main engines based on deep concentric twin networks according to claim 1, characterized in that the deep concentric twin network employs a minimum-difference strategy for small-sample fault diagnosis. Specifically, test samples are first paired with known samples; the sample pairs are then input into the network to obtain their difference measures; finally, the category of the known sample with the smallest difference measure is taken as the category of the test sample, thus achieving fault diagnosis. There are two strategies for pairing test samples with known samples: one-shot testing and N-shot testing. In one-shot testing, the test sample is paired with one sample from each known category; the sample pairs are then input into the network to obtain their difference measures; finally, the category of the known sample with the smallest difference measure is taken as the category of the test sample. In N-shot testing, the test sample is paired with N samples from each known category; the sample pairs are then input into the network to obtain their difference measures; finally, the category of the known sample with the smallest difference measure is taken as the category of the test sample.

7. The method for small-sample fault diagnosis of ship main engines based on deep concentric twin networks according to claim 6, characterized in that the N-shot testing strategy is used to form sample pairs, with N=5; if the number of known samples is less than 5, N equals the actual number of known samples. The N-shot testing strategy is adopted for the following reason: in one-shot testing, if the single selected sample deviates from the distribution of its category, the diagnostic result will be poor; the N-shot testing strategy avoids this problem.
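The N-shot minimum-difference decision could be sketched as follows. This is an illustrative sketch with hypothetical names (`n_shot_predict`, `known`): the feature vectors are assumed to come from the trained network, the first N known samples of each category are used because the claim does not say how they are selected, and the Euclidean distance between two feature vectors stands in for the difference measure.

```python
import math


def n_shot_predict(test_feature, known, n=5):
    """Return (category, difference) of the closest known sample.

    known : dict mapping category name -> list of feature vectors
    n     : samples used per category; fewer are used if a category has fewer
    """
    best_category, best_d = None, math.inf
    for category, features in known.items():
        for feature in features[:n]:          # slicing caps N at the available count
            d = math.dist(test_feature, feature)
            if d < best_d:
                best_category, best_d = category, d
    return best_category, best_d


known = {
    "normal": [[0.0, 0.0], [0.1, 0.0]],
    "fault_A": [[2.0, 2.0]],                 # only one known sample: N falls back to 1
}
pred_normal, _ = n_shot_predict([0.2, 0.1], known)
pred_fault, _ = n_shot_predict([1.9, 2.1], known)
```

Setting `n=1` turns the same function into one-shot testing, which makes the sensitivity to a single unrepresentative known sample easy to see.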

8. The method for small-sample fault diagnosis of ship main engines based on deep concentric twin networks according to claim 1, characterized in that the application of the deep concentric twin network includes the following steps: Step A: obtain data from the ship main engine monitoring data and divide the data into a training set and a test set; Step B: form sample pairs from the samples in the training set; Step C: for the first round of optimization, set the relevant optimization parameters, including the learning rate and the maximum number of iterations, and then start the optimization process; Step D: use the network to learn features from the training sample pairs to obtain the feature vectors of the training sample pairs; Step E: calculate the difference between the feature vectors of each sample pair; Step F: calculate the difference measure of each sample pair; Step G: calculate the concentric loss and use gradient descent to optimize the network weights so as to minimize the concentric loss; Step H: update the inner boundary radius; Step I: determine whether the number of optimization iterations has reached the set maximum number of iterations; if so, obtain the optimized DCSN; otherwise, return to Step D; Step J: perform fault diagnosis using the optimized network: first, adopt the N-shot testing strategy to form sample pairs from the test samples in the test set and the known samples in the training set; then, input the sample pairs into the optimized DCSN to obtain their difference measures; finally, according to the minimum-difference principle, take the category of the known sample with the smallest difference measure as the category of the test sample, thereby achieving small-sample fault diagnosis.
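The order of Steps C to I can be illustrated with a deliberately tiny training loop. Everything in this sketch is an assumption made for illustration: samples are scalars, a single weight `w` stands in for the DRN weights, the loss is the squared hinge form assumed for the concentric loss, and the inner boundary is updated from the P-th smallest difference measure. The function name `train_toy_dcsn` is hypothetical.

```python
def train_toy_dcsn(pairs, r_in=0.5, r_out=2.0, lr=0.05, max_iter=50):
    """pairs: list of (x1, x2, y) with scalar samples; y=0 positive, y=1 negative."""
    w = 1.0  # single trainable weight standing in for the network weights
    num_positive = sum(1 for _, _, y in pairs if y == 0)
    if num_positive == 0:
        raise ValueError("at least one positive pair is required")
    for _ in range(max_iter):                      # Steps C and I: iteration budget
        grad = 0.0
        distances = []
        sign = 1.0 if w >= 0 else -1.0
        for x1, x2, y in pairs:
            gap = abs(x1 - x2)
            d = abs(w) * gap                       # Steps D-F: features w*x1, w*x2
            distances.append(d)
            if y == 0 and d > r_in:                # positive pair outside inner boundary
                grad += 2.0 * (d - r_in) * gap * sign
            elif y == 1 and d < r_out:             # negative pair inside outer boundary
                grad -= 2.0 * (r_out - d) * gap * sign
        w -= lr * grad                             # Step G: gradient descent
        candidate = sorted(distances)[num_positive - 1]
        if candidate < r_in:                       # Step H: shrink inner boundary only
            r_in = candidate
    return w, r_in


toy_pairs = [(0.0, 0.1, 0), (0.0, 0.2, 0), (0.0, 1.0, 1), (0.0, 1.2, 1)]
w_final, r_in_final = train_toy_dcsn(toy_pairs)
```

In this toy run the weight grows so that negative pairs move toward the outer boundary, while the inner boundary shrinks from 0.5 to the largest positive-pair measure seen in the first iteration.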