High-speed railway adjacent steel structure vibration anomaly detection method based on contrastive learning
By means of a contrastive learning framework and a dynamic normal memory, the method addresses the high false alarm and false negative rates in detecting vibration anomalies in steel structures adjacent to high-speed railways. It achieves high-sensitivity detection without damage-labeled samples, improving detection accuracy and robustness.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- QINGDAO UNIV OF TECH
- Filing Date
- 2026-03-17
- Publication Date
- 2026-06-19
AI Technical Summary
In the detection of vibration anomalies in steel structures adjacent to high-speed railways, existing supervised methods rely on a large number of damage-labeled samples, which are difficult to obtain. Existing unsupervised methods cannot distinguish normal large-amplitude vibrations from minor anomalies caused by structural deterioration, and they do not make full use of the physical prior characteristics of the vibration transmission path. The result is high false alarm rates and high false negative rates.
A contrastive learning framework is adopted. A dual-channel feature encoding module and a dynamic normal memory are constructed, and positive and negative sample pairs built from the invariance of the vibration transmission path are used for contrastive learning training to optimize the current encoder. Multi-scale convolution and a bidirectional LSTM network extract deep spatiotemporal features, and a dynamic threshold mechanism is used for anomaly detection.
The method achieves highly sensitive anomaly detection without damage-labeled samples, reduces false alarm and false negative rates, improves detection accuracy and robustness, adapts to complex operating environments, and provides a reliable basis for structural health monitoring decisions.
Figure CN122241512A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of structural health monitoring and signal processing technology, specifically relating to a method for detecting abnormal vibrations in adjacent steel structures of high-speed railways based on contrastive learning.
Background Technology
[0002] High-speed railways are critical national infrastructure, and the health of the steel structures adjacent to them directly affects the operational safety and service life of the lines. With increasing train speeds and density, the track-subgrade-structure system exhibits complex vibration responses under high-frequency, high-amplitude dynamic excitation. Structural health monitoring based on vibration signals has become a crucial means of ensuring high-speed railway safety. Its core is to analyze the dynamic response of the structure under external excitation in order to identify abnormal states such as stiffness degradation and loose connections, enabling early warning and risk prevention.
[0003] Data-driven intelligent anomaly detection methods have attracted much attention. Existing technologies are divided into supervised learning and unsupervised learning, but both have significant limitations. Supervised methods rely on a large number of labeled "normal-damage" samples, while damage events are rare and difficult to reproduce in real engineering, resulting in high labeling costs and limiting the practicality of the models. Unsupervised methods do not require labels but only model the statistical distribution of the data, so they fail to distinguish between normal large-amplitude vibrations caused by changes in train operating conditions and dangerous minor anomalies caused by structural deterioration, easily leading to false alarms and missed detections. In addition, existing technologies ignore the physical prior characteristics of vibration transmission paths and do not incorporate the stable mapping relationship between excitation and response under healthy conditions into the model, resulting in insufficient sensitivity to fundamental structural changes. The self-supervised representation capability of contrastive learning has not been fully applied in this field. Therefore, there is an urgent need for a self-supervised detection method that does not require damage labels, can utilize path invariance, and can decouple operating condition interference from structural anomalies to achieve high-sensitivity, low-false-alarm-rate vibration anomaly identification.
Summary of the Invention
[0004] To achieve the above objectives, the present invention employs the following technical solution. The invention provides a method for detecting vibration anomalies in adjacent steel structures of high-speed railways based on contrastive learning, comprising the following steps:
S1. Using the passage of a train through a specific monitoring section as a trigger event, simultaneously collect track excitation signals and vibration response signals of the steel structure to construct excitation-response sample pairs.
S2. Construct a dual-channel feature encoding module, which includes a pre-trained baseline encoder and a current encoder. Each of the two encoders includes two parallel sub-networks: an excitation encoding sub-network and a response encoding sub-network. The structures of the pre-trained baseline encoder and the current encoder are identical. The excitation-response sample pairs are processed by the dual-channel feature encoding module to obtain baseline feature pairs and current feature pairs.
S3. Construct a dynamic normal memory. Based on the physical prior of vibration transmission path invariance, construct positive and negative sample pairs for contrastive learning training, optimize the parameters of the current encoder, and obtain the trained current encoder.
S4. For a new sample to be detected, extract the feature vectors with the trained dual-channel encoder, which includes the pre-trained baseline encoder and the trained current encoder, and calculate the cross-channel feature similarity as the normality score.
S5. Apply a dynamic threshold mechanism for anomaly detection, and maintain and update the dynamic normal memory.
[0005] Furthermore, step S1 specifically includes: track excitation signals are acquired using a piezoelectric accelerometer, while triaxial accelerometers deployed at key nodes of the steel structure simultaneously acquire the vibration response signals. The collected signals are then processed with the Z-score normalization method to obtain standardized data. The standardized track excitation signal of a train passing event is paired with the standardized vibration response signal of each steel structure measuring point from the same event, forming a set of excitation-response sample pairs whose size equals the total number of response measurement points of the steel structure.
[0006] This invention employs a differentiated design with dual parallel sub-networks. The excitation encoding sub-network utilizes a multi-scale parallel convolution module, while the response encoding sub-network adopts a hybrid architecture of serial 1D convolution and bidirectional LSTM, respectively adapting to the inherent characteristics of the two types of signals. This addresses the problem of traditional feature extraction networks having poor feature adaptability to excitation signals (wideband impact characteristics) and response signals (temporal decay characteristics), failing to accurately extract the core dynamic features of vibration transmission, and having insufficient feature representation capabilities.
[0007] Furthermore, the excitation encoding sub-network in step S2 includes a multi-scale parallel convolution module, a feature concatenation layer, a first 1D convolutional layer, a global average pooling layer, and a fully connected layer. The multi-scale parallel convolution module includes a first convolution branch, a second convolution branch, and a third convolution branch, with kernel sizes of 3×3, 5×5, and 7×7, respectively, and a stride of 2 for each; it is expressed by the formula [missing information], which relates the input signal to the output features of each convolution branch. The output features of the convolution branches are concatenated along the channel dimension to obtain the concatenated features. After the first 1D convolutional layer adjusts the number of channels, the concatenated features undergo global average pooling along the time dimension and are then processed by the fully connected layer to obtain the output of the excitation encoding sub-network. The sub-network parameters of the pre-trained baseline encoder are fixed after training. Each excitation-response sample pair is input to the pre-trained baseline encoder: the standardized track excitation signal is processed by the excitation encoding sub-network to obtain the baseline excitation feature vector, and the standardized vibration response signal is processed by the response encoding sub-network to obtain the baseline response feature vector. Together they form a baseline feature pair.
[0008] Further, the response encoding sub-network in step S2 includes a first 1D convolutional layer, a second 1D convolutional layer, a third 1D convolutional layer, a dimensionality reduction layer, a bidirectional LSTM layer, a temporal pooling layer, and a fully connected layer; the kernel sizes of the first, second, and third 1D convolutional layers are 7×7, 3×3, and 3×3, respectively, and the stride is 2 for each. The input of the response encoding sub-network passes through the first 1D convolutional layer to obtain the first 1D convolutional features, which pass through the second 1D convolutional layer to obtain the second 1D convolutional features, which in turn pass through the third 1D convolutional layer to obtain the third 1D convolutional features. The dimensionality reduction layer applies global average pooling and dimensionality compression to the third 1D convolutional features to obtain the dimensionality-reduced features. These pass through the bidirectional LSTM layer to obtain the hidden features. The temporal pooling layer applies max pooling to the hidden features along the time dimension to obtain the max-pooled features, which pass through the fully connected layer to obtain the output of the response encoding sub-network. The sub-network parameters of the current encoder are trainable. The same excitation-response sample pairs that are input to the baseline encoder are input to the current encoder: the standardized track excitation signal is processed by the excitation encoding sub-network to obtain the current excitation feature vector, and the standardized vibration response signal is processed by the response encoding sub-network to obtain the current response feature vector. Together they form the current feature pair.
[0009] Furthermore, in step S3, a dynamic normal memory is constructed. Each storage unit holds the complete feature set of one normal train passing event, including the set of baseline feature pairs of that event, the sample number, and the overall normality score of the event. The memory is a fixed-capacity first-in, first-out (FIFO) queue with a maximum capacity of [missing information]. The baseline feature pairs of B randomly selected historical normal samples are drawn from the dynamic normal memory to form the memory comparison set, indexed by the number of each sample extracted from the memory. The comparison set is used to construct negative sample pairs.
[0010] Furthermore, in step S3, positive and negative sample pairs for contrastive learning training are constructed based on the physical prior, as follows:
Positive sample pairs: for each sample in a batch and each measuring point, the current excitation feature vector extracted by the current encoder is paired with the current response feature vector of the same event.
Cross-event negative samples: for each sample and each measuring point, the current response feature vector is paired with the current excitation feature vectors of all other events in the batch to form cross-event negative sample pairs.
Cross-encoder negative samples: for each sample and each measuring point, the current response feature vector is paired with the baseline excitation feature vectors of all historical samples in the memory comparison set to form cross-encoder negative sample pairs.
Perturbation negative samples: for each sample and each measuring point, additive Gaussian noise is applied to the current response feature vector to obtain the perturbed current response feature vector, which is paired with the current excitation feature vector to form perturbation negative sample pairs.
[0011] This invention proposes a modified InfoNCE loss function adapted to the multi-dimensional negative sample system, which serves as the optimization objective of the current encoder. It addresses three problems of the traditional InfoNCE loss function: it cannot accommodate multi-dimensional negative sample constraints, it cannot jointly maximize the coupling of healthy-state excitation-response features and the discriminative power of abnormal-state features, and it is insufficiently sensitive to minor structural anomalies. Based on the constructed positive and negative sample pairs, the modified InfoNCE loss function is used as the optimization objective, expressed by the formula [missing information]. Its terms are the set of all negative samples, which comprises the cross-event, cross-encoder, and perturbation negative sample pairs; the current response feature vector of each element of that set; the cosine similarity function; and the temperature hyperparameter.
[0012] Furthermore, in step S4, a weighted fusion method is used to calculate, for each steel structure measuring point, the joint similarity between the baseline feature pair and the current feature pair, which gives the cross-channel feature similarity, expressed by the formula [missing information]. The formula involves the joint similarity, the first weighting coefficient, and the baseline excitation feature vector, baseline response feature vector, current excitation feature vector, and current response feature vector output by the trained dual-channel encoder. The overall normality score of the steel structure is obtained as the weighted average of the normality scores of all measuring points, expressed by the formula [missing information], in which each measuring point is weighted by a second weighting coefficient.
[0013] Further, in step S5, the mean and standard deviation of the normality scores of the historical normal samples stored in the dynamic normal memory are calculated, and the dynamic threshold is then calculated from them, as given by the formula [missing information], whose terms are the mean and standard deviation of the normality scores of the historical normal samples, the capacity of the memory, and the dynamic threshold. If the overall normality score of the sample under test falls below the dynamic threshold, the steel structure is determined to be in an abnormal state; the normality score of each steel structure measuring point is output, and the measuring point area with the highest probability of abnormality is located. Otherwise, the steel structure is determined to be in a normal state, and the baseline feature pair and the current feature pair of the current sample are stored in the dynamic normal memory.
[0014] The advantages of this invention are as follows. By introducing a contrastive learning framework, the invention eliminates the need for a large number of hard-to-obtain damage-labeled samples and thereby reduces engineering application costs. Based on the physical prior of vibration transmission path invariance, it constructs cross-event, cross-encoder, and perturbation negative sample pairs, effectively decoupling the vibration response caused by changes in train operating conditions from that caused by structural degradation, thereby improving the accuracy and robustness of anomaly detection and reducing false alarms and false negatives. Employing a dual-channel encoder structure combined with multi-scale convolution and bidirectional LSTM networks, it fully extracts the deep spatiotemporal features of excitation and response signals, enhancing the model's adaptability to complex operating environments. The dynamic normal memory and dynamic threshold mechanism enable adaptive updates to the normality score, so that the detection system continues to adapt as the environment changes. At the same time, multi-point weighted fusion of normality scores provides an intuitive and reliable decision-making basis for structural health monitoring.
Attached Figure Description
[0015] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used together with the embodiments of the invention to explain the invention and do not constitute a limitation thereof.
[0016] Figure 1 is a flowchart of the steps of the method of the present invention; Figure 2 compares the loss curve of the present invention with that of the conventional method.
Detailed Implementation
[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0018] Example 1
In this embodiment, as shown in Figure 1, the invention provides a method for detecting vibration anomalies in adjacent steel structures of high-speed railways based on contrastive learning, the specific steps of which include:
S1. Using the passage of a train through a specific monitoring section as a trigger event, simultaneously collect track excitation signals and vibration response signals of the steel structure to construct excitation-response sample pairs.
Specifically, the arrival of the train at the monitoring section is detected by a millimeter-wave radar located beside the track, which triggers the unified timing system to time-align data acquisition across all sensors. The complete data for the train's passage period are extracted; the passage period spans [missing information], defined by the pre-acquisition time before the locomotive arrives, in seconds, which is set to 1 second, and the time taken for the train to pass completely through the monitoring section, in seconds. The train passing time is calculated as the standard length of the train formation, in meters, divided by the real-time speed of the train, in meters per second, plus 1 second; the extra 1 second is buffer time after the rear of the train leaves, ensuring that the complete vibration attenuation response is collected. Track excitation signals are acquired using a piezoelectric accelerometer, while triaxial accelerometers deployed at key nodes of the steel structure simultaneously acquire the vibration response signals. The collected signals are then processed with the Z-score normalization method to obtain standardized data. The standardized track excitation signal of a train passing event is paired with the standardized vibration response signal of each steel structure measuring point from the same event, forming a set of excitation-response sample pairs whose size equals the total number of response measurement points of the steel structure.
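The acquisition window and the pairing in step S1 can be sketched in a few lines. This is a minimal illustration and not the patent's implementation: the function names, the example train length and speed, the window placement around the arrival time, and the use of the population standard deviation in the Z-score are all assumptions made here.

```python
import statistics


def passage_time_s(train_length_m, speed_mps, buffer_s=1.0):
    """Passing time: formation length / speed, plus the 1 s tail buffer."""
    if speed_mps <= 0:
        raise ValueError("speed must be positive")
    return train_length_m / speed_mps + buffer_s


def acquisition_window(t_arrival_s, train_length_m, speed_mps, pre_s=1.0):
    """Assumed window: starts pre_s before arrival, ends after the passing time."""
    return (t_arrival_s - pre_s,
            t_arrival_s + passage_time_s(train_length_m, speed_mps))


def zscore(signal):
    """Z-score normalization (population standard deviation is an assumption)."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    if std == 0:
        return [0.0 for _ in signal]
    return [(x - mean) / std for x in signal]


def build_pairs(excitation, responses):
    """Pair one excitation signal with the response of every measuring point."""
    x_e = zscore(excitation)
    return [(x_e, zscore(r)) for r in responses]


# Hypothetical values: a 400 m formation at 100 m/s, arriving at t = 10 s.
window = acquisition_window(10.0, 400.0, 100.0)
pairs = build_pairs([0.0, 1.0, 2.0, 3.0],
                    [[1.0, 1.5, 2.0, 2.5], [4.0, 0.0, 4.0, 0.0]])
```

One pair is produced per response measuring point, so the length of `pairs` matches the number of response signals passed in.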
[0019] S2. Construct a dual-channel feature encoding module, which includes a pre-trained baseline encoder and a current encoder. Each of the two encoders includes two parallel sub-networks: an excitation encoding sub-network and a response encoding sub-network. The structures of the pre-trained baseline encoder and the current encoder are identical. The excitation-response sample pairs are processed by the dual-channel feature encoding module to obtain baseline feature pairs and current feature pairs. The baseline encoder anchors the feature distribution under healthy conditions, while the current encoder adapts to new data; the differentiated sub-network design captures the essential features of the different signal types more accurately and improves the quality of the feature representation. This solves two problems: a single encoder cannot simultaneously represent the "historical normal state" and the "current state under test", and feature extraction must be tailored to the characteristics of the excitation and response signals. Specifically, the excitation encoding sub-network includes a multi-scale parallel convolution module, a feature concatenation layer, a first 1D convolutional layer, a global average pooling layer, and a fully connected layer. The multi-scale parallel convolution module includes a first convolution branch, a second convolution branch, and a third convolution branch, with kernel sizes of 3×3, 5×5, and 7×7, respectively, and a stride of 2 for each; it is expressed by the formula [missing information], in which the input signal has dimension [missing information] and the output features of each convolution branch have dimension [missing information]. The output features of the convolution branches are concatenated along the channel dimension to obtain the concatenated features, with dimension [missing information]. After the first 1D convolutional layer adjusts the number of channels, the concatenated features undergo global average pooling along the time dimension and are then processed by the fully connected layer to obtain the output of the excitation encoding sub-network, with a dimension of [missing information]. The output dimension of the first 1D convolutional layer is [missing information], and the output dimension of the global average pooling layer is [missing information].
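Because the layer dimensions are not given in this text, the following sketch only illustrates why the three branches can be concatenated along the channel dimension. It assumes that the stated 3×3, 5×5, and 7×7 kernels act as 1D kernels of length 3, 5, and 7, that each branch pads by half the kernel length, and that the input length and channel count are hypothetical values chosen for the example.

```python
from typing import Optional


def conv1d_out_len(length: int, kernel: int, stride: int = 2,
                   padding: Optional[int] = None) -> int:
    """Output length of a 1D convolution (standard floor formula)."""
    if padding is None:
        padding = kernel // 2  # assumed "same"-style padding
    return (length + 2 * padding - kernel) // stride + 1


input_len = 1024          # hypothetical number of time samples
channels_per_branch = 16  # hypothetical channel count per branch

# Each branch halves the time axis; with the assumed padding all three agree,
# which is what makes channel-wise concatenation possible.
branch_lens = {k: conv1d_out_len(input_len, k) for k in (3, 5, 7)}
concat_shape = (3 * channels_per_branch, branch_lens[3])
```

Under these assumptions every branch yields the same temporal length, and the concatenated feature map has three times the per-branch channel count. With unpadded convolutions the branch lengths would differ and concatenation would need cropping or padding.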
[0020] The response encoding sub-network includes a first 1D convolutional layer, a second 1D convolutional layer, a third 1D convolutional layer, a dimensionality reduction layer, a bidirectional LSTM layer, a temporal pooling layer, and a fully connected layer; the kernel sizes of the first, second, and third 1D convolutional layers are 7×7, 3×3, and 3×3, respectively, and the stride is 2 for each. The input of the response encoding sub-network passes through the first 1D convolutional layer to obtain the first 1D convolutional features, with dimension [missing information]. These pass through the second 1D convolutional layer to obtain the second 1D convolutional features, with dimension [missing value], which pass through the third 1D convolutional layer to obtain the third 1D convolutional features, with dimension [missing information]. The dimensionality reduction layer applies global average pooling and dimensionality compression to the third 1D convolutional features to obtain the dimensionality-reduced features, with dimension [dimensionality value missing]. These pass through the bidirectional LSTM layer to obtain the hidden features, with dimension [missing information]. The temporal pooling layer applies max pooling to the hidden features along the time dimension to obtain the max-pooled features, with dimension [missing information], which pass through the fully connected layer to obtain the output of the response encoding sub-network, with dimension [missing information]. The sub-network parameters of the pre-trained baseline encoder are fixed after training. Each excitation-response sample pair is input to the pre-trained baseline encoder: the standardized track excitation signal is processed by the excitation encoding sub-network to obtain the baseline excitation feature vector, and the standardized vibration response signal is processed by the response encoding sub-network to obtain the baseline response feature vector. Together they form a baseline feature pair. The sub-network parameters of the current encoder are trainable. The same excitation-response sample pairs are input to the current encoder: the standardized track excitation signal is processed by the excitation encoding sub-network to obtain the current excitation feature vector, and the standardized vibration response signal is processed by the response encoding sub-network to obtain the current response feature vector. Together they form the current feature pair.
[0021] The pre-trained baseline encoder is trained as follows. After the steel structure has been completed and accepted, data from no fewer than [number] [times / days] of valid train passing events under normal operating conditions during the initial period are used to construct normal excitation-response sample pairs; these normal sample pairs are used to pre-train the baseline encoder, and all parameters of the baseline encoder are frozen after pre-training. The pre-training objective is to minimize the cosine distance between the excitation and response features of the same event and to maximize the cosine distance between the excitation and response features of different events, so that the encoder learns the vibration transmission mapping under normal conditions. The loss function is [missing information], whose terms are the pre-training loss, the training batch size of the baseline encoder, and the cosine similarity function. Training uses the Adam optimizer with a maximum of 100 iterations; when the decrease in validation set loss is less than 0.00005 for 10 consecutive iterations, training is terminated early, and all parameters of the baseline encoder are frozen.
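The termination condition can be expressed as a small check on the validation loss history. This is a sketch under one reading of the rule, namely that each of the last 10 iteration-to-iteration decreases must be smaller than 0.00005; the function name `should_stop` and its parameters are hypothetical.

```python
def should_stop(val_losses, min_delta=5e-5, patience=10, max_iters=100):
    """Return True when pre-training should end, given losses recorded so far."""
    if len(val_losses) >= max_iters:
        return True  # iteration cap reached
    if len(val_losses) <= patience:
        return False  # not enough history to judge a plateau
    recent = val_losses[-(patience + 1):]
    drops = [prev - cur for prev, cur in zip(recent, recent[1:])]
    # Stop only if every one of the last `patience` decreases is below min_delta.
    return all(d < min_delta for d in drops)


steady = [1.0 - 0.01 * i for i in range(20)]   # still improving each iteration
plateau = [1.0] + [0.5] * 11                   # flat for the last 10 steps
```

A training loop would call `should_stop` after each validation pass and freeze the baseline encoder parameters once it returns `True`.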
[0022] S3. Construct a dynamic normal memory. Based on the physical prior of vibration transmission path invariance, construct positive and negative sample pairs for contrastive learning training, optimize the parameters of the current encoder, and obtain the trained current encoder. Constructing rich negative samples forces the model to learn the true dynamic mapping relationship, making it highly sensitive to changes in the underlying structure and robust to operating condition disturbances. The memory prevents the model from forgetting historical normal states.
[0023] Specifically, a dynamic normal memory is built. Each storage unit holds the complete feature set of one normal train passing event, including the set of baseline feature pairs of that event, the sample number, and the overall normality score of the event; it also includes auxiliary information such as the sample collection time and the train operating conditions. The memory is a fixed-capacity first-in, first-out (FIFO) queue with a maximum capacity of [missing information]. The baseline feature pairs of B randomly selected historical normal samples are drawn from the dynamic normal memory to form the memory comparison set, indexed by the number of each sample extracted from the memory. The comparison set is used to construct negative sample pairs.
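A fixed-capacity FIFO memory with random sampling maps naturally onto a bounded deque. The sketch below is illustrative only: the class and field names are hypothetical, the capacity used in the example is arbitrary because the patent's value is not reproduced in this text, and the auxiliary fields are reduced to a single collection time.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    sample_id: int
    baseline_pairs: list      # one (excitation_vec, response_vec) per measuring point
    normality_score: float
    collected_at: float = 0.0  # auxiliary information (assumed representation)


class NormalMemory:
    """Fixed-capacity FIFO store of normal train passing events."""

    def __init__(self, capacity):
        self._queue = deque(maxlen=capacity)  # oldest entry is dropped when full

    def add(self, entry):
        self._queue.append(entry)

    def sample(self, b, rng=random):
        """Randomly draw up to b historical entries for the comparison set."""
        return rng.sample(list(self._queue), min(b, len(self._queue)))

    def scores(self):
        return [e.normality_score for e in self._queue]

    def __len__(self):
        return len(self._queue)


memory = NormalMemory(capacity=3)  # hypothetical capacity
for i in range(5):
    memory.add(MemoryEntry(i, [([1.0, 0.0], [0.9, 0.1])], 0.9 + 0.01 * i))
comparison_set = memory.sample(2, random.Random(0))
```

After five insertions into a memory of capacity three, only the three most recent events remain, which is the first-in, first-out behaviour described above.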
[0024] Based on the physical prior of "vibration transmission path invariance under healthy conditions", this invention constructs a multi-dimensional sample system of "one type of positive sample and three types of negative samples": positive samples from the same event, cross-event negative samples, cross-encoder negative samples, and perturbation negative samples. This addresses the problems that existing methods fail to use the distribution information of historical normal samples effectively, that negative samples in contrastive learning are insufficiently rich, and that the model learns the boundary of the normal state inadequately, which leads to high false alarm rates. Specifically:
Positive sample pairs: for each sample in a batch and each measuring point, the current excitation feature vector extracted by the current encoder is paired with the current response feature vector of the same event, giving the positive sample pair [missing information].
Cross-event negative samples: for each sample and each measuring point, the current response feature vector is paired with the current excitation feature vectors of all other events in the batch, giving the cross-event negative sample pairs [missing information].
Cross-encoder negative samples: for each sample and each measuring point, the current response feature vector is paired with the baseline excitation feature vectors of all historical samples in the memory comparison set, giving the cross-encoder negative sample pairs [missing information].
Perturbation negative samples: for each sample and each measuring point, additive Gaussian noise is applied to the current response feature vector to obtain the perturbed current response feature vector, expressed by the formula [missing information], in which the additive Gaussian noise has parameters [missing information]. The perturbed current response feature vector is paired with the current excitation feature vector, giving the perturbation negative sample pairs [missing information].
Based on the positive and negative sample pairs constructed above, the modified InfoNCE loss function is used as the optimization objective, expressed by the formula [missing information]. Its terms are the set of all negative samples, which comprises the cross-event, cross-encoder, and perturbation negative sample pairs; the current response feature vector of each element of that set; and the temperature hyperparameter, which takes a value of 0.1 to 0.5 and adjusts the steepness of the similarity distribution. In one embodiment, the temperature is 0.2. The gradient of the loss with respect to the trainable parameters θ of the current encoder is calculated using an automatic differentiation framework; to prevent gradient explosion, the gradient is clipped by its L2 norm with a clipping threshold of 1.0; the Adam optimizer is used to iteratively update the parameters of the current encoder.
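The sample system and loss can be illustrated with plain Python lists. The patent's modified formula is not reproduced in this text, so the sketch below uses the standard InfoNCE form over the union of the three negative sets as a stand-in; it should not be read as the patent's exact loss. All function names, the toy feature vectors, and the noise level are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def perturb(vec, sigma, rng):
    """Additive Gaussian noise on a feature vector (sigma is an assumed value)."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


def info_nce(positive, negatives, tau=0.2):
    """Standard InfoNCE over (excitation_vec, response_vec) pairs."""
    pos = math.exp(cosine(*positive) / tau)
    neg = sum(math.exp(cosine(e, r) / tau) for e, r in negatives)
    return -math.log(pos / (pos + neg))


def clip_l2(grad, max_norm=1.0):
    """Rescale a gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * max_norm / norm for g in grad]


rng = random.Random(0)
exc_i, resp_i = [1.0, 0.0], [0.9, 0.1]   # current features of event i
exc_j = [0.0, 1.0]                        # current excitation of another event j
base_exc_mem = [[0.2, 0.8]]               # baseline excitation features from memory

negatives = (
    [(exc_j, resp_i)]                          # cross-event
    + [(e, resp_i) for e in base_exc_mem]      # cross-encoder
    + [(exc_i, perturb(resp_i, 0.05, rng))]    # perturbation
)
loss = info_nce((exc_i, resp_i), negatives)
```

In practice the loss would be computed on encoder outputs inside an automatic differentiation framework, with `clip_l2` replaced by that framework's own gradient clipping before the Adam update.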
[0025] In one embodiment, as shown in Figure 2, the horizontal axis represents the number of training iterations and the vertical axis represents the loss value. The modified InfoNCE loss function of this invention converges faster, reaches a lower converged loss value, and generalizes better: the training set loss decreases steadily without obvious overfitting. By contrast, the traditional InfoNCE loss function converges slowly and generalizes poorly. This shows that the contrastive learning framework constructed in this invention can effectively learn the feature boundaries of the healthy state, improving the model's sensitivity to structural anomalies and its robustness to operating condition disturbances. It also verifies the core beneficial effect of this invention: efficient self-supervised training without damage-labeled samples, which reduces the cost of engineering applications.
[0026] S4. For a new sample to be detected, feature vectors are extracted by the trained dual-channel encoder, and the cross-channel feature similarity is calculated as the normality score. The trained dual-channel encoder includes the pre-trained baseline encoder and the trained current encoder. Specifically, for each steel structure measuring point, a weighted fusion method is used to calculate the joint similarity between the baseline feature pair and the current feature pair of that point, which gives the cross-channel feature similarity. The first weighting coefficient used in this calculation takes a value from 0.3 to 0.7 and balances the contributions of the excitation features and the response features; in one embodiment, it is 0.5. The four vectors involved are the baseline excitation feature vector, the baseline response feature vector, the current excitation feature vector and the current response feature vector output by the trained dual-channel encoder. The overall normality score of the steel structure is obtained as a weighted average of the normality scores of all measuring points. The second weighting coefficient used in this average is, in one embodiment, set according to how strongly the measuring point influences the overall stiffness of the structure. For example, the weight of a support measuring point is set to 0.2 and the weight of a beam-column connection point is set to 0.1, with all weights satisfying the prescribed constraint.
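The scoring in step S4 can be illustrated with a short sketch. The exact formulas are not reproduced in this text, so the form below is an assumption: the joint similarity is taken as a convex combination of the cosine similarity between the baseline and current excitation vectors and the cosine similarity between the baseline and current response vectors, and the overall score divides by the sum of the weights so that it is a weighted average whether or not the weights already sum to 1. The function names and argument order are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (epsilon avoids 0/0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def joint_similarity(base_exc, base_resp, cur_exc, cur_resp, alpha=0.5):
    """Normality score of one measuring point (assumed fusion form).

    alpha is the first weighting coefficient (0.3 to 0.7 in the text,
    0.5 in one embodiment); assigning alpha to the excitation channel
    is an assumption of this sketch.
    """
    excitation_term = cosine_similarity(base_exc, cur_exc)
    response_term = cosine_similarity(base_resp, cur_resp)
    return alpha * excitation_term + (1.0 - alpha) * response_term


def overall_normality(point_scores, point_weights):
    """Weighted average of per-point normality scores."""
    total_weight = sum(point_weights)
    weighted_sum = sum(w * s for w, s in zip(point_weights, point_scores))
    return weighted_sum / total_weight


# Example: a support point (weight 0.2) and a beam-column joint (weight 0.1).
scores = [
    joint_similarity([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]),
    joint_similarity([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]),
]
structure_score = overall_normality(scores, [0.2, 0.1])
```

Keeping the per-point scores alongside the overall score is what later allows the measuring point with the lowest score to be reported when an anomaly is flagged.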
[0027] S5. A dynamic threshold mechanism is used for anomaly detection, and the dynamic normal memory is updated. Dynamically adjusting the threshold according to the distribution of normal samples improves the adaptability and accuracy of anomaly detection. The update mechanism of the dynamic normal memory allows the model to self-calibrate as the structure undergoes long-term slow changes (such as normal aging), so that slow, gradual changes are not misjudged as anomalies.
[0028] Specifically, the mean and standard deviation of the normality scores of the historical normal samples stored in the dynamic normal memory are calculated, and the dynamic threshold is then calculated from them. The quantities involved are the mean and the standard deviation of the normality scores of the historical normal samples, the capacity of the memory bank, and the dynamic threshold. If the overall normality score of the sample to be tested fails the dynamic threshold criterion, the steel structure is determined to be in an abnormal state, the normality score of each steel structure measuring point is output, and the measuring point area with the highest probability of abnormality is located. If the overall normality score of the sample to be tested meets the criterion, the steel structure is determined to be in a normal state, and the baseline feature pair and the current feature pair of the current sample are stored in the dynamic normal memory. When a new normal sample is added and the queue is full, the oldest feature vector pair is automatically removed, so that the memory always holds the most recent distribution of normal-state features.
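The threshold and memory update described above can be sketched as follows. The threshold formula and the comparison direction are not reproduced in this text, so two assumptions are made and should be read as such: the threshold is taken as the mean minus k times the population standard deviation, with k a hypothetical multiplier, and a sample is treated as abnormal when its score falls below that threshold (the normality score being a similarity). The class and method names are illustrative only.

```python
from collections import deque
from statistics import fmean, pstdev


class DynamicNormalMemory:
    """Fixed-capacity FIFO store of normal samples with a moving threshold."""

    def __init__(self, capacity, k=3.0):
        # deque(maxlen=...) discards the oldest entry when a new one
        # is appended to a full queue, which gives FIFO replacement.
        self.entries = deque(maxlen=capacity)
        self.k = k  # hypothetical multiplier, not given in the text

    def add_normal(self, score, features):
        """Store one normal sample as (overall score, feature pairs)."""
        self.entries.append((score, features))

    def threshold(self):
        """Assumed form: mean - k * std of the stored normality scores."""
        scores = [entry[0] for entry in self.entries]
        return fmean(scores) - self.k * pstdev(scores)

    def check(self, score, features):
        """Return True if abnormal; otherwise store the sample as normal."""
        is_abnormal = score < self.threshold()
        if not is_abnormal:
            self.add_normal(score, features)
        return is_abnormal


# Example with a capacity of 3, seeded with three known-normal samples.
memory = DynamicNormalMemory(capacity=3)
for seed_score in (0.90, 0.92, 0.94):
    memory.add_normal(seed_score, features=None)

flagged = memory.check(0.50, features=None)   # far below the stored scores
accepted = memory.check(0.93, features=None)  # within the normal range
```

The memory has to be seeded with known-normal samples before the first check, since the mean and standard deviation are undefined for an empty queue.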
[0029] Finally, it should be noted that the above descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for detecting vibration anomalies in adjacent steel structures of high-speed railways based on contrastive learning, characterized in that it includes the following steps: S1. Using the passage of a train through a specific monitoring section as a trigger event, simultaneously collect track excitation signals and vibration response signals of the steel structure to construct excitation-response sample pairs; S2. Construct a dual-channel feature encoding module, which includes a pre-trained baseline encoder and a current encoder; the pre-trained baseline encoder and the current encoder each include two parallel sub-networks, an excitation encoding sub-network and a response encoding sub-network, and the structures of the two encoders are completely identical; the excitation-response sample pairs are processed by the dual-channel feature encoding module to obtain baseline feature pairs and current feature pairs; S3. Construct a dynamic normal memory; based on the physical prior of vibration transmission path invariance, construct positive and negative sample pairs for contrastive learning training, optimize the parameters of the current encoder, and obtain the trained current encoder; S4. For a new sample to be detected, extract feature vectors through the trained dual-channel encoder and calculate the cross-channel feature similarity as the normality score, the trained dual-channel encoder including the pre-trained baseline encoder and the trained current encoder; S5. Use a dynamic threshold mechanism for anomaly detection and update the dynamic normal memory.
2. The method for detecting vibration anomalies in adjacent steel structures of high-speed railways based on contrastive learning according to claim 1, characterized in that step S1 specifically includes: track excitation signals are acquired using a piezoelectric accelerometer, and triaxial accelerometers deployed at key nodes of the steel structure synchronously acquire vibration response signals; the collected signals are then processed by the Z-score normalization method to obtain standardized data; the standardized track excitation signal of a train passing event is paired with the standardized vibration response signal of each steel structure measuring point for the same event to form a set of excitation-response sample pairs, the number of pairs in the set being the total number of response measuring points of the steel structure.
3. The method for detecting vibration anomalies in adjacent steel structures of high-speed railways based on contrastive learning according to claim 1, characterized in that the excitation encoding sub-network described in step S2 includes a multi-scale parallel convolution module, a feature concatenation layer, a first 1D convolutional layer, a global average pooling layer, and a fully connected layer; the multi-scale parallel convolution module includes a first convolution branch, a second convolution branch, and a third convolution branch, with kernel sizes of 3×3, 5×5, and 7×7, respectively, and a stride of 2 for each, each convolution branch taking the input signal and producing its own output features; the output features of the convolution branches are concatenated along the channel dimension to obtain the concatenated features; the concatenated features have their number of channels adjusted by the first 1D convolutional layer, are globally averaged along the time dimension by the global average pooling layer, and are then processed by the fully connected layer to obtain the output of the excitation encoding sub-network; the sub-network parameters of the pre-trained baseline encoder are fixed after training; the excitation-response sample pairs are input to the pre-trained baseline encoder, where the standardized track excitation signal is processed by the excitation encoding sub-network to obtain the baseline excitation feature vector, and the standardized vibration response signal is processed by the response encoding sub-network to obtain the baseline response feature vector, the two vectors forming a baseline feature pair.
4. The method for detecting vibration anomalies in adjacent steel structures of high-speed railways based on contrastive learning according to claim 1, characterized in that the response encoding sub-network described in step S2 includes a first 1D convolutional layer, a second 1D convolutional layer, a third 1D convolutional layer, a dimensionality reduction layer, a bidirectional LSTM layer, a temporal pooling layer, and a fully connected layer; the kernel sizes of the first, second, and third 1D convolutional layers are 7×7, 3×3, and 3×3, respectively, and the stride is 2 for each layer; the input features of the response encoding sub-network pass through the first 1D convolutional layer to obtain first 1D convolutional features; the first 1D convolutional features pass through the second 1D convolutional layer to obtain second 1D convolutional features; the second 1D convolutional features pass through the third 1D convolutional layer to obtain third 1D convolutional features; the third 1D convolutional features pass through the dimensionality reduction layer, which performs global average pooling and dimensionality compression, to obtain dimensionality-reduced features; the dimensionality-reduced features pass through the bidirectional LSTM layer to obtain hidden features; the hidden features pass through the temporal pooling layer, which performs max pooling along the time dimension, to obtain max-pooled features; the max-pooled features pass through the fully connected layer to obtain the output of the response encoding sub-network; the sub-network parameters of the current encoder are trainable; the same excitation-response sample pairs are input to the current encoder, where the standardized track excitation signal is processed by the excitation encoding sub-network to obtain the current excitation feature vector, and the standardized vibration response signal is processed by the response encoding sub-network to obtain the current response feature vector, the two vectors forming a current feature pair.
5. The method for detecting vibration anomalies in adjacent steel structures of high-speed railways based on contrastive learning according to claim 1, characterized in that in step S3 a dynamic normal memory is constructed, in which each storage unit represents the complete set of features of one normal train passing event, including: the set of baseline feature pairs of the normal train passing event; the sample number; and the overall normality score of the normal train passing event; a fixed-capacity first-in-first-out (FIFO) queue is adopted, with a maximum queue capacity of [missing information]; from the dynamic normal memory, B baseline feature pairs of randomly selected historical normal samples are drawn to form a memory-bank comparison set, each indexed by the number of the sample drawn from the memory bank, and the comparison set is used to construct negative sample pairs.
6. The method for detecting vibration anomalies in adjacent steel structures of high-speed railways based on contrastive learning according to claim 5, characterized in that in step S3 positive and negative sample pairs are constructed for contrastive learning training based on physical priors as follows: positive sample pairs: for a given sample in a batch and a given measuring point, the current excitation feature vector extracted by the current encoder for the same event is paired with the current response feature vector to form a positive sample pair; cross-event negative samples: for the given sample and measuring point, the current response feature vector is paired with the current excitation feature vectors of all other events in the batch to form cross-event negative sample pairs; cross-encoder negative samples: for the given sample and measuring point, the current response feature vector is paired with the baseline excitation feature vectors of all historical samples in the memory-bank comparison set to form cross-encoder negative sample pairs; perturbation negative samples: for the given sample and measuring point, additive Gaussian noise is applied to the current response feature vector to obtain a perturbed current response feature vector, which is paired with the current excitation feature vector to obtain perturbation negative sample pairs; based on the constructed positive and negative sample pairs, a modified InfoNCE loss function is used as the optimization objective, in which the set of all negative samples comprises the cross-event negative sample pairs, the cross-encoder negative sample pairs and the perturbation negative sample pairs, the loss is evaluated over the current response feature vectors in that set, similarity is measured by the cosine similarity function, and a temperature hyperparameter scales the similarity.
7. The method for detecting vibration anomalies in adjacent steel structures of high-speed railways based on contrastive learning according to claim 1, characterized in that in step S4, for each steel structure measuring point, a weighted fusion method is used to calculate the joint similarity between the baseline feature pair and the current feature pair of that point, which gives the cross-channel feature similarity; the calculation uses a first weighting coefficient together with the baseline excitation feature vector, the baseline response feature vector, the current excitation feature vector and the current response feature vector output by the trained dual-channel encoder; the overall normality score of the steel structure is obtained as a weighted average of the normality scores of all measuring points, each measuring point being assigned a second weighting coefficient.
8. The method for detecting vibration anomalies in adjacent steel structures of high-speed railways based on contrastive learning according to claim 1, characterized in that in step S5 the mean and standard deviation of the normality scores of the historical normal samples stored in the dynamic normal memory are calculated, and the dynamic threshold is then calculated from them, the quantities involved being the mean and the standard deviation of the normality scores of the historical normal samples, the capacity of the memory bank, and the dynamic threshold; if the overall normality score of the sample to be tested fails the dynamic threshold criterion, the steel structure is determined to be in an abnormal state, the normality score of each steel structure measuring point is output, and the measuring point area with the highest probability of abnormality is located; if the overall normality score of the sample to be tested meets the criterion, the steel structure is determined to be in a normal state, and the baseline feature pair and the current feature pair of the current sample are stored in the dynamic normal memory.