Intelligent workshop equipment running state monitoring system based on multi-modal perception
This multimodal-perception-based monitoring system for intelligent workshop equipment addresses two problems of single-type sensors: timing misalignment between data sources and limited coverage of data dimensions. It fuses multi-source sensor data and quickly identifies and locates abnormal equipment. This improves the precision of equipment operation and maintenance and the quality of visual alarms.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- JILIN TECH COLLEGE OF ELECTRONICS INFORMATION
- Filing Date
- 2026-05-25
- Publication Date
- 2026-06-30
Abstract drawing: CN122306166A
Abstract
Description
Technical Field
[0001] This invention relates to the field of intelligent manufacturing monitoring technology, and in particular to an intelligent workshop equipment operation status monitoring system based on multimodal perception.
Background Technology
[0002] In the operation and maintenance of industrial equipment in smart workshops, traditional monitoring methods often collect equipment operating data with a single type of sensor. Each sensor's data is collected independently, so the timestamps are difficult to align and operating parameters are stored in fragments. As a result, the data cover only a limited number of dimensions and cannot reflect the overall operating condition of the equipment. Conventional methods analyze the features of one data dimension at a time, such as vibration, acoustics, or temperature. They do not fuse multiple physical characteristics, so the correlations among the data cannot be effectively mined.
[0003] Single-network models are widely used to predict equipment operating status. However, their structure does not account for the topological relationships between equipment nodes in the workshop. Nor can it accurately fit how operating status evolves over time. Anomaly identification usually relies on direct comparison against fixed numerical thresholds, with no statistical verification. This makes it susceptible to environmental interference and data fluctuations, which lead to misjudgments. Alarm messages give only a plain-text description of the anomaly and are not linked to the equipment's location, so maintenance personnel find it difficult to locate the fault quickly.
[0004] As equipment is deployed at scale in industrial workshops, two tasks have become key monitoring challenges: synchronizing multi-source heterogeneous sensor data in time, and deeply fusing multi-dimensional features. Existing monitoring technology also needs improvement in three further areas: jointly modeling the topological relationships among equipment together with their temporal status, identifying anomalies with finer granularity, and presenting alarms visually.
Summary of the Invention
[0005] The purpose of this invention is to address the shortcomings of existing technologies by proposing a smart workshop equipment operation status monitoring system based on multimodal perception.
[0006] To achieve the above objective, the present invention adopts the following technical solution: an intelligent workshop equipment operation status monitoring system based on multimodal perception, comprising:

- a data acquisition module, which synchronously collects operating data of the target device through multiple types of sensors deployed in the smart workshop and constructs a multimodal synchronous data cube;
- a feature extraction module, which extracts vibration frequency-domain features, acoustic emission energy features, and thermal imaging temperature gradient features from the multimodal synchronous data cube and concatenates the three types of features into a joint feature vector;
- a state prediction module, which inputs the joint feature vector into a pre-trained graph convolutional network that outputs a hidden state representation vector for each node, and then inputs the hidden state representation vectors into a temporal prediction network that outputs a predicted sequence of the target device's operating state within a future time window;
- an anomaly detection module, which calculates the residual sequence between the predicted and measured operating state sequences, performs a statistical hypothesis test on the residual sequence, and determines that the target device is in an abnormal operating state when the test statistic exceeds a preset threshold; and
- an alarm output module, which encapsulates the device number of the target device determined to be abnormal, the time the anomaly occurred, and its duration into an abnormal alarm record, pushes the record to the workshop monitoring terminal, and highlights the location of the abnormal device on that terminal.
[0007] As a further aspect of the present invention, collecting the target device's operating data synchronously and constructing the multimodal synchronous data cube includes the following:

- Vibration sensors, acoustic emission sensors, and infrared thermal imagers deployed in the smart workshop synchronously collect the operating data of the target device. They generate a vibration time-series signal, an acoustic emission time-series signal, and a thermal imaging sequence, respectively.
- These three data streams undergo timestamp alignment and spatial coordinate registration to construct the multimodal synchronous data cube.

The synchronous collection step specifically includes:

- A vibration sensor is fixedly installed on the casing of each target device in the smart workshop, with its sensitive axis aligned with the device's main vibration direction.
- An acoustic emission sensor is fixedly installed on the casing of the target device at a preset distance from the vibration sensor, with its coupling surface in close contact with the casing surface.
- An infrared thermal imager is mounted directly in front of the target device, with its optical axis pointing at the device's key heat-generating area.
- A synchronous clock generator is activated to send sampling trigger pulses simultaneously to the vibration sensor, the acoustic emission sensor, and the infrared thermal imager.
- After receiving a sampling trigger pulse, the vibration sensor acquires vibration acceleration values at a fixed sampling frequency and generates the vibration time-series signal. The acoustic emission sensor acquires acoustic emission amplitudes at a fixed sampling frequency and generates the acoustic emission time-series signal. The infrared thermal imager acquires thermal images at a fixed frame rate and generates the thermal imaging sequence.
[0008] As a further aspect of the present invention, aligning the timestamps and registering the spatial coordinates of the vibration time-series signal, the acoustic emission time-series signal, and the thermal imaging sequence to construct the multimodal synchronous data cube specifically includes:

- Each sampling point of the vibration time-series signal is labeled with an absolute timestamp derived from the output of the synchronous clock generator.
- Each sampling point of the acoustic emission time-series signal, and each frame of the thermal imaging sequence, is labeled with an absolute timestamp from the same clock.
- Using the sampling time points of the vibration time-series signal as the reference, cubic spline interpolation is performed on the acoustic emission time-series signal and the thermal imaging sequence, so that all three modalities share the same sequence of time points.
- A global spatial coordinate system is established for the workshop, and the installation coordinates of the vibration sensor and the acoustic emission sensor on each target device are recorded.
- The optical center coordinates and the optical axis direction vector of the infrared thermal imager in front of each target device are recorded.
- The image coordinates of each pixel in the thermal imaging sequence are converted into spatial coordinates in the workshop global coordinate system through a perspective projection transformation.
- The vibration, acoustic emission, and thermal imaging data that share the same absolute timestamp and belong to the same target device number are organized along the time, spatial, and modal dimensions to form the multimodal synchronous data cube.
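The alignment step can be sketched as follows. This sketch makes several assumptions that are not in the patent. It fits one global natural cubic spline over the whole window (the natural boundary condition follows the embodiment in [0025]). It uses SciPy's `CubicSpline`. It does not extrapolate reference points that fall outside the source stream's time span. The function name `align_to_reference` and the epoch offset `T0` are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def align_to_reference(t_ref_us, t_src_us, values):
    """Resample a source stream onto the reference (vibration) time axis.

    t_ref_us, t_src_us: strictly increasing integer timestamps in microseconds.
    values: array whose first axis is time (1-D signal or stack of 2-D frames).
    Returns (aligned_values, mask); mask marks reference points inside the
    source time span, and points outside it are not extrapolated (assumption).
    """
    t_ref = np.asarray(t_ref_us, dtype=np.int64)
    t_src = np.asarray(t_src_us, dtype=np.int64)
    t0 = t_ref[0]
    # Work in float offsets from t0 to avoid precision loss on epoch-scale values.
    x_ref = (t_ref - t0).astype(np.float64)
    x_src = (t_src - t0).astype(np.float64)
    spline = CubicSpline(x_src, np.asarray(values, dtype=np.float64),
                         axis=0, bc_type="natural")
    mask = (x_ref >= x_src[0]) & (x_ref <= x_src[-1])
    return spline(x_ref[mask]), mask


# Example: 10 kHz vibration axis, 200 kHz acoustic emission, 30 fps thermal frames.
T0 = 1_700_000_000_000_000                      # hypothetical epoch offset in µs
t_vib = T0 + np.arange(0, 200_000, 100)         # 100 µs spacing (reference axis)
t_ae = T0 + np.arange(0, 200_001, 5)            # 5 µs spacing
ae = 0.5 * (t_ae - T0) + 3.0                    # synthetic AE amplitude
t_thermal = T0 + np.round(np.arange(7) * 1e6 / 30).astype(np.int64)
frames = np.stack([np.full((2, 2), 20.0 + 3e-5 * (t - T0)) for t in t_thermal])

ae_aligned, ae_mask = align_to_reference(t_vib, t_ae, ae)
th_aligned, th_mask = align_to_reference(t_vib, t_thermal, frames)
```

With `axis=0`, a whole stack of thermal frames is interpolated pixel by pixel in a single call. The aligned thermal array therefore has one frame per vibration timestamp.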
[0009] As a further aspect of the present invention, extracting the vibration frequency-domain features, acoustic emission energy features, and thermal imaging temperature gradient features from the multimodal synchronous data cube, and concatenating them into a joint feature vector, includes the following steps.

Vibration features:
- The vibration data within the current time window are sliced from the cube along the time dimension.
- A fast Fourier transform is performed on the vibration data slice to obtain the vibration amplitude spectrum.
- The frequency-shift amplitude, harmonic amplitude, and high-frequency noise-floor amplitude are extracted from the amplitude spectrum to form the vibration frequency-domain feature vector.

Acoustic emission features:
- The acoustic emission data within the current time window are sliced from the cube along the time dimension.
- The signal envelope at each time point of the acoustic emission data slice is computed to obtain the acoustic emission energy envelope sequence.
- The peak energy, mean energy, and energy rise rate are extracted from the envelope sequence to form the acoustic emission energy feature vector.

Thermal imaging features:
- The thermal imaging temperature matrix of the target device surface within the current time window is sliced from the cube along the spatial dimension.
- The horizontal and vertical temperature gradients of the temperature matrix are computed to obtain the temperature gradient field.
- The maximum temperature gradient magnitude and the temperature gradient direction angle are extracted from the gradient field to form the thermal imaging temperature gradient feature vector.

The vibration frequency-domain feature vector, the acoustic emission energy feature vector, and the thermal imaging temperature gradient feature vector are concatenated in that order to generate the joint feature vector.
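The feature extraction above can be sketched in NumPy/SciPy as follows. The patent does not define its features precisely, so each feature here is an assumption:

- The strongest non-DC spectral peak stands in for the "frequency-shift amplitude".
- The bin nearest twice that frequency gives the harmonic amplitude.
- The median amplitude above a hypothetical `noise_cutoff_hz` gives the noise floor.
- Acoustic emission energy is the squared Hilbert envelope.
- The rise rate is (peak energy minus starting energy) divided by the time to the peak.
- Temperature gradients are computed per pixel with central differences.

```python
import numpy as np
from scipy.signal import hilbert


def vibration_features(x, fs, noise_cutoff_hz):
    """[dominant peak amplitude, 2nd-harmonic amplitude, high-frequency noise floor]."""
    x = np.asarray(x, dtype=np.float64)
    n = len(x)
    spectrum = 2.0 * np.abs(np.fft.rfft(x)) / n          # single-sided amplitude spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    k = int(np.argmax(spectrum[1:])) + 1                 # strongest non-DC bin (assumption)
    k2 = int(np.argmin(np.abs(freqs - 2.0 * freqs[k])))  # bin nearest twice that frequency
    noise_floor = float(np.median(spectrum[freqs >= noise_cutoff_hz]))
    return np.array([spectrum[k], spectrum[k2], noise_floor])


def acoustic_emission_features(x, fs):
    """[peak energy, mean energy, energy rise rate] from the squared Hilbert envelope."""
    energy = np.abs(hilbert(np.asarray(x, dtype=np.float64))) ** 2
    k_peak = int(np.argmax(energy))
    rise_time = k_peak / fs
    rise_rate = (energy[k_peak] - energy[0]) / rise_time if rise_time > 0 else 0.0
    return np.array([energy[k_peak], energy.mean(), rise_rate])


def thermal_gradient_features(temp):
    """[max gradient magnitude, its direction angle in degrees], per-pixel units."""
    g_vert, g_horiz = np.gradient(np.asarray(temp, dtype=np.float64))
    magnitude = np.hypot(g_horiz, g_vert)
    r, c = np.unravel_index(np.argmax(magnitude), magnitude.shape)
    angle = np.degrees(np.arctan2(g_vert[r, c], g_horiz[r, c]))
    return np.array([magnitude[r, c], angle])


def joint_feature_vector(vib, ae, temp, fs_vib, fs_ae, noise_cutoff_hz):
    """Concatenate vibration, acoustic emission, and thermal features in that order."""
    return np.concatenate([
        vibration_features(vib, fs_vib, noise_cutoff_hz),
        acoustic_emission_features(ae, fs_ae),
        thermal_gradient_features(temp),
    ])


# Synthetic inputs: 100 Hz tone plus 200 Hz harmonic, an AE burst, a linear temperature ramp.
t = np.arange(1000) / 1000
vib = np.sin(2 * np.pi * 100 * t) + 0.3 * np.sin(2 * np.pi * 200 * t)
fs_ae = 200_000
t_ae = np.arange(400) / fs_ae
burst = np.exp(-((t_ae - 0.001) / 0.0001) ** 2) * np.sin(2 * np.pi * 20_000 * t_ae)
temp = 40.0 + 0.5 * np.tile(np.arange(5), (4, 1))
joint = joint_feature_vector(vib, burst, temp, 1000, fs_ae, 400.0)
```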
[0010] As a further aspect of the present invention, inputting the joint feature vector into the pre-trained graph convolutional network includes constructing a device relationship graph. In this graph, the devices in the workshop are nodes, and the edge weights are derived from the physical distances between devices. The steps are as follows:

- The list of device numbers and device location coordinates for all equipment in the smart workshop is obtained.
- The Euclidean distance between every pair of devices is calculated from the coordinate list.
- Each device is defined as a graph node, and the node is bound to its device number.
- For any two devices whose Euclidean distance is less than a preset connection distance threshold, an undirected edge is established between their graph nodes. The reciprocal of the distance is used as the edge weight, so a larger weight indicates stronger spatial coupling between the two devices.
- All graph nodes, undirected edges, and edge weights are organized into the device relationship graph.
- The joint feature vector is used as the input feature of the target device's graph node at the current time step.
- The input features of all graph nodes are organized into a graph-structured input tensor according to the adjacency of the device relationship graph.
- This tensor is input into the pre-trained graph convolutional network. Each layer of the network performs neighborhood aggregation and feature transformation on the features of every graph node.
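The graph construction and one graph-convolution layer can be sketched as follows. Edges carry weight 1/d below the connection threshold, as the paragraph above describes. The patent does not specify the layer formula. The symmetric-normalized propagation with self-loops used here (the Kipf and Welling form, ReLU(D^-1/2 (W + I) D^-1/2 H Θ)) is an assumption. The random `theta` also stands in for pre-trained weights.

```python
import numpy as np


def build_device_graph(coords, connect_dist):
    """Weighted adjacency: w_ij = 1 / d_ij if 0 < d_ij < connect_dist, else 0."""
    coords = np.asarray(coords, dtype=np.float64)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # d > 0 excludes self-pairs (the diagonal) and coincident devices.
    connected = (dist > 0) & (dist < connect_dist)
    weights = np.zeros_like(dist)
    weights[connected] = 1.0 / dist[connected]
    return weights


def gcn_layer(weights, features, theta):
    """One graph-convolution layer: symmetric-normalized aggregation, then ReLU."""
    a_hat = weights + np.eye(len(weights))               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))        # row sums are >= 1
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ theta, 0.0)


# Three machines in metres: two 3 m apart, one far away (isolated at a 10 m threshold).
coords = [(0.0, 0.0), (3.0, 0.0), (100.0, 0.0)]
W = build_device_graph(coords, connect_dist=10.0)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))          # joint feature vectors, one row per device
theta = rng.normal(size=(8, 4))      # placeholder for trained layer weights
H = gcn_layer(W, X, theta)
```

An isolated node keeps only its own transformed features. Nodes within the connection distance mix their features in proportion to their spatial coupling.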
[0011] As a further aspect of the present invention, the graph convolutional network outputs a hidden state representation vector for each node, and these vectors are input into the temporal prediction network to output the target device's predicted operating state sequence within a future time window. This includes the following steps:

- The last layer of the graph convolutional network outputs a hidden state representation vector for each graph node. This vector integrates the node's own multimodal features with the spatial coupling features of its neighbors.
- The hidden state vectors of all graph nodes are stacked in device-number order into a hidden state matrix.
- The hidden state matrix of the current time step is concatenated along the time dimension with those of a preset number of historical time steps to obtain a temporal context tensor.
- The temporal context tensor is input into the temporal prediction network. The network uses a gated recurrent unit (GRU) to process the tensor recursively along the time dimension, updating the GRU hidden state at each time step.
- The GRU hidden state at the last time step is fed into the fully connected layer of the temporal prediction network. This layer outputs a sequence of predicted values, each of which is the predicted operating state at one time point of the future time window.

This predicted value sequence is the operating state prediction sequence.
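The GRU-plus-fully-connected predictor can be sketched in plain NumPy. This sketch makes two assumptions that the patent does not state. First, all device nodes are processed as a batch with shared weights; one would read the target device's row from the output. Second, the gate convention h = (1 - z) * h + z * h_cand is used. The weights are random placeholders, not a trained model, and the class name `GRUForecaster` is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUForecaster:
    """Minimal GRU followed by a linear head; weights are untrained placeholders."""

    def __init__(self, input_dim, hidden_dim, horizon, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)

        def init(shape):
            return rng.uniform(-scale, scale, size=shape)

        # Input (W) and recurrent (U) weights plus biases for the update (z),
        # reset (r) and candidate (h) gates.
        self.W = {g: init((input_dim, hidden_dim)) for g in "zrh"}
        self.U = {g: init((hidden_dim, hidden_dim)) for g in "zrh"}
        self.b = {g: np.zeros(hidden_dim) for g in "zrh"}
        self.W_out = init((hidden_dim, horizon))     # fully connected output layer
        self.b_out = np.zeros(horizon)
        self.hidden_dim = hidden_dim

    def predict(self, context):
        """context: (T, N, d) temporal context tensor -> (N, horizon) predictions."""
        steps, nodes, _ = context.shape
        h = np.zeros((nodes, self.hidden_dim))
        for t in range(steps):                        # recurse along the time dimension
            x = context[t]
            z = sigmoid(x @ self.W["z"] + h @ self.U["z"] + self.b["z"])
            r = sigmoid(x @ self.W["r"] + h @ self.U["r"] + self.b["r"])
            h_cand = np.tanh(x @ self.W["h"] + (r * h) @ self.U["h"] + self.b["h"])
            h = (1.0 - z) * h + z * h_cand
        return h @ self.W_out + self.b_out            # last hidden state -> horizon values


# Six hidden-state matrices (current plus five historical steps), 4 devices, 16 features.
rng = np.random.default_rng(1)
context = np.stack([rng.normal(size=(4, 16)) for _ in range(6)], axis=0)
model = GRUForecaster(input_dim=16, hidden_dim=32, horizon=10)
pred = model.predict(context)
```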
[0012] As a further aspect of the present invention, calculating the residual sequence between the predicted and measured operating state sequences, performing a statistical hypothesis test on it, and determining that the target device is abnormal when the test statistic exceeds a preset threshold includes the following steps:

- Once the future time window has actually elapsed, the sensors deployed on the target device collect the measured operating state values within that window to form the measured operating state sequence.
- The predicted and measured sequences are aligned point by point. At each time point, the predicted value is subtracted from the measured value to obtain the residual sequence.
- The sample mean and sample standard deviation of the residual sequence are calculated.
- Each residual is standardized by subtracting the sample mean and dividing by the sample standard deviation, giving a standardized score sequence.
- A one-sample Kolmogorov-Smirnov test is performed on the standardized score sequence to test whether it follows the standard normal distribution.
- When the Kolmogorov-Smirnov test statistic exceeds a preset distribution test threshold, the standardized score sequence is determined to deviate from the standard normal distribution.
- In that case, the maximum absolute value of the standardized scores is calculated. When this maximum exceeds a preset abnormal deviation threshold, the target device is determined to be in an abnormal operating state.
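The two-stage decision described above (KS deviation from N(0, 1), then the maximum |z|) can be sketched with SciPy's `kstest`. The threshold values in the example are hypothetical. Treating constant or single-point residuals as not abnormal is also an assumption.

```python
import numpy as np
from scipy.stats import kstest


def detect_anomaly(predicted, measured, ks_threshold, deviation_threshold):
    """Return (abnormal, z_scores, ks_statistic, max_abs_z)."""
    residual = (np.asarray(measured, dtype=np.float64)
                - np.asarray(predicted, dtype=np.float64))
    if residual.size < 2:
        return False, np.zeros_like(residual), 0.0, 0.0   # too short to test (assumption)
    std = residual.std(ddof=1)                            # sample standard deviation
    if std == 0.0:
        return False, np.zeros_like(residual), 0.0, 0.0   # constant residual (assumption)
    z = (residual - residual.mean()) / std                # standardized score sequence
    ks_stat = float(kstest(z, "norm").statistic)          # one-sample KS against N(0, 1)
    max_abs_z = float(np.max(np.abs(z)))
    abnormal = bool(ks_stat > ks_threshold and max_abs_z > deviation_threshold)
    return abnormal, z, ks_stat, max_abs_z


# A step change in 20 of 200 points versus plain Gaussian noise.
predicted = np.zeros(200)
measured = np.r_[np.zeros(180), np.full(20, 5.0)]
flag, z, ks_stat, max_abs_z = detect_anomaly(predicted, measured, 0.2, 2.5)

noise = np.random.default_rng(0).normal(size=500)
flag_noise = detect_anomaly(np.zeros(500), noise, 0.2, 2.5)[0]
```

The mean and standard deviation are estimated from the same residuals that are then tested. As a result, the textbook KS critical values are conservative here; the Lilliefors variant of the test corrects for this. A preset threshold on the raw statistic is therefore best calibrated on residuals recorded during normal operation.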
[0013] As a further aspect of the present invention, encapsulating the device number of the abnormal target device, the time the anomaly occurred, and its duration into an abnormal alarm record includes the following steps:

- The device number is read from the list of target devices determined to be in an abnormal operating state.
- The first time point in the residual sequence at which the absolute standardized score exceeds the preset abnormal deviation threshold is recorded as the anomaly start time.
- After the start time, the time point from which the absolute standardized score stays continuously below that threshold is recorded as the anomaly end time.
- The difference between the anomaly end time and start time is taken as the anomaly duration.
- The device number, anomaly start time, and anomaly duration are organized into a data structure according to a preset alarm data format.
- The data structure is serialized into an abnormal alarm record in string format.
- A timestamp and a checksum are attached to the record. The timestamp records when the anomaly determination was completed, and the checksum is used to verify the record's data integrity.
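The record construction can be sketched as follows, under several assumptions that the patent does not specify:

- JSON is used as the serialization format.
- SHA-256 over a canonical JSON form serves as the checksum.
- The anomaly "ends" at the first point of a run of `hold` consecutive sub-threshold scores. The run length `hold` is a hypothetical parameter.
- The device number `"CNC-01"` is illustrative.

```python
import hashlib
import json


def find_anomaly_interval(times_us, z, threshold, hold=3):
    """Return (start_us, end_us), or None if |z| never exceeds the threshold.

    start: first point with |z| > threshold.
    end: first point of the first run of `hold` consecutive points with
         |z| < threshold after the start (assumption); falls back to the last point.
    """
    start = next((i for i, v in enumerate(z) if abs(v) > threshold), None)
    if start is None:
        return None
    run = 0
    for i in range(start + 1, len(z)):
        run = run + 1 if abs(z[i]) < threshold else 0
        if run == hold:
            return times_us[start], times_us[i - hold + 1]
    return times_us[start], times_us[-1]


def _digest(record):
    # Canonical form (sorted keys, no spaces) so the checksum is reproducible.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_alarm_record(device_id, start_us, end_us, determined_at_us):
    record = {
        "device_id": device_id,
        "start_us": start_us,
        "duration_us": end_us - start_us,
        "determined_at_us": determined_at_us,   # when the determination completed
    }
    return json.dumps({"record": record, "checksum": _digest(record)})


def verify_alarm_record(serialized):
    envelope = json.loads(serialized)
    return _digest(envelope["record"]) == envelope["checksum"]


times = [t * 100 for t in range(8)]                       # µs
z_scores = [0.1, 0.2, 5.0, 4.0, 0.3, 0.1, 0.2, 0.0]
interval = find_anomaly_interval(times, z_scores, 3.0)    # (200, 400)
msg = build_alarm_record("CNC-01", *interval, determined_at_us=1000)
```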
[0014] As a further aspect of the present invention, pushing the abnormal alarm record to the workshop monitoring terminal and highlighting the location of the abnormal device on that terminal includes the following steps:

- The abnormal alarm record is sent to the workshop monitoring terminal through a message queue over the workshop's internal LAN.
- After receiving the record, the terminal parses out the device number, anomaly start time, and anomaly duration.
- Using the device number, the terminal queries its local database for the device's pixel coordinates on the workshop layout map.
- The terminal loads the pre-stored workshop layout map onto the display interface.
- A highlighted, flashing marker box centered on those pixel coordinates is drawn on the layout map.
- The anomaly start time and duration are displayed as text labels next to the marker box.

The terminal also appends each received abnormal alarm record to a local log file that is split into separate files by date.
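The geometric and logging parts of the terminal's handling can be sketched as follows. This sketch assumes a square marker box clipped to the map bounds and one log file per UTC date. The message queue and the actual drawing are left out. All function names and the `alarms-YYYY-MM-DD.log` naming scheme are hypothetical.

```python
import os
from datetime import datetime, timezone


def marker_box(center_px, half_size, map_width, map_height):
    """Axis-aligned marker box (x0, y0, x1, y1) around a device, clipped to the map."""
    cx, cy = center_px
    x0 = max(0, cx - half_size)
    y0 = max(0, cy - half_size)
    x1 = min(map_width - 1, cx + half_size)
    y1 = min(map_height - 1, cy + half_size)
    return x0, y0, x1, y1


def daily_log_path(log_dir, t_us):
    """One log file per UTC date: <log_dir>/alarms-YYYY-MM-DD.log."""
    day = datetime.fromtimestamp(t_us / 1_000_000, tz=timezone.utc).strftime("%Y-%m-%d")
    return os.path.join(log_dir, f"alarms-{day}.log")


def append_alarm_log(log_dir, t_us, serialized_record):
    """Append one serialized alarm record as a line in that day's log file."""
    os.makedirs(log_dir, exist_ok=True)
    with open(daily_log_path(log_dir, t_us), "a", encoding="utf-8") as f:
        f.write(serialized_record + "\n")
```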
[0015] As a further aspect of the present invention, after the abnormal alarm record is pushed to the workshop monitoring terminal, the following steps are also performed:

- The terminal counts the abnormal alarm records it receives for each device number.
- When the number of records for the same device number within a sliding time window exceeds a preset frequency threshold, the terminal generates a device degradation trend warning.
- The warning is packaged into a degradation report together with the device number and the start and end times of the sliding time window.
- The terminal sends the degradation report to the preset email address of the equipment maintenance personnel.
- The terminal switches the corresponding device's marker box on the display interface from flashing highlight to steady red.
- During a preset silent period, the terminal suppresses pushing and displaying further abnormal alarm records for that device number, to avoid flooding the display with duplicate alarms.
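The sliding-window counting and silent period can be sketched with one deque of alarm times per device. The sketch makes two assumptions. Alarms that arrive during the silent period are neither counted nor displayed. The window is cleared once a degradation warning fires. The class name and parameter names are hypothetical; email and display changes are outside its scope.

```python
from collections import defaultdict, deque


class DegradationMonitor:
    """Per-device sliding-window alarm counter with a post-warning silent period."""

    def __init__(self, window_us, frequency_threshold, silent_us):
        self.window_us = window_us
        self.frequency_threshold = frequency_threshold
        self.silent_us = silent_us
        self.history = defaultdict(deque)   # device_id -> alarm times inside the window
        self.silent_until = {}              # device_id -> end of its silent period

    def on_alarm(self, device_id, t_us):
        """Return ("suppressed", None), ("alarm", None) or ("degradation", report)."""
        # Devices never warned about default to t_us, so the check is False for them.
        if t_us < self.silent_until.get(device_id, t_us):
            return "suppressed", None
        window = self.history[device_id]
        window.append(t_us)
        while window[0] <= t_us - self.window_us:   # drop alarms older than the window
            window.popleft()
        if len(window) > self.frequency_threshold:
            report = {
                "device_id": device_id,
                "window_start_us": t_us - self.window_us,
                "window_end_us": t_us,
            }
            self.silent_until[device_id] = t_us + self.silent_us
            window.clear()
            return "degradation", report
        return "alarm", None


monitor = DegradationMonitor(window_us=1000, frequency_threshold=3, silent_us=5000)
first = [monitor.on_alarm("CNC-02", t)[0] for t in (0, 100, 200)]
kind, report = monitor.on_alarm("CNC-02", 300)
```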
[0016] Compared with the prior art, the present invention has the following advantages and positive effects. Multiple types of sensors synchronously collect operating data of the target devices in the smart workshop to form a multimodal synchronous data cube. This gives the multi-source sensor data a unified time reference and data structure, and it stores heterogeneous data such as vibration, acoustic emission, and thermal imaging in a single organized framework. As a result, equipment operating status is represented along more dimensions, the information fragmentation caused by timing misalignment between sensors is resolved, and the physical parameters of running equipment become correlated in space and time.
[0017] The joint feature vector is input into a pre-trained graph convolutional network to extract hidden state representation vectors for the nodes. These vectors are then fed into a temporal prediction network to infer future operating state trends. This layered approach mines both the topological relationships between workshop equipment nodes and the temporal evolution of their operating states, overcoming the limited feature-mining scope of a single network model. A residual sequence is constructed from the predicted and measured operating state sequences, and anomalies are identified with a statistical hypothesis test rather than by comparison against a fixed threshold alone. Anomaly information is consolidated, encapsulated, and pushed to the monitoring terminal with the device location highlighted. This enriches the alarm content and shows at a glance where abnormal equipment is located in the workshop.
Description of the Drawings
[0018] Figure 1 is a timing diagram of the intelligent workshop equipment operation status monitoring system based on multimodal perception according to this invention. Figure 2 is a flowchart of spatiotemporal alignment and data cube construction. Figure 3 is a flowchart of inputting the joint feature vector into the pre-trained graph convolutional network.
Detailed Description of the Embodiments
[0019] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.
[0020] In the description of this invention, it should be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer," etc., indicating orientation or positional relationships, are based on the orientation or positional relationships shown in the accompanying drawings and are only for the convenience of describing the invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of the invention. Furthermore, in the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.
[0021] Referring to Figure 1, this invention provides an intelligent workshop equipment operation status monitoring system based on multimodal perception. The system includes a data acquisition module, a feature extraction module, a status prediction module, an anomaly detection module, and an alarm output module.

- The data acquisition module synchronously collects operating data of the target device through multiple sensors deployed in the smart workshop and constructs a multimodal synchronous data cube.
- The feature extraction module extracts vibration frequency-domain features, acoustic emission energy features, and thermal imaging temperature gradient features from the cube and concatenates them into a joint feature vector.
- The status prediction module inputs the joint feature vector into a pre-trained graph convolutional network, which outputs a hidden state representation vector for each node. These vectors are then input into a time-series prediction network, which outputs the predicted operating status sequence of the target device within a future time window.
- The anomaly detection module calculates the residual sequence between the predicted and measured operating status sequences and performs a statistical hypothesis test on it. It determines that the target device has an operational anomaly when the test statistic exceeds a preset threshold.
- The alarm output module encapsulates the device number, time of occurrence, and duration of the anomaly into an abnormal alarm record. It pushes the record to the workshop monitoring terminal and highlights the location of the abnormal device on that terminal.
[0022] In one embodiment of the invention, a vibration sensor is fixedly installed on the casing of each target device in the smart workshop, with its sensitive axis aligned with the device's main vibration direction. An acoustic emission sensor is fixedly installed on the casing of the target device at a predetermined distance from the vibration sensor, with its coupling surface in close contact with the casing surface. An infrared thermal imager is mounted directly in front of the target device, with its optical axis pointing at the device's key heat-generating area. A synchronous clock generator is activated to send sampling trigger pulses simultaneously to the vibration sensor, the acoustic emission sensor, and the infrared thermal imager. Upon receiving a sampling trigger pulse, the three sensors acquire data as follows. The vibration sensor acquires vibration acceleration values at a fixed sampling frequency and generates the vibration time-series signal. The acoustic emission sensor acquires acoustic emission amplitudes at a fixed sampling frequency and generates the acoustic emission time-series signal. The infrared thermal imager acquires thermal images at a fixed frame rate and generates the thermal imaging sequence.
[0023] Referring to Figure 2, each sampling point of the vibration time-series signal is labeled with an absolute timestamp derived from the output of the synchronous clock generator. Each sampling point of the acoustic emission time-series signal, and each frame of the thermal imaging sequence, is labeled with an absolute timestamp from the same clock. Using the sampling time points of the vibration time-series signal as the reference, cubic spline interpolation is performed on the acoustic emission time-series signal and the thermal imaging sequence, so that the data of all three modalities share the same sequence of time points. A global spatial coordinate system is established for the workshop. The installation coordinates of the vibration sensor and acoustic emission sensor on each target device are recorded, together with the optical center coordinates and optical axis direction vector of the infrared thermal imager in front of each target device. The image coordinates of each pixel in the thermal imaging sequence are converted into spatial coordinates in the workshop global coordinate system through perspective projection. Finally, the vibration, acoustic emission, and thermal imaging data that share the same absolute timestamp and belong to the same target device number are organized along the time, spatial, and modal dimensions to form the multimodal synchronous data cube.
[0024] In one exemplary implementation scenario, four identical CNC milling machines are deployed in a smart workshop as target devices, and the operating status of each machine is monitored independently. On each machine, a piezoelectric vibration sensor is mounted on the column sidewall with a magnetic base. Its sensitive axis is aligned with the vertical vibration direction of the spindle box, and its housing sits flush against a flat metal surface near the machining area. An acoustic emission sensor is fixed 15 cm from the vibration sensor, on the same housing plane, with an epoxy resin couplant. Its ceramic coupling surface is bonded tightly to the machine's cast-iron housing, free of air bubbles, so that high-frequency stress waves are transmitted. An uncooled infrared thermal imager is installed 2.0 m directly in front of each machine. Its optical axis points at the critical heat-generating area formed by the spindle's front bearing end cover and the motor's cooling fins, and its pitch and azimuth angles are adjusted to center this area in the field of view. The synchronous clock generator consists of a GPS timing module and a temperature-controlled crystal oscillator. Over coaxial cables of matched length, it sends sampling trigger pulses simultaneously to all vibration sensors, acoustic emission sensors, and infrared thermal imagers, with a pulse repetition period of 100 microseconds. On each trigger pulse, the vibration sensor acquires one vibration acceleration value, giving a fixed sampling frequency of 10 kHz.
The samples are quantized by an internal 24-bit analog-to-digital converter to form the vibration time-series signal, and each sampling point is tagged with its absolute timestamp. Driven by the trigger pulses, the acoustic emission sensor acquires acoustic emission amplitudes at a fixed sampling frequency of 200 kHz, and the acoustic emission time-series signal is output after preamplifier conditioning. Also driven by the trigger pulses, the infrared thermal imager captures thermal images at a fixed frame rate of 30 frames per second with a resolution of 640×512 pixels. Its firmware records the absolute timestamp of each frame's exposure, and the frames are accumulated continuously into the thermal imaging sequence.
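The figures in this embodiment imply some timing relationships that explain why interpolation is needed. The arithmetic below works them out using only the stated parameters, plus one assumption: 24-bit vibration samples are packed into 3 bytes. Raw volumes for acoustic emission and thermal data are given only as sample and pixel counts, because their bit depths are not stated.

```python
# Timing and data-rate arithmetic for the four-machine example in [0024].
TRIGGER_PERIOD_US = 100        # sampling trigger pulse period
VIB_RATE_HZ = 10_000           # vibration sampling frequency
VIB_BITS = 24                  # vibration ADC resolution (assumed packed in 3 bytes)
AE_RATE_HZ = 200_000           # acoustic emission sampling frequency
THERMAL_FPS = 30               # thermal imager frame rate
THERMAL_W, THERMAL_H = 640, 512
MACHINES = 4

trigger_rate_hz = 1_000_000 // TRIGGER_PERIOD_US        # trigger pulses per second
ae_samples_per_trigger = AE_RATE_HZ // trigger_rate_hz   # AE samples per trigger period
frame_interval_us = 1_000_000 / THERMAL_FPS              # time between thermal frames
vib_samples_per_frame = VIB_RATE_HZ / THERMAL_FPS        # vibration samples per frame
vib_bytes_per_s = VIB_RATE_HZ * VIB_BITS // 8            # raw vibration bytes per machine
thermal_px_per_s = THERMAL_W * THERMAL_H * THERMAL_FPS   # pixels per second per imager

print(trigger_rate_hz, ae_samples_per_trigger, round(frame_interval_us, 2),
      round(vib_samples_per_frame, 2), vib_bytes_per_s * MACHINES,
      thermal_px_per_s * MACHINES)
# prints: 10000 20 33333.33 333.33 120000 39321600
```

The 100 µs trigger period matches the 10 kHz vibration rate exactly, so the vibration stream is a natural reference axis. The acoustic emission stream has 20 samples per reference interval. A thermal frame arrives only about every 333.33 vibration samples, a non-integer spacing, so frame times do not fall on the reference grid. This is why the thermal sequence must be interpolated in time.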
[0025] During time alignment, the synchronous clock generator distributes its internally maintained absolute timestamps to each sensor as 64-bit integers. Each timestamp is the number of microseconds elapsed since 00:00:00 UTC on January 1, 1970. Every sampling point of the vibration and acoustic emission time-series signals, and every frame of the thermal imaging sequence, is assigned an absolute timestamp in this same format, so all three modalities share one time reference. The original sampling times of the vibration time-series signal serve as the alignment reference axis. Cubic spline interpolation is applied segment by segment to the acoustic emission time-series signal. Within each vibration sampling interval, that is, between each pair of adjacent reference time points, a cubic spline of the acoustic emission signal is constructed under natural boundary conditions. This yields an acoustic emission amplitude sequence strictly aligned with the vibration sampling times. The thermal imaging sequence is likewise interpolated with cubic splines along the reference time axis. At each reference time point, two original thermal images before and after that moment are selected, and an interpolated thermal image is generated by cubic spline interpolation in the time dimension. The interpolated thermal image sequence therefore corresponds exactly in time to the vibration time-series signal.
In the spatial coordinate system registration stage, a global workshop coordinate system is established with the southeast corner of the workshop floor as the origin, east as the X-axis, north as the Y-axis, and vertically upward as the Z-axis. A laser tracker is used to measure the three-dimensional coordinates of the installation positions of the vibration sensor and the acoustic emission sensor on each CNC milling machine, the three-dimensional coordinates of the optical center of the infrared thermal imager, and the imager's optical-axis pointing vector, expressed as a unit vector. Perspective projection is then used to map the pixel coordinates of each interpolated thermal image into the workshop global coordinate system. Specifically, using the imager's intrinsic parameter matrix, the rotation matrix determined by the optical-axis pointing vector and the sensor's lateral axis, and the translation vector given by the optical-center position, each pixel coordinate is mapped to a ray in space; combining this ray with the measured depth from the infrared thermal imager to the equipment surface gives the three-dimensional coordinates of each pixel in the workshop global coordinate system. After time alignment and spatial registration are complete, the vibration time-series signal, the acoustic emission time-series signal, and the spatially registered thermal imaging sequence belonging to the same target equipment are aligned along the time dimension by matching absolute timestamps, partitioned along the spatial dimension by spatial coordinates, and stored along the modal dimension as vibration acceleration, acoustic emission amplitude, and thermal imaging temperature values.
These data are assembled into a multimodal synchronous data cube, each element of which holds the observation for one specific time point, one specific spatial location, and one specific sensing modality.
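The pixel-to-workshop mapping in the registration stage can be illustrated with a standard pinhole back-projection. This is a hedged sketch, not the patented procedure: it assumes a zero-skew intrinsic matrix (fx, fy, cx, cy), builds the camera axes from the optical-axis vector and a lateral-axis vector, and assumes the measured depth is taken along the optical axis (camera-frame Z); if the depth is instead a range along the pixel ray, the scaling would differ. All function names are hypothetical.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return [x / n for x in a]


def camera_axes(optical_axis, lateral_axis):
    """World-frame camera axes: z along the optical axis, x along the lateral axis."""
    z = _unit(optical_axis)
    x = _unit(_sub(lateral_axis, [_dot(lateral_axis, z) * c for c in z]))  # x orthogonal to z
    y = _cross(z, x)  # right-handed camera frame: x right, y down, z forward
    return x, y, z


def pixel_to_world(u, v, fx, fy, cx, cy, optical_center, optical_axis, lateral_axis, depth):
    """Back-project pixel (u, v) into workshop coordinates.

    The axes are the rows of the world-to-camera rotation R, so the world point
    is C + R^T (depth * [xn, yn, 1]), with depth measured along camera Z (assumed).
    """
    x_ax, y_ax, z_ax = camera_axes(optical_axis, lateral_axis)
    xn, yn = (u - cx) / fx, (v - cy) / fy  # normalized ray K^-1 [u, v, 1]^T
    return [optical_center[k] + depth * (xn * x_ax[k] + yn * y_ax[k] + z_ax[k]) for k in range(3)]
```

For example, an imager 2 m above the floor looking straight down maps its principal point at 2 m depth to the floor point directly beneath it.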
[0026] In one embodiment of the present invention, vibration data slices within the current time window are extracted along the time dimension from the multimodal synchronous data cube, and a fast Fourier transform is applied to them to obtain the vibration frequency-domain amplitude spectrum. The rotational frequency amplitude, harmonic amplitude, and high-frequency noise floor amplitude are extracted from this spectrum to form the vibration frequency-domain feature vector. Acoustic emission data slices within the current time window are likewise extracted along the time dimension; the acoustic emission signal envelope is computed at each time point to obtain an acoustic emission energy envelope sequence, from which the peak energy value, average energy value, and energy rise rate are extracted to form the acoustic emission energy feature vector. A thermal imaging temperature matrix of the target device surface within the current time window is extracted along the spatial dimension; horizontal and vertical temperature gradients are computed on it to obtain a temperature gradient field, from which the maximum temperature gradient value and the temperature gradient direction angle are extracted to form the thermal imaging temperature gradient feature vector. The vibration frequency-domain feature vector, acoustic emission energy feature vector, and thermal imaging temperature gradient feature vector are concatenated in that order to generate the joint feature vector.
[0027] In one exemplary implementation scenario, a multimodal synchronous data cube is constructed for a VM850 vertical machining center in a smart workshop, containing vibration, acoustic emission, and thermal imaging temperature data over a continuous 600-second span. The temporal resolution of the cube is 0.1 seconds, and its modal dimension comprises a vibration acceleration channel, an acoustic emission amplitude channel, and a thermal imaging temperature spatial matrix. Vibration data slices within the current time window are extracted along the time dimension; the current time window ends at the current time and begins 3.0 seconds earlier, and each slice contains the vibration acceleration samples at all time points in that 3.0-second window. A fast Fourier transform (FFT) with a Hanning window and 4096 transform points is applied to each slice, yielding a vibration frequency-domain amplitude spectrum with a frequency resolution of 0.244 Hz. The spindle rotation frequency is calculated from the spindle speed set for the current process, and the spectral amplitude at that frequency is taken as the rotational frequency amplitude. The amplitudes at the 2nd, 3rd, and 4th harmonics of the rotation frequency are then looked up in the spectrum, and the largest of them is taken as the harmonic amplitude. Over the high-frequency range of 5 kHz to 10 kHz, the arithmetic mean of the amplitude spectrum is computed as the high-frequency noise floor amplitude.
The rotational frequency amplitude, harmonic amplitude, and high-frequency noise floor amplitude are arranged in sequence to form the vibration frequency domain feature vector, which is a 3-dimensional column vector.
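The three vibration features can be sketched as below. This is an illustrative sketch, not the applicant's code: it uses a periodic Hann window, a recursive radix-2 FFT written out in pure Python, nearest-bin lookup for each target frequency, and single-sided amplitude scaling corrected for window gain (2 / sum of window). The function names and the `noise_band` parameter are hypothetical. The sampling rate `fs` is passed in explicitly; the noise band must lie below the Nyquist frequency fs/2, otherwise the sketch returns NaN for the noise floor.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; falls back to a direct DFT for odd lengths."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n)) for m in range(n)]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def vibration_features(samples, fs, spindle_rpm, n_fft=4096, noise_band=(5000.0, 10000.0)):
    """Return [rotational-frequency amplitude, harmonic amplitude, noise floor]."""
    seg = list(samples[-n_fft:])  # most recent n_fft samples of the slice
    m = len(seg)
    win = [0.5 - 0.5 * math.cos(2 * math.pi * k / m) for k in range(m)]  # periodic Hann
    spec = fft([s * w for s, w in zip(seg, win)] + [0.0] * (n_fft - m))  # zero-pad if short
    scale = 2.0 / sum(win)  # single-sided amplitude, corrected for window gain
    amp = [abs(c) * scale for c in spec[: n_fft // 2 + 1]]
    df = fs / n_fft  # frequency resolution

    def amp_at(f):
        k = round(f / df)  # nearest spectral bin
        return amp[k] if 0 <= k < len(amp) else 0.0

    f_rot = spindle_rpm / 60.0  # spindle rotation frequency in Hz
    rot_amp = amp_at(f_rot)
    harm_amp = max(amp_at(h * f_rot) for h in (2, 3, 4))
    band = [a for k, a in enumerate(amp) if noise_band[0] <= k * df <= noise_band[1]]
    noise_floor = sum(band) / len(band) if band else float("nan")
    return [rot_amp, harm_amp, noise_floor]
```

With a bin-centred test tone, the periodic Hann window leaks only into adjacent bins, so the recovered amplitudes match the tone amplitudes.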
[0028] In practical implementation, acoustic emission data slices covering the same current time window as the vibration data slices are extracted from the multimodal synchronous data cube along the time dimension; their time range and sequence of sampling time points are strictly aligned with those of the vibration data slices. The acoustic emission signal envelope is computed at each time point of each slice. Specifically, a Hilbert transform is applied to the time series of the slice to obtain the imaginary part of the analytic signal, and at each time point the square root of the sum of the squares of the original (real) value and the imaginary part gives the envelope amplitude. The envelope amplitudes at all time points, in chronological order, form the acoustic emission energy envelope sequence. Three values are extracted from this sequence: the peak energy value, defined as the maximum envelope amplitude; the average energy value, defined as the arithmetic mean of all envelope amplitudes; and the energy rise rate, defined as the mean envelope amplitude over the last 1.5 seconds minus the mean envelope amplitude over the first 1.5 seconds, divided by 1.5 seconds. The energy rise rate is positive when the later mean is greater than the earlier mean, and negative or zero otherwise. The peak energy value, average energy value, and energy rise rate, arranged in that order, form the acoustic emission energy feature vector, a 3-dimensional column vector.
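The envelope and energy features can be sketched with the FFT-based analytic signal: the spectrum is kept at DC (and Nyquist for even lengths), doubled at positive frequencies, zeroed at negative frequencies, and inverse-transformed, and the magnitude of the result is the envelope. This is a hedged, pure-Python sketch with hypothetical function names; for a full 3-second slice at 200 kHz (600,000 samples) an optimized routine such as `scipy.signal.hilbert` would be far faster.

```python
import cmath
import math


def fft(x, inverse=False):
    """Unnormalized radix-2 FFT (direct DFT for odd lengths); inverse flips the sign."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1.0 if inverse else -1.0
    if n % 2:
        return [sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n)) for m in range(n)]
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def hilbert_envelope(x):
    """|x + j*H{x}| at every sample, via the analytic signal."""
    n = len(x)
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        for k in range(1, n // 2):
            h[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            h[k] = 2.0
    analytic = fft([s * g for s, g in zip(fft(x), h)], inverse=True)
    return [abs(a) / n for a in analytic]  # divide by n to normalize the inverse FFT


def ae_energy_features(envelope, fs, segment_s=1.5):
    """Return [peak energy, average energy, energy rise rate per second]."""
    k = max(1, int(round(segment_s * fs)))  # samples in each 1.5 s segment
    first, last = envelope[:k], envelope[-k:]
    peak = max(envelope)
    average = sum(envelope) / len(envelope)
    rise = (sum(last) / len(last) - sum(first) / len(first)) / segment_s
    return [peak, average, rise]
```

A pure cosine has a flat envelope of 1, and an envelope stepping from 1.0 to 2.5 halfway through a 3-second window gives a rise rate of (2.5 - 1.0) / 1.5 = 1.0 per second.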
[0029] In practical implementation, a thermal imaging temperature matrix of the target equipment surface within the current time window is segmented along the spatial dimension from the multimodal synchronous data cube. This thermal imaging temperature matrix corresponds to the spatial temperature distribution of the front cover area of the spindle box of the VM850 vertical machining center in the global spatial coordinate system of the workshop at the end of the current time window. Each element in the matrix represents the temperature value measured by the infrared thermal imager and spatially registered at that spatial coordinate location, in degrees Celsius. The horizontal and vertical temperature gradients are calculated for the thermal imaging temperature matrix. The horizontal direction of the thermal imaging temperature matrix is taken as the X-axis direction of the global spatial coordinate system of the workshop, and the vertical direction is taken as the Z-axis direction. The horizontal temperature gradient is obtained by taking the central difference of each row of temperature values along the X-axis direction, with one-sided differencing used at the matrix boundary points. The vertical temperature gradient is obtained by taking the central difference of each column of temperature values along the Z-axis direction, with one-sided differencing used at the matrix boundary points. The horizontal and vertical temperature gradients at the same spatial location are combined into a single gradient vector, forming a temperature gradient field. Each spatial location in the temperature gradient field corresponds to a gradient vector. The maximum temperature gradient value is extracted by searching for the maximum magnitude of the gradient vector in the temperature gradient field. 
At the same time, the angle between the gradient vector at the location of the maximum temperature gradient value and the horizontal direction is extracted as the temperature gradient direction angle, which ranges from 0 to π radians (0° to 180°). The maximum temperature gradient value and the temperature gradient direction angle, arranged in that order, form the thermal imaging temperature gradient feature vector, a 2-dimensional column vector.
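The gradient-field step can be sketched as follows. The sketch assumes, as hypothetical parameters, a uniform pixel spacing `dx` along X and `dz` along Z in metres (so gradients come out in °C/m) and a matrix whose row index increases with Z; if image rows run top-down, the sign of the vertical gradient flips. The direction angle is computed as the angle between the gradient vector and the +X direction, which by construction lies in [0, π].

```python
import math


def _diff(values, step):
    """Central differences inside, one-sided differences at both ends."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    out = [0.0] * n
    out[0] = (values[1] - values[0]) / step
    out[-1] = (values[-1] - values[-2]) / step
    for i in range(1, n - 1):
        out[i] = (values[i + 1] - values[i - 1]) / (2 * step)
    return out


def thermal_gradient_features(temp, dx, dz):
    """Return [max gradient magnitude, direction angle in radians, 0..pi]."""
    rows, cols = len(temp), len(temp[0])
    gx = [_diff(row, dx) for row in temp]  # along X, one row at a time
    gz = [_diff([temp[r][c] for r in range(rows)], dz) for c in range(cols)]  # along Z, per column
    best, best_rc = -1.0, (0, 0)
    for r in range(rows):
        for c in range(cols):
            mag = math.hypot(gx[r][c], gz[c][r])
            if mag > best:
                best, best_rc = mag, (r, c)
    r, c = best_rc
    if best <= 0.0:
        return [0.0, 0.0]  # flat field: direction undefined, report 0
    cos_angle = max(-1.0, min(1.0, gx[r][c] / best))  # guard against rounding outside [-1, 1]
    return [best, math.acos(cos_angle)]
```

For a planar field with constant gradient (3, 4) °C/m, the sketch returns a magnitude of 5 and an angle of acos(0.6), about 0.927 rad.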
[0030] In practice, the vibration frequency-domain feature vector, acoustic emission energy feature vector, and thermal imaging temperature gradient feature vector are concatenated in that order: the three scalars of the vibration frequency-domain feature vector first, then the three scalars of the acoustic emission energy feature vector, and finally the two scalars of the thermal imaging temperature gradient feature vector. This generates an 8-dimensional joint feature vector whose first through eighth dimensions are the rotational frequency amplitude, harmonic amplitude, high-frequency noise floor amplitude, peak energy value, average energy value, energy rise rate, maximum temperature gradient value, and temperature gradient direction angle, respectively:

$$\mathbf{F} = \left[A_r,\ A_h,\ A_n,\ E_p,\ E_m,\ E_r,\ G_{\max},\ \theta_G\right]^{\mathsf{T}}$$

[0031] where $A_r$ is the rotational frequency amplitude; $A_h$ is the harmonic amplitude; $A_n$ is the high-frequency noise floor amplitude; $E_p$ is the peak energy value; $E_m$ is the average energy value; $E_r$ is the energy rise rate; $G_{\max}$ is the maximum temperature gradient value; $\theta_G$ is the temperature gradient direction angle; and the superscript $\mathsf{T}$ denotes the vector transpose. The joint feature vector is updated once at the end of each current time window and serves as the input feature for the state prediction module.
[0032] In one embodiment of the present invention, referring to Figure 3, a list of device numbers and a list of device location coordinates are obtained for all equipment in the smart workshop, and the Euclidean distance between every two devices is calculated from the location coordinates. Each device is defined as a graph node and assigned its device number. For any two devices whose Euclidean distance is less than a preset connection distance threshold, an undirected edge is established between their graph nodes, with the reciprocal of the Euclidean distance as the edge weight; a larger edge weight indicates stronger spatial coupling between the two devices. All graph nodes, undirected edges, and edge weights are organized into a device relationship graph. The joint feature vector is used as the input feature of the graph node corresponding to the target device at the current time step, and the input features of all graph nodes are organized into a graph structure input tensor according to the adjacency relationships of the device relationship graph. This tensor is input into a pre-trained graph convolutional network, each layer of which performs neighborhood aggregation and feature transformation on the features of every graph node.
[0033] The final graph convolutional layer of the graph convolutional network outputs a hidden state representation vector for each graph node. This vector integrates the node's multimodal features and the spatial coupling features of neighboring nodes. The hidden state representation vectors of all graph nodes are stacked into a hidden state matrix in device number order. The hidden state matrix at the current time step is concatenated with the hidden state matrices from a predetermined number of previous time steps along the time dimension to obtain the temporal context tensor. This temporal context tensor is then input into the temporal prediction network. The temporal prediction network uses a gated recurrent unit (GRU) structure to recursively process the temporal context tensor along the time dimension, updating the hidden state of the GRU at each time step. The fully connected layer of the temporal prediction network receives the hidden state of the GRU at the last time step and outputs a sequence of predicted values within the future time window. Each predicted value in the sequence corresponds to a predicted running state at a specific point in the future time window. This sequence of predicted values is then used as the running state prediction sequence.
[0034] In one exemplary implementation scenario, a smart workshop is equipped with six target devices: three horizontal machining centers of the same model and three gantry milling machines of the same model. The device numbers of the six target devices are HC-001, HC-002, HC-003, GM-001, GM-002, and GM-003, respectively. A list of device numbers and a list of device location coordinates for all six target devices in the smart workshop are obtained. The device location coordinates are represented using three-dimensional coordinates in the workshop's global spatial coordinate system. The device location is taken as the three-dimensional coordinates of the geometric center of its base in the workshop's global spatial coordinate system, and the coordinate values are measured and recorded by a laser tracker. Based on the list of device location coordinates, the Euclidean distance between each pair of devices is calculated. The Euclidean distance is calculated based on the square root of the sum of the squares of the differences between the X-axis, Y-axis, and Z-axis components of the two devices' location coordinates in the workshop's global spatial coordinate system. Each target device is defined as a graph node, and a corresponding device number is assigned to each graph node. For any two target devices, if the calculated Euclidean distance is less than a preset connection distance threshold of 8.0 meters, an undirected edge is established between the graph nodes corresponding to these two target devices. The preset connection distance threshold is used as a reference distance, and the ratio of the reference distance to the Euclidean distance is used as the edge weight of the undirected edge. The formula for calculating the edge weight is as follows:
$$w_{ij} = \frac{d_0}{d_{ij}}, \qquad d_{ij} < d_0$$

[0035] where $w_{ij}$ is the edge weight of the undirected edge between the target device numbered $i$ and the target device numbered $j$; $d_0$ is the preset connection distance threshold, 8.0 meters; and $d_{ij}$ is the Euclidean distance between the target devices numbered $i$ and $j$. A larger edge weight indicates stronger spatial coupling between the two target devices. In this example scenario, the Euclidean distances and corresponding edge weights between the six target devices are calculated and organized as shown in Table 1. Distances in the table are in meters.
[0036] Table 1: Summary of Euclidean Distances and Edge Weights Between Devices
[0037] "—" indicates that the Euclidean distance between corresponding device number pairs is greater than or equal to the preset connection distance threshold of 8.0 meters, and no undirected edges are established between graph nodes. All graph nodes and all undirected edges and their weights are organized into a device relationship graph according to the adjacency list structure. The device relationship graph uses the device number as the node index, and the edge relationships are expressed by the list of adjacent device numbers and the list of corresponding edge weights.
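Graph construction with the threshold-scaled weight w = d0 / d can be sketched as follows. The function names and the example coordinates in the check below are hypothetical (Table 1's measured values are not reproduced here); coordinates are assumed to be (x, y, z) tuples in metres in the workshop global coordinate system. The adjacency list mirrors the structure described above, and a helper turns it into the dense adjacency matrix used as network input.

```python
import math


def build_device_graph(positions, threshold=8.0):
    """positions: dict device_number -> (x, y, z) in metres; returns an adjacency list."""
    ids = list(positions)
    adj = {i: [] for i in ids}
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            i, j = ids[a], ids[b]
            d = math.dist(positions[i], positions[j])  # Euclidean distance
            if 0.0 < d < threshold:  # skip coincident points to avoid division by zero
                w = threshold / d  # edge weight d0 / d_ij
                adj[i].append((j, w))
                adj[j].append((i, w))
    return adj


def adjacency_matrix(adj, order):
    """Dense matrix in the given device-number order; 0.0 where there is no edge."""
    idx = {dev: k for k, dev in enumerate(order)}
    mat = [[0.0] * len(order) for _ in order]
    for i, nbrs in adj.items():
        for j, w in nbrs:
            mat[idx[i]][idx[j]] = w
    return mat
```

Devices 4 m apart get a weight of 8.0 / 4.0 = 2.0, while a device 16 m or more from every other device stays isolated.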
[0038] The joint feature vector serves as the input feature of the graph node corresponding to the target device at the current time step; it is an 8-dimensional column vector, and the current time step is defined as the end time of the sliding time window. The input features of all six graph nodes are organized into a graph structure input tensor according to the adjacency relationships of the device relationship graph. This tensor consists of a 6×8 node feature matrix and a 6×6 adjacency matrix, in which the element in row i, column j holds the edge weight when nodes i and j are connected and 0 otherwise. The graph structure input tensor is fed into a pre-trained graph convolutional network with two graph convolution layers. The first layer maps the 8-dimensional input features to 64-dimensional hidden features: for each graph node, the feature vectors of all neighboring nodes are multiplied by the corresponding edge weights and summed, the sum is added to the linearly transformed features of the node itself, and the result is output through a ReLU activation function. The second layer maps the 64-dimensional hidden features to 128-dimensional hidden features using the same aggregation method and the same ReLU activation. Each layer thus performs neighborhood aggregation and feature transformation on the features of every graph node. In pre-training, a device relationship graph is constructed from historical multimodal data, and the parameters of the linear transformation matrices in the graph convolutional network are optimized through supervised learning.
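One reading of the layer described above is H' = ReLU(H W_self + (A H) W_nbr), where A is the weighted adjacency matrix. The text does not say whether the neighbour sum shares the node's own transformation; because the 8-dimensional neighbour sum must be added to a 64-dimensional transformed feature, this sketch assumes a separate matrix W_nbr for it and omits bias terms. Names are hypothetical, and the weights would come from the supervised pre-training mentioned above.

```python
def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, h, w_self, w_nbr):
    """ReLU(H @ W_self + (A @ H) @ W_nbr): weighted neighbour sum plus own features."""
    agg = matmul(adj, h)  # sum_j a_ij * h_j for every node i
    self_part = matmul(h, w_self)
    nbr_part = matmul(agg, w_nbr)
    return [[max(0.0, s + t) for s, t in zip(rs, rt)] for rs, rt in zip(self_part, nbr_part)]


def gcn_forward(adj, x, layers):
    """Apply a stack of (W_self, W_nbr) layers, e.g. 8 -> 64 -> 128 features."""
    h = x
    for w_self, w_nbr in layers:
        h = gcn_layer(adj, h, w_self, w_nbr)
    return h
```

In the two-node check below, each node's neighbour contributes 2 × its feature before transformation, and ReLU clips the negative second output channel to zero.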
[0039] The final (second) layer of the graph convolutional network outputs a hidden state representation vector for each graph node: a 128-dimensional column vector that combines the node's multimodal features with the spatial coupling features of its adjacent nodes. The hidden state representation vectors of all six graph nodes are stacked in device number order into a hidden state matrix of 6 rows and 128 columns, with the rows in the fixed order HC-001, HC-002, HC-003, GM-001, GM-002, GM-003. The hidden state matrix at the current time step is concatenated along the time dimension with the hidden state matrices of a preset number of historical time steps (5), with adjacent time steps 0.6 seconds apart, yielding a temporal context tensor of size 6 × 128 × 6 (nodes × features × time steps). The temporal context tensor is input into the temporal prediction network, which uses a gated recurrent unit (GRU) structure; each GRU cell contains an update gate, a reset gate, and a candidate hidden state computation unit, and the hidden state dimension is 256. Starting from the earliest historical time step, the temporal prediction network processes the temporal context tensor recursively along the time dimension, reading one 6 × 128 hidden state matrix slice per step and updating the GRU hidden state from the previous hidden state and the current input. After all six time steps have been processed, the GRU hidden state at the last time step is retained. The fully connected layer of the temporal prediction network receives this last hidden state; it is a linear transformation layer that maps the 256-dimensional hidden state to the dimension required for the predicted value sequence.
In this example scenario, the predicted operating state of the target device is a scalar representing the overall vibration intensity of the device. Therefore, the output dimension of the fully connected layer is the number of prediction steps within the future time window. Here, the future time window is set to 3.0 seconds, and the number of prediction steps corresponds to 30 prediction time points. The fully connected layer outputs a 30-dimensional predicted value sequence, where each predicted value corresponds to the predicted operating state at a specific time point within the future time window. This 30-dimensional predicted value sequence is used as the target device's operating state prediction sequence. Each scalar value in the operating state prediction sequence represents the predicted overall vibration intensity of the target device at the corresponding future time point, expressed in millimeters per second.
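The recurrent prediction step can be sketched with a standard GRU cell followed by a linear output layer. The text does not specify how the 6 × 128 slice enters the GRU or how one 30-value sequence per device is produced; this sketch assumes one GRU with shared weights run separately over each node's 128-dimensional history (6 steps), with a single bias per gate in the style of PyTorch's `nn.GRU` equations. All parameter names are hypothetical, and the check below uses toy 1-dimensional sizes rather than 128/256/30.

```python
import math


def _sig(x):
    return 1.0 / (1.0 + math.exp(-x))


def _mv(w, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def gru_step(x, h, p):
    """One GRU update. p: W* (hidden x input), U* (hidden x hidden), b* (hidden) for z, r, n."""
    z = [_sig(a + b + c) for a, b, c in zip(_mv(p["Wz"], x), _mv(p["Uz"], h), p["bz"])]  # update gate
    r = [_sig(a + b + c) for a, b, c in zip(_mv(p["Wr"], x), _mv(p["Ur"], h), p["br"])]  # reset gate
    uh = _mv(p["Un"], h)
    n = [math.tanh(a + ri * u + c) for a, ri, u, c in zip(_mv(p["Wn"], x), r, uh, p["bn"])]  # candidate
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def predict_sequence(node_history, p, w_out, b_out):
    """Run the GRU over one node's hidden vectors (earliest first), then the linear layer."""
    h = [0.0] * len(p["bz"])  # initial hidden state
    for x in node_history:
        h = gru_step(x, h, p)
    return [v + b for v, b in zip(_mv(w_out, h), b_out)]  # e.g. 30 future vibration values
```

With all-zero weights the gates sit at 0.5 and the candidate at 0, so one step halves the previous hidden state; this makes the gate arithmetic easy to verify by hand.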
[0040] In one embodiment of the present invention, once the future time window has actually elapsed, sensors deployed on the target device collect the measured operating status values within that window, forming a measured operating status sequence. The predicted and measured operating status sequences are aligned at the same time points, and at each time point the predicted value is subtracted from the measured value to obtain a residual sequence. The sample mean and sample standard deviation of the residual sequence are calculated, and each residual is standardized into a standardized score equal to the residual minus the sample mean, divided by the sample standard deviation, giving a standardized score sequence. A one-sample Kolmogorov-Smirnov test is performed on the standardized score sequence to test whether it follows the standard normal distribution. When the test statistic is greater than a preset distribution test threshold, the standardized score sequence is determined to deviate from the standard normal distribution. In that case, the maximum absolute standardized score is calculated, and when it is greater than a preset abnormal deviation threshold, the target device is determined to have an abnormal operating status.
[0041] The device number of the target device determined to have an abnormal operating status is read from the device number list. The time point at which the absolute standardized score first exceeds the preset abnormal deviation threshold is found in the residual sequence and recorded as the anomaly start time point. The time point from which the absolute standardized score stays continuously below the preset abnormal deviation threshold is then found and recorded as the anomaly end time point. The time difference between the anomaly end time point and the anomaly start time point is taken as the anomaly duration. The device number, anomaly start time point, and anomaly duration are organized into a data structure according to the preset alarm data format, and the data structure is serialized into an abnormal alarm record in string format. A timestamp and a checksum are appended to the abnormal alarm record: the timestamp records the time at which the anomaly determination was completed, and the checksum is used to verify the data integrity of the record.
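The start/end search can be sketched as below. "Continuously falls below" is not quantified in the text, so the sketch introduces a hypothetical `hold` parameter: the anomaly ends at the first point of a run of `hold` consecutive below-threshold scores. If no such run occurs, the end of the future time window (`window_end`, defaulting to the last time label) is used, as in the GM-002 example. Function and parameter names are hypothetical.

```python
def anomaly_interval(times, z, threshold=2.58, window_end=None, hold=1):
    """Return (start_time, end_time, duration), or None if |z| never exceeds threshold."""
    start_idx = next((i for i, v in enumerate(z) if abs(v) > threshold), None)
    if start_idx is None:
        return None
    end_time = window_end if window_end is not None else times[-1]  # fallback: window end
    run = 0  # length of the current below-threshold run
    for i in range(start_idx + 1, len(z)):
        run = run + 1 if abs(z[i]) < threshold else 0
        if run >= hold:
            end_time = times[i - hold + 1]  # first point of the qualifying run
            break
    return times[start_idx], end_time, end_time - times[start_idx]
```

With time labels 0.1 to 3.0 s and scores above 2.58 from label 1.8 onward, the sketch returns a start of 1.8 s, an end equal to the window end of 3.0 s, and a duration of 1.2 s.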
[0042] In one exemplary implementation scenario, for the gantry milling machine with device number GM-002, inference by the graph convolutional network and the temporal prediction network yields a predicted operating status sequence for the 3.0-second future time window from 14:28:30.0 to 14:28:33.0 on March 12, 2025. This sequence contains 30 predicted time points spaced 0.1 seconds apart, and each predicted value is the overall vibration intensity of the equipment, in millimeters per second. Once the future time window has actually elapsed, the vibration sensor mounted on the side wall of the column of the GM-002 gantry milling machine collects the measured operating status values within that window at an actual sampling rate of 100 Hz. The measured operating status value is likewise defined as the overall vibration intensity of the equipment: the root mean square of the raw vibration acceleration values is computed over each 0.1-second window, giving 30 measured operating status values, which are arranged chronologically to form the measured operating status sequence. The predicted and measured operating status sequences are aligned at the same time points, and at each time point the residual is computed as the measured value minus the predicted value, as defined in [0040], giving a residual sequence of 30 values, each tagged with a relative time label. Table 2 lists the predicted values, measured values, and residuals of the operating status of target device GM-002 at selected time points within this future time window.
[0043] Table 2: Predicted Values, Measured Values, and Residuals of the Operating Status of Target Equipment GM-002 at Selected Future Time Points
[0044] The sample mean and sample standard deviation of the residual sequence are calculated. Summing all 30 residual values and dividing by 30 gives a sample mean of +0.041 mm/s. Summing the squared deviations of the residuals from the sample mean, dividing by 29 (n - 1) to obtain the sample variance, and taking the square root gives a sample standard deviation of 0.057 mm/s. Each residual value is then standardized using the sample mean and sample standard deviation to obtain the standardized score sequence, with each standardized score calculated as follows:
$$z_i = \frac{e_i - \bar{e}}{s_e}, \qquad i = 1, 2, \ldots, 30$$

[0045] where $z_i$ is the standardized score at the $i$-th time point (dimensionless); $e_i$ is the $i$-th element of the residual sequence, i.e., the residual value at the $i$-th time point, in millimeters per second; $\bar{e}$ is the sample mean of the residual sequence, in millimeters per second; $s_e$ is the sample standard deviation of the residual sequence, in millimeters per second; and the subscript $i$ takes integer values from 1 to 30.
[0046] In practice, a one-sample Kolmogorov-Smirnov test is performed on the standardized score sequence to test whether it follows the standard normal distribution (mean 0, standard deviation 1). The specific steps are as follows. First, the standardized scores are sorted in ascending order. For each value in the sorted sequence, the empirical cumulative distribution function (ECDF) value is calculated as the number of standardized scores less than or equal to that value divided by the total number of standardized scores (30), and the theoretical cumulative distribution function (CDF) value of the standard normal distribution at that value is calculated. The maximum absolute difference between the ECDF and CDF values is taken as the test statistic of the one-sample Kolmogorov-Smirnov test; here it is 0.351. The preset distribution test threshold is 0.242, the critical value of the one-sample Kolmogorov-Smirnov test for 30 samples at a significance level of 0.05. Because the test statistic (0.351) is greater than this threshold (0.242), the standardized score sequence is determined to deviate from the standard normal distribution. Given this deviation, the maximum absolute standardized score is calculated by taking the absolute value of every standardized score in the sequence and finding the largest; the result is 3.87.
The preset abnormal deviation threshold is 2.58, the two-sided 99% critical value (0.995 quantile) of the standard normal distribution. Because the maximum absolute value (3.87) is greater than this threshold (2.58), the GM-002 gantry milling machine is determined to have an abnormal operating status.
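The residual test chain (residuals, standardization, Kolmogorov-Smirnov statistic, maximum absolute score) can be sketched with the Python standard library. This is a hedged sketch with hypothetical function names. Residuals are taken as measured minus predicted, per [0040]. Note one deliberate difference from the description above: the sketch computes the standard two-sided KS statistic, which also compares the CDF with the ECDF value just below each point, (i - 1)/n, not only with i/n.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(scores):
    """Two-sided one-sample KS statistic against N(0, 1)."""
    cdf = NormalDist().cdf
    xs = sorted(scores)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def detect_anomaly(predicted, measured, ks_threshold=0.242, dev_threshold=2.58):
    """Return (is_abnormal, ks_statistic, max_abs_score, standardized_scores)."""
    residuals = [m - p for p, m in zip(predicted, measured)]  # measured minus predicted
    mu, sd = mean(residuals), stdev(residuals)  # stdev uses the n - 1 denominator
    if sd == 0:
        return False, float("nan"), 0.0, [0.0] * len(residuals)  # degenerate: no spread to test
    z = [(e - mu) / sd for e in residuals]
    d = ks_statistic_normal(z)
    max_abs = max(abs(v) for v in z)
    return (d > ks_threshold and max_abs > dev_threshold), d, max_abs, z
```

A single large spike in otherwise flat residuals trips both thresholds, whereas residuals shaped like normal quantiles pass.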
[0047] The device number of the target device determined to have an abnormal operating status, the GM-002 gantry milling machine, is read from the device number list; it is the string "GM-002". The residual sequence is then searched for the time point at which the absolute standardized score first exceeds the preset abnormal deviation threshold of 2.58. According to the timeline of Table 2, the absolute standardized score is above the threshold at time label 2.0; tracing back further confirms that it first exceeds 2.58 at time label 1.8, corresponding to the absolute time 14:28:31.8 on March 12, 2025, and this time point is recorded as the anomaly start time point. Next, the sequence is searched, from the anomaly start time point up to time label 3.0, for a time point from which the absolute standardized score stays below 2.58. No such time point occurs in the standardized score sequence, so the end of the future time window, 14:28:33.0 on March 12, 2025, is recorded as the anomaly end time point. The difference between the anomaly end and start time points is 1.2 seconds, which is taken as the anomaly duration. The device ID string "GM-002", the anomaly start time (14:28:31.8 on March 12, 2025), and the anomaly duration (1.2 seconds) are organized into a data structure according to the preset alarm data format, which is defined as a structure with three members: device ID, anomaly start timestamp, and anomaly duration.
The device ID is a fixed-length 32-byte string, the anomaly start timestamp is a 64-bit integer expressed in milliseconds, and the anomaly duration is a double-precision floating-point number in seconds. The data structure is then serialized: the three members are converted to strings in order and joined with the vertical bar "|" as a delimiter, with the duration written to three decimal places, which produces the anomaly alarm record string "GM-002|1710255311800|1.200". A timestamp and a checksum are then appended to the anomaly alarm record. The timestamp records when the anomaly determination was completed; here it is the current system time, March 12, 2025, 14:28:33.5, represented as 1710255313500. The checksum is a 32-bit cyclic redundancy check value computed with the CRC32 algorithm over all bytes of the anomaly alarm record string; it is appended to the end of the record as a hexadecimal string so that the data integrity of the record can be verified. After the timestamp and checksum are appended, the complete anomaly alarm record string is "GM-002|1710255311800|1.200|1710255313500|A3F7C2B1".
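A minimal Python sketch of the serialization and checksum steps, using `zlib.crc32` from the standard library. It assumes UTF-8 encoding, "|" separators as in the example string, and a CRC that covers only the three-field record before the timestamp is appended. Because the exact byte range is an assumption here, the checksum it computes is not asserted to equal the A3F7C2B1 shown in the example. Function names are hypothetical.

```python
import zlib


def serialize_alarm(device_id: str, start_ms: int, duration_s: float) -> str:
    """Serialize the three-member alarm structure as 'device|start_ms|duration'."""
    if len(device_id.encode("utf-8")) > 32:  # the device ID field is 32 bytes
        raise ValueError("device ID longer than 32 bytes")
    return f"{device_id}|{start_ms}|{duration_s:.3f}"


def append_timestamp_and_crc(record: str, judged_ms: int) -> str:
    """Append the determination timestamp and a CRC32 of the record bytes (8 hex digits)."""
    crc = zlib.crc32(record.encode("utf-8"))  # unsigned 32-bit value in Python 3
    return f"{record}|{judged_ms}|{crc:08X}"


record = serialize_alarm("GM-002", 1710255311800, 1.2)
print(record)  # GM-002|1710255311800|1.200
full = append_timestamp_and_crc(record, 1710255313500)
```

Formatting the CRC with `08X` keeps it at a fixed eight uppercase hex digits, matching the style of the example record.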
[0048] In one embodiment of the present invention, anomaly alarm records are sent to the workshop monitoring terminal through a message queue over the workshop's internal local area network. On receiving an anomaly alarm record, the workshop monitoring terminal parses out the device number, the anomaly start time, and the anomaly duration, and uses the device number to query its local database for the pixel coordinates of the device on the workshop layout diagram. The terminal loads a pre-stored workshop layout diagram onto the display interface, draws a highlighted, flashing marker box centered on those pixel coordinates, and displays the anomaly start time and duration as a text label next to the marker box. At the same time, the terminal appends the received anomaly alarm record to a local log file; a separate log file is kept for each date.
[0049] After anomaly alarm records are pushed to the workshop monitoring terminal, the terminal counts the alarm records received for each device number. When the number of alarm records for the same device number within a sliding time window exceeds a preset frequency threshold, the terminal generates a device degradation trend warning and packages it, together with the device number and the start and end times of the sliding time window, into a degradation report, which it sends to the designated email address of the equipment maintenance personnel. On the display interface, the terminal changes the corresponding device's marker box from a flashing highlight to solid red. The terminal also suppresses the pushing and display of alarm records for that device number during a preset silence period, so that duplicate alarms do not overwhelm the system.
[0050] In one exemplary implementation scenario, the workshop's internal LAN uses a Gigabit Ethernet architecture. The workshop monitoring terminal is an industrial control computer running a Linux operating system, deployed in the workshop's central control room with the fixed IP address 192.168.10.15. The complete anomaly alarm record string "GM-002|1710255311800|1.200|1710255313500|A3F7C2B1" is sent to the workshop monitoring terminal over the internal LAN through a message queue implemented with the Advanced Message Queuing Protocol (AMQP). The exchange name is "workshop.alarm.exchange", the routing key is "alarm.device.status", the queue name is "monitor.terminal.queue", message persistence is enabled, and the message body is the binary encoding of the anomaly alarm record string. A persistent message consumer process runs on the workshop monitoring terminal and subscribes, over a long-lived TCP connection, to messages with the routing key "alarm.device.status". When the message queue delivers a message, the consumer immediately reads the message body and parses out three fields: the device number "GM-002", the anomaly start timestamp 1710255311800, and the anomaly duration of 1.200 seconds.
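The AMQP transport is left to whichever client library the terminal uses; the sketch below covers only what the consumer might do with a delivered message body. It assumes a UTF-8 body with five "|"-separated fields and a CRC32 computed over the first three fields, consistent with the record format described in [0047]. The `AlarmRecord` type and `parse_alarm_body` function are hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass
class AlarmRecord:
    device_id: str
    start_ms: int      # anomaly start timestamp (ms)
    duration_s: float  # anomaly duration (s)
    judged_ms: int     # time the anomaly determination completed (ms)


def parse_alarm_body(body: bytes) -> AlarmRecord:
    """Decode a message body, verify its CRC32, and return the parsed fields."""
    fields = body.decode("utf-8").split("|")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    device_id, start_ms, duration_s, judged_ms, crc_hex = fields
    payload = "|".join(fields[:3]).encode("utf-8")  # bytes covered by the checksum
    if zlib.crc32(payload) != int(crc_hex, 16):
        raise ValueError("CRC32 mismatch: record corrupted in transit")
    return AlarmRecord(device_id, int(start_ms), float(duration_s), int(judged_ms))
```

Rejecting a record on a checksum mismatch before any drawing or logging keeps corrupted data off the layout display.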
[0051] Using the device number "GM-002", the workshop monitoring terminal retrieves the device's pixel coordinates on the workshop layout diagram from the local database. The local database is SQLite and maintains a device layout table with four fields: device number, device name, horizontal pixel coordinate, and vertical pixel coordinate. The SQL query "SELECT pixel_x, pixel_y FROM device_layout WHERE device_id='GM-002'" returns a horizontal pixel coordinate of 820 and a vertical pixel coordinate of 375. The terminal then loads the pre-stored workshop layout diagram onto the display interface. The layout diagram is an RGB bitmap image with a resolution of 1920×1080 pixels, stored as a binary file in the local file system at "/var/workshop/layout/factory_plan.png". A highlighted, blinking marker box, 60 pixels wide and 60 pixels high with a 3-pixel-wide bright yellow border, is drawn on the layout diagram centered at pixel coordinates (820, 375). The blinking effect is produced by switching the marker box's alpha (opacity) value between 255 and 40 every 400 milliseconds. A text label placed 10 pixels to the right of the marker box shows the anomaly start time and duration as the formatted string "Start: 2025-03-12 14:28:31.8 | Duration: 1.20s", in a 14-point monospaced font with white text on a semi-transparent black background. The terminal also appends the received anomaly alarm record to the local log file "/var/workshop/logs/alarm_2025-03-12.log". Log files are split by date: a new log file named after the new date is created automatically at midnight every day.
Each write is performed in append mode and is immediately followed by a file system synchronization call that flushes the buffered contents to disk.
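A standard-library Python sketch of the terminal-side lookup and drawing parameters. The table and column names follow the SQL query quoted in [0051]; the `device_name` column name is an assumption. Drawing is left to whatever GUI toolkit the terminal uses, so the functions only compute the marker rectangle, the label anchor (placing it level with the box's top edge is an assumption), the blink alpha, and the daily log path. All function names are hypothetical.

```python
import os
import sqlite3
from datetime import date

BOX_SIZE = 60          # marker box width and height in pixels
LABEL_OFFSET = 10      # gap between the box and the text label, pixels
BLINK_PERIOD_MS = 400  # alpha toggles every 400 ms
ALPHA_ON, ALPHA_OFF = 255, 40


def lookup_pixel(conn: sqlite3.Connection, device_id: str) -> tuple[int, int]:
    row = conn.execute(
        "SELECT pixel_x, pixel_y FROM device_layout WHERE device_id = ?", (device_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no layout entry for {device_id}")
    return row[0], row[1]


def marker_geometry(cx: int, cy: int):
    """Return the box (left, top, right, bottom) centred on (cx, cy) and the label anchor."""
    left, top = cx - BOX_SIZE // 2, cy - BOX_SIZE // 2
    box = (left, top, left + BOX_SIZE, top + BOX_SIZE)
    label_anchor = (box[2] + LABEL_OFFSET, top)
    return box, label_anchor


def blink_alpha(elapsed_ms: int) -> int:
    """Alpha for the blinking box: 255 and 40 alternate every BLINK_PERIOD_MS."""
    return ALPHA_ON if (elapsed_ms // BLINK_PERIOD_MS) % 2 == 0 else ALPHA_OFF


def daily_log_path(day: date, log_dir: str = "/var/workshop/logs") -> str:
    return f"{log_dir}/alarm_{day:%Y-%m-%d}.log"


def append_log_line(path: str, line: str) -> None:
    """Append one record and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()             # move Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write it to the disk


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device_layout (device_id TEXT PRIMARY KEY, "
             "device_name TEXT, pixel_x INTEGER, pixel_y INTEGER)")
conn.execute("INSERT INTO device_layout VALUES ('GM-002', 'gantry milling machine', 820, 375)")
box, label = marker_geometry(*lookup_pixel(conn, "GM-002"))
print(box, label)  # (790, 345, 850, 405) (860, 345)
```

Using a bound parameter (`?`) rather than splicing the device number into the SQL string avoids quoting problems if device numbers ever contain special characters.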
[0052] After anomaly alarm records are pushed to the workshop monitoring terminal, the terminal also performs degradation trend analysis and alarm silencing. The terminal keeps an alarm frequency map in memory whose keys are device number strings and whose values are time-ordered linked lists of alarm timestamps. When an anomaly alarm record is parsed, the terminal looks up the alarm timestamp list for device number "GM-002" in the map and appends the record's anomaly start timestamp, 1710255311800, to the end of the list. At the same time, it traverses the list and deletes every timestamp earlier than the current time minus the sliding time window duration, which is set to 3600 seconds. The number of anomaly alarm records received for device number "GM-002" is the number of elements remaining in the trimmed list. When this number exceeds the preset frequency threshold of 5 within the sliding time window, the terminal generates a device degradation trend warning. The warning text "Device GM-002 has triggered 6 abnormal alarms in the last 3600 seconds, and the degradation trend is significant" is packaged into a degradation report together with the device number "GM-002", the start time of the sliding time window (the current time minus 3600 seconds), and the end time of the sliding time window (the current time). The degradation report is a JSON object with four fields: device_id, warning_message, window_start, and window_end. The terminal emails the degradation report to the preset email address of the equipment maintenance personnel.
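The sliding-window count can be sketched with one `collections.deque` per device in place of the linked list. The sketch assumes alarms arrive in chronological order (so expired timestamps are always at the front) and that all times are millisecond timestamps. The class name is hypothetical, and sending the report by email (for example with Python's `smtplib`) is omitted.

```python
import json
from collections import defaultdict, deque


class AlarmFrequencyTracker:
    """Count alarms per device in a sliding window and build a degradation report."""

    def __init__(self, window_s: int = 3600, threshold: int = 5):
        self.window_ms = window_s * 1000
        self.threshold = threshold
        self.history = defaultdict(deque)  # device number -> alarm timestamps (ms)

    def record(self, device_id: str, start_ms: int, now_ms: int):
        """Add one alarm; return a JSON report if the count exceeds the threshold, else None."""
        stamps = self.history[device_id]
        stamps.append(start_ms)  # newest alarm goes to the tail
        window_start = now_ms - self.window_ms
        while stamps and stamps[0] < window_start:  # drop alarms older than the window
            stamps.popleft()
        count = len(stamps)
        if count <= self.threshold:
            return None
        return json.dumps({
            "device_id": device_id,
            "warning_message": (
                f"Device {device_id} has triggered {count} abnormal alarms in the last "
                f"{self.window_ms // 1000} seconds, and the degradation trend is significant"
            ),
            "window_start": window_start,
            "window_end": now_ms,
        })
```

A deque gives constant-time appends at the tail and removals at the head, which is all the trimming step needs.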
The email is sent with the Simple Mail Transfer Protocol (SMTP) through the workshop's internal mail server, from "alarm@workshop.local" to "maintenance@workshop.local", with the subject "[Degradation Warning] Frequent Abnormalities in Equipment GM-002" and the JSON-formatted degradation report as the body. At the same time, the terminal switches the marker box of device GM-002 on the display interface from flashing highlight mode to solid red mode, by setting the border color to red, fixing the alpha value at 255, and disabling the blink timer. The terminal also adds device number "GM-002" to an alarm silence set. Each entry in the set holds the silenced device number and a silence end timestamp equal to the current system time plus the preset silence duration of 1800 seconds. For the next 1800 seconds, whenever the terminal receives another anomaly alarm record for device "GM-002", the message consumer process, after parsing the record, first checks whether the silence set contains an entry for "GM-002" whose silence end timestamp is later than the current time. If such an entry exists, the record is discarded immediately, without updating the marker box or writing to the log, which prevents a flood of duplicate alarms.
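The silence check can be sketched as a dictionary from device number to silence end timestamp in milliseconds. The class and method names are hypothetical, and removing expired entries lazily when they are next checked is one possible design, not one prescribed by the text.

```python
class AlarmSilencer:
    """Suppress repeat alarms for a device until its silence end timestamp."""

    def __init__(self, silence_s: int = 1800):
        self.silence_ms = silence_s * 1000
        self.silence_until = {}  # device number -> silence end timestamp (ms)

    def silence(self, device_id: str, now_ms: int) -> None:
        self.silence_until[device_id] = now_ms + self.silence_ms

    def should_discard(self, device_id: str, now_ms: int) -> bool:
        """True if the record should be dropped without redrawing or logging."""
        end_ms = self.silence_until.get(device_id)
        if end_ms is None:
            return False
        if now_ms < end_ms:
            return True
        del self.silence_until[device_id]  # silence period over: forget the entry
        return False
```

In the consumer loop, `should_discard` would be called right after parsing and before any drawing or log writes.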
[0053] The above are merely preferred embodiments of the present invention and do not limit the present invention in any other form. Any person skilled in the art may use the technical content disclosed above to make changes or modifications that yield equivalent embodiments, which may also be applied in other fields. However, any simple modification, equivalent change, or variation made to the above embodiments according to the technical essence of the present invention, without departing from the scope of the present invention, still falls within the protection scope of the present invention.
Claims
1. A smart workshop equipment operation status monitoring system based on multimodal perception, characterized in that it comprises: a data acquisition module, which synchronously collects operating data of a target device through multiple types of sensors deployed in the smart workshop and constructs a multimodal synchronous data cube; a feature extraction module, which extracts vibration frequency domain features, acoustic emission energy features, and thermal imaging temperature gradient features from the multimodal synchronous data cube and concatenates the three types of features into a joint feature vector; a state prediction module, which inputs the joint feature vector into a pre-trained graph convolutional network that outputs a hidden state representation vector for each node, and inputs the hidden state representation vectors into a temporal prediction network that outputs a predicted operating state sequence of the target device within a future time window; an anomaly detection module, which calculates the residual sequence between the predicted operating state sequence and the measured operating state sequence, performs a statistical hypothesis test on the residual sequence, and determines that the target device has an abnormal operating state when the test statistic exceeds a preset threshold; and an alarm output module, which encapsulates the device number, the anomaly occurrence time, and the anomaly duration of the target device determined to be abnormal into an abnormal alarm record, pushes the abnormal alarm record to the workshop monitoring terminal, and highlights the location of the abnormal device on the workshop monitoring terminal.
2. The intelligent workshop equipment operation status monitoring system based on multimodal perception according to claim 1, characterized in that synchronously collecting operating data of the target device through multiple types of sensors deployed in the smart workshop and constructing a multimodal synchronous data cube comprises: synchronously collecting the operating data of the target device with vibration sensors, acoustic emission sensors, and infrared thermal imagers deployed in the smart workshop to generate a vibration time-series signal, an acoustic emission time-series signal, and a thermal imaging sequence, respectively; and time-stamping and spatially registering the vibration time-series signal, the acoustic emission time-series signal, and the thermal imaging sequence to construct the multimodal synchronous data cube; wherein synchronously collecting the operating data of the target device and generating the vibration time-series signal, the acoustic emission time-series signal, and the thermal imaging sequence specifically comprises: fixing a vibration sensor on the casing of each target device in the smart workshop, with the sensitive axis of the vibration sensor aligned with the main vibration direction of the target device; fixing an acoustic emission sensor on the outer shell of the target device at a preset distance from the vibration sensor, with the coupling surface of the acoustic emission sensor in close contact with the surface of the outer shell; mounting an infrared thermal imager directly in front of the target device, with its optical axis pointing at the key heat-generating area of the target device;
activating a synchronous clock generator to send sampling trigger pulses simultaneously to the vibration sensor, the acoustic emission sensor, and the infrared thermal imager; and, after receiving a sampling trigger pulse, the vibration sensor acquiring vibration acceleration values at a fixed sampling frequency to generate the vibration time-series signal, the acoustic emission sensor acquiring acoustic emission amplitudes at a fixed sampling frequency to generate the acoustic emission time-series signal, and the infrared thermal imager acquiring thermal images at a fixed frame rate to generate the thermal imaging sequence.
3. The intelligent workshop equipment operation status monitoring system based on multimodal perception according to claim 2, characterized in that time-stamping and spatially registering the vibration time-series signal, the acoustic emission time-series signal, and the thermal imaging sequence to construct the multimodal synchronous data cube specifically comprises: labeling each sampling point of the vibration time-series signal with an absolute timestamp derived from the output of the synchronous clock generator; labeling each sampling point of the acoustic emission time-series signal with an absolute timestamp from the same clock; labeling each frame of the thermal imaging sequence with an absolute timestamp from the same clock; performing cubic spline interpolation on the acoustic emission time-series signal and the thermal imaging sequence at the sampling time points of the vibration time-series signal, so that the data of the three modalities share the same sequence of time points on the time axis; establishing a global spatial coordinate system for the workshop and recording the installation coordinates of the vibration sensor and the acoustic emission sensor on each target device; recording the optical center coordinates and the optical axis direction vector of the infrared thermal imager in front of each target device; converting the image coordinates of each pixel in the thermal imaging sequence into spatial coordinates in the workshop global spatial coordinate system through a perspective projection transformation; and organizing the vibration time-series signal, acoustic emission time-series signal, and thermal imaging sequence data that share the same absolute timestamp and correspond to the same target device number along a time dimension, a spatial dimension, and a modal dimension to form the multimodal synchronous data cube.
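The time alignment step can be illustrated with a natural cubic spline written in plain Python. The claim does not fix the spline's boundary conditions, so natural ends (zero second derivative) are an assumption here; in practice a library routine such as SciPy's `scipy.interpolate.CubicSpline` would normally be used. For the thermal imaging sequence, the same resampling would be applied to each pixel's temperature series. Function names are hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return f(t) interpolating (xs, ys) with a natural cubic spline (xs strictly increasing)."""
    n = len(xs) - 1
    if n < 1:
        raise ValueError("need at least two points")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; m[0] = m[n] = 0 (natural ends)
    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1].
        a = [h[i - 1] for i in range(1, n)]            # sub-diagonal
        b = [2 * (h[i - 1] + h[i]) for i in range(1, n)]  # diagonal
        c = [h[i] for i in range(1, n)]                # super-diagonal
        d = [6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n)]
        for k in range(1, n - 1):  # Thomas algorithm: forward elimination
            w = a[k] / b[k - 1]
            b[k] -= w * c[k - 1]
            d[k] -= w * d[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = d[-1] / b[-1]
        for k in range(n - 3, -1, -1):  # back substitution
            sol[k] = (d[k] - c[k] * sol[k + 1]) / b[k]
        m[1:n] = sol

    def evaluate(t):
        i = min(max(bisect_right(xs, t) - 1, 0), n - 1)  # segment index
        x0, x1, hi = xs[i], xs[i + 1], h[i]
        return (m[i] * (x1 - t) ** 3 / (6 * hi) + m[i + 1] * (t - x0) ** 3 / (6 * hi)
                + (ys[i] - m[i] * hi * hi / 6) * (x1 - t) / hi
                + (ys[i + 1] - m[i + 1] * hi * hi / 6) * (t - x0) / hi)

    return evaluate


def align_to_reference(ref_times, src_times, src_values):
    """Resample one modality onto the vibration sampling instants."""
    f = natural_cubic_spline(src_times, src_values)
    return [f(t) for t in ref_times]
```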
4. The intelligent workshop equipment operation status monitoring system based on multimodal perception according to claim 3, characterized in that extracting vibration frequency domain features, acoustic emission energy features, and thermal imaging temperature gradient features from the multimodal synchronous data cube and concatenating the three types of features into a joint feature vector comprises: slicing, along the time dimension, the vibration data within the current time window from the multimodal synchronous data cube; performing a fast Fourier transform on the vibration data slice to obtain a vibration frequency domain amplitude spectrum; extracting the frequency shift amplitude, harmonic amplitude, and high-frequency noise floor amplitude from the vibration frequency domain amplitude spectrum to form a vibration frequency domain feature vector; slicing, along the time dimension, the acoustic emission data within the current time window from the multimodal synchronous data cube; calculating the acoustic emission signal envelope at each time point of the acoustic emission data slice to obtain an acoustic emission energy envelope sequence; extracting the peak energy value, average energy value, and energy rise rate from the acoustic emission energy envelope sequence to form an acoustic emission energy feature vector; slicing, along the spatial dimension, the thermal imaging temperature matrix of the target device surface within the current time window from the multimodal synchronous data cube; calculating the horizontal and vertical temperature gradients of the thermal imaging temperature matrix to obtain a temperature gradient field; extracting the maximum temperature gradient value and the temperature gradient direction angle from the temperature gradient field to form a thermal imaging temperature gradient feature vector;
and concatenating the vibration frequency domain feature vector, the acoustic emission energy feature vector, and the thermal imaging temperature gradient feature vector in sequence to generate the joint feature vector.
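A plain-Python sketch of the acoustic emission and thermal gradient features and the final concatenation. The claim does not define the energy rise rate or the gradient stencil, so the maximum first difference per sampling interval and central differences over interior pixels are assumptions. The vibration spectrum features are taken as already computed, since their exact definitions are not given here. Function names are hypothetical.

```python
import math
from statistics import fmean


def thermal_gradient_features(temp):
    """Return [max gradient magnitude, its direction angle in degrees] for a 2-D temperature grid."""
    best = (0.0, 0.0)
    for r in range(1, len(temp) - 1):          # interior pixels only
        for c in range(1, len(temp[0]) - 1):
            gx = (temp[r][c + 1] - temp[r][c - 1]) / 2.0  # horizontal central difference
            gy = (temp[r + 1][c] - temp[r - 1][c]) / 2.0  # vertical (rows grow downward)
            mag = math.hypot(gx, gy)
            if mag > best[0]:
                best = (mag, math.degrees(math.atan2(gy, gx)))
    return list(best)


def ae_energy_features(envelope, dt):
    """Return [peak, mean, rise rate]; rise rate = largest increase per second (assumption)."""
    peak = max(envelope)
    mean = fmean(envelope)
    rise = max((b - a) / dt for a, b in zip(envelope, envelope[1:]))
    return [peak, mean, rise]


def joint_feature_vector(vib, ae, thermal):
    """Concatenate vibration, acoustic emission, and thermal features in that order."""
    return list(vib) + list(ae) + list(thermal)
```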
5. The intelligent workshop equipment operation status monitoring system based on multimodal perception according to claim 1, characterized in that inputting the joint feature vector into a pre-trained graph convolutional network comprises: the graph convolutional network constructing a device relationship graph with the devices in the workshop as nodes and with edge weights derived from the physical distances between devices, by: obtaining a device number list and a device location coordinate list for all equipment in the smart workshop; calculating the Euclidean distance between every pair of devices from the device location coordinate list; defining each device as a graph node and binding the corresponding device number to each graph node; for any two devices whose Euclidean distance is less than a preset connection distance threshold, establishing an undirected edge between the corresponding graph nodes, with the reciprocal of the Euclidean distance as the edge weight, so that a larger edge weight indicates stronger spatial coupling between the two devices; and organizing all graph nodes, all undirected edges, and their weights into the device relationship graph; using the joint feature vector as the input feature of the graph node corresponding to the target device at the current time step; organizing the input features of all graph nodes into a graph-structured input tensor according to the adjacency relationships of the device relationship graph; and inputting the graph-structured input tensor into the pre-trained graph convolutional network, each layer of which performs neighborhood aggregation and feature transformation on the features of each graph node.
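The graph construction and one propagation step can be sketched in plain Python. The claim does not specify the aggregation rule, so this sketch assumes the common symmetric normalisation with self-loops, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A holds the reciprocal-distance edge weights and D is the weighted degree of A + I. The device coordinates, features, and weight matrix in the test are hypothetical.

```python
import math


def build_device_graph(coords, max_dist):
    """coords: {device_id: (x, y)}. Returns {device_id: {neighbour_id: 1/distance}}."""
    ids = sorted(coords)
    adj = {d: {} for d in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            dist = math.dist(coords[a], coords[b])
            if 0 < dist < max_dist:  # connect only nearby, non-coincident devices
                adj[a][b] = adj[b][a] = 1.0 / dist
    return adj


def gcn_layer(adj, features, weight):
    """One GCN layer: normalised neighbourhood aggregation, then a linear map and ReLU."""
    deg = {n: 1.0 + sum(nbrs.values()) for n, nbrs in adj.items()}  # self-loop weight 1
    out = {}
    for n, nbrs in adj.items():
        agg = [x / deg[n] for x in features[n]]  # self term: 1 / sqrt(d_n * d_n)
        for m, w in nbrs.items():
            scale = w / math.sqrt(deg[n] * deg[m])
            agg = [a + scale * x for a, x in zip(agg, features[m])]
        out[n] = [max(0.0, sum(agg[k] * weight[k][j] for k in range(len(agg))))
                  for j in range(len(weight[0]))]
    return out
```

Isolated devices (no neighbour within the distance threshold) simply keep a transformed copy of their own features.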
6. The intelligent workshop equipment operation status monitoring system based on multimodal perception according to claim 5, characterized in that outputting, by the pre-trained graph convolutional network, a hidden state representation vector for each node, and inputting the hidden state representation vectors into a temporal prediction network to output the predicted operating state sequence of the target device within the future time window, comprises: outputting, from the last layer of the graph convolutional network, a hidden state representation vector for each graph node, the hidden state representation vector integrating the node's own multimodal features with the spatial coupling features of its adjacent nodes; stacking the hidden state representation vectors of all graph nodes into a hidden state matrix in order of device number; concatenating the hidden state matrix of the current time step with the hidden state matrices of a preset number of historical time steps along the time dimension to obtain a temporal context tensor; inputting the temporal context tensor into the temporal prediction network, which uses a gated recurrent unit structure to process the temporal context tensor recursively along the time dimension, updating the hidden state of the gated recurrent unit at each time step; and receiving, at a fully connected layer of the temporal prediction network, the hidden state of the gated recurrent unit at the last time step and outputting a sequence of predicted values within the future time window, each predicted value corresponding to the predicted operating state at one time point within the future time window, the predicted value sequence being used as the predicted operating state sequence.
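The recursion and output layer can be illustrated with a single-layer GRU in plain Python. This sketch uses the original Cho et al. (2014) gate equations; library implementations such as PyTorch's `nn.GRU` differ in details such as where the reset gate is applied. The parameter dictionary layout, sizes, and values are hypothetical, and each time step's input is assumed to be the flattened hidden state matrix described above.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gate(p, name, x, h, act):
    """Element-wise act(W_name x + U_name h + b_name)."""
    return [act(a + b + c) for a, b, c in
            zip(matvec(p["W" + name], x), matvec(p["U" + name], h), p["b" + name])]


def gru_step(x, h, p):
    z = gate(p, "z", x, h, sigmoid)     # update gate
    r = gate(p, "r", x, h, sigmoid)     # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = gate(p, "n", x, rh, math.tanh)  # candidate state
    return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def predict_window(context, p, fc_w, fc_b):
    """Run the GRU over time-ordered inputs; map the last hidden state to one value per future time point."""
    h = [0.0] * len(p["bz"])
    for x in context:
        h = gru_step(x, h, p)
    return [v + b for v, b in zip(matvec(fc_w, h), fc_b)]
```

The number of rows in `fc_w` sets how many future time points are predicted.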
7. The intelligent workshop equipment operation status monitoring system based on multimodal perception according to claim 6, characterized in that calculating the residual sequence between the predicted operating state sequence and the measured operating state sequence, performing a statistical hypothesis test on the residual sequence, and determining that the target device has an abnormal operating state when the test statistic exceeds a preset threshold comprises: after the future time window has actually elapsed, collecting the measured operating state values within that window through the sensors deployed on the target device to form the measured operating state sequence; aligning the predicted operating state sequence with the measured operating state sequence at the same time points and calculating, at each time point, the difference between the measured operating state value and the predicted operating state value to obtain the residual sequence; calculating the sample mean and sample standard deviation of the residual sequence; standardizing each residual value with the sample mean and sample standard deviation to obtain a standardized score sequence, each standardized score being the residual value minus the sample mean, divided by the sample standard deviation; performing a one-sample Kolmogorov-Smirnov test on the standardized score sequence to test whether it follows the standard normal distribution; when the test statistic of the one-sample Kolmogorov-Smirnov test is greater than a preset distribution test threshold, determining that the standardized score sequence deviates from the standard normal distribution; when the standardized score sequence deviates from the standard normal distribution, calculating the maximum absolute value of the standardized score sequence;
and when the maximum absolute value is greater than a preset abnormal deviation threshold, determining that the target device has an abnormal operating state.
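The statistical steps can be sketched with Python's `statistics` module. The one-sample Kolmogorov-Smirnov statistic D is computed directly against N(0, 1); the distribution test threshold is a preset value in the claim, and the value used in the test below is hypothetical. Because the mean and standard deviation are estimated from the same residuals, the classical KS critical values are conservative in this setting; the Lilliefors test is the standard variant for exactly this case. Function names are hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def standardized_scores(measured, predicted):
    """Residual = measured - predicted, standardized by the sample mean and sample std."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    mu, sd = fmean(residuals), stdev(residuals)  # stdev uses the n-1 denominator
    if sd == 0:
        raise ValueError("residuals are constant; scores are undefined")
    return [(r - mu) / sd for r in residuals]


def ks_statistic(scores):
    """One-sample Kolmogorov-Smirnov D statistic against the standard normal CDF."""
    cdf = NormalDist().cdf
    xs = sorted(scores)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def is_abnormal(measured, predicted, ks_threshold, deviation_threshold=2.58):
    z = standardized_scores(measured, predicted)
    d = ks_statistic(z)
    max_abs = max(abs(v) for v in z)
    # Abnormal only if the scores deviate from N(0, 1) AND the largest |z| is too big.
    return (d > ks_threshold and max_abs > deviation_threshold), d, max_abs
```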
8. The intelligent workshop equipment operation status monitoring system based on multimodal perception according to claim 7, characterized in that encapsulating the device number, the anomaly occurrence time, and the anomaly duration of the target device determined to be abnormal into an abnormal alarm record comprises: reading the device number of the target device from the device number list of the target devices determined to have an abnormal operating state; finding in the residual sequence the time point at which the absolute value of the standardized score first exceeds the preset abnormal deviation threshold, and recording it as the anomaly start time point; finding in the residual sequence the time point from which the absolute value of the standardized score remains continuously below the preset abnormal deviation threshold, and recording it as the anomaly end time point; calculating the time difference between the anomaly end time point and the anomaly start time point and using it as the anomaly duration; organizing the device number, the anomaly start time, and the anomaly duration into a data structure according to a preset alarm data format; serializing the data structure to generate the abnormal alarm record in string format; and appending a timestamp and a checksum to the abnormal alarm record, the timestamp recording the time at which the anomaly determination was completed and the checksum being used to verify the data integrity of the abnormal alarm record.
9. The intelligent workshop equipment operation status monitoring system based on multimodal perception according to claim 8, characterized in that pushing the abnormal alarm record to the workshop monitoring terminal and highlighting the location of the abnormal device on the workshop monitoring terminal comprises: sending the abnormal alarm record to the workshop monitoring terminal through a message queue over the workshop's internal LAN; parsing, by the workshop monitoring terminal after receiving the abnormal alarm record, the device number, the anomaly start time, and the anomaly duration; querying, by the workshop monitoring terminal, its local database for the pixel coordinates of the device on the workshop layout diagram according to the device number; loading, by the workshop monitoring terminal, a pre-stored workshop layout diagram onto the display interface; drawing a highlighted, flashing marker box centered on the pixel coordinates on the workshop layout diagram; displaying the anomaly start time and the anomaly duration as a text label next to the marker box; and appending, by the workshop monitoring terminal, the received abnormal alarm record to a local log file, a separate log file being kept for each date.
10. The intelligent workshop equipment operation status monitoring system based on multimodal perception according to claim 9, characterized in that, after the abnormal alarm record is pushed to the workshop monitoring terminal, the following steps are performed: counting, by the workshop monitoring terminal, the abnormal alarm records received with the same device number; when the number of abnormal alarm records for the same device number within a sliding time window exceeds a preset frequency threshold, generating, by the workshop monitoring terminal, a device degradation trend warning; packaging the device degradation trend warning together with the corresponding device number and the start and end times of the sliding time window into a degradation report; sending, by the workshop monitoring terminal, the degradation report to a preset email address of the equipment maintenance personnel; switching, by the workshop monitoring terminal, the marker box of the corresponding device on the display interface from flashing highlight mode to solid red mode; and suppressing, by the workshop monitoring terminal, the pushing and display of abnormal alarm records with that device number during a preset silence period, so as to prevent duplicate alarms from overwhelming the system.