Intelligent meter control method and system based on deep learning
By collecting multimodal data in real time and using deep learning models and DQN algorithms to generate optimized control strategies, the problem of dynamic adjustment of intelligent instrument control methods in complex scenarios is solved, improving the accuracy and efficiency of control and reducing the failure rate.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- YUNNAN INST OF MEASUREMENT TEST TECH RES
- Filing Date
- 2026-03-16
- Publication Date
- 2026-06-19
AI Technical Summary
Existing intelligent instrument control methods are difficult to adapt to complex and ever-changing application scenarios. The control logic cannot be dynamically adjusted, and the multi-source data processing capability is insufficient, resulting in large overshoot and excessive steady-state error. Furthermore, a multimodal data fusion mechanism is lacking.
Multimodal data is collected in real time through cameras, sensor networks, and IoT devices. The data is then normalized and cleaned of outliers. A deep learning feature extraction model and the DQN algorithm are used to generate an optimized control strategy. The control parameters are then dynamically adjusted by combining a self-attention mechanism with safety constraints.
It achieves comprehensive operational condition perception of smart instruments, improves the accuracy of feature extraction, reduces the equipment failure rate, enhances the timeliness and efficiency of control, and enables dynamic optimization without manual intervention.
Figure CN121834728B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of intelligent control technology, specifically to an intelligent instrument control method and system based on deep learning.
Background Technology
[0002] Intelligent instruments are core data interaction terminals in fields such as industrial automation and energy management, and their control performance directly determines the stability and energy efficiency of system operation. Current intelligent instrument control methods generally rely on traditional PID algorithms or simple machine learning models, which makes them difficult to adapt to complex and ever-changing application scenarios. Control logic is often based on manually preset thresholds and cannot dynamically adjust control parameters when faced with disturbances such as instrument pointer jitter, sudden changes in ambient temperature and humidity, and power grid load fluctuations, often resulting in large overshoot and excessive steady-state error. Existing technologies also have significant shortcomings in processing multi-source data: most solutions collect only a single type of data, such as instrument readings, neglect key information such as images and vibrations, and lack an effective multimodal data fusion mechanism. Therefore, a deep learning-based intelligent instrument control method and system are urgently needed.
Summary of the Invention
[0003] The purpose of this invention is to solve the above problems by designing an intelligent instrument control method and system based on deep learning.
[0004] The first aspect of this invention provides a deep learning-based intelligent instrument control method, which includes the following steps:
[0005] By using cameras, sensor networks, and IoT devices, instrument images, environmental parameters, and operating status data are collected in real time to build a multimodal dataset;
[0006] The data in the multimodal dataset are normalized, cleaned of outliers, and aligned in time to obtain a standardized dataset.
[0007] The standardized dataset is input into the feature extraction model, and features are extracted through a visual feature extraction network, a temporal data processing network, and an environmental perception fusion network. Feature weights are assigned based on a self-attention mechanism to obtain a fused feature vector.
[0008] Using the fused feature vector as the state input, safety constraints are constructed, and the DQN algorithm is used to generate an optimized control strategy, which is then sent to the actuator.
[0009] Optionally, in a first implementation of the first aspect of the present invention, the step of normalizing, removing outliers, and aligning time series data in the multimodal dataset to obtain a standardized dataset includes:
[0010] The Min-Max normalization method is used to normalize the numerical data in the multimodal dataset;
[0011] Multiple isolation trees are constructed by randomly sampling the normalized numerical data. The degree of anomaly is determined by calculating the path length of each data point in the isolation trees. Data with a path length exceeding the preset anomaly threshold is marked as anomalous. Each anomalous value is replaced with the median of all normal data in a sliding window with a length of 5 data points.
[0012] The instrument images in the multimodal dataset are standardized using the RGB color space standardization method, and their brightness and contrast are adjusted.
[0013] Using the camera capture timestamps as the reference, linear interpolation is used to fill in missing values and to correct timestamp deviations in the data uploaded by sensor networks and IoT devices, forming a standardized dataset.
[0014] Optionally, in a second implementation of the first aspect of the present invention, the step of inputting a standardized dataset into a feature extraction model, extracting features through a visual feature extraction network, a temporal data processing network, and an environmental perception fusion network, and allocating feature weights based on a self-attention mechanism to obtain a fused feature vector includes:
[0015] In the visual feature extraction network, the ResNet-50 model is used, and the CBAM attention mechanism is added to identify the position of the meter pointer and the scale to obtain the visual feature vector;
[0016] In the time series data processing network, the LSTM-Transformer hybrid model is used to predict future load changes and obtain time series feature vectors.
[0017] In the environmental perception fusion network, a three-layer fully connected network is used to fuse parameters and construct an environmental perception feature vector. The first fully connected layer uses the ReLU activation function and adjusts the data dimension, the second fully connected layer uses a Dropout layer to prevent overfitting, and the third fully connected layer outputs the results.
[0018] Visual feature vectors, temporal feature vectors, and environmental perception feature vectors are concatenated to form joint features. The joint features are mapped to Query, Key, and Value matrices. The similarity matrix between the Query and Key is calculated and normalized by Softmax to obtain the weight matrix. The weight matrix is multiplied by the Value matrix to output the fused feature vector.
[0019] Optionally, in a third implementation of the first aspect of the present invention, the step of using a ResNet-50 model in the visual feature extraction network, and adding a CBAM attention mechanism to identify the position of the meter pointer and the scale to obtain a visual feature vector, includes:
[0020] Add channel attention gating units to each residual block of the ResNet-50 model, and input instrument images from the standardized dataset into the ResNet-50 model;
[0021] Preliminary feature extraction is performed using a 7×7 convolutional layer and a max pooling layer to obtain a basic feature map. The basic feature map is then subjected to global average pooling using residual blocks to obtain channel feature vectors.
[0022] First, the channel feature vectors are processed through a fully connected layer and a Sigmoid activation function to generate channel attention weights. Then, the channel attention weights are multiplied with the instrument image to enhance the regional features. Finally, the visual feature vectors are output through an average pooling layer and a fully connected layer.
[0023] Optionally, in a fourth implementation of the first aspect of the present invention, the step of using an LSTM-Transformer hybrid model to predict future load changes in the time-series data processing network to obtain a time-series feature vector includes:
[0024] The time series data in the standardized dataset is divided into time windows according to time order and then input into the LSTM-Transformer hybrid model;
[0025] The forward LSTM and backward LSTM of the bidirectional LSTM layer capture the forward and backward variation features of the time series data respectively, and output bidirectional time series features;
[0026] The bidirectional temporal features are input into a Transformer encoder containing four attention heads. Long-distance temporal correlations are mined by calculating the self-attention weights between features. After processing through LayerNorm and fully connected layers, the temporal feature vector is output.
[0027] Optionally, in a fifth implementation of the first aspect of the present invention, the step of using the fused feature vector as state input to construct safety constraints, generating an optimized control strategy using the DQN algorithm, and sending the optimized control strategy to the actuator includes:
[0028] The fused feature vectors are used as the state inputs of the DQN algorithm to establish a correspondence table between the state and the actual operating scenario of the instrument.
[0029] Construct a set of safety constraints that include instrument operating range constraints, environmental safety threshold constraints, and equipment lifespan constraints;
[0030] The DQN algorithm is executed to generate an optimized control strategy. The control strategy generated by the DQN algorithm is then converted into an instruction format that can be recognized by the instrument actuator, and the instruction is sent to the actuator.
[0031] Optionally, in a sixth implementation of the first aspect of the present invention, the step of executing the DQN algorithm to generate an optimized control strategy includes:
[0032] Initialize the experience replay pool, target network, and evaluation network. Use an ε-greedy strategy to select actions. When the random number is greater than ε, select the action with the largest Q value output by the evaluation network. Otherwise, select an action randomly. The action types include at least adjusting the instrument sampling frequency and the actuator action amplitude.
[0033] The reward value is positive when the control action satisfies all safety constraints, and negative otherwise. Experience data is randomly extracted from the experience replay pool, the target Q value is calculated through the target network, the mean square error loss function is used to update the evaluation network parameters, and the evaluation network parameters are copied to the target network every 100 steps.
[0034] Repeat the iteration until the network converges, and output the optimized control strategy.
[0035] A second aspect of the present invention provides a deep learning-based intelligent instrument control system, the system comprising:
[0036] The data acquisition module is used to collect instrument images, environmental parameters, and operating status data in real time through cameras, sensor networks, and IoT devices to build a multimodal dataset.
[0037] The data processing module is used to normalize, remove outliers, and align time series data in the multimodal dataset to obtain a standardized dataset.
[0038] The feature extraction module is used to input standardized datasets into the feature extraction model, extract features through a visual feature extraction network, a temporal data processing network, and an environmental perception fusion network, and assign feature weights based on a self-attention mechanism to obtain a fused feature vector.
[0039] The control generation module is used to construct safety constraints using fused feature vectors as state inputs, generate optimized control strategies using the DQN algorithm, and send the optimized control strategies to the actuators.
[0040] A third aspect of the present invention provides a deep learning-based intelligent instrument control device, the deep learning-based intelligent instrument control device comprising a memory and at least one processor, the memory storing instructions; the at least one processor invokes the instructions in the memory to cause the deep learning-based intelligent instrument control device to perform the various steps of the deep learning-based intelligent instrument control method as described in any of the preceding claims.
[0041] A fourth aspect of the present invention provides a computer-readable storage medium storing instructions that, when executed by a processor, implement the steps of the deep learning-based intelligent instrument control method as described in any of the preceding claims.
[0042] The technical solution provided by this invention collects instrument images, environmental parameters, and operating status data in real time through cameras, sensor networks, and IoT devices to construct a multimodal dataset. The data in the multimodal dataset is normalized, cleaned of outliers, and aligned in time to obtain a standardized dataset. This standardized dataset is then input into a feature extraction model, where features are extracted using a visual feature extraction network, a time-series data processing network, and an environmental perception fusion network. Feature weights are assigned based on a self-attention mechanism to obtain a fused feature vector. Using this fused feature vector as state input, safety constraints are constructed, and an optimized control strategy is generated using the DQN algorithm. This optimized control strategy is then sent to the actuator. This invention combines image and sensor data to achieve comprehensive operating condition perception, improve feature extraction accuracy, reduce equipment failure rates, and achieve dynamic optimization without manual intervention, thereby improving the timeliness and efficiency of intelligent instrument control.
Attached Figure Description
[0043] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the invention.
[0044] Figure 1 is a flowchart illustrating a deep learning-based intelligent instrument control method provided in an embodiment of the present invention;
[0045] Figure 2 is a schematic diagram of the structure of a deep learning-based intelligent instrument control system provided in an embodiment of the present invention;
[0046] Figure 3 is a schematic diagram of the structure of a deep learning-based intelligent instrument control device provided in an embodiment of the present invention.
Detailed Implementation
[0047] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” or “having,” and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, apparatus, product, or device that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.
[0048] For ease of understanding, the specific process of the embodiments of the present invention is described below. Please refer to Figure 1, a flowchart of the deep learning-based intelligent instrument control method provided in this embodiment of the invention. The method includes the following steps:
[0049] Step 101: Collect instrument images, environmental parameters, and operating status data in real time through cameras, sensor networks, and IoT devices to construct a multimodal dataset;
[0050] In this embodiment, a high-definition smart camera is fixed 1.5-2 meters in front of the instrument, and the lens focal length is adjusted so that the dial area occupies no less than 70% of the image. Complete image data, including the pointer, scales, and dial markings, is acquired at a frame rate of 30 frames per second. The camera's autofocus and exposure compensation functions are enabled to adapt to changes in workshop lighting. The sensor network is deployed in a one-master, four-slave architecture. The master sensor node is responsible for data aggregation, and the slave nodes are equipped with temperature, humidity, vibration, and voltage sensors, respectively, to collect environmental parameters and power supply status data around the instrument at a sampling frequency of 10 Hz. Each sensor node has a built-in GPS module for timestamp synchronization. The IoT device is connected to the instrument's main control unit through an RS485 interface to read the instrument's operating current, energy consumption, actuator action feedback, and other operating status data in real time. The data is uploaded to the local edge computing node using the MQTT communication protocol. After receiving multi-source data from the camera and sensor network, the edge computing node performs association matching by the device's unique ID and millisecond-level timestamp, eliminates duplicate data transmissions, and finally constructs a multimodal dataset containing three types of heterogeneous data: image data, numerical environmental parameters, and operating status data.
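The duplicate-elimination step at the edge node can be illustrated with a minimal sketch. The record layout (`device_id`, `timestamp_ms`, `payload`) and the rule of keeping the first message per key are assumptions made for illustration; the text only states that matching uses the device's unique ID and a millisecond-level timestamp.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    device_id: str     # unique device ID (hypothetical field name)
    timestamp_ms: int  # millisecond-level timestamp
    payload: dict      # sensor values, status fields, or an image reference


def deduplicate(readings):
    """Keep the first message seen for each (device_id, timestamp_ms) pair."""
    seen = set()
    unique = []
    for r in readings:
        key = (r.device_id, r.timestamp_ms)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


readings = [
    Reading("temp-01", 1000, {"temperature": 24.8}),
    Reading("temp-01", 1000, {"temperature": 24.8}),  # retransmitted message
    Reading("vib-01", 1000, {"vibration": 0.02}),
]
unique = deduplicate(readings)
```

Readings from different devices that share a timestamp are kept, since they are distinct records to be associated rather than duplicates.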
[0051] Step 102: Normalize, remove outliers, and align time series data in the multimodal dataset to obtain a standardized dataset;
[0052] In this embodiment, the Min-Max normalization method is used to normalize the numerical data in the multimodal dataset; multiple isolation trees are constructed by randomly sampling the normalized numerical data; the degree of abnormality is determined by calculating the path length of each data point in the isolation trees; data whose path length exceeds the preset abnormality judgment threshold are marked as abnormal values; and each abnormal value is replaced with the median of all normal data in a sliding window with a length of 5 data points.
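The normalization and median-replacement steps can be sketched as follows. This is a minimal illustration, not the patented implementation: the anomaly flags are assumed to come from the isolation-tree step and are passed in as given (in the usual Isolation Forest formulation, unusually short average path lengths are what indicate anomalies), and centring the 5-point window on the flagged point is an assumption, since the text does not say how the window is positioned.

```python
from statistics import median


def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant series maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def replace_anomalies(values, is_anomaly, window=5):
    """Replace each flagged point with the median of the normal points
    inside a window of `window` points centred on it (assumed placement)."""
    half = window // 2
    cleaned = list(values)
    for i, flagged in enumerate(is_anomaly):
        if not flagged:
            continue
        start, stop = max(0, i - half), min(len(values), i + half + 1)
        normal = [values[j] for j in range(start, stop) if not is_anomaly[j]]
        if normal:  # leave the point unchanged if no normal data is nearby
            cleaned[i] = median(normal)
    return cleaned


raw = [10.0, 12.0, 11.0, 50.0, 13.0, 12.0]
scaled = min_max_normalize(raw)
flags = [False, False, False, True, False, False]  # assumed detector output
cleaned = replace_anomalies(scaled, flags)
```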
[0053] When processing instrument images using the RGB color space standardization method, the original instrument images are first uniformly adjusted to a fixed pixel size of 224×224 to eliminate image size differences caused by different shooting angles and distances. Then, for each of the RGB color channels, the grayscale value range of all pixels in each channel is calculated. Using the Min-Max standardization principle, the RGB value of each pixel is divided by 255, mapping it to the value range of [0,1], thereby eliminating color distortion caused by differences in imaging from different devices. In the brightness adjustment stage, the lighting conditions are determined by calculating the average grayscale value of the image. If the average grayscale value is below 50, gamma correction is used to increase brightness; if it is above 200, a gamma value of 1.2 is used to decrease brightness. For contrast adjustment, an adaptive histogram equalization algorithm is used to evenly distribute the image grayscale histogram across the entire grayscale range, enhancing the distinction between the instrument pointer, dial background, and scale lines.
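The pixel scaling and brightness rules above can be sketched in a few lines. The sketch assumes the common convention `out = in ** gamma` on intensities in [0, 1], under which a gamma of 1.2 darkens an image, consistent with the text. The brightening gamma of 0.8 is a placeholder chosen for illustration, because the text does not give a value for the low-light case. Resizing and adaptive histogram equalization are omitted.

```python
def scale_pixels(channel):
    """Map 8-bit values (0-255) in one color channel to [0, 1]."""
    return [[p / 255.0 for p in row] for row in channel]


def mean_gray(gray):
    """Average grayscale value of an 8-bit image given as rows of ints."""
    return sum(sum(row) for row in gray) / sum(len(row) for row in gray)


def choose_gamma(avg, brighten_gamma=0.8, darken_gamma=1.2):
    """Pick a gamma from the average gray level.
    brighten_gamma is an assumed value; only 1.2 appears in the text."""
    if avg < 50:
        return brighten_gamma
    if avg > 200:
        return darken_gamma
    return 1.0


def gamma_correct(gray, gamma):
    """Apply out = 255 * (in / 255) ** gamma to every pixel."""
    return [[round(255.0 * (p / 255.0) ** gamma) for p in row] for row in gray]


dark = [[20, 30], [40, 10]]
bright = [[250, 240], [230, 220]]
brightened = gamma_correct(dark, choose_gamma(mean_gray(dark)))
darkened = gamma_correct(bright, choose_gamma(mean_gray(bright)))
```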
[0054] When aligning data based on the camera's capture timestamp, the precise timestamp of each instrument image from the smart camera is first extracted to establish a unified time axis coordinate. For data uploaded from sensor networks and IoT devices, the deviation between their timestamps and the reference time axis is compared one by one. If the deviation is within 100 milliseconds, the corresponding time node is directly matched. If the deviation exceeds the threshold or there is missing data, linear interpolation is used to complete it. During linear interpolation, two adjacent valid data points before and after the missing data are located, and the timestamps and corresponding values of these two data points are recorded. The slope of change is obtained by calculating the time interval and the difference between the two points. Then, based on the time difference between the missing time and the previous valid data point, the missing time value is derived by combining the slope. After completion, the instrument images, sensor environmental parameters, and IoT device operating status data at the same time node are associated and bound to form a standardized dataset with consistent time sequence and complete data.
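The alignment rule described above can be sketched as follows, under a few stated assumptions: samples are `(timestamp_ms, value)` pairs sorted by time, "within 100 milliseconds" is treated as inclusive, and frames with no valid sample on both sides are left unfilled (`None`), a case the text does not address. The function names are hypothetical.

```python
def interpolate(t, t0, v0, t1, v1):
    """Linear interpolation: slope from the two neighbours, then extend
    from the earlier point by the elapsed time."""
    slope = (v1 - v0) / (t1 - t0)
    return v0 + slope * (t - t0)


def align_to_frames(frame_times_ms, samples, tolerance_ms=100):
    """Return one value per camera frame timestamp."""
    aligned = []
    for t in frame_times_ms:
        nearest = min(samples, key=lambda s: abs(s[0] - t))
        if abs(nearest[0] - t) <= tolerance_ms:
            aligned.append(nearest[1])  # direct match within tolerance
            continue
        before = [s for s in samples if s[0] < t]
        after = [s for s in samples if s[0] > t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            aligned.append(interpolate(t, t0, v0, t1, v1))
        else:
            aligned.append(None)  # no bracketing pair; left unfilled here
    return aligned


samples = [(0, 20.0), (1000, 30.0)]
aligned = align_to_frames([50, 500, 1600], samples)
```

In this example the frame at 50 ms matches the sample at 0 ms directly, the frame at 500 ms is interpolated halfway between the two samples, and the frame at 1600 ms has no later sample to interpolate against.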
[0055] Step 103: Input the standardized dataset into the feature extraction model, extract features through the visual feature extraction network, the time series data processing network and the environmental perception fusion network, and assign feature weights based on the self-attention mechanism to obtain the fused feature vector;
[0056] In this embodiment, the ResNet-50 model is used in the visual feature extraction network. By adding the CBAM attention mechanism, the position of the meter pointer and the scale are identified to obtain the visual feature vector. In the time series data processing network, the LSTM-Transformer hybrid model is used to predict future load changes to obtain the time series feature vector.
[0057] In the environmental perception fusion network, a three-layer fully connected network is used to fuse parameters and construct an environmental perception feature vector. The first fully connected layer uses the ReLU activation function and adjusts the data dimension, the second fully connected layer uses a Dropout layer to prevent overfitting, and the third fully connected layer outputs the results.
[0058] Visual feature vectors, temporal feature vectors, and environmental perception feature vectors are concatenated to form joint features. The joint features are mapped to Query, Key, and Value matrices. The similarity matrix between the Query and Key is calculated and normalized by Softmax to obtain the weight matrix. The weight matrix is multiplied by the Value matrix to output the fused feature vector.
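The attention-based weighting can be illustrated with a small pure-Python sketch. Two points are assumptions rather than statements from the text: the three modality vectors are treated as three rows (tokens) of equal dimension so that the similarity matrix is 3×3, and the scores are scaled by the square root of the key dimension, as in standard scaled dot-product attention. The projection matrices `w_q`, `w_k`, and `w_v` would be learned in practice; identity matrices are used here only to keep the example readable.

```python
import math


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Return (weight matrix, attended features) for the joint features."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # Softmax-normalized similarity
    return weights, matmul(weights, v)          # weights applied to Value


# Rows stand in for the visual, temporal, and environmental feature vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, fused = self_attention(tokens, identity, identity, identity)
```

The rows of `fused` could then be concatenated or pooled into a single fused feature vector; the text does not specify which.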
[0059] In this embodiment, a channel attention gating unit is added to each residual block of the ResNet-50 model, and the instrument images from the standardized dataset are input into the ResNet-50 model. After preliminary feature extraction through a 7×7 convolutional layer and a max pooling layer, a basic feature map is obtained. The basic feature map is then subjected to global average pooling through the residual block to obtain the channel feature vector. The channel feature vector is first passed through a fully connected layer and a sigmoid activation function to generate channel attention weights. Then, the channel attention weights are multiplied with the instrument image to enhance the region features. Finally, the visual feature vector is output through an average pooling layer and a fully connected layer.
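The channel attention gating described above (global average pooling, a fully connected layer, a Sigmoid, then channel-wise multiplication) can be sketched as follows. This is an illustrative reading, not the patented network: the gate is applied to the feature map rather than to the raw image, a single fully connected layer with hypothetical `weight` and `bias` parameters is used, and the ResNet-50 backbone itself is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(feature_map, weight, bias):
    """feature_map: C channels, each an H x W list of rows.
    weight: C x C matrix, bias: length-C vector (hypothetical parameters)."""
    # Global average pooling: one scalar per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_map]
    # Fully connected layer followed by Sigmoid gives per-channel weights.
    gates = [sigmoid(sum(w * p for w, p in zip(w_row, pooled)) + b)
             for w_row, b in zip(weight, bias)]
    # Rescale every value in a channel by that channel's weight.
    gated = [[[g * v for v in row] for row in ch]
             for g, ch in zip(gates, feature_map)]
    return gates, gated


feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[0.0, 2.0], [2.0, 0.0]],  # channel 1
]
zero_weight = [[0.0, 0.0], [0.0, 0.0]]
gates, gated = channel_gate(feature_map, zero_weight, [0.0, 0.0])
```

With zero weights and biases every gate equals 0.5, so each channel is simply halved; trained parameters would instead emphasize channels that respond to the pointer and scale regions.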
[0060] In this embodiment, the time series data in the standardized dataset is divided into time windows according to time order and input into the LSTM-Transformer hybrid model; the forward LSTM and backward LSTM of the bidirectional LSTM layer capture the forward and backward change features of the time series data respectively, and output bidirectional time series features; the bidirectional time series features are input into the Transformer encoder containing 4 attention heads, and long-distance time series correlations are mined by calculating the self-attention weights between features, and then the time series feature vector is output after processing by the LayerNorm layer and the fully connected layer.
[0061] In this embodiment, in the environmental perception fusion network, processed environmental parameter data such as temperature, humidity, vibration, and air pressure are integrated into a unified input vector, which serves as the initial input to a three-layer fully connected network. After receiving the input vector, the first fully connected layer increases the data dimension from the original 6 dimensions to 128 dimensions using a preset weight matrix. Simultaneously, it embeds a ReLU activation function to perform a non-linear mapping on the output result after linear transformation, effectively activating the effective feature information in the data, discarding ineffective linear redundancy components, and avoiding the model falling into the limitations of linear expression. After passing through the first layer, the data enters the second fully connected layer. Based on the feature-preserving transformation from 128-dimensional data to 128-dimensional data, a Dropout layer is introduced with a dropout probability of 0.3. By randomly and temporarily shutting down some neurons during model training, excessive correlation between features is severed, preventing the model from overfitting to local features of the training data and improving the network's generalization ability to different environmental scenarios. The feature data processed by the Dropout layer is fed into the third fully connected network. This layer compresses the data dimension from 128-dimensional to 64-dimensional through compact weight parameters, achieving data dimensionality reduction while preserving core environmental features, and finally outputting an environmentally aware feature vector.
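The layer sizes given above (6 → 128 → 128 → 64, ReLU in the first layer, Dropout with probability 0.3 in the second) can be sketched without a deep learning framework. The weights below are randomly initialized placeholders, the initialization scheme is arbitrary, and the rescaling of kept activations by 1/(1 − p) ("inverted dropout") is a common convention assumed here rather than something the text states.

```python
import random


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, p, training, rng):
    """Zero each activation with probability p during training only."""
    if not training:
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def init_layer(n_in, n_out, rng):
    """Placeholder initialization; real weights would be learned."""
    scale = (1.0 / n_in) ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def env_features(x, layers, training=False, rng=random):
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = relu(linear(x, w1, b1))         # 6 -> 128 with ReLU
    h = linear(h, w2, b2)               # 128 -> 128
    h = dropout(h, 0.3, training, rng)  # Dropout, p = 0.3
    return linear(h, w3, b3)            # 128 -> 64


rng = random.Random(0)
layers = [init_layer(6, 128, rng), init_layer(128, 128, rng), init_layer(128, 64, rng)]
x = [0.5, 0.4, 0.1, 0.9, 0.3, 0.2]  # six normalized environmental inputs
features = env_features(x, layers)
```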
[0062] Step 104: Using the fused feature vector as the state input, construct safety constraints, use the DQN algorithm to generate the optimized control strategy, and send the optimized control strategy to the actuator.
[0063] In this embodiment, the fused feature vector is used as the state input of the DQN algorithm to establish a correspondence table between the state and the actual operating scenario of the instrument; a set of safety constraints including instrument operating range constraints, environmental safety threshold constraints and equipment life constraints is constructed; the DQN algorithm is executed to generate an optimized control strategy, the control strategy generated by the DQN algorithm is converted into an instruction format that can be recognized by the instrument actuator, and the instruction is sent to the actuator.
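One way to picture the safety constraint set is as a simple check that reports which constraint types a candidate action would violate. Everything concrete in this sketch is hypothetical: the field names, the numeric limits, and the use of a daily actuation count as a stand-in for the equipment lifespan constraint. The text names only the three constraint categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyLimits:
    reading_min: float           # instrument operating range (lower bound)
    reading_max: float           # instrument operating range (upper bound)
    max_ambient_temp_c: float    # environmental safety threshold
    max_actuations_per_day: int  # proxy for the equipment lifespan constraint


def violated_constraints(predicted_reading, ambient_temp_c, actuations_today, limits):
    """Return the names of the constraint types that would be violated."""
    violations = []
    if not (limits.reading_min <= predicted_reading <= limits.reading_max):
        violations.append("operating_range")
    if ambient_temp_c > limits.max_ambient_temp_c:
        violations.append("environmental_threshold")
    if actuations_today >= limits.max_actuations_per_day:
        violations.append("equipment_lifespan")
    return violations


limits = SafetyLimits(0.0, 100.0, 60.0, 500)  # illustrative values only
ok = violated_constraints(42.0, 25.0, 10, limits)
bad = violated_constraints(120.0, 70.0, 10, limits)
```

An empty result corresponds to an action that satisfies all safety constraints, which is the condition the reward design relies on.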
[0064] In this embodiment, the experience replay pool, target network, and evaluation network are initialized. An ε-greedy strategy is used to select actions. When the random number is greater than ε, the action with the largest Q value output by the evaluation network is selected; otherwise, an action is randomly selected. The action types include at least adjusting the instrument sampling frequency and the actuator action amplitude. The reward value is positive when the control action satisfies all safety constraints; otherwise, it is negative. Experience data is randomly extracted from the experience replay pool, and the target Q value is calculated through the target network. The evaluation network parameters are updated using the mean squared error loss function. Every 100 steps, the evaluation network parameters are copied to the target network. This process is repeated iteratively until the network converges, and the optimized control strategy is output.
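The individual rules in this training loop can be written as small helper functions. This is a sketch of the standard DQN building blocks under stated assumptions, not the patented procedure: the discount factor of 0.99 and the reward magnitude of 1.0 are illustrative values that do not appear in the text, and the Q-values are plain lists standing in for the outputs of the evaluation and target networks.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """epsilon-greedy: greedy when the random draw exceeds epsilon,
    otherwise a uniformly random action."""
    if rng.random() > epsilon:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return rng.randrange(len(q_values))


def reward(satisfies_all_constraints, magnitude=1.0):
    """Positive when all safety constraints hold, negative otherwise."""
    return magnitude if satisfies_all_constraints else -magnitude


def td_target(r, next_q_target, done, discount=0.99):
    """Target Q value computed from the target network's outputs."""
    return r if done else r + discount * max(next_q_target)


def mse_loss(predicted, targets):
    """Mean squared error between evaluation-network Q values and targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(targets)


def should_sync(step, interval=100):
    """True on the steps where evaluation weights are copied to the target."""
    return step > 0 and step % interval == 0


greedy = select_action([0.1, 0.9, 0.3], epsilon=0.0, rng=random.Random(0))
target = td_target(reward(True), [0.5, 2.0], done=False, discount=0.9)
loss = mse_loss([1.0, 2.0], [1.0, 4.0])
```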
[0065] Please see Figure 2, a schematic diagram of the structure of a deep learning-based intelligent instrument control system provided in this embodiment of the invention. The system includes:
[0066] The data acquisition module is used to collect instrument images, environmental parameters, and operating status data in real time through cameras, sensor networks, and IoT devices to build a multimodal dataset.
[0067] The data processing module is used to normalize, remove outliers, and align time series data in the multimodal dataset to obtain a standardized dataset.
[0068] The feature extraction module is used to input standardized datasets into the feature extraction model, extract features through a visual feature extraction network, a temporal data processing network, and an environmental perception fusion network, and assign feature weights based on a self-attention mechanism to obtain a fused feature vector.
[0069] The control generation module is used to construct safety constraints using fused feature vectors as state inputs, generate optimized control strategies using the DQN algorithm, and send the optimized control strategies to the actuators.
[0070] Figure 3 is a schematic diagram of a deep learning-based intelligent instrument control device 300 provided in an embodiment of the present invention. The deep learning-based intelligent instrument control device 300 can vary significantly due to different configurations or performance characteristics. It may include one or more central processing units (CPUs) 310 (e.g., one or more processors) and a memory 320, and one or more storage media 330 (e.g., one or more mass storage devices) for storing application programs 333 or data 332. The memory 320 and storage media 330 can be temporary or persistent storage. The program stored in the storage media 330 may include one or more modules (not shown in the diagram), each module including a series of instruction operations on the deep learning-based intelligent instrument control device 300. Furthermore, the processor 310 may be configured to communicate with the storage media 330 and execute the series of instruction operations in the storage media 330 on the deep learning-based intelligent instrument control device 300 to implement the method provided in the above embodiment.
[0071] The deep learning-based intelligent instrument control device 300 may also include one or more power supplies 340, one or more wired or wireless network interfaces 350, one or more input/output interfaces 360, and/or one or more operating systems 331, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art will understand that the structure of the deep learning-based intelligent instrument control device shown in Figure 3 does not constitute a limitation on the computer device provided by the present invention, which may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
[0072] The present invention also provides a computer-readable storage medium, which can be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when the instructions are executed on a computer, cause the computer to perform the various steps of the deep learning-based intelligent instrument control method provided in the above embodiments.
[0073] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the above-described equipment or apparatus / unit can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.
[0074] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0075] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely preferred examples and are not intended to limit the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the present invention as claimed. The scope of protection of the present invention is defined by the appended claims and their equivalents.
Claims
1. An intelligent instrument control method based on deep learning, characterized in that the method includes the following steps:
instrument images, environmental parameters, and operating status data are collected in real time by cameras, sensor networks, and IoT devices to build a multimodal dataset;
the data in the multimodal dataset are subjected to normalization, outlier removal, and time-series alignment to obtain a standardized dataset;
the standardized dataset is input into a feature extraction model, features are extracted through a visual feature extraction network, a time-series data processing network, and an environmental perception fusion network, and feature weights are assigned based on a self-attention mechanism to obtain a fused feature vector;
with the fused feature vector as the state input, safety constraints are constructed, the DQN algorithm is used to generate an optimized control strategy, and the optimized control strategy is sent to the actuator;
wherein inputting the standardized dataset into the feature extraction model, extracting features through the visual feature extraction network, the time-series data processing network, and the environmental perception fusion network, and assigning feature weights based on the self-attention mechanism to obtain the fused feature vector includes:
in the visual feature extraction network, the ResNet-50 model is used and the CBAM attention mechanism is added to identify the position of the meter pointer and the scale, obtaining a visual feature vector;
in the time-series data processing network, the LSTM-Transformer hybrid model is used to predict future load changes, obtaining a time-series feature vector;
in the environmental perception fusion network, a three-layer fully connected network is used to fuse parameters and construct an environmental perception feature vector, wherein the first fully connected layer uses the ReLU activation function to adjust the data dimension, the second fully connected layer uses a Dropout layer to prevent overfitting, and the third fully connected layer outputs the result;
the visual feature vector, the time-series feature vector, and the environmental perception feature vector are concatenated to form joint features;
the joint features are mapped to Query, Key, and Value matrices, the similarity matrix between Query and Key is calculated and normalized by Softmax to obtain a weight matrix, and the weight matrix is multiplied by the Value matrix to output the fused feature vector;
wherein constructing the safety constraints with the fused feature vector as the state input, generating the optimized control strategy using the DQN algorithm, and sending the optimized control strategy to the actuator includes:
the fused feature vector is used as the state input of the DQN algorithm, and a correspondence table between the state and the actual operating scenario of the instrument is established;
a set of safety constraints is constructed, including instrument operating range constraints, environmental safety threshold constraints, and equipment lifespan constraints;
the DQN algorithm is executed to generate the optimized control strategy;
the control strategy generated by the DQN algorithm is converted into an instruction format that the instrument actuator can recognize, and the instruction is sent to the actuator;
wherein generating the optimized control strategy using the DQN algorithm includes:
the experience replay pool, the target network, and the evaluation network are initialized;
an ε-greedy strategy is used to select actions: when the random number is greater than ε, the action with the largest Q value output by the evaluation network is selected, and otherwise an action is selected at random, the action types including at least adjusting the instrument sampling frequency and adjusting the actuator action amplitude;
the reward value is positive when the control action satisfies all safety constraints and negative otherwise;
experience data is randomly sampled from the experience replay pool, the target Q value is calculated through the target network, the mean squared error loss function is used to update the evaluation network parameters, and the evaluation network parameters are copied to the target network every 100 steps;
the iteration is repeated until the network converges, and the optimized control strategy is output.
2. The intelligent instrument control method based on deep learning as described in claim 1, characterized in that subjecting the data in the multimodal dataset to normalization, outlier removal, and time-series alignment to obtain the standardized dataset includes:
the Min-Max normalization method is used to normalize the numerical data in the multimodal dataset;
multiple isolation trees are constructed by randomly sampling the normalized numerical data, the degree of anomaly is determined by calculating the path length of each data point in the isolation trees, data with a path length exceeding the preset anomaly threshold is marked as anomalous, and each anomalous value is replaced by the median of all normal data in a sliding window with a length of 5 data points;
the instrument images in the multimodal dataset are standardized using the RGB color space standardization method, and their brightness and contrast are adjusted;
taking the camera capture timestamps as the reference, linear interpolation is used to fill in values at the timestamps missing from the data uploaded by the sensor networks and IoT devices, forming the standardized dataset.
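The numerical preprocessing steps of claim 2 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names are hypothetical, the anomaly flags are taken as given (the isolation-tree scoring itself is not shown), the 5-point window is assumed to be centred on the anomalous sample, and timestamps are assumed to be sorted in increasing order.

```python
from statistics import median


def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def replace_anomalies(values, is_anomaly, window=5):
    """Replace each flagged sample with the median of the normal samples
    inside a window of `window` points centred on it (assumed layout)."""
    half = window // 2
    cleaned = list(values)
    for i, flagged in enumerate(is_anomaly):
        if not flagged:
            continue
        start, stop = max(0, i - half), min(len(values), i + half + 1)
        normal = [values[j] for j in range(start, stop) if not is_anomaly[j]]
        if normal:  # leave the sample unchanged if the window has no normal data
            cleaned[i] = median(normal)
    return cleaned


def align_to_reference(ref_times, times, values):
    """Linearly interpolate a sensor series onto reference (camera) timestamps.
    Values outside the sensor's time range are clamped to the nearest sample."""
    aligned, k = [], 0
    for t in ref_times:
        if t <= times[0]:
            aligned.append(values[0])
        elif t >= times[-1]:
            aligned.append(values[-1])
        else:
            # Advance to the sensor interval [times[k], times[k + 1]] containing t.
            while times[k + 1] < t:
                k += 1
            w = (t - times[k]) / (times[k + 1] - times[k])
            aligned.append(values[k] + w * (values[k + 1] - values[k]))
    return aligned
```

In this sketch the camera timestamps play the role of `ref_times`, so every sensor series ends up with one value per captured image.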
3. The intelligent instrument control method based on deep learning as described in claim 1, characterized in that, in the visual feature extraction network, using the ResNet-50 model with the added CBAM attention mechanism to identify the position of the meter pointer and the scale and obtain the visual feature vector includes:
channel attention gating units are added to each residual block of the ResNet-50 model, and the instrument images from the standardized dataset are input into the ResNet-50 model;
preliminary feature extraction is performed using a 7×7 convolutional layer and a max pooling layer to obtain a basic feature map;
the basic feature map is passed through the residual blocks and subjected to global average pooling to obtain channel feature vectors;
the channel feature vectors are processed through a fully connected layer and a Sigmoid activation function to generate channel attention weights;
the channel attention weights are multiplied with the instrument image to enhance the regional features;
the visual feature vector is output through an average pooling layer and a fully connected layer.
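The channel attention gating described in claim 3 (global average pooling, a fully connected layer, a Sigmoid, then a channel-wise multiplication) can be sketched in plain Python as follows. This is a minimal illustration and not the claimed network: the function name `channel_attention` and the parameters `fc_weights` and `fc_bias` are hypothetical, a single fully connected layer is assumed, and the sketch applies the attention weights to the channels of a feature map.

```python
import math


def channel_attention(feature_map, fc_weights, fc_bias):
    """Gate each channel of a feature map by a learned weight in (0, 1).

    feature_map: list of C channels, each an H x W nested list of floats.
    fc_weights:  C x C matrix of the fully connected layer (hypothetical values).
    fc_bias:     length-C bias vector (hypothetical values).
    """
    # Global average pooling: collapse each channel to a single scalar.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Fully connected layer followed by a Sigmoid gives one weight per channel.
    gates = []
    for weight_row, bias in zip(fc_weights, fc_bias):
        z = sum(w * p for w, p in zip(weight_row, pooled)) + bias
        gates.append(1.0 / (1.0 + math.exp(-z)))
    # Channel-wise multiplication: rescale every value in a channel by its gate.
    scaled = [
        [[gate * x for x in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return gates, scaled
```

Channels whose pooled response drives the gate toward 1 pass through almost unchanged, while channels gated toward 0 are suppressed.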
4. The intelligent instrument control method based on deep learning as described in claim 1, characterized in that, in the time-series data processing network, using the LSTM-Transformer hybrid model to predict future load changes and obtain the time-series feature vector includes:
the time-series data in the standardized dataset is divided into time windows in chronological order and input into the LSTM-Transformer hybrid model;
the LSTM-Transformer hybrid model uses bidirectional LSTM layers to capture the forward and backward features of the time-series data, respectively, and outputs bidirectional time-series features;
the bidirectional time-series features are input into a Transformer encoder containing four attention heads, and long-distance temporal correlations are mined by calculating the self-attention weights between features;
after processing through LayerNorm and fully connected layers, the time-series feature vector is output.
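The self-attention weights referred to in claim 4 can be illustrated with a single attention head in plain Python. This is a sketch under stated assumptions rather than the claimed encoder: the function names and projection matrices are hypothetical, only one head is shown (an encoder with four heads would run four such computations on separate projections and concatenate the results), and the division by the square root of the key dimension follows the usual Transformer convention, which the claim does not state.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence.

    x: T x d sequence of feature vectors (for example, bidirectional LSTM outputs).
    w_q, w_k, w_v: d x d_k projection matrices (hypothetical values).
    Returns the T x T weight matrix and the T x d_k attended features.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)  # similarity of every time step with every other
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return weights, matmul(weights, v)
```

Each row of the returned weight matrix sums to 1 and shows how strongly one time step attends to every other time step, which is how distant steps can influence each other directly.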
5. A deep learning-based intelligent instrument control system, characterized in that the system includes:
a data acquisition module, used to collect instrument images, environmental parameters, and operating status data in real time through cameras, sensor networks, and IoT devices to build a multimodal dataset;
a data processing module, used to subject the data in the multimodal dataset to normalization, outlier removal, and time-series alignment to obtain a standardized dataset;
a feature extraction module, used to input the standardized dataset into a feature extraction model, extract features through a visual feature extraction network, a time-series data processing network, and an environmental perception fusion network, and assign feature weights based on a self-attention mechanism to obtain a fused feature vector, wherein: in the visual feature extraction network, the ResNet-50 model is used and the CBAM attention mechanism is added to identify the position of the meter pointer and the scale, obtaining a visual feature vector; in the time-series data processing network, the LSTM-Transformer hybrid model is used to predict future load changes, obtaining a time-series feature vector; in the environmental perception fusion network, a three-layer fully connected network is used for parameter fusion to construct an environmental perception feature vector, the first fully connected layer using the ReLU activation function to adjust the data dimension, the second fully connected layer using a Dropout layer to prevent overfitting, and the third fully connected layer outputting the result; the visual feature vector, the time-series feature vector, and the environmental perception feature vector are concatenated to form joint features; the joint features are mapped to Query, Key, and Value matrices, the similarity matrix between Query and Key is calculated and normalized by Softmax to obtain a weight matrix, and the weight matrix is multiplied by the Value matrix to output the fused feature vector;
a control generation module, used to construct safety constraints with the fused feature vector as the state input, generate an optimized control strategy using the DQN algorithm, and send the optimized control strategy to the actuator, wherein: the fused feature vector is used as the state input of the DQN algorithm, and a correspondence table between the state and the actual operating scenario of the instrument is established; a set of safety constraints is constructed, including instrument operating range constraints, environmental safety threshold constraints, and equipment lifespan constraints; the DQN algorithm is executed to generate the optimized control strategy; the control strategy generated by the DQN algorithm is converted into an instruction format that the instrument actuator can recognize, and the instruction is sent to the actuator; the experience replay pool, the target network, and the evaluation network are initialized; an ε-greedy strategy is used to select actions: when the random number is greater than ε, the action with the largest Q value output by the evaluation network is selected, and otherwise an action is selected at random, the action types including at least adjusting the instrument sampling frequency and adjusting the actuator action amplitude; the reward value is positive when the control action satisfies all safety constraints and negative otherwise; experience data is randomly sampled from the experience replay pool, the target Q value is calculated through the target network, the mean squared error loss function is used to update the evaluation network parameters, and the evaluation network parameters are copied to the target network every 100 steps; the iteration is repeated until the network converges, and the optimized control strategy is output.
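The DQN building blocks recited in the claims (ε-greedy selection, a constraint-based reward, the target Q value, the mean squared error loss, and the 100-step parameter copy) can be sketched in plain Python as follows. This is a minimal illustration, not the claimed implementation: all function names are hypothetical, the reward magnitudes of +1 and -1 and the discount factor `gamma` are assumptions that the claims do not specify, and the safety constraints are modelled as simple (low, high) ranges keyed by hypothetical quantity names.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """ε-greedy: exploit when the random draw exceeds epsilon, otherwise explore."""
    if rng.random() > epsilon:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return rng.randrange(len(q_values))


def satisfies_constraints(state, limits):
    """True only if every monitored quantity lies inside its (low, high) range."""
    return all(low <= state[name] <= high for name, (low, high) in limits.items())


def reward(state, limits):
    """Positive when all safety constraints hold, negative otherwise (assumed ±1)."""
    return 1.0 if satisfies_constraints(state, limits) else -1.0


def td_target(r, next_q_target, gamma=0.99, done=False):
    """Target Q value from the target network's outputs for the next state."""
    return r if done else r + gamma * max(next_q_target)


def mse_loss(predicted, targets):
    """Mean squared error between evaluation-network Q values and targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(targets)


def should_sync_target(step, period=100):
    """Copy evaluation-network parameters to the target network every `period` steps."""
    return step > 0 and step % period == 0
```

Keeping the target network fixed between copies means the regression targets do not shift with every update of the evaluation network, which is the usual motivation for the periodic copy.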
6. A deep learning-based intelligent instrument control device, characterized in that the device includes a memory and at least one processor, wherein the memory stores instructions, and the at least one processor invokes the instructions in the memory to cause the device to perform the steps of the intelligent instrument control method based on deep learning as described in any one of claims 1-4.
7. A computer-readable storage medium storing instructions thereon, characterized in that, when the instructions are executed by a processor, they implement the steps of the intelligent instrument control method based on deep learning as described in any one of claims 1-4.