A waste battery surface damage detection method based on a multi-modal fusion strategy network
By using a multimodal fusion strategy network that combines RGB images, infrared thermal imaging, and 3D depth data, the problem of low efficiency and high false positive rate in detecting surface damage in waste batteries is solved, achieving high-precision, robust, and automated detection with adaptive optimization for different environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI SECOND POLYTECHNIC UNIVERSITY
- Filing Date
- 2026-03-18
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods for detecting surface damage in waste batteries rely on manual visual inspection, which is inefficient and error-prone. Traditional image processing methods are not robust to complex backgrounds or to multiple damage types, and single-modal detection struggles to capture the characteristics of multiple damage types.
A multimodal fusion strategy network is adopted, which combines RGB images, infrared thermal imaging and 3D depth data. Through time alignment, image preprocessing, feature extraction and multimodal data fusion, features are extracted using EfficientNet-B4, TIE-CNN and 3D-CNN models, and multimodal feature fusion is achieved through attention mechanism and channel and spatial attention weight adjustment.
It improves the accuracy and robustness of damage detection, reduces detection time, enhances detection capabilities under complex lighting conditions, enables automated detection, has strong adaptability, and achieves a detection accuracy of 89.7% under extreme conditions, which is far higher than traditional methods.
Figure CN122244612A_ABST
Description
Technical Field
[0001] This invention relates to the fields of computer vision and industrial inspection technology, and more specifically, to an intelligent detection method for surface damage of waste batteries based on deep learning and multimodal data fusion.
Background Technology
[0002] With the widespread application of new energy vehicles, the recycling and disposal of used batteries has become an urgent problem to be solved. Detection of surface damage in used batteries is crucial for assessing their safety and reusability. However, existing detection technologies have many shortcomings, mainly in the following aspects:
[0003] 1. Surface damage detection of hazardous waste batteries relies on manual visual inspection, which is inefficient and highly subjective. It is easily affected by factors such as the experience and fatigue of the inspectors, resulting in a high misjudgment rate.
[0004] 2. Traditional image processing methods (such as edge detection and threshold segmentation) are not robust to complex backgrounds, minor damage or changes in lighting, and are difficult to accurately identify and locate various types of damage on the surface of waste batteries, such as cracks, dents, leakage, and corrosion.
[0005] 3. Most existing detection systems rely solely on RGB images for damage detection; however, a single modality (RGB images only) is insufficient to capture multiple types of damage features (such as leakage, which requires infrared imaging assistance).
[0006] The preceding description is intended to provide general background information and does not necessarily constitute prior art.
Summary of the Invention
[0007] The purpose of this invention is to provide a method for detecting surface damage of waste batteries based on a multimodal fusion strategy network. This method is used to automatically identify surface damage of waste batteries and solves the technical problems of low efficiency and high misjudgment rate of traditional manual detection.
[0008] This invention provides a method for detecting surface damage in waste batteries based on a multimodal fusion strategy network, comprising the following steps:
[0009] S1: RGB image acquisition, infrared thermal imaging, 3D depth data acquisition;
[0010] S2: Use the Precision Time Protocol (PTP) to achieve time alignment of RGB images, infrared thermal imaging, and 3D depth data;
[0011] S3: Image data preprocessing;
[0012] S4: Image feature extraction;
[0013] S5: Multimodal data fusion.
[0014] Furthermore, in step S1, RGB image acquisition includes acquiring color images of the battery surface using a high-resolution RGB industrial camera, infrared thermal imaging includes detecting thermal anomalies on the battery surface using an infrared thermal imager, and 3D depth data acquisition includes acquiring the three-dimensional morphology of the battery surface using a 3D structured light scanner.
[0015] Further, step S3 includes:
[0016] S31: Handle missing values by forward or backward filling, or by imputing with the mean, median, or a model prediction;
[0017] S32: Use the statistical method Z-score to identify and handle outliers;
[0018] S33: Delete duplicate records;
[0019] S34: Perform data normalization and standardization to transform the data to the same scale.
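[0020] Furthermore, the normalization in step S34 includes scaling the data to between 0 and 1, as shown in the formula:
[0021] $x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the data;
[0022] The standardization involves transforming the data into a distribution with a mean of 0 and a standard deviation of 1, using the following formula:
[0023] $z = \dfrac{x - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the data.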
[0024] Further, step S4 includes:
[0025] S41: RGB image feature extraction is performed using the EfficientNet-B4 model;
[0026] S42: Infrared thermal imaging feature extraction;
[0027] S43: Geometric features are extracted from depth images using a 3D convolutional neural network (3D-CNN).
[0028] Further, step S41 includes:
[0029] S411: Uses multiple convolution kernels to perform convolution operations on RGB images to extract low-level features of edges and textures in the image;
[0030] S412: Use the rectified linear unit (ReLU) activation function;
[0031] S413: Use max pooling or average pooling operations to reduce the spatial dimension of the feature map;
[0032] S414: EfficientNet-B4 uses depthwise separable convolution to break down standard convolution into two smaller operations;
[0033] S415: Through multiple convolution and pooling operations, low-level features are gradually fused into high-level features, ultimately forming a feature vector.
[0034] Further, step S42 includes:
[0035] S421: Use the convolutional neural network TIE-CNN to extract thermal features from infrared images;
[0036] S422: Extract local invariant features from infrared images;
[0037] S423: Use feature descriptors to quantify the extracted local features and form feature vectors.
[0038] Further, step S5 includes:
[0039] S51: Fuse feature vectors from different modalities at the feature layer;
[0040] S52: Dynamically adjust the weights of different modal features through an attention mechanism.
[0041] Further, step S51 includes:
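[0043] S511: Weighted summation of the feature vectors of different modalities, using the following formula:
[0044] $F_{\text{fused}} = \alpha F_{\text{RGB}} + \beta F_{\text{IR}} + \gamma F_{\text{depth}}$;
[0045] where $F_{\text{fused}}$ is the fused feature vector; $F_{\text{RGB}}$, $F_{\text{IR}}$, and $F_{\text{depth}}$ are the RGB, infrared, and depth feature vectors, respectively; and α, β, and γ are weighting coefficients.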
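[0046] S512: Perform a tensor product operation on the feature vectors of different modalities to generate a high-dimensional feature vector, as shown in the formula:
[0047] $F_{\text{tensor}} = F_{\text{RGB}} \otimes F_{\text{IR}} \otimes F_{\text{depth}}$;
[0048] Here, ⊗ represents the tensor product operation; $F_{\text{RGB}}$, $F_{\text{IR}}$, and $F_{\text{depth}}$ are the RGB, infrared, and depth feature vectors; and $F_{\text{tensor}}$ is the resulting high-dimensional feature vector.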
[0049] Further, step S52 includes:
[0050] S521: Calculate the correlation between different modal features and generate channel attention weights, using the following formula:
[0051] ;
[0052] where the formula involves a weight matrix, the concatenated vector of features from the different modalities, and the resulting channel attention weight;
[0053] S522: Use channel attention weights to weight features from different modalities, using the following formula:
[0054] ;
[0055] S523: Generate a spatial attention heatmap to locate damage-sensitive areas. The formula is:
[0056] ;
[0057] where the inputs are the spatial attention maps for the RGB, infrared, and depth features, respectively, and $A_s(x, y)$ is the fused spatial attention heatmap.
[0058] S524: Use spatial attention heatmaps to weight feature maps.
[0059] This invention provides a method for detecting surface damage in waste batteries based on a multimodal fusion strategy network. By fusing RGB images, infrared thermal imaging, and 3D depth data, it can comprehensively capture multiple features of battery surface damage. Compared with single-modal detection methods, multimodal fusion significantly improves the accuracy and robustness of damage detection. Under complex lighting conditions, infrared thermal imaging can effectively compensate for the deficiencies of RGB images, improving the detection capability for heat-related damage such as leakage. Employing deep learning models and strategy network optimization enables rapid damage detection and decision-making. Compared with traditional manual detection methods, automated detection significantly improves detection efficiency and reduces detection time. By optimizing detection parameters, detection accuracy under extreme lighting conditions reaches 89.7%, far exceeding the 61.3% of traditional methods. The strategy network of this invention can dynamically adjust detection parameters according to different detection environments. Compared with fixed-threshold detection methods, this adaptability allows the system to maintain high accuracy under different lighting, temperatures, and movement speeds. Through reinforcement learning algorithms, the system can automatically optimize classification thresholds and non-maximum suppression overlap rates, further improving detection performance.
Attached Figure Description
[0060] Figure 1 is a flowchart illustrating the surface damage detection method for waste batteries based on a multimodal fusion strategy network provided in an embodiment of the present invention.
Detailed Implementation
[0061] The specific embodiments of the present invention will be described in further detail below with reference to the accompanying drawings and examples. The following examples are for illustrative purposes only and are not intended to limit the scope of the invention.
[0062] The terms "first," "second," "third," "fourth," etc., used in the specification and claims of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
[0063] Example 1
[0064] Figure 1 is a flowchart illustrating the surface damage detection method for waste batteries based on a multimodal fusion strategy network provided in an embodiment of the present invention. Referring to Figure 1, the present invention provides a method for detecting surface damage of waste batteries based on a multimodal fusion strategy network, comprising the following steps:
[0065] S1: RGB image acquisition, infrared thermal imaging, 3D depth data acquisition;
[0066] Specifically, in step S1 of the present invention, RGB image acquisition includes acquiring color images of the battery surface using a high-resolution RGB industrial camera. These images can provide information on the texture, color, and shape of the battery surface and are suitable for detecting visible damage such as cracks and dents.
[0067] Infrared thermal imaging involves using an infrared thermal imager to detect thermal anomalies on the battery surface. Infrared imaging can capture the temperature distribution on the battery surface and is particularly effective for detecting heat-related damage such as leakage.
[0068] 3D depth data acquisition involves obtaining the three-dimensional morphology of the battery surface using a 3D structured light scanner. This data provides the geometric features of the battery surface, which is crucial for detecting geometric deformations such as dents.
[0069] S2: To ensure the temporal consistency of multimodal data, a Precision Time Protocol (PTP) is used to align the time of RGB images, infrared thermal imaging, and 3D depth data. This synchronization mechanism can control the time error of data acquisition to the microsecond level, ensuring the synchronization of different modal data.
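By way of a non-limiting illustration, once the acquisition clocks have been synchronized, frames from the three sensors can be grouped by nearest timestamp. The sketch below is an assumption about how such grouping might be done and is not part of PTP itself; the function names, the microsecond timestamps, and the 500 µs tolerance are hypothetical.

```python
from bisect import bisect_left


def nearest_index(sorted_ts, t):
    """Return the index of the timestamp in sorted_ts that is closest to t."""
    i = bisect_left(sorted_ts, t)
    if i == 0:
        return 0
    if i == len(sorted_ts):
        return len(sorted_ts) - 1
    # Pick whichever neighbour lies closer to t.
    return i if sorted_ts[i] - t < t - sorted_ts[i - 1] else i - 1


def align_frames(rgb_ts, ir_ts, depth_ts, tolerance_us=500):
    """Group frames whose timestamps (in microseconds) agree within tolerance_us.

    Each list must be sorted and non-empty. Returns a list of
    (rgb_index, ir_index, depth_index) triples; RGB frames without a
    close enough partner in both other streams are skipped.
    """
    triples = []
    for r_idx, t in enumerate(rgb_ts):
        i_idx = nearest_index(ir_ts, t)
        d_idx = nearest_index(depth_ts, t)
        if (abs(ir_ts[i_idx] - t) <= tolerance_us
                and abs(depth_ts[d_idx] - t) <= tolerance_us):
            triples.append((r_idx, i_idx, d_idx))
    return triples


# Hypothetical timestamps; the third infrared frame arrives too late to match.
rgb_ts = [0, 33_333, 66_666]
ir_ts = [120, 33_400, 70_000]
depth_ts = [-80, 33_300, 66_700]
aligned = align_frames(rgb_ts, ir_ts, depth_ts)
print(aligned)
```

Using the RGB stream as the reference is a design choice made only for this example; any of the three streams could serve as the reference.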
[0070] S3: Image data preprocessing;
[0071] It should be noted that the collected multimodal data undergoes correction, including illumination correction, temperature correction, and geometric correction. These correction steps can eliminate the influence of environmental factors on the data, improving data quality and usability. Specifically, they include:
[0072] S31: Handle missing values by forward or backward filling, or by imputing with the mean, median, or a model prediction, selecting the appropriate method based on the characteristics of the data;
[0073] S32: Use the statistical method Z-score to identify and handle outliers;
[0074] S33: Remove duplicate records to ensure that there are no duplicate observations in the dataset in order to avoid model bias;
[0075] S34: Perform data normalization and standardization to transform the data to the same scale.
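[0076] Specifically, in step S34 of the present invention, normalization includes scaling the data to between 0 and 1, as shown in the formula:
[0077] $x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the data;
[0078] The standardization involves transforming the data into a distribution with a mean of 0 and a standard deviation of 1, using the following formula:
[0079] $z = \dfrac{x - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the data.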
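By way of a non-limiting illustration, steps S31 to S34 can be sketched on a one-dimensional series of readings. The sketch assumes mean imputation for S31, a Z-score cut-off of 3 for S32, and the population standard deviation throughout; none of these choices is specified in the description, and the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def fill_missing(values):
    """S31 (assumed variant): replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def remove_outliers(values, z_max=3.0):
    """S32: drop values whose absolute Z-score exceeds z_max (assumed cut-off)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs((v - mu) / sigma) <= z_max]


def drop_duplicates(records):
    """S33: keep the first occurrence of each record, preserving order."""
    return list(dict.fromkeys(records))


def min_max(values):
    """S34 normalization: scale to [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """S34 standardization: zero mean, unit standard deviation; assumes sigma > 0."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical (frame_id, temperature) records with one gap and one duplicate.
records = [(0, 24.0), (1, None), (2, 26.0), (2, 26.0), (3, 28.0)]
unique = drop_duplicates(records)
temps = fill_missing([t for _, t in unique])
kept = remove_outliers(temps)
normalized = min_max(kept)
standardized = standardize(kept)
print(normalized)
```

In practice the same operations would be applied per channel or per pixel of each modality rather than to a short list.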
[0080] S4: Image feature extraction;
[0081] Specifically, step S4 of the present invention includes:
[0082] S41: RGB image feature extraction is performed using the EfficientNet-B4 model;
[0083] It should be noted that EfficientNet-B4 is a high-efficiency convolutional neural network architecture that improves model performance through a compound scaling method (simultaneously expanding depth, width, and resolution); its features are as follows:
[0084] S411: Convolutional layer: Uses multiple convolutional kernels to perform convolution operations on the RGB image to extract low-level features of edges and textures in the image;
[0085] S412: Activation function: Use the Rectified Linear Unit (ReLU) activation function to increase the non-linearity of the model, enabling the network to learn more complex feature representations;
[0086] S413: Pooling layer: Employs max pooling or average pooling operations to reduce the spatial dimension of the feature map, reduce computational cost, and retain important features.
[0087] S414: Depthwise Separable Convolution: EfficientNet-B4 uses depthwise separable convolution, which breaks down standard convolution into two smaller operations;
[0088] S415: Feature Fusion: Through multi-layer convolution and pooling operations, low-level features are gradually fused into high-level features, ultimately forming a feature vector for subsequent damage detection.
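By way of a non-limiting illustration of step S414, the two smaller operations of a depthwise separable convolution are a depthwise convolution (one spatial filter per input channel) followed by a pointwise 1×1 convolution that mixes channels. The sketch below compares weight counts only, ignoring biases; the kernel size and channel counts are hypothetical and are not taken from EfficientNet-B4.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a pointwise 1 x 1 convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
ratio = sep / std  # equals 1 / c_out + 1 / k**2
print(std, sep, round(ratio, 4))
```

For this layer the separable form needs roughly 12% of the weights of the standard convolution, which is the source of the efficiency gain.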
[0089] S42: Infrared thermal imaging feature extraction;
[0090] It should be noted that infrared thermal imaging feature extraction mainly focuses on temperature distribution and thermal anomaly areas; since infrared images typically have low resolution and poor contrast, specific processing methods are required; the following are the steps for extracting features from infrared images:
[0091] S421: Feature Extraction: Thermal features in infrared images are extracted using a convolutional neural network TIE-CNN. TIE-CNN effectively enhances the contrast and details of infrared images through brightness domain and residual learning techniques.
[0092] S422: Local Invariant Features: Extracts local invariant features from infrared images, such as corner, edge, and texture features. These features are highly robust to imaging distortion and noise.
[0093] S423: Feature descriptors: Use feature descriptors (SIFT, SURF) to quantify and describe the extracted local features, forming feature vectors.
[0094] S43: A 3D convolutional neural network (3D-CNN) is used to extract geometric features from depth images. A 3D-CNN can process three-dimensional data and extract spatial structure information from depth images.
[0095] It should be noted that 3D depth data provides geometric information about the battery surface, which is crucial for detecting geometric deformations such as dents.
[0096] S5: Multimodal data fusion;
[0097] It should be noted that multimodal data fusion is a key step in this detection method. Its purpose is to integrate features extracted from RGB images, infrared thermal imaging, and 3D depth data to improve the accuracy and robustness of damage detection. The specific steps of multimodal data fusion are as follows:
[0098] S51: Fuse feature vectors from different modalities at the feature layer;
[0099] Specifically, step S51 of the present invention includes:
[0100] S511: Weighted summation of the feature vectors of different modalities, using the following formula:
[0101] $F_{\text{fused}} = \alpha F_{\text{RGB}} + \beta F_{\text{IR}} + \gamma F_{\text{depth}}$;
[0102] where $F_{\text{fused}}$ is the fused feature vector; $F_{\text{RGB}}$, $F_{\text{IR}}$, and $F_{\text{depth}}$ are the RGB, infrared, and depth feature vectors, respectively; and α, β, and γ are weighting coefficients that are dynamically adjusted based on the correlation between damage type and modal features.
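[0103] S512: Perform a tensor product operation on the feature vectors of different modalities to generate a high-dimensional feature vector, as shown in the formula:
[0104] $F_{\text{tensor}} = F_{\text{RGB}} \otimes F_{\text{IR}} \otimes F_{\text{depth}}$;
[0105] where ⊗ represents the tensor product operation; $F_{\text{RGB}}$, $F_{\text{IR}}$, and $F_{\text{depth}}$ are the RGB, infrared, and depth feature vectors; and the generated high-dimensional feature vector $F_{\text{tensor}}$ can capture the interaction information between different modalities;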
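By way of a non-limiting illustration, steps S511 and S512 can be sketched with NumPy. The sketch assumes the three feature vectors share one dimension for the weighted sum and that the tensor product is flattened into a single vector; the function names, the toy vectors, and the coefficient values 0.5, 0.3, and 0.2 are hypothetical.

```python
import numpy as np


def weighted_sum_fusion(f_rgb, f_ir, f_depth, alpha, beta, gamma):
    """S511: element-wise weighted sum; all vectors must share one shape."""
    return alpha * f_rgb + beta * f_ir + gamma * f_depth


def tensor_product_fusion(f_rgb, f_ir, f_depth):
    """S512: outer product of the three vectors, flattened to one vector."""
    # Entry [i, j, k] is f_rgb[i] * f_ir[j] * f_depth[k].
    t = np.einsum("i,j,k->ijk", f_rgb, f_ir, f_depth)
    return t.reshape(-1)


# Hypothetical two-dimensional feature vectors for each modality.
f_rgb = np.array([1.0, 2.0])
f_ir = np.array([0.5, 1.5])
f_depth = np.array([2.0, 0.0])

fused_sum = weighted_sum_fusion(f_rgb, f_ir, f_depth, 0.5, 0.3, 0.2)
fused_tensor = tensor_product_fusion(f_rgb, f_ir, f_depth)
print(fused_sum.shape, fused_tensor.shape)
```

The flattened tensor product has as many entries as the product of the three input dimensions, so for realistic feature sizes a dimensionality reduction step would normally precede or follow it.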
[0106] S52: Dynamically adjust the weights of different modal features through an attention mechanism, enabling the model to focus more on important modal information; the specific steps are as follows:
[0107] S521: Calculate the correlation between different modal features and generate channel attention weights, using the following formula:
[0108] ;
[0109] where the formula involves a weight matrix, the concatenated vector of features from the different modalities, and the resulting channel attention weight;
[0110] S522: Use channel attention weights to weight features from different modalities, using the following formula:
[0111] ;
[0112] S523: Generate a spatial attention heatmap to locate damage-sensitive areas. The formula is:
[0113] ;
[0114] where the inputs are the spatial attention maps for the RGB, infrared, and depth features, respectively, and $A_s(x, y)$ is the fused spatial attention heatmap.
[0115] S524: Use spatial attention heatmaps to weight feature maps, enhancing the ability to detect damaged areas;
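By way of a non-limiting illustration, steps S521 to S524 can be sketched with NumPy. The exact formulas are not reproduced in this text, so the sketch makes two labeled assumptions: the channel attention weight is a sigmoid of the weight matrix applied to the concatenated feature vector, and the fused spatial heatmap is the element-wise mean of the three per-modality maps. All names, shapes, and values are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(f_concat, w):
    """S521/S522 (assumed form): weights in (0, 1), then re-weighted features."""
    a_c = sigmoid(w @ f_concat)   # one attention weight per channel
    return a_c, a_c * f_concat    # element-wise re-weighting


def fuse_spatial_attention(s_rgb, s_ir, s_depth):
    """S523 (assumed rule): element-wise mean of the three (H, W) maps."""
    return (s_rgb + s_ir + s_depth) / 3.0


def apply_spatial_attention(feature_map, heatmap):
    """S524: broadcast the (H, W) heatmap over a (C, H, W) feature map."""
    return feature_map * heatmap[np.newaxis, :, :]


# Hypothetical 6-dimensional concatenated feature vector and weight matrix.
rng = np.random.default_rng(0)
f_concat = rng.normal(size=6)
w = rng.normal(size=(6, 6))
a_c, f_weighted = channel_attention(f_concat, w)

# Hypothetical 2 x 2 spatial attention maps for each modality.
s_rgb = np.array([[0.1, 0.9], [0.2, 0.3]])
s_ir = np.array([[0.0, 0.6], [0.1, 0.3]])
s_depth = np.array([[0.2, 0.9], [0.0, 0.3]])
heatmap = fuse_spatial_attention(s_rgb, s_ir, s_depth)

features = np.ones((4, 2, 2))
attended = apply_spatial_attention(features, heatmap)
print(heatmap)
```

In a trained network the weight matrix would be learned rather than sampled at random, and the fusion rule for the spatial maps could equally be a learned convolution.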
[0116] It should be noted that, through feature-level fusion and attention mechanisms, this invention can effectively integrate features extracted from RGB images, infrared thermal imaging, and 3D depth data, improving the accuracy and robustness of surface damage detection in waste batteries. These methods provide a solid foundation for subsequent deep learning model training and damage detection.
[0117] As can be seen from the above description, the advantages of this invention are:
[0118] 1. High precision and robustness: This invention, by fusing RGB images, infrared thermal imaging, and 3D depth data, can comprehensively capture multiple features of battery surface damage. Compared with single-modal detection methods, multi-modal fusion significantly improves the accuracy and robustness of damage detection; under complex lighting conditions, infrared thermal imaging can effectively supplement the deficiencies of RGB images, improving the detection capability for heat-related damage such as leakage.
[0119] 2. Real-time performance and efficiency: By employing deep learning models and strategy network optimization, this invention enables rapid damage detection and decision-making. Compared with traditional manual detection methods, automated detection significantly improves detection efficiency and reduces detection time. Through optimization of detection parameters, this invention achieves a detection accuracy of 89.7% under extreme lighting conditions, far exceeding the 61.3% of traditional methods.
[0120] 3. Adaptability: The strategy network of this invention can dynamically adjust the detection parameters according to different detection environments; compared with fixed-threshold detection methods, this adaptability enables the system to maintain high accuracy under different lighting, temperatures, and movement speeds; through reinforcement learning algorithms, the system can automatically optimize the classification threshold and the non-maximum suppression overlap rate, further improving detection performance.
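By way of a non-limiting illustration, the sketch below shows where the two tunable parameters named above, the classification threshold and the non-maximum suppression overlap rate, enter a generic detection post-processing step. It does not implement the strategy network or the reinforcement learning algorithm; the function names, boxes, scores, and threshold values are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, score_threshold, iou_threshold):
    """Keep the highest-scoring boxes that do not overlap a kept box too much.

    detections is a list of (box, score) pairs. Boxes scoring below
    score_threshold are discarded before suppression.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


# Hypothetical damage detections on one battery image.
detections = [
    ((0, 0, 10, 10), 0.90),
    ((1, 1, 11, 11), 0.80),    # overlaps the first box heavily
    ((20, 20, 30, 30), 0.70),
    ((40, 40, 50, 50), 0.20),  # below the classification threshold
]
kept = nms(detections, score_threshold=0.5, iou_threshold=0.5)
print(kept)
```

Raising the overlap threshold to 0.7 in this example would retain the second box as well, which shows how adjusting the two parameters trades missed detections against duplicate detections.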
[0121] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A method for detecting surface damage of waste batteries based on a multi-modal fusion strategy network, characterized in that it includes the following steps: S1: RGB image acquisition, infrared thermal imaging, 3D depth data acquisition; S2: Use the Precision Time Protocol (PTP) to achieve time alignment of RGB images, infrared thermal imaging, and 3D depth data; S3: Image data preprocessing; S4: Image feature extraction; S5: Multimodal data fusion.
2. The method for detecting surface damage of waste batteries based on a multi-modal fusion strategy network according to claim 1, characterized in that, in step S1, RGB image acquisition includes acquiring color images of the battery surface using a high-resolution RGB industrial camera; infrared thermal imaging includes detecting thermal anomalies on the battery surface using an infrared thermal imager; and 3D depth data acquisition includes obtaining the three-dimensional morphology of the battery surface using a 3D structured light scanner.
3. The method for detecting surface damage of waste batteries based on a multi-modal fusion strategy network according to claim 1, characterized in that step S3 includes: S31: Handle missing values by forward or backward filling, or by imputing with the mean, median, or a model prediction; S32: Use the statistical method Z-score to identify and handle outliers; S33: Delete duplicate records; S34: Perform data normalization and standardization to transform the data to the same scale.
4. The method according to claim 3, characterized in that, in step S34, normalization includes scaling the data to a range between 0 and 1, as shown in the formula: $x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the data; the standardization involves transforming the data into a distribution with a mean of 0 and a standard deviation of 1, using the following formula: $z = \dfrac{x - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the data.
5. The method for detecting surface damage of waste batteries based on a multi-modal fusion strategy network according to claim 1, characterized in that step S4 includes: S41: RGB image feature extraction is performed using the EfficientNet-B4 model; S42: Infrared thermal imaging feature extraction; S43: Geometric features are extracted from depth images using a 3D convolutional neural network (3D-CNN).
6. The method according to claim 5, characterized in that step S41 includes: S411: Use multiple convolution kernels to perform convolution operations on RGB images to extract low-level features of edges and textures in the image; S412: Use the rectified linear unit (ReLU) activation function; S413: Use max pooling or average pooling operations to reduce the spatial dimension of the feature map; S414: EfficientNet-B4 uses depthwise separable convolution to break down standard convolution into two smaller operations; S415: Through multiple convolution and pooling operations, low-level features are gradually fused into high-level features, ultimately forming a feature vector.
7. The method according to claim 5, characterized in that step S42 includes: S421: Use the convolutional neural network TIE-CNN to extract thermal features from infrared images; S422: Extract local invariant features from infrared images; S423: Use feature descriptors to quantify the extracted local features and form feature vectors.
8. The method for detecting surface damage of waste batteries based on a multi-modal fusion strategy network according to claim 1, characterized in that step S5 includes: S51: Fuse feature vectors from different modalities at the feature layer; S52: Dynamically adjust the weights of different modal features through an attention mechanism.
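9. The method according to claim 8, characterized in that step S51 includes: S511: Weighted summation of the feature vectors of different modalities, using the following formula: $F_{\text{fused}} = \alpha F_{\text{RGB}} + \beta F_{\text{IR}} + \gamma F_{\text{depth}}$; wherein $F_{\text{fused}}$ is the fused feature vector, $F_{\text{RGB}}$, $F_{\text{IR}}$, and $F_{\text{depth}}$ are the RGB, infrared, and depth feature vectors, respectively, and α, β, and γ are weight coefficients; S512: Perform a tensor product operation on the feature vectors of different modalities to generate a high-dimensional feature vector, as shown in the formula: $F_{\text{tensor}} = F_{\text{RGB}} \otimes F_{\text{IR}} \otimes F_{\text{depth}}$; here, ⊗ represents the tensor product operation and $F_{\text{tensor}}$ is the resulting high-dimensional feature vector.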
10. The method according to claim 8, characterized in that step S52 includes: S521: Calculate the correlation between different modal features and generate channel attention weights, using the following formula: ; wherein the formula involves a weight matrix, the concatenated vector of features from the different modalities, and the resulting channel attention weight; S522: Use channel attention weights to weight features from different modalities, using the following formula: ; S523: Generate a spatial attention heatmap to locate damage-sensitive areas, using the following formula: ; wherein the inputs are the spatial attention maps for the RGB, infrared, and depth features, respectively, and $A_s(x, y)$ is the fused spatial attention heatmap; S524: Use spatial attention heatmaps to weight feature maps.