A battery shell surface defect detection method based on frequency domain enhancement and dynamic region attention

A detection method that combines frequency domain enhancement with dynamic region attention, using a dynamic region segmentation and phase-shift convolution module and a learnable residual high-frequency perceptron, addresses the difficulty of identifying minute defects on battery casing surfaces and achieves high-precision, robust detection.

CN122244020A · Pending · Publication Date: 2026-06-19 · SOUTHWEAT UNIV OF SCI & TECH +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: SOUTHWEAT UNIV OF SCI & TECH
Filing Date: 2026-05-18
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to capture minute defect features in battery casing surface defect detection. Furthermore, the models are prone to false positives or false negatives due to factors such as changes in lighting, scaling, and mechanical vibration. Moreover, the models have weak generalization ability, making it difficult to transfer them to the detection of batteries on other production lines or models.

Method used

A detection method based on frequency domain enhancement and dynamic region attention is adopted. The robustness of the model to spatial transformation and scale change is enhanced by dynamic region segmentation and phase-shifted convolution module (DRP-PConv-FA). Furthermore, a learnable residual high-frequency perceptron (LR-HFP) module is embedded in the feature encoding network to enhance the ability to represent small defect features.

Benefits of technology

It improves the model's ability to extract features of minute defects on the battery casing surface and to resist interference, providing a high-precision and robust detection solution that can accurately identify and locate minute defects in complex backgrounds.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention discloses a method for detecting surface defects in battery casings based on frequency domain enhancement and dynamic region attention, belonging to the technical field of battery defect detection. The method includes: acquiring an image of the battery casing surface; constructing a defect detection model; inputting the battery casing surface image into a backbone network to output initial feature maps at multiple scales; inputting the initial feature maps into the corresponding encoding branches of the feature encoder to output enhanced feature maps; and inputting the enhanced feature maps into a detection head to output the battery casing surface defect classification result and bounding box coordinates. This invention, by combining frequency domain analysis and a dynamic attention mechanism, effectively enhances the model's feature extraction capability and anti-interference capability for minute defects on the battery casing surface, providing a high-precision and robust solution for industrial quality inspection.

Description

Technical Field

[0001] This invention belongs to the technical field of battery defect detection, specifically relating to a method for detecting surface defects in battery casings based on frequency domain enhancement and dynamic region attention.

Background Technology

[0002] As a core component in key fields such as new energy vehicles and energy storage systems, the quality of the surface of industrial batteries directly affects the safety, reliability, and lifespan of the products. Surface defects such as scratches, dents, stains, and uneven plating not only affect aesthetics but can also lead to serious safety hazards such as internal short circuits and leakage. Therefore, achieving rapid and accurate detection of battery casing surface defects on automated production lines has crucial industrial value.

[0003] Currently, deep learning-based visual inspection technology has become the mainstream method for surface defect detection. However, battery casing defect detection still faces several severe challenges. First, defects are small in size, varied in shape, and have low contrast with complex background textures, making it difficult for traditional convolutional neural networks to capture their subtle features. Second, factors such as changes in lighting, scale variation, and mechanical vibration in the production environment can easily lead to false positives or false negatives. Third, models trained in specific scenarios are difficult to transfer directly to the inspection of other production lines or different battery models.

Summary of the Invention

[0004] The purpose of this invention is to address the above-mentioned shortcomings in the prior art by providing a battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention, so as to solve the problems of small defects, complex backgrounds, and weak model generalization ability in the existing industrial battery casing surface defect detection.

[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows: A method for detecting surface defects in battery casings based on frequency domain enhancement and dynamic region attention includes the following steps:
S1. Obtain an image of the battery casing surface;
S2. Construct a defect detection model, which includes a backbone network, a feature encoder, and a detection head;
S3. Input the battery casing surface image into the backbone network, and output initial feature maps at multiple scales through multiple cascaded mixing stages and dynamic region segmentation and phase-shift convolution modules;
S4. Input the initial feature maps into the corresponding encoding branches of the feature encoder, and output enhanced feature maps through the learnable residual high-frequency perceptron module;
S5. Input the enhanced feature maps into the detection head, and output the classification results of surface defects on the battery casing and the bounding box coordinates.

[0006] Furthermore, in S3, the dynamic region segmentation and phase-shift convolution module is embedded between adjacent mixing stages. It includes a phase-shift convolution module and a region attention module. The phase-shift convolution module performs phase-shift convolution processing on the output features of the corresponding mixing stage, and the region attention module performs dynamic region segmentation and region feature aggregation processing on the features processed by the phase-shift convolution module.

[0007] Furthermore, the phase-shift convolution module uses four parallel asymmetrically padded convolutions to extract input features. The padding parameters of the four convolutions are P(1,0,0,3), P(0,3,0,1), P(0,1,3,0), and P(3,0,1,0), respectively, and the convolution kernel sizes are 1×3, 3×1, 1×3, and 3×1. The output feature maps of the four convolutions are concatenated along the channel dimension and then normalized in dimension by a 2×2 convolution without padding to obtain the feature map after phase-shift convolution processing:

$$Y^{(h_2, w_2, c_2)} = \mathrm{Act}\left(\mathrm{BN}\left(X'^{(h', w', 4c')} \otimes W^{(2,2,c_2)}\right)\right)$$

In the formula, $Y^{(h_2, w_2, c_2)}$ is the output feature map of the phase-shift convolution module; $h_2$ and $w_2$ are the preset height and width of the final output feature map; $c_2$ is the number of channels of the final output feature map; $\mathrm{Act}$ is the activation function; $\mathrm{BN}$ is batch normalization; $X'$ is the concatenated result of the four phase-shift convolutions; $h'$, $w'$, and $c'$ are the height, width, and number of channels of the output feature map after the first layer of phase-shift convolution; $W^{(2,2,c_2)}$ is the 2×2 convolution kernel; and $\otimes$ is the convolution operation.

The number of parameters in the phase-shift convolution module is expressed as follows:

$$P_{\mathrm{PConv}} = 3 c_1 c_2 + 4 c_2^2$$

In the formula, $P_{\mathrm{PConv}}$ is the number of parameters in the phase-shift convolution module, and $c_1$ is the number of channels of the input tensor of the phase-shift convolution module.

[0008] Furthermore, the region attention module performs dynamic region segmentation and region feature aggregation on the features processed by the phase-shift convolution module, specifically including: (1) flattening the features processed by the phase-shift convolution module; (2) calculating the similarity between the flattened features and the expanded region center parameter matrix to obtain the similarity matrix; (3) taking the index of the maximum value of the similarity matrix along the region dimension, so that each pixel is assigned a unique region index and a region index matrix is generated; (4) based on the region mask, aggregating the pixel features in each region by weighted average to obtain region-level features; (5) updating the region-level features using a multi-head attention mechanism and, using the region index matrix, mapping the updated region-level features back to the pixel level through index mapping to obtain the output features of the dynamic region segmentation and phase-shift convolution module.

[0009] Furthermore, based on the region mask, the pixel features within each region are weighted and averaged to obtain the region-level features, which are represented as follows:

$$R_{b,m,c} = \frac{\sum_{n=1}^{N} \omega_{b,n,m}\, F_{b,n,c}}{\sum_{n=1}^{N} \omega_{b,n,m} + \epsilon}$$

In the formula, $R_{b,m,c}$ is the region-level feature; $N$ is the total number of spatial pixels; $n$ is the pixel index; $\omega_{b,n,m}$ is the region mask weight; $m$ is the region index; $b$ is the sample index; $c$ is the channel index; $\epsilon$ is a very small constant; and $F_{b,n,c}$ is the pixel feature value of the $n$-th pixel on the $c$-th channel in the $b$-th sample.

[0010] Furthermore, using the region index matrix, the updated region-level features are mapped back to the pixel level through index mapping. The updated region representation $\hat{R}$ is propagated back to each pixel based on the region assignment $A$:

$$G_{b,n} = \hat{R}_{b,\,A_{b,n}}$$

In the formula, $G$ is the output feature tensor, and $\hat{R}_{b,\,A_{b,n}}$ denotes the feature vector indexed from the region representation $\hat{R}$ by batch $b$ and region $A_{b,n}$. The output tensor $G \in \mathbb{R}^{B \times N \times C}$ is a three-dimensional real tensor, where $B$ is the batch size, $C$ is the number of feature channels, and $N$ is the total number of spatial pixels. Finally, by applying the output linear transformation, the module output features are obtained:

$$O = \mathrm{Linear}(G)$$

The output is rearranged back to a spatial format to obtain:

$$Z = \mathrm{Rearrange}(O)$$

In the formula, $O$ is the feature processed by the linear layer; $\mathrm{Linear}$ represents a linear transformation operation; $Z$ is the output in the rearranged feature map format; $\mathrm{Rearrange}$ indicates the operation of rearranging back into spatial form; $H$ is the height and $W$ is the width.

[0011] Furthermore, in S4, each of the coding branches includes a channel projection layer, a learnable residual high-frequency perceptron module, a Transformer encoder, and a feature pyramid and path aggregation network configured sequentially. The initial feature map is sequentially processed by channel projection, feature enhancement and fusion, global context modeling, and bidirectional fusion of multi-scale features to obtain the enhanced feature map.

[0012] Furthermore, the learnable residual high-frequency perceptron module performs feature enhancement and fusion processing, specifically including:
The first stage: frequency domain filtering and feature extraction. A two-dimensional discrete cosine transform is performed on the initial feature map after channel projection to obtain the frequency domain spectrum; an ideal rectangular high-pass filter mask is used to perform high-pass filtering on the frequency domain spectrum to obtain the high-frequency spectrum; then an inverse two-dimensional discrete cosine transform is performed on the high-frequency spectrum to convert it back to the spatial domain to obtain the high-frequency response feature map.
The second stage: dual-path attention weight generation. Based on the high-frequency response feature map, two complementary attention weights are generated through two parallel paths, namely channel attention weights and spatial attention weights.
The third stage: adaptive feature fusion and enhancement. The channel attention weights and spatial attention weights are each multiplied element-wise with the input of the learnable residual high-frequency perceptron module to obtain the channel enhancement features and spatial enhancement features. The sum of the two features is then smoothed by a 3×3 convolution to obtain the residual features. Finally, a learnable scalar parameter β is introduced to weight and fuse the residual features with the input of the learnable residual high-frequency perceptron module to obtain the enhanced feature map.

[0013] Furthermore, in the second stage, the channel attention weights $W_c$ are computed from the high-frequency response feature map $X_{hf}$ using global average pooling (GAP), global max pooling (GMP), a multilayer perceptron (MLP), and the ReLU function. The spatial attention weights are represented as follows:

$$W_s = \mathrm{Sigmoid}\left(\mathrm{Conv}_{1\times 1}(X_{hf})\right)$$

In the formula, $W_s$ is the spatial attention weights, $X_{hf}$ is the high-frequency response feature map, and $\mathrm{Conv}_{1\times 1}$ is a 1×1 convolutional layer.

[0014] Furthermore, in the third stage, a learnable scalar parameter β is introduced to weight and fuse the residual features with the input of the learnable residual high-frequency perceptron module, resulting in an enhanced feature map, which is represented as follows:

$$X_{out} = X + \beta \cdot X_{res}$$

In the formula, $X_{out}$ is the enhanced feature map; $X$ is the initial feature map after channel projection, that is, the original input feature map of the learnable residual high-frequency perceptron module; and $X_{res}$ represents the residual features.

[0015] The battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention provided by this invention has the following beneficial effects: By combining frequency domain analysis with a dynamic attention mechanism, the method enhances the model's ability to extract features of minute defects on the surface of the battery casing and to resist interference, providing a high-precision and robust solution for industrial quality inspection.

[0016] In the backbone network, this invention embeds a module based on dynamic region partitioning and phase-shift convolution (DRP-PConv-FA), which enhances the model's robustness to spatial transformations and scale changes through the combination of phase shift and region attention.

[0017] In the feature encoding network, this invention embeds a learnable residual high-frequency perceptron (LR-HFP) module. By integrating discrete cosine transform frequency domain filtering, a dual-path attention mechanism, and a learnable adaptive residual connection, it significantly enhances the ability to represent small defect features.

Description of the Drawings

[0018] Figure 1 is a flowchart of the battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention in the embodiment.

[0019] Figure 2 is a network flow diagram of the defect detection model in the embodiment.

[0020] Figure 3 is a network structure diagram of the dynamic region partitioning and phase-shift convolution module in the embodiment.

[0021] Figure 4 is a network structure diagram of the learnable residual high-frequency perceptron module in the embodiment.

[0022] Figure 5 is a distribution diagram of the defect dataset in the embodiment.

Detailed Implementation

[0023] The specific embodiments of the present invention are described below to enable those skilled in the art to understand the present invention. However, it should be understood that the present invention is not limited to the scope of the specific embodiments. For those skilled in the art, various changes are obvious as long as they are within the spirit and scope of the present invention as defined and determined by the appended claims. All inventions utilizing the concept of the present invention are protected.

[0024] This embodiment provides a battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention. It outperforms existing methods in both detection accuracy and robustness, offering an efficient and reliable solution for battery surface defect detection. Referring to Figure 1, the method specifically includes the following:
S1. Obtain an image of the battery casing surface;
S2. Construct a defect detection model, which includes a backbone network, a feature encoder, and a detection head. The specific network structure is shown in Figure 2;
S3. Input the battery casing surface image into the backbone network, and output initial feature maps at multiple scales through multiple cascaded mixing stages and dynamic region partitioning and phase-shift convolution modules.
Current battery casing defect detection methods often suffer from false positives or false negatives due to changes in illumination, scale variation, and spatial deformation. Traditional convolutional neural networks lack robustness to complex textures and minute defects, and their multi-scale feature fusion efficiency is low. Therefore, a module that can adapt to spatial transformations and enhance feature representation capabilities is needed. This embodiment proposes the DRP-PConv-FA module, which combines Dynamic Region Partitioning (DRP) with Phase-Shift Convolution (PConv) to achieve efficient aggregation of multi-scale features, improving the model's sensitivity to defects and its generalization ability.

[0025] Specifically, in this embodiment, the dynamic region partitioning and phase-shift convolution module DRP-PConv-FA is embedded between adjacent mixing stages. It includes the phase-shift convolution module PConv and the area attention module AreaAttention. The phase-shift convolution module performs phase-shift convolution processing on the output features of the corresponding mixing stage, and the area attention module performs dynamic region partitioning and area feature aggregation processing on the features processed by the phase-shift convolution module.

[0026] In some embodiments, referring to Figure 3, the phase-shift convolution module uses four parallel asymmetrically padded convolutions to extract input features. The padding parameters for the four convolutions are P(1,0,0,3), P(0,3,0,1), P(0,1,3,0), and P(3,0,1,0), respectively, with corresponding kernel sizes of 1×3, 3×1, 1×3, and 3×1. The first layer of the phase-shift convolution module PConv performs the following convolutions:

$$X_1^{(h', w', c')} = \mathrm{Act}\left(\mathrm{BN}\left(X_{P(1,0,0,3)}^{(h_1, w_1, c_1)} \otimes W_1^{(1,3,c')}\right)\right)$$
$$X_2^{(h', w', c')} = \mathrm{Act}\left(\mathrm{BN}\left(X_{P(0,3,0,1)}^{(h_1, w_1, c_1)} \otimes W_2^{(3,1,c')}\right)\right)$$
$$X_3^{(h', w', c')} = \mathrm{Act}\left(\mathrm{BN}\left(X_{P(0,1,3,0)}^{(h_1, w_1, c_1)} \otimes W_3^{(1,3,c')}\right)\right)$$
$$X_4^{(h', w', c')} = \mathrm{Act}\left(\mathrm{BN}\left(X_{P(3,0,1,0)}^{(h_1, w_1, c_1)} \otimes W_4^{(3,1,c')}\right)\right)$$

In the formula, $X_1$ to $X_4$ are the output feature maps of the first to fourth paths; $X_{P(\cdot)}$ is the first-layer input feature map of the corresponding path after asymmetric padding; $W_1$ and $W_3$ are 1×3 convolution kernels, and $W_2$ and $W_4$ are 3×1 convolution kernels; $h_1$, $w_1$, and $c_1$ are the height, width, and number of channels of the input tensor $X$; $h'$, $w'$, and $c'$ are the height, width, and number of channels of the output feature map after the first layer of phase-shift convolution. P(1,0,0,3) pads the top edge of the first path with 1 pixel and the right edge with 3 pixels; P(0,3,0,1) pads the bottom edge of the second path with 3 pixels and the right edge with 1 pixel; P(0,1,3,0) pads the bottom edge of the third path with 1 pixel and the left edge with 3 pixels; P(3,0,1,0) pads the top edge of the fourth path with 3 pixels and the left edge with 1 pixel. $\mathrm{Act}$ is the activation function, $\mathrm{BN}$ is batch normalization, and $\otimes$ is the convolution operation.

[0027] The height $h'$, width $w'$, and number of channels $c'$ of the output feature map after the first layer of phase-shift convolution are related to the input feature map as follows:

$$h' = \frac{h_1}{s} + 1, \qquad w' = \frac{w_1}{s} + 1, \qquad c' = \frac{c_2}{4}$$

In the formula, $h_1$ and $w_1$ are the height and width of the input feature map; $c_2$ is the number of channels in the final output feature map of the phase-shift convolution module; $s$ is the convolution stride.

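[0028] The results of the first layer of phase-shift convolutions are concatenated, and the output is represented as follows:

$$X'^{(h', w', 4c')} = \mathrm{Cat}\left(X_1^{(h', w', c')}, X_2^{(h', w', c')}, X_3^{(h', w', c')}, X_4^{(h', w', c')}\right)$$

In the formula, $X'$ is the concatenated result (the concatenated tensor) of the four phase-shift convolution outputs, and $\mathrm{Cat}$ indicates the concatenation operation.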

[0029] Finally, the concatenated tensor is normalized by a 2×2 convolution kernel without padding, so that the height and width of the output feature map are adjusted to the preset values $h_2$ and $w_2$. This allows PConv to be interchanged with Conv layers, and the 2×2 convolution also serves as a channel attention mechanism that weighs the contributions of the different convolution directions. The final output $Y$ is calculated as follows:

$$Y^{(h_2, w_2, c_2)} = \mathrm{Act}\left(\mathrm{BN}\left(X'^{(h', w', 4c')} \otimes W^{(2,2,c_2)}\right)\right)$$

In the formula, $Y^{(h_2, w_2, c_2)}$ is the output feature map of the phase-shift convolution module, and $W^{(2,2,c_2)}$ is the 2×2 convolution kernel.
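The shape bookkeeping of paragraphs [0026] to [0029] can be checked with a small NumPy sketch. It is only an illustration under stated assumptions: stride 1, no batch normalization or activation, random weights, and the padding tuples read as P(top, bottom, left, right), which is the reading consistent with the edge descriptions in [0026]. The function names `conv2d_valid` and `pconv` are hypothetical and do not come from the patent.

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive unpadded cross-correlation. x: (C_in, H, W); w: (C_out, C_in, kh, kw)."""
    windows = np.lib.stride_tricks.sliding_window_view(x, w.shape[2:], axis=(1, 2))
    # windows has shape (C_in, H_out, W_out, kh, kw)
    return np.einsum("ihwkl,oikl->ohw", windows, w)

def pconv(x, branch_weights, fuse_weight):
    """Four asymmetrically padded branches, channel concatenation, then a 2x2 conv.

    Assumption: BN and the activation are omitted, and the stride is 1.
    """
    # P(top, bottom, left, right) for the four paths
    pads = [(1, 0, 0, 3), (0, 3, 0, 1), (0, 1, 3, 0), (3, 0, 1, 0)]
    outs = []
    for (top, bottom, left, right), w in zip(pads, branch_weights):
        padded = np.pad(x, ((0, 0), (top, bottom), (left, right)))
        outs.append(conv2d_valid(padded, w))      # each branch: (c', H + 1, W + 1)
    cat = np.concatenate(outs, axis=0)            # (4c', H + 1, W + 1)
    return conv2d_valid(cat, fuse_weight)         # (c2, H, W)

rng = np.random.default_rng(0)
c1, c2, H, W = 3, 8, 6, 7
c_branch = c2 // 4
kernel_shapes = [(1, 3), (3, 1), (1, 3), (3, 1)]
branch_weights = [rng.standard_normal((c_branch, c1, kh, kw)) for kh, kw in kernel_shapes]
fuse_weight = rng.standard_normal((c2, 4 * c_branch, 2, 2))
y = pconv(rng.standard_normal((c1, H, W)), branch_weights, fuse_weight)
print(y.shape)
```

With stride 1, every branch produces a map one pixel larger than the input in each spatial direction, and the unpadded 2×2 convolution brings the result back to the input resolution, which is why the module can stand in for an ordinary convolution layer.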

[0030] The effectiveness of the receptive field decreases outward, similar to a Gaussian distribution. Furthermore, the smaller the target, the more concentrated its features are, which highlights the importance of central features. Notably, PConv utilizes grouped convolutions, significantly increasing the receptive field while minimizing the number of parameters. The number of parameters for a standard convolution is calculated as follows:

$$P_{\mathrm{Conv}} = c_2 \times c_1 \times k$$

In the formula, $P_{\mathrm{Conv}}$ is the number of weight parameters in a standard convolution, and $k$ is the kernel size; here a 3×3 convolution is used, so $k$ is 9. If the number of channels $c_2$ of the final output feature map of the phase-shift convolution module is equal to the number of channels $c_1$ of the input tensor, the number of parameters of the standard convolution is $9c_1^2$. The number of parameters of PConv is calculated as follows:

$$P_{\mathrm{PConv}} = 3 c_1 c_2 + 4 c_2^2$$

In the formula, $P_{\mathrm{PConv}}$ is the number of parameters in the phase-shift convolution module.
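As a worked check of the parameter comparison, the following sketch counts weights for a 3×3 standard convolution and for the four-branch structure described above. It assumes bias-free layers, ignores batch normalization parameters, assumes each branch outputs one quarter of the final channels, and does not model any channel grouping; the helper names are hypothetical.

```python
def conv_params(c1, c2, k=9):
    """Weights of a standard convolution with k kernel elements (k = 9 for 3x3)."""
    return c2 * c1 * k

def pconv_params(c1, c2):
    """Weights of four 3-element branches plus the 2x2 fusing convolution.

    Assumption: each branch maps c1 input channels to c2 / 4 output channels.
    """
    c_branch = c2 // 4
    branches = 4 * (c_branch * c1 * 3)    # two 1x3 and two 3x1 kernels
    fuse = c2 * (4 * c_branch) * 4        # 2x2 kernel over the concatenated channels
    return branches + fuse

c = 64
print(conv_params(c, c), pconv_params(c, c))
```

Under these assumptions and with equal input and output channels, the counts reduce to 9c² for the standard convolution and 7c² for the four-branch module.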

[0031] In some embodiments, referring to Figure 3, the region attention module performs dynamic region segmentation and region feature aggregation on the features processed by the phase-shift convolution module. Specifically, it divides the feature map into multiple semantic regions, performs attention calculations at the region level, and then broadcasts the information back to the pixel level to achieve efficient feature enhancement. This includes the following:
(1) Flatten the features processed by the phase-shift convolution module. The flattened features are $F \in \mathbb{R}^{B \times N \times C}$, a three-dimensional real tensor, where $B$ is the batch size, $C$ is the number of feature channels, and $N$ is the total number of spatial pixels;
(2) Calculate the similarity between the flattened features and the expanded region center parameter matrix to obtain the similarity matrix $S$. The core of region division lies in the learnable region center parameter matrix $P$, where $M$ is the preset number of regions. The similarity between pixel features and region centers is calculated from $F$ and $\tilde{P}$, where $\tilde{P}$ is the result of expanding $P$ along the batch dimension.

[0032] This step uses matrix multiplication to calculate the similarity between each pixel and each region center. This enables dynamic, content-based partitioning rather than fixed grid partitioning.

[0033] (3) Take the index of the maximum value of the similarity matrix along the region dimension to assign a unique region index to each pixel and generate the region index matrix $A$:

$$A_{b,n} = \arg\max_{m} S_{b,n,m}$$

Specifically, a region mask is used for each region $m \in [0, M-1]$. Here, $m$ is an index variable used to iterate over the regions and $M$ is the total number of regions; when $M$ is 16, $m$ takes values from 0 to 15.

[0034] (4) For each region $m$, aggregate the pixel features assigned to this region and calculate the region representation. This process uses the region's mask to compute a weighted average of the features of all pixels belonging to the region, as follows:

$$R_{b,m,c} = \frac{\sum_{n=1}^{N} \omega_{b,n,m}\, F_{b,n,c}}{\sum_{n=1}^{N} \omega_{b,n,m} + \epsilon}$$

In the formula, $R_{b,m,c}$ is the region-level feature; $n$ is the pixel index; $\omega_{b,n,m}$ is the region mask weight; $b$ is the sample index; $c$ is the channel index; $\epsilon$ is a very small constant; and $F_{b,n,c}$ is the pixel feature value of the $n$-th pixel on the $c$-th channel in the $b$-th sample.

[0035] (5) Update the region-level features with a multi-head attention mechanism and, using the region index matrix, map the updated region-level features back to the pixel level through index mapping to obtain the output features of the dynamic region partitioning and phase-shift convolution module. Specifically, all region representations are stacked into the region-level features $R$, and a multi-head attention mechanism is then used for inter-region information exchange. First, the region-level features $R$ are projected by learnable weight matrices to generate the query (Q), key (K), and value (V) vectors. Then, Q, K, and V are rearranged into multi-head form, and the attention weights $\alpha$ are calculated as follows:

$$\alpha = \mathrm{Softmax}\left(\frac{Q K^{\top}}{\sqrt{d}}\right)$$

where $\sqrt{d}$ is the scaling factor, $d$ is the dimension of each attention head, $d = C / h$, and $h$ is the number of heads for multi-head attention. The attention output is:

$$O_{attn} = \alpha V$$

The outputs of the heads are merged:

$$\hat{R} = \mathrm{Rearrange}(O_{attn})$$

In the formula, $\mathrm{Rearrange}$ indicates the rearrangement operation. The updated region representation $\hat{R}$ is propagated back to each pixel based on the region index matrix $A$:

$$G_{b,n} = \hat{R}_{b,\,A_{b,n}}$$

where $\hat{R}_{b,\,A_{b,n}}$ denotes the feature vector indexed from the region representation $\hat{R}$ by batch $b$ and region $A_{b,n}$.

[0036] Finally, apply the output linear transformation:

$$O = \mathrm{Linear}(G)$$

Rearrange the output back to a spatial format:

$$Z = \mathrm{Rearrange}(O)$$

where $H$ is the height, $W$ is the width, $O$ is the feature processed by the linear layer, $\mathrm{Linear}$ represents a linear transformation operation, and $Z$ is the output in the rearranged feature map format.
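Steps (1) to (5) and the final projection can be traced end to end with a NumPy sketch. This is a minimal illustration, not the patented implementation: the region centers are assumed to have shape (M, C), the similarity is taken as a plain dot product, the mask is a hard one-hot mask derived from the region index, all weights are random, and the function name `region_attention` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def region_attention(feat, centers, wq, wk, wv, w_out, heads, eps=1e-6):
    """feat: (B, C, H, W); centers: (M, C), an assumed layout for the region centers."""
    B, C, H, W = feat.shape
    M = centers.shape[0]
    d = C // heads
    flat = feat.reshape(B, C, H * W).transpose(0, 2, 1)           # (1) flatten to (B, N, C)
    sim = flat @ centers.T                                         # (2) similarity, (B, N, M)
    index = sim.argmax(axis=-1)                                    # (3) region index per pixel
    mask = (index[..., None] == np.arange(M)).astype(flat.dtype)   # one-hot mask, (B, N, M)
    regions = np.einsum("bnm,bnc->bmc", mask, flat)
    regions = regions / (mask.sum(axis=1)[..., None] + eps)        # (4) masked average, (B, M, C)

    def split(x):                                                  # (B, M, C) -> (B, heads, M, d)
        return x.reshape(B, M, heads, d).transpose(0, 2, 1, 3)

    q, k, v = split(regions @ wq), split(regions @ wk), split(regions @ wv)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))       # (B, heads, M, M)
    updated = (attn @ v).transpose(0, 2, 1, 3).reshape(B, M, C)    # (5) merge the heads
    gathered = updated[np.arange(B)[:, None], index]               # back to pixels, (B, N, C)
    out = (gathered @ w_out).transpose(0, 2, 1).reshape(B, C, H, W)
    return out, index, updated

rng = np.random.default_rng(0)
B, C, H, W, M = 2, 8, 3, 5, 4
feat = rng.standard_normal((B, C, H, W))
centers = rng.standard_normal((M, C))
wq, wk, wv, w_out = (rng.standard_normal((C, C)) for _ in range(4))
out, index, updated = region_attention(feat, centers, wq, wk, wv, w_out, heads=2)
print(out.shape, index.shape)
```

Because attention runs over M region vectors instead of N pixels, the attention matrix has size M×M per head. NumPy broadcasting stands in here for the explicit expansion of the center matrix along the batch dimension.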

[0037] S4. Input the initial feature maps into the corresponding encoding branches of the feature encoder, and output enhanced feature maps through the learnable residual high-frequency perceptron module. Referring to Figure 2, each encoding branch includes a channel projection layer, a learnable residual high-frequency perceptron module, a Transformer encoder, and a feature pyramid and path aggregation network configured sequentially. The initial feature map is sequentially processed by channel projection, feature enhancement and fusion, global context modeling, and bidirectional fusion of multi-scale features to obtain the enhanced feature map.

[0038] In some embodiments, defects such as scratches, cracks, and particle protrusions on the battery casing surface typically manifest as local abrupt changes, edges, and discontinuous textures in the image. These features primarily correspond to high-frequency components in the frequency domain, while smooth, uniform background areas are mainly distributed at low frequencies. Therefore, this embodiment proposes an LR-HFP module, which suppresses low-frequency background information in the frequency domain, enhances high-frequency features related to defects, and uses this "purified" feature map to guide the deep network, through an attention mechanism, to focus on suspected defect areas. This achieves accurate identification and localization of minute defects and addresses the insensitivity of traditional methods to weak defect features in complex backgrounds. Referring to Figure 4, the learnable residual high-frequency perceptron module performs feature enhancement and fusion processing, specifically including:

The first stage, frequency domain filtering and feature extraction, aims to separate the high-frequency components that are strongly correlated with defects from the input. A two-dimensional discrete cosine transform is performed on the initial feature map $X$ after channel projection to obtain the frequency domain spectrum $X_f$:

$$X_f = \mathrm{DCT2}(X)$$

In the formula, $\mathrm{DCT2}$ represents the two-dimensional discrete cosine transform. An ideal rectangular high-pass filter mask is then used to perform high-pass filtering on the frequency domain spectrum. In the DCT spectrum, low-frequency energy is concentrated in the upper left corner, representing the smooth background of the image. The spectral coefficients in this region are set to zero to suppress the background, and the remaining high-frequency coefficients are retained to enhance defect edges and texture information, thus obtaining the high-frequency spectrum $X_f^{hf}$:

$$X_f^{hf} = X_f \odot M_{hp}$$

In the formula, $X_f^{hf}$ is the high-frequency spectrum, $M_{hp}$ is the high-pass filter mask, and $\odot$ indicates element-wise multiplication. Then, an inverse two-dimensional discrete cosine transform is performed on the high-frequency spectrum to convert it back to the spatial domain, obtaining the high-frequency response feature map $X_{hf}$:

$$X_{hf} = \mathrm{IDCT2}\left(X_f^{hf}\right)$$

In the formula, $\mathrm{IDCT2}$ is the inverse two-dimensional discrete cosine transform.

The second stage: dual-path attention weight generation. Using the purified high-frequency feature map $X_{hf}$, two complementary attention weights are generated through two parallel paths: the channel path and the spatial path.
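The first stage can be illustrated with a short NumPy sketch that builds an orthonormal DCT-II matrix, zeroes a rectangular block of low-frequency coefficients in the upper left corner, and transforms back. The `cutoff` parameter and the function names are hypothetical; the text above does not state how large the zeroed rectangle is.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix D: D @ x transforms a length-n signal, D.T inverts it."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0] /= np.sqrt(2.0)
    return d

def high_frequency_response(x, cutoff):
    """x: (C, H, W). Zero the top-left cutoff x cutoff block of 2-D DCT coefficients."""
    dh, dw = dct_matrix(x.shape[-2]), dct_matrix(x.shape[-1])
    spectrum = dh @ x @ dw.T                  # 2-D DCT applied to every channel
    mask = np.ones(x.shape[-2:])
    mask[:cutoff, :cutoff] = 0.0              # ideal rectangular high-pass mask
    return dh.T @ (spectrum * mask) @ dw      # inverse 2-D DCT

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
x_hf = high_frequency_response(x, cutoff=2)
print(x_hf.shape)
```

A flat (constant) feature map has all of its energy in the single top-left coefficient, so any cutoff of at least 1 removes it entirely, which matches the stated goal of suppressing smooth background.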

[0039] The channel path evaluates the importance of different feature channels for defect detection. First, global average pooling (GAP) and global max pooling (GMP) are performed simultaneously on the high-frequency feature map $X_{hf}$ to obtain two channel descriptors that aggregate global information. These descriptors capture the global average response and the maximum response of the features, respectively; GMP is particularly important for local anomalies such as defects. Next, the two descriptors are fed into a shared small multilayer perceptron (MLP), where ReLU and Sigmoid activations are applied to generate the channel attention weights $W_c$.

The spatial path aims to locate spatial regions where defects may occur. First, a 1×1 convolution is applied to the high-frequency feature map $X_{hf}$ to compress and integrate the channel dimension and obtain a spatial saliency map. Then, the saliency map is normalized into a spatial attention mask $W_s$ using the Sigmoid function:

$$W_s = \mathrm{Sigmoid}\left(\mathrm{Conv}_{1\times 1}(X_{hf})\right)$$

In the formula, $W_s$ is the spatial attention weights and $\mathrm{Conv}_{1\times 1}$ is a 1×1 convolutional layer.

The third stage: adaptive feature fusion and enhancement. The channel attention weights and spatial attention weights are each multiplied element-wise with the input of the learnable residual high-frequency perceptron module to obtain the channel enhancement features and spatial enhancement features. The sum of the two is then smoothed by a 3×3 convolution to obtain the residual features. Finally, a learnable scalar parameter β is introduced to weight and fuse the residual features with the input of the learnable residual high-frequency perceptron module to obtain the enhanced feature map. More specifically, first, the channel attention weights $W_c$ and the spatial attention weights $W_s$ are each multiplied element-wise with the original input $X$ to obtain the channel enhancement features $X_c$ and the spatial enhancement features $X_s$:

$$X_c = W_c \odot X, \qquad X_s = W_s \odot X$$

Second, the channel enhancement features $X_c$ and the spatial enhancement features $X_s$ are added together to form a preliminary fused feature. A 3×3 convolutional layer then further integrates and spatially smooths this fused feature, extracting more robust local contextual information to obtain the residual features $X_{res}$:

$$X_{res} = \mathrm{Conv}_{3\times 3}\left(X_c + X_s\right)$$

In the formula, $\mathrm{Conv}_{3\times 3}$ is a 3×3 convolution. Finally, a learnable scalar parameter β is introduced to weight and fuse the residual features with the input of the learnable residual high-frequency perceptron module, resulting in the enhanced feature map, represented as:

$$X_{out} = X + \beta \cdot X_{res}$$

In the formula, $X_{out}$ is the enhanced feature map. This residual structure ensures that even if the module fails to learn effective enhancements, the output will not be inferior to the original input, which keeps the training process stable. The network can adaptively control the enhancement intensity by optimizing β.
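The second and third stages can be sketched in NumPy as follows. Several details are assumptions made only for illustration: the two pooled descriptors are combined by summing the shared MLP outputs before the Sigmoid (the text does not state how they are combined), the MLP has one hidden layer, the 3×3 convolution uses zero padding to preserve the spatial size, and all weights are random. The function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv3x3_same(x, w):
    """x: (C_in, H, W); w: (C_out, C_in, 3, 3); zero padding keeps H and W (assumed)."""
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3), axis=(1, 2))
    return np.einsum("ihwkl,oikl->ohw", windows, w)

def lr_hfp_fuse(x, x_hf, w1, w2, w_spatial, w_smooth, beta):
    """x, x_hf: (C, H, W). Returns x + beta * residual."""
    gap = x_hf.mean(axis=(1, 2))                       # global average pooling, (C,)
    gmp = x_hf.max(axis=(1, 2))                        # global max pooling, (C,)

    def mlp(v):                                        # shared MLP with a ReLU hidden layer
        return w2 @ np.maximum(w1 @ v, 0.0)

    w_c = sigmoid(mlp(gap) + mlp(gmp))                 # assumed: descriptors are summed
    w_s = sigmoid(np.einsum("c,chw->hw", w_spatial, x_hf))   # 1x1 conv to one channel
    fused = w_c[:, None, None] * x + w_s[None] * x     # channel plus spatial enhancement
    residual = conv3x3_same(fused, w_smooth)           # 3x3 smoothing
    return x + beta * residual

rng = np.random.default_rng(0)
C, H, W, hidden = 8, 6, 6, 2
x = rng.standard_normal((C, H, W))
x_hf = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((hidden, C))
w2 = rng.standard_normal((C, hidden))
w_spatial = rng.standard_normal(C)
w_smooth = rng.standard_normal((C, C, 3, 3)) * 0.1
out = lr_hfp_fuse(x, x_hf, w1, w2, w_spatial, w_smooth, beta=0.5)
print(out.shape)
```

Setting β to zero returns the input unchanged, which is the property the residual structure relies on: the module can only add a learned correction on top of the original features.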

[0040] S5. Input the enhanced feature maps of the respective scales into the detection head, and output the classification results of the surface defects on the battery casing and the coordinates of the bounding boxes.

[0041] To further verify the effectiveness of the method, this embodiment includes corresponding verification experiments.

Dataset: The experiments in this embodiment were conducted primarily on the self-made battery surface defect dataset Defect_Battery. In addition, the proposed method was evaluated on the public steel plate surface defect dataset GC10-DET, whose defects are similar to those in the self-made dataset, to test its performance more comprehensively and to allow a fairer comparison. GC10-DET is a surface defect dataset collected in real-world industrial environments. It contains ten types of surface defects: punching hole (Pu), welding line (Wl), crescent gap (Cg), water spot (Ws), oil spot (Os), silk spot (Ss), inclusion (In), rolled pit (Rp), crease (Cr), and waist folding (Wf). All defects were collected on the surface of steel plates. The dataset includes 2292 grayscale images, each 2048×1000 pixels in size, and is randomly divided into training and test sets in an 8:2 ratio.

[0042] In this embodiment, the self-made dataset divides battery surface defects into five types: tear (LS), pressure tear (YS), bulge (GB), crack (LK), and U-shape (UX). The original self-made battery dataset, Defect_Battery, contains 4050 images, randomly divided in an 8:2 ratio: 3240 images form the training set and 810 images form the test set. The experimental environment is shown in Figure 5.
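As a small illustration of the 8:2 division described above, the following sketch shuffles a list of image identifiers and cuts it at 80%. The file names, the seed, and the helper name `split_dataset` are hypothetical; the source does not state how the split was actually produced.

```python
import random

def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle a copy of the items and cut it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is repeatable
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical identifiers standing in for the 4050 Defect_Battery images.
image_ids = [f"img_{i:04d}" for i in range(4050)]
train_ids, test_ids = split_dataset(image_ids)
print(len(train_ids), len(test_ids))
```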

[0043] The experiments in this embodiment were implemented with the PyTorch deep learning framework and run on the Ubuntu operating system. The DEIM-D-FINE-S network model served as the baseline for improvement and training, and no pre-trained weights were used during training.

[0044] Evaluation indicators: To verify the model's performance, this embodiment uses three evaluation metrics as assessment standards: precision, recall, and mean average precision (mAP). These metrics are all calculated from the confusion matrix. As shown in Table 1, TP indicates that the original data is a positive sample and the model correctly predicts it as positive. FN indicates that the original data is a positive sample, but the model incorrectly predicts it as negative. TN indicates that the original data is a negative sample and the model correctly predicts it as negative. FP indicates that the original data is a negative sample, but the model incorrectly predicts it as positive.

[0045] Table 1 Confusion Matrix

                   | Predicted positive | Predicted negative
Actual positive    | TP                 | FN
Actual negative    | FP                 | TN

Precision (P) measures the accuracy of the detection results; it is the proportion of actual positive samples among those predicted as positive. A higher precision indicates fewer false positives and higher detection accuracy. Recall (R) measures the model's coverage of positive samples; it is the proportion of actual positive samples that are correctly detected. A higher recall indicates that the model captures more of the true instances, meaning higher sensitivity to the target object. Mean average precision (mAP) considers both precision and recall for each class under different IoU thresholds. It is calculated by averaging the average precision (AP) over all classes (assuming C classes). A higher mAP indicates better overall performance in object detection tasks, with improved precision and recall across all classes. The formulas for calculating these three metrics are as follows:

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$mAP = \frac{1}{C}\sum_{i=1}^{C} AP_i$$

Experimental results: To comprehensively evaluate the performance of the detection method proposed in this invention, systematic comparative experiments and ablation studies were conducted on the self-made dataset Defect_Battery and the public dataset GC10-DET. The experimental results show that the complete model proposed in this invention achieves significantly better performance than the baseline model DEIM-D-FINE-S on the self-made battery defect dataset.
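The three metrics can be computed from confusion-matrix counts as in the sketch below. The counts and the per-class AP values are invented purely to exercise the functions and are not results from this embodiment; how AP itself is obtained from the precision-recall curve is outside the scope of the sketch.

```python
def precision(tp, fp):
    """Proportion of predicted positives that are actual positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Proportion of actual positives that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def mean_average_precision(ap_per_class):
    """Average of the per-class AP values over all C classes."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Illustrative counts only (not measured values).
tp, fp, fn = 80, 20, 10
p = precision(tp, fp)
r = recall(tp, fn)

# Illustrative per-class AP values for five placeholder classes.
ap = {"class_1": 0.9, "class_2": 0.8, "class_3": 0.7, "class_4": 0.6, "class_5": 0.5}
m_ap = mean_average_precision(ap)
print(p, r, m_ap)
```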

[0046] Through systematic ablation experiments, the contributions of the DRP-PConv-FA and LR-HFP modules were analyzed in depth. Introducing the DRP-PConv-FA module alone significantly improved mAP, mainly due to its effective enhancement of the model's adaptability to spatial transformations and scale changes through dynamic region partitioning and phase-shifted convolution. Introducing the LR-HFP module alone also showed significant improvement, demonstrating the advantages of frequency domain enhancement and dual-path attention mechanisms in capturing minute defect features. When the two modules work together, a significant synergistic effect is achieved, resulting in optimal performance.

[0047] Cross-domain evaluation on the public steel plate defect dataset GC10-DET further validates the generalization ability of the proposed method. This result demonstrates that the feature enhancement strategy proposed in this invention is not only applicable to the detection of specific types of battery defects, but can also be transferred effectively to defect detection tasks on other industrial surfaces.

[0048] In summary, this invention addresses the challenges of detecting surface defects in industrial battery casings, such as the small size of defects, complex backgrounds, and weak model generalization ability, and proposes an improved method. The core innovation of this method lies in the design of two collaborative modules: a DRP-PConv-FA module embedded in the backbone network and an LR-HFP module embedded in the feature encoder. Experimental results show that on the self-made Defect_Battery dataset, the proposed method significantly outperforms the benchmark model in terms of precision, recall, and mAP. Cross-domain evaluation on the public dataset GC10-DET further demonstrates the good generalization performance of this method. Ablation experiments verify the effectiveness of each module component, especially the frequency domain filtering technique, which plays a crucial role in improving the detection accuracy of small defects.

[0049] Although specific embodiments of the invention have been described in detail with reference to the accompanying drawings, this should not be construed as limiting the scope of protection of this patent. Various modifications and variations that can be made by a person skilled in the art without inventive effort within the scope described in the claims still fall within the scope of protection of this patent.

Claims

1. A method for detecting surface defects in battery casings based on frequency domain enhancement and dynamic region attention, characterized in that it includes the following steps:
S1. Obtain an image of the battery casing surface;
S2. Construct a defect detection model, which includes a backbone network, a feature encoder, and a detection head;
S3. Input the battery casing surface image into the backbone network, and output initial feature maps at multiple scales through multiple cascaded mixing stages and dynamic region segmentation and phase-shift convolution modules;
S4. Input the initial feature maps of the respective scales into the corresponding encoding branches of the feature encoder, and output enhanced feature maps through the learnable residual high-frequency perceptron module;
S5. Input the enhanced feature maps into the detection head, and output the classification results of the surface defects on the battery casing and the coordinates of the bounding boxes.

2. The battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention according to claim 1, characterized in that, in S3, the dynamic region segmentation and phase-shift convolution module is embedded between adjacent mixing stages and includes a phase-shift convolution module and a region attention module. The phase-shift convolution module performs phase-shift convolution processing on the output features of the corresponding mixing stage, and the region attention module performs dynamic region segmentation and region feature aggregation on the features processed by the phase-shift convolution module.

3. The battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention according to claim 2, characterized in that the phase-shift convolution module uses four parallel asymmetrically padded convolutions to extract the input features. The padding parameters of the four convolutions are P(1,0,0,3), P(0,3,0,1), P(0,1,3,0), and P(3,0,1,0), respectively, and the convolution kernel sizes are 1×3, 3×1, 1×3, and 3×1, respectively. The output feature maps of the four convolutions are concatenated along the channel dimension, and the dimensions are then normalized by a 2×2 unpadded convolution to obtain the feature map after phase-shift convolution processing. In the corresponding formula, the terms denote: the feature map output by the phase-shift convolution module; the preset height, the preset width, and the number of channels of the final output feature map of the module; the activation function; batch normalization; the concatenated output of the four phase-shift convolutions; the height, the width, and the number of channels of the output feature map after the first layer of phase-shift convolutions; the 2×2 convolution kernel; and the convolution operation. The number of parameters of the phase-shift convolution module is expressed by a further formula, in which the terms denote the number of parameters of the module and the number of channels of the input tensor of the module.

4. The battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention according to claim 2, characterized in that the region attention module performs dynamic region segmentation and region feature aggregation on the features processed by the phase-shift convolution module, specifically including:
flattening the features processed by the phase-shift convolution module;
calculating the similarity between the flattened features and the expanded region center parameter matrix to obtain a similarity matrix;
taking the index of the maximum value of the similarity matrix along the region dimension, so that each pixel is assigned a unique region index, and generating a region index matrix;
based on the region mask, aggregating the pixel features in each region by weighted averaging to obtain region-level features;
updating the region-level features using a multi-head attention mechanism and, in combination with the region index matrix, mapping the updated region-level features back to the pixel level through index mapping to obtain the output features of the dynamic region segmentation and phase-shift convolution module.

5. The battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention according to claim 4, characterized in that, based on the region mask, the pixel features within each region are aggregated by weighted averaging to obtain the region-level features, which are represented as follows:

$$R_{b,k,c} = \frac{\sum_{i=1}^{N} M_{b,i,k}\, x_{b,i,c}}{\sum_{i=1}^{N} M_{b,i,k} + \epsilon}$$

In the formula, $R$ denotes the region-level features, $N$ is the total number of spatial pixels, $i$ is the pixel index, $M$ denotes the region mask weights, $k$ is the region index, $b$ is the sample index, $c$ is the channel index, $\epsilon$ is a very small constant, and $x_{b,i,c}$ is the feature value of the $i$-th pixel on the $c$-th channel in the $b$-th sample.

6. The battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention according to claim 5, characterized in that, in combination with the region index matrix, the updated region-level features are mapped back to the pixel level through index mapping, that is, the updated region representation $R'$ is propagated back to each pixel according to the region assignment $A$:

$$G_{b,i} = R'\left[b,\ A_{b,i}\right]$$

In the formula, $G$ is the output feature tensor, and $R'[b, A_{b,i}]$ denotes the feature vector indexed from the updated region representation $R'$ by batch $b$ and region $A_{b,i}$. The output tensor $G$ belongs to a three-dimensional tensor space over the real number field, $G \in \mathbb{R}^{B \times N \times C}$, where $B$ is the batch size, $C$ is the number of feature channels, and $N$ is the total number of spatial pixels.

Finally, the module output features are obtained by applying the output linear transformation:

$$F_{lin} = \mathrm{Linear}(G)$$

and the output is rearranged back to a spatial format:

$$F_{out} = \mathrm{Rearrange}(F_{lin})$$

In the formulas, $F_{lin}$ denotes the features processed by the linear layer, $\mathrm{Linear}$ denotes the linear transformation operation, $F_{out}$ is the output in the rearranged feature map format, $\mathrm{Rearrange}$ denotes the operation of rearranging back into the spatial format, $H$ is the height, and $W$ is the width.

7. The battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention according to claim 1, characterized in that, in S4, each of the encoding branches includes a channel projection layer, a learnable residual high-frequency perceptron module, a Transformer encoder, and a feature pyramid and path aggregation network arranged in sequence. The initial feature map is processed in turn by channel projection, feature enhancement and fusion, global context modeling, and bidirectional fusion of multi-scale features to obtain the enhanced feature map.

8. The battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention according to claim 7, characterized in that, The learnable residual high-frequency perceptron module performs feature enhancement and fusion processing, specifically including: The first stage: frequency domain filtering and feature extraction; A two-dimensional discrete cosine transform is performed on the initial feature map after channel projection to obtain the frequency domain spectrum; an ideal rectangular high-pass filter mask is used to perform high-pass filtering on the frequency domain spectrum to obtain the high-frequency spectrum; then an inverse two-dimensional discrete cosine transform is performed on the high-frequency spectrum to convert it back to the spatial domain to obtain the high-frequency response feature map. Second stage: dual-path attention weight generation; Based on the high-frequency response feature map, two complementary attention weights are generated through two parallel paths, including channel attention weights and spatial attention weights. The third stage: adaptive feature fusion and enhancement; The channel attention weights and spatial attention weights are multiplied element-wise with the input of the learnable residual high-frequency perceptron module to obtain the channel enhancement features and spatial enhancement features. The sum of the two features is then smoothed by a 3×3 convolution to obtain the residual features. Finally, a learnable scalar parameter β is introduced to weight and fuse the residual features with the input of the learnable residual high-frequency perceptron module to obtain the enhanced feature map.

9. The battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention according to claim 8, characterized in that, in the second stage, the channel attention weights are represented by a formula whose terms denote: the channel attention weights; the ReLU function; the multilayer perceptron; global average pooling; the high-frequency response feature map; and global max pooling. The spatial attention weights are represented as follows:

$$W_s = \mathrm{Sigmoid}\left(\mathrm{Conv}_{1\times 1}(F_h)\right)$$

In the formula, $W_s$ denotes the spatial attention weights, $F_h$ is the high-frequency response feature map, and $\mathrm{Conv}_{1\times 1}$ is a 1×1 convolutional layer.

10. The battery casing surface defect detection method based on frequency domain enhancement and dynamic region attention according to claim 8, characterized in that, in the third stage, a learnable scalar parameter β is introduced to weight and fuse the residual features with the input of the learnable residual high-frequency perceptron module, resulting in an enhanced feature map, which is represented as follows:

$$Y = X + \beta \cdot F_{res}$$

In the formula, $Y$ is the enhanced feature map; $X$ is the initial feature map after channel projection, that is, the original input feature map of the learnable residual high-frequency perceptron module; and $F_{res}$ denotes the residual features.
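For orientation only, the region assignment, weighted aggregation, and index mapping recited in claims 4 to 6 can be sketched for a single sample in pure Python. This is an illustrative reading, not the claimed implementation. The assumptions are: dot-product similarity, a hard one-hot region mask derived from the region index, and no multi-head attention update between aggregation and mapping. All function names and numbers are hypothetical.

```python
def assign_regions(pixels, centers):
    """Give each pixel the index of its most similar region center (argmax)."""
    region_idx = []
    for p in pixels:
        # Assumption: similarity is the dot product of pixel and center vectors.
        sims = [sum(a * b for a, b in zip(p, c)) for c in centers]
        region_idx.append(max(range(len(centers)), key=sims.__getitem__))
    return region_idx

def aggregate_regions(pixels, region_idx, num_regions, eps=1e-6):
    """Average the pixel features of each region; eps guards empty regions."""
    channels = len(pixels[0])
    sums = [[0.0] * channels for _ in range(num_regions)]
    counts = [0.0] * num_regions
    for p, k in zip(pixels, region_idx):
        counts[k] += 1.0              # hard mask weight of 1 for the assigned region
        for c in range(channels):
            sums[k][c] += p[c]
    return [[s / (counts[k] + eps) for s in sums[k]] for k in range(num_regions)]

def scatter_to_pixels(region_feats, region_idx):
    """Index mapping: copy each region's feature vector back to its pixels."""
    return [list(region_feats[k]) for k in region_idx]

# Four flattened pixels with two channels, and two region centers (made-up values).
pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centers = [[1.0, 0.0], [0.0, 1.0]]

region_idx = assign_regions(pixels, centers)
region_feats = aggregate_regions(pixels, region_idx, num_regions=len(centers))
pixel_out = scatter_to_pixels(region_feats, region_idx)
print(region_idx)
```

In a full model the region-level features would be updated before being scattered back; the sketch skips that step so that the aggregation and index mapping are easy to follow.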