Space object recognition method and device based on attention mechanism adaptive enhancement
By using an adaptive enhancement spatial target recognition method based on an attention mechanism, and by utilizing a tensor processing module and a convolutional block attention module, the low recognition accuracy of traditional CNNs is solved, and more efficient spatial target recognition is achieved.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: BEIJING INST OF ENVIRONMENTAL FEATURES
- Filing Date: 2026-04-20
- Publication Date: 2026-06-23
AI Technical Summary
Traditional convolutional neural networks (CNNs) struggle to capture the key features of spatial targets, resulting in low recognition accuracy.
An adaptive enhancement spatial target recognition method based on an attention mechanism is adopted. The spatial target recognition model is trained with tensor processing modules and a convolutional block attention module in each of multiple processing stages; the image tensor is convolved and channel-transformed, and the channels and element values are adjusted to improve recognition accuracy.
The method improves the accuracy of spatial target recognition by focusing on key features, and enhances feature utilization and recognition efficiency.
Figure CN122265636A_ABST
Description
Technical Field
[0001] This invention relates to the field of image processing technology, and in particular to a spatial target recognition method and apparatus based on attention mechanism adaptive enhancement.
Background Technology
[0002] Detecting, tracking, analyzing, and identifying space targets orbiting the Earth is one of the main tasks of space surveillance systems. Since image data can provide relatively intuitive projection information of space targets, space target image recognition is a key research direction in current space target identification. Target recognition methods are crucial for improving recognition accuracy. However, because optical images of space targets are primarily grayscale images against a deep space background, the texture information of the targets is unclear, and the dynamic range of grayscale values varies greatly across different parts of the target. Traditional convolutional neural network (CNN) recognition methods fail to capture key features, resulting in low recognition accuracy.
Summary of the Invention
[0003] The technical problem to be solved by this invention is the low recognition accuracy of traditional convolutional neural networks (CNNs) in the prior art. In view of this defect, this invention provides a spatial target recognition method and device based on attention mechanism adaptive enhancement.
[0004] To address the aforementioned technical problem, this invention provides a spatial target recognition method based on attention mechanism adaptive enhancement, comprising: acquiring a spatial target image; and inputting the tensor of the spatial target image into a pre-trained spatial target recognition model to obtain the spatial target recognition result. The spatial target recognition model includes multiple processing stage sub-models. Each processing stage sub-model includes a tensor processing module set unit and a convolutional block attention module. The tensor processing module set unit includes multiple tensor processing modules connected in sequence. The tensor processing modules are used to perform convolution and channel transformation on the input tensors, and / or to transform the tensor element values. The convolutional block attention module is used to perform channel filtering on the tensors output by the tensor processing module set unit to improve recognition accuracy.
[0005] In one implementation, training the spatial target recognition model includes: obtaining original sample images of space targets; subjecting the original sample images to sample enhancement processing to increase the number of samples; and inputting the enlarged sample set into the spatial target recognition model to train the model.
[0006] In one embodiment, the sample enhancement processing of the original sample image of the space target includes one or more of the following: cropping the original sample image to obtain a cropped image, and scaling the cropped image to a predetermined size to obtain a scaled image; randomly flipping the original sample image vertically; and randomly flipping the original sample image horizontally.
[0007] In one implementation, training the spatial target recognition model includes: during training, after each calculation, determining according to a first judgment condition whether the number of channels in the spatial target image needs to be adjusted; if so, adjusting the channel number adjustment parameter in the tensor processing module to obtain an adjusted tensor processing module, which is used for training in the next calculation.
[0008] In one implementation, training the spatial target recognition model includes: during training, after each calculation, determining according to a second judgment condition whether the tensor element values in the tensor need to be adjusted; if so, adjusting the tensor element value adjustment parameters in the tensor processing module to obtain an adjusted tensor processing module, which is used for training in the next calculation.
[0009] In one embodiment, the tensor processing module includes at least a convolutional layer with a large-scale convolutional kernel. The convolutional layer with the large-scale convolutional kernel is located at the beginning of the tensor processing module and is used to perform convolution processing on the input tensor to obtain a global correlation feature tensor within the receptive field. The generalization module, located at the end of the tensor processing module, is used to discard some channels in the tensor to improve generalization ability.
[0010] In one embodiment, the tensor processing module further includes: a layer normalization module, used to standardize the element values in the channels of the global correlation feature tensor to obtain a standardized feature tensor; a first channel adjustment module, used to perform convolution processing on the standardized feature tensor and adjust the number of channels from a first channel number to a second channel number, obtaining a first channel adjusted feature tensor; an activation function module, used to perform function operations on the first channel adjusted feature tensor and adjust the tensor element values, obtaining a first tensor element value adjusted feature tensor, so as to achieve feature enhancement and noise suppression; a second channel adjustment module, used to perform convolution processing on the first tensor element value adjusted feature tensor and adjust the number of channels from the second channel number back to the first channel number, obtaining a second channel adjusted feature tensor; and a layer scaling module, used to adjust the tensor element values in the channels of the second channel adjusted feature tensor, obtaining a second tensor element value adjusted feature tensor.
[0011] In a second aspect, this application proposes a spatial target recognition device based on attention mechanism adaptive enhancement, comprising: an acquisition module, used to acquire spatial target images; and a processing module, used to input the tensor of the spatial target image into a pre-trained spatial target recognition model to obtain the recognition result of the spatial target. The spatial target recognition model includes multiple processing stage sub-models. Each processing stage sub-model includes a tensor processing module set unit and a convolutional block attention module. The tensor processing module set unit includes multiple tensor processing modules connected in sequence. The tensor processing modules are used to perform convolution and channel transformation on the input spatial target image, and / or to transform the tensor element values. The convolutional block attention module is used to perform channel filtering on the tensors output by the tensor processing module set unit to improve recognition accuracy.
[0012] In a third aspect, this application proposes an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the spatial target recognition method based on attention mechanism adaptive enhancement as described in any of the preceding claims.
[0013] In a fourth aspect, this application proposes a space observation station, including the aforementioned electronic device.
[0014] The technical solution of this application includes a tensor processing module that performs convolution and channel transformation on the input tensor, and / or transforms the tensor element values; this can significantly improve the accuracy of the final recognition result. The convolutional block attention module implements channel filtering, which is beneficial for focusing on key features and improving feature utilization.
Attached Figure Description
[0015] Figure 1 is a flowchart illustrating the spatial target recognition method of the present invention; Figure 2 is a structural block diagram illustrating the ConvNeXt module of the present invention; Figure 3 is a structural diagram illustrating the CBAM attention mechanism of the present invention; Figure 4 is a diagram illustrating the overall structure of the target recognition network model of the present invention; Figure 5 is a structural block diagram illustrating the spatial target identification device of the present invention; Figure 6 is a structural block diagram illustrating the electronic device of the present invention.
Detailed Implementation
[0016] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0017] As shown in Figure 1, embodiments of the present invention provide a spatial target recognition method based on attention mechanism adaptive enhancement, comprising: step S102, acquiring the spatial target image.
[0018] In this embodiment, ground-based space observation stations can be used to acquire images of space targets, or satellites in space can be used to photograph space targets to obtain images. Space targets include, but are not limited to, celestial bodies and spacecraft in space.
[0019] In step S104, the tensor of the spatial target image is input into a pre-trained spatial target recognition model to obtain the spatial target recognition result. The spatial target recognition model includes multiple processing stage sub-models. For any processing stage sub-model 01, it includes a tensor processing module set unit 02 and a convolutional block attention module 03.
[0020] In this embodiment, the spatial target image is preprocessed to obtain a tensor of the spatial target image. A tensor is a set of data with three dimensions: the length of the spatial target image, the width of the spatial target image, and the number of channels of the spatial target image.
[0021] The tensor processing module set unit 02 includes multiple tensor processing modules 021 connected in sequence. The tensor processing module 021 is used to perform convolution and channel transformation on the input spatial target image, and / or transform the tensor element values.
[0022] In this embodiment, the tensor processing module 021 performs channel transformation on the input tensor, such as increasing or decreasing the number of channels. The number of channels is dynamically adjusted according to actual needs.
[0023] Adjusting the number of channels allows the model's feature processing capacity to be allocated more precisely. Increasing the number of channels is equivalent to opening more perceptual windows for the model, enabling it to capture finer and more diverse features. For example, the model can move from recognizing only color to simultaneously recognizing color, texture, and edges, which suits complex scenes in spatial target images, fine-grained categories, and blurry images. Reducing the number of channels lightens the burden on the model and removes redundancy by deleting repetitive or ineffective features. For example, channels that capture only noise can be removed, which helps prevent overfitting and reduces computational load; this suits simple scenarios or cases where model training slows down.
[0024] Tensor processing module 021 transforms the tensor element values of the input tensors. Adjusting the tensor element values can make the features more regular. For example, a layer normalization layer can be used to adjust the tensor element values to avoid some tensor element values being too large and others being too small, making the subsequent calculations of the model more stable and preventing gradient explosion or vanishing.
[0025] Activation functions are used to adjust and filter the values of tensor elements, for example by suppressing noise carried in negative values. At the same time, they inject nonlinearity, enabling the model to learn complex feature relationships, such as distinguishing similar but different objects, rather than being limited to linear recognition.
[0026] Adjusting the number of channels optimizes the quantity and dimension of features, while adjusting pixel values optimizes the quality of individual features. When the two are combined, the model can learn the features more efficiently, resulting in more accurate recognition and clearer output, which helps improve the efficiency and accuracy of recognition.
[0027] The convolutional block attention module 03 is used to perform channel filtering on the tensors output by the tensor processing module set unit to improve the recognition accuracy.
[0028] The technical solution of this application includes a tensor processing module that performs convolution and channel transformation on the input tensor, and / or transforms the tensor element values; this can significantly improve the accuracy of the final recognition result. The convolutional block attention module implements channel filtering, which is beneficial for focusing on key features and improving feature utilization.
[0029] In some embodiments, training the spatial target recognition model includes the following steps: Obtain raw sample images of space targets.
[0030] In this embodiment, the original sample image of the space target is obtained from a space device. Since the original sample image of the space target obtained from a space device is clearer than that obtained from a ground-based device, images obtained from space devices are preferred. The space device can be a satellite, space station, or similar equipment in space.
[0031] The original sample images of the space target are subjected to sample enhancement processing to increase the number of samples.
[0032] In this embodiment, the enhancement process may include, but is not limited to, changing the display parameters of the original sample image, such as changing the chroma, changing the contrast, changing the hue, etc.
[0033] The enhancement processing may also change the size of the original sample image, for example by proportional scaling, enlarging, shrinking, or cropping.
[0034] The increased sample size is then input into the spatial target recognition model to train the model.
[0035] In some embodiments, the original sample image of the space target is subjected to sample enhancement processing, including one or more of the following: The original sample image of the space target is cropped to obtain a cropped image, and the cropped image is scaled to a predetermined size to obtain a scaled image.
[0036] In this embodiment, a unique semantic category is assigned to each original image and archived in JSON file format. A region of each image is randomly cropped, and the image is then scaled to 224×224 using bicubic interpolation. This step achieves random scaling, viewpoint shift, and slight distortion of the images, greatly expanding the distribution of effective samples.
[0037] The original sample image of the space target is randomly flipped vertically.
[0038] In this embodiment, the image is randomly flipped vertically, that is, the row order of the pixel matrix is reversed, to enhance data diversity and make the model robust to scenes viewed from above and from below.
[0039] The original sample image of the spatial target is randomly flipped horizontally.
[0040] In this embodiment, the image is randomly flipped left and right, and the left and right information of the pixel matrix is swapped to enhance data diversity and make the model robust to left and right poses and mirrored composition.
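The three enhancement operations described above can be sketched as follows. This is a minimal illustration on a grayscale image stored as nested lists, not the implementation of the invention: the helper names, the crop-size range (half to full side length), and the flip probability `p_flip` are assumptions, and nearest-neighbour resizing stands in for the bicubic interpolation named in the text to keep the example short.

```python
import random

def random_crop(img, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position of a 2-D image."""
    top = rng.randint(0, len(img) - crop_h)
    left = rng.randint(0, len(img[0]) - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]

def resize_nearest(img, out_h, out_w):
    """Scale to out_h x out_w by nearest-neighbour sampling (stand-in for bicubic)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def flip_vertical(img):
    """Reverse the row order (top <-> bottom)."""
    return img[::-1]

def flip_horizontal(img):
    """Reverse every row (left <-> right)."""
    return [row[::-1] for row in img]

def augment(img, out_size, rng, p_flip=0.5):
    """Random crop and resize, then independent random vertical/horizontal flips."""
    h, w = len(img), len(img[0])
    crop_h = rng.randint(max(1, h // 2), h)   # assumed crop-size range
    crop_w = rng.randint(max(1, w // 2), w)
    out = resize_nearest(random_crop(img, crop_h, crop_w, rng), out_size, out_size)
    if rng.random() < p_flip:
        out = flip_vertical(out)
    if rng.random() < p_flip:
        out = flip_horizontal(out)
    return out

image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 grayscale image
sample = augment(image, out_size=8, rng=random.Random(0))
```

Calling `augment` repeatedly with different random states yields several distinct training samples from one original image, which is how the sample count is increased.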
[0041] The above-mentioned technical solution can solve the problem of accuracy in identifying individual space targets using satellite image data when the data sample is small, and provides strong support for space target identification.
[0042] In some embodiments, the sample dataset can be divided into a training set and a test set in an 8:2 ratio. 80% of the data is used to train the recognition network model, and the trained model is used to recognize the other 20% of the data to calculate the recognition accuracy.
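The 8:2 division and the accuracy calculation can be sketched as below. The function names, the shuffling step, and the fixed seed are assumptions made for illustration; the source only specifies the ratio.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and split them into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def accuracy(predictions, labels):
    """Fraction of test samples whose predicted class equals the true class."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

train_set, test_set = split_dataset(range(100))
```

With 100 samples, 80 go to training and 20 to testing; `accuracy` is then evaluated only on the held-out 20.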
[0043] See Table 1 for the identification results of satellite optical space target data.
[0044] Table 1
[0045] In this embodiment, the advanced ConvNeXt module is used as the baseline backbone network, and the CBAM attention mechanism is introduced on the basis of the baseline backbone network. This can more effectively improve the model accuracy and increase the recognition accuracy without changing the training data. The recognition accuracy of this invention reaches 94%.
[0046] In some embodiments, after acquiring the original sample image of the space target, the following steps may also be included: The original sample images of spatial targets are converted into tensors, with pixel values linearly mapped to 0-1. The data is then standardized using the mean and standard deviation to obtain standardized data. This normalization and standardization process ensures that the standardized data conforms to the input format of deep learning frameworks and eliminates channel distribution differences. It aligns the data with the sample data used in the official pre-training of the ConvNeXt module. This allows the use of weights from the officially trained ConvNeXt module without needing to train it from scratch, thus accelerating convergence and improving model performance.
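The normalization and standardization described above amount to two element-wise operations. The sketch below assumes 8-bit pixels (maximum value 255) and uses placeholder `mean` and `std` values; the source does not state the statistics, which in practice would be those of the data used to pre-train the ConvNeXt weights.

```python
def to_unit_range(pixels, max_value=255):
    """Linearly map raw pixel values to the range 0-1."""
    return [[p / max_value for p in row] for row in pixels]

def standardize(channel, mean, std):
    """Subtract the mean and divide by the standard deviation, element-wise."""
    return [[(v - mean) / std for v in row] for row in channel]

raw = [[0, 51], [102, 255]]                         # toy single-channel image
scaled = to_unit_range(raw)                         # values now lie in [0, 1]
normed = standardize(scaled, mean=0.5, std=0.25)    # placeholder statistics
```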
[0047] In some embodiments, training the spatial target recognition model includes: During the training of the spatial target recognition model, after each calculation, it is determined whether the number of channels in the spatial target image needs to be adjusted according to the first judgment condition; if so, the channel number adjustment parameter in the tensor processing module is adjusted to obtain the adjusted tensor processing module, and the adjusted tensor processing module is used for training in the next calculation.
[0048] In this embodiment, the first judgment condition for determining whether the number of channels in the spatial target image needs to be adjusted can be based on the value of the loss function and on the individual feature maps in the feature tensor. If the feature maps are relatively blurry, overall feature extraction is poor and channels need to be added. If the model overfits, the loss function oscillates, memory overflows, or the inference computation is so large that the processing time for each frame far exceeds real-time requirements, the number of channels is too high and channels need to be removed.
[0049] The channel number adjustment parameters include, but are not limited to, the number of convolutional kernels, the number of retained channels, and the number of spliced channels.
[0050] In some embodiments, training the spatial target recognition model includes: During the training of the spatial target recognition model, after each calculation, it is determined whether the tensor element values in the tensor need to be adjusted according to the second judgment condition; if so, the tensor element value adjustment parameters in the tensor processing module are adjusted to obtain the adjusted tensor processing module, and the adjusted tensor processing module is used for training in the next calculation.
[0051] In this embodiment, the second judgment condition for whether the tensor element values in the tensor need to be adjusted can be one or more of the following. The feature response is blurry and has low discriminative power: when the feature tensor is visualized as a grayscale map, the boundaries between the bright spots of useful signal and the dark areas of noise are unclear, and the values are all squeezed into the middle of the range, neither close to 0 nor close to the saturation value. In this case, adjusting the element values can enhance the effective features and suppress the ineffective noise.
[0052] If the model's gradient vanishes, converges slowly, or the loss decreases very slowly during training, or stalls prematurely (e.g., the sigmoid function causes the gradient to approach 0), it is necessary to change the loss function and adjust the element values to allow the gradient to propagate more smoothly and help the model converge faster.
[0053] Feature redundancy and obscuring of effective information: Many elements in the tensor have similar values, causing useful features to be submerged in irrelevant information. For example, multi-channel features may be unremarkable. In such cases, it is necessary to adjust the element values to highlight the numerical differences of key features, making it easier for subsequent layers to capture effective information.
[0054] The tensor element value adjustment parameter is used to adjust the tensor element values, including but not limited to the weights and biases of each convolution kernel.
[0055] In some embodiments, the tensor processing module includes at least a convolutional layer with a large-scale convolutional kernel. The convolutional layer with the large-scale convolutional kernel is disposed at the beginning of the tensor processing module and is used to perform convolution processing on the input tensor to obtain a global correlation feature tensor within the receptive field.
[0056] In this embodiment, the size of the large-scale convolution kernel is greater than 5×5, for example 9×9, 8×8, 7×7, or 6×6. Preferably, a 7×7 convolution kernel is used.
[0057] The generalization module, located at the end of the tensor processing module, is used to discard some channels in the tensor to improve generalization ability.
[0058] In this embodiment, the generalization module can be implemented using the Drop Path module.
[0059] In some embodiments, the tensor processing module further includes a layer normalization module, which is used to standardize the element values in the channels of the global correlation feature tensor to obtain a standardized feature tensor.
[0060] The first channel adjustment module is used to perform convolution processing on the standardized feature tensor, adjust the number of channels, adjust the number of first channels to the number of second channels, and obtain the first channel adjusted feature tensor.
[0061] The activation function module is used to perform function operations on the first channel adjustment feature tensor to adjust the tensor element values and obtain the first tensor element value adjustment feature tensor to achieve feature enhancement and noise suppression.
[0062] In this embodiment, the activation function module can be the GELU module.
[0063] The second channel adjustment module is used to perform convolution processing on the tensor element value adjustment feature tensor, adjust the number of channels, adjust the number of the second channel to the number of the first channel, and obtain the second channel adjustment feature tensor. The layer scaling module is used to adjust the tensor element values in the channels of the second channel adjustment feature tensor to obtain the second channel tensor element value adjustment feature tensor.
[0064] In this embodiment, referring to Figure 2, the tensor processing module can be implemented using the ConvNeXt module. The ConvNeXt module is used to construct the feature extraction backbone network for extracting rich features of spatial targets.
[0065] In some embodiments, the construction of a ConvNeXt module includes the following steps: Step 1 involves constructing depthwise separable convolutional layers. This step uses a large 7×7 convolutional kernel. Large kernel convolutions have become increasingly popular in recent years, while classic deep learning backbone networks tend to use smaller kernels, such as VGG16 which uses a 3×3 kernel. This is because using large kernels significantly increases the number of parameters and reduces model efficiency. To address this issue, the ConvNeXt benchmark module employs depthwise separable convolutions with a large 7×7 kernel. Using a large kernel can efficiently increase the effective receptive field, thereby improving network performance.
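The parameter argument in Step 1 can be checked with simple arithmetic. The sketch below counts weights only (biases ignored) and uses a hypothetical channel count of 96, which the source does not specify.

```python
def standard_conv_weights(channels_in, channels_out, kernel):
    """A standard convolution has one k x k filter per (input, output) channel pair."""
    return channels_in * channels_out * kernel * kernel

def depthwise_conv_weights(channels, kernel):
    """A depthwise convolution has a single k x k filter per channel."""
    return channels * kernel * kernel

C = 96                                          # hypothetical channel count
dense_3x3 = standard_conv_weights(C, C, 3)      # 82,944 weights
dense_7x7 = standard_conv_weights(C, C, 7)      # 451,584 weights
depthwise_7x7 = depthwise_conv_weights(C, 7)    # 4,704 weights
```

Under these assumptions, a standard 7×7 layer costs about 5.4 times as many weights as a standard 3×3 layer, while the depthwise 7×7 layer costs roughly 1% of the standard 7×7 layer, which is why the large kernel becomes affordable.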
[0066] Step 2, layer normalization. The ConvNeXt baseline module uses fewer normalization layers, adding one only after the depthwise separable convolutional layer. Its normalization layer is Layer Normalization (LN). This step not only reduces internal covariate shift but also reduces the computational burden of the network, thereby improving the network's performance and accuracy.
[0067] Step 3, 1×1 convolutional layer, used to change the number of channels. Preferably, the channel dimension at each spatial location can be increased to four times the original. This frees up more representation space for subsequent nonlinear activations, allowing the network to learn more complex inter-channel relationships.
[0068] Step 4: Gaussian Error Linear Unit (GELU) activation function. The GELU activation function can change the feature value of the pixel in each channel. The ConvNeXt baseline module uses fewer activation functions, adding it only after the 1×1 convolutional layer of the ConvNeXt baseline module, choosing the GELU activation function. This function provides a smooth nonlinear response, reduces gradient discretization, and helps deep residual networks converge efficiently.
[0069] Step 5, 1×1 convolutional layer, compresses the channel dimension back to its original size, thus fully mixing channel information without increasing the final representation size.
[0070] Step 6, layer scaling, introduces a learnable scaling factor for each channel, which helps to suppress the output amplitude of residual branches and stabilize deep training.
[0071] Step 7, Drop Path layer. The Drop Path layer deactivates the main branch of the module with a certain probability. That is, it sets the output of the main branch to 0 with a certain probability, so that only the shortcut branch contributes to the output. This effectively improves the generalization of the model.
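Steps 2 to 7 can be traced at a single spatial position, because the 1×1 convolutions act on each pixel's channel vector independently. The sketch below is illustrative only: the depthwise 7×7 convolution of Step 1 is omitted (it mixes neighbouring pixels), biases are left out, the weights are random, and the layer-scale initial value `1e-6` and all function names are assumptions.

```python
import math
import random

def layer_norm(x, eps=1e-6):
    """Standardize one channel vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def pointwise_conv(x, weight):
    """1x1 convolution at one position: a matrix-vector product over channels."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def gelu(v):
    """Gaussian Error Linear Unit, exact form using the error function."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def block_at_position(x, w_expand, w_reduce, gamma, drop_prob, rng):
    """Steps 2-7 for the channel vector x of a single pixel."""
    if rng.random() < drop_prob:             # Step 7: main branch dropped,
        return list(x)                       # only the shortcut reaches the output
    h = layer_norm(x)                        # Step 2: layer normalization
    h = pointwise_conv(h, w_expand)          # Step 3: C -> 4C channels
    h = [gelu(v) for v in h]                 # Step 4: GELU activation
    h = pointwise_conv(h, w_reduce)          # Step 5: 4C -> C channels
    h = [g * v for g, v in zip(gamma, h)]    # Step 6: per-channel layer scale
    return [a + b for a, b in zip(x, h)]     # shortcut plus main branch

rng = random.Random(0)
C = 4
w_expand = [[rng.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(4 * C)]
w_reduce = [[rng.uniform(-0.5, 0.5) for _ in range(4 * C)] for _ in range(C)]
gamma = [1e-6] * C                           # assumed initial layer-scale value
x = [0.2, -1.0, 0.7, 1.5]
y = block_at_position(x, w_expand, w_reduce, gamma, drop_prob=0.0, rng=rng)
```

With a small layer-scale factor the block starts close to an identity mapping, which is consistent with the stabilizing role the text assigns to layer scaling.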
[0072] Furthermore, in one embodiment of the present invention, a lightweight self-attention module can be added between the activation function module and the second channel adjustment module. The feature map output by the activation function module has already been upgraded in dimensionality and has richer representation capabilities. The lightweight self-attention module is used to perform effective global and local relationship modeling in this high-dimensional space. This process can enhance the ability to capture key features of spatial targets while suppressing noise and irrelevant information. Then, the second channel adjustment module is used to reduce the dimensionality, which can maximize the preservation of key features of spatial targets.
[0073] In some embodiments, Figure 3 shows the structure of the Convolutional Block Attention Module (CBAM).
[0074] The construction of the attention mechanism includes the following steps: The attention mechanism selected is the CBAM module, which is used for hybrid channel and spatial attention. The CBAM module serves as the baseline attention mechanism module. The CBAM module is an attention mechanism module used for feature enhancement in convolutional neural networks. It can improve the feature representation capability of convolutional neural networks by adaptively learning channel and spatial attention weights.
[0075] The CBAM module mainly includes two attention mechanisms: Channel Attention Mechanism (CAM) and Spatial Attention Mechanism (SAM).
[0076] In this process, CAM performs average pooling and max pooling on the feature maps to reduce the dimensionality to 1×1×C. Each pooled result passes through a multilayer perceptron (MLP), in which the dimensionality is further reduced to 1×1×C / r (r being the reduction ratio). Subsequently, the two MLP outputs are summed, and the channel attention weight Mc is generated using the sigmoid function, with the following formula: $M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F)))$, where F is the input feature map; σ is the sigmoid activation function; and AvgPool and MaxPool are average pooling and max pooling, respectively.
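The channel attention computation can be sketched on a tiny feature map. The example assumes the usual CBAM arrangement in which one MLP is shared by both pooled vectors and maps C to C / r and back to C; the weights `w1` and `w2` are hand-picked for illustration and biases are omitted.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mlp(vec, w1, w2):
    """Two-layer MLP: C -> C/r with ReLU, then C/r -> C (biases omitted)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def channel_attention(feature, w1, w2):
    """feature: C channels, each an H x W nested list. Returns C weights in (0, 1)."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    mx = [max(map(max, ch)) for ch in feature]
    a, m = mlp(avg, w1, w2), mlp(mx, w1, w2)     # same MLP for both descriptors
    return [sigmoid(p + q) for p, q in zip(a, m)]

feature = [[[1.0, 2.0], [3.0, 4.0]],    # channel 0: strong response
           [[0.0, 0.0], [0.0, 0.0]]]    # channel 1: no response
w1 = [[0.5, 0.5]]                       # C = 2 -> C / r = 1 (r = 2)
w2 = [[1.0], [-1.0]]                    # 1 -> C = 2
weights = channel_attention(feature, w1, w2)
refined = [[[w * v for v in row] for row in ch] for w, ch in zip(weights, feature)]
```

Each channel of the input is then multiplied by its weight, so channels with a high weight are emphasized and the others are attenuated.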
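[0077] SAM enhances the extraction of key spatial features and compensates for the shortcomings of the channel attention mechanism. The formula for calculating the spatial attention weight Ms is as follows: $M_s(F) = \sigma(\mathrm{Conv}_{3\times3}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)]))$, where Conv3×3 is a 3×3 convolution operation, and AvgPool and MaxPool are here taken along the channel dimension and their outputs concatenated.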
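A minimal sketch of the spatial attention branch follows. It assumes the usual CBAM arrangement (average and maximum taken across channels, the two maps stacked, then one 3×3 convolution and a sigmoid); zero padding, the absence of a bias, and the hand-picked kernel are assumptions.

```python
import math

def spatial_attention(feature, kernel):
    """feature: C x H x W nested lists; kernel: 2 x 3 x 3 weights. Returns H x W weights."""
    C, H, W = len(feature), len(feature[0]), len(feature[0][0])
    # Pool along the channel axis: one average map and one maximum map.
    avg = [[sum(feature[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(feature[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    pooled = [avg, mx]
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            acc = 0.0
            for k in range(2):                      # the two pooled maps
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        y, x = i + di, j + dj
                        if 0 <= y < H and 0 <= x < W:   # zero padding at borders
                            acc += kernel[k][di + 1][dj + 1] * pooled[k][y][x]
            row.append(1.0 / (1.0 + math.exp(-acc)))    # sigmoid
        out.append(row)
    return out

feature = [[[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]],
           [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]]
center_only = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
ms = spatial_attention(feature, [center_only, center_only])
```

In this toy case the centre pixel, where both channels respond, receives a weight close to 1, while the empty corners receive the neutral value 0.5.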
[0078] The CBAM module can be easily integrated into existing convolutional neural network architectures, thereby improving the network's feature representation capabilities.
[0079] As shown in Figure 4, in this embodiment, the entire target recognition process comprises four stages. Each stage includes a ConvNeXt baseline module and a CBAM module. Each ConvNeXt baseline module includes multiple ConvNeXt modules, and can be considered a collection of multiple ConvNeXt modules. The feature tensor output by the ConvNeXt baseline module is first input to the channel attention branch of the CBAM module, then to the spatial attention branch, and the resulting weighted features are then input to the next stage. This integrates the CBAM mechanism with ConvNeXt, enhancing information filtering capabilities and better extracting the structural features of spatial targets.
[0080] Furthermore, cross-stage feedback connections can be introduced, whereby the channel attention weights and spatial attention weights output by the CBAM module in stage i (i=1, 2, 3) are processed and passed to each lightweight self-attention module in stage i+1 as guidance information.
[0081] Specifically, assume that the input feature of a certain ConvNeXt module in the (i+1)th stage is $X \in \mathbb{R}^{B \times C \times H \times W}$. Its internal lightweight self-attention module is calculated as follows. The query Q, key K, and value V are generated by three 1×1 convolutions, each with dimension $B \times C' \times H \times W$, where $C'$ is the number of channels after the dimensionality upgrade. The module receives feedback information from the CBAM of the previous stage, which includes the channel attention weights and the spatial attention weights of stage i. To match the current feature dimension, the channel attention weights are replicated and expanded along the spatial dimensions, and the spatial attention weights are replicated and expanded along the channel dimension.
[0082] To incorporate the feedback information into the self-attention calculation, the expanded channel attention weights and the expanded spatial attention weights are first multiplied element-wise, and the product is then multiplied with the attention score of the lightweight self-attention module.
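A minimal sketch of this fusion step follows. The text does not specify how the lightweight self-attention module computes its attention score, so the element-wise form `sigmoid(q * k)` used here is purely an assumption made to keep every tensor at shape B×C′×H×W; `guided_attention` is a hypothetical name.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def guided_attention(q, k, v, channel_exp, spatial_exp):
    """All arguments share the shape (B, C', H, W)."""
    score = sigmoid(q * k)                  # assumed element-wise attention score
    guidance = channel_exp * spatial_exp    # element-wise product of the feedback weights
    return (score * guidance) * v           # guided score applied to the values

rng = np.random.default_rng(0)
shape = (2, 6, 4, 5)
q, k, v = (rng.normal(size=shape) for _ in range(3))
channel_exp = rng.uniform(size=shape)
spatial_exp = rng.uniform(size=shape)
out = guided_attention(q, k, v, channel_exp, spatial_exp)
```

With this form, positions and channels that the previous stage weighted close to zero contribute little to the output, which is one way to read "guidance information".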
[0083] It should be noted that, because the feature maps differ in resolution from stage to stage, the attention weights used in the feedback need to be size-adapted. The channel attention weights, of dimension C_i, can be transformed by linear interpolation or a 1×1 convolution into a vector matching the input channel number C_(i+1) of the (i+1)th stage; the spatial attention weights, of size H_i×W_i, can be scaled by bilinear interpolation to the spatial size H_(i+1)×W_(i+1) of the input feature map of the (i+1)th stage.
[0084] To ensure the effectiveness of the feedback information, a feedback adaptation module, for example one consisting of 1×1 convolutions and upsampling layers, can be placed after the CBAM module of each stage to convert the attention weights into a form suitable for the next stage.
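The size adaptation described in the two paragraphs above can be sketched as follows, using the linear-interpolation option for the channel weights and bilinear interpolation for the spatial weights. The function names are hypothetical, and the corner-aligned sampling grid is an assumption; a learned 1×1 convolution could replace `adapt_channel_weights`.

```python
import numpy as np

def adapt_channel_weights(weights, c_out):
    """Linearly interpolate a length-C_i weight vector to length c_out."""
    src = np.linspace(0.0, 1.0, weights.shape[0])
    dst = np.linspace(0.0, 1.0, c_out)
    return np.interp(dst, src, weights)

def resize_bilinear(m, h_out, w_out):
    """Bilinear resize of a 2-D map of shape (H_i, W_i), corner-aligned."""
    h_in, w_in = m.shape
    ys = np.linspace(0.0, h_in - 1, h_out)
    xs = np.linspace(0.0, w_in - 1, w_out)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h_in - 1)       # clamp at the bottom edge
    x1 = np.minimum(x0 + 1, w_in - 1)       # clamp at the right edge
    fy = (ys - y0)[:, None]                 # vertical blend factors
    fx = (xs - x0)[None, :]                 # horizontal blend factors
    top = m[y0][:, x0] * (1 - fx) + m[y0][:, x1] * fx
    bottom = m[y1][:, x0] * (1 - fx) + m[y1][:, x1] * fx
    return top * (1 - fy) + bottom * fy

channel_w = np.array([0.2, 0.8, 0.5, 0.9])       # C_i = 4
spatial_w = np.random.default_rng(0).uniform(size=(8, 8))
channel_next = adapt_channel_weights(channel_w, 8)   # C_(i+1) = 8
spatial_next = resize_bilinear(spatial_w, 4, 4)      # H_(i+1) x W_(i+1) = 4 x 4
```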
[0085] In this way, during model training, the feedback loop allows gradients to propagate backward from subsequent stages to the CBAM module in the preceding stage, prompting the CBAM module to learn more meaningful attention weights. This closed-loop structure makes the attention mechanisms mutually reinforcing. CBAM provides priors for self-attention, and the features extracted by self-attention are further optimized by subsequent CBAM, forming a virtuous cycle.
[0086] Secondly, referring to Figure 5, this application proposes a spatial target recognition device based on attention mechanism adaptive enhancement, comprising: an acquisition module 21, used to acquire spatial target images; and a processing module 22, used to input the tensor of the spatial target image into a pre-trained spatial target recognition model to obtain the recognition result of the spatial target. The spatial target recognition model includes multiple processing stage sub-models. Each processing stage sub-model includes a tensor processing module set unit and a convolutional block attention module. The tensor processing module set unit includes multiple tensor processing modules connected in sequence. The tensor processing modules are used to perform convolution and channel transformation on the input spatial target image, and/or transformation processing of tensor element values. The convolutional block attention module is used to perform channel filtering on the tensors output by the tensor processing module set unit to improve recognition accuracy.
[0087] Processing module 22 is further configured to perform sample enhancement processing on the original sample image of the spatial target, including one or more of the following: cropping the original sample image of the spatial target to obtain a cropped image, and scaling the cropped image to a predetermined size to obtain a scaled image; randomly flipping the original sample image of the spatial target vertically; randomly flipping the original sample image of the spatial target horizontally.
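The three sample enhancement operations listed above can be sketched as follows for a single grayscale image. The crop fraction, flip probability, and nearest-neighbour scaling are illustrative assumptions (the text only specifies cropping, scaling to a predetermined size, and random flips), and `augment` and `resize_nearest` are hypothetical names.

```python
import numpy as np

def resize_nearest(img, size):
    """Scale a 2-D image to size = (rows, cols) by nearest-neighbour sampling."""
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]

def augment(img, out_size, rng, crop_frac=0.8, flip_prob=0.5):
    """Random crop, scale to out_size, then random vertical and horizontal flips."""
    h, w = img.shape
    ch = max(1, int(h * crop_frac))
    cw = max(1, int(w * crop_frac))
    top = rng.integers(0, h - ch + 1)        # random crop origin
    left = rng.integers(0, w - cw + 1)
    out = resize_nearest(img[top:top + ch, left:left + cw], out_size)
    if rng.random() < flip_prob:
        out = out[::-1, :]                   # vertical flip (rows reversed)
    if rng.random() < flip_prob:
        out = out[:, ::-1]                   # horizontal flip (columns reversed)
    return out

rng = np.random.default_rng(0)
image = np.arange(100).reshape(10, 10)
samples = [augment(image, (8, 8), rng) for _ in range(4)]   # four augmented samples
```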
[0088] The processing module 22 is also used to determine, after each calculation is completed, whether the number of channels in the spatial target image needs to be adjusted according to the first judgment condition; if so, the channel adjustment parameters in the tensor processing module are adjusted to obtain the adjusted tensor processing module, and the adjusted tensor processing module is used for training in the next calculation.
[0089] The processing module 22 is also used to determine, after each calculation is completed, whether the tensor element values in the tensor need to be adjusted according to the second judgment condition; if so, the tensor element value adjustment parameters in the tensor processing module are adjusted to obtain the adjusted tensor processing module, and the adjusted tensor processing module is used for training in the next calculation.
[0090] Thirdly, referring to Figure 6, this application proposes an electronic device including a memory 32, a processor 31, and a computer program stored in the memory and executable on the processor. When the processor 31 executes the computer program, it implements the spatial target recognition method based on attention mechanism adaptive enhancement as described in any of the preceding claims.
[0091] The aforementioned electronic devices can be computing devices such as desktop computers, laptops, handheld computers, and cloud servers. These electronic devices may include, but are not limited to, processors and memory. Those skilled in the art will understand that the figures are merely examples of electronic devices and do not constitute a limitation on the electronic devices. They may include more or fewer components than illustrated, or combine certain components, or different components. For example, the aforementioned electronic devices may also include input / output devices, network access devices, buses, etc.
[0092] The processor referred to can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor can be a microprocessor or any conventional processor.
[0093] Fourthly, this application proposes a space observation station, including the aforementioned electronic equipment.
[0094] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A spatial target recognition method based on attention mechanism adaptive enhancement, characterized in that it comprises: acquiring a spatial target image; and inputting the tensor of the spatial target image into a pre-trained spatial target recognition model to obtain a spatial target recognition result; wherein the spatial target recognition model includes multiple processing stage sub-models; each processing stage sub-model includes a tensor processing module set unit and a convolutional block attention module; the tensor processing module set unit includes multiple tensor processing modules connected in sequence; the tensor processing modules are used to perform convolution and channel transformation on the input tensors, and/or to transform the tensor element values; and the convolutional block attention module is used to perform channel filtering on the tensors output by the tensor processing module set unit to improve recognition accuracy.
2. The spatial target recognition method based on attention mechanism adaptive enhancement according to claim 1, characterized in that training the spatial target recognition model includes: obtaining original sample images of the spatial target; performing sample enhancement processing on the original sample images of the spatial target to increase the number of samples; and inputting the enlarged sample set into the spatial target recognition model to train the model.
3. The spatial target recognition method based on attention mechanism adaptive enhancement according to claim 2, characterized in that performing sample enhancement processing on the original sample image of the spatial target includes one or more of the following: cropping the original sample image of the spatial target to obtain a cropped image, and scaling the cropped image to a predetermined size to obtain a scaled image; randomly flipping the original sample image of the spatial target vertically; randomly flipping the original sample image of the spatial target horizontally.
4. The spatial target recognition method based on attention mechanism adaptive enhancement according to claim 1, characterized in that training the spatial target recognition model includes: during the training of the spatial target recognition model, after each calculation, determining according to the first judgment condition whether the number of channels in the spatial target image needs to be adjusted; if so, adjusting the channel number adjustment parameter in the tensor processing module to obtain an adjusted tensor processing module, and using the adjusted tensor processing module for training in the next calculation.
5. The spatial target recognition method based on attention mechanism adaptive enhancement according to claim 1, characterized in that training the spatial target recognition model includes: during the training of the spatial target recognition model, after each calculation, determining according to the second judgment condition whether the tensor element values in the tensor need to be adjusted; if so, adjusting the tensor element value adjustment parameters in the tensor processing module to obtain an adjusted tensor processing module, and using the adjusted tensor processing module for training in the next calculation.
6. The spatial target recognition method based on attention mechanism adaptive enhancement according to claim 1, characterized in that the tensor processing module includes at least: a convolutional layer with a large-scale convolutional kernel, set at the beginning of the tensor processing module and used to perform convolution processing on the input tensor to obtain a global correlation feature tensor within the receptive field; and a generalization module, located at the end of the tensor processing module and used to discard some channels in the tensor to improve generalization ability.
7. The spatial target recognition method based on attention mechanism adaptive enhancement according to claim 6, characterized in that the tensor processing module further includes: a layer normalization module, used to standardize the element values in the channels of the global correlation feature tensor to obtain a standardized feature tensor; a first channel adjustment module, used to perform convolution processing on the standardized feature tensor and adjust the number of channels from the first channel number to the second channel number, to obtain a first channel adjustment feature tensor; an activation function module, used to perform a function operation on the first channel adjustment feature tensor and adjust the tensor element values, to obtain a first tensor element value adjustment feature tensor, thereby achieving feature enhancement and noise suppression; a second channel adjustment module, used to perform convolution processing on the first tensor element value adjustment feature tensor and adjust the number of channels from the second channel number back to the first channel number, to obtain a second channel adjustment feature tensor; and a layer scaling module, used to adjust the tensor element values in the channels of the second channel adjustment feature tensor to obtain a second tensor element value adjustment feature tensor.
8. A spatial target recognition device based on attention mechanism adaptive enhancement, characterized in that it comprises: an acquisition module, used to acquire spatial target images; and a processing module, used to input the tensor of the spatial target image into a pre-trained spatial target recognition model to obtain the recognition result of the spatial target; wherein the spatial target recognition model includes multiple processing stage sub-models; each processing stage sub-model includes a tensor processing module set unit and a convolutional block attention module; the tensor processing module set unit includes multiple tensor processing modules connected in sequence; the tensor processing modules are used to perform convolution and channel transformation on the input spatial target image, and/or transformation processing of tensor element values; and the convolutional block attention module is used to perform channel filtering on the tensors output by the tensor processing module set unit to improve recognition accuracy.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the spatial target recognition method based on attention mechanism adaptive enhancement as described in any one of claims 1 to 7.
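For orientation only, the order of operations recited for the tensor processing module in claims 6 and 7 can be sketched as below. This is a simplified illustration under several assumptions that the claims do not state: the large-kernel convolution is taken to be a 7×7 depthwise convolution, the activation is GELU (tanh approximation), layer normalization has no learned affine parameters, and the generalization module is modelled as a fixed per-channel keep mask. All function and parameter names are hypothetical.

```python
import numpy as np

def depthwise_conv(x, kernels):
    """x: (C, H, W); kernels: (C, k, k) with odd k; zero padding keeps H x W."""
    _, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += kernels[:, dy, dx][:, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

def layer_norm(x, eps=1e-6):
    """Standardize element values across the channel axis at each pixel."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def tensor_processing_block(x, dw_kernels, w_expand, w_reduce, gamma, keep_mask):
    y = depthwise_conv(x, dw_kernels)             # large-kernel convolution
    y = layer_norm(y)                             # layer normalization module
    y = np.einsum("oc,chw->ohw", w_expand, y)     # first channel adjustment: C1 -> C2
    y = gelu(y)                                   # activation function module
    y = np.einsum("oc,chw->ohw", w_reduce, y)     # second channel adjustment: C2 -> C1
    y = gamma[:, None, None] * y                  # layer scaling module
    return keep_mask[:, None, None] * y           # generalization: discard some channels

rng = np.random.default_rng(0)
c1, c2, k = 4, 16, 7
x = rng.normal(size=(c1, 8, 8))
y = tensor_processing_block(
    x,
    rng.normal(scale=0.1, size=(c1, k, k)),
    rng.normal(scale=0.1, size=(c2, c1)),
    rng.normal(scale=0.1, size=(c1, c2)),
    np.full(c1, 1e-2),
    np.array([1.0, 0.0, 1.0, 1.0]),   # second channel discarded
)
```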