A method and apparatus for cross-domain target detection based on semantic communication

By constructing a semantic feature encoder and decoder and combining it with an adversarial network for global feature alignment, the impact of channel noise on target detection in semantic communication systems is resolved, thereby improving the accuracy of cross-domain target detection.

CN118823309B · Active · Publication Date: 2026-06-30 · XIDIAN UNIV


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: XIDIAN UNIV
Filing Date: 2024-06-24
Publication Date: 2026-06-30


IPC: G06V10/25; G06V10/40; G06V10/42; G06V10/764; G06V10/766; G06V10/774; G06V10/82; G06N3/045; G06N3/0455; G06N3/0464; G06N3/0475; G06N3/084; G06N3/094; G06N3/096; H04B15/00

AI Tagging

Technology Topics

Pattern recognition Semantic vector


AI Technical Summary

Technical Problem

Existing semantic communication systems cannot effectively combat channel noise in target detection tasks, resulting in decreased detection performance. Furthermore, they exhibit poor model transfer performance and are time-consuming when performing cross-domain target detection.

Method used

We construct a semantic feature encoder and decoder, achieve cross-domain target detection through adversarial training, utilize the semantic feature encoder to transmit and recover features in the channel, and combine it with an adversarial network for global feature alignment to improve detection accuracy.

Benefits of technology

It effectively combats channel noise, improves transmission efficiency, and aligns feature distributions across different domains, thereby enhancing cross-domain target detection accuracy.



Abstract

This invention discloses a method and apparatus for cross-domain target detection based on semantic communication, relating to the field of wireless communication technology. The method includes: acquiring labeled source domain images and unlabeled target domain images, and inputting them into a trained target detection network model to obtain semantic features of the source domain image and the target domain image; inputting these features into a semantic feature encoder to obtain semantic vectors of the source domain image and the target domain image; transmitting these semantic vectors over a channel, where the semantic feature decoder at the receiving end recovers the source domain image semantic features and target domain image semantic features from the noise-corrupted semantic vectors; and performing global feature alignment of the recovered source domain and target domain image semantic features using adversarial training. This invention can improve the accuracy of cross-domain target detection.


Description

Technical Field

[0001] This invention belongs to the field of wireless communication technology, specifically relating to a method and apparatus for cross-domain target detection oriented towards semantic communication.

Background Technology

[0002] With the innovation of communication technology and the development of smart terminals, services such as video and image transmission have grown rapidly, and traditional communication faces problems such as channel capacity approaching the Shannon limit. Systems oriented towards semantic communication can transmit relevant semantic information according to the task, reduce the transmission of irrelevant information, and achieve higher transmission efficiency than traditional communication, thus gaining widespread attention and research.

[0003] In semantic communication scenarios, visual information processing is crucial. Object detection, a commonly used visual information processing task, plays a vital role in scenarios such as connected vehicles. Under existing general frameworks of semantic communication, object detection tasks cannot effectively combat channel noise, which hinders visual information transmission and leads to a significant drop in detection performance under task-oriented communication. Therefore, designing object detection algorithms for semantic communication is of considerable research value. Furthermore, because data distributions vary across scenarios, applying a detector trained on a source scenario to a target scenario often results in significant performance degradation, while acquiring and annotating new data for the target scenario is time-consuming. Cross-domain object detection treats different scenarios as different domains and enables model transfer across different data distributions, improving model generalization. Therefore, in task-oriented communication scenarios, how to transfer object detection models to a new domain is of great research significance.

Summary of the Invention

[0004] To address the aforementioned problems in the prior art, this invention provides a method and apparatus for cross-domain target detection based on semantic communication. The technical problem to be solved by this invention is achieved through the following technical solution:

[0005] In a first aspect, the present invention provides a method for cross-domain target detection oriented towards semantic communication, comprising:

[0006] Obtain labeled source domain images and unlabeled target domain images;

[0007] Labeled source domain images and unlabeled target domain images are input into a trained object detection network model for processing to obtain semantic features of the source domain images and semantic features of the target domain images.

[0008] The semantic features of the source domain image and the semantic features of the target domain image are input into the semantic feature encoder for encoding to obtain the semantic vectors of the source domain image and the target domain image. These semantic vectors are then sent to the channel. At the receiving end, the semantic feature decoder recovers the source domain image semantic features and the target domain image semantic features from the noise-corrupted source domain image semantic vectors and target domain image semantic vectors.

[0009] The recovered semantic features of the source domain image and the recovered semantic features of the target domain image are aligned globally using adversarial training.

[0010] In a second aspect, the present invention also provides an apparatus for cross-domain target detection oriented towards semantic communication, comprising:

[0011] The image acquisition module is used to acquire labeled source domain images and unlabeled target domain images;

[0012] Image processing module one is used to input labeled source domain images and unlabeled target domain images into a trained target detection network model for processing, so as to obtain semantic features of source domain images and semantic features of target domain images.

[0013] Image processing module two is used to input the semantic features of the source domain image and the semantic features of the target domain image into the semantic feature encoder for encoding, so as to obtain the semantic vectors of the source domain image and the semantic vectors of the target domain image; the semantic vectors of the source domain image and the semantic vectors of the target domain image are sent to the channel, and at the receiving end the semantic feature decoder recovers the source domain image semantic features and the target domain image semantic features from the noise-corrupted semantic vectors of the source domain image and the target domain image.

[0014] Image processing module three is used to perform global feature alignment between the restored source domain image semantic features and the restored target domain image semantic features using adversarial training.

[0015] The beneficial effects of this invention are:

[0016] This invention provides a method and apparatus for cross-domain target detection based on semantic communication. It constructs a semantic feature encoder to obtain semantic features for target detection, effectively combating channel noise and improving transmission efficiency. The semantic feature encoder and decoder extract the semantic vectors required for cross-domain target detection. Without sharing feature extractor parameters, based on the adversarial principle, this invention sets up a domain discriminator and feature extractor to form an adversarial network. Through adversarial training, the feature distribution of the target domain approximates the feature distribution of the source domain, thereby maximizing the loss of the domain discriminator and improving the accuracy of cross-domain target detection.

[0017] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Attached Figure Description

[0018] Figure 1 is a flowchart of a cross-domain target detection method based on semantic communication provided in an embodiment of the present invention;

[0019] Figure 2 is a schematic diagram of a first semantic feature encoder provided in an embodiment of the present invention;

[0020] Figure 3 is a schematic diagram of a first semantic feature decoder provided in an embodiment of the present invention;

[0021] Figure 4 is a schematic diagram comparing the mean average precision (mAP) under different signal-to-noise ratio values provided in an embodiment of the present invention.

Detailed Implementation

[0022] The present invention will be further described in detail below with reference to specific embodiments, but the implementation of the present invention is not limited thereto.

[0023] Referring to Figure 1, which is a flowchart of a cross-domain target detection method for semantic communication provided by an embodiment of the present invention, the cross-domain target detection method for semantic communication provided by the present invention includes:

[0024] S101. Obtain the labeled source domain image and the unlabeled target domain image.

[0025] S102. Input the labeled source domain image and the unlabeled target domain image into the trained target detection network model for processing to obtain the semantic features of the source domain image and the semantic features of the target domain image.

[0026] Specifically, in this embodiment, the process of obtaining the trained object detection network model includes:

[0027] Obtain the training dataset, which includes labeled source domain images;

[0028] The training dataset is input into the feature extractor for processing to obtain the semantic features of the first source domain image;

[0029] Optionally, a ResNet50 convolutional neural network with a Feature Pyramid Network (FPN) is used as the feature extractor to extract features from the source domain image. The ResNet50 convolutional neural network is pre-trained on the large image recognition dataset ImageNet. Specifically, the FPN generates P3, P4, and P5 from the C3, C4, and C5 scale features output by the ResNet50 convolutional neural network. P6 is then obtained by applying a 3×3 convolutional layer with a stride of 2 to P5, and P7 is obtained by applying another 3×3 convolutional layer with a stride of 2 to P6, resulting in five features at different scales (P3 to P7). These source domain image features are used as input to the subsequent networks for iterative updates. In this embodiment, stochastic gradient descent (SGD) is used as the optimizer for network training, and the learning rate is set to 0.0025.
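The five pyramid scales can be made concrete with a short calculation. The sketch below is only illustrative: it assumes the usual ResNet strides of 8, 16, and 32 for C3, C4, and C5 and a padding of 1 for the 3×3 stride-2 convolutions, neither of which is stated in the text, and the function names are hypothetical.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def fpn_sizes(height: int, width: int) -> dict:
    """Spatial sizes (H, W) of P3..P7 for an input image of height x width."""
    sizes = {}
    # P3, P4, P5 keep the resolution of C3, C4, C5 (strides 8, 16, 32 assumed).
    for level, stride in (("P3", 8), ("P4", 16), ("P5", 32)):
        sizes[level] = (height // stride, width // stride)
    # P6 and P7 each come from a 3x3, stride-2 convolution on the previous level.
    sizes["P6"] = tuple(conv_out(s) for s in sizes["P5"])
    sizes["P7"] = tuple(conv_out(s) for s in sizes["P6"])
    return sizes


# 1024 x 2048 is the Cityscapes resolution quoted in the simulation section.
sizes = fpn_sizes(1024, 2048)
for level, hw in sizes.items():
    print(level, hw)
```

Under these assumptions each level halves the previous one, from 128×256 at P3 down to 8×16 at P7.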

[0030] The semantic features of the first source domain image are input into the first semantic feature encoder for encoding, resulting in the semantic vector $x$ of the first source domain image, whose expression is:

[0031] $x = T_\alpha(f_s)$;

[0032] where $T_\alpha(\cdot)$ represents the semantic feature encoder and $f_s$ represents the source domain image features;

[0033] Optionally, referring to Figure 2, which is a schematic diagram of a first semantic feature encoder provided in an embodiment of the present invention, the convolution-based first semantic feature encoder consists of a convolutional layer (Conv) and a Leaky ReLU activation layer, and outputs a three-dimensional image semantic vector. In the convolutional layer, k1 denotes a convolution kernel size of 1, s1 denotes a convolution stride of 1, and c128 denotes 128 output channels.
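A 1×1, stride-1 convolution simply mixes channels independently at every pixel, so the encoder described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the patented network: the 256 input channels are inferred from the decoder description (c256, restoring the pre-encoding size), the Leaky ReLU slope of 0.01 is an assumed default, the weights are random placeholders, and all function names are hypothetical.

```python
import random


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Leaky ReLU; the 0.01 negative slope is an assumed default."""
    return x if x >= 0.0 else slope * x


def conv1x1(features, weight, bias):
    """1x1 convolution with stride 1: a per-pixel linear map over channels.

    features: [C_in][H][W], weight: [C_out][C_in], bias: [C_out].
    Returns a [C_out][H][W] nested list.
    """
    c_in, height, width = len(features), len(features[0]), len(features[0][0])
    return [
        [
            [
                bias[o] + sum(weight[o][c] * features[c][i][j] for c in range(c_in))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for o in range(len(weight))
    ]


def encode(features, weight, bias):
    """Conv (k1, s1) followed by Leaky ReLU, mirroring the encoder description."""
    mixed = conv1x1(features, weight, bias)
    return [[[leaky_relu(v) for v in row] for row in plane] for plane in mixed]


rng = random.Random(0)
C_IN, C_OUT, H, W = 256, 128, 2, 3  # tiny spatial size keeps the demo fast
features = [[[rng.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C_IN)]
weight = [[rng.gauss(0, 0.05) for _ in range(C_IN)] for _ in range(C_OUT)]  # untrained
bias = [0.0] * C_OUT

vector = encode(features, weight, bias)
print(len(vector), len(vector[0]), len(vector[0][0]))  # channels, height, width: 128 2 3
```

With these shapes the semantic vector carries half as many values as the input feature map, while the spatial layout is left unchanged.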

[0034] The semantic vector of the first source domain image is transmitted to the physical channel to simulate noise interference. The receiving end recovers the semantic features of the first source domain image from the noisy first source domain image semantic vector through the first semantic feature decoder. The expression for the noisy first source domain image semantic vector y is:

[0035] $y = h \cdot x + \omega$;

[0036] where $h$ represents the channel covariance coefficient and $\omega$ represents an independent and identically distributed noise vector;
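The channel model above can be simulated directly. The following sketch assumes a real-valued, flattened semantic vector, Gaussian noise (consistent with the AWGN channel named in the simulation section), and a noise variance chosen from a target SNR in dB; the function name `awgn_channel` and its parameters are hypothetical.

```python
import math
import random


def awgn_channel(x, snr_db, h=1.0, rng=None):
    """Return y = h * x + w, where w is i.i.d. Gaussian noise at the given SNR."""
    rng = rng or random.Random()
    received_power = sum((h * v) ** 2 for v in x) / len(x)
    # SNR (dB) = 10 * log10(received_power / noise_power)
    noise_power = received_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [h * v + rng.gauss(0.0, sigma) for v in x]


rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(10000)]  # placeholder semantic vector
y = awgn_channel(x, snr_db=17.0, rng=rng)

# Sanity check: estimate the SNR actually realised on this sample (h = 1 here).
signal_power = sum(v * v for v in x) / len(x)
noise_power = sum((b - a) ** 2 for a, b in zip(x, y)) / len(x)
measured_snr_db = 10 * math.log10(signal_power / noise_power)
print(f"measured SNR: {measured_snr_db:.2f} dB")
```

Training with such a layer between encoder and decoder is what lets the pair adapt to the noise level, since gradients flow through the additive noise unchanged.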

[0037] The recovered first source domain image semantic features $f_s'$ are expressed as:

[0038] $f_s' = R_\beta(y)$;

[0039] where $R_\beta(\cdot)$ represents the semantic feature decoder;

[0040] Optionally, referring to Figure 3, which is a schematic diagram of a first semantic feature decoder provided in an embodiment of the present invention, the convolution-based image semantic feature decoder consists of a convolutional layer (Conv) and a Leaky ReLU activation layer, and restores three-dimensional image semantic features of the same size as before encoding. In the convolutional layer, k1 denotes a convolution kernel size of 1, s1 denotes a convolution stride of 1, and c256 denotes 256 output channels.

[0041] The semantic features of the recovered first source domain image are input into a preset target detection network model for training, resulting in a trained target detection network model.

[0042] In this embodiment, the loss function $L_{DET}$ of the trained object detection network model is expressed as:

[0043] $L_{DET} = \alpha L_{cls} + \beta L_{res}$;

[0044] where $L_{cls}$ represents the classification loss, $L_{res}$ represents the regression loss, and $\alpha$ and $\beta$ represent their respective weights.

[0045] S103. Input the source domain image semantic features and the target domain image semantic features into the semantic feature encoder for encoding to obtain the source domain image semantic vector and the target domain image semantic vector; send the source domain image semantic vector and the target domain image semantic vector to the channel; at the receiving end, the semantic feature decoder recovers the source domain image semantic features and the target domain image semantic features from the noise-corrupted source domain image semantic vector and target domain image semantic vector.

[0046] Specifically, in this embodiment, the source domain image semantic vector $x_s$ and the target domain image semantic vector $x_t$ are expressed as follows:

[0047]

[0048] where the source domain semantic feature encoder is applied to $f_s$ and the target domain semantic feature encoder is applied to $f_t$; $f_s$ represents the semantic features of the source domain image, and $f_t$ represents the semantic features of the target domain image.

[0049] In this embodiment, the noise-corrupted source domain image semantic vector $y_s$ and target domain image semantic vector $y_t$ are expressed as follows:

[0050] $y_s = h \cdot x_s + \omega$;

[0051] $y_t = h \cdot x_t + \omega$;

[0052] where $h$ represents the channel covariance coefficient, $\omega$ represents an independent and identically distributed noise vector, $x_s$ represents the semantic vector of the source domain image, and $x_t$ represents the semantic vector of the target domain image.

[0053] In this embodiment, the recovered source domain image semantic features $f_s'$ and the recovered target domain image semantic features $f_t'$ are expressed as follows:

[0054]

[0055] where the source domain semantic feature decoder is applied to $y_s$ and the target domain semantic feature decoder is applied to $y_t$; $y_s$ represents the noise-corrupted semantic vector of the source domain image, and $y_t$ represents the noise-corrupted semantic vector of the target domain image.

[0056] Specifically, in this embodiment, the source domain feature extractor and the target domain feature extractor share the same network structure. The parameters of the pre-trained source domain feature extractor are used to initialize the target domain feature extractor. Then, labeled source domain images and unlabeled target domain images are loaded into the trained object detection network model and passed through their respective feature extractors to obtain source domain image features and target domain image features. The source domain feature extractor is fixed and no longer updated. The target domain feature extractor uses stochastic gradient descent (SGD) as the optimizer for network training, and its learning rate is set to 0.001.
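The initialise-then-freeze arrangement can be illustrated with plain dictionaries standing in for network weights. This is a hedged sketch only: the parameter names, values, and gradients are placeholders, and `sgd_step` is a hypothetical helper rather than any library call; only the learning rate of 0.001 comes from the text.

```python
import copy


def sgd_step(params, grads, lr, frozen=False):
    """Plain SGD update; a frozen parameter set is returned unchanged."""
    if frozen:
        return params
    return {
        name: [p - lr * g for p, g in zip(values, grads[name])]
        for name, values in params.items()
    }


# Hypothetical pre-trained source extractor weights (placeholder values).
source_params = {"conv.weight": [0.5, -0.3], "conv.bias": [0.1]}
# The target extractor shares the structure and starts from the source weights.
target_params = copy.deepcopy(source_params)

grads = {"conv.weight": [1.0, -2.0], "conv.bias": [0.5]}  # placeholder gradients
source_after = sgd_step(source_params, grads, lr=0.001, frozen=True)
target_after = sgd_step(target_params, grads, lr=0.001)

print(source_after == source_params)  # source extractor is untouched
print(target_after["conv.weight"])    # target extractor has moved
```

Copying rather than sharing the weights is what allows the target extractor to drift away from the source extractor during adaptation while the source side stays a fixed reference.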

[0057] S104. Use adversarial training to perform global feature alignment between the restored source domain image semantic features and the restored target domain image semantic features.

[0058] Specifically, in this embodiment, performing global feature alignment between the recovered source domain image semantic features and the recovered target domain image semantic features using adversarial training includes:

[0059] The recovered source domain image semantic features and the recovered target domain image semantic features are input into a domain discriminator, which is trained to distinguish the domain labels of the recovered source domain and target domain image semantic features. Optionally, based on other adversarial discriminators, the domain label d is set to 0, and the source domain and target domain are set to 0. The domain discriminator is trained using mean squared error loss. Optionally, the domain discriminator includes 4 convolutional layers, 4 batch normalization layers, and 4 Leaky ReLU layers, where each Leaky ReLU layer serves as an activation function layer.

[0060] The target domain feature extractor is used as a generator, and the target domain features are assigned the source domain label to train the generator.

[0061] During the training of the domain discriminator and the generator, alternating iterative training is performed, that is, fixing the parameters of the domain discriminator and optimizing the parameters of the generator, and fixing the parameters of the generator and optimizing the parameters of the domain discriminator.

[0062] It's important to note that the domain discriminator aims to identify the domain from which features originate, while the target domain feature generator aims to generate features similar to those in the source domain, thus deceiving the domain discriminator as much as possible. During this process, both the target domain feature generator and the domain discriminator optimize their networks, creating a competitive dynamic. Ultimately, through adversarial training, global alignment of source and target domain image features is achieved, improving the accuracy of cross-domain object detection. After numerous experiments and fine-tuning, stochastic gradient descent (SGD) was used as the optimizer for both the domain discriminator and the target domain feature extractor, with a learning rate set to 0.0001.
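The alternating schedule described above can be sketched on a deliberately tiny problem. Everything here is an illustrative assumption: features are scalars, the discriminator is a linear function, the generator is a single learnable shift `theta` added to target inputs, and the label convention (source = 1, target = 0) is chosen for the example because the text does not state it unambiguously. Gradients are written out by hand so the sketch needs no framework.

```python
def discriminate(w, b, feature):
    """Toy linear domain discriminator on a scalar feature."""
    return w * feature + b


def disc_loss(w, b, src, tgt):
    """MSE against domain labels (source = 1, target = 0, assumed convention)."""
    errs = [discriminate(w, b, f) - 1.0 for f in src]
    errs += [discriminate(w, b, f) for f in tgt]
    return sum(e * e for e in errs) / len(errs)


def gen_loss(w, b, tgt):
    """The generator wants target features scored as source (label 1)."""
    return sum((discriminate(w, b, f) - 1.0) ** 2 for f in tgt) / len(tgt)


def disc_step(w, b, src, tgt, lr):
    """Update the discriminator only; the features (generator) stay fixed."""
    pairs = [(f, 1.0) for f in src] + [(f, 0.0) for f in tgt]
    n = len(pairs)
    grad_w = sum(2 * (discriminate(w, b, f) - d) * f for f, d in pairs) / n
    grad_b = sum(2 * (discriminate(w, b, f) - d) for f, d in pairs) / n
    return w - lr * grad_w, b - lr * grad_b


def gen_step(theta, w, b, raw_tgt, lr):
    """Update the generator shift theta only; the discriminator stays fixed."""
    grad = sum(2 * (discriminate(w, b, z + theta) - 1.0) * w for z in raw_tgt)
    return theta - lr * grad / len(raw_tgt)


src = [2.0, 2.2, 1.8]        # toy source-domain features
raw_tgt = [0.0, 0.2, -0.2]   # toy target-domain inputs; the generator adds theta
theta, w, b, lr = 0.0, 0.5, 0.0, 0.05

history = []
for _ in range(3):
    tgt = [z + theta for z in raw_tgt]
    w, b = disc_step(w, b, src, tgt, lr)        # generator fixed
    theta = gen_step(theta, w, b, raw_tgt, lr)  # discriminator fixed
    history.append((w, b, theta))

print([round(step[2], 4) for step in history])  # theta after each round
```

In this toy run the shift grows round by round, moving the target features towards the source features. The sketch only demonstrates the update schedule and makes no claim about convergence of the full adversarial game.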

[0063] In this embodiment, the loss function of the domain discriminator is:

[0064]

[0065] where the loss is evaluated on the semantic features of the source domain image and the target domain image, $d$ represents the label of the source domain or the target domain, and $N$ represents the number of samples.

[0066] In this embodiment, the loss function of the generator is:

[0067]

[0068] where the loss is evaluated on the semantic features of the target domain image, $p$ represents the source domain label, and $N$ represents the number of samples.
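Since the text specifies a mean squared error loss and names the quantities involved ($d$, $p$, $N$), one plausible form of the two losses can be written down. The notation is assumed for illustration only: $D(\cdot)$ for the domain discriminator, $f_i'$ for the $i$-th recovered feature from either domain with label $d_i$, and $f_{t,i}'$ for the $i$-th recovered target domain feature. The patent's exact expressions may differ.

```latex
L_D = \frac{1}{N} \sum_{i=1}^{N} \left( D(f_i') - d_i \right)^2,
\qquad
L_G = \frac{1}{N} \sum_{i=1}^{N} \left( D(f_{t,i}') - p \right)^2
```

Under this reading, the discriminator is pushed to output each feature's true domain label, while the generator is pushed to make target features score as the source label $p$.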

[0069] In summary, this invention provides a cross-domain target detection method oriented towards semantic communication. It constructs a semantic feature encoder to extract semantic features for target detection, effectively combating channel noise and improving transmission efficiency. The semantic feature encoder and decoder extract the semantic vectors required for cross-domain target detection. Without sharing feature extractor parameters, based on the adversarial principle, this invention sets up a domain discriminator and feature extractor to form an adversarial network. Through adversarial training, the feature distribution of the target domain approximates the feature distribution of the source domain, thereby maximizing the loss of the domain discriminator and improving the accuracy of cross-domain target detection.

[0070] In an optional embodiment of the present invention, the effectiveness of the cross-domain target detection method for semantic communication provided in the above embodiment is verified by simulation experiments, specifically as follows:

[0071] I. Simulation Conditions

[0072] Operating system: Ubuntu 20.04, Python 3.8;

[0073] Experimental platform: PyTorch-GPU-1.12.0;

[0074] Processor: Intel Core i7-7700k CPU @ 4.20GHz × 4;

[0075] Graphics card: NVIDIA GeForce 3090Ti GPU;

[0076] Memory: 32GB.

[0077] II. Simulation Content and Result Analysis

[0078] Simulation Experiment 1: Effectiveness Experiment of Target Detection Method Based on Semantic Communication.

[0079] In the source domain target detection experiment, this embodiment selects the Cityscapes dataset as the source domain dataset.

[0080] Cityscapes is a widely used autonomous driving dataset, a collection of urban street scene images from 27 cities under clear weather conditions. The Cityscapes dataset contains 2975 training images and 500 validation images with instance segmentation annotations, which can be converted into bounding box annotations for 8 categories. All images are 3-channel RGB images captured by an onboard camera, with the same resolution of 1024×2048.

[0081] Table 1 compares the mean average precision (mAP) of target detection under different methods in the simulation experiments provided in this embodiment of the invention. Experiments were conducted using three methods: the first is target detection without a channel; the second is target detection using the semantic feature encoder after transmission through a channel; and the third is target detection based on the traditional JPEG codec architecture (i.e., target detection is performed on the image after JPEG decoding). The simulated channel is assumed to be an additive white Gaussian noise (AWGN) channel, and the signal-to-noise ratio (SNR) is the same during the training and testing phases. The target detection method based on semantic communication was trained and tested at an SNR of 17 dB, after rate-2/3 LDPC coding and 8-QAM modulation.
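To give a sense of scale for the 17 dB operating point, the decibel figure can be converted to a linear power ratio. The symbols $P_s$ (signal power) and $P_n$ (noise power) are introduced here for illustration and do not appear in the original text.

```latex
\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10} \frac{P_s}{P_n}
\;\Rightarrow\;
\frac{P_s}{P_n} = 10^{17/10} \approx 50.1
\;\Rightarrow\;
P_n \approx 0.02\, P_s
```

At this setting the noise power is roughly 2% of the signal power.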

[0082] Table 1. Comparison of target detection mean average precision (mAP) under different methods in the simulation experiment provided by this invention.

[0083]

[0084] As shown in Table 1, compared with the results without the AWGN channel, the results of this invention do not show a significant decrease and even show some improvement in categories such as people and cyclists. In contrast, the target detection method based on the traditional JPEG codec architecture reconstructs images of poor quality at the receiver, resulting in a significant decrease in the mean average precision (mAP) of target detection, with varying degrees of decrease across different categories, especially in the train category, where the accuracy drops to 0.1%. This demonstrates that introducing a semantic feature encoder can effectively extract features related to target detection and reconstruct them at the receiving end, effectively combating noise and improving transmission efficiency.

[0085] Simulation Experiment 2: Effectiveness Experiment of Adversarial Cross-Domain Target Detection Method Based on Semantic Communication.

[0086] In this adversarial cross-domain object detection experiment, the Cityscapes dataset was selected as the source domain dataset, and the Foggy Cityscapes dataset as the target domain dataset. Foggy Cityscapes was constructed by simulating different levels of fog on Cityscapes images, generating three simulated fog levels based on depth maps and a physical model. Since adversarial network training is difficult to converge, after multiple experiments and fine-tuning, stochastic gradient descent (SGD) was used as the optimizer for the feature extractor, semantic feature encoder / decoder, and classifier, with a learning rate set to 0.0001. The model was tested using different SNR values, and the test results are the average of five iterations.

[0087] Figure 4 is a schematic diagram comparing the mean average precision (mAP) under different signal-to-noise ratio (SNR) values provided in this embodiment of the invention. It can be observed that as the channel SNR increases, the mAP achieved by the method based on the gradient reversal layer (GRL) increases. The diagram also shows that the mAP achieved by the method in this example is more stable than that of the target detection method based on the traditional JPEG codec architecture, whose performance drops sharply when the channel quality degrades. The JPEG-based method encodes the image into marker bits and data bits, which are prone to transmission errors at low SNR. These errors severely affect image reconstruction and thus cross-domain target detection. This phenomenon is known as the "cliff effect." The method proposed in this invention can learn channel variations during the training phase, obtain semantic-level error correction capabilities, and effectively extract semantic features, thus achieving transmission that is robust to channel variations and avoiding the "cliff effect."

[0088] Table 2 shows the adaptation results from clear to foggy weather under AWGN channel conditions at SNR = 17 dB. Compared with non-cross-domain target detection and the target detection method based on the traditional JPEG codec architecture, the proposed adversarial domain-adaptive cross-domain target detection method achieves the best detection performance. Compared with non-cross-domain target detection, the proposed method shows varying degrees of improvement across categories, notably a 14.4% improvement in AP for the vehicle category and a 7% improvement in mAP. Compared with the JPEG-based method, the proposed method achieves a 15.6% improvement in mAP, demonstrating that it can significantly improve the accuracy of cross-domain target detection.

[0089] Table 2. Adaptation results from sunny to foggy days.

[0090]

[0091]

[0092] Based on the same inventive concept, this invention also provides an apparatus for cross-domain target detection oriented towards semantic communication, applied to the method for cross-domain target detection oriented towards semantic communication provided in the above embodiments of this invention. Embodiments of the method are described above and will not be repeated here. The apparatus includes:

[0093] The image acquisition module is used to acquire labeled source domain images and unlabeled target domain images;

[0094] Image processing module one is used to input labeled source domain images and unlabeled target domain images into a trained target detection network model for processing, so as to obtain semantic features of source domain images and semantic features of target domain images.

[0095] Image processing module two is used to input the semantic features of the source domain image and the semantic features of the target domain image into the semantic feature encoder for encoding, so as to obtain the semantic vectors of the source domain image and the semantic vectors of the target domain image; the semantic vectors of the source domain image and the semantic vectors of the target domain image are sent to the channel, and at the receiving end the semantic feature decoder recovers the source domain image semantic features and the target domain image semantic features from the noise-corrupted semantic vectors of the source domain image and the target domain image.

[0096] Image processing module three is used to perform global feature alignment between the restored source domain image semantic features and the restored target domain image semantic features using adversarial training.

[0097] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations are intended to cover non-exclusive inclusion, such that an article or device comprising a list of elements includes not only those elements but also other elements not expressly listed. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the article or device comprising said element. Terms such as "connected" or "linked" are not limited to physical or mechanical connections but can include electrical connections, whether direct or indirect. The orientations or positional relationships indicated by terms such as "upper," "lower," "left," and "right" are based on the orientations or positional relationships shown in the accompanying drawings and are used only for the convenience of describing the invention and for simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore should not be construed as limiting the invention.

[0098] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features or characteristics described may be combined in any suitable manner in one or more embodiments or examples. In addition, those skilled in the art can combine and integrate the different embodiments or examples described in this specification.

[0099] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of the present invention, and all such modifications and substitutions should be considered within the scope of protection of the present invention.

Claims

1. A method for cross-domain target detection based on semantic communication, characterized in that it comprises:
obtaining labeled source domain images and unlabeled target domain images;
inputting the labeled source domain images and the unlabeled target domain images into a trained target detection network model for processing to obtain source domain image semantic features and target domain image semantic features;
inputting the source domain image semantic features and the target domain image semantic features into a semantic feature encoder for encoding to obtain source domain image semantic vectors and target domain image semantic vectors; transmitting the source domain image semantic vectors and the target domain image semantic vectors over a channel; and, at the receiving end, recovering the source domain image semantic features and the target domain image semantic features from the source domain image semantic vectors and target domain image semantic vectors with noise interference by means of a semantic feature decoder;
performing global feature alignment between the recovered source domain image semantic features and the recovered target domain image semantic features using adversarial training;
wherein the process of obtaining the trained target detection network model includes:
obtaining a training dataset, which includes labeled source domain images;
inputting the training dataset into a feature extractor for processing to obtain first source domain image semantic features;
inputting the first source domain image semantic features into a first semantic feature encoder for encoding to obtain a first source domain image semantic vector;
transmitting the first source domain image semantic vector over a physical channel to simulate noise interference;
at the receiving end, recovering the first source domain image semantic features from the first source domain image semantic vector with noise interference by means of a first semantic feature decoder;
inputting the recovered first source domain image semantic features into a preset target detection network model for training to obtain the trained target detection network model;
wherein the source domain image semantic vector $Z_s$ and the target domain image semantic vector $Z_t$ are expressed as:
$Z_s = E_s(F_s)$; $Z_t = E_t(F_t)$;
where $E_s$ denotes the source domain semantic feature encoder, $E_t$ denotes the target domain semantic feature encoder, $F_s$ denotes the source domain image semantic features, and $F_t$ denotes the target domain image semantic features;
and wherein the recovered source domain image semantic features $\hat{F}_s$ and the recovered target domain image semantic features $\hat{F}_t$ are expressed as:
$\hat{F}_s = D_s(\tilde{Z}_s)$; $\hat{F}_t = D_t(\tilde{Z}_t)$;
where $D_s$ denotes the source domain semantic feature decoder, $D_t$ denotes the target domain semantic feature decoder, $\tilde{Z}_s$ denotes the source domain image semantic vector with noise interference, and $\tilde{Z}_t$ denotes the target domain image semantic vector with noise interference.

2. The method for cross-domain target detection based on semantic communication according to claim 1, characterized in that the loss function $L$ of the trained target detection network model is expressed as: $L = \lambda_1 L_{cls} + \lambda_2 L_{reg}$; where $L_{cls}$ denotes the classification loss, $L_{reg}$ denotes the regression loss, and $\lambda_1$ and $\lambda_2$ denote the weights.
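The weighted combination of classification loss and regression loss in claim 2 can be illustrated as follows. The claim does not name the individual loss terms, so softmax cross-entropy and smooth L1 are assumptions here, as are the function names and the default weights.

```python
import numpy as np


def classification_loss(logits, labels):
    """Assumed classification term: mean softmax cross-entropy."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def regression_loss(pred_boxes, true_boxes):
    """Assumed regression term: mean smooth L1 over box coordinates."""
    diff = np.abs(pred_boxes - true_boxes)
    return float(np.where(diff < 1.0, 0.5 * diff**2, diff - 0.5).mean())


def detection_loss(logits, labels, pred_boxes, true_boxes, w_cls=1.0, w_reg=1.0):
    """Weighted sum of the classification and regression losses."""
    return (w_cls * classification_loss(logits, labels)
            + w_reg * regression_loss(pred_boxes, true_boxes))


# Toy usage: three detections, two classes, four box coordinates each.
labels = np.array([0, 1, 0])
logits = np.zeros((3, 2))
boxes = np.array([[0.0, 0.0, 4.0, 4.0]] * 3)
total = detection_loss(logits, labels, boxes + 2.0, boxes, w_cls=1.0, w_reg=2.0)
```

Raising `w_reg` relative to `w_cls` makes localization errors count more heavily than class errors in the total.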

3. The method for cross-domain target detection based on semantic communication according to claim 1, characterized in that the source domain image semantic vector with noise interference $\tilde{Z}_s$ and the target domain image semantic vector with noise interference $\tilde{Z}_t$ are expressed as: $\tilde{Z}_s = h Z_s + n$; $\tilde{Z}_t = h Z_t + n$; where $h$ denotes the channel covariance coefficient, $n$ denotes an independent, identically distributed vector, $Z_s$ denotes the source domain image semantic vector, and $Z_t$ denotes the target domain image semantic vector.
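A channel that scales the semantic vector by a coefficient and adds an independent, identically distributed vector can be simulated as in the sketch below. The real-valued scalar coefficient `h`, the Gaussian noise, and the parameterization by a signal-to-noise ratio in decibels are all assumptions for illustration; the claim itself only names a channel coefficient and an i.i.d. vector.

```python
import numpy as np


def noisy_channel(z, h, snr_db, rng):
    """Return h * z + n, with n i.i.d. Gaussian scaled to the requested SNR."""
    faded = h * z
    signal_power = np.mean(faded**2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    n = rng.standard_normal(z.shape) * np.sqrt(noise_power)
    return faded + n


def empirical_snr_db(z, received, h):
    """Measure the SNR actually realized on one transmission."""
    faded = h * z
    return 10.0 * np.log10(np.mean(faded**2) / np.mean((received - faded) ** 2))


rng = np.random.default_rng(0)
semantic_vector = rng.standard_normal(100_000)  # toy semantic vector
received = noisy_channel(semantic_vector, h=0.8, snr_db=10.0, rng=rng)
measured = empirical_snr_db(semantic_vector, received, h=0.8)
```

Sweeping `snr_db` during training is one common way to expose a decoder to a range of channel conditions.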

4. The method for cross-domain target detection based on semantic communication according to claim 1, characterized in that the step of performing global feature alignment between the recovered source domain image semantic features and the recovered target domain image semantic features using adversarial training includes: inputting the recovered source domain image semantic features and the recovered target domain image semantic features into a domain discriminator for training, so as to distinguish the domain labels of the recovered source domain image semantic features from the domain labels of the recovered target domain image semantic features; using the target domain feature extractor as a generator, and binding the target domain feature labels to the source domain feature labels to train the generator; and training the domain discriminator and the generator by alternating iteration, that is, fixing the parameters of the domain discriminator while optimizing the parameters of the generator, and fixing the parameters of the generator while optimizing the parameters of the domain discriminator.

5. The method for cross-domain target detection based on semantic communication according to claim 4, characterized in that the loss function of the domain discriminator is computed from the semantic features of the source domain image and the semantic features of the target domain image, the labels indicating the source domain or the target domain, and the number of samples.
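The quantities named in claim 5 (input features, a binary domain label, and a sample count) match the inputs of a binary cross-entropy loss. One form consistent with them is shown below as an assumption, not as the claimed expression. The symbols are hypothetical: $x_i$ is a recovered semantic feature from either domain, $y_i \in \{0, 1\}$ is its domain label, $D(\cdot)$ is the domain discriminator output, and $N$ is the number of samples.

```latex
L_D = -\frac{1}{N} \sum_{i=1}^{N}
      \left[ y_i \log D(x_i) + (1 - y_i) \log\bigl(1 - D(x_i)\bigr) \right]
```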

6. The method for cross-domain target detection based on semantic communication according to claim 4, characterized in that the loss function of the generator is computed from the semantic features of the target domain image, the source domain label, and the number of samples.
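The quantities named in claim 6 suggest that the generator is scored by how well target domain features pass as source domain features. One form consistent with that reading is the cross-entropy of the discriminator output on target features against the source domain label. This is an assumption, not the claimed expression. The symbols are hypothetical: $x_i^{t}$ is a target domain semantic feature, $y_s$ is the source domain label, $D(\cdot)$ is the domain discriminator output, and $N$ is the number of samples.

```latex
L_G = -\frac{1}{N} \sum_{i=1}^{N}
      \left[ y_s \log D(x_i^{t}) + (1 - y_s) \log\bigl(1 - D(x_i^{t})\bigr) \right]
```

With $y_s = 1$ this reduces to $L_G = -\frac{1}{N} \sum_{i=1}^{N} \log D(x_i^{t})$.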

7. A device for cross-domain target detection based on semantic communication, characterized in that it comprises:
an image acquisition module, used to acquire labeled source domain images and unlabeled target domain images;
image processing module one, used to input the labeled source domain images and the unlabeled target domain images into a trained target detection network model for processing, so as to obtain source domain image semantic features and target domain image semantic features;
image processing module two, used to input the source domain image semantic features and the target domain image semantic features into a semantic feature encoder for encoding to obtain source domain image semantic vectors and target domain image semantic vectors, to transmit the source domain image semantic vectors and the target domain image semantic vectors over a channel, and, at the receiving end, to recover the source domain image semantic features and the target domain image semantic features from the source domain image semantic vectors and target domain image semantic vectors with noise interference by means of a semantic feature decoder;
image processing module three, used to perform global feature alignment between the recovered source domain image semantic features and the recovered target domain image semantic features using adversarial training;
wherein the process of obtaining the trained target detection network model includes:
obtaining a training dataset, which includes labeled source domain images;
inputting the training dataset into a feature extractor for processing to obtain first source domain image semantic features;
inputting the first source domain image semantic features into a first semantic feature encoder for encoding to obtain a first source domain image semantic vector;
transmitting the first source domain image semantic vector over a physical channel to simulate noise interference;
at the receiving end, recovering the first source domain image semantic features from the first source domain image semantic vector with noise interference by means of a first semantic feature decoder;
inputting the recovered first source domain image semantic features into a preset target detection network model for training to obtain the trained target detection network model;
wherein the source domain image semantic vector $Z_s$ and the target domain image semantic vector $Z_t$ are expressed as:
$Z_s = E_s(F_s)$; $Z_t = E_t(F_t)$;
where $E_s$ denotes the source domain semantic feature encoder, $E_t$ denotes the target domain semantic feature encoder, $F_s$ denotes the source domain image semantic features, and $F_t$ denotes the target domain image semantic features;
and wherein the recovered source domain image semantic features $\hat{F}_s$ and the recovered target domain image semantic features $\hat{F}_t$ are expressed as:
$\hat{F}_s = D_s(\tilde{Z}_s)$; $\hat{F}_t = D_t(\tilde{Z}_t)$;
where $D_s$ denotes the source domain semantic feature decoder, $D_t$ denotes the target domain semantic feature decoder, $\tilde{Z}_s$ denotes the source domain image semantic vector with noise interference, and $\tilde{Z}_t$ denotes the target domain image semantic vector with noise interference.