An image semantic communication system based on smart metasurface

By employing a deep joint source-channel image compression coding method based on intelligent metasurfaces, and utilizing complex domain neural networks and spatial modulation techniques, the problems of slow computation speed, small system capacity, and high power consumption in wireless communication systems are solved, achieving efficient image semantic communication and demonstrating a bright future for wireless networks.

CN116320464B · Active · Publication Date: 2026-06-23 · HARBIN INST OF TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HARBIN INST OF TECH
Filing Date: 2023-03-20
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing wireless communication systems suffer from slow computation speed, small system capacity, high power consumption, and narrow spectrum range when implementing semantic communication, making it difficult to meet the needs of future wireless networks.

Method used

A deep joint source-channel image compression coding method based on intelligent metasurfaces is adopted. Through offline training and physical deduction, the intelligent metasurface is used for over-the-air computation offloading. Combined with complex-domain neural networks and spatial modulation, this achieves efficient image semantic communication.

Benefits of technology

It achieves high computing speed, large system capacity, low computing power consumption, and wide spectrum range, and has a bright future for wireless networks.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses an image semantic communication system based on an intelligent metasurface, adopts a deep joint source channel image compression and coding method based on the intelligent metasurface, and specifically comprises an offline training process and a physical deduction process. The image semantic communication system based on the intelligent metasurface has the advantages of high calculation speed, large system capacity, low calculation power consumption, a wide frequency spectrum range, real-time programmability and the like, and has a bright future in future wireless networks.

Description

Technical Field

[0001] This invention relates to the field of wireless communication technology, and in particular to an image semantic communication system based on a smart metasurface.

Background Technology

[0002] Due to the development of artificial intelligence (AI), practical semantic communication systems have attracted considerable attention over the past four to five years. Recent research can be broadly categorized into two types: data reconstruction-oriented semantic communication and task-oriented semantic communication. The former shares the same goal as traditional communication—reconstructing source data. The latter directly utilizes semantic features to perform tasks. By focusing on the communication objective, task-oriented semantic extraction is strictly related to the information conveyed from the source data. Simultaneously, the emergence of application-specific integrated circuits (ASICs) for AI has significantly improved computing performance and reduced power consumption, making semantic communication suitable for future wireless networks.

[0003] Meanwhile, over-the-air artificial intelligence based on smart metasurfaces is a novel research topic that has attracted widespread attention. Researchers have recently built prototype devices that implement a single layer of an artificial neural network, or a specific neural network. Such a device exploits the properties of electromagnetic waves to compute in parallel and emulates the neural network structure at the speed of light, demonstrating effective over-the-air computation offloading. Various studies have indicated the industrial potential of modulation schemes based on smart metasurfaces, such as RIS-QAM, RIS-MBM (media-based modulation), RIS-SM / GSM (generalized spatial modulation), RIS-QSM (quadrature spatial modulation), and RIS-SSM (co-occurrence spatial modulation). Since the transmission or reflection amplitude and phase of each tunable unit can be set independently, a MIMO transmitter with a single radio frequency chain can be built on a smart metasurface.

[0004] Given the excellent performance of smart metasurfaces in both communication and artificial intelligence, the inventors consider it feasible to realize semantic communication directly on wireless signals. To the best of the inventors' knowledge, this patent is the first in the industry to propose a solution combining smart metasurfaces with semantic communication while fully considering their physical characteristics, and it presents an image semantic communication system based on smart metasurfaces together with simulation results.

Summary of the Invention

[0005] The purpose of this invention is to provide an image semantic communication system based on intelligent metasurfaces, which has many advantages such as high computing speed, large system capacity, low computing power consumption, wide spectrum range, and real-time programmability, and has a bright future in future wireless networks.

[0006] To achieve the above objectives, this invention provides an image semantic communication system based on intelligent metasurfaces, which employs a deep joint source-channel image compression coding method based on intelligent metasurfaces, specifically including an offline training process and a physical deduction process.

[0007] Preferably, the offline training process specifically includes the following steps:

[0008] S1. Preparations before conducting network training;

[0009] S2. Input the batched data into the training network built in step S1, and let it pass through the semantic source channel encoder, channel space, and receiver generator in sequence. Calculate the corresponding loss function value and update the network parameters using stochastic gradient descent and backpropagation.

[0010] S3. Update the network parameters sequentially using three different learning rates, and determine whether the loss function value has converged to the preset value. If it has, terminate the training early and save the corresponding model parameters for offline use by edge devices. Otherwise, repeat step S2.

[0011] Preferably, in step S1, the provided dataset is preprocessed by modulating it from the real domain to the complex domain, and simultaneously shuffling, batching, and normalizing it; the semantic source-channel encoder, channel space, receiver generator, loss function, and optimizer required for the joint source-channel coding process are designed, and then the respective network parameters are initialized.

[0012] Preferably, the physical deduction process specifically includes the following steps:

[0013] (1) The radio frequency signal generator generates a radio frequency carrier and performs modulation and MIMO transmission through a spatial modulator. The spatial modulator consists of a layer of RIS and control hardware to adjust the behavior of each RIS unit. After receiving the radio frequency signal, the spatial modulator can perform signal processing on it.

[0014] (2) The radio frequency signal then passes through the semantic encoder;

[0015] (3) After passing through the wireless channel space, the radio frequency signal passes through the semantic decoder, which corresponds to the receiver generator in the offline training process. The semantic decoder decodes the transmitted signal from the parameter space to obtain the received signal, thereby completing the communication process.

[0016] Preferably, in step (2), after the signal is mapped to the first layer of the semantic encoder, the smart metasurface performs computation directly on the wireless signal rather than on digital data as in traditional semantic communication; the semantic encoder corresponds to the semantic source-channel encoder in the offline training process.

[0017] Therefore, the image semantic communication system based on a smart metasurface, as described above, can offload computation over the air at optical computing speeds, reducing the latency of AI-assisted semantic extraction. Simultaneously, the participation of the smart metasurface can significantly improve system capacity through spatial diversity and polarization diversity. It possesses numerous advantages such as high computing speed, large system capacity, low power consumption, wide spectrum range, and real-time programmability, and holds a bright future in future wireless networks.

[0018] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

[0019] Figure 1 is a diagram of the offline training architecture for deep joint source-channel image compression coding of the present invention;

[0020] Figure 2 is a diagram of the physical deduction architecture for deep joint source-channel image compression coding of the present invention;

[0021] Figure 3 is a PSNR performance graph of complex JSCC and complex modulated JSCC under different training signal-to-noise ratios according to the present invention;

[0022] Figure 4 is a PSNR performance graph of complex JSCC and complex modulated JSCC under different compression ratios according to the present invention;

[0023] Figure 5 is an SSIM performance graph of complex JSCC and complex modulated JSCC under different training signal-to-noise ratios according to the present invention;

[0024] Figure 6 is an SSIM performance graph of complex JSCC and complex modulated JSCC under different compression ratios according to the present invention.

Detailed Implementation

[0025] The technical solution of the present invention will be further described below with reference to the accompanying drawings and embodiments.

[0026] Unless otherwise defined, the technical or scientific terms used in this invention shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention pertains.

[0027] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be considered illustrative and non-limiting in all respects, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be included within the present invention, and no reference numerals in the claims should be construed as limiting the scope of the claims.

[0028] Furthermore, it should be understood that although this specification describes embodiments, not every embodiment contains only one independent technical solution. This narrative style is merely for clarity. Those skilled in the art should consider the specification as a whole, and the technical solutions in each embodiment can be appropriately combined to form other embodiments that can be understood by those skilled in the art. These other embodiments are also covered within the scope of protection of this invention.

[0029] It should also be understood that the specific embodiments described above are only used to explain the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.

[0030] All terms used in this disclosure (including technical or scientific terms) have the same meaning as understood by one of ordinary skill in the art to which this disclosure pertains, unless otherwise specifically defined. It should also be understood that terms defined in general dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant art, and not as being interpreted with idealized or highly formalized meanings, unless expressly defined herein.

[0031] Techniques, methods, and apparatus known to those skilled in the art may not be discussed in detail, but where appropriate, they should be considered part of the specification.

[0032] All prior art documents cited in this specification are incorporated herein by reference in their entirety and are therefore part of the disclosure of this invention.

[0033] This invention provides an image semantic communication system based on intelligent metasurfaces, which employs a deep joint source-channel image compression coding method based on intelligent metasurfaces, specifically including an offline training method and a physical deduction method.

[0034] Offline training process:

[0035] S1. Preparation before network training. The provided dataset is preprocessed by modulating it from the real domain to the complex domain, while also shuffling, batching, and normalizing it. The semantic source-channel encoder, channel space, receiver generator, loss function, and optimizer required for joint source-channel coding are designed, along with the network training process, and the network parameters are then initialized. The overall architecture is shown in Figure 1.

[0036] To meet the physical requirements of intelligent metasurfaces for offloading computation over the air, the semantic communication in this invention employs a complex-domain neural network built from complex convolutional layers, complex activation functions, and complex batch normalization, which are described in turn below:

[0037] Complex convolutional layer: To carry out the equivalent of a traditional real-valued 2D convolution in the complex domain, the complex filter matrix W = A + iB is convolved with the complex vector h = x + iy, where A and B are real matrices and x and y are real vectors. Because complex arithmetic can be simulated with real-valued quantities, this yields:

[0038] $$W * h = (A * x - B * y) + i\,(B * x + A * y)$$

[0039] Represented using matrices:

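[0040] $$\begin{bmatrix} \Re(W * h) \\ \Im(W * h) \end{bmatrix} = \begin{bmatrix} A & -B \\ B & A \end{bmatrix} * \begin{bmatrix} x \\ y \end{bmatrix}$$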
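
The decomposition above, in which the real part of the result is A*x − B*y and the imaginary part is B*x + A*y, can be checked numerically. The following is a minimal sketch in plain Python that uses a 1-D sliding product for brevity instead of the 2-D convolution of the actual network; the helper names `corr1d` and `complex_corr1d` and all numeric values are illustrative assumptions, not part of the patent.

```python
def corr1d(w, h):
    """'Valid' 1-D sliding dot product of kernel w over signal h."""
    n = len(h) - len(w) + 1
    return [sum(w[k] * h[i + k] for k in range(len(w))) for i in range(n)]


def complex_corr1d(A, B, x, y):
    """Sliding product of W = A + iB with h = x + iy, built from four real ones."""
    ax, by = corr1d(A, x), corr1d(B, y)
    bx, ay = corr1d(B, x), corr1d(A, y)
    real = [p - q for p, q in zip(ax, by)]  # A*x - B*y
    imag = [p + q for p, q in zip(bx, ay)]  # B*x + A*y
    return [complex(r, i) for r, i in zip(real, imag)]


# Illustrative values only.
A, B = [1.0, -2.0], [0.5, 3.0]
x, y = [1.0, 2.0, 3.0, 4.0], [-1.0, 0.0, 2.0, 1.0]

W = [complex(a, b) for a, b in zip(A, B)]
h = [complex(p, q) for p, q in zip(x, y)]

direct = corr1d(W, h)                # same routine run on complex numbers
split = complex_corr1d(A, B, x, y)   # four real-valued passes
```

Both routes give the same values, which is why a complex layer can be trained with ordinary real-valued tensor operations.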

[0041] Complex activation function:

[0042] $$\mathrm{CReLU}(z) = \mathrm{ReLU}(\Re(z)) + i\,\mathrm{ReLU}(\Im(z))$$

[0043] When both the real and imaginary parts are strictly positive or strictly negative, CReLU satisfies the Cauchy-Riemann equations.
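
As a small illustration, CReLU applies an ordinary ReLU to the real and imaginary parts independently. The sketch below assumes scalar Python complex numbers; the function name `crelu` is hypothetical.

```python
def crelu(z: complex) -> complex:
    """Apply ReLU separately to the real and imaginary parts of z."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))


# One sample per quadrant of the complex plane.
samples = [3 + 4j, -3 + 4j, 3 - 4j, -3 - 4j]
outputs = [crelu(z) for z in samples]
```

Only the first-quadrant sample passes through unchanged, and the third-quadrant sample maps to zero; these are the two regions in which the Cauchy-Riemann condition mentioned in the text holds.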

[0044] Complex batch normalization: This method normalizes complex arrays to a standard complex normal distribution, which accelerates neural network learning.

[0045]

[0046] Where V is the covariance matrix:

[0047]
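
A common formulation of complex batch normalization, assumed here rather than taken from the patent, treats each complex value as a 2-D vector of its real and imaginary parts, subtracts the batch mean, and multiplies by the inverse square root of the 2×2 covariance matrix V. The sketch below implements that whitening step in plain Python; the function name, the `eps` value, and the omission of learnable scale and shift parameters are all assumptions.

```python
import math


def complex_batch_norm(batch, eps=1e-5):
    """Whiten complex samples: zero mean, identity 2x2 covariance of (Re, Im)."""
    n = len(batch)
    mean = sum(batch) / n
    centered = [z - mean for z in batch]
    # Entries of the covariance matrix V (eps keeps V positive definite).
    vrr = sum(c.real * c.real for c in centered) / n + eps
    vii = sum(c.imag * c.imag for c in centered) / n + eps
    vri = sum(c.real * c.imag for c in centered) / n
    # Closed-form inverse square root of a symmetric positive-definite 2x2 matrix.
    s = math.sqrt(vrr * vii - vri * vri)   # sqrt(det V)
    t = math.sqrt(vrr + vii + 2.0 * s)     # sqrt(trace V + 2 sqrt(det V))
    wrr, wii, wri = (vii + s) / (s * t), (vrr + s) / (s * t), -vri / (s * t)
    return [
        complex(wrr * c.real + wri * c.imag, wri * c.real + wii * c.imag)
        for c in centered
    ]


# Illustrative batch only.
batch = [1 + 2j, 3 - 1j, -2 + 0.5j, 0.5 + 4j, -1 - 3j, 2 + 2j]
normalized = complex_batch_norm(batch)
```

Dividing by a scalar standard deviation would not decorrelate the real and imaginary parts, which is why the full 2×2 matrix is used in this formulation.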

[0048] The network structures of the semantic source-channel encoders used are shown in Tables 1 and 2.

[0049]

[0050] Table 1 Semantic Encoder

[0051]

[0052]

[0053] Table 2 ResNet Block Structure

[0054] Where input represents the input layer; reflectionpadding refers to the reflection padding layer, with the padding size in parentheses; H×W×C ComplexConv,stride s refers to a complex convolutional layer with kernel size H×W, number of channels C, and stride s, followed by a complex instance normalization layer and a complex ReLU activation layer; the number of channels C in the last complex convolutional layer in the encoder structure refers to the number of coding channels, used to reflect the degree of compression; ResNet blocks refer to residual network blocks, the specific structure of which is shown in Table 2.

[0055] The network structure of the receiver generator used is shown in Table 3.

[0056]

[0057]

[0058] Table 3 Receiver Generator

[0059] ComplexConvTranspose2d denotes a complex transposed convolution, used to upsample complex feature maps and increase their resolution; ComplexSigmoid is the complex sigmoid activation function. The overall structure is largely symmetrical to that of the semantic encoder.

[0060] The loss function is as follows:

[0061] $$L(X, Y) = \lambda_1\,\mathrm{MSE}(\Re(X), \Re(Y)) + \lambda_2\,\mathrm{MSE}(\Im(X), \Im(Y))$$

[0062] Here X is the original image and Y is the image generated at the receiving end. The loss function computes the MSE separately for the real and imaginary parts of X and Y, and λ1 and λ2 are the weights of the real and imaginary parts, respectively. The optimizer used is Adam; this choice is not unique, and another optimizer can be selected as required.
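
The weighted loss described above can be sketched as follows. This is a minimal plain-Python version over flat lists of complex values; the function name `complex_mse_loss`, the default weights, and the sample data are assumptions for illustration.

```python
def complex_mse_loss(X, Y, lam1=1.0, lam2=1.0):
    """Weighted sum of the MSE over real parts and the MSE over imaginary parts."""
    n = len(X)
    mse_re = sum((a.real - b.real) ** 2 for a, b in zip(X, Y)) / n
    mse_im = sum((a.imag - b.imag) ** 2 for a, b in zip(X, Y)) / n
    return lam1 * mse_re + lam2 * mse_im


# Illustrative values: the real-part MSE is 2.0 and the imaginary-part MSE is 2.0.
X = [1 + 1j, 2 + 0j]
Y = [1 + 3j, 0 + 0j]
loss = complex_mse_loss(X, Y, lam1=1.0, lam2=0.5)
```

Separate weights let the training emphasize whichever component carries more of the image information under the chosen real-to-complex modulation.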

[0063] This step primarily initializes the necessary components for training the joint source-channel deep neural network. The specific network structure can be built based on the performance of different hardware devices. The loss function and optimizer can be selected according to specific needs, with the key criterion being ensuring stable model training and convergence. Furthermore, the dataset can be selected based on different application scenarios to achieve better training results.

[0064] S2. Input the batched data into the training network built in step S1, and let it pass through the semantic source-channel encoder, channel space, and receiver generator in sequence. Calculate the corresponding loss function values and update the network parameters using stochastic gradient descent and backpropagation.

[0065] In step S2, the current batch of data is first input into the semantic encoder, as shown in the following formula:

[0066] y = SE(x; θ1)

[0067] In the formula, SE(·) represents the encoder network, and θ1 represents the encoder network parameters.

[0068] Then, the current batch of data is channel normalized and enters the wireless channel space, propagating to the receiver generator, as shown in the following formula:

[0069] y = G(AWGN(q(x)); θ3)

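[0070] In the formula, G(·) represents the receiver generator network, θ3 represents the receiver generator network parameters, AWGN represents the additive white Gaussian noise channel, and q(·) represents the channel normalization. Here x denotes the input to this stage, that is, the encoder output, and y denotes the generated image.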
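
The channel stage can be sketched as below. The patent does not define q(·), so this example assumes it scales the encoded symbols to unit average power, and it models AWGN as circularly symmetric complex Gaussian noise whose power follows from the SNR in dB. The function names and the random test symbols are hypothetical.

```python
import math
import random


def power_normalize(symbols):
    """Scale complex symbols to unit average power (an assumed form of q)."""
    p = sum(abs(s) ** 2 for s in symbols) / len(symbols)
    return [s / math.sqrt(p) for s in symbols]


def awgn(symbols, snr_db, rng):
    """Add complex Gaussian noise to unit-power symbols at the given SNR."""
    noise_power = 10.0 ** (-snr_db / 10.0)
    sigma = math.sqrt(noise_power / 2.0)  # standard deviation per real dimension
    return [
        s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        for s in symbols
    ]


rng = random.Random(0)
raw = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(20000)]
tx = power_normalize(raw)
rx = awgn(tx, snr_db=10.0, rng=rng)

signal_power = sum(abs(t) ** 2 for t in tx) / len(tx)
measured_noise = sum(abs(r - t) ** 2 for r, t in zip(rx, tx)) / len(tx)
```

Normalizing before the channel fixes the transmit power, so the SNR seen during training is set by the noise level alone.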

[0071] The forward propagation process can be represented as:

[0072] $$z = w^{T} X + b$$

[0073] The MSE loss between the original image and the image generated at the receiver is then calculated. The gradient of the loss function obtained in step S2 is backpropagated, and the Adam optimization algorithm updates the trainable parameters in the encoder and decoder networks, respectively. The specific process of the Adam algorithm is as follows:

[0074] $$v_k = \beta_1 v_{k-1} + (1 - \beta_1)\, g_k$$

[0075] $$s_k = \beta_2 s_{k-1} + (1 - \beta_2)\, g_k^2$$

[0076] $$\hat{v}_k = \frac{v_k}{1 - \beta_1^k}, \qquad \hat{s}_k = \frac{s_k}{1 - \beta_2^k}$$

[0077] $$\Delta g_k = \frac{\eta\, \hat{v}_k}{\sqrt{\hat{s}_k} + \varepsilon}$$

[0078] $$\theta_k = \theta_{k-1} - \Delta g_k$$

[0079] Here $g_k$ is the stochastic gradient of the k-th batch of data, $v_k$ is the momentum variable corresponding to the gradient of the k-th batch, $s_k$ is the accumulated squared gradient of the k-th batch, and $\hat{v}_k$ and $\hat{s}_k$ are the momentum and accumulation variables after bias correction. The constants β1 and β2 are the hyperparameters of the exponentially weighted moving averages of the gradient and of the squared gradient, respectively. η is the learning rate of the optimizer, and the constant ε is a small value added to keep the denominator from being zero, usually $10^{-8}$.
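
One Adam update for a single scalar parameter can be sketched as follows, using the standard form of the algorithm with momentum, accumulated squared gradient, bias correction, and a scaled step. The defaults β1 = 0.9 and β2 = 0.999 and the toy quadratic objective are assumptions; the patent only states the typical value of ε.

```python
import math


def adam_step(theta, grad, v, s, k, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the updated (theta, v, s) after the k-th gradient, with k starting at 1."""
    v = beta1 * v + (1 - beta1) * grad          # momentum
    s = beta2 * s + (1 - beta2) * grad * grad   # accumulated squared gradient
    v_hat = v / (1 - beta1 ** k)                # bias correction
    s_hat = s / (1 - beta2 ** k)
    theta = theta - lr * v_hat / (math.sqrt(s_hat) + eps)
    return theta, v, s


# Toy problem: minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta, v, s = 0.0, 0.0, 0.0
for k in range(1, 5001):
    grad = 2.0 * (theta - 3.0)
    theta, v, s = adam_step(theta, grad, v, s, k, lr=0.01)
```

Because of the bias correction, the first step has magnitude close to the learning rate regardless of the gradient scale.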

[0080] Considering the complex field form of the network design, the generalized complex chain rule of the loss function is explained: Let L be the loss function, and z be a complex variable such that z = x + iy, where x, y ∈ R.

[0081]

[0082] Let t = r + is be another complex variable, where z can be expressed as a function of t and r, s ∈ R. Then the generalized complex chain rule can be written as:

[0083]

[0084] Backpropagation is performed according to the generalized complex chain rule.
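
For reference, a commonly used statement of the generalized complex chain rule is sketched below in LaTeX. It is assumed, not confirmed by the source, to match the form the text refers to, and it uses the same symbols: L for the loss, z = x + iy, and t = r + is.

```latex
% Gradient of the real-valued loss L with respect to z = x + iy
\nabla_L(z) = \frac{\partial L}{\partial x} + i\,\frac{\partial L}{\partial y}

% Chain rule for t = r + is, with x and y viewed as functions of r and s
\nabla_L(t) = \frac{\partial L}{\partial r} + i\,\frac{\partial L}{\partial s}
            = \left( \frac{\partial L}{\partial x}\frac{\partial x}{\partial r}
                   + \frac{\partial L}{\partial y}\frac{\partial y}{\partial r} \right)
            + i \left( \frac{\partial L}{\partial x}\frac{\partial x}{\partial s}
                     + \frac{\partial L}{\partial y}\frac{\partial y}{\partial s} \right)
```

Each term is an ordinary real partial derivative, so the rule can be implemented with real-valued automatic differentiation applied to the real and imaginary parts.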

[0085] This step mainly involves the specific training of the joint source-channel deep neural network. The network structure built in step S1, the optimizer, learning rate, and preprocessed dataset are input into the joint source-channel deep neural network framework, and the semantic source-channel coder and receiver generator are trained alternately under the guidance of the optimization function.

[0086] S3. Update the network parameters sequentially using three different learning rates, and determine whether the loss function value has converged to the preset value. If it has, terminate the training early and save the corresponding model parameters for offline use by edge devices. Otherwise, repeat step S2.

[0087] In this step, the network parameters are updated sequentially using three different learning rates: 5×10⁻⁴ for the first 150 epochs, 5×10⁻⁵ for the next 150 epochs, and 1×10⁻⁵ for the last 100 epochs as fine-tuning. Convergence is judged from the loss value obtained in step S2.
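
The three-stage schedule and the early-stop check of step S3 can be sketched as follows, reading the rates as 5×10⁻⁴, 5×10⁻⁵, and 1×10⁻⁵ over 150, 150, and 100 epochs. The `run_epoch` callback, the loss target, and the synthetic loss curve are hypothetical stand-ins for the real training loop.

```python
def learning_rate(epoch):
    """Piecewise-constant schedule over 400 epochs (0-based epoch index)."""
    if epoch < 150:
        return 5e-4
    if epoch < 300:
        return 5e-5
    return 1e-5  # fine-tuning stage


def train(run_epoch, loss_target, total_epochs=400):
    """Run epochs until the loss reaches loss_target or the epochs run out."""
    loss = float("inf")
    for epoch in range(total_epochs):
        loss = run_epoch(learning_rate(epoch))
        if loss <= loss_target:
            # Converged early; the model parameters would be saved here.
            return epoch + 1, loss
    return total_epochs, loss


# Stand-in for one real training epoch: the loss shrinks by 10% each call.
state = {"loss": 1.0}


def fake_epoch(lr):
    state["loss"] *= 0.9
    return state["loss"]


epochs_run, final_loss = train(fake_epoch, loss_target=0.5)
```

Keeping the schedule as a pure function of the epoch index makes it easy to resume training from a saved checkpoint.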

[0088] Physical deduction process:

[0089] Step 1: The radio frequency signal generator generates a radio frequency carrier, which is then modulated and transmitted via a spatial modulator. The spatial modulator consists of a layer of RIS and control hardware (such as an FPGA) to adjust the behavior of each RIS unit. After receiving the radio frequency signal, the spatial modulator can process it.

[0090] In this step, the input to the control hardware is a source in the form of digital data. The control hardware then adjusts each RIS unit, thereby writing control signals into the RIS. The first stage maps the digital source into a wireless signal through the RIS modulator. When the radio frequency signal passes through the RIS layer, the amplitude and phase of the carrier wave in each RIS unit are determined by the product of the incident electric field and the complex-valued transmission coefficient of that unit. Therefore, the RIS modulator can implement QAM modulation and MIMO transmission, where the number of tunable units equals the number of data streams. Different types of sources, such as text, voice, and images, are converted into digital data, and the controller of the RIS modulator changes the response of the tunable units according to the input. If an image is transmitted, each pixel can be directly encoded into the phase of each tunable unit.
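
The pixel-to-phase encoding described above can be sketched as follows. The linear map from an 8-bit pixel value to a phase in [0, 2π), the unit amplitude, and the function names are assumptions; the text only states that each pixel can be encoded into the phase of a tunable unit and that the output field is the product of the incident field and the unit's complex transmission coefficient.

```python
import cmath
import math


def pixel_to_coefficient(pixel, amplitude=1.0, levels=256):
    """Map a pixel value to a complex transmission coefficient (assumed linear phase map)."""
    phase = 2.0 * math.pi * pixel / levels
    return cmath.rect(amplitude, phase)


def ris_modulate(pixels, incident_field=1.0 + 0.0j):
    """Field leaving each unit = incident field times that unit's coefficient."""
    return [incident_field * pixel_to_coefficient(p) for p in pixels]


def demodulate_phase(fields, levels=256):
    """Recover pixel values from the phases of noiseless fields."""
    two_pi = 2.0 * math.pi
    return [
        round((cmath.phase(f) % two_pi) * levels / two_pi) % levels
        for f in fields
    ]


pixels = [0, 64, 128, 255]
fields = ris_modulate(pixels)
recovered = demodulate_phase(fields)
```

In this noiseless sketch the round trip is exact; over a real channel the phase decision would be affected by noise and by the finite number of phase levels of the hardware.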

[0091] Step 2: The radio frequency signal then passes through the semantic encoder. After mapping the signal to the first layer of the semantic encoder, the smart metasurface directly processes the wireless signal rather than the digital data of traditional semantic communication, which corresponds to the semantic source-channel encoder in the offline training process.

[0092] This step focuses on the RIS semantic encoder module. This module consists of a series of RIS units and corresponding control hardware. Each RIS unit functions as a neuron, and each layer of the RIS corresponds to a complex neural network layer. The RIS can implement fully connected layers, complex convolutions, and activations, which meets the basic requirements of semantic communication. The semantic encoder and decoder modules should be jointly trained at the transceiver or by a centralized control center. The trained parameters are then assigned to the individual control hardware units of the RIS, and the transmission coefficients are adjusted accordingly. This scheme performs semantic encoding directly on the radio frequency signal and uses the control hardware to change the structure of the RIS units, thereby adjusting the parameters of the complex neural network layers.

[0093] Step 3: After the radio frequency signal passes through the wireless channel space, it passes through the semantic decoder. The semantic decoder corresponds to the receiver generator in the offline training process. It decodes the transmitted signal from the parameter space to obtain the received signal, thus completing the communication process. The communication process is shown in Figure 2.

[0094] This step focuses on the RIS semantic decoder module, which plays the role of the receiver generator module from offline training and produces the decoded, decompressed generated image. Three candidate devices are considered for collecting the electromagnetic waves after the RIS while preserving the spatial diversity of the signal: a probe, a horn antenna, and an antenna array. The choice of detection device depends on the application scenario and the expected deployment cost. Because antenna arrays are widely used in wireless networks, this invention also selects an antenna array to ensure real-time processing and adaptability to future wireless networks.

[0095] Example 1

[0096] Image semantic compression is performed according to the deep joint source-channel image compression coding method based on intelligent metasurfaces of the present invention, as follows:

[0097] (1) The first step is to determine the hyperparameters required for model training. The experimental dataset is CIFAR-10, which consists of 32×32 color RGB images with 3 channels, split into 50,000 training images and 10,000 test images. Each training batch contains 128 images. The channel encoder outputs 16 channels. The signal-to-noise ratio is randomly selected from 0 to 20 dB during training to enhance the generalization ability of the model. The image compression ratio is defined as (L × C) / (H × W × C_RGB), where L is the feature size, C is the number of encoding channels, H and W are the image dimensions (length and width), and C_RGB is the number of channels in an RGB image; for grayscale images, this value is 1. Different n / k values are used to adjust the semantic compression level of the image. The optimizer uses the Adam algorithm, updating network parameters sequentially with three different learning rates. The gradient descent process employs the generalized complex chain rule. The loss function calculates the MSE loss for both the real and imaginary parts of the transfer vector. During the simulation, each batch of simulation images consists of 1 image; the source input multipath channels are 5; the signal-to-noise ratio (SNR) is stepped from 0 to 20 dB in integer increments during testing; and 10,000 test images are used.
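
As a worked example of the compression ratio, the sketch below assumes the ratio is the number of encoded symbols, L × C, divided by the number of source values, H × W × C_RGB. The feature size L = 64 (an 8×8 feature map) is an assumption chosen because it yields the ratios 1/6 and 1/12 quoted in Example 2 for 8 and 4 coding channels; the encoder table that would confirm it is not available in the text.

```python
from fractions import Fraction


def compression_ratio(L, C, H, W, C_rgb=3):
    """Encoded symbols (L * C) over source values (H * W * C_rgb)."""
    return Fraction(L * C, H * W * C_rgb)


# Assumed feature size of 8 x 8 = 64 for a 32 x 32 RGB input.
ratios = {C: compression_ratio(64, C, 32, 32) for C in (16, 8, 4)}
```

Using `Fraction` keeps the ratios exact, so they can be compared directly with the values quoted in the results.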

[0098] (2) The second step is to start training. After the optimization function converges, the model and related trainable parameters are saved and stored in a communication device composed of a smart metasurface.

[0099] (3) The radio frequency signal generator transmits signals into the spatial modulator, semantic encoder, and AWGN channel to complete the communication transmitter process.

[0100] (4) The receiving end receives the signal, performs signal transformation from complex domain to real domain, and generates an image through a semantic decoder.

[0101] The hardware to be deployed in this embodiment is as follows: a DC power supply, a voltage regulator circuit (LM2596), a field-programmable gate array (FPGA) (ALTERA AX301), a USRP device (LW-N210), a low-noise amplifier (LNA), Tx and Rx antennas, an Ethernet switch, a server, and a smart metasurface transmission plate. The DC power supply feeds the voltage regulator circuit with an input voltage of approximately 6 V; the regulator stabilizes this voltage and steps it down to a 1.2 V output. The USRP device converts baseband signals to RF signals and vice versa. It consists of hardware including an RF modulation / demodulation circuit and a baseband processing unit, and can be controlled using GNU packages in Python. The LNA amplifies the transmitted and received RF signals by 15 dB, because the RF signal attenuates significantly after passing through the smart metasurface, which would otherwise result in low SNR and reduced measurement accuracy. The Tx antenna is a directional double-ridged horn antenna (LB-800) radiating microwaves at 5.4 GHz, and the Rx antenna is a patch antenna; both are linearly polarized, perpendicular to the ground. The Ethernet switch connects the USRP and the server to form a local Ethernet network over which control signals and received baseband signals are exchanged. The server uses GNU packages in Python to control the two USRPs, handles the complex-to-real conversion, receives the semantically decoded image information and forms images, and can serve as the network-layer decision layer for semantic communication tasks or implement various transfer learning tasks. The intelligent metasurface transmission plate manipulates the amplitude or phase of reflected or transmitted electromagnetic waves in real time under digital instructions from the FPGA. The plate can be multi-layered to enhance network extrapolation performance. Each layer consists of 28×28 artificial neurons, each integrated with two amplifiers, providing 500 candidate phase-amplitude levels under bias voltage control via the FPGA.

[0102] Example 2

[0103] Simulation verification was performed using the deep joint source-channel image compression coding method based on intelligent metasurfaces according to the present invention:

[0104] The simulation conditions are as follows: the image format is RGB, the resolution is 32×32 px, the number of channels is 3, and the maximum pixel value of each channel is 255, so the image storage overhead is 24 bits per pixel. The image quality metrics are Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM). This example uses both complex amplitude modulation and complex phase modulation inputs, trains at different SNRs, and employs different compression ratios, yielding the following results. In the figures, JSCC refers to Joint Source-Channel Coding.
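
For readers unfamiliar with the PSNR metric, a minimal sketch of its standard definition for 8-bit images follows: PSNR = 10·log10(MAX² / MSE), with MAX = 255. The helper names and the four-pixel sample are illustrative only; SSIM is omitted because its windowed computation is considerably longer.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(original, reconstructed)
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / err)


# Every pixel is off by exactly one grey level, so the MSE is 1.
original = [0, 255, 128, 64]
reconstructed = [1, 254, 129, 63]
value_db = psnr(original, reconstructed)
```

With an MSE of 1 the result equals 20·log10(255), which is the PSNR of an image whose every pixel is off by one grey level.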

[0105] As can be seen from Figure 3, both complex amplitude modulation and complex phase modulation exhibit good performance in the proposed deep joint source-channel image compression coding network, reaching 41.775 dB and 42.253 dB respectively at a test SNR of 20 dB. When the modulation training SNR is 7 dB or Random, the network also shows good performance at low test SNRs, demonstrating strong robustness.

[0106] As can be seen from Figure 4, after the image compression ratio is adjusted, complex amplitude modulation and complex phase modulation still exhibit good performance in the proposed deep joint source-channel image compression coding network. When the compression ratio is 1/6, they reach 37.723 dB and 37.260 dB respectively; when the compression ratio is 1/12, they reach 32.856 dB and 32.458 dB respectively.

[0107] As can be seen from Figure 5, when the metric is SSIM, both complex amplitude modulation and complex phase modulation exhibit good performance in the proposed deep joint source-channel image compression coding network, reaching 0.996 at a test SNR of 20 dB. When the modulation training SNR is 7 dB or Random, the network also shows good performance at low test SNRs, demonstrating strong robustness.

[0108] As can be seen from Figure 6, with the SSIM metric, complex amplitude modulation and complex phase modulation exhibit good performance in the proposed deep joint source-channel image compression coding network after the image compression ratio is adjusted. When the compression ratio is 1/6, they reach 0.986 and 0.988 respectively; when the compression ratio is 1/12, they reach 0.965 and 0.961 respectively.

[0109] Therefore, the image semantic communication system based on intelligent metasurfaces described above has many advantages such as high computing speed, large system capacity, low computing power consumption, wide spectrum range, and real-time programmability, and has a bright future in future wireless networks.

[0110] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the technical solutions of the present invention, and these modifications or equivalent substitutions cannot cause the modified technical solutions to deviate from the spirit and scope of the technical solutions of the present invention.

Claims

1. An image semantic communication system based on a smart metasurface, characterized in that it uses a deep joint source-channel image compression coding method based on the smart metasurface, specifically including an offline training process and a physical deduction process.
The offline training process includes:
S1. Designing the semantic source-channel encoder, channel space, receiver generator, loss function, and optimizer required for the joint source-channel coding process, and then initializing their respective network parameters;
S2. Inputting the batched data into the training network built in step S1 and passing it through the semantic source-channel encoder, channel space, and receiver generator in sequence; calculating the corresponding loss function value; training the semantic source-channel encoder and receiver generator alternately under the guidance of the optimization function; and updating the network parameters using stochastic gradient descent and backpropagation;
S3. Updating the network parameters sequentially using three different learning rates, and determining whether the loss function value has converged to the preset value; if yes, terminating the training early and saving the corresponding model parameters for offline work of edge devices; otherwise, repeating step S2.
The physical deduction process specifically includes:
(1) The radio frequency signal generator generates a radio frequency carrier, and modulation and MIMO transmission are realized through the spatial modulator. The spatial modulator consists of a layer of RIS and control hardware to adjust the behavior of each RIS unit. After receiving the radio frequency signal, the spatial modulator can perform signal processing on it.
(2) The radio frequency signal enters the semantic encoder and the wireless channel to complete the communication transmission process; after the signal is mapped to the first layer of the semantic encoder, the intelligent metasurface directly performs computational processing on the wireless signal; the semantic encoder corresponds to the semantic source-channel encoder in the offline training process.
(3) After passing through the wireless channel space, the radio frequency signal passes through the semantic decoder, which corresponds to the receiver generator in the offline training process. The semantic decoder decodes the transmitted signal from the parameter space to obtain the received signal, thereby completing the communication process.

2. The image semantic communication system based on a smart metasurface according to claim 1, characterized in that: In step S1, the provided dataset is preprocessed by modulating it from the real domain to the complex domain, and simultaneously shuffling, batching, and normalizing it; the semantic source-channel encoder, channel space, receiver generator, loss function, and optimizer required for the joint source-channel coding process are designed, and then the network parameters are initialized.