Photoelectric combined training method and semantic optical communication method and device for hyperspectral transmission
By combining a diffractive optical element (DOE) with a semantic optical communication model and optimizing them end to end through photoelectric joint training, the disconnect between DOE design and communication coding in hyperspectral acquisition and transmission is resolved. This enables efficient and reliable spectral data transmission and improves the stability of reconstruction results and overall system performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING UNIV OF POSTS & TELECOMM
- Filing Date
- 2026-03-09
- Publication Date
- 2026-06-19
AI Technical Summary
Traditional hyperspectral acquisition and transmission methods struggle to meet the real-time and reliability requirements of satellite communication. DOE design and communication coding are disconnected, the system structure is redundant, and errors accumulate severely.
We employ a joint optoelectronic training method for hyperspectral transmission, combining compressed acquisition with a semantic optical communication model for end-to-end optimization to achieve joint optoelectronic encoding and reconstruction. We also design a unified loss function to update the DOE height map and model parameters.
It improves the stability and physical rationality of hyperspectral image reconstruction, reduces system complexity and computational load, reduces error accumulation, and enhances the overall system performance.
[Figure: abstract drawing, CN122244587A]
Description
Technical Field
[0001] This application relates to the fields of machine learning and optical communication technology, and in particular to a photoelectric joint training method, a semantic optical communication method, and a device for hyperspectral transmission.
Background Technology
[0002] Hyperspectral imaging technology can acquire spectral information of a target object along the wavelength dimension of light. This spectral information has high discriminative power and broad application prospects in many fields such as remote sensing, agriculture, and medical imaging. Especially in the field of satellite remote sensing, hyperspectral information has irreplaceable value for Earth observation, environmental monitoring, and resource exploration.
[0003] Due to hardware limitations, traditional hyperspectral acquisition often employs pushbroom scanning, which requires multiple exposures and scans to obtain measurement results. This not only results in long imaging times but also generates a massive amount of raw data. In recent years, snapshot hyperspectral imaging methods based on diffractive optical elements (DOEs) have developed rapidly. These methods can capture the spatial and spectral information of a natural scene simultaneously in a single exposure. Their basic principle is to capture compressed measurement data in which spectral and spatial information is modulated by the point spread function (PSF), and then to reconstruct a three-dimensional hyperspectral cube using a reconstruction algorithm.
[0004] However, hyperspectral remote sensing data is voluminous. In satellite communication links, owing to limitations such as payload capacity, satellite-to-ground link bandwidth, and the need to adapt to harsh space environments, traditional hyperspectral acquisition and transmission methods cannot meet real-time and reliability requirements. There is an urgent need for an efficient, reliable, and intelligent free-space optical transmission solution for hyperspectral data.
Summary of the Invention
[0005] In view of this, embodiments of this application provide a photoelectric joint training method, a semantic optical communication method and device for hyperspectral transmission, in order to eliminate or improve one or more defects existing in the prior art.
[0006] The first aspect of this application provides a photoelectric joint training method for hyperspectral transmission, the method comprising: A pre-defined semantic optical communication system is trained based on hyperspectral training samples. In each iteration of training, the diffractive optical element in the system compresses and acquires the hyperspectral training samples to obtain a measurement image corresponding to the hyperspectral training samples. This measurement image is then transmitted to the semantic optical communication model in the system, allowing the semantic optical communication model to acquire the target bitstream of the measurement image and reconstruct a target hyperspectral image. A loss function is constructed based on the error between the target hyperspectral image and the hyperspectral training samples to jointly update the height map distribution parameters of the diffractive optical element and the parameters of the semantic optical communication model. If the semantic optical communication model converges, then the semantic optical communication model and the diffractive optical element updated in the last iteration are used together as a semantic optical communication system for hyperspectral transmission for output.
[0007] In some embodiments of this application, the semantic optical communication model includes: A semantic coding compression layer is used to extract and compress features from the input measurement image and output the original bitstream corresponding to the measurement image. The channel transmission layer is used to channel-encode the original bit stream and then transmit it through the channel, perform channel decoding on the encoded original bit stream to obtain the target bit stream, and output the target bit stream. A spectral decoding and reconstruction layer is used to receive the target bitstream and reconstruct the bitstream to obtain the target hyperspectral image.
[0008] In some embodiments of this application, the semantic coding compression layer includes: a multi-scale feature mapping unit, used to perform convolution operations on the input measurement image to extract feature information at different spatial scales in order to obtain a multi-scale fused feature map; a residual feature extraction unit, used to perform residual learning and downsampling on the multi-scale fused feature map and output a semantic feature map; a channel transformation unit, used to adjust the number of channels of the semantic feature map to obtain the encoded feature map; a weighted attention unit, used to weight the encoded feature map and output a weighted encoded feature map; and a quantization unit, used to map the weighted encoded feature map to the original bit stream.
[0009] In some embodiments of this application, the weighted attention unit adopts a dual-branch structure, and the weighting of the encoded feature map includes: Based on the dual-branch structure, two different weights are generated corresponding to the encoded feature map; The encoded features are weighted according to the two weights; The weighted encoded feature map is obtained by fusion.
[0010] In some embodiments of this application, the quantization unit uses uniform noise in place of the quantization operation.
[0011] In some embodiments of this application, the spectral decoding and reconstruction layer includes: A spatial spectral attention unit is used to weight the input target bitstream from both the spatial and spectral dimensions to obtain a weighted decoding feature map. The multi-residual convolutional reconstruction unit is used to upsample and learn residuals on the weighted decoded feature map and output the target hyperspectral image.
[0012] In some embodiments of this application, the parameters of the semantic optical communication model include: The parameters of the semantic coding compression layer and the parameters of the spectral decoding reconstruction layer.
[0013] A second aspect of this application provides a semantic optical communication method, the method comprising: The incident light emitted from the hyperspectral scene is input into the semantic optical communication model to obtain the target hyperspectral image; wherein, the quantization unit in the semantic optical communication model uses rounding operation and binarization to map the weighted encoded feature map into the original bit stream; the semantic optical communication model is obtained by the aforementioned optoelectronic joint training method for hyperspectral transmission.
[0014] A third aspect of this application provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the optoelectronic joint training method for hyperspectral transmission described in the first aspect above, and / or the semantic optical communication method.
[0015] A fourth aspect of this application provides a computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the optoelectronic joint training method for hyperspectral transmission described in the first aspect above, and / or the semantic optical communication method.
[0016] The fifth aspect of this application provides a computer program product comprising a computer program that, when executed by a processor, implements the optoelectronic joint training method for hyperspectral transmission described in the first aspect above, and / or the semantic optical communication method.
[0017] This application provides a photoelectric joint training method for hyperspectral transmission. The method includes: training a pre-defined semantic optical communication system based on hyperspectral training samples, such that in each iteration of training, the diffractive optical element in the semantic optical communication system compresses and acquires the hyperspectral training samples to obtain a measurement image corresponding to the hyperspectral training samples, and transmits the measurement image to the semantic optical communication model in the system, so that the semantic optical communication model acquires the target bitstream of the measurement image to reconstruct a target hyperspectral image; constructing a loss function based on the error between the target hyperspectral image and the hyperspectral training samples to jointly update the height map distribution parameters of the diffractive optical element and the parameters of the semantic optical communication model; if the semantic optical communication model converges, then the semantic optical communication model and the diffractive optical element updated in the last iteration are used together as a semantic optical communication system for hyperspectral transmission for output. By providing a unified, end-to-end optoelectronic joint optimization framework that integrates diffractive optical element compressed acquisition, semantic compressed coding, and spectral decoding reconstruction, the method offers fast spectral acquisition (single exposure), low computational resource consumption, adaptability to harsh channel environments, and stable performance. It can improve the stability and physical rationality of reconstruction results, significantly reduce system complexity and computational load, and reduce processing latency, thereby effectively reducing error accumulation and improving overall system performance.
[0018] Additional advantages, objectives, and features of this application will be set forth in part in the description which follows, and will in part become apparent to those skilled in the art upon review of the following description, or may be learned by practice of the application. The objectives and other advantages of this application can be realized and obtained by means of the structures specifically pointed out in the specification and drawings.
[0019] Those skilled in the art will understand that the purposes and advantages that can be achieved with this application are not limited to those specifically described above, and that the above and other purposes that this application can achieve will be more clearly understood from the following detailed description.
Attached Figure Description
[0020] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, do not constitute a limitation thereof. The components in the drawings are not drawn to scale but are merely for illustrating the principles of this application. For ease of illustration and description of certain parts of this application, corresponding portions in the drawings may be enlarged, i.e., may appear larger relative to other components in an exemplary device actually manufactured according to this application. In the drawings: Figure 1 is a flowchart illustrating a photoelectric joint training method for hyperspectral transmission according to an embodiment of this application.
[0021] Figure 2 is a flowchart illustrating the traditional solution in an application example of this application.
[0022] Figure 3 is a flowchart illustrating a semantic optical communication system for hyperspectral transmission in an application example of this application.
[0023] Figure 4 is a schematic diagram of the structure of a semantic optical communication system for hyperspectral transmission in an application example of this application.
[0024] Figure 5 is a schematic diagram of the system network architecture in an application example of this application.
[0025] Figure 6 is a schematic diagram of the architecture of the weighted attention module in an application example of this application.
[0026] Figure 7 is a schematic diagram of the architecture of the decoding and reconstruction network in an application example of this application.
[0027] Figure 8 is a schematic diagram of the experimental results for different compression ratios in an application example of this application.
[0028] Figure 9 is a schematic diagram of the experimental results for the red band in an application example of this application.
[0029] Figure 10 is a schematic diagram of the experimental results for the green band in an application example of this application.
[0030] Figure 11 is a schematic diagram of the experimental results for the blue band in an application example of this application.
Detailed Implementation
[0031] To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the embodiments and accompanying drawings. Here, the illustrative embodiments and their descriptions are used to explain this application, but are not intended to limit it.
[0032] It should also be noted that, in order to avoid obscuring this application with unnecessary details, only the structures and / or processing steps closely related to the solution according to this application are shown in the accompanying drawings, while other details that are not closely related to this application are omitted.
[0033] It should be emphasized that the term "including/comprising" as used herein refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.
[0034] It should also be noted that, unless otherwise specified, the term "connection" as used herein can refer not only to a direct connection, but also to an indirect connection involving an intermediary.
[0035] In the following description, embodiments of the present application will be illustrated with reference to the accompanying drawings. In the drawings, the same reference numerals represent the same or similar parts, or the same or similar steps.
[0036] Existing DOE-based snapshot hyperspectral imaging technology primarily targets the rapid acquisition and imaging of hyperspectral information. In communication applications, especially in scenarios like satellite remote sensing, due to the large volume of hyperspectral data and limited communication link bandwidth, the measurement images acquired by DOE still require further source coding compression to meet transmission requirements. At the receiving end, channel decoding and source decoding are sequentially performed to reconstruct the measurement map, and finally, a separate spectral reconstruction network is used to recover the hyperspectral data. However, existing technologies suffer from the following problems: DOE design and communication coding processes are independent, lacking joint optimization; traditional image coding algorithms do not consider DOE modulation characteristics and hyperspectral semantic features, easily leading to irreversible loss of key information in the measurement map; the receiving end employs a multi-stage processing flow of "channel decoding-source decoding-spectral reconstruction," resulting in a complex system with errors accumulating at each stage; and DOE height map design only considers imaging error as an optimization objective, without taking into account communication compression constraints.
[0037] Currently, traditional communication systems typically employ a separate processing model for transmitting measurement images acquired through hyperspectral compression. The transmitting end first performs source coding on the measurement image to compress the data, using a standard like JPEG2000 (Joint Photographic Experts Group 2000), and then performs channel coding to add redundancy to combat channel errors, such as using Low-Density Parity-Check (LDPC) codes. The receiving end first performs channel decoding to recover the symbols of the measured image, then performs source decoding to obtain the transmitted measurement image, and finally uses a spectral reconstruction network, such as a Residual U-shaped Network (ResUnet), to reconstruct the hyperspectral data.
[0038] Therefore, in order to solve the problems in the prior art of the disconnect between diffractive optical element design and communication coding, redundant system structure, and serious error accumulation, the embodiments of this application provide a photoelectric joint training method for hyperspectral transmission, a semantic optical communication method, an electronic device, a computer-readable storage medium, and a computer program product, respectively, so as to improve the stability and physical rationality of the reconstruction results, thereby effectively reducing error accumulation and improving overall system performance.
[0039] The following examples will provide a detailed description.
[0040] Based on this, embodiments of this application provide an optoelectronic joint training method for hyperspectral transmission that can be executed by an optoelectronic joint training device for hyperspectral transmission. Referring to Figure 1, the method specifically includes the following: Step 100: Train a pre-defined semantic optical communication system based on hyperspectral training samples. In each iteration of training, the diffractive optical element in the system compresses and acquires the hyperspectral training samples to obtain a measurement image corresponding to the hyperspectral training samples. The measurement image is then transmitted to the semantic optical communication model in the system, allowing the semantic optical communication model to acquire the target bitstream of the measurement image and reconstruct the target hyperspectral image. A loss function is constructed based on the error between the target hyperspectral image and the hyperspectral training samples to jointly update the height map distribution parameters of the diffractive optical element and the parameters of the semantic optical communication model.
[0041] Specifically, the semantic optical communication system consists of DOE compressed acquisition and a semantic optical communication model. The system includes an acquisition end, an encoding and transmission end, and a receiving end. At the acquisition end, the system employs DOE-based hyperspectral imaging technology. Diffractive optical elements use a learnable height map to perform phase modulation on simulated incident light, achieving compressed acquisition of the hyperspectral scene. This is combined with a color camera to acquire encoded measurement images. The measurement image can be H×W×3 in size, fusing key information from the original hyperspectral data in both spatial and spectral dimensions with a limited number of channels. The semantic optical communication model reconstructs the target hyperspectral image based on the measurement image. The hyperspectral training samples include the original hyperspectral data corresponding to the incident light emitted from the hyperspectral scene. The simulated incident light is obtained through simulation based on the hyperspectral training samples. H×W represents the image resolution, i.e., the number of pixels, where H represents the image height (the number of pixels in the vertical direction) and W represents the image width (the number of pixels in the horizontal direction).
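As a rough illustration of how a learnable height map turns into wavelength-dependent phase modulation, the sketch below applies the standard thin-element approximation, phase = 2π(n − 1)h/λ. The refractive index of 1.5, the 400 nm to 700 nm band grid, the 1 µm pixel height, and the function name phase_delay are assumptions chosen for illustration; the source only states that the height map performs phase modulation and that there are 31 spectral channels.

```python
import math

def phase_delay(height_m, wavelength_m, refractive_index=1.5):
    """Thin-element phase delay (radians) for one height-map pixel.

    refractive_index=1.5 is an illustrative assumption; real DOE
    materials have a wavelength-dependent index.
    """
    return 2.0 * math.pi * (refractive_index - 1.0) * height_m / wavelength_m

# Assumed band grid: 31 bands, 400 nm to 700 nm in 10 nm steps.
wavelengths = [(400 + 10 * k) * 1e-9 for k in range(31)]

# One hypothetical height-map pixel, 1.0 micrometre tall.
height = 1.0e-6
phases = [phase_delay(height, lam) for lam in wavelengths]

# The same physical height yields a larger phase delay at shorter
# wavelengths, so one height map modulates each band differently.
print(round(phases[0], 3), round(phases[-1], 3))
```

Because the phase depends on both the height and the wavelength, adjusting the height map during training changes how each spectral band is encoded in the measurement image.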
[0042] Understandably, a unified reconstruction loss function is constructed based on the hyperspectral reconstruction error and is used simultaneously to update the DOE height map distribution parameters and the parameters of the semantic optical communication model. The loss function can be the mean absolute error (MAE), which calculates the average absolute error between the hyperspectral data reconstructed by the model and the ground truth hyperspectral values (i.e., the training samples).
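The mean absolute error described above can be sketched in a few lines. The flat-list representation and the sample values are hypothetical; in practice the inputs would be the H×W×31 reconstructed and ground-truth hyperspectral cubes.

```python
def mean_absolute_error(reconstructed, target):
    """Average |reconstructed - target| over all elements of two flat lists."""
    if len(reconstructed) != len(target):
        raise ValueError("inputs must have the same number of elements")
    return sum(abs(r - t) for r, t in zip(reconstructed, target)) / len(target)

# Hypothetical flattened hyperspectral values (H*W*31 elements in practice).
target = [0.10, 0.40, 0.35, 0.80]
reconstructed = [0.12, 0.37, 0.35, 0.90]
loss = mean_absolute_error(reconstructed, target)
print(loss)
```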
[0043] Step 200: If the semantic optical communication model converges, then the semantic optical communication model and the diffractive optical element updated in the last iteration are used together as a semantic optical communication system for hyperspectral transmission for output.
[0044] Understandably, the phase modulation structure of the DOE can perceive the impact of subsequent compression processes on information distribution during training, the encoding network can take into account the spectral reconstruction requirements when learning the compressed representation, and the decoding and reconstruction network adjusts collaboratively under overall constraints. The three modules influence, constrain, and guide each other under the same optimization objective, forming a unified design framework across the physical and information layers, realizing true optoelectronic joint encoding and joint reconstruction.
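The idea of one loss driving both the optical parameters and the network parameters can be illustrated with a deliberately tiny toy model. Everything here is an assumption made for illustration: a single scalar h stands in for the DOE height map, a single scalar w stands in for the model parameters, the "measurement" is h·x, and the "reconstruction" is w·(h·x). The real system uses a full height map and deep networks, and the source does not specify the optimizer or learning rate.

```python
def mae_and_grads(h, w, xs):
    """MAE loss and subgradients for the toy model recon = w * (h * x), target = x."""
    n = len(xs)
    loss = grad_h = grad_w = 0.0
    for x in xs:
        err = w * h * x - x
        sign = (err > 0) - (err < 0)   # subgradient of |err|
        loss += abs(err) / n
        grad_h += sign * w * x / n     # d loss / d h (optical stand-in)
        grad_w += sign * h * x / n     # d loss / d w (network stand-in)
    return loss, grad_h, grad_w

xs = [0.1 * k for k in range(1, 11)]   # hypothetical training samples
h, w = 0.5, 0.5                        # initial optical and network parameters
lr = 0.01                              # assumed learning rate

initial_loss, _, _ = mae_and_grads(h, w, xs)
for _ in range(500):
    loss, grad_h, grad_w = mae_and_grads(h, w, xs)
    # A single reconstruction loss updates both parameter groups in the same step.
    h -= lr * grad_h
    w -= lr * grad_w
final_loss, _, _ = mae_and_grads(h, w, xs)
print(initial_loss, final_loss)
```

In this toy, neither parameter can reduce the loss independently of the other, which mirrors the coupling between optical design and coding that the joint framework is meant to exploit.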
[0045] As can be seen from the above description, the optoelectronic joint training method for hyperspectral transmission provided in this embodiment is a unified, end-to-end optoelectronic joint optimization framework that integrates diffractive optical element compressed acquisition, semantic compressed coding, and spectral decoding reconstruction. This framework offers fast spectral acquisition (single exposure), low computational resource consumption, adaptability to harsh channel environments, and stable performance. It can improve the stability and physical rationality of reconstruction results, significantly reduce system complexity and computational load, and reduce processing latency, thereby effectively reducing error accumulation and improving overall system performance.
[0046] To further improve the stability and physical rationality of the reconstruction results, thereby effectively reducing error accumulation and improving the overall system performance, in the optoelectronic joint training method for hyperspectral transmission provided in this application embodiment, the semantic optical communication model in step 100 specifically includes the following: Step 110: Semantic coding compression layer, used to extract and compress features from the input measurement image, and output the original bitstream corresponding to the measurement image; Step 120: Channel transmission layer, used for channel encoding of the original bit stream and channel transmission, channel decoding of the encoded original bit stream to obtain the target bit stream, and outputting the target bit stream; Step 130: Spectral decoding and reconstruction layer, used to receive the target bit stream and reconstruct the bit stream to obtain the target hyperspectral image.
[0047] Specifically, the semantic optical communication model includes an encoding and transmission end (i.e., a semantic coding compression layer and a channel transmission layer) and a receiving end (i.e., a spectral decoding and reconstruction layer). The acquired measurement images are input into the semantic coding network (i.e., the semantic coding compression layer) for feature extraction and compression, and the output is a bit stream for communication transmission. At the receiving end, the bit stream is directly input into the semantic decoding and reconstruction network (i.e., the spectral decoding and reconstruction layer) after channel decoding, without the need to explicitly reconstruct the DOE measurement images.
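The role of the channel transmission layer (original bit stream in, target bit stream out) can be sketched with a simple stand-in. The source does not specify the channel code used in this layer; it names LDPC only as an example in the traditional scheme. The sketch below swaps in a 3x repetition code with majority-vote decoding purely because it fits in a few lines, and the bit values and error pattern are hypothetical.

```python
def channel_encode(bits, repeat=3):
    """Repetition code: send each bit `repeat` times (stand-in channel code)."""
    return [b for b in bits for _ in range(repeat)]

def channel_decode(received, repeat=3):
    """Majority vote over each group of `repeat` received bits."""
    return [int(sum(received[i:i + repeat]) * 2 > repeat)
            for i in range(0, len(received), repeat)]

original_bits = [1, 0, 1, 1, 0]        # hypothetical original bit stream
coded = channel_encode(original_bits)

# Simulate a noisy channel by flipping the first bit of every group of three.
received = list(coded)
for i in range(0, len(received), 3):
    received[i] ^= 1

target_bits = channel_decode(received)  # target bit stream at the receiver
print(target_bits)
```

The decoded target bit stream is what the spectral decoding and reconstruction layer consumes directly, with no intermediate recovery of the measurement image.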
[0048] It is understandable that the fusion processing flow of channel decoding and hyperspectral reconstruction based on the above content avoids the multi-stage processing mode of channel decoding, source decoding (measurement map recovery) and spectral reconstruction in traditional schemes, thereby effectively reducing error accumulation and improving the overall system performance.
[0049] To further improve the stability and physical rationality of the reconstruction results, thereby effectively reducing error accumulation and improving the overall system performance, in the optoelectronic joint training method for hyperspectral transmission provided in this application embodiment, the semantic coding compression layer in step 110 specifically includes the following: Step 111: Multi-scale feature mapping unit, used to perform convolution operations on the input measurement image to extract feature information at different spatial scales to obtain a multi-scale fused feature map; Step 112: Residual feature extraction unit, used to perform residual learning and downsampling on the multi-scale fused feature map and output a semantic feature map; Step 113: Channel transformation unit, used to adjust the number of channels of the semantic feature map to obtain the encoded feature map; Step 114: Weighted attention unit, used to weight the encoded feature map and output a weighted encoded feature map; Step 115: Quantization unit, used to map the weighted encoded feature map to the original bit stream.
[0050] Specifically, the semantic coding compression layer includes a multi-scale feature mapping unit, a residual feature extraction unit, a channel transformation unit, a weighted attention unit, and a quantization unit.
[0051] The multi-scale feature mapping unit performs parallel convolution operations on the input measurement image using convolution kernels of different sizes to extract feature information at different spatial scales. Specifically, four convolution kernel sizes (3×3, 5×5, 7×7, and 9×9) can be used to convolve the input image, each with 3 input channels and 8 output channels. To ensure that the spatial dimensions of the feature maps remain consistent before and after convolution, the convolution stride can be uniformly set to 1, and the padding size can be set to 1, 2, 3, and 4 respectively, thus obtaining four feature maps of size H×W×8. After concatenating these feature maps along the channel dimension, a multi-scale fused feature map of size H×W×32 is obtained. This multi-scale design enhances the representation of key semantic features at multiple scales.
[0052] Next, the multi-scale fused features are input into a residual feature extraction unit. This residual feature extraction unit can consist of three cascaded residual units, each containing three layers of two-dimensional convolutions. The first two convolutions do not change the feature size, while the third convolution doubles the number of feature channels and downsamples the spatial size of the features, reducing the height and width of the features to half their original values. By introducing skip connections, the unit input features are added element-wise to the intermediate features before downsampling to enhance information fusion between shallow and deep features, resulting in a final output semantic feature representation with dimensions H/8 × W/8 × 256.
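The kernel and padding pairs above can be checked with the standard convolution output-size formula, floor((size + 2·padding − kernel) / stride) + 1. The 256×256 resolution and the helper name conv_output_size are hypothetical; the kernel sizes, paddings, stride, and channel counts come from the text.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Standard convolution output-size formula for one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

H, W = 256, 256                               # hypothetical image resolution
branches = [(3, 1), (5, 2), (7, 3), (9, 4)]   # (kernel, padding) pairs from the text
out_channels_per_branch = 8

# Output shape of each parallel branch, as (height, width, channels).
shapes = [(conv_output_size(H, k, 1, p), conv_output_size(W, k, 1, p),
           out_channels_per_branch) for k, p in branches]

# Concatenating the four branches along the channel dimension.
fused_shape = (H, W, out_channels_per_branch * len(branches))
print(shapes, fused_shape)
```

Each padding equals (kernel − 1) / 2, which is why every branch keeps the H×W spatial size and the four outputs can be concatenated channel-wise.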
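The shape progression through the three cascaded residual units can be traced as follows. The 256×256 input resolution and the function name are hypothetical; the halving of height and width and the doubling of channels per unit follow the description above.

```python
def trace_residual_units(height, width, channels, num_units=3):
    """Return the feature shape after each residual unit.

    Each unit halves the height and width and doubles the channel count.
    """
    shape_log = [(height, width, channels)]
    for _ in range(num_units):
        height, width, channels = height // 2, width // 2, channels * 2
        shape_log.append((height, width, channels))
    return shape_log

# Hypothetical 256x256 input with the 32 fused channels from the previous unit.
for shape in trace_residual_units(256, 256, 32):
    print(shape)
```

Starting from 32 channels, three doublings give 256 channels, and three halvings give H/8 × W/8, matching the stated output size.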
[0053] To achieve variable compression ratio at the encoding end, a channel transformation unit is introduced after the residual feature extraction unit. This unit can adjust the number of channels from 256 to the desired N using learnable feature mapping, thereby transforming the feature representation into an encoded feature of size H / 8×W / 8×N. Here, N corresponds to different compression ratio settings, enabling the system to adapt to different communication bandwidth constraints.
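To see how N controls the amount of transmitted data, the sketch below compares element counts between the original H×W×31 hyperspectral cube and the H/8×W/8×N encoded feature. This is only an element-count ratio: the source does not state the bit depth per element or how the compression ratio is formally defined, so the ratio, the candidate N values, and the 256×256 resolution are all illustrative assumptions.

```python
def element_compression_ratio(height, width, n_channels,
                              spectral_channels=31, downsample=8):
    """Ratio of original cube elements to encoded feature elements."""
    original = height * width * spectral_channels
    encoded = (height // downsample) * (width // downsample) * n_channels
    return original / encoded

# Hypothetical resolution and channel settings.
ratios = {n: element_compression_ratio(256, 256, n) for n in (4, 8, 16)}
for n, ratio in ratios.items():
    print(n, ratio)
```

Halving N doubles the element-count ratio, which is the lever the channel transformation unit provides for adapting to tighter bandwidth.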
[0054] Building upon this, a weighted attention unit is introduced to enhance the modeling capability for important semantic information. This unit simulates a selective attention mechanism for key information. This process does not change the size of the features, but only redistributes the importance of each feature element, thereby further improving the semantic compression efficiency.
[0055] Finally, the quantization unit processes the encoded features, mapping continuous value features to integers through rounding operations, and can also use binarization to map integers to bit streams for subsequent channel coding and communication transmission.
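A minimal sketch of the rounding and binarization steps follows. The fixed 8-bit width, the clamping to a non-negative range, and the function names are assumptions; the source only states that continuous values are rounded to integers and that integers can be binarized into a bit stream.

```python
def quantize(features, bits=8):
    """Round each value to the nearest integer and clamp to the `bits`-bit range.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    top = (1 << bits) - 1
    return [min(max(int(round(v)), 0), top) for v in features]

def binarize(integers, bits=8):
    """Write each integer as `bits` binary digits, most significant bit first."""
    return [(v >> shift) & 1 for v in integers for shift in range(bits - 1, -1, -1)]

def debinarize(stream, bits=8):
    """Inverse of binarize: regroup bits into integers."""
    values = []
    for i in range(0, len(stream), bits):
        v = 0
        for b in stream[i:i + bits]:
            v = (v << 1) | b
        values.append(v)
    return values

features = [3.2, 0.7, 12.6, 200.4]     # hypothetical weighted encoded features
quantized = quantize(features)
bit_stream = binarize(quantized)
print(quantized, len(bit_stream))
```

The binarization itself is lossless (debinarize recovers the integers exactly); the only information loss in this step comes from rounding.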
[0056] To further improve the stability and physical rationality of the reconstruction results, thereby effectively reducing error accumulation and improving the overall system performance, in the optoelectronic joint training method for hyperspectral transmission provided in this application embodiment, in step 114, the weighted attention unit adopts a dual-branch structure, and the weighting of the encoded feature map specifically includes the following: Step 010: Generate two different weights corresponding to the encoded feature map based on the dual-branch structure; Step 020: Weight the encoded features according to the two weights; Step 030: The weighted encoded feature map is obtained by fusion.
[0057] Specifically, two branches of the network can be used to process the input features, generating weights W1 and W2 for the corresponding feature elements respectively. The input features and weights are then fused element by element to obtain a weighted encoded feature map.
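The dual-branch weighting can be sketched as below. The source does not describe the internal structure of the two branches or the exact fusion rule, so the per-element sigmoid gates, their scale and bias values, and fusion by addition are all assumptions made only to show the data flow: two weight sets, two weighted copies, one fused output of unchanged size.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def branch_weights(features, scale, bias):
    """Hypothetical branch: one sigmoid gate per feature element."""
    return [sigmoid(scale * v + bias) for v in features]

def weighted_attention(features):
    """Weight the features with two branches, then fuse the results."""
    w1 = branch_weights(features, scale=1.0, bias=0.0)    # branch 1 (assumed form)
    w2 = branch_weights(features, scale=-0.5, bias=1.0)   # branch 2 (assumed form)
    # Element-wise weighting by each branch, fused here by addition (assumed).
    return [v * a + v * b for v, a, b in zip(features, w1, w2)]

encoded = [0.5, -1.2, 3.0, 0.0]        # hypothetical encoded feature elements
weighted = weighted_attention(encoded)
print(weighted)
```

The output has the same number of elements as the input; only the relative importance of each element changes.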
[0058] In order to further improve the stability and physical rationality of the reconstruction results, thereby effectively reducing error accumulation and improving the overall system performance, in the optoelectronic joint training method for hyperspectral transmission provided in this application embodiment, in step 115, the quantization unit uses uniform noise to replace the quantization operation.
[0059] It can be understood that quantization is required during testing, but the quantization (rounding) operation is non-differentiable, so gradients cannot be backpropagated through it. Therefore, uniform noise is used in place of quantization during model training, because quantization noise can be approximated as uniform noise.
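The training-time substitution can be sketched as follows. The noise range of [-0.5, 0.5] is an assumption based on the fact that rounding to the nearest integer changes a value by at most 0.5; the source says only that uniform noise is used. The function names, seed, and sample values are hypothetical.

```python
import random

def quantize_for_training(features, rng):
    """Differentiable stand-in: add noise drawn uniformly from [-0.5, 0.5]."""
    return [v + rng.uniform(-0.5, 0.5) for v in features]

def quantize_for_inference(features):
    """Hard quantization used at test time: round to the nearest integer."""
    return [round(v) for v in features]

rng = random.Random(0)                 # fixed seed for repeatability
features = [0.3, 1.8, -2.4, 5.1]       # hypothetical encoded features

noisy = quantize_for_training(features, rng)
hard = quantize_for_inference(features)

# Both perturbations stay within half a quantization step of the input.
max_noise_error = max(abs(n - f) for n, f in zip(noisy, features))
max_round_error = max(abs(h - f) for h, f in zip(hard, features))
print(hard, max_noise_error <= 0.5, max_round_error <= 0.5)
```

Adding noise is a smooth operation with respect to the input, so gradients flow through it during training, while the error magnitude matches what rounding introduces at test time.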
[0060] To further improve the stability and physical rationality of the reconstruction results, thereby effectively reducing error accumulation and improving the overall system performance, in the optoelectronic joint training method for hyperspectral transmission provided in this application embodiment, the spectral decoding and reconstruction layer in step 130 specifically includes the following: Step 131: Spatial spectral attention unit, used to weight the input target bitstream from the spatial dimension and the spectral dimension respectively to obtain a weighted decoding feature map; Step 132: A multi-residual convolutional reconstruction unit is used to upsample and learn residuals on the weighted decoded feature map and output the target hyperspectral image.
[0061] Specifically, the spectral decoding and reconstruction layer mainly includes a spatial spectral attention unit and a multi-residual convolutional reconstruction unit. The spatial spectral attention unit performs weighted analysis on the decoded features from both the spatial and spectral dimensions to enhance spatial structure consistency and spectral continuity. Subsequently, a convolutional network with multiple residual connections (i.e., a multi-residual convolutional reconstruction unit) can be introduced to recover high-resolution features step by step, achieving end-to-end reconstruction output of hyperspectral data. The number of spectral channels is 31, and the reconstructed hyperspectral image has a size of H×W×31.
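One possible form of the spatial spectral weighting is sketched below. The internal structure of the unit is not given above, so this sketch assumes a simple design: a spectral branch that derives one weight per channel from spatially pooled statistics, and a spatial branch that derives one weight per pixel from a projection across channels. The parameters `w_spec` and `w_spat` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_spectral_attention(feat, w_spec, w_spat):
    """feat: (C, H, W); w_spec: (C, C) and w_spat: (C,) are hypothetical parameters."""
    # Spectral branch: pool over space, then produce one weight per channel.
    pooled = feat.mean(axis=(1, 2))                        # (C,)
    spec_w = sigmoid(w_spec @ pooled)[:, None, None]       # (C, 1, 1)
    # Spatial branch: project across channels, one weight per pixel.
    spat_w = sigmoid(np.einsum("c,chw->hw", w_spat, feat))[None, :, :]  # (1, H, W)
    # Apply both weightings to the decoded features.
    return feat * spec_w * spat_w, spec_w, spat_w

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 5, 5))
w_spec = rng.standard_normal((8, 8))
w_spat = rng.standard_normal(8)
out, spec_w, spat_w = spatial_spectral_attention(feat, w_spec, w_spat)
```

In the described system the weighted decoding feature map would then be passed to the multi-residual convolutional reconstruction unit, which is not modeled here.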
[0062] To further improve the stability and physical rationality of the reconstruction results, thereby effectively reducing error accumulation and improving the overall system performance, in the optoelectronic joint training method for hyperspectral transmission provided in this application embodiment, the parameters of the semantic optical communication model in step 100 specifically include the following: The parameters of the semantic coding compression layer and the parameters of the spectral decoding reconstruction layer.
[0063] To further illustrate the above embodiments, this application also provides a specific application example of a photoelectric joint training method for hyperspectral transmission.
[0064] Currently, DOE-based snapshot hyperspectral imaging technology primarily focuses on the rapid acquisition and reconstruction of hyperspectral information. Its core objective is to achieve compressed measurement of spectral information through phase modulation at the optical level and to recover hyperspectral data as accurately as possible at the reconstruction end. Existing compression coding technologies, on the other hand, mainly target the compression and reconstruction of three-channel RGB natural images, optimizing visual perception quality or pixel-level reconstruction errors. These two types of technologies are independent in their design goals, modeling methods, and optimization paths. Against this backdrop, an integrated design for the acquisition, compression, and reconstruction of hyperspectral information cannot be accomplished simply by sequentially combining "DOE imaging technology" and "traditional source coding technology." First, existing common source coding schemes such as JPEG2000 employ non-differentiable operations such as discrete wavelet transform, quantization, and entropy coding. Their coding process involves numerous discrete and discontinuous mapping steps, making it impossible to embed them into a deep learning-based DOE spectral acquisition and imaging framework for end-to-end backpropagation optimization. This means that once a traditional compression module is introduced after the DOE measurement map, the gradient of the final hyperspectral reconstruction error cannot be fed back to the DOE height map parameters. Consequently, the optical modulation structure cannot perceive the information loss caused by communication compression, and there is no optimal coupling between optical design and communication coding, preventing the overall system from reaching a global optimum.
Secondly, current spectral reconstruction networks are designed with a "complete measurement image" as input, assuming the measurement map has been accurately recovered. If simply combined with a communication system, the receiver must sequentially perform channel decoding and source decoding to recover the DOE measurement image, and then input the recovered measurement map into a separate spectral reconstruction network to generate hyperspectral data. In this process, measurement map recovery, as an intermediate step, inevitably introduces additional errors and computational overhead. This intermediate result is not the final objective but merely serves as transitional data, leading to system redundancy. Simultaneously, the distortion from the compression stage is already solidified at the measurement map level, making it difficult for subsequent spectral reconstruction networks to compensate for irreversible information loss, resulting in a progressively accumulating error effect. Therefore, existing hyperspectral acquisition technology and traditional compression coding technology are inherently mismatched in their architecture; a simple combination of the two cannot achieve a unified optimization goal or improve overall performance.
[0065] To address the aforementioned issues, this application innovatively integrates the height map distribution parameters of the diffractive optical element (DOE), semantic encoding compression of the measurement map, and spectral decoding reconstruction into a unified system, achieving end-to-end optoelectronic joint optimization. This application simultaneously considers spectral reconstruction quality and measurement map compression efficiency during the system design phase, treating optical compression and communication compression as different parts of the same optimization problem.
[0066] The optoelectronic joint training method for hyperspectral transmission provided in this application example specifically includes the following: Semantic communication technology, which has emerged in recent years, has shown great potential, especially in resource-constrained fields such as satellite communication, and has advantages over traditional source-channel separation coding schemes. Inspired by machine learning and semantic communication technologies, this application proposes a Semantic Optical Communication System for Visible Hyperspectral Transmission (SOC-VHT) to overcome the limitations of the aforementioned traditional schemes. This system, tailored to the characteristics of hyperspectral data, integrates hyperspectral compression acquisition based on diffraction coding in the optical domain and hyperspectral coding reconstruction based on deep learning in the electrical domain, achieving optoelectronic joint optimization. The process of the traditional scheme is shown in Figure 2, and the process of the SOC-VHT system is shown in Figure 3.
[0067] The DOE described in this application example achieves phase modulation of incident light through different height maps. Different phase modulations form different point spread functions (PSFs) on the CMOS (Complementary Metal-Oxide-Semiconductor) sensor, and image compression is achieved through convolution of the PSF with the spectral information.
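The chain from height map to phase, PSF, and compressed measurement can be sketched numerically. The sketch below is a simplified model under stated assumptions, not the optical model of this application: it uses the thin-element phase delay 2π(n − 1)h/λ, a far-field (Fraunhofer) PSF computed as the squared magnitude of a Fourier transform, a hypothetical wavelength-independent refractive index of 1.5, circular convolution, and a sensor without a spectral response curve. The function names are illustrative.

```python
import numpy as np

def doe_psf(height, wavelength, n_refr=1.5):
    """height: (N, N) height map in metres; returns a unit-energy PSF of shape (N, N)."""
    # Thin-element phase delay introduced by the height map at this wavelength.
    phase = 2.0 * np.pi / wavelength * (n_refr - 1.0) * height
    pupil = np.exp(1j * phase)
    # Far-field intensity: squared magnitude of the Fourier transform of the pupil.
    field = np.fft.fftshift(np.fft.fft2(pupil))
    psf = np.abs(field) ** 2
    return psf / psf.sum()

def measure(cube, height, wavelengths):
    """cube: (L, N, N) spectral scene; returns one (N, N) compressed measurement."""
    meas = np.zeros(cube.shape[1:])
    for band, lam in zip(cube, wavelengths):
        psf = doe_psf(height, lam)
        # Circular convolution via FFT; ifftshift moves the PSF centre to index (0, 0).
        otf = np.fft.fft2(np.fft.ifftshift(psf))
        meas += np.real(np.fft.ifft2(np.fft.fft2(band) * otf))
    return meas

rng = np.random.default_rng(0)
wavelengths = [450e-9, 550e-9, 650e-9]
cube = rng.uniform(0.0, 1.0, size=(3, 16, 16))
height = rng.uniform(0.0, 1.0e-6, size=(16, 16))
meas = measure(cube, height, wavelengths)
```

Because every operation here is differentiable with respect to `height`, a model of this kind can in principle sit inside a gradient-based training loop, which is what allows the height map to be treated as a learnable parameter.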
[0068] The system utilizes neural networks to achieve efficient compression encoding of hyperspectral measurement maps, and at the decoding end, directly reconstructs the hyperspectral data from the channel-decoded symbols, realizing an integrated decoding and reconstruction design. The overall system consists of three parts: DOE compression acquisition, semantic encoding compression, and spectral decoding and reconstruction. As shown in Figure 4, the DOE modulates the phase of the incident light using a learnable height map, achieving compressed acquisition of the hyperspectral scene and forming a measurement map. Semantic encoding compression extracts features from the measurement map, compresses and quantizes them, and outputs a bitstream. The bitstream is then directly reconstructed into a hyperspectral image through spectral decoding and reconstruction. Unlike traditional frameworks where modules are optimized independently, this application uses a unified reconstruction loss function, determined by the final hyperspectral reconstruction error, to update the DOE height map parameters, encoding network parameters, and decoding and reconstruction network parameters. Thus, the DOE's phase modulation structure can perceive the impact of subsequent compression on information distribution during training, the encoding network can consider spectral reconstruction requirements when learning the compressed representation, and the decoding and reconstruction network adjusts collaboratively under overall constraints. The three modules influence, constrain, and guide each other under the same optimization objective, forming a unified design framework across the physical and information layers and achieving true optoelectronic joint coding and joint reconstruction. This technical path of directly incorporating optical device parameters into the communication system's loss function for joint optimization does not exist in existing technologies and cannot be derived from simple combinations.
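The idea of one reconstruction loss updating all three parameter groups can be shown with a deliberately small toy model. Everything in the sketch below is a stand-in chosen for illustration: the "DOE" is an element-wise gain cos(h), the encoder and decoder are single matrices `E` and `D`, the loss is mean squared error, and the update is plain gradient descent with hand-written gradients. It is not the network or loss of this application; it only shows the gradient of the final reconstruction error reaching the optical parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, batch = 8, 4, 16
x = rng.uniform(0.0, 1.0, size=(n, batch))      # stand-in training samples

params = {
    "h": rng.uniform(0.0, 1.0, size=(n, 1)),    # stand-in for the DOE height map
    "E": 0.3 * rng.standard_normal((k, n)),     # stand-in encoder
    "D": 0.3 * rng.standard_normal((n, k)),     # stand-in decoder / reconstructor
}

def loss_and_grads(p, x):
    m = np.cos(p["h"]) * x        # toy optical modulation -> "measurement"
    z = p["E"] @ m                # toy encoding
    x_hat = p["D"] @ z            # toy decoding and reconstruction
    err = x_hat - x
    loss = np.mean(err ** 2)      # single reconstruction loss
    # Backpropagate the same loss through decoder, encoder, and optics.
    g_xhat = 2.0 * err / err.size
    g_z = p["D"].T @ g_xhat
    g_m = p["E"].T @ g_z
    grads = {
        "D": g_xhat @ z.T,
        "E": g_z @ m.T,
        "h": np.sum(g_m * (-np.sin(p["h"]) * x), axis=1, keepdims=True),
    }
    return loss, grads

initial = {name: value.copy() for name, value in params.items()}
loss_start, _ = loss_and_grads(params, x)
for _ in range(300):
    _, grads = loss_and_grads(params, x)
    for name in params:
        # One loss drives the update of all three parameter groups.
        params[name] = params[name] - 0.02 * grads[name]
loss_end, _ = loss_and_grads(params, x)
```

If a non-differentiable stage were inserted between `m` and `z`, the term `g_m` would be unavailable and `h` could not be updated from the reconstruction error, which is the limitation of the sequential scheme discussed earlier.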
[0069] Furthermore, considering the multispectral channel characteristics of hyperspectral images, which differ from traditional RGB images, this application example designs an attention mechanism oriented towards hyperspectral features in the encoding and decoding network. Hyperspectral data exhibits strong inter-band correlation and continuity in the spectral dimension, and its physical meaning is reflected in the overall shape of the material reflectance curve, rather than a single pixel value. Existing compression networks mainly focus on spatial structure modeling, making it difficult to effectively characterize the coupling relationships in the spectral dimension. This application example sets up a weighted attention module at the encoding end to adaptively weight the features of the measurement map, prioritizing the retention of semantic information that plays a key role in spectral reconstruction; in the decoding and reconstruction stage, a spatial-spectral collaborative attention module is introduced, simultaneously modeling spatial regions and spectral channels, enhancing structural consistency in the spatial dimension and characterizing inter-band correlation and continuity in the spectral dimension, thereby improving the stability and physical rationality of the reconstruction results. This attention design oriented towards spectral characteristics is not a simple replacement of the existing spatial attention structure, but a structural improvement specifically for the characteristics of hyperspectral data. Working together with the unified optoelectronic joint optimization framework, it further improves the overall performance of the system.
[0070] Furthermore, this application example implements direct decoding and reconstruction of the hyperspectral image from the bitstream at the receiving end, eliminating the explicit measurement map recovery step. The system no longer performs the multi-stage processing flow of channel decoding, source decoding (measurement map recovery), and spectral reconstruction. Instead, it directly recovers the hyperspectral image from the channel-decoded bitstream through a decoding-reconstruction fusion network. This design eliminates the intermediate measurement map layer, preventing errors from solidifying at the measurement map level and propagating to subsequent stages, significantly reducing system complexity and computational load, while also reducing processing latency.
[0071] The overall architecture of the system proposed in this application example is shown in Figure 5; the system includes a data acquisition end, an encoding and transmission end, and a receiving end. The architecture of the weighted attention module is shown in Figure 6, and the decoding and reconstruction network architecture is shown in Figure 7.
[0072] To verify the effectiveness of the SOC-VHT system proposed in this invention, its performance was compared with that of a traditional hyperspectral transmission scheme. The traditional scheme also uses a DOE to acquire measurement images, but it uses JPEG2000 to compress the measurement images during the encoding stage. After decoding, the measurement images are first restored, and then hyperspectral data is generated through the ResUNet spectral reconstruction network. The experimental results are shown in Figures 8, 9, 10, and 11. Figure 8 compares the Structural Similarity Index Measure (SSIM) of the reconstructed spectra at different compression ratios. Figures 9, 10, and 11 compare the recovered spectral curves for the red, green, and blue bands at compression ratios of 54 and 154. The experimental results show that, under different compression ratios, the proposed SOC-VHT system significantly outperforms traditional schemes in terms of spatial structure, demonstrating the significant advantages of semantic coding mechanisms in hyperspectral information compression and reconstruction.
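For readers unfamiliar with the metric, the SSIM formula can be sketched as below. This is a simplified, single-window (global) variant computed over the whole image with the commonly used constants 0.01 and 0.03; standard SSIM implementations average the same expression over local windows, and the exact evaluation settings used for Figure 8 are not stated here. The function name and the test data are illustrative only and are not experimental results.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two arrays with values in [0, data_range]."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    # Luminance term times combined contrast/structure term.
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 1.0, size=(32, 32))          # synthetic reference band
degraded = np.clip(reference + 0.1 * rng.standard_normal((32, 32)), 0.0, 1.0)
score_same = global_ssim(reference, reference)
score_degraded = global_ssim(reference, degraded)
```

A value of 1 indicates identical inputs, and lower values indicate greater structural deviation from the reference.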
[0073] Based on the above embodiments and / or application examples of the optoelectronic joint training method for hyperspectral transmission, this application also provides an embodiment of a semantic optical communication method, which specifically includes the following: The incident light emitted from the hyperspectral scene is input into the semantic optical communication model to obtain the target hyperspectral image; wherein, the quantization unit in the semantic optical communication model uses rounding operation and binarization to map the weighted encoded feature map into the original bit stream; the semantic optical communication model is obtained by the aforementioned optoelectronic joint training method for hyperspectral transmission.
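The inference-time quantization described above (rounding followed by binarization) can be sketched as follows. The bit depth and value range are not specified in this application, so the sketch assumes that the weighted encoded features are scaled to the range [0, 255] and that each rounded value is written as 8 bits; the function names are hypothetical.

```python
import numpy as np

def to_bitstream(feat):
    """Round features to integers in [0, 255] and unpack each into 8 bits."""
    q = np.clip(np.round(feat), 0, 255).astype(np.uint8)   # rounding operation
    bits = np.unpackbits(q.ravel())                        # binarization
    return bits, q.shape

def from_bitstream(bits, shape):
    """Inverse mapping: pack groups of 8 bits back into integer feature values."""
    return np.packbits(bits).reshape(shape).astype(np.float64)

rng = np.random.default_rng(0)
feat = rng.uniform(0.0, 255.0, size=(4, 3, 3))
bits, shape = to_bitstream(feat)
restored = from_bitstream(bits, shape)
```

The only loss in this round trip is the rounding itself, which is the operation that the uniform noise approximates during training.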
[0074] This application also provides an electronic device, which may include a processor, a memory, a receiver, and a transmitter. The processor is used to execute the optoelectronic joint training method for hyperspectral transmission and / or the semantic optical communication method mentioned in the above embodiments. The processor and the memory can be connected via a bus or other means, taking a bus connection as an example. The receiver can be connected to the processor and the memory via wired or wireless means.
[0075] The processor can be a central processing unit (CPU). The processor can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations of the above types of chips.
[0076] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions / modules corresponding to the optoelectronic joint training method for hyperspectral transmission and / or the semantic optical communication method in the embodiments of this application. The processor executes various functional applications and data processing by running the non-transitory software programs, instructions, and modules stored in the memory, thereby implementing the optoelectronic joint training method for hyperspectral transmission and / or the semantic optical communication method in the above method embodiments.
[0077] The memory may include a program storage area and a data storage area. The program storage area may store the operating system and applications required for at least one function; the data storage area may store data created by the processor, etc. Furthermore, the memory may include high-speed random access memory and non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory may optionally include memory remotely located relative to the processor, which can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0078] The one or more modules are stored in the memory, and when executed by the processor, they execute the optoelectronic joint training method for hyperspectral transmission and / or the semantic optical communication method in the embodiment.
[0079] In some embodiments of this application, the user equipment may include a processor, a memory, and a transceiver unit. The transceiver unit may include a receiver and a transmitter. The processor, memory, receiver, and transmitter may be connected via a bus system. The memory is used to store computer instructions, and the processor is used to execute the computer instructions stored in the memory to control the transceiver unit to send and receive signals.
[0080] As one implementation method, the functions of the receiver and transmitter in this application can be implemented by transceiver circuits or dedicated transceiver chips, and the processor can be implemented by dedicated processing chips, processing circuits or general-purpose chips.
[0081] As another implementation approach, the server provided in this application embodiment can be implemented using a general-purpose computer. That is, the program code implementing the processor, receiver, and transmitter functions is stored in memory, and the general-purpose processor implements the processor, receiver, and transmitter functions by executing the code in memory.
[0082] This application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the aforementioned optoelectronic joint training method for hyperspectral transmission and / or the semantic optical communication method. The computer-readable storage medium may be a tangible storage medium, such as random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, register, floppy disk, hard disk, removable storage disk, CD-ROM, or any other form of storage medium known in the art.
[0083] This application also provides a computer program product, specifically comprising a computer program that, when executed by a processor, implements the steps of the optoelectronic joint training method for hyperspectral transmission and / or the semantic optical communication method mentioned in the foregoing embodiments.
[0084] Those skilled in the art will understand that the exemplary components, systems, and methods described in conjunction with the embodiments disclosed herein can be implemented in hardware, software, or a combination of both. Whether implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application. When implemented in hardware, it can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this application are programs or code segments used to perform the required tasks. The programs or code segments can be stored on a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried on a carrier wave.
[0085] It should be clarified that this application is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of this application is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of this application.
[0086] In this application, features described and / or illustrated for one embodiment may be used in the same or similar manner in one or more other embodiments, and / or combined with or in place of features of other embodiments.
[0087] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to the embodiments of this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.
Claims
1. A photoelectric joint training method for hyperspectral transmission, characterized in that, The method includes: A pre-defined semantic optical communication system is trained based on hyperspectral training samples. In each iteration of training, the diffractive optical element in the system compresses and acquires the hyperspectral training samples to obtain a measurement image corresponding to the hyperspectral training samples. This measurement image is then transmitted to the semantic optical communication model in the system, allowing the semantic optical communication model to acquire the target bitstream of the measurement image and reconstruct a target hyperspectral image. A loss function is constructed based on the error between the target hyperspectral image and the hyperspectral training samples to jointly update the height map distribution parameters of the diffractive optical element and the parameters of the semantic optical communication model. If the semantic optical communication model converges, then the semantic optical communication model and the diffractive optical element updated in the last iteration are used together as a semantic optical communication system for hyperspectral transmission for output.
2. The optoelectronic joint training method for hyperspectral transmission according to claim 1, characterized in that, The semantic optical communication model includes: A semantic coding compression layer is used to extract and compress features from the input measurement image and output the original bitstream corresponding to the measurement image. The channel transmission layer is used to channel-encode the original bit stream and then transmit it through the channel, perform channel decoding on the encoded original bit stream to obtain the target bit stream, and output the target bit stream. A spectral decoding and reconstruction layer is used to receive the target bitstream and reconstruct the bitstream to obtain the target hyperspectral image.
3. The optoelectronic joint training method for hyperspectral transmission according to claim 2, characterized in that, The semantic encoding compression layer includes: A multi-scale feature mapping unit is used to perform convolution operations on the input measurement image to extract feature information at different spatial scales in order to obtain a multi-scale fused feature map. The residual feature extraction unit is used to perform residual learning and downsampling on the multi-scale fused feature map and output a semantic feature map. The channel transformation unit is used to adjust the number of channels of the semantic feature map to obtain the encoded feature map; A weighted attention unit is used to weight the encoded feature map and output a weighted encoded feature map; A quantization unit is used to map the weighted encoded feature map to the original bit stream.
4. The optoelectronic joint training method for hyperspectral transmission according to claim 3, characterized in that, The weighted attention unit adopts a dual-branch structure, and the weighting of the encoded feature map includes: Based on the dual-branch structure, two different weights are generated corresponding to the encoded feature map; The encoded features are weighted according to the two weights; The weighted encoded feature map is obtained by fusion.
5. The optoelectronic joint training method for hyperspectral transmission according to claim 3, characterized in that, The quantization unit uses uniform noise instead of quantization operation.
6. The optoelectronic joint training method for hyperspectral transmission according to claim 2, characterized in that, The spectral decoding and reconstruction layer includes: A spatial spectral attention unit is used to weight the input target bitstream from both the spatial and spectral dimensions to obtain a weighted decoding feature map. The multi-residual convolutional reconstruction unit is used to upsample and learn residuals on the weighted decoded feature map and output the target hyperspectral image.
7. The optoelectronic joint training method for hyperspectral transmission according to claim 1 or 2, characterized in that, The parameters of the semantic optical communication model include: The parameters of the semantic coding compression layer and the parameters of the spectral decoding reconstruction layer.
8. A semantic optical communication method, characterized in that, The method includes: The incident light emitted from the hyperspectral scene is input into the semantic optical communication model to obtain the target hyperspectral image; wherein, the quantization unit in the semantic optical communication model uses rounding operation and binarization to map the weighted encoded feature map into the original bit stream; the semantic optical communication model is obtained by the optoelectronic joint training method for hyperspectral transmission as described in any one of claims 1 to 7.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the computer program, it implements the optoelectronic joint training method for hyperspectral transmission as described in any one of claims 1 to 7, and / or implements the semantic optical communication method as described in claim 8.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, When executed by a processor, the computer program implements the optoelectronic joint training method for hyperspectral transmission as described in any one of claims 1 to 7, and / or the semantic optical communication method as described in claim 8.