Training method and device for a base model oriented to channel compression feedback, computer device and storage medium
By designing a base model oriented to channel compression feedback, utilizing expert hybrid coding and decoding modules, and combining gating networks and transformation modules, the problems of high feedback overhead and low accuracy in traditional channel compression feedback schemes are solved, and efficient channel state information reconstruction in multi-user scenarios is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- PEKING UNIV
- Filing Date
- 2025-07-10
- Publication Date
- 2026-06-12
Figure CN120750385B_ABST
Description
Technical Field
[0001] This application relates to the field of wireless communication technology, and in particular to a training method, apparatus, computer device, computer-readable storage medium, and computer program product for a base model oriented towards channel compression feedback.
Background Technology
[0002] Channel compression and feedback is a key means of obtaining downlink channel state information (CSI) in multiple-input multiple-output (MIMO) and orthogonal frequency-division multiplexing (OFDM) systems.
[0003] In traditional technologies, channel compression feedback is typically achieved through vector quantization or codebook schemes. However, with the continuous development of technology, the number of antennas in wireless communication systems is also increasing, leading to a corresponding increase in feedback overhead in traditional schemes, which becomes intolerable for the system. Compressive sensing-based schemes can fully utilize the sparsity of the channel to achieve efficient compression, but this significantly reduces feedback accuracy and introduces practical deployment problems. Deep learning technology, due to its powerful nonlinear modeling capabilities, can reduce feedback overhead to some extent in channel compression feedback tasks, but it still has limitations. With the further development of deep learning technology, foundation models have emerged. Foundation models have achieved great success in fields such as natural language processing; that is, neural networks with large-scale parameters, after self-supervised pre-training on large-scale datasets, can exhibit excellent few-shot learning capabilities in downstream tasks. How to obtain large foundation models applicable to various tasks such as channel compression feedback is an urgent technical problem to be solved.
Summary of the Invention
[0004] Based on this, it is necessary to provide a training method, apparatus, computer device, computer-readable storage medium, and computer program product for a channel compression feedback-oriented base model that can flexibly handle various heterogeneous configurations, addressing the aforementioned technical problems.
[0005] In a first aspect, this application provides a training method for a base model oriented to channel compression feedback, the base model including a base station and at least one user terminal, the user terminal including an encoder and a transform module; the base station including an inverse transform module and a decoder; the encoder including at least one expert hybrid coding module; the decoder including at least one expert shared decoding module; the method includes:
[0006] Obtain sample channel state information from at least one user terminal, wherein the sample channel state information includes target dimension information, and the target dimension includes at least two of the following: time dimension, spatial dimension, and frequency dimension.
[0007] The target token corresponding to the sample channel state information of the user terminal is obtained, and the target token is encoded by the expert model and the gating network in each of the expert hybrid coding modules corresponding to the user terminal to obtain a first output token; and the first output token is transformed by the transformation module to obtain target bitstream data.
[0008] The inverse transformation module at the base station performs inverse transformation on the target bitstream data corresponding to each user terminal to obtain the second output token.
[0009] The second output token is processed by at least one expert-shared decoding module included in the decoder to obtain the reconstructed channel state information corresponding to each user terminal, and the base model is trained based on the reconstructed channel state information to obtain the trained base model.
[0010] In one embodiment, the encoder further includes a first position encoding module; the transformation module includes a compression layer and a quantization encoding layer; the process of encoding the target token using expert models and gating networks in the expert hybrid encoding modules corresponding to the user terminal to obtain a first output token; and transforming the first output token using the transformation module to obtain target bitstream data, includes:
[0011] The target token is position-encoded by the first position encoding module to obtain a first position-encoded token;
[0012] The first position encoded token is encoded by the gating network in each of the expert hybrid coding modules connected end to end, as well as multiple target expert models, to obtain the first output token.
[0013] The first output token is compressed using the compression layer to obtain a compressed feature vector;
[0014] The compressed feature vector is quantized using the quantization coding layer and the quantization bit width to obtain the target bitstream data corresponding to the user terminal.
[0015] In one embodiment, the expert hybrid coding module includes a preprocessing layer, an expert processing layer, and a normalization layer. The expert processing layer includes multiple expert models and multiple gating networks. The process of encoding the first position-coded token using the gating networks and multiple target expert models in the interconnected expert hybrid coding modules to obtain the first output token includes:
[0016] For the i-th expert hybrid coding module, the input token of the i-th expert hybrid coding module is processed by attention mechanism and normalization through the pre-processing layer in the i-th expert hybrid coding module to obtain the first token;
[0017] Based on the gating network in the i-th expert hybrid coding module, the model weight corresponding to the first token is determined, and based on the model weight, an expert activation model is selected from the expert models included in the i-th expert hybrid coding module.
[0018] The first token is encoded by each of the expert activation models to obtain the expert output result corresponding to each expert activation model; and the expert output result of each expert activation model and the model weight corresponding to each expert activation model are used to obtain the output token of the i-th expert hybrid encoding module.
[0019] The output token of the expert hybrid coding module located at the end position is determined as the first output token; the input token of the expert hybrid coding module located at the beginning position is the first position coding token, and the input token of the i-th expert hybrid coding module is the output token of the (i-1)-th expert hybrid coding module.
[0020] In one embodiment, the inverse transformation module includes a splicing layer, a dequantization layer, and an upsampling layer; the step of performing inverse transformation processing on the target bitstream data corresponding to each user terminal through the inverse transformation module at the base station to obtain a second output token includes:
[0021] For each user terminal, the splicing layer in the inverse transformation module of the base station splices the bitstreams fed back by the user terminal in the order in which they were received, to obtain the spliced bitstream corresponding to the user terminal.
[0022] Through the dequantization layer, the spliced bitstream corresponding to each user terminal is subjected to inverse type transformation to obtain the feature map of each user terminal.
[0023] The feature maps of the user terminals are input in parallel to the upsampling layer to obtain the sampling tokens corresponding to each user terminal. The sampling tokens are then concatenated to obtain the second output token.
[0024] In one embodiment, the decoder further includes a second position encoding module; the step of processing the second output token through at least one expert-shared decoding module included in the decoder to obtain reconstructed channel state information corresponding to each of the user terminals includes:
[0025] The second output token is position-encoded by the second position encoding module to obtain a second position-encoded token;
[0026] The second position-coded token is decoded by the gating network, at least one shared expert model, and multiple expert activation models in the expert-shared decoding modules connected end to end, to obtain the third output token, and the reconstructed channel state information corresponding to each user terminal is obtained based on the third output token.
[0027] In one embodiment, the base model further includes an output module, wherein obtaining the reconstructed channel state information corresponding to each user terminal based on the third output token includes:
[0028] The third output token is reconstructed through the fully connected layer in the output module to obtain the reconstructed channel state information corresponding to the sample channel state information of each user terminal.
[0029] In one embodiment, the step of training based on the reconstructed channel state information to obtain the trained base model includes:
[0030] Based on each sample channel state information and the reconstructed channel state information corresponding to it, a loss is computed to obtain the compression reconstruction loss.
[0031] Based on the model weights and the number of target tokens, the token routing ratio is determined; and based on the number of expert models to be activated, the token routing ratio, and the average activation weight corresponding to the gating network, the task load balancing loss is obtained.
[0032] The quantization loss is calculated using the compressed feature vector corresponding to the user terminal and the target bitstream data corresponding to the user terminal.
[0033] Based on the compression reconstruction loss, the task load balancing loss, the quantization loss, and the loss weighting coefficients, the model training loss is obtained;
[0034] The model parameters of the base model are updated using the model training loss until the preset training completion conditions are met, resulting in a trained base model.
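The loss terms described in paragraphs [0030] to [0034] can be sketched as follows. This is only an illustration under stated assumptions, not the application's implementation: the text does not give closed forms, so the sketch assumes a mean-squared-error reconstruction loss, a load balancing term of the form N · Σ f_e · P_e (a form commonly used for mixture-of-experts routing, with f_e the token routing ratio and P_e the average gating weight of expert e), and a weighted sum for the total. All function and parameter names are hypothetical.

```python
def mse(reference, estimate):
    """Mean squared error between two equal-length sequences (assumed loss form)."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def load_balancing_loss(gate_weights, assignments):
    """Assumed form: n_experts * sum_e(routing_ratio[e] * mean_weight[e]).

    gate_weights[t][e] is the gating weight of expert e for token t;
    assignments[t] lists the experts activated for token t.
    """
    n_tokens = len(gate_weights)
    n_experts = len(gate_weights[0])
    # Token routing ratio: share of tokens sent to each expert.
    routing_ratio = [0.0] * n_experts
    for selected in assignments:
        for e in selected:
            routing_ratio[e] += 1.0 / n_tokens
    # Average activation weight the gating network gives each expert.
    mean_weight = [sum(row[e] for row in gate_weights) / n_tokens
                   for e in range(n_experts)]
    return n_experts * sum(f * p for f, p in zip(routing_ratio, mean_weight))


def training_loss(reconstruction, balance, quantization, w_balance, w_quant):
    """Assumed combination: weighted sum of the three loss terms."""
    return reconstruction + w_balance * balance + w_quant * quantization


# Evenly spread routing versus routing that collapses onto one expert.
uniform = [[0.25] * 4 for _ in range(4)]
balanced = load_balancing_loss(uniform, [[0], [1], [2], [3]])
one_hot = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]
collapsed = load_balancing_loss(one_hot, [[0], [0], [0], [0]])
```

Under these assumptions the balancing term is smallest when tokens are spread evenly across experts and grows when routing collapses onto a single expert, which is the behaviour a load balancing loss is meant to penalize.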
[0035] In a second aspect, this application also provides a training apparatus for a base model oriented to channel compression feedback, the base model including a base station and at least one user terminal, the user terminal including an encoder and a transform module; the base station including an inverse transform module and a decoder; the encoder including at least one expert hybrid coding module; the decoder including at least one expert shared decoding module; the apparatus comprising:
[0036] The first acquisition module is used to acquire sample channel state information of at least one user terminal. The sample channel state information includes target dimension information, and the target dimension includes at least two of the following: time dimension, spatial dimension, and frequency dimension.
[0037] The second acquisition module is used to acquire the target token corresponding to the sample channel state information of the user terminal, encode the target token through the expert model and gating network in each of the expert hybrid coding modules corresponding to the user terminal to obtain a first output token; and transform the first output token through the transformation module to obtain target bitstream data.
[0038] The inverse transformation processing module is used to perform inverse transformation processing on the target bit stream data corresponding to each user terminal through the inverse transformation module of the base station to obtain the second output token;
[0039] The first processing module is used to process the second output token through at least one expert shared decoding module included in the decoder to obtain the reconstructed channel state information corresponding to each of the user terminals, and to train the base model based on the reconstructed channel state information.
[0040] In a third aspect, this application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the method described above.
[0041] In a fourth aspect, this application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method described above.
[0042] In a fifth aspect, this application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the method described above.
[0043] In the aforementioned training method, apparatus, computer device, computer-readable storage medium, and computer program product for a base model oriented towards channel compression feedback, the user-terminal model processes the sample channel state information of each user terminal to obtain target bitstream data, and the base-station model processes the target bitstream data of each user terminal to obtain reconstructed channel state information, from which the trained base model is obtained. With this method, the trained base model can flexibly handle channel compression feedback tasks with different heterogeneous configurations. Because the multi-user situation in the wireless communication system is taken into account, the sample channel state information estimated by multiple users is processed jointly, which improves the efficiency of model training and the performance of the trained base model.
Attached Figure Description
[0044] To more clearly illustrate the technical solutions in the embodiments of this application or related technologies, the drawings used in the description of the embodiments of this application or related technologies will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0045] Figure 1 is an application environment diagram of a training method for a base model oriented towards channel compression feedback in one embodiment;
[0046] Figure 2 is a flowchart illustrating a training method for a base model oriented towards channel compression feedback in one embodiment;
[0047] Figure 3 is a flowchart illustrating the step of obtaining the first output token in one embodiment;
[0048] Figure 4 is a flowchart illustrating the step of obtaining the second output token in one embodiment;
[0049] Figure 5 is a flowchart illustrating the training method for a base model oriented towards channel compression feedback in another embodiment;
[0050] Figure 6 is a schematic diagram of the structure of an expert hybrid coding module in one embodiment;
[0051] Figure 7 is a schematic diagram of the structure of a shared expert decoding module in one embodiment;
[0052] Figure 8 is a structural block diagram of a training device for a base model oriented towards channel compression feedback in one embodiment;
[0053] Figure 9 is an internal structural diagram of a computer device in one embodiment.
Detailed Implementation
[0054] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0055] It should be noted that the terms "first," "second," etc., used in this application can be used to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish the first element from the second element. The terms "comprising" and "having," and any variations thereof, used in this application, are intended to cover non-exclusive inclusion. The term "multiple" used in this application refers to two or more. The term "and / or" used in this application refers to one of the embodiments, or any combination of multiple embodiments.
[0056] The base model in the training method for the base model oriented towards channel compression feedback provided in this embodiment of the application can be as shown in Figure 1. It includes a base station model 100 and multiple user-side models 200. The base station model is deployed with a base station-side network architecture, and each user-side model is deployed with its own user-side network architecture. The input data for each user-side network architecture can be the sample channel state information estimated by the corresponding user. The output data for the base station can be the reconstructed channel state information for each user obtained after decoding by the base station.
[0057] Accordingly, in one embodiment, as shown in Figure 2, a training method for a base model oriented towards channel compression feedback is provided. The method is described here as applied to a terminal device; it can also be applied to a server, or to a system including both a terminal and a server, in which case it is implemented through interaction between the terminal and the server. The terminal can be, but is not limited to, various personal computers, laptops, smartphones, and tablets. The server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. The base model includes a base station and at least one user terminal. The user terminal includes an encoder and a transformation module. The base station includes an inverse transformation module and a decoder. The encoder includes at least one expert hybrid coding module. The decoder includes at least one expert shared decoding module. The training method for this base model oriented towards channel compression feedback includes:
[0058] Step 202: Obtain sample channel state information from at least one user terminal.
[0059] The sample channel state information includes target dimension information, which includes at least two of the following: time dimension, spatial dimension, and frequency dimension. The time dimension includes the number of time sampling points, the spatial dimension includes the number of base station antennas, and the frequency dimension includes the number of subcarriers.
[0060] Optionally, the method in this embodiment can be applied to a MIMO-OFDM system, wherein the base station side is equipped with a planar array of multiple antennas and the user side is equipped with a single antenna. The large base model obtained after training in this embodiment can be deployed across the base station side and the user side, and can handle channel compression feedback tasks with various heterogeneous configurations. The heterogeneous configurations include one or more of the following: heterogeneity in the number of antennas on the base station side, heterogeneity in the number of subcarriers in OFDM, heterogeneity in the number of users jointly processed, heterogeneity in the number of codewords fed back when users feed back the channel, and heterogeneity in the channel distribution of users. That is, for the wireless communication system in which the trained large base model is deployed, this embodiment does not limit the number of antennas on the base station side, the number of subcarriers in OFDM, the number of users jointly processed (i.e., the number of users included in the wireless communication system), the number of codewords fed back when users feed back the channel, or the channel distribution of users.
[0061] Optionally, the network weights of the deployment models on each user terminal are the same, that is, the network parameters of the base model deployed on each user terminal can be the same.
[0062] Specifically, the terminal can obtain the sample channel state information estimated by each user at each user terminal in the MIMO-OFDM system. For example, it can identify the users participating in training in the current MIMO-OFDM system and obtain the sample channel state information estimated by each user, i.e., the CSI data of each user side. This sample channel state information can be two-dimensional, covering any two of the time dimension t, the spatial dimension n, and the frequency dimension k. Generally, it is information in the spatial and frequency dimensions, indexed by the number of base station antennas and the number of subcarriers corresponding to each user terminal.
[0063] In one example, the sample channel state information can be spatial-frequency CSI by default. Optionally, the sample channel state information can also be temporal-spatial CSI or temporal-frequency CSI.
[0064] Step 204: Obtain the target token corresponding to the sample channel state information of the user terminal. Encode the target token using the expert models and gating network in the expert hybrid coding modules corresponding to the user terminal to obtain the first output token. Then, transform the first output token using the transformation module to obtain the target bitstream data.
[0065] The target token corresponding to the sample channel state information of the user terminal is obtained by embedding the sample channel state information. The user terminal may also include an embedding module. The encoder includes a first position coding module and at least one expert hybrid coding module; each expert hybrid coding module contains a gating network and multiple expert models. The gating network can output the model weights of each expert model, etc. Based on the model weights output by the gating network, an expert activation model is determined from the multiple expert models contained in the expert hybrid coding module.
[0066] Specifically, for each user terminal, after obtaining the sample channel state information corresponding to that user terminal, the terminal device can divide the sample channel state information into blocks to obtain multiple sample channel state blocks, that is, obtain a sample channel state block sequence. The embedding module performs feature extraction processing on each sample channel state block in the sequence to obtain a unit token sequence, that is, the unit token sequence is determined as the target token, where the target token is a one-dimensional token and the unit token sequence is a one-dimensional token sequence.
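The block division described above can be sketched as follows. This is a minimal illustration, assuming a real-valued spatial-frequency CSI matrix and non-overlapping rectangular blocks; the block size, the function name `split_into_blocks`, and the choice to flatten each block row by row are assumptions not stated in the source, and the feature extraction performed by the embedding module is omitted.

```python
def split_into_blocks(csi, block_rows, block_cols):
    """Split a 2-D CSI matrix (list of rows) into non-overlapping blocks.

    Each block is flattened row by row, giving a one-dimensional sequence of
    blocks that an embedding module could then turn into unit tokens.
    """
    n_rows, n_cols = len(csi), len(csi[0])
    if n_rows % block_rows or n_cols % block_cols:
        raise ValueError("CSI dimensions must be divisible by the block size")
    blocks = []
    for r in range(0, n_rows, block_rows):
        for c in range(0, n_cols, block_cols):
            block = [csi[r + i][c + j]
                     for i in range(block_rows)
                     for j in range(block_cols)]
            blocks.append(block)
    return blocks


# Toy CSI sample: 4 base station antennas (rows) x 6 subcarriers (columns).
csi = [[10 * n + k for k in range(6)] for n in range(4)]
block_sequence = split_into_blocks(csi, block_rows=2, block_cols=3)
```

Because the number of blocks scales with the antenna and subcarrier counts while the block size stays fixed, a sequence model over blocks can accept CSI samples of different sizes, which is consistent with the heterogeneous configurations the method targets.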
[0067] In this way, the terminal device can input the target token into the encoder, and perform position encoding processing on the target token through the first position encoding module in the encoder to obtain the first position encoded token. The first position encoded token is then input into at least one expert hybrid encoding module, and the first position encoded token is encoded through the gating network and multiple expert activation models in at least one expert hybrid encoding module to obtain the first output token output by the encoder.
[0068] Accordingly, the user terminal also includes a transformation module, which may include a compression layer and a quantization coding layer. The compression layer may be a feature compression layer. In this way, the terminal device can compress the first output token through the compression layer to obtain a compression result, and quantize the compression result through the quantization coding layer to obtain the target bitstream data corresponding to the user terminal.
[0069] Step 206: The inverse transformation module at the base station performs inverse transformation processing on the target bitstream data corresponding to each user terminal to obtain the second output token.
[0070] Specifically, the base station side of the large base model to be trained, i.e., the base station-side network architecture, can be deployed at the base station. This base station side can include an inverse transformation module. Through the inverse transformation module, the terminal device can receive the target bitstream data fed back by each user terminal over the uplink communication link, concatenate the received target bitstream data in receiving order to obtain the concatenated bitstream of each user terminal, and perform dequantization and upsampling on each concatenated bitstream to obtain the second output token.
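The three stages of the inverse transformation (concatenation in receiving order, dequantization, upsampling) can be sketched as follows. This is an illustration only: it assumes the bitstream encodes unsigned fixed-width indices of a uniform quantizer on [0, 1], and it uses nearest-neighbour repetition as a stand-in for the upsampling layer, which in the described model would be a trained network layer. All names are hypothetical.

```python
def splice(chunks):
    """Concatenate bitstream chunks, which are assumed to be in receiving order."""
    return "".join(chunks)


def dequantize(bitstream, bits):
    """Map each group of `bits` bits back to a value in [0, 1] (assumed quantizer)."""
    levels = (1 << bits) - 1
    return [int(bitstream[i:i + bits], 2) / levels
            for i in range(0, len(bitstream), bits)]


def upsample_nearest(values, factor):
    """Placeholder for the upsampling layer: repeat each value `factor` times."""
    return [v for v in values for _ in range(factor)]


def inverse_transform(chunks_per_user, bits, factor):
    """Process every user's chunks, then concatenate the per-user results."""
    second_output = []
    for chunks in chunks_per_user:
        features = dequantize(splice(chunks), bits)
        second_output.extend(upsample_nearest(features, factor))
    return second_output


# Two users, each feeding back a 2-bit-per-element bitstream in two chunks.
received = [["0001", "11"], ["11", "0000"]]
second_output = inverse_transform(received, bits=2, factor=2)
```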
[0071] Step 208: The second output token is processed by at least one expert-shared decoding module included in the decoder to obtain the reconstructed channel state information corresponding to each user terminal, and the base model is trained based on the reconstructed channel state information to obtain the trained base model.
[0072] The decoder includes a second position encoding module and at least one expert-shared decoding module.
[0073] Specifically, the terminal device can perform position encoding processing on the second output token through the second position encoding module to obtain a second position encoded token. Then, the terminal device can decode the second position encoded token through multiple interconnected expert-shared decoding modules to obtain a third output token. Finally, the third output token is reconstructed through the fully connected layer in the output module included in the base station to obtain the reconstructed channel state information corresponding to the sample channel state information of each user terminal. Based on this reconstructed channel state information, the terminal device can obtain the trained large base model.
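One way to read an expert-shared decoding module is that its shared expert model processes every token, while the gating network additionally activates a few of the remaining expert models. The sketch below illustrates that reading only; how the shared and activated outputs are combined is not specified in this part of the text, so the unweighted sum for shared experts and the weight-scaled sum for activated experts are assumptions, as are all names.

```python
def shared_expert_output(token, shared_experts, routed_experts, gate_weights, k):
    """Combine always-on shared experts with the k highest-weight routed experts.

    gate_weights[e] is the gating weight of routed expert e for this token.
    """
    top = sorted(range(len(routed_experts)),
                 key=lambda e: gate_weights[e], reverse=True)[:k]
    output = [0.0] * len(token)
    # Shared experts see every token regardless of the gating decision.
    for expert in shared_experts:
        output = [o + y for o, y in zip(output, expert(token))]
    # Routed experts contribute only when selected, scaled by their weight.
    for e in top:
        routed_out = routed_experts[e](token)
        output = [o + gate_weights[e] * y for o, y in zip(output, routed_out)]
    return output


def make_scaling_expert(scale):
    """Toy expert that multiplies the token by a constant."""
    return lambda token: [scale * x for x in token]


shared = [make_scaling_expert(1.0)]
routed = [make_scaling_expert(s) for s in (2.0, 3.0, 4.0)]
decoded = shared_expert_output([1.0, 2.0], shared, routed, [0.1, 0.6, 0.3], k=2)
```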
[0074] In the aforementioned training method for the base model oriented towards channel compression feedback, the user-end model processes the sample channel state information of each user end to obtain the target bitstream data, and the base station model processes the target bitstream data of each user end to obtain the reconstructed channel state information, thus obtaining the trained base model. By employing this method, the trained base model can flexibly solve the task of channel compression feedback with different heterogeneous configurations. Considering the presence of multiple users in the wireless communication system, the combined processing of sample channel state information estimated by multiple users improves the efficiency of model training and the performance of the trained base model.
[0075] In one embodiment, the encoder further includes a first position encoding module. The transform module includes a compression layer and a quantization encoding layer. Specifically, the compression layer may be a feature compression layer.
[0076] The specific implementation process of the step "encoding the target token using the expert models and gating networks in the corresponding expert hybrid coding modules at the user end to obtain the first output token, and transforming the first output token using the transformation module to obtain the target bitstream data" can include:
[0077] The target token is positionally encoded by the first positional encoding module to obtain a first positionally encoded token. The first positionally encoded token is then encoded using a gated network within a series of interconnected expert hybrid encoding modules and multiple target expert models to obtain a first output token. The first output token is then compressed using a compression layer to obtain a compressed feature vector. Finally, the compressed feature vector is quantized using a quantization encoding layer and a quantization bit width to obtain the target bitstream data corresponding to the user terminal.
[0078] The encoder includes a first position encoding module and at least one expert hybrid encoding module. The expert hybrid encoding module includes a pre-processing layer, an expert processing layer, and a normalization layer. The pre-processing layer includes a multi-head attention layer and a normalization layer. The expert processing layer includes a gating network and multiple expert models. The gating network can determine the model weights of each expert model based on the input data of the encoder. Based on the model weights output by the gating network, one or more expert activation models (i.e., one or more target expert models) are determined from the multiple expert models included in the expert hybrid encoding module. The expert activation model is used to process the first position encoded token.
[0079] Specifically, the terminal device can input the target token into the encoder, and perform position encoding processing on the target token through the first position encoding module in the encoder to obtain the first position encoded token. The first position encoded token is then input into at least one expert hybrid encoding module, and the first position encoded token is encoded through the gating network and multiple expert activation models in at least one expert hybrid encoding module to obtain the first output token output by the encoder.
[0080] In this way, the first output token output by the encoder can be a two-dimensional feature map. The terminal device can use a CNN (Convolutional Neural Network) in the compression layer to alternately combine downsampling operations to compress the first output token, obtaining a compressed result, that is, compressing the length and width of the first output token. Then, the terminal device can perform scalar quantization processing on the compressed result through the quantization coding layer to obtain the target bitstream data corresponding to the user terminal. This target bitstream data is then transmitted to the base station through the uplink communication link between the user terminal and the base station.
[0081] Optionally, the scalar quantization process can involve the compression result being a feature map. The terminal device, through a quantization coding layer, performs element-by-element scalar quantization on this feature map based on a randomly selected quantization bit width (each element in the feature map uses the same quantization bit width) to obtain the target bitstream data corresponding to the user terminal. During model training, the quantization bit width value can be uniformly sampled from a preset range of quantization bit widths. During model application, the quantization bit width value can be determined based on the requirements of the actual scenario corresponding to the model application.
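Element-by-element scalar quantization with a randomly sampled bit width can be sketched as follows. This is a minimal illustration, assuming a uniform quantizer over feature values already scaled to [0, 1] and a bit width drawn uniformly from an integer range; the quantizer type, the value range, and all names are assumptions not given in the source.

```python
import random


def quantize(features, bits):
    """Quantize each value in [0, 1] to `bits` bits and return a bit string.

    Every element uses the same bit width, as in the described scheme.
    """
    levels = (1 << bits) - 1
    # Clip to [0, 1], scale to the index range, round half up.
    indices = [int(min(max(x, 0.0), 1.0) * levels + 0.5) for x in features]
    return "".join(format(i, f"0{bits}b") for i in indices)


def dequantize(bitstream, bits):
    """Inverse mapping, used here only to check the quantization error."""
    levels = (1 << bits) - 1
    return [int(bitstream[i:i + bits], 2) / levels
            for i in range(0, len(bitstream), bits)]


def sample_bit_width(low, high, rng=random):
    """Draw the training-time bit width uniformly from [low, high]."""
    return rng.randint(low, high)


features = [0.0, 0.3, 1.0]
bits = sample_bit_width(2, 8)
bitstream = quantize(features, bits)
restored = dequantize(bitstream, bits)
```

Sampling the bit width during training exposes the decoder to many feedback budgets, so at deployment a single trained model can be run at whichever bit width the scenario requires.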
[0082] In this embodiment, the multiple expert activation models in the expert hybrid coding module adapt to the channel state information estimated by different users, thereby improving the performance of the large base model.
[0083] In one embodiment, the expert hybrid coding module includes a preprocessing layer, an expert processing layer, and a first normalization layer. The expert processing layer includes a gating network and multiple expert models to be activated. The preprocessing layer includes a multi-head attention layer and a second normalization layer. The expert processing layer may be a MoE feedforward network layer.
[0084] Accordingly, as shown in Figure 3, the specific implementation process of the step "encoding the first position encoded token through the gating network in each expert hybrid coding module connected end to end and multiple target expert models to obtain the first output token" may include:
[0085] Step 302: For the i-th expert hybrid coding module, the input token of the i-th expert hybrid coding module is processed by attention mechanism and normalization through the pre-processing layer in the i-th expert hybrid coding module to obtain the first token.
[0086] Specifically, the input token of the i-th expert hybrid coding module is processed by the attention mechanism through the multi-head attention layer in the preprocessing layer to obtain the attention result, and the attention result is input to the second normalization layer for normalization to obtain the first normalization result. The terminal device can then superimpose the first normalization result with the input token and determine the superimposed result as the first token.
[0087] Step 304: Based on the gating network in the i-th expert hybrid coding module, determine the model weight corresponding to the first token, and based on the model weight, select the expert activation model from the expert models included in the i-th expert hybrid coding module.
[0088] Specifically, the terminal can input the first token to the expert processing layer, which includes a gating network (threshold network) and multiple expert models to be activated. The terminal device can process the first token through the gating network to obtain the model weights corresponding to the first token, i.e., the weight of each expert model for the first token. A Top-k sampling method can then be used to select the target number of expert models with the highest weights as the expert activation models (also called target expert models), i.e., the expert models used to process the first token. This embodiment does not limit the specific value of the target number; those skilled in the art can determine it based on the actual application scenario.
[0089] Optionally, the expert hybrid coding module is pre-configured with N_r routing expert models, i.e., expert models to be activated. Based on the model weights output by the gating network and the Top-k sampling method, the terminal can filter the routing expert models configured in the expert hybrid coding module to determine the K expert activation models corresponding to the first token.
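The selection of K expert activation models from the gating weights can be sketched as below. The sketch assumes the gating logits are already available (for example from a fully connected layer); the helper names `softmax` and `top_k_gate` and the example logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of gating logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(logits, k):
    """Normalize the logits, keep the k largest weights, zero the rest.

    Returns the masked weight vector and the sorted indices of the
    activated experts. The kept weights are not renormalized.
    """
    weights = softmax(logits)
    order = sorted(range(len(weights)), key=lambda n: weights[n], reverse=True)
    keep = set(order[:k])
    gate = [w if n in keep else 0.0 for n, w in enumerate(weights)]
    return gate, sorted(keep)

# Four routing experts (N_r = 4), two activated per token (K = 2).
gate, active = top_k_gate([2.0, 0.1, 1.5, -1.0], k=2)
```

In this example experts 0 and 2 are activated for the token, and the weights of the other two experts are set to zero.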
[0090] Step 306: Encode the first token using each expert activation model to obtain the expert output result corresponding to each expert activation model. Combine the expert output result of each expert activation model with the model weights corresponding to each expert activation model to obtain the output token of the i-th expert hybrid encoding module.
[0091] Specifically, the terminal can input the first token into each expert activation model to encode it and obtain the expert output results. The terminal device can then weight the expert output result of each expert activation model by the corresponding model weight to obtain a weighted result, normalize the weighted result, and determine the normalized result as the output token of the i-th expert hybrid coding module.
[0092] Step 308: Determine the output token of the expert hybrid coding module located at the end as the first output token.
[0093] The input token of the expert hybrid coding module located at the initial position is the first position coding token, and the input token of the i-th expert hybrid coding module is the output token of the (i-1)-th expert hybrid coding module.
[0094] Specifically, the terminal can determine the output token of the expert hybrid coding module located at the end of the encoder as the first output token of the encoder. That is, if there are n expert hybrid coding modules in the encoder, the terminal can determine the output token of the nth expert hybrid coding module as the first output token of the encoder.
[0095] The input token of the first expert hybrid coding module located at the initial position in the encoder is the first position coding token, that is, the terminal can input the first position coding token into the first expert hybrid coding module; the input token of the i-th expert hybrid coding module is the output token of the (i-1)-th expert hybrid coding module, that is, the input token of the i-th expert hybrid coding module is the output token of the previous expert hybrid coding module (i-1).
[0096] In this embodiment, an expert processing layer containing a gating network and multiple routing expert models is used to determine the expert activation model that matches the user for processing. This allows the expert activation characteristics corresponding to different users to be learned, and the adaptability of the routing expert model can effectively handle the channel specificity between different users.
[0097] In one embodiment, the inverse transform module includes a concatenation layer, a dequantization layer, and an upsampling layer. Specifically, the concatenation layer can be a codeword concatenation layer, used to receive the bit stream sent by the user through the uplink communication link and concatenate them in order; the dequantization layer can be a conversion process used to convert the bit stream into floating-point form. The upsampling layer is used to recover the width and height of the feature map.
[0098] As shown in Figure 4, the specific implementation process of the step "using the inverse transformation module at the base station to perform inverse transformation processing on the target bitstream data corresponding to each user terminal to obtain the second output token" may include:
[0099] Step 402: For each user terminal, the splicing layer in the inverse transformation module of the base station splices the received bit streams fed back by the user terminal based on the bit stream receiving order to obtain the spliced bit stream corresponding to the user terminal.
[0100] The bit stream reception order can be the order in which the base station receives the target bit stream data sent by the user terminal through the uplink communication link; the bit stream fed back by the user terminal is the target bit stream data sent by the user terminal through the uplink communication link.
[0101] Specifically, the terminal device can use a splicing layer to splice the received bit streams from the user terminal based on the order of bit stream reception, so as to obtain the user-sent bit streams corresponding to each user, i.e., spliced bit streams.
[0102] Step 404: Through the dequantization layer, perform type inverse transformation on the spliced bitstream corresponding to each user terminal to obtain the feature map of each user terminal.
[0103] Specifically, in the dequantization layer, the terminal device can convert the representation of the concatenated bitstream for each user into a floating-point representation, and obtain the feature map of each user based on the floating-point number corresponding to each user.
[0104] Optionally, for each user terminal, the terminal can obtain the spliced bitstream corresponding to that user terminal; based on the communication protocol and / or fixed length of the uplink communication link, the spliced bitstream is grouped, and each group of binary bits is converted into an integer value to obtain the integer value corresponding to the spliced bitstream. The type of the integer value is converted and normalized to obtain a normalized floating-point number. Based on the width, height, and number of channels of the original image, the floating-point number is rearranged to obtain a multidimensional array. The multidimensional array is determined as a feature map. The width, height, and number of channels of the original image can be randomly determined or determined based on the actual application scenario.
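The dequantization steps above (grouping bits, converting each group to an integer, normalizing to floating point, and rearranging into a multidimensional array) can be sketched as follows. The fixed group length, the normalization to [0, 1], the channel-major layout, and the function name `dequantize` are assumptions made for illustration.

```python
def dequantize(bitstream, bits, height, width, channels):
    """Convert a spliced bit string back into a channels x height x width map.

    Each group of `bits` binary digits becomes an integer, which is then
    normalized to a float in [0, 1] (an assumed normalization).
    """
    count = height * width * channels
    if len(bitstream) != count * bits:
        raise ValueError("bitstream length does not match the requested shape")
    levels = (1 << bits) - 1
    values = [
        int(bitstream[i:i + bits], 2) / levels      # integer -> normalized float
        for i in range(0, len(bitstream), bits)
    ]
    it = iter(values)
    # Rearrange the flat list into a nested channel-major array.
    return [[[next(it) for _ in range(width)] for _ in range(height)]
            for _ in range(channels)]

feature_map = dequantize("00011011", bits=2, height=2, width=2, channels=1)
```

Here the 8-bit stream is split into four 2-bit codewords (0, 1, 2, 3) and mapped to the values 0, 1/3, 2/3 and 1 in a single-channel 2 x 2 feature map.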
[0105] Step 406: Input the feature maps of each user terminal into the upsampling layer in parallel to obtain the sampling tokens corresponding to each user terminal. Then, concatenate the sampling tokens to obtain the second output token.
[0106] Specifically, the terminal device can input the feature maps of each user in parallel into the upsampling layer. The upsampling layer restores each feature map to its original size before compression, resulting in a restored feature map whose size is consistent with the size of the feature map input to the compression layer. The terminal can convert the restored feature map corresponding to each user into tokens, obtaining the sampling tokens corresponding to each user. The terminal device can then concatenate these sampling tokens to obtain the second output token.
[0107] In this embodiment, by performing splicing, dequantization type conversion, and upsampling, the feature map can be accurately and comprehensively recovered, providing a reliable data foundation for the model's subsequent reconstruction of the channel state information.
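The flow of upsampling each user's feature map and concatenating the resulting tokens can be sketched as below. Nearest-neighbor upsampling stands in for the upsampling layer, whose actual form the text does not specify, and treating each row of the restored map as one token is likewise an assumption; all function names are hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbor upsampling of a 2D map by an integer factor
    (a simple stand-in for the upsampling layer)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]   # repeat columns
        out.extend(list(wide) for _ in range(factor))    # repeat rows
    return out

def to_tokens(fmap):
    """Assumption: each row of the restored feature map is one token."""
    return [list(row) for row in fmap]

def second_output_token(user_maps, factor):
    """Upsample every user's map and concatenate the tokens in user order."""
    tokens = []
    for fmap in user_maps:
        tokens.extend(to_tokens(upsample_nearest(fmap, factor)))
    return tokens

# Two users, each with a 1 x 2 compressed feature map.
tokens = second_output_token([[[1, 2]], [[3, 4]]], factor=2)
```

The first two tokens belong to user 1 and the last two to user 2, so the decoder sees all users' tokens in a single sequence.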
[0108] In one embodiment, the decoder further includes a second position encoding module. The specific implementation process of the step "processing the second output token through at least one expert-shared decoding module included in the decoder to obtain the reconstructed channel state information corresponding to each user terminal" may include:
[0109] The second output token is position-encoded by the second position encoding module to obtain the second position-encoded token. The second position-encoded token is then decoded by a gating network in the expert-shared decoding modules connected end-to-end, at least one shared expert model, and multiple expert activation models to obtain the third output token. Based on the third output token, the reconstructed channel state information corresponding to each user terminal is obtained.
[0110] The expert shared decoding module includes a preprocessing layer, an expert processing layer, and a first normalization layer. The expert processing layer includes a gating network, at least one shared expert model, and multiple expert models to be activated. The preprocessing layer includes a multi-head attention layer and a second normalization layer. The expert processing layer can be an SR MoE feedforward network layer.
[0111] Specifically, the second output token is position-encoded by the second position encoding module to obtain a second position-encoded token. The second position-encoded token is then decoded using the gating network within the expert-shared decoding modules connected end to end, along with at least one shared expert model and multiple expert activation models, to obtain a third output token. The first output token is then compressed using a compression layer to obtain a compressed feature vector. This compressed feature vector is then quantized using a quantization encoding layer and a quantization bit width to obtain the target bitstream data corresponding to the user terminal. Finally, the third output token is reconstructed using a fully connected layer in the output module to obtain the reconstructed channel state information corresponding to the sample channel state information of each user terminal.
[0112] In one example, the decoder may contain n expert-shared decoding modules, which may be connected end-to-end. The expert processing layer of each expert-shared decoding module contains multiple expert routing models, and may also include a gating network and at least one expert-shared model.
[0113] For the i-th expert shared decoding module among n expert shared decoding modules, the terminal can use the pre-processing layer in the i-th expert shared decoding module to perform attention mechanism processing and normalization processing on the input token of the i-th expert shared decoding module to obtain the first normalization result, and then superimpose the first normalization result with the input token to obtain the second token.
[0114] Based on the gating network in the i-th expert shared decoding module, the model weight corresponding to the second token is determined, and based on the model weight, the expert activation model is selected from the expert models (routing expert models) contained in the i-th expert shared decoding module.
[0115] The second token is encoded using each expert activation model and the expert shared model in the expert shared decoding module, resulting in multiple expert outputs. The expert outputs of each expert activation model and their corresponding model weights are then weighted to obtain a weighted result. This weighted result is then superimposed with the expert outputs of each expert shared model to obtain a first superimposed result. This first superimposed result is normalized using a first normalization layer to obtain a normalized result. This normalized result is then superimposed with the second token to obtain a second superimposed result. This second superimposed result is determined to be the output token of the i-th expert shared decoding module.
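The combination step of the expert shared decoding module (weighted routed outputs, plus shared outputs, then normalization, then superposition with the second token) can be sketched as follows. The experts are passed in as plain callables, and layer normalization without learnable scale and shift is assumed for the first normalization layer; neither detail is specified in the text.

```python
import math

def layer_norm(v, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (assumed norm type)."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]

def sr_moe_output(token, gate, routed_experts, shared_experts):
    """Output token of one expert shared decoding module for a single token."""
    combined = [0.0] * len(token)
    # Weighted sum over the activated routing experts (gate weight > 0).
    for g, expert in zip(gate, routed_experts):
        if g > 0.0:
            combined = [c + g * o for c, o in zip(combined, expert(token))]
    # Superimpose the outputs of all shared experts (first superimposed result).
    for expert in shared_experts:
        combined = [c + o for c, o in zip(combined, expert(token))]
    normed = layer_norm(combined)
    # Superimpose with the input token (second superimposed result).
    return [t + n for t, n in zip(token, normed)]

# Hypothetical experts: one doubling expert, one zeroing expert, one identity.
routed = [lambda x: [2 * a for a in x], lambda x: [0.0 for _ in x]]
shared = [lambda x: list(x)]
out = sr_moe_output([1.0, 3.0], gate=[0.5, 0.0], routed_experts=routed,
                    shared_experts=shared)
```

Only the first routing expert contributes here because the second has zero gate weight, while the shared expert always contributes.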
[0116] The terminal can determine that the output token of the expert shared decoding module located at the end of the decoder is the third output token. The input token of the expert shared decoding module located at the initial position is the second position-encoded token, and the input token of the i-th expert shared decoding module is the output token of the (i-1)-th expert shared decoding module.
[0117] In this embodiment, multiple routing expert models can be used to determine the expert activation model that matches the user for processing. This allows the system to learn the expert activation characteristics corresponding to different users and effectively handle the channel specificity between different users based on the adaptability of the routing expert models. Furthermore, the shared expert model in the expert layer can extract relevant channel information between multiple users, further improving the decoder's performance and providing accurate data for the output model to reconstruct the channel state.
[0118] In one embodiment, the base model further includes an output module. The specific implementation process of the step "obtaining the reconstructed channel state information corresponding to each user terminal based on the third output token" may include:
[0119] The third output token is reconstructed through the fully connected layer in the output module to obtain the reconstructed channel state information corresponding to the sample channel state information of each user terminal.
[0120] The output module can contain a fully connected layer.
[0121] Specifically, the terminal can reconstruct the third output token output by the decoder through the fully connected layer in the output module. For example, the fully connected layer is a network layer composed of fully connected neurons. The terminal outputs the third output token to the fully connected layer. The features of the third output token can be linearly transformed through the weight matrix and bias term of the fully connected layer. Then, a nonlinear mapping is introduced through the activation function. The third output token is transformed through the nonlinear mapping through the activation function. Finally, the reconstruction result that matches the dimension of the original token (i.e., sample channel state information) of each user terminal is output, that is, the reconstructed channel state information corresponding to the sample channel state information of each user terminal is obtained.
[0122] In this embodiment, the base model can learn how to recover the original information from the compressed or abstract token representation, thereby optimizing the model's feature extraction capabilities.
[0123] In one embodiment, the specific implementation process of the step "training based on the state information of each reconstructed channel to obtain the trained base model" may include:
[0124] Based on the channel state information of each sample and the corresponding reconstructed channel state information, loss prediction is performed to obtain the compression reconstruction loss. The token routing ratio is determined based on the model weights and the number of target tokens. The task load balancing loss is obtained based on the number of expert models to be activated, the token routing ratio, and the average activation weight of the gating network. The quantization loss is calculated using the compressed feature vector and the target bitstream data from the user terminal. The model training loss is obtained based on the compression reconstruction loss, task load balancing loss, quantization loss, and loss weighting coefficients. The model parameters of the base model are updated using the model training loss until the preset training completion conditions are met, resulting in a trained base model.
[0125] The preset training completion condition can be reaching the target number of training iterations, or the loss function obtained during training satisfying the convergence condition. The convergence condition can be that the loss function remains unchanged, or that the loss function is close to the target value. This embodiment does not limit the specific values of the number of training iterations or the target value; those skilled in the art can determine these based on the needs of the actual application scenario. The preset loss weighting coefficients include a first coefficient for the quantization loss and a second coefficient for the task load balancing loss.
[0126] Specifically, compression reconstruction loss L1 can be the channel prediction loss, representing the difference between the user-estimated 2D CSI and the reconstructed CSI output by the base station; quantization loss L2 represents the difference between the features output by the quantization layer and the feature vector input to the quantization layer; and task load balancing loss L3 is the load balancing loss for the MoE architecture.
[0127] The terminal can determine the compression reconstruction loss through the mean squared error (MSE). For example, the compression reconstruction loss L1 can be calculated using the following formula:
[0128] $L_1 = \mathrm{MSE}(H, \hat{H})$
[0129] Where $H$ is the user-estimated original 2D CSI, i.e., the sample channel state information, and $\hat{H}$ is the reconstructed channel state information.
[0130] In this way, the terminal can calculate the quantization loss L2 using the mean square error:
[0131] $L_2 = \mathrm{MSE}(s_q, s)$
[0132] Where $s_q$ represents the feature vector output by the quantization module (quantization layer), and $s$ represents the feature vector input to the quantization module.
[0133] The terminal can also obtain the task load balancing loss L3 based on the number of expert models to be activated, the token routing ratio, and the average activation weight corresponding to the gating network. For example, suppose the terminal is executing the i-th pre-training task and the sample channel state data Ω of a training batch contains T target tokens. Let $N_r$ denote the number of expert models, $D_n$ the token routing ratio, i.e., the proportion of all target tokens routed to expert model n, and $P_n$ the average activation weight of the gating network for expert model n. Then the terminal can calculate the load balancing loss L3 using the following formula:
[0134] $L_3 = N_r \sum_{n=1}^{N_r} D_n P_n$, with $D_n = \frac{1}{T} \sum_{x \in \Omega} \mathbb{1}(x \text{ is routed to expert } n)$ and $P_n = \frac{1}{T} \sum_{x \in \Omega} G(x)[n]$;
[0135] Where $\mathbb{1}(\cdot)$ is an indicator function, which is 1 when the condition in parentheses is met, and 0 otherwise; $G(x)[n]$ represents the model weight of the nth expert model corresponding to the target token x output by the gating network G.
[0136] In this way, the terminal can calculate the model training loss corresponding to the base model by weighting the compression reconstruction loss, quantization loss, task load balancing loss, the first coefficient of quantization loss, and the second coefficient of task load balancing loss. After obtaining the model training loss, the terminal can determine whether the preset training completion condition is met. If the preset training completion condition is not met, the terminal can update the model parameters of the base model based on the model training loss to obtain the updated base model. Then, the method in the above embodiment is re-executed using the updated base model until the preset training completion condition is met, resulting in a trained base model.
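The weighted combination of the three losses can be sketched as below. The load balancing term uses a commonly used form (number of experts times the sum, over experts, of routing ratio times mean gate weight); this form, the additive weighting, and all function and parameter names are assumptions for illustration and may differ from the formula in the original filing.

```python
def mse(a, b):
    """Mean squared error between two equally long flat lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def load_balance_loss(gates):
    """gates: one top-k-masked gate vector per target token in the batch."""
    num_tokens = len(gates)
    num_experts = len(gates[0])
    # Routing ratio: fraction of tokens for which expert n is activated.
    ratio = [sum(1 for g in gates if g[n] > 0.0) / num_tokens
             for n in range(num_experts)]
    # Average activation weight of the gating network for expert n.
    mean_w = [sum(g[n] for g in gates) / num_tokens for n in range(num_experts)]
    return num_experts * sum(d * p for d, p in zip(ratio, mean_w))

def training_loss(h, h_hat, s, s_q, gates, alpha, beta):
    """L = L1 + alpha * L2 + beta * L3 (assumed additive weighting)."""
    l1 = mse(h, h_hat)              # compression reconstruction loss
    l2 = mse(s_q, s)                # quantization loss
    l3 = load_balance_loss(gates)   # task load balancing loss
    return l1 + alpha * l2 + beta * l3

gates = [[0.6, 0.4, 0.0, 0.0], [0.5, 0.0, 0.5, 0.0]]
loss = training_loss([1.0, 0.0], [0.5, 0.0], [0.2, 0.8], [0.25, 0.75],
                     gates, alpha=1.0, beta=0.01)
```

In a real training loop the quantization step is not differentiable, so a gradient workaround would be needed; that aspect is outside this sketch.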
[0137] In this embodiment, by comprehensively considering the loss function for task load balancing, the loss function for quantization, and the loss function for channel information, multi-objective optimization can be balanced, single-objective bias can be avoided, training efficiency can be improved, and the performance of the model obtained after training can be enhanced.
[0138] The following describes in detail the specific implementation steps of the training method for the base model oriented towards channel compression feedback, using a specific embodiment:
[0139] The base model oriented to channel compression feedback obtained by the training method of this embodiment is a general network that can flexibly compress and feed back channel data with various heterogeneous configurations. Specifically, it adopts a MoE network architecture: by including multiple MoE layers in the encoder and multiple SR MoE layers in the decoder, it can simultaneously exploit the correlation and the specificity of channels between users during multi-user joint decoding, which significantly improves the accuracy of channel acquisition on the base station side.
[0140] In practical implementation, this embodiment is applied to a MIMO-OFDM system, where the base station side is equipped with a planar array of multiple antennas, while the user side is equipped with a single antenna. The proposed large-scale base model will be deployed simultaneously on both the user side and the base station side, capable of handling channel compression feedback with various heterogeneous configurations. These heterogeneous configurations include one or more of the following: heterogeneous number of antennas on the base station side, heterogeneous number of OFDM subcarriers, heterogeneous number of users undergoing joint processing, heterogeneous number of codewords fed back when users feed back their channels, and heterogeneous channel distribution of users.
[0141] As shown in Figure 5, the network architecture of this embodiment is divided into two parts: a user-side network, covering user 1, ..., user k, and a base station-side network deployed at the base station. The user-side network includes multiple users, numbered from 1 to k. Each user is equipped with the same network architecture and shares the same network weights. The user-side network architecture includes an embedding module, an encoder module, a feature compression module, and a quantization encoding module. The base station-side network architecture includes a codeword concatenation module, a dequantization module, an upsampling module, a decoder module, and an output module.
[0142] Embedding Module: The input is a 2D CSI sample (sample channel state information) estimated by the user side. The embedding module processes the sample channel state information to obtain a series of tokens. Specifically, the 2D CSI sample is first divided into blocks to obtain a series of 2D CSI blocks. Then, a convolutional network is used to embed them into a 1D token sequence to obtain the target token.
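The block division and embedding of a 2D CSI sample can be sketched as follows. A linear projection of each flattened block is used here; this matches a convolution whose kernel size and stride both equal the block size, which is an assumption about the convolutional network mentioned above. The function names and the toy weight matrix are hypothetical.

```python
def split_into_blocks(csi, block_h, block_w):
    """Divide a 2D CSI sample into non-overlapping blocks, each flattened."""
    rows, cols = len(csi), len(csi[0])
    if rows % block_h or cols % block_w:
        raise ValueError("CSI dimensions must be divisible by the block size")
    blocks = []
    for r in range(0, rows, block_h):
        for c in range(0, cols, block_w):
            blocks.append([csi[r + i][c + j]
                           for i in range(block_h) for j in range(block_w)])
    return blocks

def embed(blocks, weight):
    """Project each flattened block to a D-dimensional token.

    weight has shape (block_h * block_w) x D.
    """
    dim = len(weight[0])
    return [[sum(v * weight[i][d] for i, v in enumerate(block))
             for d in range(dim)]
            for block in blocks]

csi = [[1, 2, 3, 4],
       [5, 6, 7, 8]]
blocks = split_into_blocks(csi, block_h=2, block_w=2)
weight = [[1, 0], [0, 1], [1, 0], [0, 1]]   # toy 4 x 2 projection
tokens = embed(blocks, weight)
```

The 2 x 4 sample yields two blocks and therefore a sequence of two 2-dimensional tokens.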
[0143] The encoder module first performs position encoding on the input token, and then processes it through a series of transformer blocks (multiple interconnected expert hybrid encoding modules), where each transformer block adopts the proposed MoE architecture. The encoder module includes encoder position encoding and the MoE architecture. The encoder position encoding is the first position encoding module; the MoE architecture consists of multiple interconnected blocks (e.g., block 1, block 2, ..., block N), each of which is an expert hybrid encoding module.
[0144] Feature compression module: The token sequence (first output token) is determined as a two-dimensional feature map. By using CNN (Convolutional Neural Networks) to alternately combine downsampling operations, the length and width of the feature map are reduced, thereby achieving the effect of feature compression and obtaining a compressed feature vector.
[0145] The quantization encoding module performs element-wise scalar quantization on the feature map (compressed feature vector) to obtain the bitstream (target bitstream data), which is then fed back to the base station via the uplink communication link. During training, the quantization bit width is sampled uniformly at random from a list of quantization bit widths within a certain range; during model application, its specific value can be determined based on the actual application scenario.
[0146] Codeword concatenation module: After receiving the bit streams from each user, it concatenates them in order to obtain the concatenated bit streams corresponding to each user.
[0147] Dequantization module: Restores the spliced bitstream of each user to floating-point form to obtain the feature map of each user.
[0148] Upsampling module: The feature maps of each user are input in parallel to the upsampling module to restore them to their original size before compression. Then, the feature maps of each user are converted into tokens and concatenated sequentially to obtain the second output token, which is then input to the decoder module.
[0149] Decoder Module: First, the input token is position-encoded, and then processed through a series of transformer blocks (a series of expert-shared decoding modules connected end-to-end), where each transformer block adopts the SR MoE architecture. The decoder module includes decoder position encoding and the SR MoE architecture. The decoder position encoding is the second position encoding module; the SR MoE architecture consists of multiple blocks connected end-to-end (e.g., block 1, block 2, ..., block N), each of which is an expert-shared decoding module.
[0150] Output module: The tokens output by the decoder are passed through the fully connected layer to obtain the reconstructed CSI of each user, that is, the reconstructed CSI of each user terminal, which is the CSI (channel state information) of each user decoded by the base station.
[0151] Figure 6 illustrates the structure of an expert hybrid coding module, which includes a preprocessing layer, an expert processing layer, and a normalization layer (i.e., the first normalization layer). The preprocessing layer comprises a multi-head attention layer and a normalization layer (i.e., the second normalization layer). The expert processing layer is a MoE feedforward network layer, which includes a threshold network and multiple routing expert models, such as routing expert 1, routing expert 2, routing expert 3, and routing expert 4. Token 1, ..., token n, ..., token N represent the N input tokens. The threshold network processes each input token to obtain the model weights corresponding to that input token. The terminal can activate the K experts with the highest weights according to the Top-k sampling method to process the token, obtaining K expert activation models. For example, with K=2, the model weights output by the threshold network for the nth input token can be route n, representing the weight of each routing expert for the nth input token, and the expert activation models selected by the terminal device for the nth input token can be routing expert 1 and routing expert 3. The terminal device then processes the data input to the expert processing layer for the nth input token through routing expert 1 and routing expert 3 respectively, obtaining the output of each expert model. A weighted calculation based on these outputs and the model weights yields the weighted result, which is determined as the output data of the expert processing layer corresponding to token n.
[0152] In one example, there are $N_r$ expert networks (expert routing models, i.e., expert models to be activated), and K of them are activated each time. For each token input to the expert processing layer, the weights of its $N_r$ experts are obtained through a threshold network, and the K experts with the highest weights (expert activation models) are activated based on a Top-k sampling method to process the token. The outputs of these experts are weighted and summed to obtain the output of the MoE feedforward layer. Assume the token input to the MoE feedforward layer is $x \in \mathbb{R}^{1 \times D}$, the corresponding threshold network is $G(\cdot)$, and its weight allocation for the nth expert is $G(x)[n]$. The network of the nth expert is $E_n(\cdot)$. If the output of the MoE feedforward layer is $y$, then the following relationship is satisfied:
[0153] $y = \sum_{n=1}^{N_r} G(x)[n] \, E_n(x)$;
[0154] Each expert network is an FFN architecture, that is:
[0155] $E_n(x) = \mathrm{FFN}_n(x) \in \mathbb{R}^{1 \times D}$
[0156] Where $\mathbb{R}$ is the set of real numbers, and $\mathbb{R}^{1 \times D}$ represents a D-dimensional row vector in the real number field. For example, $E_n(x)$ is a D-dimensional row vector, meaning that the output vector obtained by the FFN model after processing the input data x is a D-dimensional row vector.
[0157] The gating network first processes the input x through a fully connected layer to obtain $N_r$ weights, then normalizes them using a Softmax layer. Finally, the K largest weights are retained, and the remaining weights are reset to zero. The output of the gating network is then:
[0158] $G(x) = \mathrm{TopK}(\mathrm{Softmax}(\mathrm{FC}(x))) \in \mathbb{R}^{1 \times N_r}$
[0159] Here, $\mathrm{FC}(x)$ represents processing the input x through a fully connected layer, and $\mathbb{R}^{1 \times N_r}$ indicates that the output of the gating network G is an $N_r$-dimensional row vector representing the weights of each expert network for x.
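The weighted sum of expert outputs in the MoE feedforward layer can be sketched as below. The FFN is assumed to be two bias-free linear layers with a ReLU in between, and the gate vector is taken as given (already top-k masked); these choices and the toy weights are assumptions, not details from the text.

```python
def ffn(x, w1, w2):
    """Two-layer feedforward expert: ReLU(x @ w1) @ w2, without biases."""
    hidden = [max(0.0, sum(xi * w1[i][j] for i, xi in enumerate(x)))
              for j in range(len(w1[0]))]
    return [sum(h * w2[j][d] for j, h in enumerate(hidden))
            for d in range(len(w2[0]))]

def moe_forward(x, gate, experts):
    """Sum of gate-weighted expert outputs; experts with zero weight are skipped."""
    y = [0.0] * len(x)
    for g, (w1, w2) in zip(gate, experts):
        if g == 0.0:
            continue                     # expert not activated for this token
        out = ffn(x, w1, w2)
        y = [yi + g * oi for yi, oi in zip(y, out)]
    return y

identity = [[1.0, 0.0], [0.0, 1.0]]

def scaled(c):
    """Toy first-layer weight that scales the input by c."""
    return [[c, 0.0], [0.0, c]]

# Four routing experts; expert n scales a non-negative input by n + 1.
experts = [(scaled(float(n + 1)), identity) for n in range(4)]
y = moe_forward([1.0, 2.0], gate=[0.5, 0.0, 0.25, 0.0], experts=experts)
```

Skipping experts with zero gate weight is what keeps the per-token compute proportional to K rather than to the total number of routing experts.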
[0160] Figure 7 illustrates the structure of an expert-shared decoding module, which includes a preprocessing layer, an expert processing layer, and a normalization layer (i.e., the first normalization layer). The preprocessing layer includes a multi-head attention layer and a normalization layer (i.e., the second normalization layer). The expert processing layer is an SR MoE feedforward network layer, which includes a threshold network, at least one shared expert model, and multiple routing expert models, such as routing expert 1, routing expert 2, routing expert 3, routing expert 4, and shared expert 1. Token 1, ..., token n, ..., token N represent the N input tokens. The threshold network processes each input token separately to obtain the model weights corresponding to each input token (i.e., the model weights of each routing expert model for that input token). The terminal can activate the K experts with the highest weights according to the Top-k sampling method to process the token, obtaining K expert activation models. For example, with K=2, the model weights output by the threshold network for the nth input token (token n) can be route n, representing the weight of each routing expert for token n, and the expert activation models selected by the terminal device for the nth input token can be routing expert 1 and routing expert 3. The terminal device then processes token n through shared expert 1, routing expert 1, and routing expert 3 respectively, obtaining the output of each model. A weighted calculation based on the outputs of the routing experts and the model weights yields the weighted result, which is then combined with the output of the shared expert model to obtain the output of the expert processing layer corresponding to token n.
[0161] Optionally, there are N s A shared expert network, N r There are K routing expert networks, where each time all shared expert networks and K routing expert networks are activated. Each input token obtains its corresponding N through a threshold network. r The weights of the routing experts are determined, and the K experts with the highest weights are activated based on a Top-k sampling method to process the token. The outputs of the K experts and N are then analyzed. s The outputs of the shared expert networks are weighted and summed to obtain the output of the SR MoE feedforward layer.
[0162] Assume the tokens input to the SR MoE feedforward layer are x∈R (1×D) The corresponding threshold network is G(·), and its weight allocation for the nth routing expert is G(x)[n]. The network of the nth routing expert is E. (r,n) (·), the nth shared expert network is E (s,n) If the output of the SR MoE feedforward layer is y, then the following relationship is satisfied:
[0163] ;
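The expression for this paragraph is not present in the text. One form consistent with the symbols defined in [0162] and the weighted sum described in [0161], offered as an assumption rather than the original formula, is:

```latex
y = \sum_{n=1}^{N_s} E_{s,n}(x) \;+\; \sum_{n=1}^{N_r} G(x)[n]\, E_{r,n}(x)
```

Here G(x)[n] is zero for every routing expert outside the Top-K selection, so only K routing experts contribute. Whether the shared expert outputs carry their own weights, or whether a residual term in x is added, cannot be determined from the surrounding description.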
[0164] Each expert network is an FFN architecture, that is:
[0165] ;
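The FFN expression is likewise not present in the text. A common two-layer form, shown only as an assumption, is:

```latex
E(x) = \sigma\!\left(x W_1 + b_1\right) W_2 + b_2,
\qquad W_1 \in \mathbb{R}^{D \times D_h},\; W_2 \in \mathbb{R}^{D_h \times D}
```

The hidden width D_h, the bias terms, and the activation σ are hypothetical; the description states only that each expert is an FFN.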
[0166] The gating network first processes the input x through a fully connected layer to obtain N_r weights, which are then normalized by a Softmax layer. Finally, the K largest weights are retained, and the remaining weights are reset to zero. The output of the gating network is then:
[0167] .
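The gating steps of [0166] (fully connected layer, Softmax, keep the K largest weights) and the shared-plus-routing combination of [0161] can be sketched for a single token as follows. This is a minimal NumPy illustration, not the patented implementation: the ReLU activation, the unweighted sum over shared experts, the parameter shapes, and all function names are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gate(x, W_g, b_g, k):
    """FC -> Softmax -> keep the k largest weights, reset the rest to zero."""
    w = softmax(x @ W_g + b_g)             # one weight per routing expert, shape (N_r,)
    keep = np.argsort(w)[-k:]              # indices of the k largest weights
    out = np.zeros_like(w)
    out[keep] = w[keep]
    return out

def ffn(x, W1, b1, W2, b2):
    """Two-layer feedforward expert; the ReLU activation is an assumption."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def sr_moe(x, shared, routed, W_g, b_g, k):
    """All shared experts are applied; only the Top-k routing experts are evaluated."""
    g = gate(x, W_g, b_g, k)
    y = sum(ffn(x, *p) for p in shared)    # shared experts summed unweighted (assumption)
    for n in np.flatnonzero(g):            # routing experts with non-zero weight
        y = y + g[n] * ffn(x, *routed[n])
    return y, g

# Hypothetical sizes: D=8, hidden width 16, one shared expert, four routing experts, K=2.
rng = np.random.default_rng(0)
D, H, N_s, N_r, K = 8, 16, 1, 4, 2

def make_expert():
    return (rng.normal(size=(D, H)), np.zeros(H), rng.normal(size=(H, D)), np.zeros(D))

shared = [make_expert() for _ in range(N_s)]
routed = [make_expert() for _ in range(N_r)]
W_g, b_g = rng.normal(size=(D, N_r)), np.zeros(N_r)
x = rng.normal(size=D)
y, g = sr_moe(x, shared, routed, W_g, b_g, K)
```

Because the retained weights are not renormalized after the Top-K step, they sum to at most one; the text does not say whether renormalization is applied.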
[0168] The method provided in this embodiment is a multi-user, multi-code-rate pre-training scheme. This base model scheme has the ability to handle multi-user, multi-code-rate channel compression feedback. Therefore, during the training process, the number of users and the feedback code rate for each sample are randomly selected to allow the model to learn a more general and powerful channel compression feedback capability. This enables it to handle multiple user numbers and multiple feedback code rates simultaneously, solving the problem of difficulty in simultaneously utilizing the correlation and specificity of channels between users when performing multi-user joint decoding. This significantly improves the accuracy of channel acquisition on the base station side, utilizes the system gain brought by multi-user joint processing, and further expands the application scenarios of the obtained large-scale base model.
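The random selection of the number of users and the feedback code rate per training sample can be sketched as below. The upper bound on users and the candidate bit budgets are hypothetical values chosen for illustration; the description does not give them.

```python
import random

def sample_training_config(rng, max_users, bit_budgets):
    """Draw the number of users and the feedback bit budget for one training sample."""
    num_users = rng.randint(1, max_users)        # inclusive on both ends
    feedback_bits = rng.choice(bit_budgets)
    return num_users, feedback_bits

rng = random.Random(0)
BIT_BUDGETS = [64, 128, 256, 512]                # hypothetical feedback sizes in bits
configs = [sample_training_config(rng, max_users=4, bit_budgets=BIT_BUDGETS)
           for _ in range(5)]
```

Each drawn pair would decide how many user encoders take part in the forward pass and how many bits each one feeds back for that sample.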
[0169] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages in other steps. It is understood that the steps in different embodiments can be freely combined as needed, and all non-contradictory solutions formed by such combinations are within the scope of protection of this application.
[0170] Based on the same inventive concept, this application also provides a training apparatus for a channel-compression feedback-oriented base model to implement the training method for the channel-compression feedback-oriented base model described above. The solution provided by this apparatus is similar to the implementation described in the above method. Therefore, the specific limitations of one or more training apparatus embodiments for a channel-compression feedback-oriented base model provided below can be found in the limitations of the channel-compression feedback-oriented base model training method described above, and will not be repeated here.
[0171] In one exemplary embodiment, as shown in Figure 8, a training device 800 for a base model oriented to channel compression feedback is provided. The base model includes a base station and at least one user terminal. The user terminal includes an encoder and a transform module; the base station includes an inverse transform module and a decoder; the encoder includes at least one expert hybrid coding module; the decoder includes at least one expert shared decoding module; the device includes:
[0172] The first acquisition module 802 is used to acquire sample channel state information of at least one user terminal. The sample channel state information includes target dimension information, which includes at least two of the following: time dimension, spatial dimension, and frequency dimension.
[0173] The second acquisition module 804 is used to acquire the target token corresponding to the sample channel state information of the user terminal, encode the target token through the expert models in the expert hybrid coding modules corresponding to the user terminal and the gating network to obtain the first output token; and transform the first output token through the transformation module to obtain the target bit stream data.
[0174] The inverse transformation processing module 806 is used to perform inverse transformation processing on the target bit stream data corresponding to each user terminal through the inverse transformation module at the base station to obtain the second output token.
[0175] The first processing module 808 is used to process the second output token through at least one expert shared decoding module included in the decoder to obtain the reconstructed channel state information corresponding to each user terminal, and to train the base model based on the reconstructed channel state information.
[0176] In one embodiment, the encoder further includes a first position encoding module; the transformation module includes a compression layer and a quantization encoding layer; the second acquisition module is specifically used for:
[0177] The target token is position-encoded by the first position encoding module to obtain a first position-encoded token;
[0178] The first position encoded token is encoded by the gating network in each of the expert hybrid coding modules connected end to end, as well as multiple target expert models, to obtain the first output token.
[0179] The first output token is compressed using the compression layer to obtain a compressed feature vector;
[0180] The compressed feature vector is quantized using the quantization coding layer and the quantization bit width to obtain the target bitstream data corresponding to the user terminal.
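The compression and quantization steps above can be sketched as follows. This is an illustration under stated assumptions: the compression layer is modeled as a single linear projection followed by a sigmoid, and the quantization coding layer as uniform scalar quantization with a fixed bit width. None of these choices, nor the function names, are confirmed by the description.

```python
import numpy as np

def compress(tokens, W_c):
    """Flatten the output tokens and project them to a shorter feature vector."""
    z = tokens.reshape(-1) @ W_c
    return 1.0 / (1.0 + np.exp(-z))              # squash into (0, 1); an assumption

def quantize_to_bits(v, bit_width):
    """Uniformly quantize values in [0, 1) and emit bit_width bits per value."""
    levels = 2 ** bit_width
    idx = np.minimum(np.floor(v * levels), levels - 1).astype(np.int64)
    shifts = np.arange(bit_width - 1, -1, -1)    # most significant bit first
    bits = (idx[:, None] >> shifts) & 1
    return bits.reshape(-1).astype(np.uint8)

# Hypothetical sizes: 4 tokens of width 8, compressed to 6 values, 3 bits each.
rng = np.random.default_rng(0)
first_output = rng.normal(size=(4, 8))
W_c = rng.normal(size=(32, 6)) / np.sqrt(32)
feature = compress(first_output, W_c)
bits = quantize_to_bits(feature, bit_width=3)
```

In this sketch the feedback overhead is the length of the compressed feature vector times the quantization bit width, so either quantity can be varied to reach a target code rate.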
[0181] In one embodiment, the expert hybrid coding module includes a preprocessing layer, an expert processing layer, and a normalization layer, wherein the expert processing layer includes multiple expert models and multiple gating networks; the second acquisition module is specifically used for:
[0182] For the i-th expert hybrid coding module, the input token of the i-th expert hybrid coding module is processed by attention mechanism and normalization through the pre-processing layer in the i-th expert hybrid coding module to obtain the first token;
[0183] Based on the gating network in the i-th expert hybrid coding module, the model weight corresponding to the first token is determined, and based on the model weight, an expert activation model is selected from the expert models included in the i-th expert hybrid coding module.
[0184] The first token is encoded by each of the expert activation models to obtain the expert output result corresponding to each expert activation model; and the expert output result of each expert activation model and the model weight corresponding to each expert activation model are used to obtain the output token of the i-th expert hybrid encoding module.
[0185] The output token of the expert hybrid coding module located at the end position is determined as the first output token; the input token of the expert hybrid coding module located at the beginning position is the first position coding token, and the input token of the i-th expert hybrid coding module is the output token of the (i-1)-th expert hybrid coding module.
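The chaining of expert hybrid coding modules described above can be sketched as follows. This is a simplified illustration: single-head attention stands in for the attention mechanism, there are no residual connections or learned normalization parameters, experts are bias-free ReLU feedforward networks, and all names and sizes are hypothetical.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    return (h - h.mean(-1, keepdims=True)) / np.sqrt(h.var(-1, keepdims=True) + eps)

def self_attention(h, Wq, Wk, Wv):
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    s = q @ k.T / np.sqrt(q.shape[-1])
    a = np.exp(s - s.max(-1, keepdims=True))
    return (a / a.sum(-1, keepdims=True)) @ v

def moe_layer(h, W_g, experts, k):
    """Per token: gating weights, Top-k expert selection, weighted sum of outputs."""
    logits = h @ W_g
    w = np.exp(logits - logits.max(-1, keepdims=True))
    w = w / w.sum(-1, keepdims=True)              # model weights, shape (T, N)
    out = np.zeros_like(h)
    for t in range(h.shape[0]):
        for n in np.argsort(w[t])[-k:]:           # the k expert activation models
            W1, W2 = experts[n]
            out[t] += w[t, n] * (np.maximum(h[t] @ W1, 0.0) @ W2)
    return out

def encoder_block(h, p, k):
    first = layer_norm(self_attention(h, p["Wq"], p["Wk"], p["Wv"]))  # pre-processing layer
    return layer_norm(moe_layer(first, p["W_g"], p["experts"], k))    # experts + normalization

def encode(tokens, blocks, k):
    h = tokens                                    # input of the first module
    for p in blocks:                              # module i consumes the output of module i-1
        h = encoder_block(h, p, k)
    return h                                      # output of the last module

rng = np.random.default_rng(0)
T, D, H, N, K, L = 6, 8, 16, 4, 2, 3

def make_block():
    return {"Wq": rng.normal(size=(D, D)), "Wk": rng.normal(size=(D, D)),
            "Wv": rng.normal(size=(D, D)), "W_g": rng.normal(size=(D, N)),
            "experts": [(rng.normal(size=(D, H)), rng.normal(size=(H, D)))
                        for _ in range(N)]}

blocks = [make_block() for _ in range(L)]
first_position_encoded = rng.normal(size=(T, D))
first_output = encode(first_position_encoded, blocks, K)
```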
[0186] In one embodiment, the inverse transform module includes a splicing layer, a dequantization layer, and an upsampling layer; the inverse transform processing module is specifically used for:
[0187] For each user terminal, the splicing layer in the inverse transformation module of the base station splices the received bit streams fed back by the user terminal based on the bit stream receiving order to obtain the spliced bit stream corresponding to the user terminal.
[0188] Through the dequantization layer, the concatenated bitstream corresponding to each user terminal is subjected to inverse type transformation to obtain the feature map of each user terminal.
[0189] The feature maps of each user terminal are input in parallel to the upsampling layer to obtain the sampling tokens corresponding to each user terminal. The sampling tokens are then concatenated to obtain the second output token.
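The base-station side steps above (splice by receiving order, dequantize, upsample, concatenate across users) can be sketched as follows. The bin-midpoint dequantization, the linear upsampling matrix `W_u`, concatenation along the token axis, and the function names are all assumptions made for illustration.

```python
import numpy as np

def splice(chunks):
    """chunks: list of (receive_index, bit array); join them in receiving order."""
    ordered = sorted(chunks, key=lambda c: c[0])
    return np.concatenate([bits for _, bits in ordered])

def dequantize(bits, bit_width):
    """Map each group of bit_width bits back to the midpoint of its quantization bin."""
    place_values = 1 << np.arange(bit_width - 1, -1, -1)   # most significant bit first
    idx = bits.reshape(-1, bit_width) @ place_values
    return (idx + 0.5) / 2 ** bit_width

def upsample(feature, W_u, num_tokens):
    """Linear upsampling of a feature vector into num_tokens tokens."""
    return (feature @ W_u).reshape(num_tokens, -1)

def inverse_transform(per_user_chunks, bit_width, W_u, num_tokens):
    sampled = [upsample(dequantize(splice(c), bit_width), W_u, num_tokens)
               for c in per_user_chunks]                    # one token block per user
    return np.concatenate(sampled, axis=0)                  # concatenate across users

# Two hypothetical users, each feeding back two 2-bit chunks that arrive out of order.
rng = np.random.default_rng(0)
W_u = rng.normal(size=(2, 8))
user_a = [(1, np.array([1, 0], dtype=np.uint8)), (0, np.array([0, 1], dtype=np.uint8))]
user_b = [(0, np.array([1, 1], dtype=np.uint8)), (1, np.array([0, 0], dtype=np.uint8))]
second_output = inverse_transform([user_a, user_b], bit_width=2, W_u=W_u, num_tokens=2)
```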
[0190] In one embodiment, the decoder further includes a second position encoding module; the first processing module is specifically used for:
[0191] The second output token is position-encoded by the second position encoding module to obtain a second position-encoded token;
[0192] The second position-coded token is decoded by the gating network, at least one shared expert model, and multiple expert activation models in the expert-shared decoding modules connected end to end, to obtain the third output token, and the reconstructed channel state information corresponding to each user terminal is obtained based on the third output token.
[0193] In one embodiment, the base model further includes an output module, and the first processing module is specifically used for:
[0194] The third output token is reconstructed through the fully connected layer in the output module to obtain the reconstructed channel state information corresponding to the sample channel state information of each user terminal.
[0195] In one embodiment, the step of training based on the reconstructed channel state information to obtain the trained base model includes:
[0196] Based on each sample channel state information and the reconstructed channel state information corresponding to it, the loss is computed to obtain the compression reconstruction loss.
[0197] Based on the model weights and the number of target tokens, the token routing ratio is determined; and based on the number of expert models to be activated, the token routing ratio, and the average activation weight corresponding to the gating network, the task load balancing loss is obtained.
[0198] The quantization loss is calculated using the compressed feature vector corresponding to the user terminal and the target bitstream data corresponding to the user terminal.
[0199] Based on the compression reconstruction loss, the task load balancing loss, the quantization loss, and the loss weighting coefficients, the model training loss is obtained;
[0200] The model parameters of the base model are updated using the model training loss until the preset training completion conditions are met, resulting in a trained base model.
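The three loss terms and their weighted combination can be sketched as follows. The exact formulas are not given in this part of the description, so the sketch assumes a mean squared error for reconstruction and quantization, and a load balancing term of the common form N · Σ f_i · P_i, where f_i is the token routing ratio of expert i and P_i its average gating weight. The default weighting coefficients are hypothetical.

```python
import numpy as np

def reconstruction_loss(h_true, h_rec):
    """Mean squared error; np.abs keeps it valid for complex-valued CSI."""
    return float(np.mean(np.abs(h_true - h_rec) ** 2))

def load_balancing_loss(gate_probs, k):
    """gate_probs: (T, N) Softmax gate outputs before Top-k masking."""
    T, N = gate_probs.shape
    top = np.argsort(gate_probs, axis=1)[:, -k:]            # experts chosen per token
    counts = np.bincount(top.reshape(-1), minlength=N)
    f = counts / (T * k)                                     # token routing ratio
    p = gate_probs.mean(axis=0)                              # average activation weight
    return float(N * np.sum(f * p))

def quantization_loss(feature, dequantized):
    return float(np.mean((feature - dequantized) ** 2))

def training_loss(l_rec, l_bal, l_q, weights=(1.0, 0.01, 0.1)):
    a, b, c = weights                                        # loss weighting coefficients
    return a * l_rec + b * l_bal + c * l_q

balanced = load_balancing_loss(np.full((4, 4), 0.25), k=2)
collapsed = load_balancing_loss(np.tile([0.7, 0.1, 0.1, 0.1], (4, 1)), k=1)
```

With this form the term equals 1 when the gate weights are uniform and grows when tokens and gate weight concentrate on a few experts, which is what makes it usable as a balancing penalty.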
[0201] Each module in the training device for the aforementioned base model oriented towards channel compression feedback can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module.
[0202] In one exemplary embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 9. The computer device includes a processor, memory, input/output (I/O) interfaces, and a communication interface. The processor, memory, and I/O interfaces are connected via a system bus, and the communication interface is also connected to the system bus via the I/O interfaces. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and a database. The internal memory provides the environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The database stores training data for the base model. The I/O interfaces are used for exchanging information between the processor and external devices. The communication interface is used for communication with external terminals via a network connection. When the computer program is executed by the processor, it implements a training method for a base model oriented towards channel compression feedback.
[0203] Those skilled in the art will understand that the structure shown in Figure 9 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.
[0204] In one embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.
[0205] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps in the above method embodiments.
[0206] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.
[0207] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with relevant regulations.
[0208] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile memory and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, artificial intelligence (AI) processors, etc., and are not limited to these.
[0209] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this application.
[0210] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
Claims
1. A training method for a base model oriented towards channel compression feedback, characterized in that, The base model includes a base station and at least one user terminal. The user terminal includes an encoder and a transformation module. The base station includes an inverse transformation module and a decoder. The encoder includes at least one expert hybrid coding module. The decoder includes at least one expert-shared decoding module; the method includes: Obtain sample channel state information from at least one user terminal, wherein the sample channel state information includes target dimension information, and the target dimension includes at least two of the following: time dimension, spatial dimension, and frequency dimension. The target token corresponding to the sample channel state information of the user terminal is obtained, and the target token is encoded by the expert model and the gating network in each of the expert hybrid coding modules corresponding to the user terminal to obtain a first output token; and the first output token is transformed by the transformation module to obtain target bitstream data. The inverse transformation module at the base station performs inverse transformation on the target bitstream data corresponding to each user terminal to obtain the second output token. The second output token is processed by at least one expert-shared decoding module included in the decoder to obtain the reconstructed channel state information corresponding to each user terminal, and the base model is trained based on the reconstructed channel state information to obtain the trained base model.
2. The method according to claim 1, characterized in that, The encoder further includes a first position encoding module; the transformation module includes a compression layer and a quantization encoding layer; the target token is encoded by the expert model and gating network in each of the expert hybrid encoding modules corresponding to the user terminal to obtain a first output token. The transformation module transforms the first output token to obtain the target bitstream data, including: The target token is position-encoded by the first position encoding module to obtain a first position-encoded token; The first position encoded token is encoded by the gating network in each of the expert hybrid coding modules connected end to end, as well as multiple target expert models, to obtain the first output token. The first output token is compressed using the compression layer to obtain a compressed feature vector; The compressed feature vector is quantized using the quantization coding layer and the quantization bit width to obtain the target bitstream data corresponding to the user terminal.
3. The method according to claim 2, characterized in that, The expert hybrid coding module includes a preprocessing layer, an expert processing layer, and a normalization layer. The expert processing layer includes multiple expert models and multiple gating networks. The first position-coded token is encoded using the gating networks and multiple target expert models in the interconnected expert hybrid coding modules to obtain the first output token, including: For the i-th expert hybrid coding module, the input token of the i-th expert hybrid coding module is processed by attention mechanism and normalization through the pre-processing layer in the i-th expert hybrid coding module to obtain the first token; Based on the gating network in the i-th expert hybrid coding module, the model weight corresponding to the first token is determined, and based on the model weight, an expert activation model is selected from the expert models included in the i-th expert hybrid coding module. The first token is encoded by each of the expert activation models to obtain the expert output result corresponding to each expert activation model; and the expert output result of each expert activation model and the model weight corresponding to each expert activation model are used to obtain the output token of the i-th expert hybrid encoding module. The output token of the expert hybrid coding module located at the end position is determined as the first output token; the input token of the expert hybrid coding module located at the beginning position is the first position coding token, and the input token of the i-th expert hybrid coding module is the output token of the (i-1)-th expert hybrid coding module.
4. The method according to claim 1, characterized in that, The inverse transformation module includes a splicing layer, a dequantization layer, and an upsampling layer; the inverse transformation module at the base station performs inverse transformation processing on the target bitstream data corresponding to each user terminal to obtain a second output token, including: For each user terminal, the splicing layer in the inverse transformation module of the base station splices the received bit streams fed back by the user terminal based on the bit stream receiving order to obtain the spliced bit stream corresponding to the user terminal. Through the dequantization layer, the concatenated bitstream corresponding to each user terminal is subjected to inverse type transformation to obtain the feature map of each user terminal. The feature maps of each user terminal are input in parallel to the upsampling layer to obtain the sampling tokens corresponding to each user terminal. The sampling tokens are then concatenated to obtain the second output token.
5. The method according to claim 1, characterized in that, The decoder further includes a second position encoding module; the step of processing the second output token through at least one expert-shared decoding module included in the decoder to obtain the reconstructed channel state information corresponding to each of the user terminals includes: The second output token is position-encoded by the second position encoding module to obtain a second position-encoded token; The second position-coded token is decoded by the gating network, at least one shared expert model, and multiple expert activation models in the expert-shared decoding modules connected end to end, to obtain the third output token, and the reconstructed channel state information corresponding to each user terminal is obtained based on the third output token.
6. The method according to claim 5, characterized in that, The base model also includes an output module, which, based on the third output token, obtains the reconstructed channel state information corresponding to each user terminal, including: The third output token is reconstructed through the fully connected layer in the output module to obtain the reconstructed channel state information corresponding to the sample channel state information of each user terminal.
7. The method according to claim 3, characterized in that, The training based on the reconstructed channel state information to obtain the trained base model includes: Based on each sample channel state information and the reconstructed channel state information corresponding to it, the loss is computed to obtain the compression reconstruction loss. Based on the model weights and the number of target tokens, the token routing ratio is determined; and based on the number of expert models to be activated, the token routing ratio, and the average activation weight corresponding to the gating network, the task load balancing loss is obtained. The quantization loss is calculated using the compressed feature vector corresponding to the user terminal and the target bitstream data corresponding to the user terminal. Based on the compression reconstruction loss, the task load balancing loss, the quantization loss, and the loss weighting coefficients, the model training loss is obtained; The model parameters of the base model are updated using the model training loss until the preset training completion conditions are met, resulting in a trained base model.
8. A training device for a base model oriented towards channel compression feedback, characterized in that, The base model includes a base station and at least one user terminal. The user terminal includes an encoder and a transformation module. The base station includes an inverse transformation module and a decoder. The encoder includes at least one expert hybrid coding module. The decoder includes at least one expert-shared decoding module; the device includes: The first acquisition module is used to acquire sample channel state information of at least one user terminal. The sample channel state information includes target dimension information, and the target dimension includes at least two of the following: time dimension, spatial dimension, and frequency dimension. The second acquisition module is used to acquire the target token corresponding to the sample channel state information of the user terminal, encode the target token through the expert model and gating network in each of the expert hybrid coding modules corresponding to the user terminal to obtain a first output token; and transform the first output token through the transformation module to obtain target bitstream data. The inverse transformation processing module is used to perform inverse transformation processing on the target bit stream data corresponding to each user terminal through the inverse transformation module of the base station to obtain the second output token; The first processing module is used to process the second output token through at least one expert shared decoding module included in the decoder to obtain the reconstructed channel state information corresponding to each of the user terminals, and to train the base model based on the reconstructed channel state information.
9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 7.
11. A computer program product, comprising a computer program, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 7.