Model training method and related apparatus

By receiving and utilizing the knowledge data from the second communication device to train the model of the first communication device, the problem of poor AI model performance caused by insufficient computing power is solved, the model performance and interpretability are improved, and the communication quality of the communication system is enhanced.

WO2026138361A1 · PCT designated stage · Publication Date: 2026-07-02 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2025-11-28
Publication Date
2026-07-02

Abstract

The present application provides a model training method. When computing power on a terminal device side and / or a network device side is insufficient and an interpretable model needs to be trained, the method uses a trained large-scale model on a core network side to train the model on the terminal device side and / or the network device side, thereby improving the efficiency of model training and the precision of the trained model.

Description

A model training method and related apparatus

[0001] This application claims priority to Chinese Patent Application No. 202411937487.6, filed with the State Intellectual Property Office of China on December 24, 2024, entitled “A Model Training Method and Apparatus”, the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the field of communication technology, specifically to a model training method and related apparatus.

Background Technology

[0003] With the development of communication technology, communication equipment in communication systems can now perform not only traditional communication services but also other new types of services, such as artificial intelligence (AI) services. Generally, wireless communication systems capable of handling AI services can also be called air interface AI systems.

[0004] Currently, communication devices can serve as participating nodes in air interface AI systems, applying their computing power to a specific aspect of the system. Generally, AI functions introduced into wireless communication networks rely on models for implementation. For example, communication devices need to acquire channel state information (CSI) to improve communication quality and network performance; therefore, an AI model for acquiring CSI can be constructed.

[0005] However, due to the limited computing power of communication devices, the trained AI models perform poorly. Therefore, how to improve the performance of trained AI models has become a critical issue that urgently needs to be addressed.

Summary of the Invention

[0006] This application provides a model training method for training a high-performance and interpretable AI model on a network device side and / or terminal device side with limited computing power. This application also provides corresponding devices, computer-readable storage media, and computer program products.

[0007] A first aspect of this application provides a model training method. The method is applied to a first communication device, on which a first model is deployed. The first model includes multiple first modules. The method includes: receiving model structure information of a second model from a second communication device, wherein the second model is deployed in the second communication device; wherein the number of model parameters of the first model is less than or equal to the number of model parameters of the second model, the second model includes multiple second modules, and the model structure information is used to indicate the dimension of each second module; sending request information to the second communication device according to the model structure information; wherein the request information is used to request knowledge data of at least one target module with the same dimension as at least one first module, the target module being included in the multiple second modules; receiving knowledge data of at least one target module, and using the knowledge data of the target module to train the first model.

[0008] In this application, the first communication device can be a terminal device or a component of the terminal device, such as a communication module, a circuit or chip responsible for communication functions (such as a modem chip, also known as a baseband chip, or a system-on-chip (SoC) chip or system-in-package (SIP) chip containing a modem core), a chip system, or a processor, etc., which can be applied in the terminal device. It can also be a logic module or software that can realize all or part of the functions of the terminal device.

[0009] In this application, the first communication device may also be a network device or a component of a network device, such as a communication module, processor, chip, chip system, or circuit that can be applied in a network device, or a logic module or software that can realize all or part of the functions of a network device.

[0010] In this application, the first model can be used by network devices to measure the channel state information (CSI) of the downlink channel of terminal devices, recover the downlink channel based on the obtained CSI, and use the recovered downlink channel to determine the precoding matrix, thereby determining the precoding for downlink data transmission. By precoding the downlink data from multiple terminal devices, network devices can achieve spatial multiplexing, thereby reducing interference between terminal devices, improving the signal-to-interference-plus-noise ratio (SINR) at the receiver, and thus increasing system throughput.

[0011] In this application, the first model is an interpretable model. Interpretability means that people can understand the choices made by the model in its decision-making process, including how the decision is made, why the decision is made, and what the decision is. It requires the model to provide clear, explicit, and easy-to-understand explanations to help human users understand the model's decision-making logic and basis.

[0012] In this application, the first model includes multiple first modules. The function of each first module may be the same or different depending on the needs of the first model. For example, the first module may be a module for preprocessing the raw input data, such as normalization and cleaning; the first module may be a multi-head attention module that captures the relationship between the input data through multiple attention heads; or it may be a position encoding module that adds position information to each position in the input sequence. The type and number of first modules are not limited here.

[0013] In this application, the number of first communication devices can be one or more. When there are multiple first communication devices, the second communication device sends the model structure information of the second model to the multiple first communication devices respectively.

[0014] In this application, the second communication device can be a core network element, such as operations, administration, and maintenance (OAM), or a component of the core network, such as a communication module, processor, chip, chip system, or circuit that can be applied in the core network, or a logic module or software that can realize all or part of the core network functions.

[0015] In this application, the second model can be a large black-box dual-ended model trained using a large amount of channel measurement data collected from the radio access network (RAN) side. The number of model parameters in the second model is greater than or equal to the number of model parameters in the first model; therefore, the feature data extracted using the second model will be more accurate.

[0016] In machine learning, "black-box models" typically refer to models with complex and opaque internal decision-making processes, such as neural networks and random forests. These models often possess high accuracy, but their internal workings are difficult to understand, making it impossible to estimate the importance of each feature to the model's predictions or understand the interactions between different features. Therefore, the use of second models is limited in applications requiring interpretability.

[0017] In this application, the model structure information of the second model may include the number of model layers (i.e., the number of second modules) and the dimensional information of each second module. The dimensions of each second module include input and output dimensions. For example, if the second model includes three second modules (i.e., three layers), then the model structure information may include: module 1 has an input dimension of 3×2 and an output dimension of 4×3; module 2 has an input dimension of 4×3 and an output dimension of 5×2; and module 3 has an input dimension of 5×2 and an output dimension of 3×3.
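The three-module example above can be written down as a small data structure. The following is only an illustrative sketch: the class names `ModuleInfo` and `ModelStructureInfo` and the `is_chained` check are assumptions introduced here and are not defined by the application.

```python
from dataclasses import dataclass
from typing import List, Tuple

Shape = Tuple[int, int]  # (rows, columns), e.g. (3, 2) for "3×2"


@dataclass(frozen=True)
class ModuleInfo:
    """Dimensional information of one second module (hypothetical layout)."""
    index: int          # module number within the second model
    input_dim: Shape
    output_dim: Shape


@dataclass(frozen=True)
class ModelStructureInfo:
    """Model structure information: one entry per second module (layer)."""
    modules: List[ModuleInfo]

    @property
    def num_layers(self) -> int:
        return len(self.modules)

    def is_chained(self) -> bool:
        # Each module's output dimension should equal the next module's input dimension.
        return all(a.output_dim == b.input_dim
                   for a, b in zip(self.modules, self.modules[1:]))


# The example from the paragraph above: three second modules.
structure = ModelStructureInfo(modules=[
    ModuleInfo(1, (3, 2), (4, 3)),
    ModuleInfo(2, (4, 3), (5, 2)),
    ModuleInfo(3, (5, 2), (3, 3)),
])
```

In this sketch `structure.num_layers` is 3 and `structure.is_chained()` holds, because the output dimension of each module matches the input dimension of the next.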

[0018] In this application, the dimensions of the first module are the same as those of the target module. Taking the above example, if the input dimension of the first module is 4×3 and the output dimension is 5×2, then the request information sent by the first communication device to the second communication device includes the module 2 number information, which is used to request the second communication device to send the knowledge data of module 2 to the first communication device.
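The matching step described above can be sketched as follows. This is a minimal illustration under assumptions: the dictionary layout of the structure information, the function names `find_target_modules` and `build_request`, and the `"target_modules"` field are hypothetical and are not a message format defined by the application.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int]
Dims = Tuple[Shape, Shape]  # (input dimension, output dimension)

# Module number -> dimensions, taken from the three-module example above.
second_model_structure: Dict[int, Dims] = {
    1: ((3, 2), (4, 3)),
    2: ((4, 3), (5, 2)),
    3: ((5, 2), (3, 3)),
}


def find_target_modules(first_dims: Dims, structure: Dict[int, Dims]) -> List[int]:
    """Return the numbers of second modules whose dimensions equal first_dims."""
    return sorted(n for n, dims in structure.items() if dims == first_dims)


def build_request(first_modules: List[Dims], structure: Dict[int, Dims]) -> dict:
    """Collect every matching module number into one (hypothetical) request."""
    requested = set()
    for dims in first_modules:
        requested.update(find_target_modules(dims, structure))
    return {"target_modules": sorted(requested)}


# A first module with input 4×3 and output 5×2 matches module 2 only.
request = build_request([((4, 3), (5, 2))], second_model_structure)
print(request)  # {'target_modules': [2]}
```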

[0019] In this application, knowledge data may include training data contained in a large-scale dataset, and prediction results obtained from the training data through a teacher model. The prediction results are data labels obtained through the teacher model, which are used to perform regression prediction on the data.

[0020] In the first aspect mentioned above, the first communication device trains the first model by using the knowledge data of the corresponding module in the second model sent by the second communication device. Given the limited computing power of the first communication device, this improves both the training speed and the performance of the first model. Moreover, the first model is an interpretable AI model, which can help users debug, locate errors, and optimize the performance of the first model more effectively.

[0021] In one possible implementation, the knowledge data includes at least one of the following:

[0022] Input data and output data of the target module; attention input data and an attention map of the target module; attention input data and attention logits of the target module; input data of the target module together with the mean and variance of the batch normalization layer output; and training parameters of the target module.

[0023] In this application, the input data is the feature data before passing through the target module, and the output data is the feature data obtained after passing through the target module.

[0024] In this application, the attention input data for the target module is the attention feature representation obtained by inputting the original data into the attention module of the target module. The attention map is a visual representation of the attention mechanism. It is usually a two-dimensional matrix (or a higher-dimensional tensor), where the value of each element represents the attention weight at the corresponding position in the input data.

[0025] In this application, attention logits typically represent the raw attention scores calculated by the model when processing sequences of input data. These scores are not normalized by the softmax function, and therefore may have different scales and ranges.
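The relationship between attention logits and an attention map can be illustrated with a row-wise softmax. This is a generic sketch of the standard softmax normalization, not the application's own computation; the function names and the sample logits are assumptions made for illustration.

```python
import math
from typing import List


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(logits: List[List[float]]) -> List[List[float]]:
    """Normalize each row of raw attention scores into attention weights."""
    return [softmax(row) for row in logits]


# Hypothetical logits: rows have different scales before normalization.
logits = [[2.0, 0.0, -1.0],
          [0.5, 0.5, 0.5]]
amap = attention_map(logits)
# Every row of amap is non-negative and sums to 1; equal logits give equal weights.
```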

[0026] In this application, the batch normalization layer is used to normalize the input data of each mini-batch, so that the mean of the output data is close to 0 and the variance is close to 1.
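As a rough illustration of the normalization described above, the sketch below computes the per-feature mean and variance of one mini-batch and normalizes with them. It assumes the usual batch normalization formula without the learnable scale and shift parameters; the function name `batch_norm` and the sample batch are hypothetical.

```python
import math
from typing import List, Tuple


def batch_norm(batch: List[List[float]], eps: float = 1e-5
               ) -> Tuple[List[List[float]], List[float], List[float]]:
    """Normalize each feature over the mini-batch; also return mean and variance."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(sample[j] for sample in batch) / n for j in range(dims)]
    variances = [sum((sample[j] - means[j]) ** 2 for sample in batch) / n
                 for j in range(dims)]
    normalized = [[(sample[j] - means[j]) / math.sqrt(variances[j] + eps)
                   for j in range(dims)]
                  for sample in batch]
    return normalized, means, variances


# Hypothetical mini-batch of three samples with two features each.
batch = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
normalized, means, variances = batch_norm(batch)
```

After normalization each feature column has a mean close to 0 and a variance close to 1, while `means` and `variances` hold the batch statistics that the knowledge data may carry.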

[0027] In this application, the knowledge data in the target module is transferred to the corresponding first module in the first model through distillation.

[0028] In this possible implementation, the first communication device acquires knowledge data of the target module with the same dimension as the first module for training the first module, which improves the performance and accuracy of the first model. Furthermore, the first model has low computational cost and low memory usage, reducing the computational requirements of the first communication device.

[0029] In one possible implementation, the first communication device trains the first model using knowledge data, including:

[0030] The first module corresponding to the target module in the first model is trained using knowledge data; wherein, the knowledge data includes input data and the first output result of the input data in the target module; the parameters of the first module corresponding to the target module in the first model are adjusted using a loss function determined based on the knowledge data.

[0031] In this application, the loss function is a function that measures the difference between the model's predicted value and the actual value. The first communication device can determine the gap between the first model and the second model based on the knowledge data sent by the second communication device, the results obtained from the knowledge data in the second model, and the results obtained from the knowledge data in the first model, thereby calculating the value of the loss function.

[0032] In this possible implementation, through knowledge distillation, the first model in the first communication device learns the knowledge data of the second model in the second communication device and adjusts its own model parameters according to the output results obtained from the second model, so that the output of the first model gradually approaches the output of the second model, thereby improving the performance and interpretability of the first model in the first communication device.

[0033] In one possible implementation, the parameters of the first module corresponding to the target module in the first model are adjusted using a loss function determined based on knowledge data. This includes: obtaining the second output result of the first module corresponding to the target module in the first model from the input data; calculating the distance between the second output result and the first output result; calculating the value of the loss function based on the distance; and adjusting the parameters of the first module corresponding to the target module in the first model using the value of the loss function.
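The four steps above (obtain the second output result, compute the distance, compute the loss, adjust the parameters) can be sketched for a single linear module. Everything concrete here is an assumption made for illustration: the application does not fix the module type, the distance measure, or the optimizer, so this sketch uses a linear map, mean squared error, and plain gradient descent.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def forward(weights: Matrix, x: Vector) -> Vector:
    """Output of a linear module y = W x (hypothetical module type)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse_distance(second_outputs: List[Vector], first_outputs: List[Vector]) -> float:
    """Mean squared distance between first-module and target-module outputs."""
    total, count = 0.0, 0
    for s, f in zip(second_outputs, first_outputs):
        for a, b in zip(s, f):
            total += (a - b) ** 2
            count += 1
    return total / count


def distill_step(weights: Matrix, inputs: List[Vector],
                 first_outputs: List[Vector], lr: float) -> Tuple[Matrix, float]:
    """One update of the first module using the knowledge data."""
    # Second output result: the first module's own output on the input data.
    second_outputs = [forward(weights, x) for x in inputs]
    loss = mse_distance(second_outputs, first_outputs)
    count = len(inputs) * len(weights)
    new_weights = [row[:] for row in weights]
    for x, s, f in zip(inputs, second_outputs, first_outputs):
        for i in range(len(weights)):
            err = 2.0 * (s[i] - f[i]) / count  # gradient of the MSE w.r.t. output i
            for j in range(len(x)):
                new_weights[i][j] -= lr * err * x[j]
    return new_weights, loss


# Hypothetical knowledge data: inputs and the target module's outputs on them.
target_weights = [[1.0, -2.0], [0.5, 3.0]]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
first_outputs = [forward(target_weights, x) for x in inputs]

weights = [[0.0, 0.0], [0.0, 0.0]]  # first module before training
losses = []
for _ in range(200):
    weights, loss = distill_step(weights, inputs, first_outputs, lr=0.1)
    losses.append(loss)
```

In this toy setting the loss shrinks over the iterations and the first module's weights approach those of the target module, which mirrors the statement that the output of the first model gradually approaches the output of the second model.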

[0034] In this possible implementation, the first communication device adjusts the parameters of the first model by using the output results of the second model with knowledge data, thereby improving the similarity between the first model and the second model, enabling the first model to better learn the generalization ability of the second model, and thus improving the accuracy and robustness of the first model.

[0035] A second aspect of this application provides a model training method, wherein a second model is deployed on a second communication device, the second model including multiple second modules, the method comprising: sending model structure information of the second model to a first communication device; wherein a first model is deployed on the first communication device, the first model including multiple first modules, the number of model parameters of the first model being less than or equal to the number of model parameters of the second model, and the model structure information indicating the dimension of each second module; receiving request information from the first communication device; wherein the request information is determined by the first communication device based on the model structure information, the request information being used to request knowledge data of at least one target module with the same dimension as at least one first module, the target module being included in the multiple second modules; and sending knowledge data to the first communication device, the knowledge data being used to train the first model.
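From the second communication device's point of view, the exchange above amounts to three operations: send the structure information, receive a request, and send the requested knowledge data. The sketch below is a hypothetical in-memory illustration only; the class `SecondDevice`, its method names, and the message fields are assumptions and do not represent signaling defined by the application.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int]
Dims = Tuple[Shape, Shape]
# Knowledge data per module: here, (input data, output data) pairs.
Knowledge = List[Tuple[List[float], List[float]]]


class SecondDevice:
    """Holds the second model's structure and per-module knowledge data."""

    def __init__(self, structure: Dict[int, Dims], knowledge: Dict[int, Knowledge]):
        self._structure = structure
        self._knowledge = knowledge

    def structure_info(self) -> Dict[int, Dims]:
        """Model structure information sent to the first communication device."""
        return dict(self._structure)

    def handle_request(self, request: dict) -> Dict[int, Knowledge]:
        """Return knowledge data for each requested target module."""
        response = {}
        for number in request["target_modules"]:
            if number not in self._structure:
                raise KeyError(f"unknown second module: {number}")
            response[number] = self._knowledge[number]
        return response


# Hypothetical contents: three modules, with toy knowledge data for each.
device = SecondDevice(
    structure={1: ((3, 2), (4, 3)), 2: ((4, 3), (5, 2)), 3: ((5, 2), (3, 3))},
    knowledge={1: [([0.1], [0.2])], 2: [([0.3], [0.4])], 3: [([0.5], [0.6])]},
)
reply = device.handle_request({"target_modules": [2]})
```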

[0036] In the second aspect mentioned above, the second communication device sends the knowledge data of the modules in the second model, which has been trained using large-scale data, to the first communication device. The knowledge data can be used to train the first model, which has a smaller number of parameters and is deployed in the first communication device, thereby improving the performance and interpretability of the first model.

[0037] A third aspect of this application provides a model training apparatus, which can be a first communication device, including: a transceiver unit and a processing unit;

[0038] The transceiver unit is used to receive model structure information of the second model from the second communication device. The second model is deployed in the second communication device. The number of model parameters of the first model is less than or equal to the number of model parameters of the second model. The second model includes multiple second modules, and the model structure information is used to indicate the dimension of each second module.

[0039] The processing unit is used to identify the target module based on the model structure information;

[0040] The transceiver unit is also configured to send a request message to the second communication device; wherein the request message is used to request knowledge data of at least one target module of the same dimension as at least one first module, and the target module is included in multiple second modules;

[0041] The transceiver unit is also used to receive knowledge data from at least one target module.

[0042] The processing unit is also used to train the first model using the knowledge data of the target module.

[0043] In one possible implementation, each second module has an input dimension and an output dimension, the input dimension of the target module is the same as the input dimension of the corresponding first module, and the output dimension of the target module is the same as the output dimension of the corresponding first module.

[0044] In one possible implementation, the knowledge data includes at least one of the following: input data and output data of the target module; attention input data and an attention map of the target module; attention input data and attention logits of the target module; input data of the target module together with the mean and variance of the batch normalization layer output; and training parameters of the target module.

[0045] In one possible implementation, the processing unit is further configured to train a first module in the first model corresponding to the target module using knowledge data; wherein the knowledge data includes input data and a first output result of the input data in the target module;

[0046] The processing unit is also used to adjust the parameters of the first module corresponding to the target module in the first model using a loss function determined based on knowledge data.

[0047] In one possible implementation, the processing unit is further configured to obtain the second output result of the first module corresponding to the target module in the first model from the input data;

[0048] The processing unit is also used to calculate the distance between the second output result and the first output result;

[0049] The processing unit is also used to calculate the value of the loss function based on distance;

[0050] The processing unit is also used to adjust the parameters of the first module corresponding to the target module in the first model using the value of the loss function.

[0051] In one possible implementation, the first model and the second model are used to measure the channel state information of the first communication device.

[0052] A fourth aspect of this application provides a communication device, which can be a second communication device that communicates with a first communication device. The communication device includes: a transceiver unit and a processing unit.

[0053] The transceiver unit is used to send model structure information of the second model to the first communication device; wherein, the first communication device is equipped with a first model, the first model includes multiple first modules, the number of model parameters of the first model is less than or equal to the number of model parameters of the second model, and the model structure information is used to indicate the dimension of each second module;

[0054] The transceiver unit is also used to receive request information from the first communication device; wherein the request information is determined by the first communication device based on the model structure information, and the request information is used to request knowledge data of at least one target module with the same dimension as at least one first module, and the target module is included in multiple second modules;

[0055] The transceiver unit is also used to send knowledge data to the first communication device, which is used to train the first model.

[0056] In one possible implementation, each second module has an input dimension and an output dimension, the input dimension of the target module is the same as the input dimension of the corresponding first module, and the output dimension of the target module is the same as the output dimension of the corresponding first module.

[0057] In one possible implementation, the knowledge data includes at least one of the following:

[0058] Input data and output data of the target module; attention input data and an attention map of the target module; attention input data and attention logits of the target module; input data of the target module together with the mean and variance of the batch normalization layer output; and training parameters of the target module.

[0059] In one possible implementation, knowledge data is used to train a first module in the first model corresponding to the target module; wherein, the knowledge data includes input data and the first output result of the input data in the target module; the knowledge data is also used to determine the value of a loss function, which is used to adjust the parameters of the first module in the first model corresponding to the target module.

[0060] In one possible implementation, the value of the loss function is determined based on the distance between the second output and the first output, where the second output is obtained by inputting the input data into the first module corresponding to the target module in the first model.

[0061] In one possible implementation, the first model and the second model are used to measure the channel state information of the first communication device.

[0062] A fifth aspect of this application provides a communication device including a processor. The processor is configured to invoke and execute computer programs or instructions, causing the processor to implement the method described in the first aspect or any implementation of the first aspect.

[0063] Optionally, the communication device also includes a transceiver; the processor is also used to control the transceiver to send and receive signals.

[0064] Optionally, the communication device includes a memory in which computer programs or instructions are stored.

[0065] The communication device mentioned in the fifth aspect above can be a device or a chip (system) in a device.

[0066] A sixth aspect of this application provides a communication device including a processor. The processor is configured to invoke and execute computer programs or instructions, causing the processor to implement the method described in the second aspect or any implementation of the second aspect.

[0067] Optionally, the communication device also includes a transceiver; the processor is also used to control the transceiver to send and receive signals.

[0068] Optionally, the communication device includes a memory in which computer programs or instructions are stored.

[0069] The communication device mentioned in the sixth aspect can be a device or a chip (system) in a device.

[0070] The seventh aspect of this application provides a communication device, which may be a first communication device or a module or unit (e.g., a chip, a chip system, or a circuit) in the first communication device that corresponds to the execution of the methods / operations / steps / actions described in the first aspect.

[0071] The eighth aspect of this application provides a communication device, which may be a second communication device or a module or unit (e.g., a chip, a chip system, or a circuit) in the second communication device that corresponds to the execution of the methods / operations / steps / actions described in the second aspect.

[0072] The ninth aspect of this application provides a computer-readable storage medium including a computer program or instructions that, when executed on a computer, cause the computer to perform the method described in the first aspect or any implementation of the first aspect.

[0073] The tenth aspect of this application provides a computer-readable storage medium including a computer program or instructions that, when executed on a computer, cause the computer to perform the method described in the second aspect or any implementation of the second aspect.

[0074] The eleventh aspect of this application provides a computer program product including a computer program or instructions that, when run on a computer, cause the computer to perform the method described in the first aspect or any implementation of the first aspect.

[0075] The twelfth aspect of this application provides a computer program product including a computer program or instructions that, when run on a computer, cause the computer to perform the method described in the second aspect or any implementation of the second aspect.

[0076] The thirteenth aspect of this application provides a chip device including a processor for calling a computer program or instructions in memory to cause the processor to execute the first aspect or any implementation thereof.

[0077] Optionally, the memory may be located inside or outside the chip device.

[0078] The fourteenth aspect of this application provides a chip device including a processor for calling a computer program or instructions stored in a memory, so that the processor executes the second aspect or any implementation thereof described above.

[0079] Optionally, the memory may be located inside or outside the chip device.

[0080] The fifteenth aspect of this application provides a communication system, which includes a first communication device and a second communication device. The first communication device is used to execute the first aspect or any one of the implementations of the first aspect, and the second communication device is used to execute the second aspect or any one of the implementations of the second aspect.

[0081] The technical effects of the third aspect or any possible implementation of the third aspect, the fifth aspect, the seventh aspect, the ninth aspect, the eleventh aspect, the thirteenth aspect or the fifteenth aspect can be found in the first aspect or the technical effects of different possible implementations of the first aspect, and will not be repeated here.

[0082] The technical effects of the fourth aspect or any possible implementation of the fourth aspect, the sixth aspect, the eighth aspect, the tenth aspect, the twelfth aspect, the fourteenth aspect, or the fifteenth aspect can be found in the technical effects of the second aspect or different possible implementations of the second aspect, and will not be repeated here.

Attached Figure Description

[0083] Figures 1A to 1C are schematic diagrams of the AI processing involved in this application;

[0084] Figure 2A is a schematic diagram of an example communication scenario provided in an embodiment of this application;

[0085] Figure 2B is another example schematic diagram of a communication scenario provided in an embodiment of this application;

[0086] Figure 3 is a schematic diagram of an embodiment of the model training method provided in this application;

[0087] Figure 4 is a schematic diagram of another embodiment of the model training method provided in this application;

[0088] Figure 5 is a schematic diagram of another embodiment of the model training method provided in this application;

[0089] Figure 6 is a structural schematic diagram of a communication device provided in an embodiment of this application;

[0090] Figure 7 is another structural schematic diagram of the communication device provided in an embodiment of this application;

[0091] Figure 8 is another structural schematic diagram of the communication device provided in an embodiment of this application;

[0092] Figure 9 is another structural schematic diagram of the communication device provided in an embodiment of this application;

[0093] Figure 10 is another structural schematic diagram of the communication device provided in an embodiment of this application;

[0094] Figure 11 is another structural schematic diagram of the model training device provided in the embodiment of this application.

Detailed Implementation

[0095] The embodiments of this application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. As those skilled in the art will recognize, with the development of technology and the emergence of new scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.

[0096] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in a sequence other than that illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0097] This application provides a model training method for training a model on a terminal device / network device, utilizing a large-scale model on the core network. This application also provides corresponding apparatus, computer-readable storage media, and computer program products, etc., which are described in detail below.

[0098] The technical solutions of this application embodiment can be applied to various communication systems, such as: satellite communication, 5th generation (5G) system or new radio (NR), long term evolution (LTE) system, LTE frequency division duplex (FDD) system, LTE time division duplex (TDD) system, universal mobile telecommunication system (UMTS), future communication systems after 5G network, vehicle to everything (V2X) communication system, machine to machine (M2M) communication, machine type communication (MTC), Internet of Things (IoT) communication system, or other communication systems.

[0099] The communication system described in this application can be a communication system based on orthogonal frequency division multiplexing (OFDM) and / or time division multiplexing (TDM), or a communication system or communication and sensing system based on frequency-modulated continuous wave (FMCW) signals.

[0100] The terminal equipment and network equipment of this application are described below.

[0101] Terminal equipment can be a device capable of receiving information from the core network, or a wireless terminal device that processes scheduling and indication information from a network device. Wireless terminal equipment can be a device that provides voice and / or data connectivity to a user, a handheld device with wireless connectivity, another processing device connected to a wireless modem, or a device with sensing capabilities.

[0102] Terminal equipment, also known as user equipment (UE), mobile station (MS), mobile terminal (MT), etc., is a device that includes wireless communication functions, such as handheld devices or vehicle-mounted devices with wireless connectivity.

[0103] Terminal devices can communicate with one or more core networks or the Internet via a radio access network (RAN). Terminal devices can be mobile terminal devices, such as mobile phones (or "cellular" phones), computers, and data cards. For example, they can be portable, pocket-sized, handheld, computer-embedded, or vehicle-mounted mobile devices that exchange voice and / or data with the RAN. Examples include personal communication service (PCS) phones, cordless phones, session initiation protocol (SIP) phones, wireless local loop (WLL) stations, personal digital assistants (PDAs), tablets, and computers with wireless transceiver capabilities. Wireless terminal equipment can also be referred to as a system, subscriber unit, subscriber station, mobile station (MS), remote station, access point (AP), remote terminal, access terminal, user terminal, user agent, subscriber station (SS), customer premises equipment (CPE), terminal, user equipment, mobile terminal, etc. In satellite communication, terminal equipment can be a satellite communication terminal, such as a very small aperture terminal (VSAT), as well as portable stations, fixed stations, vehicle-mounted or airborne satellite communication terminals, etc. It should be understood that in these scenarios, satellite communication terminals communicate with satellites and can act as micro base stations or satellite data stations to further provide data interfaces to user equipment accessing the satellite communication terminal.

[0104] By way of example and not limitation, in this embodiment, the terminal device can also be a wearable device. Wearable devices, also known as wearable smart devices or smart wearable devices, are a general term for devices that utilize wearable technology to intelligently design and develop everyday wearables, such as glasses, gloves, watches, clothing, and shoes. Wearable devices are portable devices that are worn directly on the body or integrated into the user's clothing or accessories. Wearable devices are not merely hardware devices, but also achieve powerful functions through software support, data interaction, and cloud interaction. Broadly speaking, wearable smart devices include those that are feature-rich, large in size, and can achieve complete or partial functions without relying on a smartphone, such as smartwatches or smart glasses, as well as those that focus on a specific type of application function and require the use of other devices such as smartphones, such as various smart bracelets, smart helmets, and smart jewelry for vital sign monitoring.

[0105] Furthermore, terminal devices can also be terminal devices for communication systems evolved from fifth-generation (5G) communication systems (such as 5G Advanced or future communication systems). For example, the form and function of communication terminals can be further expanded, including but not limited to vehicles, cellular network terminals (integrating satellite terminal functions), drones, Internet of Things (IoT) devices, as well as virtual reality (VR) devices, augmented reality (AR) devices, wireless terminals in industrial control, wireless terminals in vehicle-to-everything (V2X) communication, wireless terminals in self-driving vehicles, wireless terminals in remote medical surgery, wireless terminals in smart grids, wireless terminals in transportation safety, wireless terminals in smart cities, or wireless terminals in smart homes. For example, wireless terminals in V2X communication can be in-vehicle equipment, vehicle-mounted equipment, in-vehicle modules, vehicles, etc. Wireless terminals in industrial control can be cameras, robots, etc. Wireless terminals in smart homes can be televisions, air conditioners, robot vacuums, speakers, set-top boxes, etc.

[0106] In this embodiment, the apparatus for implementing the functions of the terminal device can be the terminal device itself, or a component of the terminal device, such as a communication module, a circuit or chip responsible for communication functions (e.g., a modem chip, also known as a baseband chip, or a system-on-a-chip (SoC) chip or system-in-package (SIP) chip containing a modem core), a chip system, or a processor, etc., or a logic module or software capable of implementing all or part of the functions of the terminal device. In this embodiment, the terminal device is used as an example to illustrate the apparatus for implementing the functions of the terminal device, and this does not constitute a limitation on the solution of this embodiment.

[0107] The network device in this application embodiment is an apparatus deployed in a radio access network to provide wireless communication functions for terminal devices. It can refer to a radio access network (RAN) node (or device) or base station that connects the terminal device to the wireless network. Currently, some common examples of access network nodes (or devices) include: Node B (NB), evolved Node B (eNB or eNodeB), next generation Node B (gNB) in 5G NR systems, nodes in future communication systems (e.g., xNodeB), transmission reception point (TRP), transmitting point (TP), transmission measurement function (TMF), radio network controller (RNC), base station controller (BSC), base transceiver station (BTS), access point (AP), etc. Furthermore, in network architectures such as cloud radio access network (CloudRAN) or open radio access network (ORAN), the access network device can be a device including a central unit (CU) and / or a distributed unit (DU). In a RAN system that includes CUs and DUs, the protocol layers of the gNB are split: some protocol layer functions are centrally controlled by the CU, while the remaining functions are distributed in the DU, which is centrally controlled by the CU. The split between CU and DU can be based on the protocol stack. For example, one possible split is to deploy the radio resource control (RRC), Service Data Adaptation Protocol (SDAP), and Packet Data Convergence Protocol (PDCP) layers in the CU, and the remaining Radio Link Control (RLC), Medium Access Control (MAC), and Physical (PHY) layers in the DU. CUs and DUs are connected via the F1 interface. A CU, representing its associated gNB, connects to the core network via the NG interface and to other gNBs (or other CUs) via the Xn interface. In the actual deployment of RAN equipment, in addition to the logical gNBs composed of CUs and DUs, the RAN equipment also includes radio units (RUs) (not shown in the figure). An RU is a hardware unit that includes some PHY layer functionality and / or antenna equipment. Optionally, the RU can be configured independently of the antenna equipment (e.g., an antenna line device (ALD)) or integrated with it. For example, in a 5G NR system, the aforementioned RU can be an active antenna unit (AAU), which is a processing unit integrating a remote radio unit (RRU) (or remote radio head (RRH)) and antenna equipment. In a satellite communication system, the network equipment can be a satellite or access network equipment mounted on a satellite.

[0108] It should be noted that in practical applications, there may be multiple ways to deploy access network devices, and this application does not limit them.

[0109] In some examples, the CU can be split into a control plane CU node (central unit-control plane (CU-CP)) and a user plane CU node (central unit-user plane (CU-UP)). The CU-CP is a logical node carrying the RRC layer and the PDCP-C (control plane part of PDCP) layer, used to implement the CU's control plane functions. The CU-CP can interact with network elements in the core network used to implement control plane functions. These network elements can be access and mobility management network elements, such as the access and mobility management function (AMF) in a 5G system. The AMF network element is responsible for mobility management in the mobile network, such as terminal device location updates, terminal device registration with the network, and terminal device handover. The CU-UP is a logical node carrying the SDAP layer and the PDCP-U (user plane part of PDCP) layer, used to implement the CU's user plane functions. The CU-UP can interact with network elements in the core network used to implement user plane functions. These network elements, such as the user plane function (UPF) in a 5G system, are responsible for forwarding and receiving the data of terminal devices. The above configuration of CU and DU is merely an example; the functions of CU and DU can be configured as needed. For example, the CU or DU can be configured to have more protocol layer functions, or to have only some protocol layer processing functions. For instance, some RLC layer functions and the protocol layer functions above the RLC layer can be placed in the CU, while the remaining RLC layer functions and the protocol layer functions below the RLC layer can be placed in the DU. Furthermore, the functions of the CU or DU can be divided according to service type or other system requirements, such as latency: functions that must meet low latency requirements are placed in the DU, and functions that do not need to meet such requirements are placed in the CU.

[0110] In some examples, a DU is a logical node that carries the RLC layer, the MAC layer, the higher physical (Higher PHY) layer, and other functions. In some examples, a DU can control at least one RU. The DU connects to the RU through interfaces, which can be fronthaul interfaces. In some examples, the Higher PHY layer includes PHY-layer processing such as forward error correction (FEC) encoding and decoding, scrambling, modulation, and demodulation.

[0111] In some examples, the RU is a logical node that carries both lower physical layer (Low-PHY) and radio frequency (RF) processing. In some examples, the RU can be a 3GPP TRP or RRH or other similar entity. In some examples, the Low-PHY includes PHY processing functions such as Fast Fourier Transform (FFT), Inverse Fast Fourier Transform (IFFT), digital beamforming, and filtering. The RU communicates with one or more UEs via a radio link.

[0112] The DU and RU can be co-located or not. The DU and RU exchange control plane and user plane information via a lower-layer split-control, user, and synchronization (LLS-CUS) interface through a fronthaul link. LLS-CUS may include LLS-C and LLS-U interfaces that provide the control plane (C-Plane) and user plane (U-Plane), respectively. In some examples, the control plane (C-Plane) refers to real-time control between the DU and RU. The DU and RU exchange management information via an LLS-M interface on the fronthaul link; the management plane (M-Plane) refers to non-real-time management operations between the DU and RU.

[0113] DU and RU can cooperate to implement the functions of the PHY layer. A DU can be connected to one or more RUs. The functions of DU and RU can be configured in various ways depending on the design. For example, a DU can be configured to implement baseband functions, and an RU can be configured to implement mid-RF functions. Another example is that a DU can be configured to implement higher-level functions in the PHY layer, and an RU can be configured to implement lower-level functions in the PHY layer, or to implement both lower-level and RF functions. Higher-level functions in the physical layer can include a portion of the physical layer's functions that are closer to the MAC layer, while lower-level functions in the physical layer can include another portion of the physical layer's functions that are closer to the mid-RF side.

[0114] In different systems, CU (or CU-CP and CU-UP), DU, or RU may have different names, but those skilled in the art will understand their meaning. For example, in an ORAN system, CU can also be called O-CU (open CU), DU can also be called O-DU, CU-CP can also be called O-CU-CP, CU-UP can also be called O-CU-UP, and RU can also be called O-RU. For ease of description, this application uses CU, CU-CP, CU-UP, DU, and RU as examples.

[0115] Optionally, for network elements in the ORAN system, each network element can implement the protocol layer functions shown in Table 1 below.

[0116] Table 1

[0117] It should be noted that in the ORAN system, the access network equipment in this application can be one or more network elements listed in Table 1 above.

[0118] The architecture of the CU and DU of the access network equipment is described below. The access network equipment includes at least one CU and at least one DU. Optionally, the access network equipment may also include at least one RU.

[0119] The following example uses an access network device consisting of one CU and one DU. The CU has some core network functions and can include CU-CP and CU-UP. The CU and DU can be configured according to the protocol layer functions of the wireless network they implement. For example, the CU may be configured to implement the Packet Data Convergence Protocol (PDCP) layer and above (e.g., RRC and / or SDAP layers), and the DU may be configured to implement protocol layers below the PDCP layer (e.g., RLC, MAC, and / or physical (PHY) layers). Alternatively, the CU may be configured to implement protocol layers above the PDCP layer (e.g., RRC and / or SDAP layers), and the DU may be configured to implement the PDCP layer and the protocol layers below it (e.g., RLC, MAC, and / or PHY layers).

[0120] When a CU includes CU-CP and CU-UP, CU-CP is used to implement the control plane functions of the CU, and CU-UP is used to implement the user plane functions of the CU. For example, when a CU is configured to implement the functions of the PDCP layer, RRC layer, and SDAP layer, CU-CP is used to implement the RRC layer functions and the control plane functions of the PDCP layer, and CU-UP is used to implement the SDAP layer functions and the user plane functions of the PDCP layer.

[0121] The CU-CP can interact with network elements in the core network used to implement control plane functions. These network elements can be access and mobility management network elements, such as the access and mobility management function (AMF) in a 5G system. The AMF is responsible for mobility management in the mobile network, such as terminal device location updates, terminal device registration with the network, and terminal device handover.

[0122] The CU-UP can interact with network elements in the core network used to implement user plane functions. These network elements, such as the user plane function (UPF) in a 5G system, are responsible for forwarding and receiving the data of terminal devices.

[0123] Optionally, the ORAN architecture also includes a RAN Intelligent Controller (RIC) module.

[0124] In this embodiment, the apparatus for implementing the functions of the network device can be a network device itself, or a component of an access network device, such as a communication module, processor, chip, chip system, or circuit that can be applied in the access network device. It can also be a logic module or software that can implement all or part of the functions of the access network device. This apparatus can be installed in the network device or used in conjunction with the network device. In this embodiment, only a network device is used as an example to illustrate the apparatus for implementing the functions of the access network device, and this does not constitute a limitation on the solution of this embodiment.

[0125] It should be noted that network devices and / or terminal devices can be deployed on land, including indoors or outdoors, handheld or vehicle-mounted; they can also be deployed on water; and they can also be deployed in the air on airplanes, balloons, and satellites. This application does not limit the scenario in which the network devices and terminal devices are located. Furthermore, terminal devices and network devices can be hardware devices; they can also be software functions running on dedicated or general-purpose hardware, such as virtualization functions instantiated on a platform (e.g., a cloud platform); or they can be entities that include dedicated or general-purpose hardware devices and software functions. This application does not limit the specific form of terminal devices and network devices.

[0126] For ease of understanding, the technical terms involved in the embodiments of this application are briefly introduced below:

[0127] (1) Configuration and Pre-configuration: In this application, both configuration and pre-configuration are used. Configuration refers to the access network device sending configuration information or parameter values of some parameters to the terminal device through messages or signaling, so that the terminal device can determine the communication parameters or resources during transmission based on these values or information. Pre-configuration is similar to configuration and can be parameter information or parameter values that the access network device and the terminal device have negotiated in advance, or parameter information or parameter values that the access network device or the terminal device uses as specified by the standard protocol, or parameter information or parameter values that are pre-stored in the access network device or the terminal device. This application does not limit this.

[0128] Furthermore, these values and parameters can be changed or updated.

[0129] (2) The terms "system" and "network" in the embodiments of this application can be used interchangeably. "At least one" means one or more, and "more than one" means two or more. "And / or" describes the relationship between related objects, indicating that there can be three relationships. For example, A and / or B can mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B can be singular or plural. The character " / " generally indicates that the related objects before and after are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, "at least one of A, B and C" includes A, B, C, AB, AC, BC or ABC. And, unless otherwise specified, the ordinal numbers such as "first" and "second" mentioned in the embodiments of this application are used to distinguish multiple objects and are not used to limit the order, sequence, priority or importance of multiple objects.

[0130] (3) In the embodiments of this application, "send" and "receive" indicate the direction of signal transmission. For example, "send information to XX" can be understood as the destination of the information being XX, which may include sending directly through the air interface or sending indirectly through the air interface by other units or modules. "Receive information from YY" can be understood as the source of the information being YY, which may include receiving directly from YY through the air interface or receiving indirectly from YY through the air interface by other units or modules. "Send" can also be understood as the "output" of the chip interface, and "receive" can also be understood as the "input" of the chip interface.

[0131] In other words, sending and receiving can occur between devices, such as between access network devices and terminal devices, or within a device, such as between components, modules, chips, software modules, or hardware modules within the device via buses, wiring, or interfaces.

[0132] It is understandable that information may undergo necessary processing, such as encoding and modulation, between the source and destination, but the destination can understand the valid information from the source. Similar statements in this application can be interpreted in a similar way and will not be elaborated further.

[0133] (4) In the embodiments of this application, "indication" may include direct indication and indirect indication, as well as explicit indication and implicit indication. The information indicated by a certain piece of information (referred to below as indication information) is called the information to be indicated. In specific implementations, there are many ways to indicate the information to be indicated. For example, and without limitation, the information to be indicated can be indicated directly, such as by the information itself or by its index. It can also be indicated indirectly by indicating other information that is associated with it; or only a part of the information to be indicated can be indicated, while the other parts are known or agreed upon in advance. For example, the indication can be implemented by using a pre-agreed (e.g., protocol-predefined or pre-configured) arrangement order of various pieces of information, thereby reducing the indication overhead to a certain extent. This application does not limit the specific indication method. It is understood that for the sender of the indication information, the indication information can be used to indicate the information to be indicated; for the receiver of the indication information, the indication information can be used to determine the information to be indicated.

[0134] (5) Channel state information (CSI): This represents the attenuation factor of the signal on each transmission path, i.e., the value of each element in the channel gain matrix H. It covers various channel effects such as signal scattering, environmental fading (e.g., multipath fading or shadow fading), and distance-dependent power decay. These effects jointly influence the transmission characteristics of the signal in the channel.

[0135] Depending on where it is used, CSI can be divided into channel state information at the transmitter (CSIT) and channel state information at the receiver (CSIR). Generally, CSIT is more important because it allows the transmitter to compensate for channel attenuation in advance, thereby achieving high-speed and reliable transmission.

[0136] The base station can recover the downlink channel based on the CSI and use the recovered downlink channel to determine the precoding matrix and the precoding for downlink data transmission. By precoding the downlink data of multiple UEs, spatial multiplexing can be achieved, thereby reducing interference between UEs, improving the signal-to-interference-plus-noise ratio (SINR) at the receiver, and thus increasing the system throughput.
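
As an illustration of how a recovered downlink channel can be turned into a precoding matrix, the following minimal sketch uses zero-forcing precoding. Zero-forcing is an assumption chosen for illustration; this application does not specify the precoding algorithm. The function name `zero_forcing_precoder`, the antenna and UE counts, and the unit-power normalization are likewise hypothetical.

```python
import numpy as np

def zero_forcing_precoder(H):
    """Return a zero-forcing precoder W for a downlink channel H (assumed method).

    H has shape (num_ues, num_tx_antennas); row k is the channel of UE k.
    W has shape (num_tx_antennas, num_ues); column k carries the data of UE k.
    """
    # Right pseudo-inverse: H @ W is the identity when H has full row rank.
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T)
    # Scale so the total transmit power (Frobenius norm) is 1 (illustrative).
    return W / np.linalg.norm(W)

# Hypothetical scenario: 2 single-antenna UEs, 4 transmit antennas.
rng = np.random.default_rng(0)
num_ues, num_tx = 2, 4
H = rng.standard_normal((num_ues, num_tx)) + 1j * rng.standard_normal((num_ues, num_tx))

W = zero_forcing_precoder(H)
effective = H @ W  # effective channel seen by the UEs after precoding
```

In this sketch the off-diagonal entries of `effective` are numerically zero, meaning that the signal intended for one UE does not leak into the other UE, which is the interference reduction described above.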

[0137] (6) Black box model: This model focuses on the external behavior and function of a system without considering its internal implementation details. In this model, the system is regarded as a closed box, and testers or researchers can only understand the system's behavior through inputs and outputs.

[0138] (7) White-box model: This model focuses on the internal structure and implementation details of the system, that is, understanding the system's internal logic, algorithms, data structures, etc. In the white-box model, testers or researchers usually need to have the system's source code or detailed design documents in order to design test cases based on this internal information.

[0139] (8) Knowledge distillation: This is a model compression technique that transfers knowledge from a large, complex model (the teacher model) to a smaller model (the student model). The aim is for the student model to retain as much of the teacher model's performance as possible at a lower computational complexity, making it suitable for running on resource-constrained devices.
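
One common way to express the transfer of knowledge from teacher to student is a loss that compares their temperature-softened output distributions. The sketch below is a generic illustration under that assumption, not the training procedure of this application; the function names, the temperature value, and the example logits are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL divergence between teacher and student softened outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl))

# Hypothetical logits for one sample and three classes.
teacher = np.array([[4.0, 1.0, 0.5]])
matched = distillation_loss(teacher, teacher)                          # student agrees
mismatched = distillation_loss(np.array([[0.5, 1.0, 4.0]]), teacher)   # student disagrees
```

The loss is zero when the student reproduces the teacher's output distribution and grows as the two diverge, so minimizing it pulls the student toward the teacher.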

[0140] (9) Artificial Intelligence (AI):

[0141] AI can endow machines with human-like intelligence, for example, allowing them to use computer hardware and software to simulate certain intelligent human behaviors. To achieve artificial intelligence, machine learning methods can be employed. In machine learning, a machine uses training data to learn (or train) a neural network model. This neural network model can also be called an AI model, a large AI model, a large language model (LLM), or simply a model. The model represents the mapping from input to output. The learned model can be used for inference (or prediction), that is, to predict the output corresponding to a given input. This output can also be called the inference result (or prediction result).

[0142] Machine learning can include supervised learning, unsupervised learning, and reinforcement learning. Unsupervised learning can also be called learning without supervision.

[0143] Supervised learning, based on collected sample values and labels, uses machine learning algorithms to learn the mapping relationship between sample values and labels, and then expresses this learned mapping relationship using an AI model. The process of training the machine learning model is the process of learning this mapping relationship. During training, sample values are input into the model to obtain the model's predicted values, and the model parameters are optimized by calculating the error between the model's predicted values and the sample labels (ideal values). After the mapping relationship is learned, it can be used to predict new sample labels. The mapping relationship learned in supervised learning can include linear or non-linear mappings. Based on the type of label, the learning task can be divided into classification tasks and regression tasks.

[0144] Unsupervised learning relies on collected sample values to discover inherent patterns within the samples themselves. One type of unsupervised learning algorithm uses the samples themselves as supervisory signals, meaning the model learns the mapping relationship from sample to sample; this is called self-supervised learning. During training, model parameters are optimized by calculating the error between the model's predictions and the samples themselves. Self-supervised learning can be used for signal compression and decompression recovery applications; common algorithms include autoencoders and generative adversarial networks.

[0145] Reinforcement learning is a type of algorithm that learns problem-solving strategies through interaction with its environment. Unlike supervised and unsupervised learning, reinforcement learning problems do not have explicit "correct" action labels. The algorithm needs to interact with the environment, obtain reward signals from the environment, and then adjust its decision actions to obtain a larger reward signal value. For example, in downlink power control, the reinforcement learning model adjusts the downlink transmission power of each user based on the total system throughput fed back from the wireless network, aiming to achieve a higher system throughput. The goal of reinforcement learning is to find the decision actions that maximize the cumulative reward over a relatively long period. Training in reinforcement learning is achieved through iterative interaction with the environment.
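
The interaction loop described above can be sketched with a deliberately simplified, stateless example (a multi-armed bandit with epsilon-greedy exploration). Everything here is an assumption made for illustration: the candidate power levels, the toy reward function standing in for throughput feedback, and the function names do not come from this application.

```python
import math
import random

POWER_LEVELS = [0.5, 1.0, 2.0, 4.0]  # hypothetical candidate transmit powers

def throughput_reward(power):
    """Toy environment (assumed): rate gain minus an interference penalty."""
    return math.log2(1.0 + power) - 0.5 * power

def train_power_policy(steps=200, epsilon=0.1, seed=0):
    """Learn which power level yields the largest reward by trial and feedback."""
    rng = random.Random(seed)
    q = [0.0] * len(POWER_LEVELS)       # estimated reward per action
    counts = [0] * len(POWER_LEVELS)
    for t in range(steps):
        if t < len(POWER_LEVELS):
            a = t                                   # try every action once first
        elif rng.random() < epsilon:
            a = rng.randrange(len(POWER_LEVELS))    # explore
        else:
            a = max(range(len(POWER_LEVELS)), key=lambda i: q[i])  # exploit
        r = throughput_reward(POWER_LEVELS[a])      # reward from the environment
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]              # running average of rewards
    return POWER_LEVELS[max(range(len(POWER_LEVELS)), key=lambda i: q[i])]

best_power = train_power_policy()
```

No "correct" power level is ever given to the learner; it only observes rewards and shifts its choice toward the action with the highest estimated reward. A full reinforcement learning setup would additionally track environment state and optimize cumulative reward over time.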

[0146] Neural networks (NNs) are a specific type of model in machine learning. According to the universal approximation theorem, neural networks can theoretically approximate any continuous function, which enables them to learn arbitrary mappings. Traditional communication systems rely on extensive expert knowledge to design communication modules, while deep learning communication systems based on neural networks can automatically discover hidden pattern structures from large datasets, establish mapping relationships between data, and achieve performance superior to traditional modeling methods.

[0147] The idea behind neural networks comes from the neuronal structure of the brain. For example, each neuron performs a weighted summation of its input values and outputs the result through an activation function.

[0148] Figure 1A shows a schematic diagram of a neuron structure. Assume the input to the neuron is $x = [x_0, x_1, \ldots, x_n]$ and the weights corresponding to the inputs are $w = [w_0, w_1, \ldots, w_n]$, where $n$ is a positive integer. $w_i$ and $x_i$ can be of any possible type, such as a decimal, an integer (e.g., 0, a positive integer, or a negative integer), or a complex number. $w_i$ is the weight of $x_i$ and is used to weight $x_i$. The bias for the weighted summation of the input values is, for example, $b$. Activation functions can take many forms. Assuming a neuron's activation function is $y = f(z) = \max(0, z)$, then the neuron's output is $y = \max\left(0, \sum_{i=0}^{n} w_i x_i + b\right)$. For example, if the activation function of a neuron is $y = f(z) = z$, then the output of that neuron is $y = \sum_{i=0}^{n} w_i x_i + b$. Here, $b$ can be of any possible type, such as a decimal, an integer (e.g., 0, a positive integer, or a negative integer), or a complex number. The activation functions of different neurons in a neural network can be the same or different.
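
The weighted summation and the two activation functions mentioned above can be sketched as follows. The function name `neuron_output` and the numeric values are hypothetical and chosen only to make the computation concrete.

```python
import numpy as np

def neuron_output(x, w, b, activation="relu"):
    """Compute a single neuron's output: activation(sum_i w_i * x_i + b)."""
    z = np.dot(w, x) + b            # weighted summation plus bias
    if activation == "relu":
        return np.maximum(0, z)     # y = f(z) = max(0, z)
    return z                        # y = f(z) = z (identity activation)

# Hypothetical inputs, weights, and bias.
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.4, 0.3, -0.2])
b = 0.1

y_relu = neuron_output(x, w, b, "relu")          # z is negative, so ReLU gives 0
y_linear = neuron_output(x, w, b, "identity")    # returns z itself
```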

[0149] Furthermore, neural networks generally consist of multiple layers, each of which may include one or more neurons. Increasing the depth and / or width of a neural network can improve its expressive power, providing more powerful information extraction and abstract modeling capabilities for complex systems. The depth of a neural network can refer to the number of layers it includes, and the number of neurons in each layer can be called the width of that layer. In one implementation, a neural network includes an input layer and an output layer. The input layer processes the received input information through neurons and passes the processing result to the output layer, which then obtains the output of the neural network. In another implementation, a neural network includes an input layer, hidden layers, and an output layer. The input layer processes the received input information through neurons and passes the processing result to the hidden layer. The hidden layer calculates the received processing result and passes the calculation result to the output layer or the next adjacent hidden layer, ultimately obtaining the output of the neural network. A neural network may include one hidden layer or multiple sequentially connected hidden layers, without limitation.

[0150] A neural network can be, for example, a deep neural network (DNN). Depending on how the network is constructed, DNNs can include feedforward neural networks (FNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs).

[0151] Figure 1B is a schematic diagram of an FNN. A characteristic of FNNs is that every neuron in one layer is connected to every neuron in the adjacent layer (full connectivity). As a result, FNNs typically require a large amount of storage space and incur high computational complexity.
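
To make the storage cost of full connectivity concrete, the following minimal sketch builds a small fully connected network, runs one forward pass, and counts its parameters. The layer widths, the ReLU hidden activation, and the helper names are assumptions for illustration only.

```python
import numpy as np

def init_fnn(layer_widths, seed=0):
    """Create one (weight matrix, bias vector) pair per pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((n_out, n_in)) * 0.1, np.zeros(n_out))
            for n_in, n_out in zip(layer_widths[:-1], layer_widths[1:])]

def fnn_forward(params, x):
    """Forward pass: ReLU on hidden layers, identity on the output layer."""
    h = x
    for i, (W, b) in enumerate(params):
        z = W @ h + b
        h = z if i == len(params) - 1 else np.maximum(0, z)
    return h

def count_parameters(params):
    """Total number of weights and biases that must be stored."""
    return sum(W.size + b.size for W, b in params)

# Hypothetical network: 64 inputs, two hidden layers of width 128, 10 outputs.
widths = [64, 128, 128, 10]
params = init_fnn(widths)
y = fnn_forward(params, np.ones(64))
n_params = count_parameters(params)
```

Because each weight matrix has one entry per pair of neurons in adjacent layers, the parameter count grows with the product of adjacent layer widths, which is why wider or deeper FNNs quickly become expensive to store and evaluate.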

[0152] CNNs are neural networks specifically designed to process data with a grid-like structure. For example, time-series data (e.g., discrete sampling along a time axis) and image data (e.g., two-dimensional discrete sampling) can both be considered grid-like data. CNNs do not use all the input information at once for computation; instead, they use a fixed-size window to extract a portion of the information for convolution operations, which significantly reduces the number of model parameters and the amount of computation. Furthermore, depending on the type of information extracted by the window (e.g., people and objects in an image represent different types of information), each window can use different convolution kernels, allowing CNNs to better extract features from the input data.
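
The fixed-size window can be illustrated with a one-dimensional example. This is a minimal sketch assuming a single kernel, no padding, and a stride of 1; the function name and the sample values are hypothetical.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide a fixed-size window over the signal and apply the same kernel
    at every position (cross-correlation form, no padding, stride 1)."""
    k = len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel)
                     for i in range(len(signal) - k + 1)])

# Hypothetical time-series samples and a 3-tap difference kernel.
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
kernel = np.array([1.0, 0.0, -1.0])
out = conv1d_valid(signal, kernel)
```

Here the layer has only 3 weights, shared across all 4 window positions, whereas a fully connected layer mapping the same 6 inputs to 4 outputs would need 24 weights.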

[0153] Recurrent neural networks (RNNs) are a type of DNN that uses feedback time-series information. The input to an RNN includes the current input value and its own output value from the previous time step. RNNs are well-suited to capturing temporally correlated sequence features, and are particularly applicable to applications such as speech recognition and channel coding / decoding.
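
A minimal scalar sketch of this feedback is given below. The tanh activation and the weight values are assumptions made for illustration. Each step combines the current input with the output of the previous time step, so the first input continues to influence later outputs even when later inputs are zero.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent step: current input plus the previous output (assumed tanh)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    """Apply rnn_step over a sequence and collect the output of every time step."""
    outputs = []
    h = h0
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        outputs.append(h)
    return outputs

# Only the first input is non-zero; the later outputs are driven by feedback alone.
outputs = run_rnn([1.0, 0.0, 0.0])
```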

[0154] In the model training process described above for machine learning, a loss function can be defined. The loss function describes the difference or discrepancy between the model's output value and the ideal target value. The loss function can be expressed in various forms, and there are no restrictions on its specific form. The model training process can be viewed as follows: by adjusting some or all of the model's parameters, the value of the loss function is made to be less than a threshold value or to meet the target requirement.
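
The training process described above can be sketched as follows. The single-parameter model, the mean squared error loss, the learning rate, and the threshold are assumptions chosen for illustration only, since the application does not restrict the form of the loss function.

```python
def mse(w, data):
    """Loss: mean squared difference between model output w*x and target y."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(data, w=0.0, lr=0.05, threshold=1e-6, max_steps=10_000):
    """Adjust w by gradient descent until the loss falls below `threshold`."""
    steps = 0
    while mse(w, data) >= threshold and steps < max_steps:
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
        steps += 1
    return w, mse(w, data), steps

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # ideal target values follow y = 3x
w, final_loss, steps = train(data)
```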

[0155] A model can also be called an AI model, a rule, or other names. An AI model can be considered a specific method for implementing AI functions. An AI model represents the mapping relationship or function between the model's input and output. AI functions can include one or more of the following: data collection, model training (or model learning), model information dissemination, model inference (or model reasoning, inference, or prediction, etc.), model monitoring or model validation, or inference result publication, etc. AI functions can also be called AI (related) operations or AI-related functions.

[0156] (10) A two-sided model, also called a bilateral model, collaborative model, or dual model, refers in this application to an AI model composed of two or more sub-AI models. The sub-AI models are not deployed on the same node but are distributed across two or more nodes, and the sub-AI models constituting the AI model need to be matched with each other. An auto-encoder (AE) in which the encoder and decoder are deployed on different nodes is a typical two-sided model. The encoder and decoder of the AE need to be matched with each other, that is, the decoder can understand the output of the encoder and can decode it into the desired output. The model structure of an embodiment of the two-sided model is shown in Figure 1C.

[0157] Typically, in a two-sided model, sub-model A and sub-model B are trained simultaneously so that they are matched. Sub-model A and sub-model B can be trained on the same node and then deployed on two separate nodes, or they can be trained on two nodes in a distributed manner.

[0158] In this application, unless otherwise specified, the same or similar parts between the various embodiments can be referred to each other. In the various embodiments of this application, and the various methods / designs / implementations within each embodiment, unless otherwise specified or logically conflicting, the terminology and / or descriptions between different embodiments and between the various methods / designs / implementations within each embodiment are consistent and can be mutually referenced. The technical features in different embodiments and the various methods / designs / implementations within each embodiment can be combined to form new embodiments, methods, or implementations based on their inherent logical relationships. The following descriptions of the embodiments of this application do not constitute a limitation on the scope of protection of this application.

[0159] The communication method provided in this application embodiment can be applied to the communication system shown in Figures 2A to 2B.

[0160] Please refer to Figure 2A, which is a schematic diagram of the architecture of the communication system 1000 used in the embodiments of this application.

[0161] As shown in Figure 2A, the communication system includes a wireless access network 100 and a core network 200. Optionally, the communication system 1000 may also include an Internet 300. The wireless access network 100 may include at least one access network device (also understood as a network device, as shown in Figure 2A 110a and 110b) and at least one terminal (also understood as the terminal device described above, as shown in Figure 2A 120a-120j). Furthermore, the access network device (or wireless access network device) may be a macro base station (as shown in Figure 2A 110a), a micro base station or an indoor station (as shown in Figure 2A 110b), a relay node or a donor node, etc. It is understood that all or part of the functions of the access network device in this application may also be implemented through software functions running on hardware, or through virtualization functions instantiated on a platform (e.g., a cloud platform). The embodiments of this application do not limit the specific technology or specific device form used in the wireless access network device.

[0162] For ease of description, the communication system illustrated in Figure 2A is described using the example of an access network device as a base station and terminal devices as terminals. It is understood that when the communication system includes an integrated access and backhaul (IAB) network, the base station can be an IAB node. It should be noted that in the embodiments of this application, the base station and the access network device can be interchanged.

[0163] In this application, the base station and the terminal can be fixed or mobile. The base station and the terminal can be deployed on land, including indoors or outdoors, handheld or vehicle-mounted, on water, or in the air on aircraft, balloons, and satellites. The embodiments of this application do not limit the application scenarios of the base station and the terminal.

[0164] The roles of base stations and terminals can be relative. For example, the helicopter or drone 120i in Figure 2A can be configured as a mobile base station. For a terminal 120j that accesses the wireless access network 100 through 120i, 120i is a base station. However, for base station 110a, 120i is a terminal, meaning that 110a and 120i communicate via a wireless air interface protocol. Of course, 110a and 120i can also communicate via a base station-to-base station interface protocol. In this case, relative to 110a, 120i is also a base station. Therefore, both base stations and terminals can be collectively referred to as communication devices. 110a and 110b in Figure 2A can be called communication devices with base station functions, and 120a-120j in Figure 2A can be called communication devices with terminal functions.

[0165] Communication between base stations and terminals, between base stations, and between terminals can be conducted using licensed spectrum, unlicensed spectrum, or both simultaneously. Communication can be achieved using spectrum below 6 GHz, spectrum above 6 GHz, or both simultaneously. The embodiments of this application do not limit the spectrum resources used for wireless communication.

[0166] Taking the communication system shown in Figure 2A as an example, in addition to performing communication-related services, different devices (including a network device and another network device, a network device and a terminal device, and / or a terminal device and another terminal device) may also perform AI-related services with each other.

[0167] As shown in Figure 2B, taking a network device as a base station as an example, the base station can perform communication-related services and AI-related services with one or more terminal devices, and different terminal devices can also perform communication-related services and AI-related services.

[0168] The technical solutions provided in this application can be applied to wireless communication systems (such as the systems shown in Figures 2A and 2B). For example, AI network elements can be introduced into the communication system provided in this application to realize some or all AI-related operations. AI network elements can also be called AI nodes, AI devices, AI entities, AI modules, AI models, or AI units, etc. The AI network element can be built into a network element within the communication system. For example, the AI network element can be an AI module built into: access network equipment, core network equipment, cloud server, or operation, administration, and maintenance (OAM) to realize AI-related functions. The OAM can act as the network management system for the core network equipment and / or the access network equipment. Alternatively, the AI network element can also be an independently set network element in the communication system. Optionally, the terminal or its built-in chip can also include an AI entity to realize AI-related functions.

[0169] In a wireless communication system, take as an example a two-sided model whose sub-models A and B are deployed on the terminal device and the network device respectively. To complete the inference of the two-sided model accurately, the network device needs to transmit training data or a trained sub-model A to the terminal device, enabling the terminal device to build a model with better performance. Specifically, data transmission means that the network device sends training data to the terminal device, which then uses this data to train sub-model A; model transmission means that the network device completes the training of sub-models A and B and then sends sub-model A to the terminal device.

[0170] To ensure the performance of two-sided models, both sub-models A and B typically need to have a large number of trainable parameters to provide high model capacity. However, network devices and terminal devices are limited by their own computing power and cannot complete the training and inference of large models in practice. Furthermore, in order to analyze the inference results of AI models and explain their behavior, network devices and terminal devices sometimes need to build interpretable white-box AI models as sub-models A and B. These interpretable models originate from complex mathematical optimization problems and are often difficult to train to high performance.

[0171] This results in smaller AI models being built and trained on network devices or terminal devices, and lower performance of these AI models.

[0172] To address the aforementioned problems, this application provides a model training method and related apparatus, which will be described in detail below with reference to the accompanying drawings.

[0173] The model training method provided in this application embodiment can be implemented through the interaction of a first communication device and a second communication device. The first communication device can be a communication device for receiving and sending information, or a communication device capable of supporting the communication device in implementing the model training method, such as a chip. Exemplarily, the first communication device is a terminal device, or a chip disposed in a terminal device to implement the functions of the terminal device, or other components for implementing the functions of the terminal device; the first communication device can also be a network device, or a chip disposed in a network device to implement the functions of the network device, or other components for implementing the functions of the network device. In the following description, the example of the first communication device being a terminal device / network device will be used. The second communication device can be a communication device for data exchange and communication, or a communication device capable of supporting the communication device in implementing the model training method, such as a chip. Exemplarily, the second communication device is a core network device, or a chip disposed in a core network device to implement the functions of the core network device, or other components for implementing the functions of the core network device. In the following description, the example of the second communication device being a core network device will be used.

[0174] As shown in Figure 3, the model training method provided in the embodiments of this application includes:

[0175] S301. The second communication device sends the model structure information of the second model to the first communication device, and correspondingly, the first communication device receives the model structure information of the second model from the second communication device.

[0176] In this application, a first model is deployed on the first communication device.

[0177] In this application, the first model can be used by a network device to measure the channel state information (CSI) of the downlink channel of a terminal device. If the first communication device is a network device, the first model can be a decoder model for measuring CSI; if the first communication device is a terminal device, the first model can be an encoder model for measuring CSI.

[0178] In this application, the first model includes multiple first modules. The function of each first module may be the same or different depending on the needs of the first model. For example, the first module may be a module for preprocessing the raw input data, such as normalization and cleaning; the first module may be a multi-head attention module that captures the relationship between the input data through multiple attention heads; or it may be a position encoding module that adds position information to each position in the input sequence. The type and number of first modules are not limited here.

[0179] In this application, the number of first communication devices can be one or more. When there are multiple first communication devices, the second communication device sends the model structure information of the second model to the multiple first communication devices respectively.

[0180] In this application, the second communication device can be a core network device. The core network is a key component of a communication network, primarily responsible for data transmission, routing, and control. In mobile communication networks, the core network is located between the radio access network (RAN) and end users, providing functions such as user authentication, mobility management, service data routing, and interconnection with other networks.

[0181] In this application, the second model is deployed in a second communication device, and the second model includes multiple second modules. The number of model parameters in the first model is less than or equal to the number of model parameters in the second model.

[0182] In this application, the second communication device can obtain historical measurement data of the channel from the RAN side and train a second model for knowledge distillation based on this data. Because the amount of historical measurement data of the channel is enormous, a relatively large and highly accurate model can be obtained.

[0183] In this application, the model structure information of the second model is used to indicate the dimensions of each second module. The dimensions of each second module include the input dimension and the output dimension. The model structure information of the second model may also include the number of model layers, i.e., the number of second modules. For example, if the second model includes three second modules (i.e., three layers), then the model structure information may include that module 1 has an input dimension of 3×2 and an output dimension of 4×3, module 2 has an input dimension of 4×3 and an output dimension of 5×2, and module 3 has an input dimension of 5×2 and an output dimension of 3×3.

[0184] S302. The first communication device sends a request message to the second communication device based on the model structure information, and correspondingly, the second communication device receives the request message from the first communication device.

[0185] In this application, the request information is used to request knowledge data of at least one target module that is in the same dimension as at least one first module.

[0186] In this application, the target module is one of the multiple second modules. The input dimension of the target module is the same as the input dimension of the corresponding first module, and the output dimension of the target module is the same as the output dimension of the corresponding first module. Taking the above example, if the input dimension of the first module is 4×3 and the output dimension is 5×2, then the request information sent by the first communication device to the second communication device includes the number information of module 2, which is used to request the second communication device to send the knowledge data of module 2 to the first communication device.
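
The dimension matching in this example can be sketched as follows. The `ModuleInfo` type and the `build_request` function are hypothetical names introduced only for illustration; the dimensions are those of the three-module example given for the model structure information.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModuleInfo:
    """One entry of the model structure information (hypothetical layout)."""
    number: int
    input_dim: tuple
    output_dim: tuple

# Model structure information of the second model, as in the three-module example.
structure_info = [
    ModuleInfo(1, (3, 2), (4, 3)),
    ModuleInfo(2, (4, 3), (5, 2)),
    ModuleInfo(3, (5, 2), (3, 3)),
]

def build_request(structure_info, first_module_dims):
    """Return the numbers of the second modules whose input and output
    dimensions both equal those of at least one first module."""
    wanted = set(first_module_dims)
    return [m.number for m in structure_info
            if (m.input_dim, m.output_dim) in wanted]

# First module with input dimension 4x3 and output dimension 5x2.
request = build_request(structure_info, [((4, 3), (5, 2))])
```

Here `request` contains only the number of module 2, which is what the first communication device would carry in the request information.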

[0187] Optionally, if the dimension of the second module in the second model is different from the dimension of the first module in the first model, the first communication device can select the second module whose dimension is most similar to that of the first module as the target module. After receiving the knowledge data of the target module, the first communication device can make the dimension of the target module the same as that of the first module by adding a matrix.
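
One possible reading of this optional case is sketched below. The similarity measure (sum of absolute differences between dimension sizes), the interpretation of "adding a matrix" as multiplication by an adaptation matrix, and the fixed values of that matrix are all assumptions made for illustration; the application does not specify them.

```python
def dim_gap(a, b):
    """Assumed similarity measure: sum of absolute size differences over the
    input and output dimensions (smaller means more similar)."""
    (a_in, a_out), (b_in, b_out) = a, b
    return sum(abs(p - q) for p, q in zip(a_in + a_out, b_in + b_out))

def closest_module(second_modules, first_dims):
    """Pick the number of the second module most similar to the first module."""
    return min(second_modules,
               key=lambda number: dim_gap(second_modules[number], first_dims))

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

second_modules = {1: ((3, 2), (4, 3)), 2: ((4, 3), (5, 2)), 3: ((5, 2), (3, 3))}
first_dims = ((4, 3), (5, 3))        # hypothetical first module without an exact match
target = closest_module(second_modules, first_dims)

# Output of the target module (5x2) adapted to the first module's 5x3 output.
teacher_output = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
adapter = [[1, 0, 0], [0, 1, 0]]     # assumed 2x3 matrix; it could also be learned
adapted = matmul(teacher_output, adapter)
```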

[0188] In this application, the first communication device can request the target module by including the target module's number information in the request information. For example, the decoder in the second communication device contains 4 modules, numbered {1, 2, 3, 4}, and the decoder in the first communication device contains 3 modules, numbered {1, 2, 3}. If the first communication device needs to perform distillation on module 1 and module 2, then the module numbers sent by the first communication device to the second communication device in the request information are {1, 2}.

[0189] S303. The second communication device sends knowledge data of at least one target module to the first communication device, and correspondingly, the first communication device receives knowledge data of at least one target module from the second communication device.

[0190] In this application, knowledge data may include training data contained in a large-scale dataset, and prediction results obtained from the training data through a second model. The prediction results are data labels obtained through the second model, which are used to perform regression prediction on the data.

[0191] S304. The first communication device uses the knowledge data of the target module to train the first model.

[0192] In this application, the first communication device can train the first module corresponding to the first model based on different types of knowledge data.

[0193] As can be seen from the above description, in the solution provided by the embodiments of this application, the first communication device trains the first model in the first communication device by using the knowledge data of the corresponding module in the second model sent by the second communication device. Under the background of limited computing power of the first communication device, not only is the training speed of the first model improved, but also the performance level of the first model is improved. Moreover, the first model is an interpretable AI model, which can help users to debug, locate errors and optimize performance of the first model more effectively.

[0194] In one possible embodiment, when the first model is an interpretable decoder, the structure of the first model can be solved using the following formula:

[0195] where the derivative (or rate of change) of $Z(t)$ with respect to time (or iteration step) describes how $Z(t)$ is updated over time (or iteratively); $Z(t) \in \mathbb{R}^{d \times N}$ denotes the output of the decoder at layer $t \in \{1, \dots, K\}$; and $E(t)$ denotes a matrix that changes over time (or iteration step) and is used to transform $Z(t)$ to obtain...

[0196] $V_k \in \mathbb{R}^{L \times d}$ are the learnable parameters of the model. $R_c(Z; V_{[K]})$ denotes a regularization term or complexity measure related to the matrix $Z$ and the weight matrices $V_{[K]}$ (i.e., the set of all $V_k$); the differential of $R_c$ also appears in the formula. $I$ is the identity matrix, and $L$, $d$, and $N$ are the dimension parameters of the matrices, where $L$ and $d$ are the numbers of rows and columns of the weight matrices $V_{[K]}$, and $N$ is the number of data samples or features (depending on the context).

[0197] In one possible embodiment, the network device can build a corresponding decoder AI model structure based on its own computing power or interpretability requirements.

[0198] In one possible embodiment, when the first model is an interpretable encoder, the structure of the first model can be solved using the following formula:

$$\max_{f(H;U)} \; \mathbb{E}_Z\left[ R(Z) - \alpha \|Z\|_p - R_c(Z; U_{[K]}) \right]$$

[0199] where $Z \in \mathbb{R}^{d \times N}$ and $U \in \mathbb{R}^{L \times d}$.

[0200] Here, $f(H; U)$ represents the encoder function, which accepts the input $H$ (which may be the original data or data after some transformation) and a weight matrix $U$, and outputs the encoded result. $U_{[K]}$ represents the set of weight matrices of the encoder, one for each layer $k$, which are learnable parameters. $R(Z)$ represents a regularization term or complexity metric related to $Z$, used to measure a certain characteristic of the data (such as distribution or feature complexity); $\alpha$ represents a pre-set parameter used to balance the weights between the regularization term and the encoding error; $\|Z\|_p$ is the p-norm of $Z$, a measure of the size or complexity of $Z$. $\mathbb{E}_Z$ represents the expectation over $Z$, that is, the average over all possible values of $Z$; in practice, the expectation is approximated by sampling the training dataset.

[0201] In one possible embodiment, the terminal device can construct a corresponding encoder AI model structure based on its own computing power or interpretability requirements.

[0202] In one possible embodiment, the process in which the first communication device trains the first model using the knowledge data may further include the following steps, as shown in Figure 4:

[0203] S401. Use knowledge data to train the first module in the first model that corresponds to the target module; wherein, the knowledge data includes input data and the first output result of the input data in the target module.

[0204] In this application, the knowledge data includes at least one of the following: the input data and output data of the target module; the attention input data and attention map of the target module; the attention input data and attention logits of the target module; the input data of a batch normalization layer of the target module together with the mean and variance output by that layer; and the training parameters of the target module.

[0205] In this application, the input and output feature data of the target module can be represented as $(x, f_t(x))$, where $x$ is the input of the module and $f_t(x)$ is the output of the module. For example, if the module numbers sent from the first communication device to the second communication device are {1, 2}, then the second communication device sends the input and transformed output of module 1, $(x_1, \varphi(f_t(x_1)))$, and of module 2, $(x_2, \varphi(f_t(x_2)))$, to the first communication device as module knowledge data, where $\varphi(\cdot)$ is a transformation function.

[0206] In this application, the attention input data and attention map of the target module can be represented as $(x_t, A_t(x_t))$, where $x_t$ is the input of the t-th attention in the module and $A_t(x_t)$ is the attention map generated by the t-th attention. For example, if the module numbers sent from the first communication device to the second communication device are {1, 2}, then the second communication device sends the attention inputs and the attention maps output by module 1 and module 2, respectively, to the first communication device as module knowledge data. The attention map shows the regions or features that the model focuses on when processing the input data.

[0207] In this application, the attention input data and attention logits of the target module can be represented as $(x_t, B_t(x_t))$, where $x_t$ is the input of the t-th attention in the module and $B_t(x_t)$ is the attention logits generated by the t-th attention. The attention logits reflect the degree of correlation between different elements.

[0208] In this application, the input data of a batch normalization layer of the target module, together with the mean and variance output by that layer, can be expressed as $(x_t, \mu_t(x_t), \sigma_t^2(x_t))$, where $x_t$ is the input of the t-th batch normalization layer in the target module, $\mu_t(x_t)$ is the mean calculated by the t-th batch normalization layer, and $\sigma_t^2(x_t)$ (the symbol used here for the variance) is the variance calculated by the t-th batch normalization layer. The batch normalization layer is used to normalize the data in each mini-batch.
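
As a minimal sketch of the statistics a batch normalization layer computes, the following code derives the per-feature mean and variance of a mini-batch and normalizes the batch with them. The sample values, the `eps` stabilizer, and the function names are assumptions for illustration; the tuple `knowledge` mirrors the input, mean, and variance described above.

```python
import math

def batch_norm_stats(batch):
    """Per-feature mean and (biased) variance over a mini-batch of feature vectors."""
    n = len(batch)
    features = list(zip(*batch))                      # one tuple per feature
    mean = [sum(f) / n for f in features]
    var = [sum((v - m) ** 2 for v in f) / n for f, m in zip(features, mean)]
    return mean, var

def batch_norm(batch, eps=1e-5):
    """Normalize every feature of the mini-batch to zero mean and unit variance."""
    mean, var = batch_norm_stats(batch)
    return [[(v - m) / math.sqrt(s + eps) for v, m, s in zip(row, mean, var)]
            for row in batch]

x_t = [[1.0, 10.0], [3.0, 30.0]]     # assumed mini-batch: 2 samples, 2 features
mu_t, var_t = batch_norm_stats(x_t)
knowledge = (x_t, mu_t, var_t)       # input, mean and variance of the layer
normalized = batch_norm(x_t)
```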

[0209] In this application, the training parameters of the target module refer to a set of parameters used to train a specific module (or layer) in a machine learning or deep learning model. These parameters are updated during training using optimization methods such as backpropagation and gradient descent to minimize the loss function and thus improve the model's performance.

[0210] S402. Adjust the parameters of the first module corresponding to the target module in the first model using a loss function determined based on knowledge data.

[0211] In this application, the loss function is a function that measures the difference between the model's predicted value and the true value. The first communication device can train the first model using not only the original data in the knowledge data but also the output that the second model produces for that original data.

[0212] In this application, the first communication device inputs the input data into the first module corresponding to the target module in the first model to obtain a second output result; calculates the distance between the second output result and the first output result; calculates the value of the loss function based on the distance; and adjusts the parameters of that first module using the value of the loss function.
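
The four operations above can be sketched as a simple training loop. The scalar linear first module, the squared L2 distance, the learning rate, and the hypothetical target-module function used to generate the knowledge data are all assumptions made for illustration.

```python
def first_module(params, x):
    """Assumed first module: a scalar linear map with parameters (w, b)."""
    w, b = params
    return w * x + b

def distill_step(params, knowledge, lr=0.1):
    """One parameter update: obtain the second output result, measure its squared
    distance to the first output result, and take a gradient step on the loss."""
    w, b = params
    n = len(knowledge)
    loss = grad_w = grad_b = 0.0
    for x, first_output in knowledge:
        second_output = first_module((w, b), x)
        diff = second_output - first_output
        loss += diff ** 2 / n
        grad_w += 2 * diff * x / n
        grad_b += 2 * diff / n
    return (w - lr * grad_w, b - lr * grad_b), loss

# Knowledge data: inputs with the outputs of a hypothetical target module 2x + 1.
knowledge = [(x, 2.0 * x + 1.0) for x in (-1.0, 0.0, 1.0)]

params = (0.0, 0.0)
losses = []
for _ in range(200):
    params, loss = distill_step(params, knowledge)
    losses.append(loss)
```

After training, the parameters of the first module approach those of the hypothetical target module, and the recorded loss shrinks accordingly.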

[0213] In this application, the distance between the first output result and the second output result can be calculated using the following formulas, depending on the type of knowledge data:

[0214] 1. When the knowledge data is the input and output feature data $(x, f_t(x))$ of the target module, the first communication device adds $L = \mathrm{Dis}(\varphi(f_t(x)), \varphi(f_s(x)))$ to the loss function of the first model to adjust the parameters of the first model.

[0215] Here, $\mathrm{Dis}(\cdot)$ is a distance function, such as the L2 distance or the L1 distance; $f_s(x)$ is the output generated, when the input is $x$, by the module of the network device or terminal device on which knowledge distillation is performed; and $\varphi(\cdot)$ is a transformation function, which can be a dimension-reduction transformation or an identity transformation.

[0216] 2. When the knowledge data is the attention input data and attention map $(x_t, A_t(x_t))$ of the target module, the first communication device adds $L = \mathrm{Dis}(A_t(x_t), A_{s,t}(x_t))$ to the loss function of the first model to adjust the parameters of the first model.

[0217] Here, $\mathrm{Dis}(\cdot)$ is a distance function, such as the L2 distance or the L1 distance, and $A_{s,t}(x_t)$ is the attention map generated, when the input of the t-th attention is $x_t$, by the module of the network device or terminal device on which knowledge distillation is performed.

[0218] 3. When the knowledge data is the attention input data and attention logits $(x_t, B_t(x_t))$ of the target module, the first communication device adds $L = \mathrm{Dis}(B_t(x_t), B_{s,t}(x_t))$ to the loss function of the first model to adjust the parameters of the first model.

[0219] Here, $\mathrm{Dis}(\cdot)$ is a distance function, such as the L2 distance or the L1 distance, and $B_{s,t}(x_t)$ is the attention logits generated, when the input of the t-th attention is $x_t$, by the module of the network device or terminal device on which knowledge distillation is performed.

[0220] 4. When the knowledge data consists of the input data of a batch normalization layer and the mean and variance output by that layer, $(x_t, \mu_t(x_t), \sigma_t^2(x_t))$, the first communication device adds the distance between the means, $\mathrm{Dis}(\mu_t(x_t), \mu_{s,t}(x_t))$, and the distance between the variances, $\mathrm{Dis}(\sigma_t^2(x_t), \sigma_{s,t}^2(x_t))$, to the loss function of the first model to adjust the parameters of the first model.

[0221] Here, $\mathrm{Dis}(\cdot)$ is a distance function, such as the L2 distance or the L1 distance; $\mu_{s,t}(x_t)$ is the mean generated, when the input of the t-th batch normalization layer is $x_t$, by the module of the network device or terminal device on which knowledge distillation is performed; and $\sigma_{s,t}^2(x_t)$ is the variance generated by that module for the same input.

[0222] 5. When the knowledge data is the training parameters of the target module, the first communication device adds at least one of the loss terms in items 1 to 4 above to the loss function of the first model to adjust the parameters of the first model.
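
The loss terms in items 1 to 4 can be sketched with plain Python lists, as below. The pair-averaging form of the dimension-reduction transformation, the use of softmax to turn attention logits into an attention map, and the summation of the mean and variance distances are assumptions made for illustration. Softmax is the usual relationship between logits and attention maps in attention layers, but the application does not state it.

```python
import math

def dis_l1(a, b):
    """L1 distance between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))

def dis_l2(a, b):
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def phi_identity(v):
    return list(v)

def phi_pool(v):
    """Assumed dimension-reduction transformation: average adjacent pairs."""
    return [(v[i] + v[i + 1]) / 2 for i in range(0, len(v) - 1, 2)]

def softmax(logits):
    """Turn one row of attention logits into one row of an attention map."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def feature_loss(f_t, f_s, phi=phi_identity, dis=dis_l2):        # item 1
    return dis(phi(f_t), phi(f_s))

def attention_map_loss(b_t, b_s, dis=dis_l2):                    # item 2
    return dis(softmax(b_t), softmax(b_s))

def attention_logit_loss(b_t, b_s, dis=dis_l2):                  # item 3
    return dis(b_t, b_s)

def batch_norm_loss(mu_t, var_t, mu_s, var_s, dis=dis_l2):       # item 4
    return dis(mu_t, mu_s) + dis(var_t, var_s)

f_t = [1.0, 2.0, 3.0, 4.0]     # assumed output of the target module
f_s = [1.0, 2.0, 3.0, 6.0]     # assumed output of the first module
```

For example, `feature_loss(f_t, f_s)` uses the identity transformation with the L2 distance, while `feature_loss(f_t, f_s, phi=phi_pool)` compares the reduced features. Two logit vectors that differ only by a constant offset give a non-zero logit loss but a zero attention-map loss, which shows that the two kinds of knowledge data constrain the first module differently.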

[0223] In this possible embodiment, the first communication device acquires knowledge data of a target module with the same dimensions as the first module for training the first module. This improves the performance and accuracy of the first model, and the first model has low computational cost and low memory usage, reducing the computational power requirements of the first communication device. Furthermore, the first communication device uses the output results of the knowledge data in the second model to adjust the loss parameters of the first model, increasing the similarity between the first and second models. This allows the first model to better learn the generalization ability of the second model, thereby improving the accuracy and robustness of the first model.

[0224] Referring to Figure 5 below, the model training method provided in this application is described in terms of its application on the network device side and the terminal device side:

[0225] Step 501. The core network generates and trains a large black-box encoder-decoder model.

[0226] The core network can generate and train a large black-box model that includes encoders and decoders by acquiring historical channel measurement data from the RAN side.

[0227] Step 502. The network device generates a decoder model.

[0228] In this application, network devices can construct corresponding decoder AI model structures based on computing power or the interpretability requirements of the model.

[0229] Step 503. The terminal device generates an encoder model.

[0230] In this application, the terminal device can construct a corresponding encoder AI model structure according to its computing power or the interpretability requirements of the model.

[0231] Step 504. The network device sends the numbers of the decoder modules on which distillation is to be performed to the core network, and correspondingly, the core network receives these module numbers from the network device.

[0232] In this application, the network device determines the numbering information of the module performing distillation based on the dimension of the module performing distillation and the dimension of the module in the core network.

[0233] Step 505. The terminal device sends the numbers of the encoder modules on which distillation is to be performed to the core network, and correspondingly, the core network receives these module numbers from the terminal device.

[0234] In this application, the terminal device determines the numbering information of the module performing distillation based on the dimension of the module performing distillation and the dimension of the module in the core network.

[0235] Step 506. The core network sends the knowledge data of the decoder modules on which distillation is to be performed to the network device.

[0236] Step 507. The core network sends the knowledge data of the encoder modules on which distillation is to be performed to the terminal device.

[0237] Optionally, the core network can send the module knowledge data required by the terminal device to the terminal device through network device forwarding.

[0238] Step 508. The network device performs knowledge distillation to train the decoder.

[0239] Step 509. The terminal device performs knowledge distillation to train the encoder.

[0240] Through the above steps, knowledge data from the AI model trained with the strong computing power of the core network is used to train the AI models in the network device and the terminal device respectively. This improves the performance of the models on both sides and the efficiency of model training, while ensuring the interpretability of the models.
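The distillation in steps 508 and 509 can be sketched as follows, under stated assumptions. The knowledge data is taken to be pairs of input data and the output of the core network module; the module being trained is a scalar linear map standing in for a real encoder or decoder module; and the distance is taken to be the squared difference. None of these choices is prescribed by this application, and all names are hypothetical.

```python
def distill_linear_module(inputs, teacher_outputs, lr=0.05, epochs=500):
    """Fit y = w * x + b so that its outputs approach the teacher outputs.

    The loss is the mean squared distance between the output of the module
    being trained and the output supplied in the knowledge data.
    """
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, t in zip(inputs, teacher_outputs):
            diff = (w * x + b) - t          # student output minus teacher output
            grad_w += 2.0 * diff * x / n
            grad_b += 2.0 * diff / n
        # Plain gradient descent step on the distillation loss.
        w -= lr * grad_w
        b -= lr * grad_b
    loss = sum(((w * x + b) - t) ** 2 for x, t in zip(inputs, teacher_outputs)) / n
    return w, b, loss


# Knowledge data from a hypothetical teacher module computing 2 * x + 1.
inputs = [-1.0, -0.5, 0.0, 0.5, 1.0]
teacher_outputs = [2.0 * x + 1.0 for x in inputs]
w, b, loss = distill_linear_module(inputs, teacher_outputs)
print(round(w, 3), round(b, 3))
```

Only the input and output data of the teacher module are needed here, so the device performing distillation does not have to hold or run the large model itself.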

[0241] The communication system and model training method in the embodiments of this application have been introduced above. The model training device provided in the embodiments of this application will be described below.

[0242] As shown in Figure 6, network elements in the model training device are connected via interfaces (e.g., NG, Xn) or over-the-air interfaces. These network element nodes, such as core network equipment, access network nodes (RAN nodes), terminals, or one or more devices in the OAM, are equipped with one or more AI modules (only one is shown in Figure 6 for clarity). The access network node can be a single RAN node or can include multiple RAN nodes, for example, including CU and DU. The CU and / or DU can also be equipped with one or more AI modules. Optionally, the CU can be further divided into CU-CP and CU-UP. One or more AI models are configured in the CU-CP and / or CU-UP.

[0243] The AI module is used to implement corresponding AI functions. AI modules deployed in different network elements can be the same or different. Depending on the parameter configuration, the AI module can implement different functions. The AI module model can be configured based on one or more of the following parameters: structural parameters (e.g., at least one of the following: number of neural network layers, neural network width, inter-layer connections, neuron weights, neuron activation function, or bias in the activation function), input parameters (e.g., type and / or dimension of input parameters), or output parameters (e.g., type and / or dimension of output parameters). The bias in the activation function can also be referred to as the neural network bias.
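As an illustration of the parameter groups listed above, the following sketch represents an AI module configuration as a small data structure. The class names, field names, and example labels are hypothetical, and the parameter count assumes a fully connected network, which this application does not require.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuralParams:
    num_hidden_layers: int      # number of neural network layers (hidden)
    width: int                  # neural network width
    activation: str = "relu"    # neuron activation function (assumed label)
    use_bias: bool = True       # whether each layer carries a bias


@dataclass(frozen=True)
class PortParams:
    kind: str                   # type of the input or output parameter
    dimension: int              # dimension of the input or output parameter


@dataclass(frozen=True)
class AIModuleConfig:
    structure: StructuralParams
    inputs: PortParams
    outputs: PortParams

    def parameter_count(self) -> int:
        """Weights plus biases, assuming fully connected layers."""
        s = self.structure
        sizes = ([self.inputs.dimension]
                 + [s.width] * s.num_hidden_layers
                 + [self.outputs.dimension])
        weights = sum(a * b for a, b in zip(sizes, sizes[1:]))
        biases = sum(sizes[1:]) if s.use_bias else 0
        return weights + biases


config = AIModuleConfig(
    structure=StructuralParams(num_hidden_layers=2, width=64),
    inputs=PortParams(kind="channel_measurement", dimension=32),
    outputs=PortParams(kind="compressed_feedback", dimension=16),
)
print(config.parameter_count())
```

A count of this kind is one way to compare the sizes of two models, for example when checking that a model to be trained has no more parameters than the model supplying the knowledge data.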

[0244] An AI module can have one or more models. A model can infer an output, which includes one or more parameters. The learning, training, or inference processes of different models can be deployed on different nodes or devices, or they can be deployed on the same node or device.

[0245] The network device can be a network device equipped with one or more AI modules. The network device can be one or more devices in the core network, an access network node (RAN node), or the OAM. For example, the AI module can be a RIC, such as a near real-time RIC or a non-real-time RIC. For instance, a near real-time RIC is located in a RAN node (e.g., in a CU or DU), while a non-real-time RIC is located in the OAM, a cloud server, a core network device, or another network device. The RIC can obtain, via RAN nodes (e.g., CU, CU-CP, CU-UP, DU, and / or RU), subsets from multiple terminal devices, reassemble them into a training dataset #2, and perform training based on the training dataset #2. Exemplarily, the near real-time RIC and the non-real-time RIC can also be configured as separate network elements, and the network device can be either a near real-time RIC or a non-real-time RIC.

[0246] Figure 7 illustrates a possible application framework in a model training device. As shown in Figure 7, the model training device includes a RAN intelligent controller (RIC). For example, the RIC can be the AI module shown in Figure 6, used to implement AI-related functions. The RIC includes near-real-time RICs (near-RT RICs) and non-real-time RICs (non-RT RICs). Non-real-time RICs primarily process non-real-time information, such as data that is not sensitive to latency, with latency on the order of seconds. Near-real-time RICs primarily process near-real-time information, such as data that is relatively sensitive to latency, with latency on the order of tens of milliseconds.

[0247] The near real-time RIC is used for model training and inference. For example, it can be used to train an AI model and then use that AI model for inference. The near real-time RIC can obtain network-side and / or terminal-side information from RAN nodes (e.g., CU, CU-CP, CU-UP, DU, and / or RU) and / or terminals. This information can be used as training data or inference data. Optionally, the near real-time RIC can deliver inference results to RAN nodes and / or terminals. Optionally, inference results can be exchanged between CU and DU, and / or between DU and RU. For example, the near real-time RIC delivers the inference result to the DU, and the DU sends it to the RU.

[0248] The non-real-time RIC is also used for model training and inference. For example, it can be used to train an AI model and then use that model for inference. The non-real-time RIC can obtain network-side and / or terminal-side information from RAN nodes (e.g., CU, CU-CP, CU-UP, DU, and / or RU) and / or terminals. This information can be used as training data or inference data, and the inference results can be delivered to the RAN nodes and / or terminals. Optionally, inference results can be exchanged between CU and DU, and / or between DU and RU. For example, the non-real-time RIC delivers the inference results to the DU, which then forwards them to the RU.

[0249] The near real-time RIC and non-real-time RIC can also be set up as separate network elements. Optionally, the near real-time RIC and non-real-time RIC can also be part of other devices. For example, the near real-time RIC can be set in the RAN node (e.g., in CU, DU), while the non-real-time RIC can be set in the OAM, cloud server, core network device, or other network device.

[0250] Referring to Figure 8, this application embodiment provides a model training device 800. This model training device 800 can implement the functions of the first or second communication device in the above method embodiments, and therefore can also achieve the beneficial effects of the above method embodiments. In this application embodiment, the model training device 800 can be the first or second communication device, or it can be an integrated circuit or component inside the first or second communication device, such as a chip, baseband chip, modem chip, SoC chip (e.g., an SoC chip containing a modem core), SIP chip, communication module, chip system, processor, etc.

[0251] It should be noted that the transceiver unit 802 can also be called a transceiver module, which may include a sending unit (also called a sending module) and / or a receiving unit (also called a receiving module), which are used to perform the sending and receiving operations in the embodiments, respectively.

[0252] In one possible implementation, when the device 800 is used to execute the method performed by the first communication device in FIG3 and related embodiments, the device 800 includes a processing unit 801 and a transceiver unit 802; the transceiver unit 802 is used to receive model structure information of a second model from a second communication device, the second model being deployed in the second communication device; wherein, the number of model parameters of the first model is less than or equal to the number of model parameters of the second model, the second model includes multiple second modules, the model structure information is used to indicate the dimension of each second module, the processing unit 801 is used to determine the target module according to the model structure information; the transceiver unit 802 is also used to send request information to the second communication device, the transceiver unit 802 is also used to receive knowledge data of at least one target module, and the processing unit 801 is also used to train the first model using the knowledge data of the target module.

[0253] In one possible implementation, when the device 800 is used to execute the method performed by the second communication device in FIG3 and related embodiments, the device 800 includes a processing unit 801 and a transceiver unit 802; the transceiver unit 802 is used to send model structure information of the second model to the first communication device; the transceiver unit 802 is also used to receive request information from the first communication device, and the transceiver unit 802 is further used to send knowledge data to the first communication device.
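The exchange between the two devices described above can be sketched as an in-memory message sequence. The two classes are hypothetical stand-ins for the first and second communication devices; method calls replace actual signaling, the knowledge data is represented by placeholder strings, and the training step is only recorded in a trace rather than performed.

```python
class SecondDevice:
    """Stand-in for the second communication device holding the second model."""

    def __init__(self, modules):
        # modules maps a module number to (dimension, knowledge data).
        self._modules = modules

    def structure_info(self):
        """Model structure information: the dimension of each second module."""
        return {number: dims for number, (dims, _) in self._modules.items()}

    def knowledge_for(self, requested_numbers):
        """Knowledge data for the requested target modules only."""
        return {n: self._modules[n][1] for n in requested_numbers}


class FirstDevice:
    """Stand-in for the first communication device holding the first model."""

    def __init__(self, first_module_dims):
        self._dims = set(first_module_dims)
        self.trace = []
        self.knowledge = {}

    def train_with(self, peer):
        info = peer.structure_info()                 # transceiver role: receive
        self.trace.append("receive structure info")
        # Processing role: pick second modules whose dimension matches.
        targets = sorted(n for n, d in info.items() if d in self._dims)
        self.trace.append("determine target modules")
        self.trace.append("send request")            # transceiver role: send
        self.knowledge = peer.knowledge_for(targets)
        self.trace.append("receive knowledge data")  # transceiver role: receive
        self.trace.append("train first model")       # processing role (omitted)
        return targets


second = SecondDevice({
    0: ((64, 128), "knowledge-0"),
    1: ((128, 128), "knowledge-1"),
    2: ((128, 32), "knowledge-2"),
})
first = FirstDevice([(64, 128), (128, 32)])
targets = first.train_with(second)
print(targets, first.trace)
```

The trace makes the division of work visible: the sending and receiving operations correspond to the transceiver unit, while determining the target modules and training correspond to the processing unit.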

[0254] In one possible design, when the model training device 800 is a terminal device or a communication module within a terminal, the functionality of the processing unit 801 can be implemented by one or more processors. Specifically, the processor may include a modem chip, a SoC chip (such as a SoC chip containing a modem core), or a SIP chip. The functionality of the transceiver unit 802 can be implemented by transceiver circuitry.

[0255] In one possible design, when the model training device 800 is a circuit or chip in a terminal responsible for communication functions, such as a modem chip, an SoC chip (e.g., an SoC chip containing a modem core), or a SIP chip, the function of the processing unit 801 can be implemented by a circuit system in the aforementioned chip that includes one or more processors or processor cores. The function of the transceiver unit 802 can be implemented by the interface circuit or data transceiver circuit on the aforementioned chip.

[0256] It should be noted that, for the information exchange and execution processes of the units of the above-mentioned model training device 800, reference can be made to the descriptions in the method embodiments shown above in this application; details are not repeated here.

[0257] Please refer to Figure 9, which is another schematic structural diagram of the model training device 900 provided in this application. The model training device 900 includes a logic circuit 901 and an input / output interface 902. The model training device 900 can be a chip or an integrated circuit.

[0258] In Figure 8, the transceiver unit 802 can be a communication interface, which can be the input / output interface 902 in Figure 9, and the input / output interface 902 can include an input interface and an output interface. Alternatively, the communication interface can also be a transceiver circuit, which can include an input interface circuit and an output interface circuit.

[0259] In one possible implementation, when the device 900 is used to execute the method performed by the first communication device in FIG3 and related embodiments, the input / output interface 902 is used to receive model structure information of a second model from a second communication device, the second model being deployed in the second communication device; wherein, the number of model parameters of the first model is less than or equal to the number of model parameters of the second model, the second model includes multiple second modules, the model structure information is used to indicate the dimension of each second module, and the logic circuit 901 is used to determine the target module based on the model structure information; the input / output interface 902 is also used to receive knowledge data of at least one target module, and the logic circuit 901 is also used to train the first model using the knowledge data of the target module.

[0260] In one possible implementation, when the device 900 is used to execute the method performed by the second communication device in FIG3 and related embodiments, the input / output interface 902 is used to send model structure information of the second model to the first communication device; the input / output interface 902 is also used to receive request information from the first communication device, and the input / output interface 902 is further used to send knowledge data to the first communication device. The logic circuit 901 and the input / output interface 902 can also execute other steps performed by the first or second communication device in any embodiment and achieve corresponding beneficial effects, which will not be elaborated here.

[0261] In one possible implementation, the processing unit 801 shown in FIG8 can be the logic circuit 901 in FIG9.

[0262] Optionally, the logic circuit 901 can be a processing device, the functions of which can be partially or entirely implemented in software.

[0263] Optionally, the processing apparatus may include a memory and a processor, wherein the memory is used to store a computer program, and the processor reads and executes the computer program stored in the memory to perform the corresponding processing and / or steps in any of the method embodiments.

[0264] Optionally, the processing device may consist of only a processor. Memory for storing computer programs is located outside the processing device, and the processor is connected to the memory via circuitry / wires to read and execute the computer programs stored in the memory. The memory and processor may be integrated together or physically independent.

[0265] Optionally, the processing device may be one or more chips, or one or more integrated circuits. For example, the processing device may be one or more field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), system-on-chips (SoCs), central processing units (CPUs), network processors (NPs), digital signal processors (DSPs), microcontroller units (MCUs), programmable logic devices (PLDs), or other integrated chips, or any combination of the above chips or processors.

[0266] Please refer to Figure 10, which shows the model training device 10000 involved in the above embodiments provided in the embodiments of this application. Specifically, the model training device 10000 can be the model training device as a terminal device and / or network device in the above embodiments. The example shown in Figure 10 is implemented through a terminal device and / or network device (or a component in the terminal device and / or network device).

[0267] The diagram shows a possible logical structure of the model training device 10000, which may include, but is not limited to, at least one processor 1001 and a communication port 1002.

[0268] In Figure 8, the transceiver unit 802 can be a communication interface, which can be the communication port 1002 in Figure 10. The communication port 1002 can include an input interface and an output interface. Alternatively, the communication port 1002 can also be a transceiver circuit, which can include an input interface circuit and an output interface circuit.

[0269] Further optionally, the device may also include at least one of a memory 1003 and a bus 1004. In embodiments of this application, at least one processor 1001 is used to control the operation of the model training device 10000.

[0270] Furthermore, the processor 1001 can be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array, or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It can implement or execute the various exemplary logic blocks, modules, and circuits described in conjunction with the disclosure of this application. The processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, etc. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0271] It should be noted that the model training device 10000 shown in Figure 10 can be used to implement the steps implemented by the terminal device and / or network device in the aforementioned method embodiments, and to achieve the corresponding technical effects of the terminal device and / or network device. The specific implementation methods of the terminal device and / or network device shown in Figure 10 can all refer to the description of the first communication device or the second communication device in the aforementioned method embodiments, and will not be repeated here.

[0272] Please refer to Figure 11, which is a schematic diagram of the structure of the model training device 1100 involved in the above embodiments provided in the embodiments of this application. The model training device 1100 can specifically be the model training device as a terminal device and / or network device in the above embodiments. The example shown in Figure 11 is implemented by a terminal device and / or network device (or a component in a terminal device and / or network device). The structure of the model training device can refer to the structure shown in Figure 11.

[0273] The model training device 1100 includes at least one processor 1111 and at least one network interface 1114. Further optionally, the model training device also includes at least one memory 1112, at least one transceiver 1113, and one or more antennas 1115. The processor 1111, memory 1112, transceiver 1113, and network interface 1114 are connected, for example, via a bus. In this embodiment, the connection may include various interfaces, transmission lines, or buses, etc., and this embodiment is not limited thereto. The antenna 1115 is connected to the transceiver 1113. The network interface 1114 enables the model training device to communicate with other communication devices via a communication link. For example, the network interface 1114 may include a network interface between the model training device and core network equipment, such as an S1 interface; the network interface may also include a network interface between the model training device and other model training devices (e.g., other network devices or core network equipment), such as an X2 or Xn interface.

[0274] In Figure 8, the transceiver unit 802 can be a communication interface, which can be the network interface 1114 in Figure 11. The network interface 1114 can include an input interface and an output interface. Alternatively, the network interface 1114 can also be a transceiver circuit, which can include an input interface circuit and an output interface circuit.

[0275] The processor 1111 is primarily used for processing communication protocols and communication data, controlling the entire model training device, executing software programs, and processing the data of the software programs, for example, to support the model training device in performing the actions described in the embodiments. The model training device may include a baseband processor and a central processing unit (CPU). The baseband processor is primarily used for processing communication protocols and communication data, while the CPU is primarily used for controlling the entire terminal device, executing software programs, and processing the data of the software programs. The processor 1111 in Figure 11 can integrate the functions of a baseband processor and a CPU. Those skilled in the art will understand that the baseband processor and CPU can also be independent processors interconnected via technologies such as buses. Those skilled in the art will understand that the terminal device may include multiple baseband processors to adapt to different network standards, and the terminal device may include multiple CPUs to enhance its processing capabilities. The various components of the terminal device can be connected via various buses. The baseband processor can also be described as a baseband processing circuit or a baseband processing chip. The CPU can also be described as a central processing circuit or a central processing chip. The function of processing communication protocols and communication data can be built into the processor or stored in memory as a software program, which is then executed by the processor to implement the baseband processing function.

[0276] The memory is primarily used to store software programs and data. The memory 1112 can exist independently or be connected to the processor 1111. Optionally, the memory 1112 can be integrated with the processor 1111, for example, integrated into a single chip. The memory 1112 can store program code that executes the technical solutions of the embodiments of this application, and its execution is controlled by the processor 1111. The various types of computer program code being executed can also be considered as drivers for the processor 1111.

[0277] Figure 11 shows only one memory and one processor. In actual terminal devices, there may be multiple processors and multiple memories. Memory can also be called storage medium or storage device, etc. Memory can be a storage element on the same chip as the processor, i.e., an on-chip storage element, or it can be a separate storage element; this application does not limit this.

[0278] The transceiver 1113 can be used to support the reception or transmission of radio frequency (RF) signals between the model training device and the terminal. The transceiver 1113 can be connected to the antenna 1115 and includes a transmitter Tx and / or a receiver Rx. Specifically, one or more antennas 1115 can receive RF signals. The receiver Rx of the transceiver 1113 is used to receive the RF signals from the antennas, convert them into digital baseband signals or digital intermediate frequency (IF) signals, and provide the digital baseband signals or digital IF signals to the processor 1111 so that the processor 1111 can process them further, for example by demodulation and decoding. In addition, the transmitter Tx of the transceiver 1113 is used to receive modulated digital baseband signals or digital IF signals from the processor 1111, convert them into RF signals, and transmit the RF signals through one or more antennas 1115. Specifically, the receiver Rx can selectively perform one or more stages of downmixing and analog-to-digital conversion on the RF signal to obtain a digital baseband signal or a digital IF signal; the order of the downmixing and analog-to-digital conversion is adjustable. The transmitter Tx can selectively perform one or more stages of upmixing and digital-to-analog conversion on the modulated digital baseband signal or digital IF signal to obtain an RF signal; the order of the upmixing and digital-to-analog conversion is also adjustable. The digital baseband signal and the digital IF signal can be collectively referred to as digital signals.
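The receive path described above can be illustrated numerically with a simplified sketch. It assumes a single downmixing stage followed by a uniform quantizer standing in for analog-to-digital conversion, and it simulates the RF signal as discrete samples for convenience. The carrier frequency, sample rate, bit width, and function names are all hypothetical, and the averaging step is only a crude stand-in for low-pass filtering.

```python
import cmath
import math


def downmix(rf_samples, carrier_hz, sample_rate_hz):
    """Multiply real RF samples by a complex local oscillator at -carrier_hz."""
    return [
        s * cmath.exp(-2j * math.pi * carrier_hz * n / sample_rate_hz)
        for n, s in enumerate(rf_samples)
    ]


def quantize(value, bits=8, full_scale=1.0):
    """Uniform quantizer standing in for analog-to-digital conversion."""
    levels = 2 ** (bits - 1) - 1
    clipped = max(-full_scale, min(full_scale, value))
    return round(clipped / full_scale * levels) / levels * full_scale


sample_rate_hz = 8000.0
carrier_hz = 1000.0
count = 800

# An unmodulated carrier with unit amplitude, used as the received RF signal.
rf = [math.cos(2 * math.pi * carrier_hz * k / sample_rate_hz) for k in range(count)]

mixed = downmix(rf, carrier_hz, sample_rate_hz)
# Averaging over whole periods removes the component at twice the carrier.
baseband = sum(mixed) / count
digital = complex(quantize(baseband.real), quantize(baseband.imag))
print(digital)
```

After downmixing, a unit-amplitude carrier appears as a constant of about one half at baseband, which is then quantized; a real receiver would also handle modulation, filtering, and gain control, which are omitted here.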

[0279] The transceiver 1113 can also be called a transceiver unit, a transceiver device, etc. Optionally, the device in the transceiver unit that performs the receiving function can be regarded as the receiving unit, and the device in the transceiver unit that performs the transmitting function can be regarded as the transmitting unit. That is, the transceiver unit includes a receiving unit and a transmitting unit. The receiving unit can also be called a receiver, input port, receiving circuit, etc., and the transmitting unit can be called a transmitter, transmitting circuit, etc.

[0280] It should be noted that the model training device 1100 shown in Figure 11 can be used to implement the steps implemented by the terminal device and / or network device in the aforementioned method embodiments, and to achieve the corresponding technical effects of the terminal device and / or network device. The specific implementation of the model training device 1100 shown in Figure 11 can be referred to the description of the first communication device or the second communication device in the aforementioned method embodiments, and will not be repeated here.

[0281] This application also provides a computer-readable storage medium for storing one or more computer-executable instructions. When the computer-executable instructions are executed by a processor, the processor performs the method described in the possible implementations of the first or second communication device in the foregoing embodiments.

[0282] This application also provides a computer program product (or computer program) that, when executed by a processor, causes the processor to perform the method described in the possible implementations of the first or second communication device in the foregoing embodiments.

[0283] This application also provides a chip system including at least one processor for supporting a model training device in implementing the functions involved in the possible implementations of the model training device described above. Optionally, the chip system further includes an interface circuit that provides program instructions and / or data to the at least one processor. In one possible design, the chip system may further include a memory for storing necessary program instructions and data for the communication device. The chip system may be composed of chips or may include chips and other discrete devices, wherein the model training device may specifically be the first communication device or the second communication device in the aforementioned method embodiments.

[0284] This application also provides a communication system, which includes the first communication device in any of the above embodiments.

[0285] Optionally, the communication system may also include a second communication device.

[0286] In the embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, indirect coupling or communication connection between devices or units, and may be electrical, mechanical, or other forms. Whether a function is implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0287] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0288] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

Claims

A model training method, characterized in that, The method is applied to a first communication device, a first model is deployed on the first communication device, the first model comprises a plurality of first modules, and the method comprises the following steps: receiving model structure information of a second model from a second communication device, the second model being deployed in the second communication device; wherein the number of model parameters of the first model is less than or equal to the number of model parameters of the second model, the second model comprising a plurality of second modules, and the model structure information being used to indicate the dimension of each second module; sending request information to the second communication device according to the model structure information; wherein the request information is used to request knowledge data of at least one target module with the same dimension as at least one first module, and the target module is included in the plurality of second modules; receiving the knowledge data of the at least one target module, and training the first model using the knowledge data of the target module. The method of claim 1, wherein The dimension of each second module comprises an input dimension and an output dimension, the input dimension of the target module is the same as the input dimension of the corresponding first module, and the output dimension of the target module is the same as the output dimension of the corresponding first module. The method according to claim 1 or 2, characterized in that The knowledge data comprises at least one of the following: input data and output data of the target module, attention input data and attention map of the target module, attention input data and attention logic value of the target module, mean and variance of batch normalization layer output of input data of the target module, and training parameters of the target module. 
The method according to any one of claims 1 to 3, characterized in that The training of the first model using the knowledge data comprises the following steps: training the first module corresponding to the target module in the first model using the knowledge data; wherein the knowledge data comprises input data and a first output result of the input data in the target module; adjusting the parameters of the first module corresponding to the target module in the first model using a loss function determined based on the knowledge data. The method according to claim 4, characterized in that The adjustment of the parameters of the first module corresponding to the target module in the first model using the loss function determined based on the knowledge data comprises the following steps: obtaining a second output result of the input data in the first module corresponding to the target module in the first model; calculating the distance between the second output result and the first output result; calculating the value of the loss function based on the distance; adjusting the parameters of the first module corresponding to the target module in the first model using the value of the loss function. According to any one of claims 1-5, wherein The first model and the second model are used to measure the channel state information of the first communication device. 
A model training method, characterized in that, The method is applied to a second communication device, a second model is deployed on the second communication device, the second model comprises a plurality of second modules, and the method comprises the following steps: sending, to the first communication device, model structure information of the second model, wherein the first communication device is deployed with a first model including a plurality of first modules, a model parameter quantity of the first model is less than or equal to a model parameter quantity of the second model, and the model structure information is used to indicate a dimension of each second module; receiving, from the first communication device, request information, wherein the request information is determined by the first communication device according to the model structure information, the request information is used to request knowledge data of at least one target module with the same dimension as at least one first module, and the target module is included in the plurality of second modules; sending, to the first communication device, the knowledge data, wherein the knowledge data is used to train the first model. The method of claim 7, wherein The dimension of each second module includes an input dimension and an output dimension, the input dimension of the target module is the same as that of the corresponding first module, and the output dimension of the target module is the same as that of the corresponding first module. 
The method according to claim 7 or 8, characterized in that the knowledge data includes at least one of the following: the input data and the output data of the target module; the attention input data and the attention graph of the target module; the attention input data and the attention logic value of the target module; the input data of the target module and the mean and variance of the batch normalization layer output; and the training parameters of the target module.
The method according to any one of claims 7 to 9, characterized in that the knowledge data is used to train the first module corresponding to the target module in the first model, wherein the knowledge data includes input data and a first output result of the input data in the target module; and the knowledge data is also used to determine a value of a loss function, and the loss function is used to adjust parameters of the first module corresponding to the target module in the first model.
The method according to claim 10, wherein the value of the loss function is determined according to a distance between a second output result and the first output result, and the second output result is obtained by inputting the input data into the first module corresponding to the target module in the first model.
The method according to any one of claims 7 to 11, wherein the first model and the second model are used to measure channel state information of the first communication device.
A communication device, characterized by comprising a transceiver unit and a processing unit, wherein the transceiver unit is used to perform the sending step or the receiving step in the method of any one of claims 1 to 12; and the processing unit is used to perform steps other than the sending step and the receiving step in the method of any one of claims 1 to 12.
A communication device, characterized by comprising at least one processor, wherein the at least one processor is used to execute a computer program or instructions to enable the device to implement the method of any one of claims 1 to 12.
The communication device according to claim 14, characterized in that the communication device further comprises a memory; the processor is coupled to the memory; and the memory is used to store the computer program or instructions.
A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer programs or instructions which, when executed, cause the method of any one of claims 1 to 12 to be performed.
A computer program product, characterized in that the computer program product comprises program instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 12.