Information processing device, information processing method, and information processing program

The information processing device addresses the challenge of adapting terminal pairing in multi-user MIMO systems to varying communication conditions. It generates pre-trained models in a simulated environment and then performs reinforcement learning in the actual communication environment, so that pairing control can follow changes in communication conditions.

WO2026126274A1 · PCT designated stage · Publication Date: 2026-06-18 · SOFTBANK CORPORATION

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SOFTBANK CORPORATION
Filing Date
2024-12-09
Publication Date
2026-06-18


Abstract

Provided are an information processing device, an information processing method, and an information processing program for estimating terminals to be paired. The information processing device comprises: a generation unit that generates a plurality of communication situations in a simulated communication environment; a first learning unit that, for each of the plurality of communication situations generated by the generation unit, learns a plurality of pieces of information pertaining to communication when multi-user MIMO communication is performed between a base station and each of a plurality of terminals, and generates a trained model; and a second learning unit that implements the trained model generated by the first learning unit in an actual communication environment, and performs reinforcement learning using information acquired by multi-user MIMO communication between a base station and each of a plurality of terminals in the actual communication environment.

Description

Information Processing Device, Information Processing Method, and Information Processing Program 【0001】 The present disclosure relates to an information processing device, an information processing method, and an information processing program. 【0002】 Conventionally, various pieces of information related to communication are acquired as training data, and a trained model is generated using the training data. In actual MU-MIMO communication, a user selection estimate is obtained based on the data input to the access point and the trained model, and MU-MIMO communication is then performed with other communication devices based on that estimate. 【0003】 Japanese Patent Application Laid-Open No. 2024-008687 【0004】 An information processing device according to one aspect includes: a generation unit that generates a plurality of communication situations in a simulated communication environment; a first learning unit that, for each of the plurality of communication situations generated by the generation unit, learns a plurality of pieces of information related to communication when performing multi-user MIMO communication between a base station and each of a plurality of terminals, and generates a trained model; and a second learning unit that implements the trained model generated by the first learning unit in a real communication environment and performs reinforcement learning using information obtained by multi-user MIMO communication between the base station and each of the plurality of terminals in the real communication environment. 【0005】 FIG. 1 is a diagram (schematic diagram) for explaining an information processing device according to an embodiment. FIG. 2 is a block diagram for explaining an information processing device according to an embodiment. FIG. 3 is a flowchart for explaining an information processing method according to an embodiment. 【0006】 Hereinafter, an embodiment will be described. 
【0007】 [Overview of Information Processing Apparatus 100] First, an overview of the information processing apparatus 100 according to an embodiment will be described. FIG. 1 is a diagram (schematic diagram) for explaining the information processing apparatus 100 according to an embodiment. 【0008】Conventionally, various information related to communication is acquired as training data, and a trained model is generated using this training data. In actual multi-user MIMO (MU-MIMO) communication, a user selection estimate is obtained based on the data input to the access point and the trained model, and then MU-MIMO communication with other communication devices is performed based on that estimate. However, communication conditions are diverse, and it is not possible to perform appropriate terminal pairing control according to these communication conditions with a single trained model. 【0009】 Conventionally, the system determines which terminals should be paired according to the pairing algorithm implemented in the base station for multi-user MIMO communication. However, because this algorithm is uniform, it is difficult to change once it is implemented in the base station, and it is also difficult to cope when the traffic behavior of the communication network changes. 【0010】 In this embodiment, the information processing device 100 selects a pre-trained model generated based on an environment similar to the actual communication conditions occurring at a base station in a multi-user MIMO, and uses the selected pre-trained model to estimate the paired terminal. The information processing device 100 generates a base AI / ML (artificial intelligence / machine learning) pre-trained model in a simulation environment 200 that simulates the traffic behavior of a communication network. In this case, the information processing device 100 prepares multiple base pre-trained models by training on multiple traffic behavior patterns. 
Next, the information processing device 100 puts each pre-trained model into a real environment 300 that is similar to the environment set in the training of each pre-trained model, and performs reinforcement learning. The information processing device 100 defines an indicator for determining when the behavior of communication traffic has changed, and optimizes by changing the pre-trained model used according to the change in that indicator. As a result, the information processing device 100 can perform optimal pairing control in situations such as nighttime when communication traffic is low, commuting hours when communication traffic from moving users is relatively high, and nighttime when communication traffic from moving users is relatively low. 【0011】 In other words, first, the information processing device 100 generates multiple communication conditions (communication condition 1, communication condition 2, etc.) in a simulated communication environment (under the simulation environment 200). The information processing device 100 simulates the traffic patterns of a real communication network in the simulation environment 200. The communication conditions may be, for example, the conditions of communication traffic (traffic patterns). In other words, the information processing device 100 performs simulations of multiple communication traffics. As an example in this case, the information processing device 100 may perform simulations of communication traffic according to different time periods, regions, and user movement conditions, such as daytime and nighttime, urban areas and residential areas, and areas with a relatively large number of moving users and areas with a relatively small number of moving users. 
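The generation of multiple communication conditions described above can be sketched as follows. This is a minimal illustration, assuming that a communication condition is fully described by the three axes named in the text (time period, region, and user movement); the class name `CommunicationCondition`, the function `generate_conditions`, and the label strings are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence


@dataclass(frozen=True)
class CommunicationCondition:
    """One simulated traffic pattern (hypothetical representation)."""
    time_period: str  # e.g. "daytime" or "nighttime"
    region: str       # e.g. "urban" or "residential"
    mobility: str     # e.g. "many_moving_users" or "few_moving_users"


def generate_conditions(
    time_periods: Sequence[str],
    regions: Sequence[str],
    mobilities: Sequence[str],
) -> List[CommunicationCondition]:
    """Enumerate every combination of the three axes as a separate condition."""
    return [
        CommunicationCondition(t, r, m)
        for t, r, m in product(time_periods, regions, mobilities)
    ]


# Communication condition 1, communication condition 2, etc.
conditions = generate_conditions(
    ["daytime", "nighttime"],
    ["urban", "residential"],
    ["many_moving_users", "few_moving_users"],
)
```

Under these assumptions, each element of `conditions` would drive one simulation run in the simulation environment 200 and yield one pre-trained model.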
【0012】 The information processing device 100 learns multiple pieces of information related to communication when performing multi-user MIMO communication between the base station 210 and each of the multiple terminals 220 in each of the multiple communication situations, and generates a trained model. In this case, the information processing device 100 may use the learning platform 115 (for example, the first learning unit 112 and the second learning unit 113, etc. (see Figure 2)) to learn multiple pieces of information related to communication and generate a trained model. That is, the information processing device 100 may, for example, generate a trained model according to each of the multiple communication situations (for each of the multiple communication situations (multiple traffic patterns)). This makes it possible for the information processing device 100 to extract the characteristics of differences depending on the communication situation. Multiple pieces of information related to communication may be, for example, correlation information that records the optimal combination when pairing between multiple terminals 220, and multiple pieces of information acquired by the base station 210 in a simulated communication environment. Multiple pieces of information acquired by the base station 210 in a simulated communication environment may be, for example, sounding reference information (SRS), traffic information, cell load information, and wireless quality information. 【0013】The information processing device 100 implements the trained model generated as described above at the base station 310 in the actual communication environment (actual environment 300), and performs reinforcement learning using information acquired through multi-user MIMO communication between the base station 310 and each of the multiple terminals 320 in the actual communication environment. 
The information processing device 100 may, for example, perform reinforcement learning using the learning platform 115. For example, as reinforcement learning, the information processing device 100 uses the trained model described above at the base station 310 in the actual communication environment (multi-user MIMO communication in the actual environment 300) to learn which terminals 320 should be paired (that is, how the multiple terminals 320 should be grouped) to improve the efficiency of the communication. In other words, the information processing device 100 performs reinforcement learning to learn which pairs of terminals 320 should be formed to improve the efficiency of multi-user MIMO communication. 【0014】 The information processing device 100 uses the trained models for each of the multiple communication situations, generated by reinforcement learning, to set pairing combinations between the multiple terminals 320 in multi-user MIMO communication between the base station 310 and each of the multiple terminals 320 in a real communication environment (multi-user MIMO communication in the real environment 300). 【0015】 The information processing device 100 may be configured, for example, as a generation device that generates a trained model that estimates which pair of terminals 320 is optimal to form when performing multi-user MIMO communication. The information processing device 100 may also be configured, for example, as a reinforcement learning device that implements the generated trained model on a base station 310 in a real communication environment (multi-user MIMO communication in the real environment 300) and performs reinforcement learning. 
The information processing device 100 may also be configured, for example, as a setting device that uses the results of reinforcement learning to set pairing combinations between multiple terminals 320 that communicate with the base station 310 in a real communication environment (multi-user MIMO communication in the real environment 300). That is, the information processing device 100 may be configured, for example, as a control device that performs pairing control between multiple terminals 320. Furthermore, the information processing device 100 may also constitute a part of a communication device (not shown) installed at the base station 310. The information processing device 100 is not limited to the example device described above, but may constitute various other devices. The information processing device 100 may be a computer such as a server, desktop, laptop, tablet, or smartphone. 【0016】 This disclosure relates to RIC (RAN Intelligent Controller). 【0017】 [Details of the Information Processing Device 100] Next, an information processing device 100 according to one embodiment will be described in detail. Figure 2 is a block diagram illustrating the information processing device 100 according to one embodiment. 【0018】The information processing device 100 includes, for example, a communication unit 121, a storage unit 122, a display unit 123, and a control unit 110. The communication unit 121, the storage unit 122, and the display unit 123 may be embodiments of the output unit. The control unit 110 includes, for example, a generation unit 111, a first learning unit 112, a second learning unit 113, and a setting unit 114. The control unit 110 may be configured by, for example, the arithmetic processing unit of the information processing device 100. 
The control unit 110 (for example, the arithmetic processing unit) may realize the functions of each unit (for example, the generation unit 111, the first learning unit 112, the second learning unit 113, and the setting unit 114) by appropriately reading and executing various programs stored in the storage unit 122, etc. That is, the functions of each unit may be realized by computer implementation. The first learning unit 112 and the second learning unit 113 may correspond to the "learning platform 115" illustrated in Figure 1. 【0019】 The communication unit 121 is a communication interface that enables the transmission and reception of various types of information with, for example, an external device located outside the information processing device 100. Examples of the external device include the base station 310 and a server (not shown) in the actual environment 300. However, the external device is not limited to the base station 310 and the server. 【0020】 The storage unit 122 may store various information and programs, for example. Examples of the storage unit 122 include memory, solid-state drives, and hard disk drives. The storage unit 122 may also be, for example, a storage area or server located in the cloud. 【0021】 The display unit 123 is a display capable of displaying various characters, symbols, images, etc. 【0022】 The generation unit 111 generates multiple communication conditions under a simulated communication environment (under the simulation environment 200). The communication conditions may be, for example, communication traffic conditions (traffic patterns). In other words, the generation unit 111 simulates multiple communication traffic patterns. 
As an example in this case, the generation unit 111 may simulate communication traffic according to time of day, region, and user movement conditions, such as daytime and nighttime, urban areas and residential areas, and situations where there are relatively many moving users and relatively few moving users. 【0023】 The first learning unit 112 learns multiple pieces of information related to communication when performing multi-user MIMO communication between the base station 210 and each of the multiple terminals 220 for each of the multiple communication situations generated by the generation unit 111, and generates a trained model. The multiple pieces of information used for learning in the first learning unit 112 may include correlation information that records the optimal combination when pairing between multiple terminals 220, as well as sounding reference information, traffic information, cell load information, and wireless quality information acquired by the base station 210 in a simulated communication environment. The correlation information may include, for example, information that records the optimal combination of terminals 220 to pair when performing pairing control between multiple terminals 220 communicating via the base station 210. Sounding reference information (SRS) may also be called, for example, a sounding reference signal. Traffic information may include, for example, various types of information (various data) transmitted and received on the communication network and the types of that information. The types of information may include, for example, voice information, video information, and text information. Cell load information may refer to, for example, information on the load of the communication cells formed by the base station 210 (cell load). 
Wireless quality information may refer to, for example, information on the quality of communication when the base station 210 communicates with the terminal 220. In other words, the first learning unit 112 generates multiple trained models according to the communication conditions (communication traffic conditions) for each time period, region, and user movement situation under the simulation environment 200. 【0024】 The second learning unit 113 implements the trained model generated by the first learning unit 112 in a real communication environment and performs reinforcement learning using information acquired through multi-user MIMO communication between the base station 310 and each of the multiple terminals 320 in the real communication environment. In other words, the second learning unit 113 puts multiple trained models generated by the first learning unit 112 into the real environment 300 (actual communication network) and performs reinforcement learning using each of the multiple trained models. 【0025】 The second learning unit 113 selects a trained model according to the actual environment 300, based on the communication environment when the first learning unit 112 generates the trained model, that is, the communication environment generated by the generation unit 111, specifically the time of day, region, and user movement status under the simulation environment 200. In other words, the second learning unit 113 selects a trained model from among several trained models that was generated under the communication conditions of a simulated communication that corresponds to the communication conditions of the actual environment 300. 
As a specific example, in the case of commuting hours when communication traffic from moving users is relatively high in the actual environment 300, the second learning unit 113 selects the trained model generated for the simulated time period that corresponds to those commuting hours (a trained model generated in the simulation environment 200 during commuting hours when communication traffic from moving users is relatively high). 【0026】 The second learning unit 113 performs reinforcement learning in the actual environment 300 using the selected pre-trained model. For example, as reinforcement learning, the second learning unit 113 uses the pre-trained model described above at the base station 310 in the actual communication environment (multi-user MIMO communication in the actual environment 300) to learn which terminals 320 should be paired (that is, how the multiple terminals 320 should be grouped) to improve the efficiency of communication between the base station 310 and the terminals 320. In other words, the second learning unit 113 performs reinforcement learning to determine which pairs of terminals 320 should be formed to improve the efficiency of multi-user MIMO communication. 【0027】 The setting unit 114 uses the trained models for each of the multiple communication conditions generated by reinforcement learning performed by the second learning unit 113 to set pairing combinations between multiple terminals 320 in multi-user MIMO communication between the base station 310 and each of the multiple terminals 320 in a real communication environment. 
The setting unit 114 selects the result (trained model after reinforcement learning) that corresponds to the current communication condition of the base station 310 from among the results of reinforcement learning performed according to each communication condition of the real environment 300 (multiple trained models after reinforcement learning). As a specific example, if the current communication condition is the communication traffic situation during commuting hours when there are relatively many moving users, the setting unit 114 selects the trained model for which reinforcement learning was performed under that communication condition. The setting unit 114 may, for example, refer to the current time and the behavior of the communication traffic of the base station 310 (current communication condition), i.e., sounding reference information, traffic information, cell load information, and wireless quality information obtained at the base station 310 in the real environment 300, and select the trained model that corresponds to the current communication condition from among the multiple trained models. The setting unit 114 uses the selected trained model (a trained model that has undergone reinforcement learning) to set the pairing of multiple terminals 320 in multi-user MIMO communication (performs pairing control). Specifically, the setting unit 114 estimates the optimal pair of terminals 320 in the current communication situation based on the sounding reference information, traffic information, cell load information, and wireless quality information acquired by the base station 310 in the actual environment 300, and on the trained model selected as described above. Based on the estimated pair of terminals 320, the setting unit 114 performs pairing control among the multiple terminals 320. 
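The selection and estimation steps of the setting unit 114 described above can be sketched as follows. This is a minimal illustration under several assumptions: the indicator thresholds in `classify_condition`, the condition labels, the per-terminal features `angle` and `quality`, and the lambda scorers standing in for the trained models after reinforcement learning are all hypothetical and are not specified in the disclosure.

```python
from itertools import combinations
from typing import Callable, Dict, Tuple

Features = Dict[str, float]
PairScorer = Callable[[Features, Features], float]


def classify_condition(hour: int, moving_user_ratio: float) -> str:
    """Hypothetical indicator: map current time and traffic behavior to a condition label."""
    if hour >= 22 or hour < 6:
        return "nighttime"
    return "many_moving" if moving_user_ratio >= 0.5 else "few_moving"


def select_model(models: Dict[str, PairScorer], condition: str) -> PairScorer:
    """Pick the model that was reinforced under the matching communication condition."""
    return models[condition]


def best_pair(terminals: Dict[str, Features], scorer: PairScorer) -> Tuple[str, str]:
    """Score every candidate pair with the selected model and return the best one."""
    return max(
        combinations(sorted(terminals), 2),
        key=lambda pair: scorer(terminals[pair[0]], terminals[pair[1]]),
    )


# Toy stand-ins for the models after reinforcement learning (assumptions only).
models: Dict[str, PairScorer] = {
    "nighttime": lambda a, b: a["quality"] + b["quality"],
    "many_moving": lambda a, b: abs(a["angle"] - b["angle"]),
    "few_moving": lambda a, b: a["quality"] + b["quality"],
}

# Toy per-terminal measurements, standing in for SRS and wireless quality information.
terminals = {
    "ue1": {"angle": 10.0, "quality": 0.9},
    "ue2": {"angle": 15.0, "quality": 0.8},
    "ue3": {"angle": 80.0, "quality": 0.3},
}

condition = classify_condition(hour=8, moving_user_ratio=0.7)
pair = best_pair(terminals, select_model(models, condition))
```

In this sketch, calling `classify_condition` again with a different `moving_user_ratio` yields a different label, so a different model is selected and a different pair may be chosen for the same terminals.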
【0028】 When the communication status changes in multi-user MIMO communication between the base station 310 and each of the multiple terminals 320 in a real communication environment, the setting unit 114 may select, from among the trained models corresponding to each of the multiple communication statuses, the trained model that corresponds to the communication status after the change in the real communication environment, and use the selected trained model to set the pairing combinations between the multiple terminals 320. As a specific example, if the current communication status changes from a situation with communication traffic during commuting hours when there are relatively many moving users to a situation with communication traffic during commuting hours when there are relatively few moving users, the setting unit 114 selects the trained model that underwent reinforcement learning under the communication status after the change, that is, under communication traffic during commuting hours when there were relatively few moving users. Based on the selected trained model (a trained model that underwent reinforcement learning) and the sounding reference information, traffic information, cell load information, and wireless quality information acquired by the base station 310 in the real environment 300, the setting unit 114 sets the pairing of the multiple terminals 320 in multi-user MIMO communication (performs pairing control). 【0029】 The setting unit 114 may, for example, control the output unit to output a history (log) of the setting of pairing combinations between multiple terminals 320, i.e., of pairing control. The output unit may be, for example, any of the communication unit 121, the storage unit 122, and the display unit 123. That is, the setting unit 114 may, for example, control the communication unit 121 to transmit information of the pairing control history (log) to an external device (not shown). The external device here may be, for example, a server. 
The setting unit 114 may, for example, control the storage unit 122 to store information of the pairing control history (log). The setting unit 114 may, for example, control the display unit 123 to display the pairing control history (log). 【0030】 [Information Processing Method] Next, an information processing method according to one embodiment will be described. Figure 3 is a flowchart illustrating the information processing method according to one embodiment. 【0031】In step ST101, the generation unit 111 generates multiple communication conditions under a simulated communication environment (under the simulation environment 200). 【0032】 In step ST102, the first learning unit 112 learns multiple pieces of information regarding communication when performing multi-user MIMO communication between the base station 210 and each of the multiple terminals 220 for each of the multiple communication situations generated in step ST101, and generates a trained model. The multiple pieces of information used for learning by the first learning unit 112 may include correlation information that records the optimal combination when pairing is performed between the multiple terminals 220 (when performing pairing control between multiple terminals 220 communicating via the base station 210), and various pieces of information acquired by the base station 210 in a simulated communication environment. The various pieces of information acquired by the base station 210 in a simulated communication environment may include, for example, sounding reference information, traffic information, cell load information, and radio quality information. 
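Step ST102 can be sketched as training one model per communication condition from labelled pairing records. The disclosure does not specify a learning algorithm, so this sketch swaps in a simple perceptron purely as a stand-in. The `Sample` layout, the two-element feature vector, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Sample:
    condition: str             # simulated communication condition that produced the record
    features: Sequence[float]  # e.g. [srs_correlation, cell_load] for one candidate pair
    is_optimal_pair: int       # 1 if the correlation information marks the pair as optimal


def predict(weights: Sequence[float], features: Sequence[float]) -> int:
    """Linear threshold unit; the last weight is the bias."""
    x = list(features) + [1.0]
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


def train_perceptron(samples: List[Sample], epochs: int = 100, lr: float = 0.1) -> List[float]:
    """Classic perceptron update (stand-in learner, not the method of the disclosure)."""
    weights = [0.0] * (len(samples[0].features) + 1)
    for _ in range(epochs):
        for s in samples:
            error = s.is_optimal_pair - predict(weights, s.features)
            x = list(s.features) + [1.0]
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights


def train_per_condition(samples: List[Sample]) -> Dict[str, List[float]]:
    """Group records by communication condition and fit one model for each group."""
    grouped: Dict[str, List[Sample]] = defaultdict(list)
    for s in samples:
        grouped[s.condition].append(s)
    return {cond: train_perceptron(group) for cond, group in grouped.items()}


# Toy simulated records: low SRS correlation is optimal in one condition,
# low cell load in the other (invented values for illustration).
samples = [
    Sample("daytime_urban", [0.1, 0.5], 1),
    Sample("daytime_urban", [0.9, 0.5], 0),
    Sample("daytime_urban", [0.2, 0.4], 1),
    Sample("daytime_urban", [0.8, 0.6], 0),
    Sample("nighttime_residential", [0.5, 0.2], 1),
    Sample("nighttime_residential", [0.5, 0.9], 0),
]
pretrained_models = train_per_condition(samples)
```

Keeping one weight vector per condition mirrors the idea in the text that differences between communication situations are captured by separate trained models.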
【0033】 In step ST103, the second learning unit 113 implements the trained model generated in step ST102 on the base station 310 in the actual communication environment (actual environment 300), and performs reinforcement learning using information obtained through multi-user MIMO communication between the base station 310 and each of the multiple terminals 320 in the actual communication environment. 【0034】 In step ST104, the setting unit 114 uses the learned models for each of the multiple communication situations generated by reinforcement learning in step ST103 to set pairing combinations between the multiple terminals 320 in the multi-user MIMO communication between the base station 310 and each of the multiple terminals 320 under the actual communication environment (actual environment 300). If the communication situation changes in the multi-user MIMO communication between the base station 310 and each of the multiple terminals 320 under the actual communication environment (actual environment 300), the setting unit 114 may select a learned model from among the learned models corresponding to each of the multiple communication situations that corresponds to the communication situation after the change under the actual communication environment (actual environment 300), and use the selected learned model to set pairing combinations between the multiple terminals 320. 【0035】[Functions and Circuitry] Next, the functions and circuitry of the information processing device 100 described above will be explained. Each part of the information processing device 100 may be implemented as a function of a computer's arithmetic processing unit or the like. 
The information processing device 100 may implement the functions of the generation unit 111, the first learning unit 112, the second learning unit 113, and the setting unit 114 by a single control unit 110 (e.g., an arithmetic processing unit, etc.), or the functions of the generation unit 111, the first learning unit 112, the second learning unit 113, and the setting unit 114 may be implemented in a distributed manner by multiple different control units 110 (e.g., arithmetic processing units, etc.). The generation unit 111, the first learning unit 112, the second learning unit 113, and the setting unit 114 (control unit 110) of the information processing device 100 described above may be implemented as a generation function, a first learning function, a second learning function, and a setting function (control function), respectively, by a computer's arithmetic processing unit or the like. An information processing program can have a computer implement each of the above functions. The information processing program may be recorded on a computer-readable, non-transitory, tangible recording medium such as memory, a solid-state drive, a hard disk drive, or an optical disc. The recording medium may also be described as a non-transitory, tangible, computer-readable medium that stores the information processing program. The information processing program may also be transmitted online. The information processing program can be implemented as a product (computer program product) by the control unit 110 (e.g., an arithmetic processing unit). As described above, each part of the information processing device 100 may be implemented by a computer's arithmetic processing unit, etc. This arithmetic processing unit, etc., is composed of, for example, an integrated circuit. For this reason, each part of the information processing device 100 may be implemented as a circuit constituting an arithmetic processing unit, etc. 
That is, the generation unit 111, the first learning unit 112, the second learning unit 113, and the setting unit 114 (control unit 110) of the information processing device 100 may be implemented as a generation circuit, a first learning circuit, a second learning circuit, and a setting circuit (control circuit) constituting a computer's arithmetic processing unit, etc. Furthermore, the communication unit 121, storage unit 122, and display unit 123 (output unit) of the information processing device 100 may be implemented as a communication function, storage function, and display function (output function) that include the functions of an arithmetic processing unit, for example. Also, the communication unit 121, storage unit 122, and display unit 123 (output unit) of the information processing device 100 may be implemented as a communication circuit, storage circuit, and display circuit (output circuit) by being composed of an integrated circuit, for example. Furthermore, the communication unit 121, storage unit 122, and display unit 123 (output unit) of the information processing device 100 may be configured as a communication device, storage device, and display device (output device) by being composed of a plurality of devices, for example. 【0036】 The information processing device 100 can be configured by combining any one or more of the above-described parts. In this disclosure, the term "information" is used, but the term "information" can be replaced with "data," and the term "data" can be replaced with "information." 【0037】 [Aspects and Effects of this Embodiment] Next, aspects of this embodiment and the effects of each aspect will be described. Note that the aspects described below are examples as of the time of filing; this embodiment is not limited to them and may be realized by appropriately combining the parts described above. 
Also, a lower-level aspect may be referenced in any of the higher-level aspects. Furthermore, the effects of this embodiment described below are examples, and the effects of each aspect are not limited to those described below. Also, each aspect may have at least one of the effects described below, for example. 【0038】(Aspect 1) An information processing device in one aspect comprises: a generation unit that generates multiple communication situations in a simulated communication environment; a first learning unit that learns multiple pieces of information relating to communication when performing multi-user MIMO communication between a base station and multiple terminals for each of the multiple communication situations generated by the generation unit and generates a trained model; and a second learning unit that implements the trained model generated by the first learning unit in a real communication environment and performs reinforcement learning using information acquired through multi-user MIMO communication between a base station and multiple terminals in a real communication environment. As a result, the information processing device can generate a trained model and perform reinforcement learning for optimal pairing control of multiple terminals at a base station performing multi-user MIMO communication. Furthermore, since the information processing device can generate trained models (multiple trained models) according to multiple communication situations (multiple traffic patterns) (for each of the multiple communication situations), it is possible to select a trained model according to the communication situation (traffic pattern of the real environment). 
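The reinforcement learning of the second learning unit in Aspect 1 can be sketched as refining pre-trained pair values with feedback from the real environment. The disclosure does not name a reinforcement learning algorithm or a reward, so this sketch assumes an epsilon-greedy value update with measured throughput as the reward. The function `fine_tune`, its parameters, and the numeric values are hypothetical.

```python
import random
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]


def fine_tune(
    pretrained: Dict[Pair, float],
    measure_reward: Callable[[Pair], float],
    steps: int = 1000,
    epsilon: float = 0.1,
    alpha: float = 0.2,
    seed: int = 0,
) -> Dict[Pair, float]:
    """Refine per-pair value estimates with epsilon-greedy exploration.

    pretrained: value of each candidate pair as learned in simulation.
    measure_reward: reward observed in the real environment (assumed: throughput).
    """
    rng = random.Random(seed)
    values = dict(pretrained)  # copy so the simulation-trained values stay intact
    pairs = sorted(values)
    for _ in range(steps):
        if rng.random() < epsilon:
            pair = rng.choice(pairs)           # explore a random pairing
        else:
            pair = max(pairs, key=values.get)  # exploit the current best pairing
        reward = measure_reward(pair)
        values[pair] += alpha * (reward - values[pair])  # move estimate toward reward
    return values


# Simulation favoured (ue1, ue2); the real environment rewards (ue1, ue3) more.
pretrained = {("ue1", "ue2"): 1.0, ("ue1", "ue3"): 0.5, ("ue2", "ue3"): 0.5}
real_throughput = {("ue1", "ue2"): 0.8, ("ue1", "ue3"): 2.0, ("ue2", "ue3"): 0.8}
tuned = fine_tune(pretrained, lambda pair: real_throughput[pair])
```

Starting from the simulation-trained values rather than from zero is the point of the two-stage design: the real environment only has to correct the estimates where simulation and reality differ.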
【0039】 (Aspect 2) An information processing device in one aspect may include a setting unit that uses the trained models for each of the multiple communication situations, generated by reinforcement learning in the second learning unit, to set pairing combinations between multiple terminals in multi-user MIMO communication between a base station and each of the multiple terminals in a real communication environment. This allows the information processing device to perform optimal pairing control of multiple terminals according to the current communication situation (real communication environment) at a base station performing multi-user MIMO communication. In other words, the information processing device can select a trained model according to the communication situation (traffic pattern of the real environment) and use the selected trained model to perform pairing control between multiple terminals according to that communication situation.
【0040】 (Aspect 3) In one aspect of the information processing device, when the communication situation changes in multi-user MIMO communication between a base station and each of multiple terminals in a real communication environment, the setting unit may select, from among the multiple trained models corresponding to the respective communication situations, the trained model that corresponds to the communication situation after the change in the real communication environment, and use the selected trained model to set the pairing combination between the multiple terminals. This allows the information processing device to select the optimal trained model for controlling the pairing of multiple terminals at a base station performing multi-user MIMO communication, and to perform appropriate pairing control according to the current communication situation.
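The selection behavior of the setting unit can be sketched as follows. This is an illustrative assumption, not the disclosed implementation: the situation is classified from a single cell-load value by hypothetical thresholds, each model is assumed to be a callable that scores a terminal pair, and pairs are formed greedily from the highest score downward.

```python
from itertools import combinations


def classify_situation(cell_load, thresholds=(0.3, 0.7)):
    # Hypothetical rule mapping a measured cell load (0..1) to a situation label.
    low, high = thresholds
    if cell_load < low:
        return "low_load"
    if cell_load < high:
        return "medium_load"
    return "high_load"


class SettingUnit:
    def __init__(self, models):
        self.models = models   # situation label -> callable scoring a terminal pair
        self.current = None    # label of the currently selected model

    def update(self, cell_load):
        # Re-select the model; return True when the communication situation changed.
        situation = classify_situation(cell_load)
        changed = situation != self.current
        self.current = situation
        return changed

    def set_pairing(self, terminals):
        # Greedy pairing: take the best-scored pair whose terminals are both free.
        score = self.models[self.current]
        free = set(terminals)
        pairs = []
        candidates = sorted(combinations(sorted(terminals), 2), key=score, reverse=True)
        for a, b in candidates:
            if a in free and b in free:
                pairs.append((a, b))
                free -= {a, b}
        return pairs


# Toy scoring functions standing in for trained models (assumptions).
models = {
    "low_load": lambda p: -abs(p[0] - p[1]),   # prefers neighboring indices
    "medium_load": lambda p: 0.0,
    "high_load": lambda p: abs(p[0] - p[1]),   # prefers widely separated indices
}

unit = SettingUnit(models)
unit.update(0.9)                       # situation becomes "high_load"
print(unit.set_pairing([0, 1, 2, 3]))
unit.update(0.1)                       # situation changes to "low_load"
print(unit.set_pairing([0, 1, 2, 3]))
```

Keeping one model per situation means a change in traffic pattern only requires a dictionary lookup rather than retraining, which matches the stated aim of making the pairing control easy to change.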
Furthermore, if the information processing device selects a trained model suitable for the real communication environment (real environment) from among the multiple trained models, it can perform pairing control according to that real communication environment. This makes it easy to change the pairing control and enables optimal pairing control even when the behavior of communication traffic changes. In other words, when the communication situation (traffic pattern) changes at a base station in a real environment, the information processing device can select a trained model corresponding to the changed communication situation (traffic pattern) and use the selected trained model to perform pairing control between multiple terminals according to the changed communication situation (traffic pattern).
【0041】 (Aspect 4) In one aspect of the information processing device, the multiple pieces of information used for learning in the first learning unit may include correlation information that records the optimal combination when pairing multiple terminals, as well as sounding reference information, traffic information, cell load information, and wireless quality information acquired at a base station in a simulated communication environment. This allows the information processing device to generate a trained model for optimal pairing control of multiple terminals at a base station performing multi-user MIMO communication. Furthermore, the information processing device can generate a trained model for each communication situation (traffic pattern) at the base station.
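The categories of learning information listed in Aspect 4 could be held in a record such as the one sketched below. The disclosure names only the categories; the field names, types, units, and the flattening into a feature vector are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrainingSample:
    situation: str                         # label of the generated communication situation
    sounding_reference: List[List[float]]  # per-terminal channel estimate (assumed layout)
    traffic: List[float]                   # per-terminal traffic volume (assumed unit)
    cell_load: float                       # load of the cell, assumed to be in 0..1
    radio_quality: List[float]             # per-terminal wireless quality (assumed metric)
    optimal_pairs: List[Tuple[int, int]]   # correlation information: recorded best pairing

    def features(self) -> List[float]:
        # Flatten the inputs into one vector; optimal_pairs is kept as the label.
        flat = [x for row in self.sounding_reference for x in row]
        return flat + self.traffic + [self.cell_load] + self.radio_quality

    def validate(self) -> None:
        # Check that per-terminal fields agree and that no terminal is paired twice.
        n = len(self.traffic)
        if len(self.sounding_reference) != n or len(self.radio_quality) != n:
            raise ValueError("per-terminal fields must have the same length")
        used = [t for pair in self.optimal_pairs for t in pair]
        if len(used) != len(set(used)) or any(not 0 <= t < n for t in used):
            raise ValueError("optimal_pairs must use each terminal index at most once")


sample = TrainingSample(
    situation="high_load",
    sounding_reference=[[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]],
    traffic=[5.0, 1.0, 3.0, 2.0],
    cell_load=0.85,
    radio_quality=[12.0, 8.5, 10.0, 9.0],
    optimal_pairs=[(0, 1), (2, 3)],
)
sample.validate()
```

Storing the situation label with each sample makes it straightforward to group samples and train one model per communication situation.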
【0042】 (Aspect 5) In an information processing method of one aspect, a computer executes a generation step of generating a plurality of communication situations in a simulated communication environment, and a first learning step of generating a trained model by learning, for each of the plurality of communication situations generated in the generation step, a plurality of pieces of information related to communication when performing multi-user MIMO communication between a base station and each of a plurality of terminals. The computer then executes a second learning step of implementing the trained model generated in the first learning step in a real communication environment and performing reinforcement learning using information obtained by multi-user MIMO communication between the base station and each of the plurality of terminals in the real communication environment. As a result, the information processing method can achieve the same effects as the information processing device of one aspect described above.
【0043】 (Aspect 6) An information processing program of one aspect causes a computer to realize a generation unit that generates a plurality of communication situations in a simulated communication environment, a first learning unit that generates a trained model by learning, for each of the plurality of communication situations generated by the generation unit, a plurality of pieces of information related to communication when performing multi-user MIMO communication between a base station and each of a plurality of terminals, and a second learning unit that implements the trained model generated by the first learning unit in a real communication environment and performs reinforcement learning using information obtained by multi-user MIMO communication between the base station and each of the plurality of terminals in the real communication environment.
As a result, the information processing program can achieve the same effects as the information processing device of one aspect described above.
【0044】 By using the present disclosure described above, it is possible to contribute to the achievement of Sustainable Development Goal (SDG) 9, "Build the infrastructure for industry and innovation."
【0045】
100 Information processing device
110 Control unit
111 Generation unit
112 First learning unit
113 Second learning unit
114 Setting unit
115 Learning infrastructure
121 Communication unit
122 Storage unit
123 Display unit
200 Simulation environment (simulated communication environment)
210 Base station in the simulation environment
220 Terminal in the simulation environment
300 Real environment (real communication environment)
310 Base station in the real environment
320 Terminal in the real environment

Claims

1. An information processing device comprising: a generation unit that generates multiple communication situations in a simulated communication environment; a first learning unit that, for each of the multiple communication situations generated by the generation unit, learns multiple pieces of information relating to communication when multi-user MIMO communication is performed between a base station and multiple terminals, and generates a trained model; and a second learning unit that implements the trained model generated by the first learning unit in a real communication environment and performs reinforcement learning using information obtained through multi-user MIMO communication between the base station and multiple terminals in the real communication environment.

2. The information processing device according to claim 1, further comprising a setting unit that sets pairing combinations between multiple terminals in multi-user MIMO communication between a base station and each of the multiple terminals in the real communication environment, using the trained models for each of the multiple communication situations generated by reinforcement learning performed in the second learning unit.

3. The information processing device according to claim 2, wherein, when the communication situation changes in the multi-user MIMO communication between the base station and each of the multiple terminals in the real communication environment, the setting unit selects, from among the multiple trained models corresponding to the respective communication situations, the trained model that corresponds to the communication situation after the change in the real communication environment, and uses the selected trained model to set the pairing combination between the multiple terminals.

4. The information processing device according to claim 1, wherein the multiple pieces of information used for learning in the first learning unit are correlation information recording the optimal combination when pairing multiple terminals, and sounding reference information, traffic information, cell load information, and wireless quality information acquired by the base station in the simulated communication environment.

5. An information processing method comprising: a generation step in which a computer generates multiple communication situations in a simulated communication environment; a first learning step in which a computer learns multiple pieces of information relating to communication when a base station and multiple terminals communicate using multi-user MIMO for each of the multiple communication situations generated by the generation step and generates a trained model; and a second learning step in which the trained model generated by the first learning step is implemented in a real communication environment and reinforcement learning is performed using information obtained through multi-user MIMO communication between a base station and multiple terminals in the real communication environment.

6. An information processing program that enables a computer to implement: a generation unit that generates multiple communication situations in a simulated communication environment; a first learning unit that learns multiple pieces of information related to communication when multi-user MIMO communication is performed between a base station and multiple terminals for each of the multiple communication situations generated by the generation unit and generates a trained model; and a second learning unit that implements the trained model generated by the first learning unit in a real communication environment and performs reinforcement learning using information obtained through multi-user MIMO communication between a base station and multiple terminals in the real communication environment.