Information processing apparatus, information processing method, and program
By partitioning data and models across multiple processors for parallel learning and inference, the method addresses the challenge of balancing speed and accuracy in combinatorial optimization, achieving efficient and precise solutions for large-scale problems.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: NEC CORP
- Filing Date: 2025-12-10
- Publication Date: 2026-06-25
AI Technical Summary
Existing approaches to combinatorial optimization problems struggle to balance solution speed against accuracy: as the problem size increases, they tend to be either too slow or inaccurate.
The target data and the learning model are divided into partial data and partial models, which are distributed across multiple processors. Each processor learns its partial model, or infers solutions with reference to node, edge, and image information, so that the processing runs in parallel.
Distributing the workload across multiple processors in this way enables fast and accurate solutions to large-scale combinatorial optimization problems.
Description
Information processing device, information processing method, and program
[0001] This disclosure relates to an information processing device, an information processing method, and a program.
[0002] Techniques for performing efficient work by solving combinatorial optimization problems are known in various industries. For example, Patent Document 1 discloses the use of a learning model for generating transportation plans (delivery planning problems) as an example of a combinatorial optimization problem.
[0003] Japanese Patent Publication No. 2021-135784
[0004] However, in combinatorial optimization problems, it is difficult to balance speed of solving the problem with accuracy of the solution. For example, as the size of the problem increases, it becomes difficult to solve it within a realistic time. On the other hand, if priority is given to solving the problem within a realistic time, the accuracy of the solution decreases.
[0005] This disclosure has been made in view of the above-mentioned problems, and one exemplary objective is to provide a technology that can achieve both high speed and high accuracy in the solution for combinatorial optimization problems.
[0006] An information processing device relating to an illustrative aspect of this disclosure includes an acquisition means for acquiring target data, an allocation means for distributing and allocating a plurality of partial data obtained by dividing the target data and a plurality of partial models obtained by dividing a learning model which is the target of learning and refers to the target data, to a plurality of processors, and a learning means for causing each processor to learn the partial model allocated to the processor by referring to at least one of the plurality of partial data.
[0007] An information processing device relating to an illustrative aspect of this disclosure includes an acquisition means for acquiring target data, an allocation means for distributing and allocating a plurality of partial models obtained by dividing a learning model to a plurality of processors, and an inference means for causing each processor to perform an inference process by inputting at least a portion of the target data to the partial model allocated to that processor, wherein the inference process includes a solution derivation process that references node information, edge information, and image information extracted from at least a portion of the target data.
[0008] An example of an information processing method relating to this disclosure includes a computer acquiring target data, distributing and allocating a plurality of partial data obtained by dividing the target data and a plurality of partial models obtained by dividing a learning model that is the target of learning and references the target data to a plurality of processors, and having each processor learn the partial model assigned to it by referencing at least one of the plurality of partial data.
[0009] An example of an information processing method relating to this disclosure includes a computer acquiring target data, distributing and allocating a plurality of partial models obtained by dividing a learning model to a plurality of processors, and causing each processor to perform inference processing by inputting at least a portion of the target data to the partial model allocated to that processor, wherein the inference processing includes a solution derivation process that references node information, edge information, and image information extracted from at least a portion of the target data.
[0010] Each aspect of the present invention may be implemented by a computer, in which case a program that enables the computer to implement the information processing device by operating the computer as each part (software element) of the information processing device, and a computer-readable recording medium on which the program is recorded are also included in the scope of this disclosure.
[0011] According to an illustrative aspect of this disclosure, one exemplary effect is that it is possible to achieve both high speed and high accuracy in the solution for combinatorial optimization problems.
[0012] Figure 1 is a block diagram showing the configuration of an information processing device according to this disclosure. Figure 2 is a flowchart showing the flow of an information processing method according to this disclosure. Figure 3 is a block diagram showing the configuration of an information processing device according to this disclosure. Figure 4 is a flowchart showing the flow of an information processing method according to this disclosure. Figure 5 is a block diagram showing the configuration of an information processing device according to this disclosure. Figure 6 is a flowchart showing an example of the processing flow in an information processing device according to this disclosure. Figure 7 is a flowchart showing an example of the processing flow in an information processing device according to this disclosure. Figure 8 is a diagram for explaining an example of processing in an information processing device according to this disclosure. Figure 9 is a flowchart showing an example of the processing flow in an information processing device according to this disclosure. Figure 10 is a diagram for explaining an example of application of an information processing device according to this disclosure. Figure 11 is a block diagram showing the configuration of a computer that functions as an information processing device according to this disclosure.
[0013] The following are examples of embodiments of the present invention. However, the present invention is not limited to the exemplary embodiments shown below, and various modifications are possible within the scope of the claims. For example, embodiments obtained by appropriately combining some or all of the technologies (things or methods) employed in each of the exemplary embodiments shown below may also be included in the scope of the present invention. Furthermore, embodiments obtained by appropriately omitting some of the technologies employed in each of the exemplary embodiments shown below may also be included in the scope of the present invention. In addition, the effects mentioned in each of the exemplary embodiments shown below are examples of effects that can be expected in that exemplary embodiment and do not define the scope of the present invention. That is, embodiments that do not produce the effects mentioned in each of the exemplary embodiments shown below may also be included in the scope of the present invention.
[0014] [First Embodiment] A first exemplary embodiment, which is an example of an embodiment of the present invention, will be described in detail with reference to the drawings. This exemplary embodiment is the basic form for each of the exemplary embodiments described later. The scope of application of each technology adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technology adopted in this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems occur. Furthermore, each technology shown in the drawings referenced to explain this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems occur.
[0015] (Configuration of Information Processing Device 1) The configuration of the information processing device 1 according to this exemplary embodiment will be described with reference to Figure 1. Figure 1 is a block diagram showing the configuration of the information processing device 1. As shown in Figure 1, the information processing device 1 includes an acquisition unit 11, an allocation unit 12, and a learning unit 13.
[0016] (Acquisition Unit 11) The acquisition unit 11 acquires target data. Here, the target data is, for example, data referenced to train a learning model, which will be described later. Specific examples of the target data are not limited to this exemplary embodiment, but as an example, it includes at least one of the following: data defining an optimization problem handled by the information processing device 1, and data referenced to solve the optimization problem.
[0017] (Allocation Unit 12) The allocation unit 12 distributes and allocates the multiple partial data obtained by dividing the target data and the multiple partial models obtained by dividing the learning model to multiple processors. For example, the allocation unit 12 divides the target data into partial data b1 and partial data b2 and allocates them to processor 1 and processor 2, respectively. The allocation unit 12 also divides the learning model into partial model p1 and partial model p2 and allocates them to processor 1 and processor 2, respectively. Here, dividing a model may include dividing at least one of the multiple parameters and multiple layers included in the model. The learning model is, for example, a model for solving an optimization problem. However, these examples do not limit this exemplary embodiment.
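The allocation described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the target data is a list of samples, that the model is divided by layers only, and that both are cut into contiguous chunks of nearly equal size. The names `Processor`, `split_evenly`, and `allocate` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical stand-in for one processor (for example, one GPU)."""
    name: str
    partial_data: list = field(default_factory=list)
    partial_model: list = field(default_factory=list)


def split_evenly(items, n_parts):
    """Split a list into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def allocate(target_data, model_layers, processors):
    """Assign one chunk of data and one chunk of layers to each processor."""
    data_parts = split_evenly(target_data, len(processors))
    model_parts = split_evenly(model_layers, len(processors))
    for proc, data, layers in zip(processors, data_parts, model_parts):
        proc.partial_data, proc.partial_model = data, layers
    return processors


# Assumed toy inputs: five samples and a four-layer model, two processors.
procs = allocate(
    target_data=["x0", "x1", "x2", "x3", "x4"],
    model_layers=["layer0", "layer1", "layer2", "layer3"],
    processors=[Processor("processor 1"), Processor("processor 2")],
)
```

Here processor 1 plays the role of holding partial data b1 and partial model p1, and processor 2 holds b2 and p2. Dividing parameters within a layer, which the text also permits, is not shown.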
[0018] (Learning Unit 13) The learning unit 13 causes each processor to learn the partial model allocated to that processor by referring to at least one of the plurality of partial data. For example, suppose the allocation unit 12 has made the following allocations:
- Processor 1: partial model p1, partial data b1
- Processor 2: partial model p2, partial data b2
In this case, the learning unit 13 causes processor 1 to learn the partial model p1 by referring to at least one of the partial data b1 and b2. Likewise, the learning unit 13 causes processor 2 to learn the partial model p2 by referring to at least one of the partial data b1 and b2.
[0019] Furthermore, the processors may reference each other's data while training their partial models. For example, the following processes may be performed:
- In processor 1, partial data b1 is input into partial model p1, and the resulting data h1 is transferred to processor 2.
- In processor 2, partial data b2 is transferred to processor 1.
- In processor 1, partial data b2 is input into partial model p1.
- In processor 2, data h1 is input into partial model p2.
The computation of inputting data h1 into partial model p2 in processor 2 and the computation of inputting partial data b2 into partial model p1 in processor 1 may be performed in parallel.
[0020] (Effects of Information Processing Device 1) As described above, the information processing device 1 employs the following configuration:
- It acquires target data.
- It distributes and allocates to multiple processors the multiple partial data obtained by dividing the target data and the multiple partial models obtained by dividing the learning model that is the target of learning and references the target data.
- It causes each processor to learn the partial model allocated to that processor by referring to at least one of the multiple partial data.
Because both the target data and the learning model are divided and distributed across multiple processors before each processor trains its partial model, it is possible to achieve both high speed and high accuracy of the solution even when the model is large and even when the target data is large.
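The exchange in paragraph [0019] can be illustrated with a small sketch. It rests on loud assumptions: the two processors are simulated by two worker threads, the transfers are plain variable hand-offs, and the partial models `p1` and `p2` are hypothetical toy functions rather than trained networks.

```python
from concurrent.futures import ThreadPoolExecutor


def p1(batch):
    """Hypothetical partial model p1: doubles every input value."""
    return [2 * x for x in batch]


def p2(hidden):
    """Hypothetical partial model p2: sums the values it receives."""
    return sum(hidden)


b1, b2 = [1, 2], [3, 4]  # partial data held by processor 1 and processor 2

# Processor 1 inputs b1 into p1; the result h1 is transferred to processor 2.
h1 = p1(b1)

# Processor 2 transfers b2 to processor 1. The two computations that follow
# do not depend on each other, so they can run at the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    future_h2 = pool.submit(p1, b2)  # processor 1: b2 into p1
    future_y1 = pool.submit(p2, h1)  # processor 2: h1 into p2
    h2, y1 = future_h2.result(), future_y1.result()
```

The point of the schedule is that processor 2 consumes h1 while processor 1 is already working on the next partial data, so neither processor sits idle waiting for the other.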
[0021] (Flow of Information Processing Method S1) Next, the flow of the information processing method S1 according to this exemplary embodiment will be explained with reference to Figure 2. Figure 2 is a flowchart showing the flow of the information processing method S1. As shown in Figure 2, the information processing method S1 includes a step (processing) S11 for acquiring target data, a step (processing) S12 for distributing and allocating a plurality of partial data and a plurality of partial models to a plurality of processors, and a step (processing) S13 for training the partial models in each processor.
[0022] (Step S11) In step S11, the acquisition unit 11 acquires the target data. Here, the target data is, for example, the data referenced to train the learning model described above. A more detailed explanation of the acquisition unit 11 has been given above, so it will be omitted here.
[0023] (Step S12) In step S12, the allocation unit 12 distributes and allocates the multiple partial data obtained by dividing the target data and the multiple partial models obtained by dividing the learning model to multiple processors. A more detailed explanation of the allocation unit 12 has been given above, so it will be omitted here.
[0024] (Step S13) In step S13, the learning unit 13 causes each processor to learn the partial model allocated to that processor by referring to at least one of the plurality of partial data. A more detailed explanation of the learning unit 13 has been given above, so it will be omitted here.
[0025] (Effects of Information Processing Method S1) As described above, the information processing method S1 employs the following configuration:
- Acquire target data.
- Distribute and allocate to multiple processors the multiple partial data obtained by dividing the target data and the multiple partial models obtained by dividing the learning model that is the target of learning and references the target data.
- In each processor, train the partial model allocated to that processor by referring to at least one of the multiple partial data.
This configuration produces the same effects as the information processing device 1.
[0026] (Configuration of Information Processing Device 2) The configuration of the information processing device 2 according to this exemplary embodiment will be described with reference to Figure 3. Figure 3 is a block diagram showing the configuration of the information processing device 2. As shown in Figure 3, the information processing device 2 includes an acquisition unit 21, an allocation unit 22, and an inference unit 23.
[0027] (Acquisition Unit 21) The acquisition unit 21 acquires target data. Here, the target data is, for example, data referenced in the inference processing (solution derivation processing) by the inference unit 23, which will be described later. The specific examples of the target data are not limited to this exemplary embodiment, but as an example, it is composed of at least one of the following: data that defines the optimization problem handled by the information processing device 2, and data that is referenced in order to solve the optimization problem.
[0028] (Allocation Unit 22) The allocation unit 22 distributes and allocates the multiple partial models obtained by dividing the learning model to multiple processors. For example, the allocation unit 22 divides the learning model into partial model p1 and partial model p2 and allocates them to processor 1 and processor 2, respectively. Here, dividing a model may include dividing at least one of the multiple parameters and multiple layers included in the model. The learning model is, for example, a model for solving an optimization problem. However, these examples do not limit this exemplary embodiment.
[0029] (Inference Unit 23) The inference unit 23 causes each processor to perform inference processing by inputting at least a portion of the target data into the partial model allocated to that processor. For example, suppose the allocation unit 22 has made the following allocations:
- Processor 1: partial model p1
- Processor 2: partial model p2
In this case, the inference unit 23 causes processor 1 to perform inference processing by inputting at least a portion of the target data into partial model p1. Likewise, it causes processor 2 to perform inference processing by inputting at least a portion of the target data into partial model p2. Furthermore, the inference processing in each processor may include a solution derivation process that refers to node information, edge information, and image information extracted from at least a portion of the target data.
[0030] (Effects of Information Processing Device 2) As described above, the information processing device 2 employs the following configuration:
- It acquires target data.
- It distributes and allocates the multiple partial models obtained by dividing the learning model to multiple processors.
- It causes each processor to execute inference processing by inputting at least a portion of the target data to the partial model allocated to that processor.
- The inference processing includes a solution derivation process that references node information, edge information, and image information extracted from at least a portion of the target data.
Because the learning model is divided and distributed across multiple processors and each processor executes inference with its own partial model, it is possible to achieve both high speed and high accuracy of the solution even when the model is large.
[0031] [Second Embodiment] A second exemplary embodiment, which is an example of an embodiment of the present invention, will be described in detail with reference to the drawings. Components having the same function as those described in the above-described exemplary embodiment are denoted by the same reference numerals, and their descriptions are omitted as appropriate. The scope of application of each technology adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technology adopted in this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems arise. Furthermore, each technology shown in the drawings referenced to describe this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems arise.
[0032] (Configuration of Information Processing System 100A) The configuration of the information processing system 100A according to this exemplary embodiment will be described with reference to Figure 5. Figure 5 is a block diagram showing the configuration of the information processing system 100A. As shown in Figure 5, the information processing system 100A comprises an information processing device 1A and a management device 3 connected to the information processing device 1A via a network N. Here, the specific configuration of the network N is not limited to this exemplary embodiment, but as an example, a wireless LAN (Local Area Network), a wired LAN, a WAN (Wide Area Network), a public telephone network, a mobile data communication network, or a combination of these networks can be used.
[0033] (Management device 3) Management device 3 is configured to perform management in the target industry. The target industry handled by management device 3 is not limited to this exemplary embodiment, but as an example, it may include at least one of delivery management and production management.
[0034] The management device 3 acquires, from the information processing device 1A, information to be referred to in managing the target industry, and performs that management by referring to the acquired information. As an example, the management device 3 may be configured to:
- provide a distribution planning problem including information on a plurality of delivery destinations to the information processing device 1A, acquire the solution of the distribution planning problem obtained by the information processing device 1A, and perform distribution management using that solution; or
- provide a production planning problem including information on a plurality of tasks to the information processing device 1A, acquire the solution of the production planning problem obtained by the information processing device 1A, and perform production management using that solution.
[0035] In the present exemplary embodiment, the management device 3 is illustrated as a device separate from the information processing device 1A, but this does not limit the present exemplary embodiment. The functions of the management device 3 may be configured to be provided in the control unit (CPU) of the information processing device 1A.
[0036] (Configuration of Information Processing Device 1A) Next, the configuration of the information processing device 1A according to the present exemplary embodiment will be described with reference to Figure 5. As shown in Figure 5, the information processing device 1A includes a CPU (Central Processing Unit) 10, a storage unit 20, a GPU (Graphics Processing Unit) group 50, a communication unit 30, and an input / output unit 40.
[0037] (Communication Unit 30) The communication unit 30 communicates with devices external to the information processing device 1A. As an example, the communication unit 30 communicates with the management device 3. The communication unit 30 transmits data supplied from the CPU 10 or the GPU group 50 to the management device 3, or supplies data received from the management device 3 to the CPU 10 or the GPU group 50. Note that the data received by the communication unit 30 from the management device 3 may include data defining at least one of the above-described distribution planning problem and production planning problem. Also, the data transmitted by the communication unit 30 to the management device 3 may include the solution derived by the information processing device 1A regarding at least one of the above-described distribution planning problem and production planning problem.
[0038] (Input / Output Unit 40) The input / output unit 40 includes at least one input / output device such as a keyboard, mouse, display, printer, or touch panel. Alternatively, such input / output devices may be connected to the input / output unit 40. In that configuration, the input / output unit 40 accepts input of various types of information to the information processing device 1A from the connected input devices, and outputs various types of information to the connected output devices under the control of the CPU 10. An interface such as USB (Universal Serial Bus) can be used as the input / output unit 40.
[0039] (Storage Unit 20) The storage unit 20 stores various data referenced by the CPU 10 or the GPU group 50, and various data generated by the CPU 10 or the GPU group 50. As an example, the storage unit 20 stores:
- Model parameters MP
- Training data TD
The model parameters MP are the group of parameters of the deep learning model to be trained. The training data TD is the data used to train the deep learning model. As an example, the training data TD may include at least one of the following:
- Data defining the optimization problem handled by the information processing device 1A
- Data referenced to solve the optimization problem
Furthermore, the training data TD may include at least one of text data and image data. In this exemplary embodiment, the training data TD may also be referred to as target data. Specific examples of the training data TD, the deep learning model, and the model parameters MP will be described later.
[0040] (CPU 10) The CPU 10 includes a data extraction unit 112, a multi-process startup unit 121, a data partitioning unit 122, and a buffer (memory) for storing embedded data ED. The embedded data ED is data generated by the data preprocessing unit 111 (described later) by referring to the training data TD, and as an example includes one or more multidimensional vectors embedded in the feature space.
[0041] (Multi-process startup unit 121) As an example, the multi-process startup unit 121 starts multiple processes and instructs each unit to execute the learning process, so that the training data TD and the deep learning model are distributed across multiple processors. For example, the multi-process startup unit 121 instructs the data partitioning unit 122 (described later) to partition the data, and issues various instructions for the learning process to the partitioning and integration unit 123 (described later).
[0042] (Data extraction unit 112, data partitioning unit 122) The data extraction unit 112 extracts, from the embedded data ED, the data to be used in the learning process described above. Based on instructions from the multi-process startup unit 121, the data partitioning unit 122 divides the data extracted by the data extraction unit 112 into partial data to be assigned to each GPU included in the GPU group 50 described later. As an example, the data partitioning unit 122 divides the data extracted by the data extraction unit 112 into:
- Partial data b1 to be assigned to GPU 50-1
- Partial data b2 to be assigned to GPU 50-2
Each partial data obtained by the data partitioning unit 122 is supplied to the corresponding GPU.
[0043] (GPU Group 50) The GPU group 50 comprises multiple GPUs (50-1, 50-2, 50-3, ...). As shown in Figure 5, each GPU includes a data preprocessing unit 111, a partitioning and integration unit 123, a forward propagation execution unit 131, a reward calculation unit 132, a loss function calculation unit 133, a gradient calculation unit 134, a parameter update unit 135, an inter-GPU communication unit 14, and a buffer (parameter buffer, memory) in which the split model parameters DMP are stored. The following explanation refers to the configuration of GPU 50-1 in the drawing, but the same applies to the other GPUs 50-2, 50-3, ...
[0044] (Data preprocessing unit 111) The data preprocessing unit 111 acquires the training data TD from the storage unit 20 and performs preprocessing on the training data TD. As an example, the data preprocessing unit 111 performs the following processes:
- It extracts at least one of node information, edge information, and image information from the training data TD.
- It generates an embedding vector corresponding to at least one of the extracted node information, edge information, and image information.
The generated embedding vector is supplied to the CPU 10 and buffered as part of the embedded data ED. The specific preprocessing by the data preprocessing unit 111 does not limit this exemplary embodiment, but as an example, it may be configured to extract at least one of node information and edge information from at least one of the text data and image data contained in the training data TD. More specific examples of preprocessing by the data preprocessing unit 111 will be described later.
[0045] (Partitioning and Integration Unit 123) Based on instructions from the multi-process startup unit 121, the partitioning and integration unit 123 performs various processes for distributing the learning process, which uses the training data TD and the deep learning model, across multiple processors. As an example, the partitioning and integration unit 123 splits the deep learning model into multiple partial models. More specifically, the partitioning and integration unit 123 splits the model parameters MP that define the deep learning model and buffers the model parameters assigned to the GPU 50-1 on that GPU as "split model parameters DMP", as shown in Figure 5. The partitioning and integration unit 123 also works in cooperation with the inter-GPU communication unit 14 to share the updated model parameters with the other GPUs. The partitioning and integration unit 123 may also integrate the outputs of the partial models. More specific processing by the partitioning and integration unit 123 will be described later.
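As a rough illustration of this preprocessing, the sketch below treats a distribution planning problem as a set of delivery destinations. Everything specific here is an assumption made for the example: node information is taken to be (x, y) coordinates, edge information is taken to be pairwise Euclidean distance, the embedding is a fixed linear projection, and the function names are hypothetical. Image information is not covered.

```python
import math


def extract_node_info(destinations):
    """Node information: one (x, y) coordinate per delivery destination."""
    return [(d["x"], d["y"]) for d in destinations]


def extract_edge_info(nodes):
    """Edge information: Euclidean distance between every pair of nodes."""
    return [[math.dist(a, b) for b in nodes] for a in nodes]


def embed(features, weights):
    """Linear embedding: multiply each feature row by an (in_dim x out_dim) matrix."""
    columns = list(zip(*weights))
    return [[sum(f * w for f, w in zip(row, col)) for col in columns]
            for row in features]


# Assumed toy training data: two delivery destinations.
destinations = [{"x": 0, "y": 0}, {"x": 3, "y": 4}]
nodes = extract_node_info(destinations)
edges = extract_edge_info(nodes)

# Assumed 2 x 3 projection matrix; in practice this would be learned.
weights = [[1, 0, 1],
           [0, 1, 1]]
node_embeddings = embed(nodes, weights)
```

In the terms of the text, `node_embeddings` corresponds to embedding vectors that would be buffered as part of the embedded data ED.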
[0046] (Forward propagation execution unit 131) The forward propagation execution unit 131 performs forward propagation processing by inputting partial data assigned to the GPU 50-1 into a partial model defined by the model parameters assigned to the GPU 50-1. The forward propagation execution unit 131 also supplies the data obtained by the forward propagation processing to the reward calculation unit 132. The data obtained by the forward propagation processing may include, for example, the solution (provisional solution) of the optimization problem solved by the forward propagation processing and the output probability of the solution. The forward propagation processing performed by the forward propagation execution unit 131 may also include a solution derivation process that references node information, edge information, and image information extracted from at least one of the multiple partial data. The forward propagation execution unit 131 may also be configured to perform the forward propagation processing using an attention model. However, these examples do not limit this exemplary embodiment.
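To make the two outputs of forward propagation concrete (a provisional solution and its output probability), the following sketch decodes a visiting order from node embeddings. It is a deliberately simplified stand-in, not the attention model of the disclosure: the score between the current node and each unvisited node is assumed to be a plain dot product, the choice at each step is greedy, and `decode_tour` is a hypothetical name.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_tour(embeddings, start=0):
    """Greedily build a visiting order; return it with its log-probability."""
    tour, log_prob = [start], 0.0
    unvisited = [i for i in range(len(embeddings)) if i != start]
    while unvisited:
        query = embeddings[tour[-1]]
        # Dot-product score between the current node and each candidate.
        scores = [sum(q * k for q, k in zip(query, embeddings[j]))
                  for j in unvisited]
        probs = softmax(scores)
        best = max(range(len(unvisited)), key=probs.__getitem__)
        log_prob += math.log(probs[best])
        tour.append(unvisited.pop(best))
    return tour, log_prob


# Assumed toy embeddings for three nodes.
tour, log_prob = decode_tour([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

During training, sampling from `probs` instead of taking the most probable node is a common alternative; the greedy choice is used here only to keep the example deterministic.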
[0047] (Reward Calculation Unit 132) The reward calculation unit 132 calculates the reward by referring to the data obtained by the forward propagation process performed by the forward propagation execution unit 131. For example, the reward calculation unit 132 refers to the solution (provisional solution) of the optimization problem obtained by the forward propagation process and calculates the reward corresponding to that solution. The reward calculation unit 132 also supplies the calculated reward to the loss function calculation unit 133.
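The text does not fix how the reward is defined. One plausible choice for a delivery route, assumed here purely for illustration, is the negative of the total route length, so that shorter routes earn higher rewards. The function names are hypothetical.

```python
import math


def tour_length(coords, tour):
    """Length of the closed route that visits coords in the order given by tour."""
    return sum(math.dist(coords[a], coords[b])
               for a, b in zip(tour, tour[1:] + tour[:1]))


def reward(coords, tour):
    """Assumed reward: shorter routes score higher."""
    return -tour_length(coords, tour)


# Assumed toy problem: three destinations forming a 3-4-5 right triangle.
coords = [(0, 0), (3, 0), (3, 4)]
r = reward(coords, [0, 1, 2])
```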
[0048] (Loss function calculation unit 133) The loss function calculation unit 133 calculates a loss function by referring to: the reward calculated by the reward calculation unit 132, and the output probability of the solution (provisional solution) of the optimization problem referenced in the calculation of the reward (for example, the output probability calculated by the forward propagation execution unit 131). The calculated loss function is supplied to the gradient calculation unit 134.
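The disclosure states only that the loss is calculated from the reward and the output probability. A common way to combine exactly those two quantities is a policy-gradient (REINFORCE-style) loss with a mean-reward baseline; the sketch below assumes that form and should not be read as the loss actually used. `policy_gradient_loss` is a hypothetical name.

```python
def policy_gradient_loss(rewards, log_probs):
    """Mean of -(reward - baseline) * log_prob over a batch of provisional solutions."""
    baseline = sum(rewards) / len(rewards)  # assumed baseline: batch mean reward
    weighted = [(r - baseline) * lp for r, lp in zip(rewards, log_probs)]
    return -sum(weighted) / len(weighted)


# Assumed toy batch: two provisional solutions with their rewards and log-probabilities.
loss = policy_gradient_loss(rewards=[-12.0, -10.0], log_probs=[-0.5, -1.5])
```

Minimizing this quantity raises the probability of solutions whose reward is above the batch average and lowers it for the rest.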
[0049] (Gradient Calculation Unit 134) The gradient calculation unit 134 calculates the gradient (parameter gradient) of the model parameters assigned to the GPU by backpropagation processing that references the loss function calculated by the loss function calculation unit 133. The gradient calculation unit 134 may also be configured to cooperate with the inter-GPU communication unit 14 and average the parameter gradient obtained in its own GPU with the parameter gradients obtained in the other GPUs. In other words, the parameter gradients obtained by backpropagation processing in the individual GPUs may be averaged across the processors to obtain an averaged parameter gradient. The averaged parameter gradient is supplied to the parameter update unit 135.
[0050] (Parameter update unit 135) The parameter update unit 135 updates the model parameters assigned to the GPU 50-1 by referring to the averaged parameter gradient calculated by the gradient calculation unit 134. The parameter update unit 135 may also be configured to cooperate with the inter-GPU communication unit 14 to share the updated parameters in each GPU. The updated model parameters are stored in the parameter buffer provided by the GPU 50-1.
[0051] Furthermore, the data preprocessing unit 111 and the data extraction unit 112 described above constitute, as an example, an acquisition unit (corresponding to the acquisition unit 11 in exemplary embodiment 1) that acquires target data (training data) TD.
[0052] Furthermore, the multi-process startup unit 121, the data partitioning unit 122, and the partitioning and integration unit 123 constitute an allocation unit (corresponding to the allocation unit 12 in exemplary embodiment 1) that distributes and allocates multiple partial data obtained by partitioning the target data (training data) TD, and multiple partial models obtained by partitioning the deep learning model that is the target of training and refers to the target data, to multiple processors (GPUs 50-1, 50-2, ...).
[0053] Furthermore, the forward propagation execution unit 131, reward calculation unit 132, loss function calculation unit 133, gradient calculation unit 134, and parameter update unit 135 described above constitute a learning unit (corresponding to the learning unit 13 in exemplary embodiment 1) in each processor (GPU 50-1, 50-2, ...) that trains the submodel assigned to the processor by referring to at least one of the plurality of subdata.
[0054] In the above example, the CPU 10 is configured to include the multi-process startup unit 121 and the data partitioning unit 122, and the GPUs 50-1, 50-2, ... are configured to include the partitioning and integration unit 123. However, this exemplary embodiment is not limited to this configuration. For example, the CPU 10 may be configured to include the partitioning and integration unit 123. Alternatively, the GPUs 50-1, 50-2, ... may be configured to include at least one of the multi-process startup unit 121 and the data partitioning unit 122.
[0055] (Processing flow in Information Processing Device 1A) Next, an example of the processing flow in Information Processing Device 1A will be explained with reference to Figure 6. Figure 6 is a flowchart showing an example of the processing flow in Information Processing Device 1A.
[0056] (Step S1211) First, in step S1211, the multi-process startup unit 121 initializes the deep learning model. More specifically, the multi-process startup unit 121 initializes the model parameters MP of the deep learning model.
[0057] (Step S1212) Next, in step S1212, the multi-process startup unit 121 starts up multiple processes in order to perform learning processing that distributes the learning data TD and the deep learning model across multiple processors.
[0058] (Step S1231) Next, in step S1231, the splitting and integrating unit 123 splits the state of the deep learning model to be learned (parameters, gradients, optimizer state) in order to assign it to each of the multiple GPUs (50-1, 50-2, 50-3, ...). As an example, the splitting and integrating unit 123 performs the following processing: - Splits the model parameter MP included in the state of the deep learning model into parameter p1 and parameter p2, - Assigns parameter p1 to GPU 50-1, and assigns parameter p2 to GPU 50-2. The same applies to the gradient and optimizer state.
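As a rough illustration of step S1231, the following sketch splits a flat list of model parameters into contiguous shards, one per GPU, and creates matching gradient and optimizer-state buffers for each shard. The function names `split_evenly` and `split_state`, the dictionary layout, and the contiguous split rule are assumptions made for illustration; the disclosure does not specify how the split is chosen.

```python
def split_evenly(values, num_gpus):
    """Split a list into num_gpus contiguous shards whose sizes differ by at most one."""
    base, extra = divmod(len(values), num_gpus)
    shards, start = [], 0
    for rank in range(num_gpus):
        size = base + (1 if rank < extra else 0)
        shards.append(values[start:start + size])
        start += size
    return shards


def split_state(params, num_gpus):
    """Shard parameters and create per-shard gradient and optimizer buffers (hypothetical layout)."""
    return [
        {
            "params": shard,
            "grads": [0.0] * len(shard),   # gradient buffer for this shard
            "adam_m": [0.0] * len(shard),  # optimizer state: first moment
            "adam_v": [0.0] * len(shard),  # optimizer state: second moment
        }
        for shard in split_evenly(params, num_gpus)
    ]


model_parameters = [0.1, 0.2, 0.3, 0.4, 0.5]  # stands in for the model parameters MP
state_per_gpu = split_state(model_parameters, num_gpus=2)
p1 = state_per_gpu[0]["params"]  # shard assigned to GPU 50-1
p2 = state_per_gpu[1]["params"]  # shard assigned to GPU 50-2
```

In this sketch every shard carries its own gradient and optimizer buffers, mirroring the statement that the same split applies to the gradient and optimizer state.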
[0059] (Step S111) Next, in step S111, the data preprocessing unit 111 performs preprocessing on the training data TD. The data obtained by this preprocessing is then stored on the CPU 10 as embedded data ED. The processing flow by the data preprocessing unit 111 in this step will be described later.
[0060] (Step S1232) Next, in step S1232, the splitting and integrating unit 123 determines whether sufficient training has been performed on the deep learning model. If it is determined that sufficient training has not been performed (NO in step S1232), the process proceeds to step S1221. If it is determined that sufficient training has been performed (YES in step S1232), the process ends.
[0061] (Step S1221) In step S1221, the data extraction unit 112 and the data splitting unit 122 extract multiple data blocks (batches) from the embedded data ED. This process can also be described as the process by which the data splitting unit 122 divides the data extracted from data ED by the data extraction unit 112 into multiple batches. As an example, if the information processing device 1A is dealing with a delivery planning problem, multiple blocks (batches) of delivery destination information may be extracted from the embedded data ED.
[0062] (Step S1222) In step S1222, the data splitting unit 122 assigns each of the batches extracted in step S1221 to each of the multiple GPUs (50-1, 50-2, 50-3, ...). Then, it transfers each batch to each GPU. As an example, the data splitting unit 122 performs the following processes: - Transfers batch b1 to GPU 50-1 - Transfers batch b2 to GPU 50-2
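Steps S1221 and S1222 can be pictured with the following minimal sketch, which cuts the embedded data into fixed-size batches and assigns them to GPUs in round-robin order. The helper names `make_batches` and `assign_batches`, the batch size, and the round-robin rule are assumptions for illustration only; the actual transfer to GPU memory is not modeled.

```python
def make_batches(embedded_data, batch_size):
    """Cut the embedded data into consecutive blocks (batches)."""
    return [embedded_data[i:i + batch_size]
            for i in range(0, len(embedded_data), batch_size)]


def assign_batches(batches, num_gpus):
    """Assign batches to GPU ranks in round-robin order (assumed policy)."""
    assignment = {rank: [] for rank in range(num_gpus)}
    for index, batch in enumerate(batches):
        assignment[index % num_gpus].append(batch)
    return assignment


# Toy embedded data ED: eight delivery destinations (placeholder strings).
embedded_data = [f"destination_{k}" for k in range(8)]
batches = make_batches(embedded_data, batch_size=4)
assignment = assign_batches(batches, num_gpus=2)
b1 = assignment[0][0]  # batch transferred to GPU 50-1
b2 = assignment[1][0]  # batch transferred to GPU 50-2
```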
[0063] (Step S13A) Next, in step S13A, each GPU inputs the batch assigned to it into a submodel defined by the parameters assigned to it, and calculates the loss function. As an example, REINFORCE loss may be calculated as the loss function according to the REINFORCE algorithm. A more specific processing flow in this step will be described later.
[0064] (Step S1341) Next, in step S1341, the gradient calculation unit 134 calculates the parameter gradient locally (for each GPU) according to the backpropagation method, which refers to the loss function calculated in step S13A. As an example, the gradient $\nabla_{\theta} J(\theta)$ of the loss function (policy function) $J(\theta)$ in the REINFORCE algorithm may be calculated locally.
[0065] (Step S1342) In step S1342, the gradient calculation unit 134 works in cooperation with the inter-GPU communication unit 14 to obtain the averaged parameter gradient by averaging the parameter gradient in the GPU with the parameter gradients in other GPUs.
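The averaging in step S1342 can be sketched as an all-reduce with a mean, simulated here in a single process. The function name `all_reduce_mean` is hypothetical, and the sketch assumes that every GPU holds a gradient vector of the same length for the parameters being averaged; real inter-GPU communication is not modeled.

```python
def all_reduce_mean(local_grads):
    """Average element-wise over GPUs and give every GPU a copy of the result."""
    num_gpus = len(local_grads)
    length = len(local_grads[0])
    if any(len(g) != length for g in local_grads):
        raise ValueError("all gradient vectors must have the same length")
    averaged = [sum(g[k] for g in local_grads) / num_gpus for k in range(length)]
    return [list(averaged) for _ in range(num_gpus)]


grads_gpu1 = [0.5, -0.5, 1.0]   # local gradient on the first GPU (toy values)
grads_gpu2 = [1.5, 0.0, -1.0]   # local gradient on the second GPU (toy values)
averaged_per_gpu = all_reduce_mean([grads_gpu1, grads_gpu2])
```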
[0066] (Step S1351) Next, in step S1351, the parameter update unit 135 updates the parameters handled by the GPU (parameters assigned to the GPU) locally (for each GPU) according to a predetermined optimization algorithm. As an example, the parameter update unit 135 may use Adam (Adaptive Moment Estimation) to update the parameters handled by the GPU locally.
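For reference, one local Adam update of the kind mentioned in step S1351 can be sketched as follows. This is the standard Adam rule with bias correction; the function name `adam_step`, the default hyperparameters, and the list-based parameter layout are assumptions for illustration and are not taken from the disclosure.

```python
import math


def adam_step(params, grads, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to the parameters held by a single GPU."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_k, v_k in zip(params, grads, m, v):
        m_k = beta1 * m_k + (1.0 - beta1) * g          # first-moment estimate
        v_k = beta2 * v_k + (1.0 - beta2) * g * g      # second-moment estimate
        m_hat = m_k / (1.0 - beta1 ** step)            # bias correction
        v_hat = v_k / (1.0 - beta2 ** step)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_k)
        new_v.append(v_k)
    return new_params, new_m, new_v


# Toy shard: two parameters, with averaged gradients already available.
shard = [1.0, -2.0]
averaged_grads = [0.5, -0.25]
updated, m_state, v_state = adam_step(shard, averaged_grads, [0.0, 0.0], [0.0, 0.0], step=1)
```

On the first step the bias-corrected update has magnitude close to `lr` for every parameter, so each toy parameter moves by roughly 0.001 against the sign of its gradient.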
[0067] (Step S1352) In step S1352, the parameter update unit 135 works in cooperation with the inter-GPU communication unit 14 to share the updated parameters in each GPU.
[0068] (Specific Processing Example 1 in Step S111) The upper part of Figure 7 shows Specific Processing Example 1 in Step S111 as described above. As shown in the upper part of Figure 7, Step S111 is configured to include Steps S1111 to S1114 as an example.
[0069] (Step S1111) In step S1111, the data preprocessing unit 111 classifies (organizes) the input information, which is the learning data TD, into either node, edge, or global information.
[0070] (Step S1112) In step S1112, the data preprocessing unit 111 embeds the information classified in step S1111 into the feature space according to the data format of the information. As a result, the data preprocessing unit 111 obtains feature quantities (feature vectors) corresponding to each piece of information classified in step S1111.
[0071] (Step S1113) In step S1113, the data preprocessing unit 111 combines feature quantities for each node and each edge.
[0072] (Step S1114) In step S1114, the data preprocessing unit 111 stores the combined feature quantities from step S1113 on the CPU 10 as embedded data ED.
[0073] (Specific Processing Example 1 in Step S13A) The lower part of Figure 7 shows Specific Processing Example 1 in Step S13A as described above. As shown in the lower part of Figure 7, Step S13A consists of steps S131, S1321 to S1322, and S1331 to S1332 as an example.
[0074] (Step S131) In step S131, the forward propagation execution unit 131 performs forward propagation to the partial model defined by the parameters assigned to the GPU 50-1 by inputting at least one of the above-described batches to the partial model.
[0075] (Step S1321) Next, in step S1321, the reward calculation unit 132 obtains the solution (provisional solution) of the optimization problem obtained by the forward propagation process performed by the forward propagation execution unit 131, and the output probability of said solution.
[0076] (Step S1322) Then, in step S1322, the reward calculation unit 132 calculates the reward by referring to the solution and the output probability.
[0077] (Step S1331) Next, in step S1331, the loss function calculation unit 133 performs the necessary processing according to the reinforcement learning algorithm to be applied.
[0078] (Step S1332) In step S1332, the loss function calculation unit 133 calculates the loss function by referring to the reward and the output probability.
[0079] (Specific Processing Example 1 in Step S131) Figure 8 shows specific processing example 1 in step S131 as described above.
[0080] (Step S13101) In step S13101, the forward propagation execution unit 131 performs embedding (feature vectorization in feature space) of edge information included in the batch assigned to the GPU 50-1.
[0081] (Step S13102) In step S13102, the forward propagation execution unit 131 performs embedding (feature vectorization in feature space) of node information included in the batch assigned to the GPU 50-1.
[0082] (Step S13103) In step S13103, the forward propagation execution unit 131 performs embedding (feature vectorization in feature space) of the image information included in the batch assigned to the GPU 50-1.
[0083] (Step S13104) Next, in step S13104, the forward propagation execution unit 131 applies data augmentation to the feature vectors of the node information and the feature vectors of the image information.
[0084] (Step S13105) Next, in step S13105, the forward propagation execution unit 131 performs multi-head attention (MHA) processing in the layer specified by layer index i, referencing: the feature vector of the node information generated in step S13102, the feature vector of the augmented node information generated in step S13104, and the feature vector of the augmented image information generated in step S13104. When this step is performed for the first time, an initial value (for example, 1) is used as the layer index i.
[0085] (Step S13106) Next, in step S13106, the forward propagation execution unit 131 applies residual connection (Add) and layer normalization (norm) to the result of the multi-head attention.
[0086] (Step S13107) Next, in step S13107, the forward propagation execution unit 131 applies MLP (Multilayer Perceptron) to the result of the processing in step S13106.
[0087] (Step S13108) Next, in step S13108, the forward propagation execution unit 131 performs residual connection (Add) and layer normalization (norm) on the result of the processing in step S13107.
[0088] (Step S13109) Next, in step S13109, the forward propagation execution unit 131 determines whether the layer index i is less than $n_{\mathrm{layer}}$. Here, $n_{\mathrm{layer}}$ is the total number of layers in the submodel defined by the parameters assigned to the GPU 50-1. If the index i is less than $n_{\mathrm{layer}}$ (YES in step S13109), the index i is incremented and the process in step S13105 is executed; otherwise (NO in step S13109), the process proceeds to step S13110.
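The layer loop of steps S13105 to S13109 can be sketched as below. This is a deliberately simplified stand-in: it uses single-head scaled dot-product self-attention with identity query, key, and value projections instead of full multi-head attention, it attends over one set of feature vectors only, and the toy weights are arbitrary. All function names are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    dim = len(x[0])
    out = []
    for query in x:
        weights = softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                           for key in x])
        out.append([sum(w * value[j] for w, value in zip(weights, x)) for j in range(dim)])
    return out


def layer_norm(vec, eps=1e-5):
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def add_and_norm(x, sublayer_out):
    """Residual connection (Add) followed by layer normalization (norm)."""
    return [layer_norm([a + b for a, b in zip(row, out)])
            for row, out in zip(x, sublayer_out)]


def mlp(x, w1, w2):
    """Position-wise two-layer perceptron with ReLU; weights are lists of columns."""
    result = []
    for row in x:
        hidden = [max(0.0, sum(r * w for r, w in zip(row, col))) for col in w1]
        result.append([sum(h * w for h, w in zip(hidden, col)) for col in w2])
    return result


def encode(x, layers):
    i = 1                                        # layer index, initial value 1
    while True:
        w1, w2 = layers[i - 1]
        x = add_and_norm(x, self_attention(x))   # S13105, S13106
        x = add_and_norm(x, mlp(x, w1, w2))      # S13107, S13108
        if i < len(layers):                      # S13109: i < n_layer ?
            i += 1
        else:
            return x


w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # 2 -> 3 hidden units (toy weights)
w2 = [[0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]          # 3 -> 2 outputs (toy weights)
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded = encode(node_features, layers=[(w1, w2), (w1, w2)])
```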
[0089] (Step S13110) In step S13110, the forward propagation execution unit 131 determines whether or not the solution is complete. If the solution is complete (YES in step S13110), the process in step S131 is terminated and the process proceeds to step S1321 in the lower part of Figure 7; otherwise (NO in step S13110), the process proceeds to step S13111.
[0090] (Step S13111) In step S13111, the forward propagation execution unit 131 applies a masking process to the solution and applies multi-head attention processing to the masked data.
[0091] (Step S13112) Next, in step S13112, the forward propagation execution unit 131 applies residual connection (Add) and layer normalization (norm) to the result of the multi-head attention.
[0092] (Step S13113) Next, in step S13113, the forward propagation execution unit 131 applies MLP (Multilayer Perceptron) to the result of the processing in step S13112.
[0093] (Step S13114) Next, in step S13114, the forward propagation execution unit 131 performs residual connection (Add) and layer normalization (norm) on the result of the processing in step S13113.
[0094] (Step S13115) Next, in step S13115, the forward propagation execution unit 131 applies mask processing to the nodes resulting from the processing in step S13114.
[0095] (Step S13116) Next, in step S13116, the forward propagation execution unit 131 applies the softmax function to the data after the masking process.
[0096] (Step S13117) Next, in step S13117, the forward propagation execution unit 131 refers to the nodes to which the processing in step S13116 was applied and to the resulting discrete probability distribution, and probabilistically selects one node from among those nodes.
[0097] (Step S13118) Next, in step S13118, the forward propagation execution unit 131 adds the selected node to the solution and then proceeds to step S13110 described above.
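The decoding loop of steps S13110 to S13118 can be sketched as follows. The attention and MLP stages of steps S13111 to S13114 are replaced by a hypothetical `score_fn` callback that returns one score per node; only the masking, softmax, probabilistic selection, and solution update are shown. The names `masked_softmax` and `decode` are likewise assumptions.

```python
import math
import random


def masked_softmax(scores, visited):
    """Softmax over scores with visited nodes masked out (probability zero)."""
    masked = [-math.inf if k in visited else s for k, s in enumerate(scores)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]   # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def decode(score_fn, num_nodes, rng):
    """Build a solution node by node and accumulate its log output probability."""
    solution, log_prob = [], 0.0
    while len(solution) < num_nodes:                      # S13110: solution complete?
        scores = score_fn(solution)                       # stands in for S13111 to S13114
        probs = masked_softmax(scores, set(solution))     # S13115, S13116
        node = rng.choices(range(num_nodes), weights=probs, k=1)[0]  # S13117
        solution.append(node)                             # S13118
        log_prob += math.log(probs[node])
    return solution, log_prob


def toy_scores(partial_solution):
    # Hypothetical scores that ignore the partial solution.
    return [1.0, 2.0, 0.5, 0.0]


solution, log_prob = decode(toy_scores, num_nodes=4, rng=random.Random(0))
```

The accumulated `log_prob` corresponds to the logarithm of the output probability of the provisional solution, which the later loss calculation refers to.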
[0098] (Learning example using two GPUs) Next, with reference to Figure 9, an example of learning processing by the information processing device 1A using two GPUs will be explained. In the example shown in Figure 9, a total of seven phases, from phase P11 to P17, are shown as processing in the first GPU (corresponding to GPU 50-1), and a total of six phases, from phase P21 to P26, are shown as processing in the second GPU (corresponding to GPU 50-2).
[0099] (Phase P11) In Phase P11, batch b1 is assigned to the first GPU as the divided training data TD, and model parameter p1 is assigned as the parameter that defines the submodel. In Figure 9, the gradient of the model parameter p1 is shown as gradient g1. Hereafter, the submodel defined by model parameter p1 will also be referred to as submodel p1.
[0100] (Phase P21) Meanwhile, in Phase P21, batch b2 is assigned to the second GPU as the divided training data TD, and model parameter p2 is assigned as the parameter that defines the submodel. Also, in Figure 9, the gradient of the model parameter p2 is shown as gradient g2. Hereafter, the submodel defined by the model parameter p2 will also be referred to as submodel p2.
[0101] (Phase P12) In Phase P12, the forward propagation execution unit 131 of the first GPU inputs batch b1 to the partial model p1 and generates data h1 by forward propagating batch b1 to the partial model p1. The generated data h1 is transferred to the second GPU and referenced in Phase P22.
[0102] (Phase P22) In Phase P22, the forward propagation execution unit 131 of the second GPU inputs the data h1 generated by the propagation process in the first GPU in Phase P12 to the partial model p2, and generates data l1 by forward propagating the data h1 to the partial model p2. The generated data l1 is transferred to the first GPU and referenced in Phase P14.
[0103] (Phase P13) In Phase P13, the forward propagation execution unit 131 of the first GPU inputs batch b2, which was transferred from the second GPU in Phase P21, into model p1, and generates data h2 by forward propagating batch b2 to model p1. The generated data h2 is transferred to the second GPU and referenced in Phase P23.
[0104] (Phase P14) In Phase P14, the forward propagation execution unit 131 of the first GPU acquires the data l1 transferred from the second GPU in Phase P22. This data l1 is referenced to be averaged with data l2.
[0105] (Phase P23) In Phase P23, the forward propagation execution unit 131 of the second GPU inputs the data h2 transferred from the first GPU in Phase P13 to the partial model p2 and generates data l2 by forward propagating the data h2 to the partial model p2. The generated data l2 is averaged with the data l1 from Phase P14 of the first GPU to generate data l.
[0106] (Phase P15) In Phase P15, the first GPU obtains data l, which is the average of data l1 and data l2. Similarly, the first GPU obtains data h, which is the average of data h1 and data h2.
[0107] (Phase P24) In Phase P24, the second GPU obtains data l, which is the average of data l1 and data l2. Similarly, the second GPU may obtain data h, which is the average of data h1 and data h2.
[0108] (Phase P16) In phase P16, the gradient calculation unit 134 of the first GPU calculates the gradient g1' by backpropagating the data h to the partial model p1.
[0109] (Phase P25) In Phase P25, the gradient calculation unit 134 of the second GPU calculates the gradient g2' by backpropagating the data l to the partial model p2.
[0110] (Phase P17) In Phase P17, the parameter update unit 135 of the first GPU updates parameter p1 to parameter p1' using the gradient g1' calculated in Phase P16. Alternatively, the parameter update unit 135 of the first GPU may update parameter p1 to parameter p1' using the average of gradient g1' and gradient g2'.
[0111] (Phase P26) In Phase P26, the parameter update unit 135 of the second GPU updates parameter p2 to parameter p2' using the gradient g2' calculated in Phase P25. Alternatively, the parameter update unit 135 of the second GPU may update parameter p2 to parameter p2' using the average of gradient g1' and gradient g2'.
[0112] Furthermore, at least a portion of each of the above-described phases may be executed in parallel. For example, the process of "inputting data h1 into partial model p2 on the second GPU and propagating said data h1 forward to partial model p2" (phase P22) and the process of "inputting batch b2 into model p1 on the first GPU and propagating said batch b2 forward to model p1" (phase P13) may be executed in parallel.
[0113] Thus, according to the information processing device 1A, multiple partial data (batches b1, b2, ...) obtained by dividing the target data, and multiple partial models (p1, p2, ...) obtained by dividing the deep learning model that is the target of learning and refers to the target data, are distributed and allocated to multiple processors (first GPU, second GPU, ...). Then, each processor (first GPU, second GPU, ...) learns the partial model allocated to that processor. Therefore, even when the model is large, or when the size of the target data is large, both high speed and accuracy of the solution can be achieved.
[0114] The effects of this example can be expressed as follows: For example, if a deep learning model has a first layer, a second layer, and a third layer from the input side, then: - The first layer can perform calculations on partial data 3, - The second layer can perform calculations on partial data 2 (the output result from the first layer), and - The third layer can perform calculations on partial data 1 (the output result from the second layer) in parallel. As a result, calculations can be performed efficiently using multiple GPUs, and processing time is reduced.
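The overlap described above can be made concrete with a small schedule generator. It assumes an idealized pipeline in which each layer (stage) takes one time step per partial data item and passes its output to the next stage; the function name `pipeline_schedule` and the 1-based numbering are illustrative choices.

```python
def pipeline_schedule(num_stages, num_batches):
    """Return, for each time step, a mapping {stage: partial data index} (both 1-based)."""
    schedule = []
    for t in range(num_stages + num_batches - 1):
        step = {}
        for stage in range(num_stages):
            batch = t - stage            # item that reaches this stage at time t
            if 0 <= batch < num_batches:
                step[stage + 1] = batch + 1
        schedule.append(step)
    return schedule


schedule = pipeline_schedule(num_stages=3, num_batches=3)
for t, step in enumerate(schedule, start=1):
    print(f"time step {t}: " + ", ".join(f"layer {s} -> data {b}" for s, b in step.items()))
```

At the third time step of this schedule, layer 1 works on partial data 3, layer 2 on partial data 2, and layer 3 on partial data 1, which is the situation described in the paragraph above.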
[0115] In the above example, distributed processing across two GPUs, a first GPU and a second GPU, was used as an example, but this does not limit the exemplary embodiment. The configuration according to this exemplary embodiment can also be applied to cases with any N GPUs (50-1, 50-2, ..., 50-N).
[0116] (Specific Processing Example 2 in Step S111) The upper part of Figure 10 shows Specific Processing Example 2 in Step S111 described above. This example shows a case where the information processing device 1A handles the delivery planning problem as an optimization problem. In this example, the learning data TD includes: a map image including traffic congestion information; the addresses of multiple delivery destinations, the type and weight of the delivered items, specified time slots, priority, etc.; and information such as the distance between delivery destinations and transportation costs. As shown in the upper part of Figure 10, Step S111 in this example consists of steps S1110 to S1114.
[0117] (Step S1110) In step S1110, the data preprocessing unit 111 obtains a map image containing traffic congestion information from the input information, which is the learning data TD.
[0118] (Step S1111) In step S1111, the data preprocessing unit 111 extracts node information and edge information from the training data TD. The extraction process also references a map image containing traffic congestion information acquired in step S1110. Here, the node information includes, as an example, the addresses of multiple delivery destinations, the type and weight of the delivered items, the specified time slot, the priority, etc., and the edge information includes, as an example, the distance between delivery destinations, the transportation costs, etc. Furthermore, the node information and edge information may include multiple data formats. For example, the node information and edge information may include both text data and numerical data.
[0119] (Steps S11121, S11122, S11123) Next, in step S11121, the data preprocessing unit 111 determines the data format of the information fragments included in the node information and edge information, and in steps S11122 and S11123, it performs processing according to the data format. For example, if the data format is text data (address, type of package, etc.), in step S11122, the text data is input into a machine learning language model (LLM) to perform feature extraction (feature vectorization). As another example, word2vec is applied to the text data to perform feature extraction (feature vectorization). On the other hand, if the data format is numerical data, in step S11123, min-max scaling is applied to the numerical data to perform feature extraction.
[0120] (Step S1113) Then, in step S1113, the data preprocessing unit 111 classifies each feature generated in steps S11122 and S11123 into node information or edge information, combines each feature that is node information, and combines each feature that is edge information.
[0121] (Step S1114) In step S1114, the data preprocessing unit 111 stores the combined feature quantities from step S1113 on the CPU 10 as embedded data ED.
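The format-dependent processing of steps S11121 to S1113 can be sketched as below. Min-max scaling is implemented as usually defined, mapping values to the range 0 to 1. The text branch uses `embed_text_stub`, a hypothetical placeholder that is neither word2vec nor a language model and only keeps the sketch self-contained. The field names `address` and `weight_kg` are also assumptions.

```python
def min_max_scale(values):
    """Scale numerical values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def embed_text_stub(text, dim=4):
    """Hypothetical stand-in for a learned text embedding (NOT word2vec or an LLM)."""
    vec = [0.0] * dim
    for position, ch in enumerate(text):
        vec[position % dim] += ord(ch) / 1000.0
    return vec


destinations = [
    {"address": "1-1 Example St", "weight_kg": 2.0},
    {"address": "2-5 Sample Ave", "weight_kg": 6.0},
    {"address": "9-3 Placeholder Rd", "weight_kg": 10.0},
]

# Numerical data: min-max scaling (step S11123).
scaled_weights = min_max_scale([d["weight_kg"] for d in destinations])

# Text data: feature vectorization (step S11122), then combine per node (step S1113).
node_features = [embed_text_stub(d["address"]) + [w]
                 for d, w in zip(destinations, scaled_weights)]
```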
[0122] (Specific Processing Example 2 in Step S13A) The lower part of Figure 10 shows specific processing example 2 in step S13A described above. In this example, the information processing device 1A handles the delivery planning problem as an optimization problem, and the processing is executed in correspondence with processing example 2 in step S111 described above. As shown in the lower part of Figure 10, step S13A in this example consists of steps S131, S1321, S13221, S13222 to S13223, and S1331 to S1332.
[0123] (Step S131) In step S131, the forward propagation execution unit 131 performs forward propagation to the submodel, which is defined by the parameters assigned to the GPU 50-1, by inputting at least one of the above-described batches to the submodel. Note that if this step is performed for the first time, the index i, which indicates the number of rollouts (probabilistic trials), is set to an initial value (e.g., 1).
[0124] (Step S1321) Next, in step S1321, the reward calculation unit 132 obtains the delivery order as the solution (provisional solution) of the optimization problem obtained by the forward propagation execution unit 131, and the output probability of said solution.
[0125] (Step S13221) Next, in step S13221, the reward calculation unit 132 calculates the travel time and travel distance indicated by the solution.
[0126] (Step S13222) Then, in step S13222, the reward calculation unit 132 calculates the reward $r_i$ corresponding to the solution. The specific calculation of the reward $r_i$ is not limited to this example, but as an example, the reward $r_i$ is calculated as follows: $r_i$ = (fare paid to the driver) + (fuel costs) + ...
[0127] (Step S13223) Subsequently, in step S13223, the reward calculation unit 132 determines whether the index i is less than $n_{\mathrm{rollout}}$. Here, $n_{\mathrm{rollout}}$ is a value that defines the upper limit of the number of rollouts. If the index i is less than $n_{\mathrm{rollout}}$ (YES in step S13223), the index i is incremented and the process of step S131 is executed. Otherwise (NO in step S13223), the process proceeds to step S1331.
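A minimal sketch of the reward calculation and rollout loop follows. The reward is computed as the example formula states, as a sum of a driver fare term and a fuel cost term. Deriving the fare from travel time at a constant speed, together with the names `route_distance`, `route_reward`, `speed_kmh`, `wage_per_hour`, and `fuel_cost_per_km`, is an assumption for illustration; the rollouts are given as fixed delivery orders instead of being sampled from a model.

```python
def route_distance(order, dist):
    """Total travel distance of visiting the destinations in the given order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def route_reward(order, dist, speed_kmh, wage_per_hour, fuel_cost_per_km):
    """Reward as (fare paid to the driver) + (fuel costs), under the assumptions above."""
    distance = route_distance(order, dist)      # travel distance (step S13221)
    hours = distance / speed_kmh                # travel time at constant speed
    return wage_per_hour * hours + fuel_cost_per_km * distance


dist = [[0, 4, 7],
        [4, 0, 6],
        [7, 6, 0]]                              # toy distances in km
rollout_orders = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]
n_rollout = len(rollout_orders)                 # upper limit of the number of rollouts

rewards = []
i = 1                                           # rollout index, initial value 1
while True:
    order = rollout_orders[i - 1]               # stands in for the solution from step S131
    rewards.append(route_reward(order, dist, 40.0, 2000.0, 15.0))  # step S13222
    if i < n_rollout:                           # step S13223
        i += 1
    else:
        break
```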
[0128] (Step S1331) Subsequently, in step S1331, the loss function calculation unit 133 calculates the baseline b for calculating the loss function as the average of the rewards $r_i$ over the rollouts. More specifically, the loss function calculation unit 133 calculates the baseline b as $b = \sum_i r_i / n_{\mathrm{rollout}}$.
[0129] (Step S1332) Then, in step S1332, the loss function calculation unit 133 calculates the loss function by referring to the above reward, the above output probability, and the above baseline. More specifically, the loss function calculation unit 133 calculates the loss function $J(\theta)$ in the REINFORCE algorithm as $J(\theta) = \mathbb{E}_{\pi}\left[\sum_i \log p_i \,(r_i - b)\right]$. Here, $p_i$ in the above formula represents the output probability of the solution.
[0130] As described in this example, in the information processing apparatus 1A, the forward propagation execution unit 131 executes the solution derivation process a plurality of times in each GPU, the loss function calculation unit 133 sets the average value of the rewards obtained according to the solutions derived by each of the solution derivation processes as the baseline, calculates the loss function by referring to the reward and the baseline, and the gradient calculation unit 134 executes the backpropagation process by referring to the loss function.
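The baseline and loss calculation summarized above can be sketched numerically as follows. The function returns the sum inside the expectation for one set of rollouts, which serves as a single-sample estimate of J(θ), and treats the baseline as a constant. The function name `reinforce_loss` and the toy values are assumptions for illustration.

```python
def reinforce_loss(log_probs, rewards):
    """Return (sum_i log p_i * (r_i - b), b) with b the mean reward over rollouts."""
    n_rollout = len(rewards)
    baseline = sum(rewards) / n_rollout                      # step S1331
    loss = sum(lp * (r - baseline)                           # step S1332
               for lp, r in zip(log_probs, rewards))
    return loss, baseline


log_probs = [-1.0, -2.0, -3.0]     # log output probabilities of three rollouts (toy values)
rewards = [10.0, 20.0, 30.0]       # rewards of the same rollouts (toy values)
loss, baseline = reinforce_loss(log_probs, rewards)
print(loss, baseline)              # -20.0 20.0
```

Subtracting the mean reward leaves the expected gradient unchanged while reducing its variance, which is the usual motivation for a baseline in REINFORCE.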
[0131] (Specific Processing Example 2 in Step S131) Figure 11 shows a specific processing example (processing example 2) in step S131 described above with reference to the lower part of Figure 10. The processing example in Figure 11 is executed in correspondence with processing example 2 in step S13A shown in the lower part of Figure 10.
[0132] (Steps S13101 to S13110) Each step in steps S13101 to S13110 is the same as in Processing Example 1 explained with reference to Figure 8, so redundant explanations will be omitted.
[0133] (Step S13111) In step S13111, the forward propagation execution unit 131 masks the already visited delivery destinations included in the solution and applies multi-head attention processing to the masked data.
[0134] (Step S13112) Next, in step S13112, the forward propagation execution unit 131 applies residual connection (Add) and layer normalization (norm) to the result of the multi-head attention.
[0135] (Step S13113) Next, in step S13113, the forward propagation execution unit 131 applies MLP (Multilayer Perceptron) to the result of the processing in step S13112.
[0136] (Step S13114) Next, in step S13114, the forward propagation execution unit 131 performs residual connection (Add) and layer normalization (norm) on the result of the processing in step S13113.
[0137] (Step S13115) Next, in step S13115, the forward propagation execution unit 131 masks the already visited delivery destinations included in the result of the processing in step S13114.
[0138] (Step S13116) Next, in step S13116, the forward propagation execution unit 131 applies the softmax function to the data after the masking process.
[0139] (Step S13117) Next, in step S13117, the forward propagation execution unit 131 refers to the delivery destinations to which the processing in step S13116 was applied and to the resulting discrete probability distribution, and probabilistically selects one delivery destination from among them.
[0140] (Step S13118) Next, in step S13118, the forward propagation execution unit 131 adds the selected delivery destination to the solution delivery route and then proceeds to step S13110 described above.
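The masking, softmax, and probabilistic selection in steps S13115 to S13118 can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the attention and MLP layers of steps S13111 to S13114 are replaced by a hypothetical scoring function toy_scores, and all function names are assumptions.

```python
import math
import random

def masked_softmax(scores, visited):
    """Softmax over destinations; visited destinations get probability 0."""
    open_scores = [s for s, v in zip(scores, visited) if not v]
    if not open_scores:
        raise ValueError("every destination is already visited")
    top = max(open_scores)  # subtract the maximum for numerical stability
    weights = [0.0 if v else math.exp(s - top) for s, v in zip(scores, visited)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_route(score_fn, n_destinations, rng):
    """Build a route by repeatedly sampling one unvisited destination."""
    visited = [False] * n_destinations
    route = []
    while len(route) < n_destinations:
        scores = score_fn(route)                 # stand-in for the network output
        probs = masked_softmax(scores, visited)  # mask, then softmax
        choice = rng.choices(range(n_destinations), weights=probs, k=1)[0]
        route.append(choice)                     # add to the solution route
        visited[choice] = True
    return route

def toy_scores(route):
    # Hypothetical scorer: fixed preferences, independent of the partial route.
    return [0.5, 2.0, -1.0, 1.0]

route = sample_route(toy_scores, 4, random.Random(0))
probs = masked_softmax([0.5, 2.0, -1.0, 1.0], [False, True, False, False])
```

Because visited destinations receive zero probability before sampling, every destination appears in the route exactly once.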
[0141] (Effects of Information Processing Device 1A) As described above, the information processing device 1A employs the following configuration:
- Acquires training data (target data) TD;
- Distributes and assigns to multiple processors (GPU 50-1, 50-2, ...) multiple partial data (b1, b2, ...) obtained by dividing the target data TD, and multiple partial models (p1, p2, ...) obtained by dividing the deep learning model that is the target of learning and refers to the target data TD;
- Causes each processor (GPU 50-1, 50-2, ...) to train the partial model assigned to it by referring to at least one of the multiple partial data.
Because the partial data and the partial models are distributed in this way and each processor trains only the partial model assigned to it, it is possible to achieve both high speed and accuracy of the solution even when the model or the target data is large.
[0142] As an example, with the information processing device 1A configured as described above, the target deep learning model can be distributed across tens of GPUs. 7 It is possible to train a large-scale model (i.e., a high-performance model) with the above parameters.
[0143] Furthermore, as specifically explained in steps S1110 and S13103, in a configuration that handles delivery planning problems, the information processing device 1A acquires map information including traffic congestion information and uses it in the solution derivation process. In this way, the information processing device 1A can also suitably handle map image data representing traffic congestion conditions.
[0144] Furthermore, as specifically explained in step S1111, in the configuration for handling the delivery planning problem, the information processing device 1A also extracts edge information such as the distance and transportation costs between delivery destinations from the training data TD and uses it in the solution derivation process. Therefore, it is possible to derive a delivery plan based not only on point information about the individual delivery destinations, but also on line information such as the distance and transportation costs between delivery destinations.
[0145] Furthermore, as specifically explained in step S11122, the information processing device 1A inputs the text data contained in the training data TD into a machine learning language model (LLM) to extract features (create feature vectors), which are then used in the solution derivation process. As a result, it is possible to derive a delivery plan while utilizing text data that could not be used in the past, such as the address itself and the type of delivery item.
[0146] In this way, the information processing device 1A can derive more accurate solutions to delivery planning problems and production planning problems by supporting the multimodal processing of target data (training data) TD.
[0147] Furthermore, as specifically explained in step S1332, the information processing device 1A performs the following processes:
- Sets the average value of the rewards obtained according to each solution derived by the solution derivation process as the baseline, and
- Calculates a loss function by referring to the rewards and the baseline.
With this configuration that refers to the baseline, even large-scale plans with 100 or more delivery destinations can be learned stably.
[0148] (Configuration of Information Processing System 200A) Next, the configuration of the information processing system 200A according to this exemplary embodiment will be described with reference to Figure 12. Figure 12 is a block diagram showing the configuration of the information processing system 200A. As shown in Figure 12, the information processing system 200A comprises an information processing device 2A and a management device 3 connected to the information processing device 2A via a network N. Here, the specific configuration of the network N does not limit this exemplary embodiment. As an example, a wireless LAN (Local Area Network), a wired LAN, a WAN (Wide Area Network), a public telephone network, a mobile data communication network, or a combination of these networks can be used.
[0149] (Management device 3) Management device 3 has the same configuration as the management device provided in the information processing system 100A described above. For example, management device 3 may be configured to:
- Provide a delivery planning problem containing information on multiple delivery destinations to the information processing device 2A, obtain the solution to the delivery planning problem derived by the information processing device 2A, and perform delivery management using that solution; or
- Provide a production planning problem containing information on multiple tasks to the information processing device 2A, obtain the solution to the production planning problem derived by the information processing device 2A, and perform production management using that solution.
[0150] In the information processing system 200A, the management device 3 is shown as a separate device from the information processing device 2A, but this does not limit the configuration of the information processing system 200A. The functions of the management device 3 may also be provided by the control unit (CPU) of the information processing device 2A.
[0151] (Configuration of Information Processing Device 2A) Next, the configuration of the information processing device 2A according to this exemplary embodiment will be described with reference to Figure 12. As shown in Figure 12, the information processing device 2A includes a CPU (Central Processing Unit) 10, a storage unit 20, a GPU (Graphics Processing Unit) group 50, a communication unit 30, and an input / output unit 40.
[0152] (Communication unit 30, Input / Output unit 40) The communication unit 30 and the input / output unit 40 are the same as those of the information processing device 1A, so redundant explanations will be omitted.
[0153] (Storage Unit 20) The storage unit 20 of the information processing device 2A stores various data referenced by the CPU 10 or the GPU group 50, and various data generated by the CPU 10 or the GPU group 50. As an example, the storage unit 20 stores:
- Model parameters MP
- Query data QD
Model parameters MP are a group of parameters of a deep learning model that performs inference processing (solution derivation processing). Model parameters learned (updated) in the information processing device 1A can be used as model parameters MP. To enable this configuration, the information processing device 2A may be configured to include blocks that execute the learning processing described for the information processing device 1A.
[0154] On the other hand, query data QD is data used to perform inference processing by the deep learning model. For example, query data QD may include at least one of the following: data defining the optimization problem handled by the information processing device 2A, and data referenced to solve the optimization problem. In addition, query data QD may include at least one of text data and image data. In the following description, query data QD may also be referred to as target data or query information.
[0155] (CPU 10) As shown in Figure 12, the CPU 10 of the information processing device 2A includes a query acquisition unit 211, a supplementary information acquisition unit 212, and a multi-process startup unit 121.
[0156] (Multi-process startup unit 121) As an example, the multi-process startup unit 121 of the information processing device 2A starts up multiple processes and instructs each unit to execute inference processing in which the deep learning model is distributed across multiple processors. For example, the multi-process startup unit 121 issues various instructions for this inference processing to the splitting and integrating unit 123, which will be described later.
[0157] (Query acquisition unit 211, supplementary information acquisition unit 212) The query acquisition unit 211 acquires queries via the communication unit 30 or the input / output unit 40. Here, the query may, for example, include information that defines a part of the delivery planning problem or production planning problem provided by the management device 3 described above. The supplementary information acquisition unit 212 acquires supplementary information via the communication unit 30 or the input / output unit 40. Here, the supplementary information is data that is referenced together with the query in the inference processing performed by the information processing device 2A. Specific examples of the supplementary information do not limit this exemplary embodiment. As an example, in a configuration in which the information processing device 2A handles a delivery planning problem, if the query includes the addresses of multiple delivery destinations, the type and weight of delivery items, specified time slots, priority, etc., the supplementary information acquisition unit 212 can acquire map images including traffic congestion information, routes between delivery destinations, transportation costs, etc. as supplementary information, which is then referenced in the inference processing. In the following description, the query and supplementary information together may be referred to as target data.
[0158] (GPU Group 50) The GPU group 50, like the information processing device 1A, comprises multiple GPUs (50-1, 50-2, 50-3, ...). As shown in Figure 12, each GPU includes a data preprocessing unit 111 (213), a splitting and integrating unit 123 (22), a forward propagation execution unit 131 (23), an inter-GPU communication unit 14, a display information generation unit 17, and a buffer (parameter buffer, memory) where the split model parameters DMP are stored. The explanation will be given with reference to the configuration of GPU 50-1 in the drawing, but the same applies to the other GPUs 50-2, 50-3, ...
[0159] Although not shown in Figure 12, the GPU group of the information processing device 2A may also be configured to further include a reward calculation unit 132, a loss function calculation unit 133, a gradient calculation unit 134, and a parameter update unit 135, similar to the information processing device 1A, in order to train a deep learning model used for inference processing.
[0160] (Data preprocessing unit 111 (213)) The data preprocessing unit 111 (213) performs preprocessing on the target data, which includes query data QD and supplementary information. As an example, the data preprocessing unit 111 (213) performs the following processes:
- Extracts at least one of node information, edge information, and image information from the target data;
- Generates an embedding vector corresponding to at least one of the extracted node information, edge information, and image information.
The generated embedding vector is supplied to the forward propagation execution unit 131 (23). The specific preprocessing performed by the data preprocessing unit 111 (213) does not limit this exemplary embodiment, but as an example, it may be configured to extract at least one of node information and edge information from at least one of the text data and image data included in the target data. The specific processing performed by the data preprocessing unit 111 (213) is the same as that of the data preprocessing unit 111 provided in the information processing device 1A, so redundant explanations are omitted.
[0161] (Splitting and integrating unit 123 (22)) The splitting and integrating unit 123 (22) executes various processes for distributing the learning process of the deep learning model across multiple processors based on instructions from the multi-process startup unit 121. The splitting and integrating unit 123 (22) may also be configured to execute various processes for distributing the learning process of both the target data and the deep learning model across multiple processors.
[0162] As an example, the splitting and integrating unit 123 (22) divides the deep learning model into multiple submodels. More specifically, the splitting and integrating unit 123 (22) divides the model parameters MP that define the deep learning model and buffers the model parameters assigned to the GPU 50-1 on that GPU as "divided model parameters DMP", as shown in Figure 12. The splitting and integrating unit 123 (22) may also cooperate with the inter-GPU communication unit 14 to share the updated model parameters (trained model parameters) with other GPUs. The more specific processing by the splitting and integrating unit 123 (22) includes the same processing as the splitting and integrating unit 123 provided in the information processing device 1A, so redundant explanations are omitted.
[0163] (Forward propagation execution unit 131 (23)) The forward propagation execution unit 131 (23) performs forward propagation processing by inputting data supplied from the data preprocessing unit 111 (213) into a partial model defined by the model parameters assigned to the GPU 50-1. The forward propagation execution unit 131 also supplies the data obtained by the forward propagation processing to the display information generation unit 17. The data obtained by the forward propagation processing may, for example, include the solution to an optimization problem solved by the forward propagation processing. The forward propagation processing performed by the forward propagation execution unit 131 may include a solution derivation process that references node information, edge information, and image information extracted from at least one of a plurality of partial data. The specific processing by the forward propagation execution unit 131 (23) is the same as that of the forward propagation execution unit 131 provided in the information processing device 1A, so redundant explanations are omitted.
[0164] (Display Information Generation Unit 17) The display information generation unit 17 generates display data from the data obtained by the forward propagation processing performed by the forward propagation execution unit 131 (23). The display data generated by the display information generation unit 17 is, for example, visually presented via the display provided by the input / output unit 40.
[0165] For example, in a configuration where the information processing device 2A handles a delivery planning problem, the display data generated by the display information generation unit 17 may include information indicating the delivery order. Similarly, in a configuration where the information processing device 2A handles a production planning problem, the display data generated by the display information generation unit 17 may include information indicating the order of the production processes. Specific examples of the display data generated by the display information generation unit 17 will be described later.
[0166] The query acquisition unit 211, supplementary information acquisition unit 212, and data preprocessing unit 111 (213) described above constitute, as an example, an acquisition unit (corresponding to the acquisition unit 21 in exemplary embodiment 1) that acquires target data (query data QD, supplementary information).
[0167] Furthermore, the multi-process startup unit 121 and the splitting and integrating unit 123 (22) constitute an allocation unit (corresponding to the allocation unit 22 in exemplary embodiment 1) that distributes and allocates multiple submodels obtained by dividing the deep learning model to multiple processors (GPUs 50-1, 50-2, ...).
[0168] Furthermore, the forward propagation execution unit 131 (23) described above constitutes an inference unit (corresponding to the inference unit 23 in exemplary embodiment 1) that performs inference processing by inputting at least a portion of the target data into the partial model assigned to each processor (GPU 50-1, 50-2, ...).
[0169] (Processing flow in information processing device 2A) Next, an example of the processing flow in information processing device 2A will be explained with reference to Figure 13. Figure 13 is a flowchart showing an example of the processing flow in information processing device 2A.
[0170] (Step S211) First, in step S211, the query acquisition unit 211 receives query information (query).
[0171] (Step S221) Next, in step S221, the splitting and integrating unit 123 (22) reads the model parameters MP of the deep learning model from the storage unit 20.
[0172] (Step S222) In step S222, the splitting and integrating unit 123 (22) divides the deep learning model defined by the model parameters MP into a plurality of submodels and distributes them among the GPUs. As an example, the splitting and integrating unit 123 (22) divides the model parameters MP into parameters p1 and parameters p2, assigns parameters p1 to GPU 50-1, and assigns parameters p2 to GPU 50-2.
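The division in step S222 can be pictured with the following minimal sketch. It assumes, purely for illustration, that the model parameters are held as a name-to-values mapping and are divided into contiguous groups; the parameter names and the function split_parameters are hypothetical, and the actual division scheme is not specified here.

```python
def split_parameters(model_parameters, n_parts):
    """Divide an ordered mapping of parameters into n_parts contiguous groups."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    names = list(model_parameters)
    base, extra = divmod(len(names), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        parts.append({n: model_parameters[n] for n in names[start:start + size]})
        start += size
    return parts

# Hypothetical parameter names standing in for the model parameters MP.
MP = {"embed.w": [0.1, 0.2], "attn.w": [0.3], "mlp.w": [0.4, 0.5], "out.w": [0.6]}
p1, p2 = split_parameters(MP, 2)

# Each GPU buffers only the parameters assigned to it.
assignment = {"GPU 50-1": p1, "GPU 50-2": p2}
```

Merging the groups back together reproduces the original mapping, so no parameter is lost or duplicated by the division.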
[0173] (Step S213 (S111)) Next, in step S213 (S111), the data preprocessing unit 111 (213) performs the preprocessing described above. The data after preprocessing is supplied to the forward propagation execution unit 131 (23).
[0174] (Step S23 (S131)) Next, in step S23 (S131), the forward propagation execution unit 131 (23) derives a solution by forward propagation of the data supplied from the data preprocessing unit 111 (213) to a model defined by the parameters assigned to the GPU. The more specific processing in this step is the same as in step S131 performed by the information processing device 1A, so a redundant explanation will be omitted.
[0175] (Step S17) In step S17, the display information generation unit 17 generates display data from the data obtained by the forward propagation processing by the forward propagation execution unit 131 (23). The display data generated by the display information generation unit 17 is visually presented via the display provided by the input / output unit 40.
[0176] (Specific Processing Example 1 in Step S213) Figure 14 shows specific processing example 1 in step S213 (S111) described above. This example shows a processing example when the information processing device 2A handles a delivery planning problem. As shown in Figure 14, step S213 is configured to include steps S2131 to S2134 as an example.
[0177] (Step S2131) In step S2131, the data preprocessing unit 111 (213) obtains real-time supplementary information from the supplementary information acquisition unit 212. For example, if the query information obtained in step S211 as described above includes: the addresses of multiple delivery destinations, the type and weight of the delivery items, the specified time slot, the priority, etc., the data preprocessing unit 111 (213) may obtain supplementary information such as: a map image including traffic congestion information, the route between delivery destinations, and transportation costs.
[0178] (Step S2132) Next, in step S2132, the data preprocessing unit 111 (213) extracts and organizes various types of information from the target data, including the query data QD and supplementary information. As an example, the data preprocessing unit 111 (213) extracts node information, edge information, and other global information from the target data.
[0179] (Step S2133) Next, in step S2133, the data preprocessing unit 111 (213) embeds the information fragments extracted in step S2132 into the feature space (feature vectorization) by processing according to the data format of the information fragments.
[0180] (Step S2134) In step S2134, the data preprocessing unit 111 (213) combines feature quantities for each node and each edge.
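Steps S2132 to S2134 can be illustrated with the following sketch, which is a simplified stand-in under stated assumptions: the text embedding is a toy character-count hash rather than a language model, edge distances are computed as straight-line distances rather than obtained from supplementary information, and every field and function name is hypothetical.

```python
import math

def embed_text(text, dim=4):
    """Toy text embedding (stand-in for a language model): hashed character counts."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec

def node_features(destinations):
    """One combined vector per destination: coordinates, weight, text embedding."""
    return [[d["x"], d["y"], d["weight"]] + embed_text(d["address"])
            for d in destinations]

def edge_features(destinations, cost_per_km):
    """One vector per ordered pair of destinations: distance and transport cost."""
    edges = {}
    for i, a in enumerate(destinations):
        for j, b in enumerate(destinations):
            if i != j:
                dist = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
                edges[(i, j)] = [dist, dist * cost_per_km]
    return edges

destinations = [
    {"x": 0.0, "y": 0.0, "weight": 2.0, "address": "1 Main St"},
    {"x": 3.0, "y": 4.0, "weight": 5.0, "address": "9 Oak Ave"},
]
nodes = node_features(destinations)
edges = edge_features(destinations, cost_per_km=2.0)
```

Here each node vector concatenates numerical and text-derived features, while each edge vector carries the pairwise quantities, mirroring the per-node and per-edge combination of step S2134.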
[0181] (Effects of Information Processing Device 2A) As described above, the information processing device 2A employs the following configuration:
- Acquires target data;
- Distributes and assigns multiple submodels obtained by dividing the deep learning model to multiple processors (GPU 50-1, 50-2, ...);
- Executes inference processing in each processor (GPU 50-1, 50-2, ...) by inputting at least a portion of the target data into the submodel assigned to that processor;
- The inference processing includes a solution derivation process that references node information, edge information, and image information extracted from at least a portion of the target data.
Because the submodels are distributed in this way and each processor executes inference processing using only the submodel assigned to it, it is possible to achieve both high speed and accuracy of the solution even when the model is large.
[0182] As an example, with the information processing device 2A configured as described above, the target deep learning model can be distributed across tens of GPUs. 7 Inference processing can be performed using a large-scale model (i.e., a high-performance model) with the above parameters.
[0183] Furthermore, as described above, in the configuration for handling the delivery planning problem, the information processing device 2A acquires map information including traffic congestion information and uses it in the solution derivation process. In this way, the information processing device 2A can also suitably handle map image data representing traffic congestion conditions.
[0184] Furthermore, as described above, in a configuration that handles delivery planning problems, the information processing device 2A may also extract edge information such as the distance and transportation costs between delivery destinations from the target data and use it in the solution derivation process. Therefore, it is possible to derive a delivery plan based not only on point information about the individual delivery destinations, but also on line information such as the distance and transportation costs between delivery destinations.
[0185] Furthermore, in the information processing device 2A, similar to the information processing device 1A, the text data included in the target data may be input into a machine learning-trained language model (LLM) to extract features (create feature vectors) and use them in the solution derivation process. This makes it possible to derive a delivery plan while utilizing text data that could not be used in the past, such as the address itself and the type of delivery item.
[0186] In this way, the information processing device 2A can derive more accurate solutions to delivery planning problems and production planning problems by supporting the multimodal processing of target data (query data QD and supplementary information).
[0187] (Application Examples) Below, application examples of the information processing device 1A and the information processing device 2A will be described with reference to Figures 15 and 16.
[0188] (Application Example 1: Delivery Planning Problem) Figure 15 shows the case where information processing device 1A or information processing device 2A is applied to the delivery planning problem. In this example, the target data obtained is: - Coordinates of the delivery destination (numerical data) - Map including the delivery destination (image data) - Address of the delivery destination (text data) Here, the map including the delivery destination (image data) may be obtained as supplementary information as described above.
[0189] The information processing device 1A or information processing device 2A performs a learning process or inference process by referring to the above target data. It then supplies the delivery sequence information, which is the result of the inference process, to the delivery management device 3. The delivery management device 3 refers to the delivery sequence information to generate a delivery plan and issues instructions to each delivery vehicle. The delivery sequence information shown in Figure 15 may, for example, be generated by the display information generation unit 17. Alternatively, the delivery sequence information shown in Figure 15 may be displayed via the display of the input / output unit 40.
[0190] (Application Example 2: Production Planning Problem) Figure 16 shows the case where information processing device 1A or information processing device 2A is applied to a production planning problem. In this example, the target data acquired is: - Time required for each process of each task (numerical data) - Past examples (image data) - Information of each machine (machine tool) (text data) Here, past examples (image data) may be acquired as supplementary information as described above.
[0191] The information processing device 1A or information processing device 2A performs a learning process or inference process by referring to the above target data. It then supplies the order of the production process, which is the result of the inference process, to the production management device 3. The production management device 3 refers to the order of the production process, generates a production plan, and issues instructions to each production machine. The order of the production process shown in Figure 16 may, for example, be generated by the display information generation unit 17. Alternatively, the order of the production process shown in Figure 16 may be displayed via the display of the input / output unit 40.
[0192] [Example of implementation by software] Some or all of the functions of the information processing devices 1, 2, 1A, 2A (hereinafter also referred to as "each of the above devices") may be implemented by hardware such as integrated circuits (IC chips) or by software.
[0193] In the latter case, each of the above devices is implemented, for example, by a computer that executes instructions for a program, which is software that realizes each function. An example of such a computer (hereinafter referred to as computer C) is shown in Figure 17. Figure 17 is a block diagram showing the hardware configuration of computer C, which functions as each of the above devices.
[0194] Computer C comprises at least one processor C1 and at least one memory C2. Memory C2 stores a program P for operating Computer C as each of the above-mentioned devices. In Computer C, the processor C1 reads and executes the program P from memory C2, thereby realizing each of the above-mentioned devices.
[0195] For processor C1, for example, a CPU (Central Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), MPU (Micro Processing Unit), FPU (Floating-Point Unit), PPU (Physics Processing Unit), TPU (Tensor Processing Unit), quantum processor, microcontroller, or a combination thereof can be used. For memory C2, for example, flash memory, HDD (Hard Disk Drive), SSD (Solid State Drive), or a combination thereof can be used.
[0196] Furthermore, computer C may also be equipped with RAM (Random Access Memory) for loading program P at runtime and for temporarily storing various data. Computer C may also be equipped with a communication interface for sending and receiving data with other devices. Furthermore, computer C may also be equipped with an input / output interface for connecting input / output devices such as a keyboard, mouse, display, and printer.
[0197] Furthermore, program P can be recorded on a non-transitory, tangible recording medium M that is readable by computer C. Such a recording medium M can be, for example, a tape, disk, card, semiconductor memory, or programmable logic circuit. Computer C can acquire program P via such a recording medium M. Program P can also be transmitted via a transmission medium. Such a transmission medium can be, for example, a communication network or broadcast waves. Computer C can also acquire program P via such a transmission medium.
[0198] Furthermore, each of the above functions of each of the above devices may be implemented by a single processor in a single computer, by multiple processors in a single computer working together, or by multiple processors in each of multiple computers working together. In addition, the programs for implementing each of the above functions in each of the above devices may be stored in a single memory in a single computer, distributed and stored in multiple memories in a single computer, or distributed and stored in multiple memories in each of multiple computers.
[0199] [Additional Notes] This disclosure includes the technologies described in the following appendices. However, the present invention is not limited to the technologies described in the following appendices, and various modifications are possible within the scope of the claims.
[0200] (Appendix A1) An information processing apparatus comprising: an acquisition means for acquiring target data; an allocation means for distributing and allocating to multiple processors a plurality of partial data obtained by dividing the target data and a plurality of partial models obtained by dividing a learning model which is the target of learning and refers to the target data; and a learning means for causing each processor to learn the partial model allocated to the processor by referring to at least one of the plurality of partial data.
[0201] (Appendix A2) The information processing device according to Appendix A1, wherein the learning means causes each processor to perform forward propagation processing and backpropagation processing in the partial model assigned to that processor, averages among the processors the parameter gradients obtained by the backpropagation processing in each processor, and causes each processor to update the parameters of the partial model assigned to that processor using the averaged parameter gradient.
[0202] (Appendix A3) The information processing device according to Appendix A2, wherein the learning means averages the data obtained by the forward propagation processing in each processor and uses the averaged data to perform the backpropagation processing in each processor.
[0203] (Appendix A4) The information processing device according to Appendix A2 or A3, wherein the forward propagation processing in each processor includes a solution derivation process that references node information, edge information, and image information extracted from at least one of the plurality of partial data.
[0204] (Appendix A5) The information processing device according to any one of Appendices A2 to A4, wherein the learning means causes each processor to perform the solution derivation process multiple times, sets the average value of the rewards obtained according to the solutions derived by each of the solution derivation processes as a baseline, calculates a loss function by referring to the rewards and the baseline, and performs the backpropagation processing by referring to the loss function.
[0205] (Appendix A6) The information processing device according to any one of Appendices A1 to A5, wherein the target data includes at least one of text data and image data, and the acquisition means performs preprocessing to extract at least one of node information and edge information from the target data.
[0206] (Appendix A7) An information processing device comprising: acquisition means for acquiring target data; allocation means for distributing and allocating a plurality of submodels obtained by dividing a learning model to a plurality of processors; and inference means for each processor to perform inference processing by inputting at least a portion of the target data to the submodel allocated to that processor, wherein the inference processing includes a solution derivation process that references node information, edge information, and image information extracted from at least a portion of the target data.
[0207] (Appendix A8) An information processing method comprising: a computer acquiring target data; distributing and allocating a plurality of partial data obtained by dividing the target data and a plurality of partial models obtained by dividing a learning model which is the target of learning and references the target data, to a plurality of processors; and in each processor, training the partial model assigned to the processor by referencing at least one of the plurality of partial data.
[0208] (Appendix A9) An information processing method comprising: acquiring target data; distributing and assigning multiple submodels obtained by dividing a learning model to multiple processors; and performing inference processing in each processor by inputting at least a portion of the target data to the submodel assigned to that processor, wherein the inference processing includes a solution derivation process that references node information, edge information, and image information extracted from at least a portion of the target data.
[0209] (Appendix A10) A program for causing a computer to function as the information processing device according to Appendix A1, the program causing the computer to function as the acquisition means, the allocation means, and the learning means.
[0210] (Appendix A11) A program for causing a computer to function as the information processing device according to Appendix A7, the program causing the computer to function as the acquisition means, the allocation means, and the inference means.
[0211] Some or all of the elements described in Appendices A2 to A6, which depend on the information processing device of Appendix A1, may also depend on Appendices A7, A8, and A9 in the same manner as in Appendices A2 to A6. Some or all of the elements described in any appendix may be applied to various hardware, software, recording means for recording software, systems, and methods.
[0212] Although the present invention has been described above with reference to embodiments, the present invention is not limited thereto. Various modifications that can be understood by those skilled in the art may be made to the configuration and details of the present invention within the scope of the invention.
[0213] This application claims priority based on Japanese Patent Application No. 2024-221047, filed on 17 December 2024, and incorporates all of its disclosures herein.
[0214]
1, 2, 1A, 2A Information processing device
100A, 200A Information processing system
11, 21 Acquisition unit (acquisition means)
12, 22 Assignment unit (assignment means)
13 Learning unit (learning means)
23 Inference unit (inference means)
Claims
1. An information processing apparatus comprising: an acquisition means for acquiring target data; an allocation means for distributing and allocating, to multiple processors, a plurality of partial data obtained by dividing the target data and a plurality of partial models obtained by dividing a learning model which is the target of learning and refers to the target data; and a learning means for causing each processor to learn the partial model allocated to that processor by referring to at least one of the plurality of partial data.
2. The information processing apparatus according to claim 1, wherein the learning means, in each processor, performs forward propagation and backpropagation in the partial model allocated to that processor, averages the parameter gradients obtained by the backpropagation across the processors, and updates the parameters of the partial model allocated to that processor using the averaged parameter gradients.
3. The information processing apparatus according to claim 2, wherein the learning means averages the data obtained by the forward propagation process in each processor, and uses the averaged data to perform the backpropagation process in each processor.
4. The information processing apparatus according to claim 2 or 3, wherein the forward propagation process in each processor includes a solution derivation process that references node information, edge information, and image information extracted from at least one of the plurality of partial data.
5. The information processing apparatus according to claim 2 or 3, wherein the learning means performs multiple solution derivation processes in each processor, referencing node information, edge information, and image information extracted from at least one of the plurality of partial data; sets the average value of the reward obtained according to each solution derivation process as a baseline; calculates a loss function by referencing the reward and the baseline; and performs the backpropagation process by referencing the loss function.
6. The information processing apparatus according to claim 4, wherein the learning means performs the solution derivation process multiple times in each processor, sets the average value of the rewards obtained according to the solutions derived by each solution derivation process as a baseline, calculates a loss function by referring to the rewards and the baseline, and performs the backpropagation process by referring to the loss function.
7. The information processing apparatus according to any one of claims 1 to 3, wherein the target data includes at least one of text data and image data, and the acquisition means performs preprocessing to extract at least one of node information and edge information from the target data.
8. The information processing apparatus according to claim 4, wherein the target data includes at least one of text data and image data, and the acquisition means performs preprocessing to extract at least one of node information and edge information from the target data.
9. The information processing apparatus according to claim 5, wherein the target data includes at least one of text data and image data, and the acquisition means performs preprocessing to extract at least one of node information and edge information from the target data.
10. The information processing apparatus according to claim 6, wherein the target data includes at least one of text data and image data, and the acquisition means performs preprocessing to extract at least one of node information and edge information from the target data.
11. An information processing apparatus comprising: acquisition means for acquiring target data; allocation means for distributing and allocating multiple submodels obtained by dividing a learning model to multiple processors; and inference means for each processor to perform inference processing by inputting at least a portion of the target data to the submodel allocated to that processor, wherein the inference processing includes a solution derivation process that references node information, edge information, and image information extracted from at least a portion of the target data.
12. The information processing apparatus according to claim 11, further comprising a learning means that, in each processor, performs forward propagation and backpropagation in the submodel allocated to that processor, averages the parameter gradients obtained by the backpropagation across the processors, and updates the parameters of the submodel allocated to that processor using the averaged parameter gradients.
13. The information processing apparatus according to claim 12, wherein the learning means averages the data obtained by the forward propagation process in each processor, and uses the averaged data to perform the backpropagation process in each processor.
14. The information processing apparatus according to claim 12 or 13, wherein the learning means performs the solution derivation process multiple times in each processor, sets the average value of the rewards obtained according to the solutions derived by each solution derivation process as a baseline, calculates a loss function by referring to the rewards and the baseline, and performs the backpropagation process by referring to the loss function.
15. The information processing apparatus according to any one of claims 11 to 13, wherein the target data includes at least one of text data and image data, and the acquisition means performs preprocessing to extract at least one of node information and edge information from the target data.
16. The information processing apparatus according to claim 14, wherein the target data includes at least one of text data and image data, and the acquisition means performs preprocessing to extract at least one of node information and edge information from the target data.
17. An information processing method comprising: a computer acquiring target data; distributing and allocating to multiple processors a plurality of partial data obtained by dividing the target data and a plurality of partial models obtained by dividing a learning model that is the target of learning and references the target data; and, in each processor, training the partial model allocated to that processor by referencing at least one of the plurality of partial data.
18. An information processing method comprising: acquiring target data; distributing and assigning multiple submodels obtained by dividing a learning model to multiple processors; and performing inference processing in each processor by inputting at least a portion of the target data to the submodel assigned to that processor, wherein the inference processing includes a solution derivation process that references node information, edge information, and image information extracted from at least a portion of the target data.
19. A program for causing a computer to function as the information processing apparatus according to claim 1, the program causing the computer to function as the acquisition means, the allocation means, and the learning means.
20. A program for causing a computer to function as the information processing apparatus according to claim 11, the program causing the computer to function as the acquisition means, the allocation means, and the inference means.