Information processing device, information processing method, and information processing program

By integrating external resource information to reflect hardware constraints, the apparatus generates neural network structures that meet deployment requirements, addressing the issue of suboptimal performance due to overlooked hardware limitations.

WO2026141171A1 · PCT designated stage · Publication Date: 2026-07-02 · SONY GROUP CORP

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SONY GROUP CORP
Filing Date
2025-12-19
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Existing neural network generation methods fail to consider hardware constraints such as FLOPS and memory size when deploying neural networks, leading to suboptimal performance on target devices.

Method used

An information processing apparatus and method incorporate external resource information when generating a neural network structure with a network structure generation model. Hardware constraints are reflected through a resource reflection query so that the generated network meets the required specifications.

Benefits of technology

The solution allows for the generation of a neural network structure that adheres to hardware constraints, resulting in optimized performance on target devices.

✦ Generated by Eureka AI based on patent content.

Abstract

An information processing device according to an embodiment of the present invention comprises: a network structure generation model that generates a structure of a neural network; an acquisition unit that acquires external resource information which is information outside the network structure generation model which is used for generating the structure of the neural network; a first generation unit that generates, as a query to be inputted to the network structure generation model, a resource reflection query reflecting the external resource information by using the external resource information; and a second generation unit that inputs the resource reflection query generated by using the external resource information to the network structure generation model and causes the network structure generation model to output information indicating the structure of the neural network, thereby generating the structure of the neural network.

Description

Information Processing Apparatus, Information Processing Method, and Information Processing Program

[0001] The present disclosure relates to an information processing apparatus, an information processing method, and an information processing program.

[0002] Techniques such as NAS (Neural Architecture Search) that automatically search for the structure of a neural network are used to provide a method for generating the structure of a neural network (for example, Non-Patent Document 1).

[0003] Non-Patent Document 1: Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, David Ha, "Evolutionary Optimization of Model Merging Recipes", [online] [searched on November 15, 2024], Internet <URL: https://arxiv.org/pdf/2403.13187>

[0004] However, there is room for improvement in the prior art. For example, in the prior art, the weights of a neural network are optimized by merging learned models, but it is not possible to obtain a neural network structure that satisfies HW (hardware) constraints such as FLOPS (Floating-point Operations Per Second) and memory size according to the target device for deployment. Therefore, it is desired to appropriately generate a neural network structure that satisfies HW constraints.

[0005] Therefore, the present disclosure proposes an information processing apparatus, an information processing method, and an information processing program that appropriately generate the structure of a neural network.

[0006] To solve the above problems, an information processing device according to this disclosure comprises: a network structure generation model for generating the structure of a neural network; an acquisition unit for acquiring external resource information, which is information outside the network structure generation model used for generating the structure of the neural network; a first generation unit that generates a resource reflection query, which reflects the external resource information, as a query to be input to the network structure generation model by using the external resource information; and a second generation unit that inputs the resource reflection query generated using the external resource information to the network structure generation model and causes the network structure generation model to output information indicating the structure of the neural network, thereby generating the structure of the neural network.

[0007] The drawings show the following, in order:
  • An overview of information processing according to the embodiment of this disclosure.
  • An example configuration of an information processing system according to the embodiment of this disclosure.
  • A flowchart of the information processing procedure according to the embodiment of this disclosure.
  • A flowchart of the procedure for the first processing.
  • An example of processing using a parent model.
  • An example of a prompt.
  • An overview of the first processing.
  • An example of processing based on a feedback mechanism.
  • An example of acquiring external resource information based on a search.
  • An example of search processing, including processing when there are no search results.
  • An example of evaluation processing of external resource information.
  • An example of processing using a Mutation history.
  • A flowchart of the procedure for the second processing.
  • An overview of the second processing.
  • An example of processing related to network weights.
  • An example of learning processing.
  • An example of inference processing.
  • An example of prompts and LLM output during inference.
  • A flowchart of the procedure for the third processing.
  • An example of the structure of a neural network.
  • An example of a configuration for evaluation processing.
  • An example of a configuration for evaluation processing.
  • An example of evaluation processing.
  • A hardware configuration diagram showing an example of a computer that realizes the functions of an information processing device.

[0008] Embodiments of this disclosure will be described in detail below with reference to the drawings. Note that these embodiments do not limit the information processing apparatus, information processing method, and information processing program described herein. Furthermore, in each of the following embodiments, the same parts are denoted by the same reference numerals to avoid redundant explanations.

[0009] This disclosure will be described in the following order of items:
1. Embodiments
  1-1. Overview of information processing according to the embodiment of this disclosure
  1-2. Configuration of the information processing device according to the embodiment
  1-3. Procedure for information processing according to the embodiment
  1-4. Processing examples
    1-4-1. First processing (NAS based on RAG)
      1-4-1-1. Procedure for the first processing
      1-4-1-2. Overview of the first processing
    1-4-2. Second processing (NAS based on trained weights)
      1-4-2-1. Procedure for the second processing
      1-4-2-2. Overview of the second processing
    1-4-3. Third processing (NAS based on evaluation)
      1-4-3-1. Procedure for the third processing
      1-4-3-2. Overview of the third processing
2. Others
3. Hardware configuration

<1. Embodiments>
<1-1. Overview of Information Processing According to the Embodiments of the Disclosure>
[0010] Below, we will first describe the processing overview shown in Figure 1, and then describe the device configuration and specific processing examples. Figure 1 is a diagram showing an overview of information processing according to the embodiment of the disclosure. The information processing outlined in Figure 1 is realized by the information processing system 1 shown in Figure 2. Figure 2 is a diagram showing an example of the configuration of the information processing system according to the embodiment of the disclosure. As shown in Figure 2, the information processing system 1 includes an information processing device 100 and an external resource device 200. The information processing device 100 and the external resource device 200 are connected to each other via a predetermined communication network (network N) by wired or wireless means. Although only one external resource device 200 is shown in Figure 2, the information processing system 1 may include multiple external resource devices 200 depending on the information used by the information processing device 100.

[0011] Note that the configuration of the information processing system 1 shown in Figure 2 is just one example, and any device configuration can be adopted as long as the information processing system 1 includes at least one information processing device 100. For example, the information processing system 1 may include multiple information processing devices 100. The information processing system 1 may also include a terminal device used by a user who requests processing from the information processing device 100. In this case, the information processing device 100 may receive request information indicating the user's request from the terminal device used by the user, and transmit information indicating the network structure generated based on the received request information to the terminal device used by the user.

[0012] Furthermore, the information processing device 100 may receive input from the user via an input unit 12 (described later) to acquire request information indicating the user's request, and output information indicating the network structure generated based on the acquired request information via an output unit 13 (described later). For example, the information processing device 100 may display the generated network structure information on a display device (such as a display). Also, the information processing system 1 does not necessarily have to include an external resource device 200. For example, the information processing system 1 may include only the information processing device 100, and the information processing device 100 may communicate with a device outside the information processing system 1 (such as an external resource device 200) and receive information used for processing from the device outside the information processing system 1.

[0013] The information processing device 100 is a computer that performs the process of generating the structure of a neural network (also called the "network structure generation process"). The neural network referred to here is an AI (Artificial Intelligence) model (also simply called a "model") that is trained through machine learning. In this description, "neural network" and "model" are used interchangeably. For example, the neural network shown in the following example is a deep neural network (DNN). The neural network generated (trained) by the information processing device 100 is used for various tasks such as classification problems.

[0014] The information processing device 100 performs network structure generation processing using two inputs: a network structure generation model that generates the structure of a neural network, and external resource information, which is information used to generate the structure of the neural network and which lies outside the network structure generation model. In Figure 1, the information processing device 100 performs network structure generation processing using the network structure generation model M1. The network structure generation model M1 can be, for example, a large language model (LLM).

[0015] The external resource device 200 is a computer (server device) that provides the information processing device 100 with information used for network structure generation processing. For example, the external resource device 200 is a server device that provides the information processing device 100 with information about open source software (OSS). The external resource device 200 may also be a server device that provides a software development platform such as OpenMMLab, Hugging Face, or GitHub. Alternatively, the external resource device 200 may be a device that manages internal company documents. Note that the above is merely an example, and the external resource device 200 may be any device that can provide the information used by the information processing device 100 for processing.

[0016] The following describes the processing overview shown in Figure 1. First, the software that executes the network structure generation process (also simply called "software") receives input of a model request (also called "user request") from the user (step S1). In Figure 1, request information DM1 is input to the software as a model request from the user. Request information DM1 includes the tasks, datasets, accuracy, and hardware constraints that the user requests for the model (neural network) to be generated. Hardware constraints include FLOPS (Floating-point Operations Per Second), memory size, etc. Note that request information DM1 is not limited to the above and may include various other information that the user requests for the model to be generated. For example, the information processing device 100 that executes the processing corresponding to the software in Figure 1 receives request information DM1 from the user.
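
As an illustration only, the contents of request information DM1 listed above could be held in a small data structure. The sketch below is not taken from the disclosure: the class names, field names, units, and the `satisfies_hardware` helper are all hypothetical, chosen to mirror the items named in the paragraph (task, dataset, accuracy, and hardware constraints such as FLOPS and memory size).

```python
from dataclasses import dataclass, field


@dataclass
class HardwareConstraints:
    """Hypothetical container for the HW constraints named in the text."""
    max_flops: float        # upper bound on FLOPS for the target device
    max_memory_bytes: int   # upper bound on memory size


@dataclass
class ModelRequest:
    """Hypothetical stand-in for request information DM1."""
    task: str
    dataset: str
    target_accuracy: float
    hardware: HardwareConstraints
    extra: dict = field(default_factory=dict)  # any other user requirements


def satisfies_hardware(request: ModelRequest, flops: float, memory_bytes: int) -> bool:
    """Return True when a candidate's measured cost fits the requested limits."""
    hw = request.hardware
    return flops <= hw.max_flops and memory_bytes <= hw.max_memory_bytes


# Example request; every value here is made up for illustration.
request = ModelRequest(
    task="image classification",
    dataset="example-dataset",
    target_accuracy=0.90,
    hardware=HardwareConstraints(max_flops=1e9, max_memory_bytes=64 * 1024 * 1024),
)
```

Keeping the hardware limits in their own structure makes it straightforward to check a generated candidate against them separately from accuracy.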

[0017] The software then obtains external resource information used for network structure generation from an external resource RS (step S2). In Figure 1, the external resource RS contains a model database, etc., and provides the software with the external resource information that the software uses for network structure generation. Thus, the external resource information that the external resource RS provides to the software is information used for generating the structure of the neural network and is information outside the network structure generation model M1.

[0018] The external resource device 200 is an example of an external resource RS outside of the software. For example, the information processing device 100, which executes the processing corresponding to the software in Figure 1, sends information (e.g., a search query) requesting external resource information to the external resource device 200 and receives external resource information from the external resource device 200.

[0019] The data pool storage unit 141 of the information processing device 100 is also an example of an external resource RS. For example, the information processing device 100 that executes the processing corresponding to the software in Figure 1 selects, from the information stored in the data pool storage unit 141, the information to be used as external resource information.

[0020] The software then generates a prompt PT1 to be input to the network structure generation model M1 using the request information DM1 and the external resource information obtained from the external resource RS (step S3). For example, the information processing device 100, which executes the processing corresponding to the software in Figure 1, uses the external resource information obtained from the external resource RS to generate a query that reflects the external resource information (also called a "resource reflection query") as prompt PT1. For example, the information processing device 100 generates a resource reflection query as prompt PT1 that includes information based on the external resource information and an instruction statement that instructs the network structure generation model M1 to generate a network structure.
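
The prompt assembly in step S3 can be sketched as plain string construction. The function below is a minimal illustration under stated assumptions: the section labels, the dictionary form of the request, and the function name `build_resource_reflection_query` are hypothetical, and the actual prompt format used in the disclosure (shown in its drawings) may differ.

```python
def build_resource_reflection_query(request, resource_snippets):
    """Assemble a prompt from the user request and retrieved resource text.

    `request` is assumed to be a dict of requirement names to values, and
    `resource_snippets` a list of strings taken from the external resource.
    """
    request_lines = [f"- {name}: {value}" for name, value in request.items()]
    resource_lines = [
        f"[{i}] {text}" for i, text in enumerate(resource_snippets, start=1)
    ]
    sections = [
        "## User request",
        *request_lines,
        "## External resource information",
        *(resource_lines or ["(none retrieved)"]),
        "## Instruction",
        "Generate a neural network structure that satisfies the request above.",
    ]
    return "\n".join(sections)


# Illustrative values only.
prompt = build_resource_reflection_query(
    {"task": "classification", "max_memory_mb": 64},
    ["ResNet-like block description", "Depthwise separable convolution notes"],
)
```

The resulting string contains both the information based on the external resource and the instruction statement, which is the combination the paragraph above describes for prompt PT1.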

[0021] The software then inputs the generated prompt PT1 into the network structure generation model M1 (step S4). The network structure generation model M1, upon receiving prompt PT1, outputs a neural network NW1 having a structure corresponding to prompt PT1 (step S5). For example, the information processing device 100, which performs the processing corresponding to the software in Figure 1, inputs prompt PT1 into the network structure generation model M1 and causes the network structure generation model M1 to output information indicating the network structure, thereby generating the neural network NW1.

[0022] The software may then perform an evaluation process, which is the process of evaluating the generated neural network NW1 (step S6). For example, the information processing device 100 that performs the process corresponding to the software in Figure 1 performs the evaluation process and generates evaluation information RP1 that indicates the evaluation of the neural network NW1.

[0023] The software may repeat the processes in steps S2 to S6 until predetermined conditions are met, as shown by the feedback FB in Figure 1. In Figure 1, the software repeats the process by feeding back the generated neural network NW1, evaluation information RP1, etc. For example, the information processing device 100 that executes the process corresponding to the software in Figure 1 may repeat the process until the generated neural network NW1 satisfies the user's request. The information processing device 100 may terminate the repeated process if a predetermined number of repetitions is reached before the generated neural network NW1 satisfies the user's request.
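
The repetition of steps S2 to S6 with a cap on the number of repetitions can be sketched as a simple loop. This is an assumption-laden outline, not the disclosed implementation: the callables `retrieve`, `build_prompt`, `generate`, and `evaluate` are hypothetical placeholders for the processing in each step, and the evaluation is assumed to return a dict with a `"satisfied"` flag.

```python
def run_generation_loop(request, retrieve, build_prompt, generate, evaluate,
                        max_iterations=5):
    """Repeat steps S2 to S6 until the request is met or the budget runs out.

    Assumes max_iterations >= 1. Returns the last candidate, its evaluation,
    and the number of iterations used.
    """
    candidate, evaluation = None, None
    for iteration in range(1, max_iterations + 1):
        resources = retrieve(request, candidate, evaluation)              # step S2
        prompt = build_prompt(request, resources, candidate, evaluation)  # step S3
        candidate = generate(prompt)                                      # steps S4 and S5
        evaluation = evaluate(candidate, request)                         # step S6
        if evaluation["satisfied"]:
            break  # the generated network meets the user's request
    return candidate, evaluation, iteration


# Toy stand-ins: the "model" proposes narrower networks on each call.
widths = iter([512, 256, 128])
result, result_eval, used = run_generation_loop(
    {"max_width": 256},
    retrieve=lambda req, cand, ev: [],
    build_prompt=lambda req, res, cand, ev: "prompt",
    generate=lambda prompt: {"hidden_width": next(widths)},
    evaluate=lambda cand, req: {"satisfied": cand["hidden_width"] <= req["max_width"]},
)
```

Passing the previous candidate and its evaluation back into retrieval and prompt construction corresponds to the feedback FB in Figure 1. In this toy run the second candidate meets the limit, so the loop stops after two iterations.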

[0024] The software then outputs the neural network NW2 generated by the network structure generation process described above (step S7). In Figure 1, the software outputs a neural network NW2 with a new structure that matches the user's requirements. For example, the information processing device 100 that performs the processing corresponding to the software in Figure 1 outputs the neural network NW2 generated by the network structure generation process described above to the user.

[0025] As described above, the information processing device 100 executes network structure generation processing using external resource information obtained from the external resource RS outside the software, and generates a neural network structure that reflects the external resource information. That is, the information processing device 100 can perform NAS that incorporates external resource information (such as DNN model knowledge) obtained from the external resource RS. As a result, the information processing device 100 can generate a neural network structure that takes into account external resource information that may be unknown to the network structure generation model M1, and can therefore generate a neural network structure appropriately.

<1-2. Configuration of the Information Processing Device According to the Embodiment>
[0026] Next, the configuration of the information processing device 100, which is an example of an information processing device that performs information processing according to the embodiment, will be described with reference to Figure 2.

[0027] As shown in Figure 2, the information processing device 100 includes a communication unit 11, an input unit 12, an output unit 13, a storage unit 14, and a control unit 15. In the example in Figure 2, the information processing device 100 includes an input unit 12 (for example, a keyboard or mouse) that receives various operations from the administrator of the information processing device 100, and an output unit 13 (for example, a liquid crystal display) that outputs (displays, etc.) various information.

[0028] The communication unit 11 is implemented, for example, by a NIC (Network Interface Card) or a communication circuit. The communication unit 11 is connected to a network N (such as the Internet) by wire or wireless connection and transmits and receives information with other devices via the network N.

[0029] The input unit 12 accepts various operations and input from the user, including input of information used to generate a model. The input unit 12 may accept these operations via a keyboard, mouse, or touch panel provided on the information processing device 100.

[0030] The output unit 13 outputs various types of information. The output unit 13 has a display device (display unit) such as a display and displays various types of information, including the information generated by the first generation unit 153 and the information generated by the second generation unit 154, such as the structure of the neural network generated by the second generation unit 154.

[0031] Furthermore, the output unit 13 may have functions to output information in various ways, not limited to display functions. For example, the output unit 13 may have a function to output information as sound. For example, the output unit 13 may have an audio output unit such as a speaker that outputs sound.

[0032] The storage unit 14 is implemented by, for example, semiconductor memory elements such as RAM (Random Access Memory) and flash memory, or by storage devices such as hard disks and optical discs. The storage unit 14 has a data pool storage unit 141.

[0033] The data pool storage unit 141 according to this embodiment stores information used to generate the structure of a neural network. The data pool storage unit 141 stores information used in past network structure generation processes. For example, the data pool storage unit 141 stores information used in past network structure generation processes as information that can be used in subsequent network structure generation processes. For example, the data pool storage unit 141 stores information used in past network structure generation processes as evaluated information.

[0034] Furthermore, the data pool storage unit 141 may store various types of information depending on the purpose, not limited to those described above.
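
One way to picture the data pool is as a store of previously used information together with its evaluation, which later generation runs can query. The sketch below is purely illustrative: the `DataPool` and `PoolEntry` names, the keyword-matching retrieval, and the numeric evaluation score are assumptions, since the disclosure does not specify the storage format or retrieval method here.

```python
from dataclasses import dataclass


@dataclass
class PoolEntry:
    """Hypothetical record: a piece of information plus its past evaluation."""
    description: str
    evaluation: float


class DataPool:
    """Minimal in-memory stand-in for the data pool storage unit 141."""

    def __init__(self):
        self._entries = []

    def store(self, description, evaluation):
        """Keep information used in a past generation run with its evaluation."""
        self._entries.append(PoolEntry(description, evaluation))

    def retrieve(self, keyword, limit=3):
        """Return up to `limit` matching entries, best evaluation first."""
        matches = [
            entry for entry in self._entries
            if keyword.lower() in entry.description.lower()
        ]
        matches.sort(key=lambda entry: entry.evaluation, reverse=True)
        return matches[:limit]


# Illustrative contents only.
pool = DataPool()
pool.store("Conv block with batch norm", 0.72)
pool.store("Transformer encoder layer", 0.81)
pool.store("Depthwise conv block", 0.88)
best_conv = pool.retrieve("conv", limit=2)
```

Sorting by the stored evaluation reflects the idea that information from past runs is kept as evaluated information for reuse in subsequent runs.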

[0035] Furthermore, the storage unit 14 stores various other types of information besides those mentioned above. For example, the storage unit 14 stores information generated by the first generation unit 153, including the queries generated by the first generation unit 153.

[0036] The storage unit 14 stores information obtained from the information processing performed by the information processing device 100. For example, the storage unit 14 stores information generated by the second generation unit 154, including information indicating the structure of the neural network generated by the second generation unit 154. The storage unit 14 also stores the models used in the processing: the network structure generation model M1, which is an LLM used to generate the structure of the neural network; the weight analysis model M21; and the evaluator (evaluation model) that evaluates models.

[0037] Furthermore, the storage unit 14 stores, for example, the information, functions, and determination conditions used in each of the first process, the second process, and the third process described later.

[0038] The control unit 15 is implemented, for example, by a CPU (Central Processing Unit) or MPU (Micro Processing Unit) executing a program (for example, the information processing program according to this disclosure) stored inside the information processing device 100 using RAM (Random Access Memory) or the like as a working area. The control unit 15 is also a controller and may be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array).

[0039] As shown in Figure 2, the control unit 15 includes an acquisition unit 151, an evaluation unit 152, a first generation unit 153, a second generation unit 154, and a transmission unit 155, and realizes or executes the information processing functions and operations described below. Note that the internal configuration of the control unit 15 is not limited to the configuration shown in Figure 2, and other configurations are also acceptable as long as they perform the information processing described later.

[0040] The acquisition unit 151 acquires various types of information. The acquisition unit 151 acquires various types of information from an external information processing device. The acquisition unit 151 acquires various types of information from the storage unit 14. The acquisition unit 151 acquires information received by the input unit 12.

[0041] From the storage unit 14, the acquisition unit 151 acquires data held in the data pool storage unit 141 and the models used for processing, including the LLM used for generating the structure of a neural network.

[0042] The acquisition unit 151 acquires from the storage unit 14 the evaluation results produced by the evaluation unit 152, the information generated by the first generation unit 153, and the information generated by the second generation unit 154. The acquisition unit 151 acquires the information indicating the network structure generated by the second generation unit 154, which is stored in the storage unit 14, as parent model information.

[0043] The acquisition unit 151 acquires a network structure generation model for generating the structure of a neural network. The acquisition unit 151 acquires external resource information, which is information used for generating the structure of a neural network and which is outside the network structure generation model.

[0044] The acquisition unit 151 acquires parent model information, which is information on the source neural network that is the source of the structure of a neural network. The acquisition unit 151 acquires information regarding a neural network corresponding to a user's request as external resource information from an external device.

[0045] The acquisition unit 151 acquires parent model information indicating a source neural network that satisfies the user's request for a neural network. The acquisition unit 151 acquires parent model information indicating a source neural network that satisfies the specifications requested by the user for a neural network.

[0046] The acquisition unit 151 acquires, as parent model information, a first neural network having the structure of the neural network generated by the second generation unit 154 as the source neural network. The acquisition unit 151 acquires information regarding the weights of a learned model as external resource information.

[0047] The acquisition unit 151 acquires parent model information, which is information on the first neural network having the structure of the neural network generated by the second generation unit 154. The acquisition unit 151 acquires evaluation information indicating the evaluation of the structure of the first neural network by the evaluation unit 152.

[0048] The evaluation unit 152 executes an evaluation process for evaluating various targets. The evaluation unit 152 executes the evaluation process based on the information acquired by the acquisition unit 151. The evaluation unit 152 executes the evaluation process based on the information stored in the storage unit 14.

[0049] The evaluation unit 152 evaluates the structure of the evaluation target neural network, which is the neural network generated by the second generation unit 154. After changing the structure of the evaluation target neural network, the evaluation unit 152 evaluates the structure of the evaluation target neural network using another evaluation neural network. After adding components to the evaluation target neural network, the evaluation unit 152 evaluates the structure of the evaluation target neural network using another evaluation neural network. With this configuration, the information processing device 100 can appropriately evaluate the structure of the neural network.

[0050] The evaluation unit 152 evaluates the structure of the evaluation target neural network using the changed neural network obtained by removing some components from the evaluation target neural network. The evaluation unit 152 evaluates the structure of the evaluation target neural network based on the comparison between the first index value corresponding to the evaluation target neural network and the second index value corresponding to the changed neural network.
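
The comparison between a first index value (for the evaluation target network) and a second index value (for a network with some components removed) resembles an ablation study. The sketch below illustrates that idea only. The list-of-names representation of a network, the `index_fn` scoring callable, the use of a simple difference as the comparison, and the contribution values are all hypothetical, because the disclosure does not state at this point which index is used or how the comparison is interpreted.

```python
def ablation_report(components, index_fn):
    """Compare the full network's index value with each ablated variant.

    `components` is assumed to be a list of unique component names and
    `index_fn` a callable that scores a list of components.
    Returns the first index value and, per component, the drop in the
    index value observed when that component is removed.
    """
    first_index = index_fn(components)              # evaluation target network
    drops = {}
    for i, name in enumerate(components):
        changed = components[:i] + components[i + 1:]   # remove one component
        second_index = index_fn(changed)                # changed network
        drops[name] = first_index - second_index
    return first_index, drops


# Toy scoring: each component adds a fixed, made-up contribution.
contributions = {"conv1": 0.5, "attention": 0.25, "dropout": 0.0}
first_index, drops = ablation_report(
    ["conv1", "attention", "dropout"],
    lambda comps: sum(contributions[c] for c in comps),
)
```

Under this toy scoring, a component whose removal leaves the index value unchanged (here `dropout`) would be a candidate for removal, while a large drop suggests the component matters to the structure.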

[0051] The first generation unit 153 performs generation processing to generate various types of information. The first generation unit 153 generates various types of information based on the information acquired by the acquisition unit 151. The first generation unit 153 generates various types of information based on the information stored in the storage unit 14. The first generation unit 153 generates various types of information based on the evaluation results from the evaluation unit 152. The first generation unit 153 generates queries. The first generation unit 153 generates queries that are used as prompts to be input to the LLM.

[0052] The first generation unit 153 generates a resource reflection query that reflects the external resource information by using the external resource information, as a query to be input to the network structure generation model. The first generation unit 153 generates the resource reflection query as a prompt to be input to the network structure generation model, which is a large language model (LLM).

[0053] The first generation unit 153 generates a resource reflection query that reflects the parent model information and external resource information by using the parent model information and external resource information. The first generation unit 153 generates a resource reflection query using information about the neural network corresponding to the user's request.

[0054] The first generation unit 153 generates a resource reflection query that reflects the user's request by using parent model information and external resource information. The first generation unit 153 generates a resource reflection query that reflects the specifications indicated by the parent model information. The first generation unit 153 generates a resource reflection query that reflects the first neural network by using parent model information indicating the first neural network and external resource information. With this configuration, the information processing device 100 can generate a neural network structure that corresponds to the user's request, and thus can appropriately generate the neural network structure.

[0055] The first generation unit 153 generates resource reflection queries that reflect the weights of the trained model by using external resource information. The first generation unit 153 generates resource reflection queries that reflect the weights of the trained model by using weight analysis information obtained by analyzing the weights of the trained model indicated by the external resource information. With this configuration, the information processing device 100 can generate the structure of a neural network using weight analysis information obtained by analyzing the weights of the model, and thus can appropriately generate the structure of a neural network.

[0056] The first generation unit 153 takes information of a trained model including weights as input and generates weight analysis information using an analysis model that outputs information showing the analysis results regarding the weights of the trained model. Using the weight analysis information generated by the analysis model, it generates a resource reflection query that reflects the weights of the trained model.

[0057] The first generation unit 153 generates resource reflection queries that reflect the weights of the trained model by using weight analysis information obtained by analyzing the weights of each component included in the trained model indicated by the external resource information. The first generation unit 153 generates resource reflection queries that reflect the first neural network by using parent model information indicating the first neural network and the external resource information.
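As an illustration of weight analysis information for each component, the following minimal sketch computes simple statistics per named component and turns them into text that could be placed in a resource reflection query. The source describes an analysis model but does not specify it; the statistics used here (mean absolute value, fraction of near-zero weights) and the names `analyze_weights` and `weight_analysis_to_text` are assumptions made for illustration only.

```python
from statistics import fmean
from typing import Dict, List


def analyze_weights(weights: Dict[str, List[float]], eps: float = 1e-3) -> Dict[str, Dict[str, float]]:
    """Return simple per-component statistics (a stand-in for the analysis model)."""
    report = {}
    for name, values in weights.items():
        near_zero = sum(1 for v in values if abs(v) < eps)
        report[name] = {
            "mean_abs": fmean([abs(v) for v in values]),
            "sparsity": near_zero / len(values),
        }
    return report


def weight_analysis_to_text(report: Dict[str, Dict[str, float]]) -> str:
    """Format the statistics as lines that can be embedded in a prompt."""
    return "\n".join(
        f"{name}: mean|w|={stats['mean_abs']:.4f}, near-zero fraction={stats['sparsity']:.2f}"
        for name, stats in report.items()
    )
```

A component with a high near-zero fraction could, for example, be described in the query as a candidate for narrowing; that use is likewise an assumption and is not stated in the source.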

[0058] The first generation unit 153 uses evaluation information to generate resource reflection queries that reflect the evaluation of the structure of the first neural network by the evaluation unit 152.

[0059] The first generation unit 153 may generate various types of information. The first generation unit 153 may generate various types of information to be displayed. For example, the first generation unit 153 may generate content that includes information indicating the structure of the generated network. In this case, the first generation unit 153 generates screen-related information (images) using various conventional technologies related to images as appropriate. The first generation unit 153 generates images using various conventional technologies related to GUIs as appropriate. For example, the first generation unit 153 may generate images using CSS (Cascading Style Sheets), JavaScript (registered trademark), HTML (HyperText Markup Language), or any language capable of describing the information processing such as the information display and operation reception described above.

[0060] The second generation unit 154 performs generation processing to generate various types of information. The second generation unit 154 generates various types of information based on the information acquired by the acquisition unit 151. The second generation unit 154 generates various types of information based on the information stored in the storage unit 14. The second generation unit 154 generates various types of information based on the evaluation results from the evaluation unit 152. The second generation unit 154 generates various types of information based on the information generated by the first generation unit 153.

[0061] The second generation unit 154 generates the neural network structure by inputting resource reflection queries, which are generated using external resource information, into a network structure generation model and causing the network structure generation model to output information indicating the structure of the neural network. The second generation unit 154 generates the neural network structure by inputting resource reflection queries, which are prompts generated using external resource information, into a network structure generation model such as a large-scale language model (LLM). With this configuration, the information processing device 100 can generate the neural network structure using external resource information, and therefore can appropriately generate the neural network structure.

[0062] The second generation unit 154 generates a neural network structure that has changed from the original neural network by inputting a resource reflection query generated using parent model information and external resource information into the network structure generation model. With this configuration, the information processing device 100 can generate a neural network structure using parent model information and external resource information, and therefore can appropriately generate the neural network structure. The second generation unit 154 generates a neural network structure that reflects the user's request by inputting a resource reflection query generated using parent model information and external resource information into the network structure generation model. With this configuration, the information processing device 100 can generate a neural network structure that corresponds to the user's request, and therefore can appropriately generate the neural network structure.

[0063] The second generation unit 154 generates a neural network structure that reflects the specifications requested by the user by inputting a resource reflection query that reflects the specifications into the network structure generation model. With this configuration, the information processing device 100 can generate a neural network structure that satisfies the specifications requested by the user, and thus can appropriately generate the neural network structure. The second generation unit 154 generates a second neural network structure that has changed from the first neural network by inputting a resource reflection query that reflects the first neural network into the network structure generation model. With this configuration, the information processing device 100 can repeatedly process the generation of neural network structures, and thus can appropriately generate the neural network structure.

[0064] The second generation unit 154 generates a neural network structure that reflects the weights of the trained model by inputting a resource reflection query generated using external resource information into the network structure generation model. With this configuration, the information processing device 100 can generate a neural network structure using information about the weights of the trained model, and thus can appropriately generate the neural network structure. The second generation unit 154 generates a neural network structure that reflects the weights of each component included in the trained model by inputting a resource reflection query generated using external resource information into the network structure generation model. With this configuration, the information processing device 100 can generate a neural network structure using weight analysis information in which the weights of each component included in the model have been analyzed, and thus can appropriately generate the neural network structure.

[0065] The second generation unit 154 generates the structure of the second neural network, which is a modified version of the first neural network, by inputting a resource reflection query that reflects the first neural network into the network structure generation model. The second generation unit 154 generates the structure of the second neural network, which is a modified version of the first neural network, by inputting a resource reflection query that reflects the evaluation of the structure of the first neural network by the evaluation unit 152 into the network structure generation model. With this configuration, the information processing device 100 can repeatedly process the generation of neural network structures, and thus can appropriately generate neural network structures. The second generation unit 154 generates the neural network structure based on resource reflection queries generated using external resource information. For example, the second generation unit 154 functions as a proposal unit that makes various suggestions. In addition to the neural network, the second generation unit 154 proposes initial values for weights suitable for the neural network. The second generation unit 154 makes suggestions by instructing the transmitting unit 155 to transmit various information. The second generation unit 154 may also make suggestions by instructing the output unit 13 to output various information. For example, the second generation unit 154 may make suggestions by instructing the output unit 13 to display various information on its display unit.

[0066] The transmitting unit 155 transmits various information. The transmitting unit 155 provides various information to an external information processing device. The transmitting unit 155 transmits information stored in the storage unit 14. The transmitting unit 155 transmits information stored in the data pool storage unit 141. Based on resource reflection queries generated using external resource information, the transmitting unit 155 proposes initial values for the weights of the neural network in accordance with the proposed neural network structure. In addition to the neural network, the transmitting unit 155 proposes initial values for weights suitable for the neural network.

[0067] The transmitting unit 155 transmits the information generated by the first generation unit 153. The transmitting unit 155 transmits the query generated by the first generation unit 153 to an external device. The transmitting unit 155 transmits the information generated by the second generation unit 154. The transmitting unit 155 transmits information indicating the structure of the neural network generated by the second generation unit 154 to an external device that performs inference processing using the model.

[0068] <1-3. Information Processing Procedure According to the Embodiment> The information processing procedure according to the embodiment will be described below. First, the information processing procedure according to the embodiment will be described using Figure 3. Figure 3 is a flowchart of the information processing procedure according to the embodiment of this disclosure.

[0069] As shown in Figure 3, the information processing device 100 acquires a network structure generation model for generating the structure of a neural network (step S101). For example, the information processing device 100 acquires the network structure generation model M1 from the storage unit 14.

[0070] Furthermore, the information processing device 100 acquires external resource information, which is information outside the network structure generation model used to generate the neural network structure (step S102). For example, the information processing device 100 acquires external resource information from an external resource device 200 or a storage unit 14 (such as a data pool storage unit 141).

[0071] Then, the information processing device 100 uses the external resource information to generate a resource reflection query that reflects the external resource information, which is then input to the network structure generation model as a query (step S103). For example, the information processing device 100 generates a resource reflection query that reflects the external resource information as a prompt to input to the network structure generation model M1.

[0072] Then, the information processing device 100 inputs a resource reflection query generated using external resource information into the network structure generation model, and causes the network structure generation model to output information indicating the structure of the neural network, thereby generating the structure of the neural network (step S104). For example, the information processing device 100 inputs a resource reflection query that reflects external resource information into the network structure generation model M1, and causes the network structure generation model M1 to output information indicating the structure of the neural network, thereby generating the structure of the neural network.
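Steps S103 and S104 can be pictured with the following minimal sketch. It assumes the network structure generation model is available as a plain callable that maps a prompt string to a text answer; `stub_model`, the constraint keys, and the prompt wording are hypothetical and are not taken from the source.

```python
from typing import Callable, Dict


def build_resource_reflection_query(external_resource_info: Dict[str, str]) -> str:
    """Step S103: fold external resource information into a prompt."""
    lines = ["Propose a neural network structure that respects these constraints:"]
    for key, value in sorted(external_resource_info.items()):
        lines.append(f"- {key}: {value}")
    return "\n".join(lines)


def generate_structure(model: Callable[[str], str], external_resource_info: Dict[str, str]) -> str:
    """Steps S103 and S104: build the query, then let the model answer it."""
    query = build_resource_reflection_query(external_resource_info)
    return model(query)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for the network structure generation model M1."""
    return f"class ProposedNet: ...  # answer to a {len(prompt.splitlines())}-line prompt"
```

In a real deployment the callable would wrap an LLM; the stub only makes the data flow visible.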

[0073] <1-4. Processing Examples> From here, based on the above-mentioned premises, the various processes executed by the information processing device 100 will be described. The information processing device 100 will execute at least one of the following first to third processes. Note that the information processing device 100 may also execute a process that combines the first to third processes. Furthermore, explanations of points that are the same as those explained in Figure 1, etc., will be omitted as appropriate.

[0074] <1-4-1. First Processing (NAS based on RAG)> First, the first processing performed by the information processing device 100 will be described. The information processing device 100 performs network structure generation processing using NAS based on RAG (Retrieval Augmented Generation) as the first processing.

[0075] Before the first process is explained, RAG technology will be briefly explained. RAG is a technology that combines external information retrieval with generation processing by an LLM (large-scale language model) in order to improve the output accuracy of the LLM. Self-RAG is one example of a RAG method. In Self-RAG, for example, the degree of relevance between the query and the retrieved document is evaluated, the extent to which the sentence generated from the retrieved document reflects the content of the document is evaluated, and feedback is repeated.

[0076] Furthermore, one example of a RAG-related technique is Corrective-RAG (CRAG). In CRAG, for example, the content of documents obtained through RAG is evaluated to determine whether it is correct (relevant) to the query, and if the document content is incorrect or ambiguous, the original query is rewritten and a web search is performed. By utilizing metadata of external resources through such RAG technology, it is possible to improve the output accuracy of LLM.

[0077] <1-4-1-1. Procedure for the First Processing> For example, the information processing device 100 performs the first processing as shown below. The procedure for the first processing will be explained using Figure 4. Figure 4 is a flowchart of the procedure for the first processing.

[0078] The information processing device 100 acquires parent model information, which is information about the source neural network from which the structure of the neural network is generated (step S201). For example, if it is the start of optimization, the information processing device 100 acquires a predefined network structure or a network structure close to the user's request information obtained from an open-source library as parent model information. For example, if it is in the middle of optimization, the information processing device 100 acquires the user's request information or information indicating a previously generated network structure as parent model information.

[0079] Then, the information processing device 100 uses the parent model information and the external resource information to generate a resource reflection query that reflects the parent model information and the external resource information (step S202). For example, the information processing device 100 generates a resource reflection query that reflects the parent model information and the external resource information as a prompt to input to the network structure generation model M1.

[0080] Then, the information processing device 100 generates a neural network structure that has changed from the original neural network by inputting a resource reflection query generated using the parent model information and the external resource information into the network structure generation model (step S203). For example, the information processing device 100 generates a neural network structure that has changed from the original neural network by inputting a resource reflection query that reflects the parent model information and the external resource information into the network structure generation model M1.

[0081] The information processing device 100 then repeats steps S201 to S203 until it satisfies the user's hardware constraint requirements, generating an optimized neural network structure. In Figure 4, if the generated structure does not satisfy the hardware requirements (step S204: No), the information processing device 100 returns to step S201 and repeats the process. If the generated structure satisfies the hardware requirements (step S204: Yes), the information processing device 100 terminates the process.
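The loop of steps S201 to S204 can be sketched as follows. The sketch assumes that each generated structure becomes the parent of the next pass and that the hardware requirement is a single FLOPS ceiling; the `max_iters` cap and the callables `propose` and `measure_flops` are hypothetical additions for illustration.

```python
from typing import Callable, Optional, TypeVar

Model = TypeVar("Model")


def first_process(
    parent: Model,
    propose: Callable[[Model], Model],
    measure_flops: Callable[[Model], float],
    max_flops: float,
    max_iters: int = 10,
) -> Optional[Model]:
    """Repeat S201 to S203 until the S204 check passes or the cap is reached."""
    for _ in range(max_iters):
        candidate = propose(parent)              # S202 and S203: query the model
        if measure_flops(candidate) <= max_flops:  # S204: hardware requirement met?
            return candidate
        parent = candidate                       # S201 on the next pass
    return None  # cap reached without satisfying the requirement (assumption)
```

For example, with a toy structure `{"width": 64}`, a `propose` function that halves the width, and FLOPS modeled as the width squared, a ceiling of 300 is met on the second pass with a width of 16.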

[0082] <1-4-1-2. Overview of the First Process> From here, we will explain the overview of the first process. First, we will explain the overview of the process using the parent model using Figure 5. Figure 5 is a diagram showing an example of the process using the parent model. Note that explanations of points that are the same as those described above will be omitted as appropriate.

[0083] The information processing device 100 acquires a model to be used as the parent model. For example, at the start of optimization, the information processing device 100 acquires candidate models from the external resource device 200 or storage unit 14 that correspond to the user's requirements, such as tasks and hardware constraints. For example, during optimization, the information processing device 100 trains multiple models selected from the candidate models and selects a parent model based on the evaluation results.

[0084] In Figure 5, the information processing device 100 selects s models from the candidate models. For example, the information processing device 100 selects three neural network models NW11, NW12, and NW13 as candidate parent models.

[0085] In Figure 5, the information processing device 100 performs a learning process on s selected models (candidate parent models) and then performs an evaluation process. For example, the information processing device 100 performs a learning process on each candidate parent model and then performs an evaluation process by measuring the performance of the candidate parent model on the task after the learning process.

[0086] In Figure 5, the information processing device 100 selects the model with the best performance. For example, the information processing device 100 selects the model with the best performance from among several candidate parent models as the parent model. The information processing device 100 selects the neural network NW11, which has the best performance among the neural networks NW11, NW12, and NW13, as the parent model.
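The selection of the parent model from s sampled candidates can be sketched as below. The source does not state how the s models are chosen; uniform random sampling is an assumption here, and `train_and_evaluate` is a hypothetical callable that returns a task score where higher is better.

```python
import random
from typing import Callable, List, TypeVar

Model = TypeVar("Model")


def select_parent(
    candidates: List[Model],
    train_and_evaluate: Callable[[Model], float],
    s: int,
    rng: random.Random,
) -> Model:
    """Sample s candidate parent models, score each, and return the best one."""
    sampled = rng.sample(candidates, s)
    scored = [(train_and_evaluate(model), model) for model in sampled]
    best_score, best_model = max(scored, key=lambda pair: pair[0])
    return best_model
```

With candidates NW11, NW12, and NW13 and scores in which NW11 is highest, the function returns NW11, matching the example in Figure 5.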

[0087] In Figure 5, the information processing device 100 generates a new model from the selected parent model. For example, the information processing device 100 generates a new model from the selected parent model by inputting a prompt to the LLM that includes information indicating the selected parent model and information indicating the specifications of the model to be proposed by the LLM. For example, the information processing device 100 generates a new model from the selected parent model by inputting a prompt PT1, as shown in Figure 6, to the network structure generation model M1. Figure 6 is a diagram showing an example of a prompt.

[0088] In Figure 6, prompt PT1 includes an example of a parent model specification and implementation pair, the specification of the model to be proposed by the LLM, and the first line of the model implementation to be generated (answered) by the LLM (e.g., class definition). In Figure 6, the first area PP1 in prompt PT1 corresponds to the example of a parent model specification and implementation pair. Also in Figure 6, the second area PP2 in prompt PT1 corresponds to the specification of the model to be proposed by the LLM.

[0089] Furthermore, the prompt PT1 may include not only the string (program code) shown in Figure 6, but also information that provides specific instructions to the LLM, such as "Please propose a model that satisfies the requirements, referring to the example of parent model specification and implementation pairs." The prompt PT1 may also include external resource information, which will be discussed later.
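The layout of prompt PT1 described above can be sketched as a simple string builder. Figure 6 is not reproduced here, so the exact formatting (docstring-style specifications, the class name `ProposedNet`, the `nn.Module` base class) is an assumption; only the ordering of the parts follows the description.

```python
def build_prompt_pt1(
    parent_spec: str,
    parent_impl: str,
    target_spec: str,
    class_name: str = "ProposedNet",
    instruction: str = "",
) -> str:
    """Assemble instruction, area PP1, area PP2, and the first line of the answer."""
    parts = []
    if instruction:
        parts.append(instruction)            # optional natural-language instruction
    parts.append(f'"""{parent_spec}"""')     # PP1: parent model specification ...
    parts.append(parent_impl.rstrip())       # ... paired with its implementation
    parts.append(f'"""{target_spec}"""')     # PP2: specification of the model to propose
    parts.append(f"class {class_name}(nn.Module):")  # first line for the LLM to continue
    return "\n\n".join(parts)
```

Ending the prompt with the class definition line nudges the model to continue with an implementation rather than free-form text.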

[0090] In Figure 5, the information processing device 100 generates a new model, the neural network NW20, from the parent model, the neural network NW11. For example, the information processing device 100 generates a new model from the parent model by using the network structure generation model M1, which is an LLM (large-scale language model), for the Mutation operation of evolutionary computation. For example, the information processing device 100 inputs a prompt containing information indicating the neural network NW11 into the network structure generation model M1, causing the network structure generation model M1 to output the neural network NW20, which has a modified structure from the neural network NW11.

[0091] In Figure 5, the information processing device 100 adds the newly generated model to the candidate models, removes the oldest model from the candidate models, and repeats the process. The information processing device 100 adds the newly generated neural network NW20 to the candidate models, removes the oldest neural network NW14 from the candidate models, and repeats the process. The information processing device 100 may terminate the process if the neural network NW20 satisfies the criteria indicated by the request information.
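The population update described above (add the newest model, remove the oldest) can be sketched with a double-ended queue. The function name `evolve_step` and the use of `collections.deque` are illustrative choices, not details from the source.

```python
from collections import deque
from typing import Deque, TypeVar

Model = TypeVar("Model")


def evolve_step(population: Deque[Model], new_model: Model) -> Model:
    """Append the newly generated model and drop the oldest; return the dropped one."""
    population.append(new_model)
    return population.popleft()
```

Keeping the candidates ordered by age means that removal depends only on when a model was added, not on its score.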

[0092] Furthermore, the LLM may not have knowledge of the number of parameters or the accuracy of a particular model, and may therefore be unable to correctly produce the mutated model. For this reason, the information processing device 100 performs network structure generation processing using external resource information. For example, as shown in Figure 7, the information processing device 100 generates prompts to be input to the LLM using external resource information. Figure 7 is a diagram showing an overview of the first process. Note that explanations of points similar to those described above will be omitted as appropriate. For example, the process shown in Figure 7 is applied to the part in Figure 5 where neural network NW20 is generated from neural network NW11 (generation of a new model).

[0093] The information processing device 100 obtains external resource information from the external software resource RS, as shown in the Retriever in Figure 7. For example, the information processing device 100 obtains external resource information from the external software resource RS using information such as that of a candidate parent model. The information processing device 100 obtains external resource information by searching the external software resource RS using information such as that of a candidate parent model. The information processing device 100 generates a prompt using the information of a candidate parent model and the external resource information.

[0094] In Figure 7, the information processing device 100 generates a prompt containing information about a document obtained by searching the external software resource RS, and information about the parent model. The information processing device 100 inputs the generated prompt into the LLM, causing the LLM to output a new model candidate with a modified structure from the parent model. For example, the information processing device 100 inputs the generated prompt into the network structure generation model M1, thereby generating a new model with a modified structure from the parent model.

[0095] Using RAG for NAS raises the following challenges. For example, the information written in documents or web pages retrieved by RAG (corresponding to Retriever in Figure 7) may not necessarily match the characteristics of the hardware targeted by the search. Also, there may be a gap between the information on the actual device and the information in the documents. For this reason, the information processing device 100 may introduce a feedback mechanism to check the search results and prompts, as shown in Figure 8. Figure 8 is a diagram showing an example of processing based on the feedback mechanism. Note that explanations of points that are the same as those described above will be omitted as appropriate.

[0096] In the process shown in Figure 8, the information processing device 100 introduces a dual feedback mechanism. The first feedback in Figure 8 is model generation feedback. The information processing device 100 performs model generation feedback if the generated new model candidate does not match the target specifications. For example, if the generated new model candidate does not match the target specifications, the information processing device 100 generates a prompt with the new model candidate as the parent model and uses that prompt to generate a new model. For example, the target specifications may be the accuracy indicated by the user's request information, hardware constraints, etc.

[0097] Furthermore, if the generated new model candidate matches the target specifications, the information processing device 100 may repeat the Mutation process with the new model candidate as the parent model candidate. For example, if the generated new model candidate matches the target specifications, the information processing device 100 may add that model to the parent model candidate and repeat the process.

[0098] Furthermore, the second feedback in Figure 8 is the feedback for the process shown in Retriever in Figure 8, namely the acquisition of external resource information. The information processing device 100 acquires external resource information by searching the external software resource RS using information on the parent model candidate (candidate parent model), target specification information, etc.

[0099] The information processing device 100 then uses an evaluator to perform a determination process to determine whether the retrieved document matches the characteristics of the target hardware. For example, the information processing device 100 performs the determination process using an evaluator that takes model information and information about the characteristics of the target hardware as input and outputs information indicating whether the model satisfies the characteristics of the target hardware. The evaluator may use an evaluation model based on an evaluation process as shown in the third process.

[0100] If the acquired external resource information does not match the characteristics of the target hardware (target specifications, etc.), the information processing device 100 searches the external software resource RS again (re-search). For example, when performing a re-search, the information processing device 100 may search the external software resource RS using information different from that used in the previous search. For example, when performing a re-search, the information processing device 100 may search the external software resource RS using random numbers different from those used in the previous search. Also, if the number of searches reaches a predetermined number, the information processing device 100 may terminate the acquisition of external resource information.

[0101] The information processing device 100 outputs search results if the acquired external resource information matches the characteristics (target specifications, etc.) of the target hardware. The information processing device 100 then saves the document in the data pool. For example, if the acquired external resource information matches the characteristics (target specifications, etc.) of the target hardware, the information processing device 100 uses the acquired external resource information to generate a prompt and registers the acquired external resource information in the storage unit 14 (data pool storage unit 141, etc.).
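The retriever-side feedback (search, evaluate, re-search, save to the data pool) can be sketched as below. The callables `search` and `matches_target` are hypothetical stand-ins for the search over the external software resource RS and for the evaluator; passing a fresh random number to each search is one assumed way of varying the re-search, and the default limit of three searches is arbitrary.

```python
import random
from typing import Callable, List, Optional


def retrieve_with_feedback(
    search: Callable[[str, float], List[str]],
    matches_target: Callable[[str], bool],
    query: str,
    data_pool: List[str],
    max_searches: int = 3,
    seed: int = 0,
) -> Optional[List[str]]:
    """Search until some document passes the evaluator or the search limit is hit."""
    rng = random.Random(seed)
    for _ in range(max_searches):
        # A new random number per attempt varies the re-search.
        documents = search(query, rng.random())
        accepted = [doc for doc in documents if matches_target(doc)]
        if accepted:
            data_pool.extend(accepted)  # register accepted documents in the data pool
            return accepted
    return None  # predetermined number of searches reached; stop acquiring
```

The accepted documents are both returned for prompt generation and stored, so that a later search can be answered from the data pool.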

[0102] Here, an example of the process shown in the Retriever in Figure 8 (acquisition of external resource information) will be explained using Figure 9. Figure 9 is a diagram showing an example of acquiring external resource information based on a search. In Figure 9, the information processing device 100 extracts relevant documents from sources such as ablation studies in academic papers and open-source model zoos by searching the external software resource RS.

[0103] In the process of searching the external software resource RS, the information processing device 100 determines (evaluates) whether the extracted information is related to the model's specifications. For example, the information processing device 100 searches based on data type, such as table data, and determines whether the extracted information contains descriptions of FLOPS, memory, etc. In the process of searching the external software resource RS, the information processing device 100 also determines (evaluates) the degree of relevance to the parent model's code. For example, the information processing device 100 determines (evaluates) the degree of relevance to the parent model's code based on embedding distance, etc.
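The two relevance checks described above can be sketched as a keyword test for specification terms plus an embedding distance test. Cosine distance, the threshold of 0.3, and the keyword pattern are assumptions; the source only says "embedding distance, etc." and names FLOPS and memory as examples. The embeddings are assumed to be non-zero vectors of equal length produced elsewhere.

```python
import math
import re
from typing import Sequence

# Assumed keyword pattern for specification-related text.
SPEC_PATTERN = re.compile(r"flops|memory", re.IGNORECASE)


def mentions_specs(text: str) -> bool:
    """True if the extracted text talks about FLOPS or memory."""
    return bool(SPEC_PATTERN.search(text))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def is_relevant(
    doc_text: str,
    doc_embedding: Sequence[float],
    parent_code_embedding: Sequence[float],
    max_distance: float = 0.3,
) -> bool:
    """Keep a document only if it describes specs and is close to the parent code."""
    close = cosine_distance(doc_embedding, parent_code_embedding) <= max_distance
    return mentions_specs(doc_text) and close
```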

[0104] Furthermore, as shown in Figure 10, the information processing device 100 may perform additional processing if no search results are found. Figure 10 is a diagram showing an example of search processing that includes processing when no search results are found. In the process of searching the external software resource RS, if search results are found, the information processing device 100 uses the search results as documents that match the query to generate a prompt. The information processing device 100 may also evaluate the search results using an evaluator.

[0105] If no search results are found when the information processing device 100 searches the external software resource RS, a document is created manually. For example, if no search results are found, the information processing device 100 may request the administrator of the information processing device 100 to create the external resource information. Alternatively, if no search results are found, the information processing device 100 may request the source of the network structure generation request (user, etc.) to create the external resource information.

[0106] The information processing device 100 may also perform document creation automatically. In this case, the information processing device 100 may create a model that is a slightly modified version of the parent model using a rule-based approach. The information processing device 100 may then perform evaluation processing on the modified version of the parent model. Furthermore, the information processing device 100 may use the created document as a document that matches the query to generate a prompt.

[0107] The information processing device 100 then saves the results in a data pool. For example, the information processing device 100 registers the created document as external resource information in the storage unit 14 (data pool storage unit 141, etc.).
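Automatic document creation when no search results are found can be sketched as follows. The rule used here (scaling a `width` field by a fixed factor), the dictionary layout of the model and of the created document, and the `measure_flops` callable are all hypothetical; the source only says that a slightly modified version of the parent model is created by a rule-based approach and then evaluated.

```python
from typing import Callable, Dict, List


def make_fallback_document(
    parent: Dict[str, int],
    measure_flops: Callable[[Dict[str, int]], int],
    scale: float = 0.75,
) -> Dict[str, object]:
    """Create a slightly modified parent and record its measured specification."""
    variant = dict(parent)
    variant["width"] = max(1, int(parent["width"] * scale))  # rule: shrink the width
    return {
        "structure": variant,
        "flops": measure_flops(variant),  # evaluation of the modified model
        "source": "auto-generated",
    }


def register_document(data_pool: List[Dict[str, object]], document: Dict[str, object]) -> None:
    """Save the created document to the data pool as external resource information."""
    data_pool.append(document)
```

Because the recorded FLOPS come from measuring the variant itself, the created document is consistent with the model it describes.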

[0108] Furthermore, the information processing device 100 may perform an evaluation of the retrieved documents, as shown in Figure 11. Figure 11 is a diagram showing an example of the evaluation process for external resource information. When the source is a data pool, the information processing device 100 outputs the search results without evaluation. For example, when the source of the external resource information is the storage unit 14 of the device (data pool storage unit 141, etc.), the information processing device 100 uses the search results to generate a prompt without evaluation.

[0109] On the other hand, if the source is not a data pool, the information processing device 100 evaluates whether the document matches the characteristics of the target hardware. For example, the information processing device 100 may perform the evaluation using a learning-based evaluator as described above, or it may perform the evaluation using a rule-based method. For example, when the information processing device 100 performs the evaluation using a rule-based method, it creates a model code based on the retrieved document and evaluates whether the document matches the characteristics of the target hardware by comparing the specifications written in the document with the specifications of the actual model. Then, the information processing device 100 saves the document to the data pool or performs a re-search depending on the evaluation result. For example, if the information processing device 100 evaluates that the document matches the characteristics of the target hardware, it saves the document to the data pool. For example, if the information processing device 100 evaluates that the document does not match the characteristics of the target hardware, it performs a re-search process.
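The branching in paragraphs [0108] and [0109] might look like the sketch below. The `Document` fields, the source labels, the FLOPS-based comparison, and the 10% tolerance are assumptions chosen for illustration; the description only states that the specifications written in the document are compared with those of the actual model.

```python
from dataclasses import dataclass


@dataclass
class Document:
    text: str
    source: str            # "data_pool" or another label such as "web" (hypothetical)
    claimed_flops: float   # specification written in the document
    measured_flops: float  # specification of the model code built from the document


def matches_target_hardware(doc: Document, tolerance: float = 0.1) -> bool:
    """Rule-based check: the written and actual specifications agree within a tolerance."""
    return abs(doc.claimed_flops - doc.measured_flops) <= tolerance * doc.measured_flops


def handle_search_result(doc: Document, data_pool: list) -> str:
    """Return "use" when the document can feed the prompt, otherwise "re-search"."""
    if doc.source == "data_pool":
        return "use"            # pooled documents are used without evaluation
    if matches_target_hardware(doc):
        data_pool.append(doc)   # save the matching document to the data pool
        return "use"
    return "re-search"          # the document does not match; search again
```

A learning-based evaluator could replace `matches_target_hardware` without changing the surrounding control flow.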

[0110] Furthermore, the information processing device 100 may use the Mutation results (history) in the network structure generation process, as shown in Figure 12. Figure 12 is a diagram showing an example of processing using the Mutation history. In addition to the information of the parent model and the external resource information obtained from the external software resource RS, the information processing device 100 may also use the Mutation history information labeled "Mutation history DB" in Figure 12. For example, the Mutation history information is stored in the storage unit 14. The Mutation history information includes information such as the parent model, mutation content, FLOPS reduction amount, and the magnitude of accuracy improvement for each new model generation.

[0111] As shown in Figure 12, the information processing device 100 uses the Mutation history to generate prompts. For example, the information processing device 100 may add the Mutation history to the prompt generated based on the parent model information and external resource information. This is expected to result in changes that bring the network structure generated by the information processing device 100 closer to the target specifications.
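As a rough illustration of adding the Mutation history to a prompt, the following sketch appends one line per history entry to a prompt built from the parent model information and the external resource information. The record fields follow paragraph [0110]; the class name, the prompt wording, and the units are assumptions.

```python
from dataclasses import dataclass


@dataclass
class MutationRecord:
    parent_model: str
    mutation: str
    flops_reduction: float  # assumed unit: MFLOPs
    accuracy_gain: float    # assumed unit: percentage points


def build_prompt(parent_info: str, resource_info: str, history: list) -> str:
    """Concatenate parent model info, external resource info, and Mutation history."""
    lines = [
        f"Parent model: {parent_info}",
        f"External resource information: {resource_info}",
        "Mutation history:",
    ]
    for record in history:
        lines.append(
            f"- parent={record.parent_model}; mutation={record.mutation}; "
            f"FLOPS reduction={record.flops_reduction}; "
            f"accuracy change={record.accuracy_gain:+}"
        )
    return "\n".join(lines)


# Hypothetical history entry and resulting prompt.
prompt = build_prompt(
    "3-stage CNN",
    "target device supports depthwise convolution",
    [MutationRecord("model_0", "replace 3x3 conv with depthwise conv", 12.5, 0.8)],
)
```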

[0112] Conventional NAS systems have not properly utilized RAG, but the information processing device 100 described above can apply RAG to the NAS and generate an appropriate network structure. Furthermore, while Self-RAG and CRAG described above evaluate the relevance between queries and documents, the information processing device 100 also evaluates whether the content of the documents matches the characteristics of the target hardware. In addition, the information processing device 100 evaluates whether the results can be used not only at each mutation but also in mutations that occur after that process. Moreover, the information processing device 100 pools and utilizes documents that match the target hardware. As a result, the information processing device 100 can partially skip searches.

[0113] <1-4-2. Second Process (NAS Based on Learned Weights)> The information processing device 100 may perform network structure generation processing using various processes, not limited to the first process described above. For example, the information processing device 100 may perform network structure generation processing using learned weight information (also simply called "weights"). This point will be explained as the second process. The information processing device 100 performs network structure generation processing using NAS based on learned weights as the second process. Note that explanations of similar points to the first process, etc., described above will be omitted as appropriate.

[0114] <1-4-2-1. Procedure for the Second Processing> For example, the information processing device 100 performs the second processing as shown below. The procedure for the second processing will be explained using Figure 13. Figure 13 is a flowchart of the procedure for the second processing.

[0115] The information processing device 100 acquires information regarding the weights of the trained model as external resource information (step S301). For example, the information processing device 100 acquires information regarding the weights of the trained model from an external resource device 200 or a storage unit 14 (such as a data pool storage unit 141).

[0116] Then, the information processing device 100 generates a resource reflection query that reflects the weights of the trained model by using external resource information (step S302). For example, the information processing device 100 generates a resource reflection query that reflects the weights of the trained model as a prompt to input to the network structure generation model M1.

[0117] Then, the information processing device 100 inputs a resource reflection query generated using external resource information into the network structure generation model, thereby generating a neural network structure that reflects the weights of the trained model (step S303). For example, the information processing device 100 inputs a resource reflection query that reflects the weights of the trained model into the network structure generation model M1, thereby generating a neural network structure that reflects the weights of the trained model.

[0118] <1-4-2-2. Overview of the Second Process> From here, we will explain the overview of the second process. First, we will explain the overview of the process using the trained weights using Figure 14. Figure 14 is a diagram illustrating the overview of the second process. Note that explanations of points that are the same as those described above will be omitted as appropriate.

[0119] The information processing device 100 acquires existing trained models. For example, the information processing device 100 acquires information about a trained neural network whose weights have already been learned through a training process. The information processing device 100 receives information about trained neural networks from the external resource device 200. For example, the information processing device 100 transmits information specifying the structure to the external resource device 200 and receives information about a trained neural network having that structure from the external resource device 200. If the information processing device 100 has already registered information about a trained neural network in the storage unit 14, it acquires information about the trained neural network from the storage unit 14.

[0120] In Figure 14, the information processing device 100 acquires information about the trained neural network NW21, which is composed of layers L11, L12, and L13. The information about the trained neural network NW21 includes the network structure and trained weights for the trained neural network NW21. The information processing device 100 also acquires information about the trained neural network NW22, which is composed of layers L21, L22, and L23. For example, the information about the trained neural network NW22 includes the network structure and trained weights for the trained neural network NW22.

[0121] The information processing device 100 then analyzes the weights of the trained neural network using a weight analysis model. In Figure 14, the information processing device 100 analyzes the weights of the trained neural network using the weight analysis model M21 and generates information (also called "weight analysis information") that shows the results of the weight analysis of the trained neural network. The information processing device 100 inputs the information of the trained neural network into the weight analysis model M21 (step S21) and causes the weight analysis information of the trained neural network to be output by the weight analysis model M21 (step S22).

[0122] As a result, the information processing device 100 generates weight analysis information for the trained neural network. In Figure 14, the information processing device 100 generates weight analysis information WD21 using the weight analysis model M21. The weight analysis information WD21 includes information such as the information loss, identity conversion rate, weight variance, and function of each layer of the trained neural network NW21 and the trained neural network NW22. In Figure 14, only the information loss, identity conversion rate, weight variance, and function of layers L11 and L12 of the trained neural network NW21 are shown, but the weight analysis information WD21 also includes the information loss, identity conversion rate, weight variance, and function of layers L13 and L21-L23.

[0123] Note that the weight analysis information WD21 in Figure 14 is merely an example, and the weight analysis information is not limited to the information shown in WD21 in Figure 14, but can be any kind of information that can be used in the network structure generation process. For example, the weight analysis information is not limited to information that humans can understand, but can also be information that humans cannot understand, such as vector information. Furthermore, the weight analysis model M21 may be composed of a neural network, or it may be composed of a fixed function that calculates the variance of the weights.
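For the case in which the weight analysis model M21 is a fixed function that calculates the variance of the weights, a minimal sketch is shown below. Representing each layer as a flat list of weights and returning a dictionary keyed by layer name are assumptions made for the example; the other items of the weight analysis information WD21 (information loss, identity conversion rate, function) are not computed here.

```python
from statistics import pvariance


def analyze_weights(layers: dict) -> dict:
    """Compute the population variance of the weights of each layer."""
    return {
        name: {"weight_variance": pvariance(weights)}
        for name, weights in layers.items()
    }


# Hypothetical flattened weights for two layers.
analysis = analyze_weights({
    "L11": [0.5, -0.5, 0.5, -0.5],
    "L12": [0.1, 0.1, 0.1, 0.1],
})
```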

[0124] The information processing device 100 generates a new model (structure) using the generated weight analysis information. For example, the information processing device 100 generates a new model from the weight analysis information by inputting the generated weight analysis information into the LLM. Note that the input to the LLM is not limited to the weight analysis information itself, but may also be tokens of the neural network (each layer, etc.). Tokens may be generated based on the weight analysis information. For example, the information processing device 100 generates a new model that reflects the weight analysis information by inputting the generated weight analysis information and a prompt containing information instructing the generation of a new structure into the LLM.

[0125] In Figure 14, the information processing device 100 generates a new model structure using the network structure generation model M1. The information processing device 100 inputs the generated weight analysis information and a prompt containing information instructing the generation of a new structure to the network structure generation model M1 (step S23), causing the network structure generation model M1 to output the neural network structure based on the weight analysis information (step S24).

[0126] As a result, the information processing device 100 generates a neural network structure that reflects the weight analysis information. In Figure 14, the information processing device 100 generates a neural network NW23 using the network structure generation model M1. For example, the information processing device 100 inputs a prompt containing weight analysis information WD21 to the network structure generation model M1 and generates a neural network NW23 structure consisting of layers L11, L22, and L13.

[0127] Furthermore, the information processing device 100 may output not only the network structure but also the initial values of the weights that the user will use for training with that network structure. Initializing the weights is important so that accuracy improves during pre-training.

[0128] Simply put, the information processing device 100 outputs the structure of the neural network NW23, which consists of layers L11, L22, and L13, along with the weights used in layers L11, L22, and L13 by the trained neural networks NW21 and NW22. When the user fine-tunes the network starting from these weights as initial values, each layer functions as analyzed by the weight analysis model M21, and, as with pre-training, high accuracy can be achieved even when fine-tuning with a small amount of data.

[0129] When proposing or outputting initial weight values, the initial weight values may be output after performing more sophisticated processing, as shown in Figure 15. For example, the information processing device 100 proposes initial weight values suitable for the neural network in addition to the neural network itself. Figure 15 shows an example of processing related to network weights. In Figure 15, the information processing device 100 performs processing related to weights for the neural network NW23.

[0130] For example, the information processing device 100 swaps the weight channels of layer L22 of the neural network NW23 so that they mesh with the weights of the previous layer L11. In Figure 15, the information processing device 100 swaps the first and fourth stages of the trained weights associated with layer L22, thereby changing it to layer L22 (layer L22a in Figure 15) that meshes with the weights of the previous layer L11.
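The channel swap can be pictured with the sketch below, which exchanges the first and fourth rows of a weight array. Storing the weights of layer L22 as a list of rows, one row per channel, is an assumption; the description does not specify the tensor layout or how the matching order is determined.

```python
def swap_channels(weights: list, i: int, j: int) -> list:
    """Return a copy of the weights with channel rows i and j exchanged."""
    swapped = [row[:] for row in weights]
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped


# Hypothetical trained weights of layer L22: four channels, two values each.
l22 = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]

# Swap the first and fourth channels to obtain the weights of layer L22a.
l22a = swap_channels(l22, 0, 3)
```

The function copies the rows, so the original trained weights remain available for other candidate orderings.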

[0131] Furthermore, Figure 15 shows another method by which the information processing device 100 can create better initial weight values, with the initialization of weights for layer L13 of the neural network NW23 being an example. The information processing device 100 creates initial weight values for layer L13 (layer L13a in Figure 15) by selecting from trained weights associated with multiple layers (layers L21 and L23 in Figure 15) or by performing a weight ensemble.
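One simple form a weight ensemble could take is an element-wise average of equally shaped weights, sketched below. The averaging rule and the flat-list representation are assumptions; the description does not state how the ensemble is computed, and selection of a single candidate is the alternative it mentions.

```python
def ensemble_weights(candidates: list) -> list:
    """Element-wise mean of equally sized flat weight lists."""
    count = len(candidates)
    return [sum(values) / count for values in zip(*candidates)]


# Hypothetical trained weights associated with layers L21 and L23.
l21 = [1.0, 2.0, 3.0]
l23 = [3.0, 6.0, 5.0]

# Initial weight values for layer L13a obtained by averaging.
l13a = ensemble_weights([l21, l23])
```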

[0132] When adjusting the combinations of weights to obtain good initial weights through the above process, the information processing device 100 may also adjust them by cutting and pasting weights so as to maximize the zero-shot NAS metric.

[0133] Here, we show two examples of learning processes for realizing the weight analysis model M21 when the weight analysis model M21 is neural network-based. In the following, the learning process for learning the weight analysis model M21 is described as being primarily performed by the information processing device 100, but the learning process for the weight analysis model M21 is not limited to the information processing device 100; it may be performed by other devices (learning devices).

[0134] One method is to train the weight analysis model M21 individually. In this case, the training process is performed using training data (also called "training data for weight analysis model") which includes data that associates each layer with the correct information (also called "correct weight analysis information") that represents the correct weight analysis result of that layer.

[0135] The training data for the weight analysis model includes combinations of information from layer L11 and the ground truth weight analysis information for layer L11, information from layer L12 and the ground truth weight analysis information for layer L12, and information from layer L13 and the ground truth weight analysis information for layer L13. The training data for the weight analysis model may also include data corresponding to various layers such as layers L21 to L23.

[0136] The information processing device 100 generates a weight analysis model M21 using training data for the weight analysis model. Specifically, the information processing device 100 learns the network parameters of the weight analysis model M21.

[0137] The information processing device 100 adjusts the parameters of the weight analysis model M21 so that the weight analysis information output by the weight analysis model M21 approaches the correct information (correct weight analysis information) associated with the network input to the weight analysis model M21. For example, the information processing device 100 adjusts the parameters of the weight analysis model M21 so that the weight analysis information output by the weight analysis model M21 approaches the correct information (correct weight analysis information) associated with the layers input to the weight analysis model M21.

[0138] For example, the information processing device 100 adjusts the parameters of the weight analysis model M21 so that the weight analysis information output by the weight analysis model M21 approaches the correct information (correct weight analysis information) associated with the layer L11 input to the weight analysis model M21. For example, the information processing device 100 performs learning processing using any method such as backpropagation. For example, the information processing device 100 generates the weight analysis model M21 by performing processing such as backpropagation to minimize a predetermined loss function.
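The idea of adjusting parameters so that the output approaches the correct weight analysis information can be illustrated with a deliberately small stand-in. The sketch below fits a two-parameter linear model by gradient descent on a mean squared error loss; it is not a neural network and does not use backpropagation through multiple layers. The scalar feature, the scalar target, the learning rate, and the step count are all assumptions.

```python
def train_linear(samples: list, lr: float = 0.05, steps: int = 2000) -> tuple:
    """Fit y = a * x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to a and b.
        grad_a = sum(2 * (a * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in samples) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Hypothetical training data: (layer feature, correct weight analysis value).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
a, b = train_linear(samples)
```

With this toy data the parameters approach a = 2 and b = 1, the values that drive the loss to zero.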

[0139] Another learning method involves implicitly training the weight analysis model M21 so that the network structure generation model M1, based on LLM or similar technologies, can produce appropriate output. An example of this method is shown with Figure 16. Figure 16 is a diagram illustrating an example of the learning process. In Figure 16, the learning process is performed using prompts as input, including the string "The following model structure:", weight analysis information for layer L11, weight analysis information for layer L12, weight analysis information for layer L13, and the string "The accuracy is 43% in MSCOCO.", based on the performance information of the trained neural network NW21.

[0140] In other words, the weight analysis model is trained to output analysis information for layers L11, L12, and L13 as appropriate tokens, so that when the structure and weight information of a pre-trained neural network NW21 are input, a network structure generation model M1 such as an LLM can predict its performance. Note that the network structure generation model M1 may also be trained during the training process. Furthermore, after the first training method for the weight analysis model M21, further fine-tuning may be performed using the second training method.

[0141] Furthermore, the learning method (learning process) is not limited to the method described above, and any known technology can be applied. The learning of the weight analysis model M21 may be performed using various conventional machine learning techniques as appropriate. Also, while the example shown in Figure 16 describes the case where the input to the weight analysis model M21 is layer information, the input to the weight analysis model M21 may be any information, as long as it is at least a part of the network. For example, the input to the weight analysis model M21 may be the entire neural network (model), a part containing multiple layers (also called a "block"), or a part containing multiple blocks (also called a "stage").

[0142] Next, an example of inference processing using the network structure generation model M1, i.e., network structure generation processing, will be explained using Figure 17. Figure 17 is a diagram illustrating an example of inference processing. Note that explanations of points similar to those described above will be omitted as appropriate.

[0143] The information processing device 100 performs inference processing by inputting prompts using the weight analysis results from the weight analysis model M21 to the network structure generation model M1. For example, the information processing device 100 performs inference processing by inputting prompts to the network structure generation model M1 that include instructions to propose a new neural network structure, and are based on the neural network structure for which accuracy should be improved and its weight analysis information.

[0144] In Figure 17, the information processing device 100 performs inference processing by inputting prompts including the string "The following model structure:", weight analysis information for layer L11, weight analysis information for layer L12, weight analysis information for layer L13, and the string "Make it even better." into the network structure generation model M1.

[0145] Upon receiving the above prompt input, the network structure generation model M1 outputs information indicating the structure of a new neural network composed of layers L11, L22, and L13. In Figure 17, the network structure generation model M1 outputs information indicating layer L11, information indicating layer L22, information indicating layer L13, and information ON21 including the string "How about this?".

[0146] For example, the information indicating each of layers L11, L22, and L13 output by the network structure generation model M1 may be identification information for identifying each of layers L11, L22, and L13. For example, the information indicating layer L11 output by the network structure generation model M1 may be a token for layer L11. For example, the information indicating layer L22 output by the network structure generation model M1 may be a token for layer L22. For example, the information indicating layer L13 output by the network structure generation model M1 may be a token for layer L13.
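The prompt of Figure 17 and the token-based output can be sketched as follows. The two fixed strings come from the description; the placement of the analysis information, the whitespace tokenization, and the helper names are assumptions, and no call to an actual LLM is made.

```python
def build_inference_prompt(layer_analysis: dict) -> str:
    """Assemble a prompt from per-layer weight analysis information."""
    parts = ["The following model structure:"]
    parts += [f"{name}: {info}" for name, info in layer_analysis.items()]
    parts.append("Make it even better.")
    return "\n".join(parts)


def parse_layer_tokens(output: str, known_layers: set) -> list:
    """Extract layer tokens, in order, from the text output by the model."""
    return [token for token in output.split() if token in known_layers]


prompt = build_inference_prompt({
    "L11": "weight variance 0.25",
    "L12": "weight variance 0.01",
    "L13": "weight variance 0.10",
})

# Hypothetical model output corresponding to information ON21 in Figure 17.
new_structure = parse_layer_tokens(
    "L11 L22 L13 How about this?",
    known_layers={"L11", "L12", "L13", "L21", "L22", "L23"},
)
```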

[0147] As a result, the information processing device 100 generates a new neural network structure using the weight analysis results from the weight analysis model M21. The information processing device 100 may also perform iterative processing such as CoT (Chain of Thought), as shown in Figure 18. Figure 18 shows an example of the prompt during inference and the output of the LLM.

[0148] In Figure 18, the phrase "Prompt-" is the prompt input to the network structure generation model M1, and the phrase "LLM-" is the output of the network structure generation model M1. In Figure 18, the information processing device 100 requests the network structure generation model M1 to suggest good modules from existing trained models. Then, based on the suggestions from the network structure generation model M1, the information processing device 100 requests the network structure generation model M1 to generate a model (structure). Note that the above is just one example, and the information processing device 100 may perform various processes. For example, the information processing device 100 may ask the network structure generation model M1 to suggest methods for fine-tuning, learning rates for each layer, etc.

[0149] <1-4-3. Third Process (NAS Based on Evaluation)> The information processing device 100 may perform a network structure generation process using an evaluation of the neural network structure. This will be explained as the third process. The information processing device 100 will perform a network structure generation process that applies NAS based on the evaluation of the neural network structure as the third process. Note that explanations of the same points as those described above for the first and second processes will be omitted as appropriate.

[0150] For example, the information processing device 100 uses information indicating the accuracy of the neural network as an evaluation of the neural network structure to perform network structure generation processing. For example, the information processing device 100 generates a neural network structure that takes into account the evaluation of the neural network structure by inputting a prompt containing information indicating the accuracy of the neural network to the network structure generation model M1.

[0151] <1-4-3-1. Procedure for the Third Processing Step> For example, the information processing device 100 performs the third processing step as shown below. The procedure for the third processing step will be explained using Figure 19. Figure 19 is a flowchart of the procedure for the third processing step.

[0152] The information processing device 100 acquires evaluation information indicating the evaluation of the structure of the first neural network (step S401). For example, the information processing device 100 acquires evaluation information indicating the evaluation of the structure of the first neural network from an external resource device 200 or a storage unit 14 (such as a data pool storage unit 141).

[0153] Then, the information processing device 100 generates a resource reflection query that reflects the evaluation of the structure of the first neural network by using the evaluation information (step S402). For example, the information processing device 100 generates a resource reflection query that reflects the evaluation of the structure of the first neural network as a prompt to input to the network structure generation model M1.

[0154] Then, the information processing device 100 generates the structure of the second neural network, which has changed from the first neural network, by inputting a resource reflection query that reflects the evaluation of the structure of the first neural network into the network structure generation model (step S403). For example, the information processing device 100 generates the structure of the second neural network, which has changed from the first neural network, by inputting a resource reflection query that reflects the evaluation of the structure of the first neural network into the network structure generation model M1.

[0155] <1-4-3-2. Overview of the Third Process> From here, we will explain the overview of the third process. First, we will explain the overview of the evaluation of the neural network (model) structure.

[0156] For example, if the accuracy of a neural network correlates with the score (proxy) of a zero-shot NAS, the structure of the neural network can be evaluated using the zero-shot NAS score, even when the accuracy is unknown. Therefore, the information processing device 100 may use the score (proxy) of a zero-shot NAS as an evaluation of the structure of the neural network. In addition, the information processing device 100 may use information from various methods, not just zero-shot NAS, as an evaluation of the structure of the neural network. For example, the information processing device 100 may use the score (proxy) of a one-shot NAS as an evaluation of the structure of the neural network.

[0157] For example, if the accuracy of a neural network does not correlate with the score (proxy) of a zero-shot NAS, the structure of the neural network may be evaluated using other information. In this case, the information processing device 100 may evaluate the structure of the neural network using a predictor that can estimate accuracy and ranking based on inputs such as architecture information, parameter size, and number of operations. Such predictors are disclosed, for example, in the following document: • DetOFA: Efficient Training of Once-for-All Networks for Object Detection Using Path Filter, Yuiko Sakuma et al. <https://arxiv.org/abs/2303.13121>

[0158] For example, the information processing device 100 may use the Path filter from the above-mentioned literature as a predictor to evaluate the structure of the neural network. In this case, the information processing device 100 can estimate a score related to accuracy without performing a learning process, and thus can evaluate the structure of the neural network.

[0159] Furthermore, for example when the accuracy of a given neural network is unknown, the information processing device 100 may evaluate the structure of the neural network by the following process.

[0160] For example, the information processing device 100 performs evaluation processing on a neural network NW31 as shown in Figure 20. Figure 20 is a diagram showing an example of the structure of a neural network. The neural network NW31 in Figure 20 includes four stages, Stages SG1 to SG4. For example, Stage SG1, labeled "Stage1," includes multiple blocks such as a convolutional processing block labeled "Conv" and a pooling processing block labeled "Pooling." Similarly, Stages SG2, SG3, and SG4, labeled "Stage2," "Stage3," and "Stage4," each include multiple blocks such as a convolutional processing block labeled "Conv" and a pooling processing block labeled "Pooling." A block may also have multiple layers.

[0161] For example, the information processing device 100 modifies the structure of the neural network NW31 and calculates the score of the zero-shot NAS for the modified neural network NW31 as the modified score. For example, the information processing device 100 modifies the structure of the neural network NW31 by adding a new layer to the neural network NW31 and calculates the modified score for the modified neural network NW31.

[0162] For example, the information processing device 100 modifies the structure of the neural network NW31 by changing the layers within the neural network NW31, and calculates a modified score for the modified neural network NW31. For example, the information processing device 100 modifies the structure of the neural network NW31 by removing layers within the neural network NW31, and calculates a modified score for the modified neural network NW31.

[0163] The information processing device 100 may evaluate the structure of the neural network based on the modified score. For example, the information processing device 100 may evaluate the structure of the neural network NW31 based on a comparison of the modified score with the score of the zero-shot NAS (also called the "pre-modification score"). If the pre-modification score is better than the modified score, the information processing device 100 may evaluate the neural network NW31 as good.
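The comparison in paragraphs [0161] to [0163] can be sketched as below. The toy scoring function is a placeholder for a zero-shot NAS score, and treating a higher score as better is an assumption; the modifications stand for adding or removing a layer.

```python
from typing import Callable


def is_structure_good(
    structure: list,
    modifications: list,
    score: Callable[[list], float],
) -> bool:
    """Judge a structure good if its score beats the score of every modified variant."""
    pre_modification_score = score(structure)
    return all(
        pre_modification_score > score(modify(structure))
        for modify in modifications
    )


# Hypothetical modifications: add a layer, remove the last layer.
modifications = [
    lambda s: s + ["conv"],
    lambda s: s[:-1],
]

# Toy stand-in for a zero-shot NAS score (higher is better): prefers three layers.
toy_score = lambda s: -abs(len(s) - 3)

good = is_structure_good(["conv", "pool", "fc"], modifications, toy_score)
not_good = is_structure_good(["conv", "fc"], modifications, toy_score)
```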

[0164] Furthermore, the information processing device 100 may use a score other than the zero-shot NAS score as the modified score. For example, the information processing device 100 may perform a predetermined number of learning processes (several iterations, several epochs, etc.) on the modified neural network NW31 and use the loss value during that learning process as the modified score. For example, since the correlation between the zero-shot NAS score and accuracy (performance) is thought to be greater among similar structures, the information processing device 100 can estimate where it would be best to add layers or blocks through the above-described process.

[0165] Furthermore, the information processing device 100 may perform an evaluation process targeting the neural network NW31 by skipping a portion of the neural network NW31. For example, if there are pre-trained weights, the information processing device 100 uses that information to perform inference and determine the accuracy. If there are no pre-trained weights, the learning process may be performed a predetermined number of times (several iterations, several epochs, etc.), and the loss value during that learning process or the score of the zero-shot NAS may be calculated as the score before modification.

[0166] Furthermore, the information processing device 100 skips some layers, blocks, and stages and performs the same processing as described above. For example, the information processing device 100 skips stage SG3 and performs the learning process a predetermined number of times (several iterations, several epochs, etc.), and calculates the loss value and zero-shot NAS score during that learning process as the modified score.

[0167] Then, the information processing device 100 estimates the change in accuracy due to the skipping based on the results of the above processing. The information processing device 100 evaluates the neural network NW31 using the score before the change and the score after the change. For example, the information processing device 100 evaluates the neural network NW31 using the following equation (1).

[0168] $$\mathrm{Score}_{AB} = \frac{B - A}{B} \tag{1}$$

[0169] For example, the Score in formula (1) above AB This is a partial evaluation score that indicates the evaluation of the skipped portion of the neural network NW31. In equation (1) above, A corresponds to the score before modification, and B corresponds to the score after modification. For example, a smaller partial evaluation score indicates that the skipped portion is important for accuracy. For example, a larger partial evaluation score indicates that the skipped portion is not important for accuracy. Note that the above partial evaluation score is just an example, and a score that indicates that the skipped portion is important for accuracy may be used.

[0170] For example, the information processing device 100 may sequentially skip each part of the neural network NW31 and calculate a partial evaluation score for each part. Then, the information processing device 100 may use the partial evaluation scores of each part of the neural network NW31 to calculate an evaluation of the entire neural network NW31.
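
The sequential skipping described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the callback `score_when_skipping`, and the numeric scores are hypothetical, and the scores are assumed to be of a "higher is better" kind (for example, accuracy or a zero-shot NAS score).

```python
from typing import Callable, Dict, Sequence


def partial_evaluation_score(score_before: float, score_after: float) -> float:
    """Equation (1): Score_AB = (B - A) / B, with A before and B after skipping."""
    if score_after == 0:
        raise ValueError("score after modification (B) must be non-zero")
    return (score_after - score_before) / score_after


def score_each_part(
    parts: Sequence[str],
    score_before: float,
    score_when_skipping: Callable[[str], float],
) -> Dict[str, float]:
    """Skip each part in turn and compute its partial evaluation score."""
    return {
        part: partial_evaluation_score(score_before, score_when_skipping(part))
        for part in parts
    }


# Hypothetical scores measured with one stage skipped at a time (assumed values).
skipped_scores = {"SG1": 0.40, "SG2": 0.78, "SG3": 0.60, "SG4": 0.50}
scores = score_each_part(list(skipped_scores), 0.80, skipped_scores.__getitem__)

# A smaller partial evaluation score marks a part that matters more for accuracy.
most_important = min(scores, key=scores.get)
```

With these assumed numbers, skipping stage SG1 lowers the score the most, so SG1 receives the smallest partial evaluation score.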

[0171] Furthermore, the information processing device 100 may perform evaluation processing with an additional output layer, as shown in Figure 21. Figure 21 is a diagram showing an example of a configuration for evaluation processing. Note that explanations of points similar to those described above will be omitted as appropriate.

[0172] In Figure 21, the information processing device 100 performs evaluation processing on the neural network NW31 using the neural network NW32, which is the neural network NW31 with output layers L31, L32, and L33 added. The output layers L31, L32, and L33 are layers used to determine accuracy. For example, if the task is a classification task to determine whether the input image is a cat or something else, these layers output the classification result.

[0173] For example, if pre-trained weights are available, the information processing device 100 performs inference via output layers L31, L32, and L33. If pre-trained weights are not available, the device performs a predetermined number of training processes (e.g., several iterations, several epochs), calculates the loss value during the training process, and performs inference using the trained neural network NW32.

[0174] The information processing device 100 uses a predetermined number of input images to calculate the accuracy at each of the output layers L31, L32, and L33 and at the output of the neural network NW32, based on whether the inference result at each output is correct.

[0175] In Figure 21, the information processing device 100 calculates accuracy A as the accuracy at output layer L31, i.e., the accuracy when using only stage SG1. The information processing device 100 also calculates accuracy B as the accuracy at output layer L32, i.e., the accuracy when using both stage SG1 and stage SG2. The information processing device 100 also calculates accuracy C as the accuracy at output layer L33, i.e., the accuracy when using stages SG1, SG2, and SG3. The information processing device 100 also calculates accuracy D as the accuracy of the entire neural network NW32, i.e., the accuracy when using stages SG1, SG2, SG3, and SG4.

[0176] The information processing device 100 may use the calculated accuracy to evaluate each stage. For example, the information processing device 100 may use accuracy A and accuracy B to evaluate stage SG1. For example, the information processing device 100 may rate stage SG1 higher the more accuracy A exceeds accuracy B. The information processing device 100 may also use accuracy A and accuracy B to evaluate stage SG2. For example, the information processing device 100 may rate stage SG2 higher the more accuracy B exceeds accuracy A.

[0177] Furthermore, the information processing device 100 may evaluate stage SG3 using accuracy B and accuracy C. For example, the information processing device 100 may rate stage SG3 higher the higher accuracy C is than accuracy B. Also, the information processing device 100 may evaluate stage SG4 using accuracy C and accuracy D. For example, the information processing device 100 may rate stage SG4 higher the higher accuracy D is than accuracy C. Note that the above evaluations are merely examples, and the information processing device 100 may evaluate each stage using any method.
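
One possible reading of the stage evaluation above is sketched below. The text only states which accuracy should exceed which for a stage to be rated higher; using the plain difference between the two accuracies as the stage score is an assumption made for illustration, as are the function name and the accuracy values.

```python
from typing import Dict


def evaluate_stages(acc_a: float, acc_b: float, acc_c: float, acc_d: float) -> Dict[str, float]:
    """Score each stage from the accuracies at outputs L31, L32, L33 and the full network.

    The difference of accuracies is an assumed scoring rule, not one given in the text.
    """
    return {
        "SG1": acc_a - acc_b,  # rated higher the more accuracy A exceeds accuracy B
        "SG2": acc_b - acc_a,  # rated higher the more accuracy B exceeds accuracy A
        "SG3": acc_c - acc_b,  # rated higher the more accuracy C exceeds accuracy B
        "SG4": acc_d - acc_c,  # rated higher the more accuracy D exceeds accuracy C
    }


# Hypothetical accuracies A to D (assumed values).
stage_scores = evaluate_stages(acc_a=0.55, acc_b=0.70, acc_c=0.72, acc_d=0.90)
best_stage = max(stage_scores, key=stage_scores.get)
```

Here stage SG3 adds little accuracy over stage SG2, so it receives a score near zero, while stage SG4 receives the highest score.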

[0178] Furthermore, while the above example shows the case where an output layer is added for each stage, output layers may also be added for each layer, each block, etc. For example, if an output layer is added for each block, the information processing device 100 can calculate the accuracy for each block and perform evaluation for each block. Alternatively, the information processing device 100 may calculate the zero-shot NAS score for each stage and use the calculated score to evaluate each stage.

[0179] Furthermore, the information processing device 100 may perform the evaluation process with a configuration that adds a path for acquiring feature quantities after each stage, as shown in Figure 22. Figure 22 is a diagram showing an example of a configuration for the evaluation process. Note that explanations of points that are the same as those described above will be omitted as appropriate.

[0180] In Figure 22, the information processing device 100 performs evaluation processing on the neural network NW31 using a neural network NW33, which is the neural network NW31 with a feature acquisition path added after each stage. The feature acquisition path is used to determine accuracy and is configured to acquire features after each stage.

[0181] For example, feature O31, labeled "Feature map" at the end of a path extending downwards from stage SG1, is a feature obtained through processing up to stage SG1. Feature O32, labeled "Feature map" at the end of a path extending downwards from stage SG2, is a feature obtained through processing up to stage SG2. Feature O33, labeled "Feature map" at the end of a path extending downwards from stage SG3, is a feature obtained through processing up to stage SG3. Feature O34, labeled "Feature map" at the end of a path extending downwards from stage SG4, is a feature obtained through processing up to stage SG4. When explaining features O31 to O34 without distinguishing between them, they may be referred to as "Feature O3".

[0182] The information processing device 100 acquires feature quantities before and after a stage using the neural network NW33 configuration shown in Figure 22. Note that feature quantity O3 is not limited to the feature quantity itself, but may also be information based on the feature quantity. For example, feature quantity O3 may be information obtained by dimensionality reduction of the feature quantity using PCA (Principal Component Analysis). Alternatively, for example, feature quantity O3 may be information obtained by transforming the feature quantity via an output layer.

[0183] The information processing device 100 calculates the correlation value between each of the feature quantities O3 and the ground truth data. For example, the correlation value may be an eigenvalue, a contribution rate, etc. The information processing device 100 calculates the correlation value with the ground truth data using, for example, CCA (Canonical Correlation Analysis). For example, the information processing device 100 calculates the correlation value A (also called the "correlation value of feature quantity O31") between feature quantity O31 and the ground truth data. The information processing device 100 also calculates the correlation value B (also called the "correlation value of feature quantity O32") between feature quantity O32 and the ground truth data. The information processing device 100 also calculates the correlation value C (also called the "correlation value of feature quantity O33") between feature quantity O33 and the ground truth data. The information processing device 100 also calculates the correlation value D (also called the "correlation value of feature quantity O34") between feature quantity O34 and the ground truth data.

[0184] The information processing device 100 investigates how much the correlation value changes (improves or deteriorates) before and after a stage. The information processing device 100 may use the calculated correlation value to evaluate each stage. For example, the information processing device 100 may use the correlation value of feature O31 and the correlation value of feature O32 to evaluate stage SG1. For example, the information processing device 100 may rate stage SG1 higher the higher the correlation value of feature O31 is than the correlation value of feature O32. The information processing device 100 may also use the correlation value of feature O31 and the correlation value of feature O32 to evaluate stage SG2. For example, the information processing device 100 may rate stage SG2 higher the higher the correlation value of feature O32 is than the correlation value of feature O31.

[0185] Furthermore, the information processing device 100 may evaluate stage SG3 using the correlation value of feature O32 and the correlation value of feature O33. For example, the information processing device 100 may rate stage SG3 higher the higher the correlation value of feature O33 is than the correlation value of feature O32. Also, the information processing device 100 may evaluate stage SG4 using the correlation value of feature O33 and the correlation value of feature O34. For example, the information processing device 100 may rate stage SG4 higher the higher the correlation value of feature O34 is than the correlation value of feature O33. Note that the above evaluations are merely examples, and the information processing device 100 may evaluate each stage using any method.

[0186] Furthermore, while the above example shows the case where features are acquired for each stage, features may also be acquired for each layer, each block, etc. For example, if features are acquired for each block, the information processing device 100 can calculate correlation values for each block and perform evaluations for each block.

[0187] Furthermore, the information processing device 100 may perform an evaluation of the entire neural network based on the evaluation of each part, as shown in Figure 23. Figure 23 is a diagram illustrating an example of the evaluation process. For example, Figure 23 shows an example of performing an evaluation of the entire neural network based on the evaluation of each block. Note that the parts are not limited to blocks, but may also include layers, etc.

[0188] Figure 23 shows the state where the scores (evaluations) for each stage of the neural network NW31 have been calculated. In Figure 23, the score for stage SG1 is 0.8, the score for stage SG2 is 0.2, the score for stage SG3 is 0.6, and the score for stage SG4 is 1.0.

[0189] The information processing device 100 calculates the score of the entire neural network using the scores of each part. In Figure 23, the information processing device 100 calculates the score of the neural network NW31 using the scores of stages SG1 to SG4. For example, the information processing device 100 calculates the sum of the scores of stages SG1 to SG4 as the score (evaluation) of the neural network NW31. In this case, the information processing device 100 calculates the score (evaluation) of the neural network NW31 as "2.6 (= 0.8 + 0.2 + 0.6 + 1.0)".

[0190] Alternatively, the information processing device 100 may calculate the score (evaluation) of the neural network NW31 by multiplying the scores of stages SG1 to SG4. In this case, the information processing device 100 calculates the score (evaluation) of the neural network NW31 as "0.096 (= 0.8 * 0.2 * 0.6 * 1.0)". In this way, the information processing device 100 may calculate the evaluation of the entire neural network by treating the scores representing the evaluation of each part as probabilities and multiplying them.

[0191] The above is merely an example, and the information processing device 100 may use various information to calculate an evaluation of the entire neural network. For example, the information processing device 100 may use a predetermined function to calculate an evaluation of the entire neural network. The information processing device 100 may also calculate the average of the scores of stages SG1 to SG4 as the score (evaluation) of the neural network NW31. In this case, the information processing device 100 calculates the score (evaluation) of the neural network NW31 as "0.65 (= (0.8 + 0.2 + 0.6 + 1.0) / 4)".
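
The three aggregation rules described for Figure 23 (sum, product, and average) can be reproduced with the stage scores given above. The function name `aggregate` and its `method` parameter are hypothetical names chosen for this sketch.

```python
import math
from statistics import mean
from typing import Dict


def aggregate(scores: Dict[str, float], method: str) -> float:
    """Combine per-part scores into one score for the whole neural network."""
    values = list(scores.values())
    if method == "sum":
        return sum(values)
    if method == "product":  # treats each score like a probability
        return math.prod(values)
    if method == "average":
        return mean(values)
    raise ValueError(f"unknown aggregation method: {method}")


# Stage scores of the neural network NW31 as shown in Figure 23.
stage_scores = {"SG1": 0.8, "SG2": 0.2, "SG3": 0.6, "SG4": 1.0}

total = aggregate(stage_scores, "sum")          # about 2.6
product = aggregate(stage_scores, "product")    # about 0.096
average = aggregate(stage_scores, "average")    # about 0.65
```

The product penalizes a network with a single weak part (stage SG2 here) far more strongly than the sum or the average does, which is one reason to choose between the rules.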

[0192] In addition to the examples described above, the information processing device 100 may also perform evaluation processing using various types of information. Several examples of this are given below.

[0193] For example, the information processing device 100 may perform an evaluation process using rank information. For example, the information processing device 100 may calculate the rank of the matrix for the trained weights of a certain layer and perform an evaluation process using the calculated rank. For example, if the rank is small, the information processing device 100 may determine that there are many redundant channels and that the quality is poor, and may, for example, give a low evaluation to that layer.
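
As a rough sketch of the rank-based check, the code below computes the rank of a small weight matrix by Gaussian elimination using only the standard library. In practice a numerical library routine would normally be used instead. The normalized `rank_ratio` and the example weights are assumptions for illustration.

```python
from typing import List


def matrix_rank(matrix: List[List[float]], tol: float = 1e-9) -> int:
    """Rank of a matrix via Gaussian elimination with partial pivoting."""
    rows = [[float(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest absolute entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


def rank_ratio(matrix: List[List[float]]) -> float:
    """Rank divided by the maximum possible rank (an assumed normalization)."""
    return matrix_rank(matrix) / min(len(matrix), len(matrix[0]))


# Hypothetical trained weights: the second row is twice the first (redundant).
layer_weights = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 1.0]]
layer_rank = matrix_rank(layer_weights)
layer_ratio = rank_ratio(layer_weights)
```

A ratio well below 1 suggests redundant channels, which under the description above could lead to a low evaluation for the layer.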

[0194] For example, the information processing device 100 may perform evaluation processing using eigenvalue information. For example, the information processing device 100 may obtain eigenvalues for the trained weights of a certain layer using PCA or SVD (Singular Value Decomposition), and perform evaluation processing using the obtained eigenvalues. For example, if the cumulative contribution rate exceeds a threshold such as 99% within a small number of dimensions, the information processing device 100 may determine that there are many redundant channels and that the quality is poor, and may give a low evaluation to that layer.
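
The contribution-rate check can be sketched as below, assuming the eigenvalues have already been obtained by PCA or SVD. The function name, the default threshold argument, and the example eigenvalues are hypothetical.

```python
from typing import Sequence


def dims_for_contribution(eigenvalues: Sequence[float], threshold: float = 0.99) -> int:
    """Number of leading components whose cumulative contribution rate reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)


# Hypothetical eigenvalues of one layer: two components carry almost everything.
concentrated = dims_for_contribution([9.0, 0.95, 0.03, 0.02])
# Hypothetical eigenvalues of another layer: every component contributes equally.
spread_out = dims_for_contribution([1.0, 1.0, 1.0, 1.0])
```

A layer that reaches the threshold with few dimensions relative to its total would be the candidate for a low evaluation.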

[0195] For example, the information processing device 100 may perform evaluation processing using sparsity information. For example, the information processing device 100 may find the number of weights close to zero for the trained weights of a certain layer, and perform evaluation processing using the information of the number of weights close to zero (sparsity) obtained. For example, if there are many weights close to zero, the information processing device 100 may determine that there are many unnecessary channels or weights, judge that the quality is poor, and, for example, lower the evaluation of that layer.
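
A minimal sketch of the sparsity measurement follows. The tolerance `eps` that defines "close to zero" and the example weights are assumptions; the text does not specify them.

```python
from typing import Sequence


def near_zero_ratio(weights: Sequence[Sequence[float]], eps: float = 1e-3) -> float:
    """Fraction of weights whose absolute value is below eps (assumed tolerance)."""
    flat = [w for row in weights for w in row]
    if not flat:
        raise ValueError("weights must not be empty")
    return sum(1 for w in flat if abs(w) < eps) / len(flat)


# Hypothetical trained weights of one layer.
layer_weights = [[0.5, 0.0002, -0.0001], [0.0, -0.7, 0.0004]]
sparsity = near_zero_ratio(layer_weights)  # 4 of the 6 weights are near zero
```

A high ratio suggests unnecessary channels or weights, so the layer's evaluation could be lowered accordingly.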

[0196] As described above, the information processing device 100 may perform evaluation processing using the zero-shot NAS score. For example, the information processing device 100 calculates the zero-shot NAS score for only a single part, such as a particular stage, block, or layer. For example, if the calculated zero-shot NAS score is high, the information processing device 100 evaluates that part as good.

[0197] For example, the information processing device 100 may perform evaluation processing using the output information of the ReLU (Rectified Linear Unit). For example, if the information processing device 100 uses the ReLU in any part of a stage, block, or layer, it may analyze the zeros in the feature map output from the ReLU and perform evaluation processing based on the analysis results. For example, if the number of zeros is extremely large, the information processing device 100 may determine that the quality of that part (stage, block, layer, etc.) is poor and give that part (stage, block, layer, etc.) a low evaluation.

[0198] For example, the information processing device 100 may determine the correlation between the output results of each layer based on an algorithm such as CKA (Centered Kernel Alignment), and perform an evaluation process based on the determined correlation. For example, the information processing device 100 may determine that parts (stages, blocks, layers, etc.) that output feature maps with high correlation are wasteful and of poor quality, and may give a low evaluation to those parts (stages, blocks, layers, etc.).
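
As an illustration of the CKA-based comparison, the sketch below implements the linear variant of CKA for two small feature matrices of shape (samples, features), using only the standard library. Choosing the linear variant is an assumption, since the text names CKA only in general terms, and the example features are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def _center_columns(x: Matrix) -> Matrix:
    """Subtract the column mean from every entry."""
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def _cross_sq_norm(x: Matrix, y: Matrix) -> float:
    """Squared Frobenius norm of x^T y."""
    total = 0.0
    for col_x in zip(*x):
        for col_y in zip(*y):
            dot = sum(a * b for a, b in zip(col_x, col_y))
            total += dot * dot
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features."""
    xc, yc = _center_columns(x), _center_columns(y)
    denominator = math.sqrt(_cross_sq_norm(xc, xc) * _cross_sq_norm(yc, yc))
    if denominator == 0:
        raise ValueError("features must not be constant across samples")
    return _cross_sq_norm(xc, yc) / denominator


# Hypothetical outputs of two layers for four input samples.
features_a = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.5], [4.0, 2.0]]
features_b = [[2.0 * v for v in row] for row in features_a]  # a rescaled copy
similarity = linear_cka(features_a, features_b)
```

A value close to 1 means the second layer's output carries essentially the same information as the first, which under the description above could mark that part as wasteful.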

[0199] The information processing device 100 uses the evaluation information obtained through the evaluation process described above to execute a network structure generation process. For example, the information processing device 100 generates a prompt that includes evaluation information indicating the evaluation of the neural network NW31. The information processing device 100 then inputs the prompt containing the evaluation information indicating the evaluation of the neural network NW31 to the network structure generation model M1, thereby reflecting the evaluation of the neural network NW31 and generating the structure of a new neural network, the neural network NW32, which is a modified version of the neural network NW31.

[0200] Furthermore, the information processing device 100 may perform processing using the neural network NW32. For example, the information processing device 100 generates evaluation information indicating an evaluation of the structure of the neural network NW32. Then, using the evaluation information of the neural network NW32, the information processing device 100 generates a resource reflection query that reflects the evaluation of the structure of the neural network NW32. Then, by inputting the resource reflection query that reflects the evaluation of the structure of the neural network NW32 into the network structure generation model M1, the information processing device 100 generates the structure of a new neural network, the neural network NW33, which is a modified version of the neural network NW32. In this way, the information processing device 100 may generate the structure of a neural network through iterative processing.
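
The iterative processing can be sketched as a loop that evaluates the current structure, embeds the evaluation in a prompt, and asks the model for a modified structure. Everything here is hypothetical: the prompt wording, the function names, and the `generate` and `evaluate` callables, which merely stand in for the network structure generation model M1 and the evaluation process.

```python
from typing import Callable, Dict, List


def build_prompt(structure: str, evaluation: Dict[str, float]) -> str:
    """Compose a prompt that embeds the per-part evaluation of the current structure."""
    lines = [f"- {part}: score {score:.2f}" for part, score in evaluation.items()]
    return (
        "Current neural network structure:\n"
        f"{structure}\n"
        "Evaluation of each part:\n" + "\n".join(lines) + "\n"
        "Propose a modified structure that improves the low-scoring parts."
    )


def refine(
    structure: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str], Dict[str, float]],
    rounds: int,
) -> List[str]:
    """Repeatedly evaluate the latest structure and generate a modified one."""
    history = [structure]
    for _ in range(rounds):
        prompt = build_prompt(history[-1], evaluate(history[-1]))
        history.append(generate(prompt))
    return history


def fake_generate(prompt: str) -> str:
    # Stand-in for model M1: returns the structure name from the prompt with a mark added.
    return prompt.splitlines()[1] + "'"


def fake_evaluate(structure: str) -> Dict[str, float]:
    # Stand-in for the evaluation process: fixed per-stage scores.
    return {"SG1": 0.8, "SG2": 0.2}


history = refine("NW31", fake_generate, fake_evaluate, rounds=2)
```

Keeping the full history rather than only the latest structure makes it possible to fall back to an earlier candidate if a later round scores worse.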

[0201] Information processing system 1 generates the structure of a neural network through network structure generation processes such as the first, second, and third processes described above. Information processing system 1 may also generate the neural network structure by performing various other processes besides those described above. For example, information processing system 1 may exclude modules (layers, etc.) that cannot be used on the target hardware from the trained model database. Furthermore, information processing system 1 may update module information such as speed and memory usage based on measurements taken on the actual hardware.
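
The two maintenance steps mentioned above, excluding unusable modules and updating module information from on-device measurements, might look like the following. The `ModuleInfo` record, its fields, the module names, and all numbers are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class ModuleInfo:
    name: str          # module (layer, etc.) identifier
    latency_ms: float  # speed information
    memory_mb: float   # memory usage


def exclude_unsupported(modules: Iterable[ModuleInfo], unsupported: Set[str]) -> List[ModuleInfo]:
    """Drop modules that cannot be used on the target hardware."""
    return [m for m in modules if m.name not in unsupported]


def apply_measurements(
    modules: Iterable[ModuleInfo],
    measured: Dict[str, Tuple[float, float]],
) -> List[ModuleInfo]:
    """Overwrite latency and memory with values measured on the actual hardware."""
    updated = []
    for m in modules:
        if m.name in measured:
            latency_ms, memory_mb = measured[m.name]
            m = replace(m, latency_ms=latency_ms, memory_mb=memory_mb)
        updated.append(m)
    return updated


# Hypothetical module database and target-hardware information.
database = [
    ModuleInfo("conv3x3", 2.0, 10.0),
    ModuleInfo("depthwise_conv", 0.8, 4.0),
    ModuleInfo("self_attention", 6.5, 48.0),
]
usable = exclude_unsupported(database, unsupported={"self_attention"})
usable = apply_measurements(usable, measured={"conv3x3": (1.8, 12.0)})
```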

[0202] The information processing system 1 may store the information of each module in a form usable for RAG (Retrieval-Augmented Generation) and refer to it as needed. As described above, the information processing system 1 may use the generated information to generate the next prompt, in a manner similar to a genetic algorithm. For example, the information processing system 1 may use methods such as EvoPrompting, LLMATIC, or GENIUS.

[0203] Information processing system 1 may use weight information provided by various open-source software such as DINOv2, CLIP, and ResNet. In addition to proposing new neural network structures, information processing system 1 may also make various other suggestions. For example, information processing system 1 may also output and propose initial values for the weights.

[0204] The information processing system 1 may perform network structure generation processing based on interaction with the user. For example, if the information processing system 1 receives instructions from the user such as "Make the model a little smaller," "Don't use layer LXX," or "Analyze and use the weights of the neural network NWXX," it may generate a neural network structure corresponding to those instructions.

[0205] <2. Others> The above-described processes are merely examples, and the information processing device 100 may perform various processes. For example, the information processing device 100 may perform a learning process to learn various AI models (mathematical models) such as the network structure generation model M1. In this case, the control unit 15 of the information processing device 100 may include a learning unit that performs the learning process. Also, for example, the storage unit 14 of the information processing device 100 may store data used for the learning process (learning data).

[0206] The learning unit of the information processing device 100 (specifically the control unit 15) executes a learning process to learn various models. The learning unit learns various information based on information from an external information processing device and information stored in the storage unit 14. The learning unit stores the model generated by the learning process in the storage unit 14. The learning unit also stores the model with updated parameters from the learning process in the storage unit 14.

[0207] The learning unit learns various types of information based on the information acquired by the acquisition unit 151. The learning unit generates a model through learning, using various machine learning techniques. For example, the learning unit learns the parameters of the model (network).

[0208] The learning unit generates a network structure generation model M1. The learning unit learns the network parameters. For example, the learning unit learns the network parameters of the network structure generation model M1.

[0209] The learning unit performs learning processing based on the learning data (training data) stored in the storage unit 14. The learning unit generates a network structure generation model M1 by performing learning processing using the learning data stored in the storage unit 14. For example, the learning unit generates a model used in the network structure generation process. The learning unit generates the network structure generation model M1 by learning the network parameters of the network structure generation model M1.

[0210] For example, the learning unit performs learning processing using methods such as backpropagation so that the neural network structure output by the network structure generation model M1 approaches the correct information (the correct neural network structure) associated with the prompt input to the network structure generation model M1. For example, the learning unit adjusts the values of the weights (i.e., connection coefficients) that are considered when values are transmitted between nodes during the learning process. In this way, the learning unit learns the network structure generation model M1 by processing such as backpropagation to correct the parameters (connection coefficients) so that the error between the output of the network structure generation model M1 and the correct information corresponding to the input is reduced. For example, the learning unit generates the network structure generation model M1 by processing such as backpropagation to minimize a predetermined loss function. This allows the learning unit to perform learning processing to learn the parameters of the network structure generation model M1.

[0211] The learning method (learning process) is not limited to the methods described above, and any known technology can be applied. Furthermore, the learning of the network structure generation model M1 may be carried out using various conventional machine learning techniques as appropriate. For example, the learning of the network structure generation model M1 may be carried out using machine learning techniques such as linear regression or nonlinear regression. For example, the learning of the network structure generation model M1 may be carried out using supervised learning machine learning techniques such as SVM (Support Vector Machine). For example, the learning of the network structure generation model M1 may be carried out using deep learning techniques. Also, each component of the network structure generation model M1 may be learned individually.

[0212] Furthermore, the processing described in the above-described embodiments may be carried out in various other forms (modifications) besides those described above. For example, the system configuration may be various forms, not limited to the examples described above.

[0213] For example, the information processing system 1 may include a device that performs learning processing (learning device) and a device that performs inference processing using the model learned by the learning device (e.g., information processing device 100). In this case, the information processing system 1 may include both the learning device and the information processing device 100. The above is just an example, and the information processing system 1 may be implemented in various configurations. For example, the information processing device 100 may be a device that performs both learning processing and inference processing. That is, in the information processing system 1, the learning device and the inference device may be integrated.

[0214] Furthermore, among the processes described in each of the above embodiments, all or part of the processes described as being performed automatically can be performed manually, or all or part of the processes described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above document and drawings can be changed at will unless otherwise specified. For example, the various information shown in each figure is not limited to the information shown.

[0215] Furthermore, the components of each illustrated device are functionally conceptual and do not necessarily need to be physically configured as shown. In other words, the specific forms of distribution and integration of each device are not limited to those shown, and all or part of them can be functionally or physically distributed and integrated in any unit according to various loads and usage conditions.

[0216] Furthermore, the embodiments and modifications described above can be combined as appropriate, provided that the processing content is not inconsistent.

[0217] Furthermore, the effects described herein are merely illustrative and not limiting; other effects may also occur.

[0218] <3. Hardware Configuration> The information processing device 100 and the like according to the embodiments described above are realized by a computer 1000 having a configuration such as that shown in Figure 24. The information processing device 100 will be used as an example for explanation. Figure 24 is a hardware configuration diagram showing an example of a computer that realizes the functions of the information processing device. The computer 1000 has a processing circuit 1100, RAM 1200, ROM 1300, secondary storage device 1400, communication interface 1500, input / output interface 1600, display unit 1700, camera unit 1800, microphone 1900, and speaker 2000. The various parts of the computer 1000 are connected by a bus 1050.

[0219] The processing circuit 1100 operates based on a program stored in the ROM 1300 or secondary storage device 1400, and controls each part. For example, the processing circuit 1100 loads the program stored in the ROM 1300 or secondary storage device 1400 into the RAM 1200 and executes processing corresponding to various programs.

[0220] ROM 1300 stores boot programs such as the BIOS (Basic Input Output System) that are executed by the processing circuit 1100 when the computer 1000 starts up, as well as programs that depend on the computer 1000's hardware.

[0221] The secondary storage device 1400 is a computer-readable recording medium that non-transitorily stores programs executed by the processing circuit 1100 and data used by such programs. Specifically, the secondary storage device 1400 is a recording medium that stores the programs for each process of the information processing device 100 according to the embodiment, which are an example of program data 1450.

[0222] The communication interface 1500 is an interface for the computer 1000 to connect to the external network 1550. The communication interface 1500 corresponds to the communication unit 11 of the information processing device 100. For example, the processing circuit 1100 receives data from other devices or transmits data generated by the processing circuit 1100 to other devices via the communication interface 1500.

[0223] The input / output interface 1600 is an interface for connecting the input / output device 1650 and the computer 1000. For example, the processing circuit 1100 receives data from input devices such as a microphone 1900 or a touch panel via the input / output interface 1600. The processing circuit 1100 also transmits data to output devices such as a display unit 1700 or a speaker 2000 via the input / output interface 1600. The input / output interface 1600 may also function as a media interface for reading programs recorded on a predetermined recording medium (media). Examples of media include optical recording media such as DVDs (Digital Versatile Discs) and PDs (Phase Change Rewritable Disks), magneto-optical recording media such as MOs (Magneto-Optical Disks), tape media, magnetic recording media, or semiconductor memory.

[0224] The display unit 1700 is an interface for displaying information processed by the computer 1000. The display unit 1700 is, for example, a liquid crystal display or an organic electroluminescent display (Organic Electro Luminescence Display). Alternatively, the display unit 1700 may be a touch panel display device or an image projection device.

[0225] The camera unit 1800 is an interface for the computer 1000 to capture images. The microphone 1900 is an interface for the computer 1000 to capture sound. The speaker 2000 is an interface for the computer 1000 to output processed sound. The various parts of the computer 1000 are connected by the bus 1050. Each interface does not necessarily have to be located inside the computer 1000, but may be located outside the computer 1000 via a network or the like. Furthermore, each part of the computer 1000 may be controlled by a circuit different from the processing circuit 1100. For example, the display unit 1700 may be controlled not by the processing circuit 1100, but by a circuit dedicated to display processing provided within the display unit 1700.

[0226] For example, when the computer 1000 functions as an information processing device 100 according to the embodiment, the processing circuit 1100 of the computer 1000 functions as a control unit 15 by executing a program loaded onto the RAM 1200. The secondary storage device 1400 stores the information processing program according to this disclosure and various data stored by the storage unit 14. The processing circuit 1100 reads and executes the program data 1450 from the secondary storage device 1400, but as another example, these programs may be obtained from other devices via an external network 1550. In other words, the secondary storage device 1400 is not limited to being inside the computer 1000, but may be located outside the computer 1000. The processing circuit 1100 is an example of an integrated circuit, and CPU, MPU, GPU, APU, ASIC, and FPGA can all be considered integrated circuits.

[0227] Furthermore, this technology can also take the following configurations: (1) An information processing device comprising: a network structure generation model for generating the structure of a neural network; an acquisition unit for acquiring external resource information, which is information outside the network structure generation model used for generating the structure of the neural network; a first generation unit that generates a resource reflection query, which reflects the external resource information, as a query to be input to the network structure generation model by using the external resource information; and a second generation unit that generates the structure of the neural network by inputting the resource reflection query generated using the external resource information to the network structure generation model and causing the network structure generation model to output information indicating the structure of the neural network. (2) The information processing device according to (1), wherein the first generation unit generates the resource reflection query as a prompt to be input to the network structure generation model, which is a large-scale language model; and the second generation unit generates the structure of the neural network by inputting the resource reflection query, which is a prompt generated using the external resource information, to the network structure generation model, which is a large-scale language model. 
(3) The information processing device according to (1) or (2), wherein the acquisition unit acquires parent model information, which is information of a source neural network that is the source of the structure of the neural network; the first generation unit generates a resource reflection query that reflects the parent model information and the external resource information by using the parent model information and the external resource information; and the second generation unit generates the structure of the neural network which has been changed from the source neural network by inputting the resource reflection query generated using the parent model information and the external resource information into the network structure generation model.
(4) The information processing device according to (3), wherein the acquisition unit acquires information relating to the neural network corresponding to the user's request from an external device as the external resource information, and the first generation unit generates the resource reflection query using the information relating to the neural network corresponding to the user's request.
(5) The information processing device according to (4), wherein the acquisition unit acquires the parent model information indicating the source neural network that satisfies the user's request for the neural network, the first generation unit generates the resource reflection query that reflects the user's request by using the parent model information and the external resource information, and the second generation unit generates the structure of the neural network that reflects the user's request by inputting the resource reflection query generated using the parent model information and the external resource information into the network structure generation model.
(6) The information processing device according to (5), wherein the acquisition unit acquires parent model information indicating the source neural network that satisfies the specifications requested by the user for the neural network; the first generation unit generates a resource reflection query that reflects the specifications indicated by the parent model information; and the second generation unit generates the structure of the neural network that reflects the specifications requested by the user by inputting the resource reflection query that reflects the specifications into the network structure generation model.
(7) The information processing device according to any one of (3) to (6), wherein the acquisition unit acquires the parent model information using, as the source neural network, a first neural network having the structure of the neural network generated by the second generation unit; the first generation unit generates a resource reflection query that reflects the first neural network by using the parent model information representing the first neural network and the external resource information; and the second generation unit generates the structure of a second neural network which has been changed from the first neural network by inputting the resource reflection query that reflects the first neural network into the network structure generation model.
(8) The information processing device according to any one of (1) to (7), wherein the acquisition unit acquires information regarding the weights of a trained model as the external resource information; the first generation unit generates a resource reflection query that reflects the weights of the trained model by using the external resource information; and the second generation unit generates the structure of the neural network based on the resource reflection query generated using the external resource information.
(9) The information processing device according to (8), wherein the second generation unit proposes initial values for weights suitable for the neural network in addition to the neural network.
(10) The information processing device according to (8) or (9), wherein the first generation unit generates a resource reflection query that reflects the weights of the trained model by using weight analysis information obtained by analyzing the weights of the trained model indicated by the external resource information.
(11) The information processing device according to (10), wherein the first generation unit generates the weight analysis information using an analysis model that takes information of the trained model including the weights as input and outputs information indicating the analysis results regarding the weights of the trained model, and generates a resource reflection query that reflects the weights of the trained model by using the weight analysis information generated by the analysis model.
(12) The information processing device according to (10) or (11), wherein the first generation unit generates a resource reflection query that reflects the weights of the trained model by using the weight analysis information obtained by analyzing the weights of each component included in the trained model indicated by the external resource information, and the second generation unit generates the structure of the neural network that reflects the weights of each component included in the trained model by inputting the resource reflection query generated using the external resource information into the network structure generation model.
(13) The information processing device according to any one of (8) to (12), wherein the acquisition unit acquires parent model information which is information of a first neural network having the structure of the neural network generated by the second generation unit; the first generation unit generates a resource reflection query that reflects the first neural network by using the parent model information representing the first neural network and the external resource information; and the second generation unit generates a structure of a second neural network that has been changed from the first neural network by inputting the resource reflection query that reflects the first neural network into the network structure generation model.
(14) The information processing device according to any one of (1) to (13), comprising: an evaluation unit that evaluates the structure of a neural network to be evaluated, which is the neural network generated by the second generation unit.
(15) The information processing device according to (14), wherein the evaluation unit evaluates the structure of the neural network to be evaluated using a modified neural network, which is a neural network obtained by changing the structure of the neural network to be evaluated.
(16) The information processing device according to (15), wherein the evaluation unit evaluates the structure of the neural network to be evaluated using the modified neural network, which has components added to the neural network to be evaluated.
(17) The information processing device according to (15) or (16), wherein the evaluation unit evaluates the structure of the neural network to be evaluated using the modified neural network, which has some components removed from the neural network to be evaluated.
(18) The information processing device according to any one of (15) to (17), wherein the evaluation unit evaluates the structure of the neural network to be evaluated based on a comparison between a first index value corresponding to the neural network to be evaluated and a second index value corresponding to the modified neural network.
(19) The information processing device according to any one of (14) to (18), wherein the acquisition unit acquires evaluation information indicating the evaluation of the structure of the first neural network by the evaluation unit, the first generation unit generates a resource reflection query that reflects the evaluation of the structure of the first neural network by the evaluation unit using the evaluation information, and the second generation unit generates the structure of a second neural network modified from the first neural network by inputting the resource reflection query that reflects the evaluation of the structure of the first neural network by the evaluation unit into the network structure generation model.
(20) An information processing method that uses a network structure generation model for generating the structure of a neural network, the method comprising: acquiring external resource information, which is information outside the network structure generation model used for generating the structure of the neural network; generating a resource reflection query that reflects the external resource information by using the external resource information, as a query to be input to the network structure generation model; and generating the structure of the neural network by inputting the resource reflection query generated using the external resource information to the network structure generation model and causing the network structure generation model to output information indicating the structure of the neural network.
(21) An information processing program that uses a network structure generation model for generating the structure of a neural network and causes a computer to execute: acquiring external resource information, which is information outside the network structure generation model used for generating the structure of the neural network; generating a resource reflection query that reflects the external resource information by using the external resource information, as a query to be input to the network structure generation model; and generating the structure of the neural network by inputting the resource reflection query generated using the external resource information to the network structure generation model and causing the network structure generation model to output information indicating the structure of the neural network.
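
As a non-limiting illustration, the flow of configurations (1), (2), and (7) above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the names `ExternalResourceInfo`, `build_resource_reflection_query`, `generate_structure`, and `stub_model` are hypothetical, the JSON output format is assumed for illustration only, and the large-scale language model is replaced by a stub that returns a fixed structure. FLOPs and memory size are used as example hardware constraints.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExternalResourceInfo:
    """Hypothetical container for information held outside the generation model."""
    max_flops: float       # assumed compute budget of the target device
    max_memory_mb: float   # assumed memory budget of the target device
    user_request: str      # free-text request for the neural network


def build_resource_reflection_query(resource: ExternalResourceInfo,
                                    parent_model: Optional[dict] = None) -> str:
    """Role of the first generation unit: turn external resource information into a prompt."""
    lines = [
        "Propose a neural network structure as JSON with a 'layers' list.",
        f"Request: {resource.user_request}",
        f"Constraint: at most {resource.max_flops:.3g} FLOPs per inference.",
        f"Constraint: at most {resource.max_memory_mb:g} MB of memory.",
    ]
    if parent_model is not None:
        # Parent model information is reflected in the query when available.
        lines.append("Modify this parent model: " + json.dumps(parent_model))
    return "\n".join(lines)


def generate_structure(query: str, model: Callable[[str], str]) -> dict:
    """Role of the second generation unit: input the query to the model and parse its output."""
    return json.loads(model(query))


def stub_model(query: str) -> str:
    """Stand-in for the large-scale language model (assumption): returns a fixed structure."""
    return json.dumps({"layers": [{"type": "conv", "out_channels": 16},
                                  {"type": "dense", "units": 10}]})


resource = ExternalResourceInfo(max_flops=2e9, max_memory_mb=512,
                                user_request="image classifier for an edge camera")

# First pass: query built from external resource information only.
query = build_resource_reflection_query(resource)
structure = generate_structure(query, stub_model)

# Second pass: the generated structure is fed back as parent model information.
refined_query = build_resource_reflection_query(resource, parent_model=structure)

print(query)
print(structure)
```

In this sketch the model is passed in as a plain callable so that the query construction can be exercised without any particular model or service; an actual network structure generation model would take the place of `stub_model`.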

[0228]
1. Information Processing System
100. Information Processing Device
11. Communication Unit
12. Input Unit
13. Output Unit
14. Storage Unit
141. Data Pool Storage Unit
15. Control Unit
151. Acquisition Unit
152. Evaluation Unit
153. First Generation Unit
154. Second Generation Unit
155. Transmission Unit
200. External Resource Device

Claims

1. An information processing device comprising: a network structure generation model that generates the structure of a neural network; an acquisition unit that acquires external resource information, which is information outside the network structure generation model used to generate the structure of the neural network; a first generation unit that generates, as a query to be input to the network structure generation model, a resource reflection query that reflects the external resource information by using the external resource information; and a second generation unit that generates the structure of the neural network by inputting the resource reflection query generated using the external resource information into the network structure generation model and causing the network structure generation model to output information indicating the structure of the neural network.

2. The information processing device according to claim 1, wherein the first generation unit generates the resource reflection query as a prompt to be input to the network structure generation model, which is a large-scale language model, and the second generation unit generates the structure of the neural network by inputting the resource reflection query, which is a prompt generated using the external resource information, into the network structure generation model, which is the large-scale language model.

3. The information processing device according to claim 1, wherein the acquisition unit acquires parent model information, which is information about a source neural network from which the structure of the neural network is generated, the first generation unit generates a resource reflection query that reflects the parent model information and the external resource information by using the parent model information and the external resource information, and the second generation unit generates the structure of the neural network, which has been modified from the source neural network, by inputting the resource reflection query generated using the parent model information and the external resource information into the network structure generation model.

4. The information processing device according to claim 3, wherein the acquisition unit acquires, from an external device, information regarding the neural network corresponding to a user's request as the external resource information, and the first generation unit generates the resource reflection query using the information regarding the neural network corresponding to the user's request.

5. The information processing device according to claim 4, wherein the acquisition unit acquires the parent model information indicating the source neural network that satisfies the user's request for the neural network, the first generation unit generates the resource reflection query that reflects the user's request by using the parent model information and the external resource information, and the second generation unit generates the structure of the neural network that reflects the user's request by inputting the resource reflection query generated using the parent model information and the external resource information into the network structure generation model.

6. The information processing device according to claim 5, wherein the acquisition unit acquires the parent model information indicating the source neural network that satisfies the specifications requested by the user for the neural network, the first generation unit generates the resource reflection query that reflects the specifications indicated by the parent model information, and the second generation unit generates the structure of the neural network that reflects the specifications requested by the user by inputting the resource reflection query that reflects the specifications into the network structure generation model.

7. The information processing device according to claim 3, wherein the acquisition unit acquires the parent model information using, as the source neural network, a first neural network having the structure of the neural network generated by the second generation unit, the first generation unit generates the resource reflection query that reflects the first neural network by using the parent model information representing the first neural network and the external resource information, and the second generation unit generates the structure of a second neural network, which has been modified from the first neural network, by inputting the resource reflection query that reflects the first neural network into the network structure generation model.

8. The information processing device according to claim 1, wherein the acquisition unit acquires information regarding the weights of a trained model as the external resource information, the first generation unit generates a resource reflection query that reflects the weights of the trained model by using the external resource information, and the second generation unit generates the structure of the neural network based on the resource reflection query generated using the external resource information.

9. The information processing device according to claim 8, wherein the second generation unit proposes, in addition to the neural network, initial weight values suitable for the neural network.

10. The information processing device according to claim 8, wherein the first generation unit generates the resource reflection query that reflects the weights of the trained model by using weight analysis information obtained by analyzing the weights of the trained model indicated by the external resource information.

11. The information processing device according to claim 10, wherein the first generation unit generates the weight analysis information using an analysis model that takes information of the trained model, including the weights, as input and outputs information indicating the analysis results regarding the weights of the trained model, and generates the resource reflection query that reflects the weights of the trained model by using the weight analysis information generated by the analysis model.

12. The information processing device according to claim 10, wherein the first generation unit generates a resource reflection query that reflects the weights of the trained model by using the weight analysis information obtained by analyzing the weights of each component included in the trained model indicated by the external resource information, and the second generation unit generates the structure of the neural network in which the weights of each component included in the trained model are reflected by inputting the resource reflection query generated using the external resource information into the network structure generation model.

13. The information processing device according to claim 8, wherein the acquisition unit acquires parent model information, which is information of a first neural network having the structure of the neural network generated by the second generation unit, the first generation unit generates the resource reflection query that reflects the first neural network by using the parent model information representing the first neural network and the external resource information, and the second generation unit generates the structure of a second neural network, which has been modified from the first neural network, by inputting the resource reflection query that reflects the first neural network into the network structure generation model.

14. The information processing device according to claim 1, further comprising an evaluation unit that evaluates the structure of a neural network to be evaluated, which is the neural network generated by the second generation unit.

15. The information processing device according to claim 14, wherein the evaluation unit evaluates the structure of the neural network to be evaluated using a modified neural network, which is a neural network obtained by changing the structure of the neural network to be evaluated.

16. The information processing device according to claim 15, wherein the evaluation unit evaluates the structure of the neural network to be evaluated using the modified neural network, which has components added to the neural network to be evaluated.

17. The information processing device according to claim 15, wherein the evaluation unit evaluates the structure of the neural network to be evaluated using the modified neural network, which has some components removed from the neural network to be evaluated.

18. The information processing device according to claim 15, wherein the evaluation unit evaluates the structure of the neural network to be evaluated based on a comparison between a first index value corresponding to the neural network to be evaluated and a second index value corresponding to the modified neural network.

19. The information processing device according to claim 14, wherein the acquisition unit acquires evaluation information indicating the evaluation of the structure of a first neural network by the evaluation unit, the first generation unit generates the resource reflection query that reflects the evaluation of the structure of the first neural network by the evaluation unit by using the evaluation information, and the second generation unit generates the structure of a second neural network, which is modified from the first neural network, by inputting the resource reflection query that reflects the evaluation of the structure of the first neural network by the evaluation unit into the network structure generation model.

20. An information processing method that uses a network structure generation model that generates the structure of a neural network, the method including: acquiring external resource information, which is information outside the network structure generation model used to generate the structure of the neural network; generating, as a query to be input to the network structure generation model, a resource reflection query that reflects the external resource information by using the external resource information; and generating the structure of the neural network by inputting the resource reflection query generated using the external resource information into the network structure generation model and causing the network structure generation model to output information indicating the structure of the neural network.

21. An information processing program that uses a network structure generation model that generates the structure of a neural network and causes a computer to execute: acquiring external resource information, which is information outside the network structure generation model used to generate the structure of the neural network; generating, as a query to be input to the network structure generation model, a resource reflection query that reflects the external resource information by using the external resource information; and generating the structure of the neural network by inputting the resource reflection query generated using the external resource information into the network structure generation model and causing the network structure generation model to output information indicating the structure of the neural network.