Information processing device, information processing method, and program
The information processing apparatus and method address the challenge of estimating language model performance without prior evaluation by using acquired metadata, effectively estimating scores for unevaluated pairs.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- NEC CORP
- Filing Date
- 2024-12-12
- Publication Date
- 2026-06-24
Abstract
Description
[Technical Field]
[0001] This disclosure relates to an information processing device, an information processing method, and a program.
[Background Art]
[0002] With the development of language models (LMs) using machine learning, techniques for evaluating the performance of language models have been proposed. For example, Non-Patent Document 1 discloses a technique for evaluating the performance of large language models (LLMs) in which language model features (LLM features) and problem features are calculated from the scores of evaluated pairs of problems and language models, and these features are used to estimate the scores of unevaluated pairs.
[Prior Art Documents]
[Non-Patent Literature]
[0003] [Non-Patent Document 1] Felipe Maia Polo et al., tinyBenchmarks: evaluating LLMs with fewer examples, arXiv:2402.14992 (May 2024)
[Summary of the Invention]
[Problems to Be Solved by the Invention]
[0004] However, the technology described in Non-Patent Document 1 has the problem that it cannot perform performance estimation for language models for which evaluated pairs (evaluation information) have not been obtained.
[0005] This disclosure has been made in view of the above-mentioned problem, and one exemplary objective is to provide a technique that can suitably estimate the performance of a language model even if evaluation information is not available for it.
[Means for Solving the Problem]
[0006] An information processing device according to an exemplary aspect of the present disclosure includes: a first acquisition unit that acquires evaluation information of at least one of one or more language models, the evaluation information being related to at least one of one or more problems; a second acquisition unit that acquires metadata related to at least one of the one or more language models; and an estimation unit that performs performance estimation processing related to at least one of the one or more language models by referring to the evaluation information and the metadata.
[0007] An information processing method according to an exemplary aspect of the present disclosure includes: one or more processors acquiring evaluation information of at least one of one or more language models, the evaluation information being related to at least one of one or more problems; acquiring metadata related to at least one of the one or more language models; and performing performance estimation processing related to at least one of the one or more language models by referring to the evaluation information and the metadata.
[0008] A program according to an exemplary aspect of the present disclosure is a program that causes a computer to function as an information processing device, and causes the computer to function as: a first acquisition unit that acquires evaluation information of at least one of one or more language models, the evaluation information being related to at least one of one or more problems; a second acquisition unit that acquires metadata related to at least one of the one or more language models; and an estimation unit that performs performance estimation processing related to at least one of the one or more language models by referring to the evaluation information and the metadata.
[Advantages of the Invention]
[0009] According to an exemplary aspect of the present disclosure, there is an exemplary effect that performance estimation can be suitably performed even for a language model for which evaluation information is not obtained.
[Brief Description of the Drawings]
[0010]
[Figure 1] A block diagram showing the configuration of the information processing device according to this disclosure.
[Figure 2] A flowchart showing the flow of the information processing method according to this disclosure.
[Figure 3] A block diagram showing the configuration of the information processing system according to this disclosure.
[Figure 4] A diagram illustrating an example of a problem setting handled by the information processing system according to this disclosure.
[Figure 5] A flowchart illustrating an example of the processing flow in the information processing system according to this disclosure.
[Figure 6] A diagram illustrating an example of processing in the information processing system according to this disclosure.
[Figure 7] A diagram illustrating an example of processing in the information processing system according to this disclosure.
[Figure 8] A diagram illustrating an example of processing in the information processing system according to this disclosure.
[Figure 9] A block diagram showing the configuration of a computer that functions as the information processing device according to this disclosure.
[Modes for Carrying Out the Invention]
[0011] The following are examples of embodiments of the present invention. However, the present invention is not limited to the exemplary embodiments shown below, and various modifications are possible within the scope of the claims. For example, embodiments obtained by appropriately combining some or all of the technologies (things or methods) employed in each of the exemplary embodiments shown below may also be included in the scope of the present invention. Furthermore, embodiments obtained by appropriately omitting some of the technologies employed in each of the exemplary embodiments shown below may also be included in the scope of the present invention. In addition, the effects mentioned in each of the exemplary embodiments shown below are examples of effects that can be expected in that exemplary embodiment and do not define the scope of the present invention. That is, embodiments that do not produce the effects mentioned in each of the exemplary embodiments shown below may also be included in the scope of the present invention.
[0012] [First Embodiment] A first exemplary embodiment, which is an example of an embodiment of the present invention, will be described in detail with reference to the drawings. This exemplary embodiment is the basic form for each of the exemplary embodiments described later. The scope of application of each technology adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technology adopted in this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems occur. Furthermore, each technology shown in the drawings referenced to explain this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems occur.
[0013] (Configuration of Information Processing Device 1) The configuration of the information processing device 1 according to this exemplary embodiment will be described with reference to Figure 1. Figure 1 is a block diagram showing the configuration of the information processing device 1. As shown in Figure 1, the information processing device 1 includes a first acquisition unit 11, a second acquisition unit 12, and an estimation unit 13.
[0014] (First acquisition unit 11) The first acquisition unit 11 acquires evaluation information for at least one of one or more language models. Here, the one or more language models may include, for example, a large language model (LLM) that has been pre-trained. Furthermore, the evaluation information includes, for example, the results of evaluating the language model using at least one of one or more problems. More specifically, the evaluation information includes, for example, the results of having the language model solve at least one of one or more problems. In other words, the first acquisition unit 11 acquires evaluation information that pertains to at least one of the one or more language models and relates to at least one of the one or more problems.
[0015] Furthermore, the evaluation information includes, as an example, the score of an evaluated pair, which is a pair of at least one of the one or more language models and a problem solved by that language model. However, this does not limit this exemplary embodiment. The evaluation information can also be described as evaluation information on the problem with respect to the language model that solved the problem.
[0016] (Second acquisition unit 12) The second acquisition unit 12 acquires metadata relating to at least one of the one or more language models. Here, metadata relating to a language model refers, for example, to information about the language model other than the evaluation information. The second acquisition unit 12 acquires metadata for language models for which the evaluation information has not been obtained. The second acquisition unit 12 may also acquire metadata for language models for which the evaluation information has been obtained.
[0017] Specific examples of language model metadata do not limit this exemplary embodiment, but the metadata may include, as an example, any of the following information:
• Name of the language model
• Parameters referenced by the language model
• Dataset used to train the language model
• Developer of the language model
• Development period of the language model
• Architecture of the language model
• History of model merging related to the language model
• History of additional training of the language model
Here, the history of additional training may include relational information such as, for example, "one model was fine-tuned to become another model." The history of model merging and the history of additional training of the language model are sometimes collectively referred to as the history of the language model.
[0018] (Estimation unit 13) The estimation unit 13 performs performance estimation processing for at least one of the one or more language models by referring to the evaluation information acquired by the first acquisition unit 11 and the language model metadata acquired by the second acquisition unit 12. As an example of this performance estimation processing, the estimation unit 13 estimates the score of an unevaluated pair, which is a pair of one of the one or more language models and a problem that the language model has not yet solved.
[0019] Furthermore, the estimation unit 13 may, as an example, be configured to estimate the score of the unevaluated pair by referring to a first loss corresponding to the scores of the evaluated pairs and a second loss (also called a constraint term) corresponding to the metadata acquired by the second acquisition unit 12. In calculating the second loss, the estimation unit 13 may also be configured to:
• calculate the similarity between multiple language models by referring to the metadata, and
• calculate the second loss using the calculated similarity.
However, these examples do not limit this exemplary embodiment.
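As an illustration only, the combination of a first loss and a second loss might be sketched in Python as follows. The disclosure does not specify the form of either loss at this point, so everything concrete here is an assumption: the score model (a sigmoid of the inner product of an LLM feature vector and a problem feature vector), the squared-error first loss, the similarity-weighted squared-distance second loss, the weighting factor `lam`, and all function names, model names, and data values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predict(model_feats, problem_feats, m, x):
    # Assumed score model: sigmoid of the inner product of the two feature vectors.
    return sigmoid(dot(model_feats[m], problem_feats[x]))

def first_loss(model_feats, problem_feats, evaluated):
    # Assumed form: squared error, summed over evaluated pairs only.
    return sum((predict(model_feats, problem_feats, m, x) - s) ** 2
               for (m, x), s in evaluated.items())

def second_loss(model_feats, similarity):
    # Assumed constraint term: similarity-weighted squared distance between
    # the feature vectors of each pair of language models.
    total = 0.0
    for (k, l), sim in similarity.items():
        diff = [a - b for a, b in zip(model_feats[k], model_feats[l])]
        total += sim * dot(diff, diff)
    return total

def total_loss(model_feats, problem_feats, evaluated, similarity, lam=1.0):
    # lam is a hypothetical weighting factor between the two losses.
    return (first_loss(model_feats, problem_feats, evaluated)
            + lam * second_loss(model_feats, similarity))

# Toy data: every name and value below is hypothetical.
model_feats = {"model_a": [1.0, 0.0], "model_b": [0.0, 0.0]}
problem_feats = {"problem_1": [0.0, 0.0]}
evaluated = {("model_a", "problem_1"): 1}      # model_b has no evaluated pair
similarity = {("model_a", "model_b"): 0.8}     # derived from metadata
loss = total_loss(model_feats, problem_feats, evaluated, similarity)
```

Under these assumptions, minimizing `total_loss` over the feature vectors and then calling `predict` on an unevaluated pair would give an estimated score. In this sketch `model_b` appears in no evaluated pair, so its features are shaped only by the second loss, which is how metadata could inform the estimate for a language model without evaluation information.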
[0020] (Effects of information processing device 1) As described above, the information processing device 1 adopts a configuration that:
• acquires evaluation information of at least one of one or more language models, the evaluation information relating to at least one of one or more problems,
• acquires metadata relating to at least one of the one or more language models, and
• performs performance estimation processing for at least one of the one or more language models by referring to the evaluation information and the metadata.
In this way, the information processing device 1 acquires metadata about at least one of the one or more language models and performs performance estimation processing for at least one of the one or more language models by referring to the metadata. Therefore, performance estimation can be suitably performed even for language models for which evaluation information has not been obtained.
[0021] (Information processing method S1 flow) Next, the flow of the information processing method S1 according to this exemplary embodiment will be explained with reference to Figure 2. Figure 2 is a flowchart showing the flow of the information processing method S1. As shown in Figure 2, the information processing method S1 includes a step (process) S11 for acquiring evaluation information, a step (process) S12 for acquiring metadata, and a step (process) S13 for estimating the performance of the language model.
[0022] (Step S11) In step S11, the first acquisition unit 11 acquires evaluation information for at least one of one or more language models, which is evaluation information for at least one of one or more problems. A more detailed explanation of the first acquisition unit 11 has been given above, so it will not be explained here.
[0023] (Step S12) In step S12, the second acquisition unit 12 acquires metadata relating to at least one of the one or more language models. A more detailed explanation of the second acquisition unit 12 has been given above, so it will not be explained here.
[0024] (Step S13) In step S13, the estimation unit 13 performs performance estimation processing for at least one of the one or more language models by referring to the evaluation information acquired by the first acquisition unit 11 in step S11 and the language model metadata acquired by the second acquisition unit 12 in step S12. A more detailed explanation of the estimation unit 13 has been given above, so it will be omitted here.
[0025] (Effects of information processing method S1) As described above, the information processing method S1 adopts a configuration that:
• acquires evaluation information of at least one of one or more language models, the evaluation information relating to at least one of one or more problems,
• acquires metadata relating to at least one of the one or more language models, and
• performs performance estimation processing for at least one of the one or more language models by referring to the evaluation information and the metadata.
According to the above configuration, the same effect as that of the information processing device 1 is achieved.
[0026] [Second Embodiment] A second exemplary embodiment, which is an example of an embodiment of the present invention, will be described in detail with reference to the drawings. Components having the same function as those described in the above-described exemplary embodiment are denoted by the same reference numerals, and their descriptions are omitted as appropriate. The scope of application of each technology adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technology adopted in this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems arise. Furthermore, each technology shown in the drawings referenced to describe this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure, to the extent that no particular technical problems arise.
[0027] (Configuration of Information Processing System 100A) The configuration of the information processing system 100A according to this exemplary embodiment will be described with reference to Figure 3. Figure 3 is a block diagram showing the configuration of the information processing system 100A. As shown in Figure 3, the information processing system 100A comprises an information processing device 1A and a server device 60 connected to the information processing device 1A via a network N. Here, the specific configuration of the network N is not limited to this exemplary embodiment, but as an example, a wireless LAN (Local Area Network), a wired LAN, a WAN (Wide Area Network), a public telephone network, a mobile data communication network, or a combination of these networks can be used.
[0028] (Server device 60) As shown in Figure 3, the server device 60 comprises a control unit 61, a storage unit 62, and a communication unit 63. The communication unit 63 communicates with devices outside the server device 60. For example, the communication unit 63 communicates with the information processing device 1A of the information processing system 100A. The communication unit 63 transmits data supplied from the control unit 61 to the information processing device 1A and supplies data received from the information processing device 1A to the control unit 61. The data that the communication unit 63 receives from the information processing device 1A may include a set of problems provided by the information processing device 1A. Furthermore, the data that the communication unit 63 provides to the information processing device 1A may include the results of solving at least one of the problems included in the set of problems using at least one of the language models included in the language model group LLMG, which will be described later.
[0029] The storage unit 62 stores a language model group LLMG, which includes one or more language models. For example, the storage unit 62 stores multiple parameters that define the one or more language models. These parameters are, for example, parameters that have been pre-trained by machine learning (parameters that have undergone update processing by machine learning), but this does not limit this exemplary embodiment. Furthermore, a large language model that has been trained by machine learning can be used as the language model.
[0030] The control unit 61 acquires information generated by the language model by using the language model. For example, the control unit 61 inputs a prompt containing a problem received from the information processing device 1A to the language model and acquires the result of the language model solving the problem. It also provides this result to the information processing device 1A via the communication unit 63.
[0031] In this exemplary embodiment, the server device 60 is shown as a separate device from the information processing device 1A, but this does not limit this exemplary embodiment. The control unit 61 of the server device 60, or the function of the language model execution unit in the control unit 61, may be provided by the control unit of the information processing device 1A. Similarly, the language model group LLMG stored in the storage unit 62 of the server device 60 may be stored in the storage unit of the information processing device 1A, and the information processing device 1A itself may be able to execute the language model group LLMG.
[0032] (Configuration of Information Processing Device 1A) Next, the configuration of the information processing device 1A according to this exemplary embodiment will be described with reference to Figure 3. As shown in Figure 3, the information processing device 1A includes a control unit 10, a storage unit 20, a communication unit 30, and an input / output unit 40.
[0033] (Communication unit 30) The communication unit 30 communicates with devices outside the information processing device 1A. For example, the communication unit 30 communicates with the server device 60. The communication unit 30 transmits data supplied from the control unit 10 to the server device 60 and supplies data received from the server device 60 to the control unit 10. The data that the communication unit 30 transmits to the server device 60 includes a set of problems to be solved by at least one of the one or more language models included in the language model group LLMG described above. The data that the communication unit 30 receives from the server device 60 may include the results of solving at least one of the one or more problems included in the set of problems by at least one of the one or more language models included in the language model group LLMG.
[0034] (Input / output unit 40) The input / output unit 40 is configured to include at least one input / output device such as a keyboard, mouse, display, printer, or touch panel. Alternatively, the input / output unit 40 may be configured so that input / output devices such as a keyboard, mouse, display, printer, or touch panel are connected to it. In this configuration, the input / output unit 40 receives various types of information input to the information processing device 1A from the connected input device. The input / output unit 40 also outputs various types of information to the connected output device under the control of the control unit 10. An interface such as USB (Universal Serial Bus) is one example of the input / output unit 40.
[0035] (Storage unit 20) The storage unit 20 stores various data referenced by the control unit 10, as well as various data generated by the control unit 10. For example, the storage unit 20 stores the following:
• Evaluation information EI
• LLM metadata MIL
• Problem metadata MIP
• LLM features FL
• Problem features FP
• Output information OUT
Here, the evaluation information EI includes, as an example, the results of evaluating at least one of one or more language models using at least one of one or more problems. More specifically, the evaluation information EI includes, as an example, the results of having the language model solve at least one of one or more problems. Specific examples of the evaluation information EI will be described later.
[0036] The LLM metadata MIL is information acquired by the second acquisition unit 12, which will be described later, and is metadata relating to at least one of one or more language models. Specific examples of the LLM metadata MIL will be described later.
[0037] The metadata MIP for the problem is information acquired by the third acquisition unit 14, which will be described later, and is metadata relating to at least one of one or more problems. Specific examples of the metadata MIP for the problem will be described later.
[0038] The LLM features FL and the problem features FP are features calculated by the feature calculation unit 133, which will be described later. Specific examples of the LLM features FL and the problem features FP will be described later.
[0039] The output information OUT is information generated by the output information generation unit 15, which will be described later, and includes performance estimation results for at least one of one or more language models. Specific examples of the output information OUT will be described later.
[0040] (Example of a problem setting handled by the information processing device 1A) Before providing a more detailed explanation of the information processing device 1A, we will refer to Figure 4 to describe an example of a problem setting that the information processing device 1A can handle. Figure 4 is a schematic diagram illustrating an example of a problem setting that the information processing device 1A can handle.
[0041] As shown in Figure 4, in this example, at least one of the problems (x1 to x5, x1' to x3') included in the problem set is solved by at least one of the language models (m1 to m4, m1' to m2') included in the language model group LLMG (also called the LLM set). In the example shown in Figure 4, each evaluated pair, which is a pair of a language model and a problem that the language model has solved, is assigned a score as follows:
• 1 if the language model's solution to the problem is correct
• 0 if the language model's solution to the problem is incorrect
In the example shown in Figure 4, the evaluated pair (m1, x1) is assigned a score of 1, and the evaluated pair (m1, x4) is assigned a score of 0. The score of an evaluated pair is an example of the evaluation information EI described above. The calculation of the score may be performed by the control unit 61 of the server device 60 described above, or by the control unit 10 of the information processing device 1A. Note that specific examples of the score do not limit this exemplary embodiment. For example, the score may be a continuous value.
[0042] On the other hand, some problems in the set are solved by some language models but not by others. For example, problem x4 is solved by language models m1, m2, and m4, but not by language model m3. Also, some problems in the set are not solved by any language model. For example, problems x1' to x3' are not solved by any language model. A pair of a language model and a problem that the language model has not solved is called an unevaluated pair. As an example, the pair (m2, x5) is an unevaluated pair.
[0043] Furthermore, the language model group LLMG includes language models that have not solved any of the problems. In the example in Figure 4, language models m1' to m2' have not solved any of the problems. The pairs (m1', x5) and (m2', x2') are also examples of unevaluated pairs.
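The problem setting can be illustrated with a small Python sketch that stores the scores of evaluated pairs and enumerates the unevaluated pairs. This is an assumption-laden toy, not a reproduction of Figure 4: only the scores of (m1, x1) and (m1, x4) are taken from the text, the other entries are placeholders invented for illustration, and the variable names are hypothetical.

```python
# Scores of evaluated pairs, keyed by (language model, problem).
# Only (m1, x1) = 1 and (m1, x4) = 0 come from the text; the remaining
# entries are made-up placeholders and do not reproduce Figure 4.
evaluated = {
    ("m1", "x1"): 1,
    ("m1", "x4"): 0,
    ("m2", "x4"): 1,  # placeholder score
    ("m3", "x1"): 0,  # placeholder score
    ("m4", "x4"): 1,  # placeholder score
}

models = ["m1", "m2", "m3", "m4", "m1'", "m2'"]
problems = ["x1", "x2", "x3", "x4", "x5", "x1'", "x2'", "x3'"]

# Every (model, problem) pair without a score is an unevaluated pair.
unevaluated = [(m, x) for m in models for x in problems
               if (m, x) not in evaluated]

# Language models that appear in no evaluated pair at all.
models_without_scores = [m for m in models
                         if not any((m, x) in evaluated for x in problems)]
```

In this toy data, `models_without_scores` contains m1' and m2', which correspond to the language models for which no evaluation information has been obtained.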
[0044] The information processing device 1A according to this exemplary embodiment performs a process to estimate the score for the unevaluated pair described above, as an example of the performance estimation process for at least one of the language models included in the language model group LLMG.
[0045] (Control Unit 10) Returning to Figure 3, the configuration of the control unit 10 of the information processing device 1A will be described. As shown in Figure 3, the control unit 10 includes a first acquisition unit 11, a second acquisition unit 12, a third acquisition unit 14, an estimation unit 13, and an output information generation unit 15.
[0046] (First acquisition unit 11) The first acquisition unit 11 acquires evaluation information EI of at least one of one or more language models, relating to at least one of one or more problems. Here, as described above, the evaluation information EI includes the score of an evaluated pair, which is a pair of at least one of the one or more language models and a problem that the language model solved. Specific examples of evaluation information and evaluated pairs have been described above and will not be explained here. Note that the evaluation information EI can also be described as evaluation information on the problem with respect to the language model that solved the problem.
[0047] (Second acquisition unit 12) The second acquisition unit 12 acquires metadata MIL for at least one of the one or more language models. Here, metadata MIL for a language model refers, for example, to information about that language model other than the evaluation information EI. The second acquisition unit 12 acquires metadata MIL for language models for which the evaluation information EI has not been obtained. The second acquisition unit 12 may also acquire metadata MIL for language models for which the evaluation information EI has been obtained.
[0048] Specific examples of the language model metadata MIL do not limit this exemplary embodiment, but, as in the first exemplary embodiment, the metadata MIL may include, as an example, any of the following information:
• Name of the language model
• Parameters referenced by the language model
• Dataset used to train the language model
• Developer of the language model
• Development period of the language model
• Architecture of the language model
• History of model merging related to the language model
• History of additional training of the language model
Here, the history of additional training may include relational information such as, for example, "one model was fine-tuned to become another model." The history of model merging and the history of additional training of the language model are sometimes collectively referred to as the history of the language model.
[0049] (Third acquisition unit 14) The third acquisition unit 14 acquires metadata MIP for at least one of the one or more problems. Here, metadata MIP for a problem refers, for example, to information about that problem other than the evaluation information EI. The third acquisition unit 14 acquires metadata MIP for problems for which the evaluation information EI has not been obtained. The third acquisition unit 14 may also acquire metadata MIP for problems for which the evaluation information EI has been obtained.
[0050] Specific examples of the problem metadata MIP do not limit this exemplary embodiment, but the metadata MIP may include, as an example, any of the following information:
• Text of the problem (also called the problem statement or prompt statement)
• Source of the problem
• Creator of the problem
• Date and time the problem was created
[0051] (Estimation unit 13) The estimation unit 13 performs performance estimation processing for any of the language models included in the language model group LLMG by referring to the evaluation information EI acquired by the first acquisition unit 11 and the metadata MIL of at least one of the language models included in the language model group LLMG. In this estimation processing, the estimation unit 13 may further refer to the metadata MIP of one or more problems acquired by the third acquisition unit 14. As shown in Figure 3, the estimation unit 13 includes, as an example, an LLM similarity calculation unit 131, a problem similarity calculation unit 132, a feature calculation unit 133, and an estimated score calculation unit 134. The processing in each of these units will be described later with reference to other drawings.
[0052] (Output information generation unit 15) The output information generation unit 15 generates output information OUT, which includes performance estimation results for at least one of the one or more language models derived by the estimation unit 13. The output information may also include at least one of the language model metadata MIL acquired by the second acquisition unit 12 and the problem metadata MIP acquired by the third acquisition unit 14. The specifics of the output information OUT generated by the output information generation unit 15 will be described later.
[0053] (Example of processing by information processing device 1A) Next, we will explain an example of processing by the information processing device 1A with reference to Figures 5 to 8. Figure 5 is a flowchart showing the processing flow by the information processing device 1A.
[0054] (Step S11) In step S11, the first acquisition unit 11 collects one or more evaluated pairs as evaluation information EI for one or more language models included in the language model group LLMG. More specifically, the first acquisition unit 11 acquires evaluation information EI including the scores of the one or more evaluated pairs.
[0055] (Step S12) In step S12, the second acquisition unit 12 acquires the metadata MIL of at least one of the one or more language models included in the language model group LLMG.
[0056] (Step S14) In step S14, the third acquisition unit 14 acquires metadata MIP for at least one of the one or more problems included in the problem set.
[0057] (Step S131) Next, in step S131, the LLM similarity calculation unit 131 calculates the LLM similarity by referring to the language model metadata MIL obtained in step S12. The upper part of Figure 6 shows an example of LLM similarity calculation by the LLM similarity calculation unit 131. In the example shown in the upper part of Figure 6, the process refers to history information regarding fine-tuning (FT) and model merging of the language model as the language model metadata MIL.
[0058] As an example, the LLM similarity calculation unit 131 sets weight coefficients according to the history of the models:
• Weight coefficient for fine-tuning: 0.8
• Weight coefficient for merging: 0.5
These weight coefficients are used as the similarity between models. For models whose history spans multiple generations, the similarity between models is calculated by multiplying the weight coefficients of those generations.
[0059] For example, the relationship between model m1 and model m1' is fine-tuning, so a weight coefficient of 0.8 is set and used as the similarity between model m1 and model m1'. Similarly, the relationship between model m1' and model m2' is a merge, so a weight coefficient of 0.5 is set and used as the similarity between model m1' and model m2'. The LLM similarity calculation unit 131 then calculates 0.8 × 0.5 = 0.4 as the similarity between model m1 and model m2'.
[0060] By performing this process, the LLM similarity calculation unit 131 calculates the LLM similarity matrix $K^{(M)}$, whose kl component $K^{(M)}_{kl}$ is the similarity between LLM(k) and LLM(l). Here, k and l are indices used to distinguish multiple language models from one another.
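The history-based similarity described above can be sketched as follows. The weight coefficients 0.8 and 0.5 and the models m1, m1', and m2' come from the example in the text. The function name `lineage_similarity`, the symmetric treatment of each relation, the self-similarity of 1.0, and the value 0.0 for unrelated models are assumptions of this sketch.

```python
from collections import deque

# Weight coefficients taken from the example in the text.
EDGE_WEIGHT = {"fine_tune": 0.8, "merge": 0.5}

def lineage_similarity(models, edges):
    """Return a dict-of-dicts similarity matrix K[a][b].

    `edges` is a list of (model_a, model_b, relation) tuples. The similarity
    of two models linked through several generations is the product of the
    weight coefficients along the chain. Unrelated models get 0.0 and each
    model gets 1.0 with itself (both are assumptions of this sketch).
    """
    # Similarity is treated as symmetric, so edges are stored both ways.
    neighbours = {m: [] for m in models}
    for a, b, relation in edges:
        w = EDGE_WEIGHT[relation]
        neighbours[a].append((b, w))
        neighbours[b].append((a, w))

    K = {a: {b: 0.0 for b in models} for a in models}
    for start in models:
        K[start][start] = 1.0
        # Breadth-first walk; if several chains link two models, the first
        # one found is used (the example lineage has only one chain).
        queue = deque([(start, 1.0)])
        seen = {start}
        while queue:
            node, sim = queue.popleft()
            for nxt, w in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    K[start][nxt] = sim * w
                    queue.append((nxt, sim * w))
    return K

models = ["m1", "m1'", "m2'"]
edges = [("m1", "m1'", "fine_tune"), ("m1'", "m2'", "merge")]
K_M = lineage_similarity(models, edges)
```

With this input, `K_M["m1"]["m2'"]` is the product 0.8 × 0.5 described in the text.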
[0061] The weight coefficient settings above are merely examples and do not limit this exemplary embodiment. For example, different weight coefficients may be used depending on the type of fine-tuning or merging.
[0062] Furthermore, the processing of the LLM similarity calculation unit 131 in this step is not limited to the example described above. The LLM similarity calculation unit 131 may be configured to calculate the LLM similarity according to at least one of:
• the similarity of language model parameters and task parameters; and
• the internal state of the language model when it solves a problem.
[0063] (Step S132) Next, in step S132, the inter-problem similarity calculation unit 132 calculates the inter-problem similarity by referring to the problem metadata MIP obtained in step S14. The lower part of Figure 6 shows an example of inter-problem similarity calculation by the inter-problem similarity calculation unit 132. In the example shown in the lower part of Figure 6, the process refers to the problem statement of each problem as the problem metadata MIP. More specifically, the inter-problem similarity calculation unit 132:
• inputs the problem statement for problem (k), "Summarize the following sentence...", into the sentence embedding model; and
• obtains the embedding vector output by the sentence embedding model: $[0.1, \cdots, 0.8]^T$.
Furthermore, the inter-problem similarity calculation unit 132:
• inputs the problem statement for problem (l), "Find the derivative of the function $f(x) = x^2 + 3x + 5$", into the sentence embedding model; and
• obtains the embedding vector output by the sentence embedding model: $[0.6, \cdots, 0.2]^T$.
Here, k and l are indices for distinguishing multiple problems from each other. Then, the inter-problem similarity calculation unit 132:
• calculates the cosine similarity between the embedding vectors $[0.1, \cdots, 0.8]^T$ and $[0.6, \cdots, 0.2]^T$; and
• uses the calculated cosine similarity as the similarity between problem (k) and problem (l).
[0064] By performing such processing, the inter-problem similarity calculation unit 132 calculates the inter-problem similarity matrix $K^{(X)}$, whose kl component $K^{(X)}_{kl}$ is the similarity between problem (k) and problem (l).
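The cosine-similarity step can be sketched as below. The source elides most components of the example embedding vectors, so the sketch uses unrelated three-dimensional toy vectors. These vectors and the function name `cosine_similarity_matrix` are hypothetical, and no particular sentence embedding model is assumed.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Return K where K[k, l] is the cosine similarity of rows k and l."""
    E = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    unit = E / norms          # assumes no embedding is the zero vector
    return unit @ unit.T

# Hypothetical toy embeddings standing in for the sentence embedding output.
emb = np.array([
    [1.0, 0.0, 1.0],   # stand-in for problem (k)
    [1.0, 1.0, 0.0],   # stand-in for problem (l)
])
K_X = cosine_similarity_matrix(emb)
```

Normalizing each row once and taking a single matrix product yields all pairwise similarities at the same time, which is convenient when the problem group is large.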
[0065] Note that the processing of the inter-problem similarity calculation unit 132 in this step is not limited to the above example. The inter-problem similarity calculation unit 132 may be configured to calculate the inter-problem similarity according to at least one of:
• a similarity based on the sentence embedding model that uses a measure other than the cosine similarity described above;
• re-ranking using a generative model; and
• a manually determined similarity,
and the like.
[0066] (Step S133) Subsequently, in step S133, the feature calculation unit 133 refers to:
• the evaluated pairs collected in step S11;
• the LLM similarity calculated in step S131; and
• the inter-problem similarity calculated in step S132,
and calculates the feature amount FP of each problem included in the problem group and the feature amount FL of each language model included in the language model group LLMG.
[0067] Figure 7 shows an example of the feature calculation process in this step. As shown in Figure 7, step S133 includes, as an example, step S1331 of calculating a loss function and step S1332 of calculating feature amounts by referring to the loss function. In the following description, the relationship between the feature amount FL of the language model (LLM feature amount), the feature amount FP of the problem, and the score is expressed as follows.
[Equation 1]
[Equation 2]
[0068] (Step S1331) In step S1331, the feature calculation unit 133 calculates the loss function L as follows.
[Equation 3]
[Equation 4]
[Equation 5]
[Equation 6]
[Equation 7]
[0069] Here, the derivation of the constraint term $L_X$ above can be explained in detail as follows. First, the feature calculation unit 133 converts the similarity matrix $K^{(X)}$ into an adjacency matrix $A^{(X)}$. The adjacency matrix $A^{(X)}$ is sometimes called the sparse matrix $A^{(X)}$. As mentioned above, the component $K^{(X)}_{kl}$ of the similarity matrix $K^{(X)}$ represents the similarity between problem (k) and problem (l). As mentioned above, the cosine similarity obtained when the problem statements (prompts) are converted into sentence embedding vectors can be used as this similarity.
[0070] On the other hand, the components of the adjacency matrix $A^{(X)}$ are given as follows.
[Equation 8]
[0071] Then, the feature calculation unit 133 sets a loss term (loss function) such that the difference between the feature amounts $x_k$ and $x_l$ of adjacent problems becomes small. As an example, the feature calculation unit 133 sets the loss term $L_X$ as follows.
[Equation 9]
[0072] Rearranging Equation 9 above gives the following.
[Equation 10]
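The equations themselves are not reproduced in this text, so the following is only a sketch of one common way to realize a loss term that keeps the feature amounts of adjacent problems close. It assumes that the adjacency matrix is obtained by thresholding the similarity matrix and that the loss term is a sum of squared differences over adjacent pairs, which can be rewritten as a trace involving the graph Laplacian. The threshold rule, the factor 0.5, and all function names are assumptions and may differ from Equations 8 to 10.

```python
import numpy as np

def adjacency_from_similarity(K, threshold):
    """Keep an edge where the similarity reaches `threshold` (assumed rule)."""
    A = (np.asarray(K) >= threshold).astype(float)
    np.fill_diagonal(A, 0.0)     # no self-loops
    return A

def pairwise_smoothness(A, X):
    """0.5 * sum over k, l of A[k, l] * ||x_k - x_l||^2, pair by pair."""
    diff = X[:, None, :] - X[None, :, :]          # shape (n, n, d)
    return 0.5 * np.sum(A * np.sum(diff ** 2, axis=2))

def laplacian_smoothness(A, X):
    """The same quantity written as trace(X^T (D - A) X), D = degree matrix."""
    L = np.diag(A.sum(axis=1)) - A
    return np.trace(X.T @ L @ X)

# Hypothetical similarity matrix and feature amounts for three problems.
K = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
A = adjacency_from_similarity(K, threshold=0.5)
X = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])

pair_form = pairwise_smoothness(A, X)
trace_form = laplacian_smoothness(A, X)
```

Both functions return the same value for a symmetric adjacency matrix. The trace form is usually preferred in an optimization loop because it avoids building the (n, n, d) array of pairwise differences.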
[0073] Furthermore, the derivation of the constraint term $L_M$ above can be explained in detail as follows. First, the feature calculation unit 133 converts the similarity matrix $K^{(M)}$ into an adjacency matrix $A^{(M)}$. The adjacency matrix $A^{(M)}$ is sometimes called the sparse matrix $A^{(M)}$. As mentioned above, the component $K^{(M)}_{kl}$ of the similarity matrix $K^{(M)}$ represents the similarity between language model (k) and language model (l). As an example of this similarity, as mentioned above, the history information of the language models can be referenced, and weight coefficients corresponding to the history of each model can be used as the similarity between the models.
[0074] On the other hand, the components of the adjacency matrix $A^{(M)}$ are given, for example, as follows.
[Equation 11]
[0075] Then, the feature calculation unit 133 sets a loss term (loss function) such that the difference between the feature amounts $m_k$ and $m_l$ of adjacent language models becomes small. As an example, the feature calculation unit 133 sets the loss term $L_M$ as follows.
[Equation 12]
[0076] Rearranging Equation 12 above gives the following.
[Equation 13]
[0077] (Step S1332) Then, in step S1332, the feature calculation unit 133 refers to the scores of the evaluated pairs and the loss function L (Equation 3) calculated in step S1331, and calculates the features for each problem and the features for each language model so that the loss function L becomes smaller.
[0078] (Step S134) Then, in step S134, the estimated score calculation unit 134 calculates the scores of the unevaluated pairs using the feature amounts of each problem and of each language model calculated in step S133. More specifically, using the feature amount $m_i$ of the language model in an unevaluated pair and the feature amount $x_k$ of the problem in that pair, the score $z_{ik}$ of the unevaluated pair is calculated as follows.
[Equation 14]
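Steps S1332 and S134 can be sketched end to end as follows. Because the equations are not reproduced in this text, the sketch assumes a logistic score model sigmoid(m_i · x_k), a binary cross-entropy loss over the evaluated pairs, trace-form constraint terms built from graph Laplacians, a single shared weight `lam`, and plain gradient descent. All of these, together with the function names and the toy data, are assumptions and not the method of the source.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

def total_loss(M, X, Y, W, L_M, L_X, lam):
    """Cross-entropy on evaluated pairs plus the two constraint terms."""
    P = np.clip(sigmoid(M @ X.T), 1e-9, 1 - 1e-9)
    bce = -np.sum(W * (Y * np.log(P) + (1 - Y) * np.log(1 - P)))
    reg = np.trace(M.T @ L_M @ M) + np.trace(X.T @ L_X @ X)
    return bce + lam * reg

def fit(Y, W, A_M, A_X, dim=2, lam=1.0, lr=0.05, steps=500, seed=0):
    """Gradient descent on the feature amounts M (models) and X (problems)."""
    rng = np.random.default_rng(seed)
    M = 0.1 * rng.standard_normal((Y.shape[0], dim))
    X = 0.1 * rng.standard_normal((Y.shape[1], dim))
    L_M, L_X = laplacian(A_M), laplacian(A_X)
    history = []
    for _ in range(steps):
        history.append(total_loss(M, X, Y, W, L_M, L_X, lam))
        R = W * (sigmoid(M @ X.T) - Y)      # residual on evaluated pairs only
        grad_M = R @ X + 2 * lam * L_M @ M
        grad_X = R.T @ M + 2 * lam * L_X @ X
        M, X = M - lr * grad_M, X - lr * grad_X
    return M, X, history

# Rows: m1, m2, m1' (m1' has no evaluated pair). Columns: four problems.
Y = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 0]], dtype=float)
W = np.array([[1, 1, 1, 1],
              [1, 1, 1, 1],
              [0, 0, 0, 0]], dtype=float)
A_M = np.array([[0, 0, 1],
                [0, 0, 0],
                [1, 0, 0]], dtype=float)    # m1 and m1' are adjacent
A_X = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)

M, X, history = fit(Y, W, A_M, A_X)
Z = sigmoid(M @ X.T)   # estimated scores for every pair, evaluated or not
```

In this toy setting the only gradient acting on the feature amount of m1' comes from the constraint term that ties it to m1, so the estimated scores in row 2 of `Z` follow those of m1 even though m1' has no evaluated pair.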
[0079] (Step S135) Then, in step S135, the output information generation unit 15 generates output information OUT using the performance estimation results, including the scores of the unevaluated pairs calculated in step S134. Figure 8 shows an example of output information OUT as visually presented via the display of the input / output unit 40. As shown in Figure 8, the output information OUT includes the scores of each unevaluated pair as performance estimation results for each language model.
[0080] Furthermore, as shown in Figure 8, the output information OUT may also include:
• the metadata MIL of the language models used for performance estimation; and
• the metadata MIP of the problems used for performance estimation.
[0081] (Effects of Information Processing Device 1A) As described above, the information processing device 1A adopts a configuration that:
• acquires evaluation information EI of at least one of one or more language models, the evaluation information EI relating to at least one of one or more problems;
• acquires metadata MIL for at least one of the one or more language models; and
• performs performance estimation processing for at least one of the one or more language models by referring to the evaluation information EI and the metadata MIL.
Because the information processing device 1A acquires the metadata MIL for at least one of the one or more language models and performs the performance estimation processing by referring to the metadata MIL, performance estimation can be suitably performed even for language models for which evaluation information EI has not been obtained. As an example, as shown in Figure 8, the information processing device 1A can suitably perform performance estimation even for language models (m1' to m2') that have not been made to solve any of the problems.
[0082] Furthermore, the information processing device 1A adopts a configuration that:
• further acquires metadata MIP relating to at least one of the one or more problems; and
• performs the performance estimation processing by further referring to the metadata MIP relating to the problem.
As a result, the information processing device 1A can suitably estimate performance even for problems for which evaluation information EI is not available. For example, as shown in Figure 8, the information processing device 1A can suitably estimate performance even for problems (x1' to x3') that have not been solved by any language model.
[0083] Furthermore, in the performance estimation processing described above, the information processing device 1A:
• refers to a first loss $L_{BCE}$ based on the scores of the evaluated pairs and a second loss $L_M$ corresponding to the metadata MIL of the language models;
• calculates, for each of the one or more language models, the feature amount FL of the language model and, for each of the one or more problems, the feature amount FP of the problem; and
• estimates the score of the language model for an unevaluated pair by referring to the feature amount FL of the language model and the feature amount FP of the problem in that unevaluated pair.
Because this configuration is adopted, the various effects mentioned above can be achieved while keeping the increase in computational cost down.
[0084] The following describes an application example of the information processing system 100A. In this example, the language model group LLMG includes:
• language model A;
• language model B; and
• language model C.
Consider the case where the server device 60 receives, via the communication unit 63, an instruction from the information processing device 1A or another device to solve a problem set X, which includes one or more problems. Here, these language models may include language models for which evaluation information has not been obtained.
[0085] In this case, the server device 60 sends the following to the information processing device 1A:
• an instruction to evaluate which of the above language models is suitable for problem set X; and
• some of the problems included in problem set X.
Upon receiving the instruction, the information processing device 1A obtains the scores of the evaluated pairs of language models A, B, and C, and the metadata for language models A, B, and C. Here, the scores may be calculated by the information processing device 1A or by another device.
[0086] The information processing device 1A then performs the performance estimation processing described above using problem set X', which includes some of the problems in problem set X, to estimate the performance of each of language models A, B, and C (that is, to calculate the estimated scores for unevaluated pairs). The information processing device 1A then provides the performance estimation results to the server device 60. The server device 60 refers to the performance estimation results and selects, from language models A, B, and C, the language model most suitable for problem set X. It then actually solves problem set X using the selected language model.
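The selection performed by the server device 60 can be sketched as follows. The text only states that the most suitable language model is selected, so using the mean estimated score as the criterion is an assumption, as are the function name `select_model` and the score values.

```python
def select_model(estimated_scores):
    """Pick the model with the highest mean estimated score.

    `estimated_scores` maps a model name to a list of estimated scores, one
    per problem. Averaging is an assumed criterion for this sketch.
    """
    means = {name: sum(s) / len(s) for name, s in estimated_scores.items()}
    best = max(means, key=means.get)
    return best, means

# Hypothetical estimated scores of language models A, B, and C on X'.
scores = {
    "A": [0.9, 0.4, 0.7],
    "B": [0.6, 0.8, 0.9],
    "C": [0.5, 0.5, 0.5],
}
best, means = select_model(scores)
```

Other criteria, such as the worst-case score or a cost-weighted score, could be substituted without changing the rest of the flow.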
[0087] [Examples of implementation using software] Some or all of the functions of the information processing devices 1 and 1A (hereinafter also referred to as "each of the above devices") may be implemented by hardware such as integrated circuits (IC chips) or by software.
[0088] In the latter case, each of the above devices is implemented, for example, by a computer that executes instructions for a program, which is software that realizes each function. An example of such a computer (hereinafter referred to as Computer C) is shown in Figure 9. Figure 9 is a block diagram showing the hardware configuration of Computer C, which functions as each of the above devices.
[0089] Computer C comprises at least one processor C1 and at least one memory C2. Memory C2 stores a program P that causes computer C to operate as each of the above-mentioned devices. In computer C, processor C1 reads program P from memory C2 and executes it, thereby realizing each of the above-mentioned devices.
[0090] For processor C1, for example, a CPU (Central Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), MPU (Micro Processing Unit), FPU (Floating-Point Unit), PPU (Physics Processing Unit), TPU (Tensor Processing Unit), quantum processor, microcontroller, or a combination thereof can be used. For memory C2, for example, flash memory, HDD (Hard Disk Drive), SSD (Solid State Drive), or a combination thereof can be used.
[0091] Computer C may also be equipped with RAM (Random Access Memory) for loading program P at runtime and for temporarily storing various data. Furthermore, computer C may be equipped with communication interfaces for sending and receiving data with other devices. Additionally, computer C may be equipped with input / output interfaces for connecting input / output devices such as keyboards, mice, displays, and printers.
[0092] Furthermore, program P can be recorded on a non-transitory, tangible recording medium M that is readable by computer C. Such a recording medium M may be, for example, a tape, disk, card, semiconductor memory, or programmable logic circuit. Computer C can acquire program P via such a recording medium M. Program P can also be transmitted via a transmission medium. Such a transmission medium may be, for example, a communication network or broadcast waves. Computer C can also acquire program P via such a transmission medium.
[0093] Furthermore, each of the above functions of each of the above devices may be implemented by a single processor in a single computer, by multiple processors in a single computer working together, or by multiple processors in each of multiple computers working together. In addition, the programs for implementing each of the above functions in each of the above devices may be stored in a single memory in a single computer, distributed and stored in multiple memories in a single computer, or distributed and stored in multiple memories in each of multiple computers.
[0094] [Appendix A] This disclosure includes the technologies described in the following appendices. However, the present invention is not limited to the technologies described in the following appendices, and various modifications are possible within the scope of the claims.
[0095] (Appendix A1) An information processing device comprising: a first acquisition means for acquiring evaluation information of at least one of one or more language models, the evaluation information relating to at least one of one or more problems; a second acquisition means for acquiring metadata relating to at least one of the one or more language models; and an estimation means for performing performance estimation processing relating to at least one of the one or more language models by referring to the evaluation information and the metadata.
[0096] (Appendix A2) The information processing device according to Appendix A1, wherein the evaluation information includes a score for an evaluated pair, which is a pair of at least one of the one or more language models and a problem that the language model has been made to solve, and the estimation means estimates a score for an unevaluated pair, which is a pair of any one of the one or more language models and a problem that the language model has not yet solved.
[0097] (Appendix A3) The information processing device according to Appendix A2, wherein the estimation means estimates the score for the unevaluated pair by referring to a first loss corresponding to the score for the evaluated pair and a second loss corresponding to the metadata.
[0098] (Appendix A4) The information processing device according to Appendix A3, wherein the estimation means calculates the similarity between a plurality of language models by referring to the metadata, and calculates the second loss using the calculated similarity.
[0099] (Appendix A5) The information processing device according to Appendix A3 or A4, wherein the estimation means calculates, by referring to the first loss and the second loss, a feature amount of each of the one or more language models and a feature amount of each of the one or more problems, and estimates the score of the language model for the unevaluated pair by referring to the feature amount of the language model in the unevaluated pair and the feature amount of the problem in the unevaluated pair.
[0100] (Appendix A6) The information processing device according to any one of Appendices A1 to A5, wherein the metadata relating to the language model includes history information of the language model.
[0101] (Appendix A7) The information processing device according to Appendix A6, further comprising an output information generation means for generating output information including the estimation result from the estimation means and the history information.
[0102] (Appendix A8) The information processing device according to any one of Appendices A1 to A7, further comprising a third acquisition means for acquiring metadata relating to at least one of the one or more problems, wherein the estimation means performs the performance estimation processing by further referring to the metadata relating to the problem.
[0103] (Appendix A9) The information processing device according to Appendix A8, wherein the metadata relating to the problem includes a problem statement of the problem.
[0104] [Appendix B] This disclosure includes the technologies described in the following appendices. However, the present invention is not limited to the technologies described in the following appendices, and various modifications are possible within the scope of the claims.
[0105] (Appendix B1) An information processing method comprising: a first acquisition process in which at least one processor acquires evaluation information of at least one of one or more language models, the evaluation information relating to at least one of one or more problems; a second acquisition process in which the at least one processor acquires metadata relating to at least one of the one or more language models; and an estimation process in which the at least one processor performs performance estimation processing relating to at least one of the one or more language models by referring to the evaluation information and the metadata.
[0106] (Appendix B2) The information processing method according to Appendix B1, wherein the evaluation information includes a score for an evaluated pair, which is a pair of at least one of the one or more language models and a problem that the language model has been made to solve, and, in the estimation process, the at least one processor estimates a score for an unevaluated pair, which is a pair of any one of the one or more language models and a problem that the language model has not yet solved.
[0107] (Appendix B3) The information processing method according to Appendix B2, wherein, in the estimation process, the at least one processor estimates the score for the unevaluated pair by referring to a first loss corresponding to the score for the evaluated pair and a second loss corresponding to the metadata.
[0108] (Appendix B4) The information processing method according to Appendix B3, wherein, in the estimation process, the at least one processor calculates the similarity between a plurality of language models by referring to the metadata, and calculates the second loss using the calculated similarity.
[0109] (Appendix B5) The information processing method according to Appendix B3 or B4, wherein, in the estimation process, the at least one processor calculates, by referring to the first loss and the second loss, a feature amount of each of the one or more language models and a feature amount of each of the one or more problems, and estimates the score of the language model for the unevaluated pair by referring to the feature amount of the language model in the unevaluated pair and the feature amount of the problem in the unevaluated pair.
[0110] (Appendix B6) The information processing method according to any one of Appendices B1 to B5, wherein the metadata relating to the language model includes history information of the language model.
[0111] (Appendix B7) The information processing method according to Appendix B6, further comprising an output information generation process in which the at least one processor generates output information including the estimation result from the estimation process and the history information.
[0112] (Appendix B8) The information processing method according to any one of Appendices B1 to B7, further comprising a third acquisition process in which the at least one processor acquires metadata relating to at least one of the one or more problems, wherein, in the estimation process, the at least one processor performs the performance estimation processing by further referring to the metadata relating to the problem.
[0113] (Appendix B9) The information processing method according to Appendix B8, wherein the metadata relating to the problem includes a problem statement of the problem.
[0114] [Appendix C] This disclosure includes the technologies described in the following appendices. However, the present invention is not limited to the technologies described in the following appendices, and various modifications are possible within the scope of the claims.
[0115] (Appendix C1) An information processing program for causing a computer to function as an information processing device, the program causing the computer to function as: a first acquisition means for acquiring evaluation information of at least one of one or more language models, the evaluation information relating to at least one of one or more problems; a second acquisition means for acquiring metadata relating to at least one of the one or more language models; and an estimation means for performing performance estimation processing relating to at least one of the one or more language models by referring to the evaluation information and the metadata.
[0116] (Appendix C2) The information processing program according to Appendix C1, wherein the evaluation information includes a score for an evaluated pair, which is a pair of at least one of the one or more language models and a problem that the language model has been made to solve, and the estimation means estimates a score for an unevaluated pair, which is a pair of any one of the one or more language models and a problem that the language model has not yet solved.
[0117] (Appendix C3) The information processing program according to Appendix C2, wherein the estimation means estimates the score for the unevaluated pair by referring to a first loss corresponding to the score for the evaluated pair and a second loss corresponding to the metadata.
[0118] (Appendix C4) The information processing program according to Appendix C3, wherein the estimation means calculates the similarity between a plurality of language models by referring to the metadata, and calculates the second loss using the calculated similarity.
[0119] (Appendix C5) The information processing program according to Appendix C3 or C4, wherein the estimation means calculates, by referring to the first loss and the second loss, a feature amount of each of the one or more language models and a feature amount of each of the one or more problems, and estimates the score of the language model for the unevaluated pair by referring to the feature amount of the language model in the unevaluated pair and the feature amount of the problem in the unevaluated pair.
[0120] (Appendix C6) The information processing program according to any one of Appendices C1 to C5, wherein the metadata relating to the language model includes history information of the language model.
[0121] (Appendix C7) The information processing program according to Appendix C6, causing the computer to further function as an output information generation means for generating output information including the estimation result from the estimation means and the history information.
[0122] (Appendix C8) The information processing program according to any one of Appendices C1 to C7, causing the computer to further function as a third acquisition means for acquiring metadata relating to at least one of the one or more problems, wherein the estimation means performs the performance estimation processing by further referring to the metadata relating to the problem.
[0123] (Appendix C9) The information processing program according to Appendix C8, wherein the metadata relating to the problem includes a problem statement of the problem.
[0124] [Appendix D] This disclosure includes the technologies described in the following appendices. However, the present invention is not limited to the technologies described in the following appendices, and various modifications are possible within the scope of the claims.
[0125] (Appendix D1) An information processing device comprising at least one processor, wherein the at least one processor executes: a first acquisition process of acquiring evaluation information of at least one of one or more language models, the evaluation information relating to at least one of one or more problems; a second acquisition process of acquiring metadata relating to at least one of the one or more language models; and an estimation process of performing performance estimation processing relating to at least one of the one or more language models by referring to the evaluation information and the metadata.
[0126] The information processing device may also include a memory. Furthermore, the memory may store a program that causes the at least one processor to execute each of the aforementioned processes.
[0127] (Appendix D2) The information processing device according to Appendix D1, wherein the evaluation information includes a score for an evaluated pair, which is a pair of at least one of the one or more language models and a problem that the language model has been made to solve, and, in the estimation process, the at least one processor estimates a score for an unevaluated pair, which is a pair of any one of the one or more language models and a problem that the language model has not yet solved.
[0128] (Appendix D3) The information processing device according to Appendix D2, wherein, in the estimation process, the at least one processor estimates the score for the unevaluated pair by referring to a first loss corresponding to the score for the evaluated pair and a second loss corresponding to the metadata.
[0129] (Appendix D4) The information processing device according to Appendix D3, wherein, in the estimation process, the at least one processor calculates the similarity between a plurality of language models by referring to the metadata, and calculates the second loss using the calculated similarity.
[0130] (Appendix D5) The information processing device according to Appendix D3 or D4, wherein, in the estimation process, the at least one processor calculates, by referring to the first loss and the second loss, a feature amount of each of the one or more language models and a feature amount of each of the one or more problems, and estimates the score of the language model for the unevaluated pair by referring to the feature amount of the language model in the unevaluated pair and the feature amount of the problem in the unevaluated pair.
[0131] (Appendix D6) The information processing device according to any one of Appendices D1 to D5, wherein the metadata relating to the language model includes history information of the language model.
[0132] (Appendix D7) The information processing device according to Appendix D6, wherein the at least one processor further executes an output information generation process of generating output information including the estimation result from the estimation process and the history information.
[0133] (Appendix D8) The information processing device according to any one of Appendices D1 to D7, wherein the at least one processor further executes a third acquisition process of acquiring metadata relating to at least one of the one or more problems, and, in the estimation process, performs the performance estimation processing by further referring to the metadata relating to the problem.
[0134] (Appendix D9) The information processing device according to Appendix D8, wherein the metadata relating to the problem includes a problem statement of the problem.
[0135] [Additional Note E] This disclosure includes the technologies described in the following appendices. However, the present invention is not limited to the technologies described in the following appendices, and various modifications are possible within the scope of the claims.
[0136] (Note E1) A non-transitory recording medium storing a program that causes a computer to function as an information processing device, the program causing the computer to execute: a first acquisition process of acquiring evaluation information of at least one of one or more language models, the evaluation information relating to at least one of one or more problems; a second acquisition process of acquiring metadata relating to at least one of the one or more language models; and an estimation process of performing performance estimation processing for at least one of the one or more language models by referring to the evaluation information and the metadata.
[Explanation of Symbols]
[0137]
1, 1A Information processing device
100A Information processing system
11 First acquisition unit (first acquisition means)
12 Second acquisition unit (second acquisition means)
13 Estimation unit (estimation means)
14 Third acquisition unit (third acquisition means)
15 Output information generation unit (output information generation means)
Claims
1. An information processing device comprising: a first acquisition means for acquiring evaluation information of at least one of one or more language models, the evaluation information relating to at least one of one or more problems; a second acquisition means for acquiring metadata relating to at least one of the one or more language models; and an estimation means for performing performance estimation processing for at least one of the one or more language models by referring to the evaluation information and the metadata.
2. The information processing device according to claim 1, wherein the evaluation information includes a score for an evaluated pair, which is a pair of at least one of the one or more language models and a problem that the language model has been made to solve, and the estimation means estimates a score for an unevaluated pair, which is a pair of at least one of the one or more language models and a problem that the language model has not been made to solve.
3. The information processing device according to claim 2, wherein the estimation means estimates the score of the unevaluated pair by referring to a first loss corresponding to the score of the evaluated pair and a second loss corresponding to the metadata.
4. The information processing device according to claim 3, wherein the estimation means calculates a similarity between a plurality of language models by referring to the metadata, and calculates the second loss using the calculated similarity.
5. The information processing device according to claim 3, wherein the estimation means calculates, by referring to the first loss and the second loss, a feature quantity of the language model for each of the one or more language models and a feature quantity of the problem for each of the one or more problems, and estimates the score of the language model in the unevaluated pair by referring to the feature quantity of the language model in the unevaluated pair and the feature quantity of the problem in the unevaluated pair.
6. The information processing device according to any one of claims 1 to 5, wherein the metadata relating to the language model includes history information of the language model.
7. The information processing device according to claim 6, further comprising an output information generation means for generating output information including an estimation result from the estimation means and the history information.
8. The information processing device according to any one of claims 1 to 5, further comprising a third acquisition means for acquiring metadata relating to at least one of the one or more problems, wherein the estimation means performs the performance estimation processing by further referring to the metadata relating to the problem.
9. An information processing method comprising, by one or more processors: acquiring evaluation information of at least one of one or more language models, the evaluation information relating to at least one of one or more problems; acquiring metadata relating to at least one of the one or more language models; and performing performance estimation processing for at least one of the one or more language models by referring to the evaluation information and the metadata.
10. A program that causes a computer to function as an information processing device, the program causing the computer to function as: a first acquisition means for acquiring evaluation information of at least one of one or more language models, the evaluation information relating to at least one of one or more problems; a second acquisition means for acquiring metadata relating to at least one of the one or more language models; and an estimation means for performing performance estimation processing for at least one of the one or more language models by referring to the evaluation information and the metadata.
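For reference, the similarity calculation of claim 4 combined with the history information of claim 6 could be sketched as follows. The claims do not fix how the similarity is computed. This sketch assumes that the history information is a list of ancestor identifiers for each language model, that the similarity is the Jaccard overlap of those lists, and that the second loss is the similarity-weighted squared distance between model feature quantities. The function names and the sample identifiers are hypothetical.

```python
def lineage_similarity(history_a, history_b):
    """Jaccard overlap of two ancestor lists (assumed similarity measure)."""
    a, b = set(history_a), set(history_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def second_loss(features, histories):
    """Sum over model pairs of similarity times squared feature distance.

    features:  {model_name: [feature values]}
    histories: {model_name: [ancestor identifiers]} (hypothetical metadata)
    """
    names = sorted(features)
    total = 0.0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = lineage_similarity(histories[a], histories[b])
            dist2 = sum((x - y) ** 2 for x, y in zip(features[a], features[b]))
            total += s * dist2
    return total


# Hypothetical example: m2 was derived from the same base model as m1,
# while m3 has an unrelated lineage.
histories = {"m1": ["base"], "m2": ["base", "ft"], "m3": ["other"]}
features = {"m1": [1.0, 0.0], "m2": [0.0, 0.0], "m3": [5.0, 5.0]}
print(lineage_similarity(histories["m1"], histories["m2"]))
print(second_loss(features, histories))
```

With this choice, only pairs of language models that share part of their lineage contribute to the second loss, so unrelated models are free to take different feature quantities.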