Program, information processing device, and information processing method
The program and device refine task vectors to optimize parameters, reducing computational costs and enhancing model performance for target tasks by generating specialized models with improved inference capabilities.
Patent Information
- Authority / Receiving Office: JP
- Patent Type: Applications
- Current Assignee / Owner: KK TOSHIBA
- Filing Date: 2024-12-03
- Publication Date: 2026-06-15
AI Technical Summary
Existing methods for fine-tuning base models for specific tasks require high computational costs and result in insufficient performance improvement.
A program and information processing device that refines task vectors by correcting less important parameters to specified values, adjusts coefficients using training data to optimize task vectors, and generates specialized models by combining refined task vectors with base models, reducing computational costs and mitigating catastrophic forgetting.
Generates specialized models with higher inference performance for target tasks while minimizing computational overhead and improving model specialization.
Description
[Technical Field] 【0001】 Embodiments of the present invention relate to a program, an information processing device, and an information processing method. [Background Art] 【0002】 One method of improving the task performance of a base model is to fine-tune all of its parameters. However, fine-tuning all parameters, for example each time a new dataset becomes available, incurs extremely high computational cost. 【0003】 A technique using task vectors has been proposed for obtaining a model specialized for a specific task (target task). In such a technique, for example, a task vector for the target task is obtained by taking the difference in parameters (weights, etc.) between a model obtained by fine-tuning a pre-trained model (base model) for the target task and the original base model. Then, by performing calculations using the task vector, a specialized model for the target task can be obtained at lower computational cost. However, the performance improvement of the specialized model obtained with such a technique has been insufficient. [Prior Art Documents] [Patent Documents] 【0004】 [Patent Document 14] U.S. Patent Application Publication No. 2022/0383126 [Non-Patent Documents] 【0005】 [Non-Patent Document 1] Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi. “Editing Models with Task Arithmetic”. In Proceedings of the 11th International Conference on Learning Representations (ICLR 2023). Kigali, Rwanda. 2023. [Summary of the Invention] [Problems to Be Solved by the Invention] 【0006】 The present invention aims to provide a program, an information processing device, and an information processing method that can generate a model capable of achieving higher inference performance for a desired task.
[Means for Solving the Problems] 【0007】 The program of the embodiment causes a computer to execute a refining step, an adjustment step, a task vector generation step, and a model generation step. The refining step performs a refining process that corrects, to a specified value, the values of parameters that are less important than other parameters among the multiple parameters included in a task vector. The adjustment step uses training data to adjust coefficients for correcting the task vector into an optimal task vector. The task vector generation step generates the optimal task vector by multiplying the task vector by the coefficients. The model generation step generates a specialized model using the base model and the optimal task vector. [Brief Description of the Drawings] 【0008】 [Figure 1] Block diagram of the information processing device of the first embodiment. [Figure 2] Diagram illustrating an example of the relationship between the base model, task vector, and parameter matrix. [Figure 3] Diagram showing an example of a division method. [Figure 4] Flowchart of the model generation process in the first embodiment. [Figure 5] Block diagram of the information processing device of the second embodiment. [Figure 6] Flowchart of the model generation process in the second embodiment. [Figure 7] Block diagram of the information processing device of the third embodiment. [Figure 8] Flowchart of the model generation process in the third embodiment. [Figure 9] Diagram showing an example of an operation screen. [Figure 10] Flowchart of the model generation / confirmation process. [Figure 11] Hardware configuration diagram of the information processing device of the first to third embodiments.
[Best Mode for Carrying Out the Invention] 【0009】 Hereinafter, preferred embodiments of the information processing device according to the present invention will be described in detail with reference to the accompanying drawings. 【0010】 (First embodiment) The information processing device according to the first embodiment refines the task vector so as to retain the parameters of high importance for the target task. Using the refined task vector makes it possible to remove information that acts as noise in inference for the target task and to generate a model with higher inference performance. 【0011】 Figure 1 is a block diagram showing an example of the configuration of the information processing device 100 according to the first embodiment. As shown in Figure 1, the information processing device 100 includes a storage unit 121, a display unit 122, an acquisition unit 101, a division unit 102, a refining unit 103, an adjustment unit 104, a task vector generation unit 105, a model generation unit 106, and an output control unit 111. 【0012】 The storage unit 121 stores various types of information used in the information processing device 100. For example, the storage unit 121 stores training data for one or more target tasks. A target task is a task (downstream task) that needs to be solved for a specific domain (also called a field) or application scenario. In the field of natural language processing, downstream tasks include information retrieval, document summarization, and document generation. In the field of image analysis, downstream tasks include object recognition (image classification), object detection (locating objects), and semantic segmentation. 【0013】 If the target task concerns maintenance work, the training data for the target task is, for example, text data consisting of pairs of questions and answers related to maintenance work. 【0014】 The storage unit 121 may store training data for the domain to which the target task belongs.
For example, domains include sectors of industrial activity such as healthcare, finance, and manufacturing. If the domain is the infrastructure sector, the training data for the domain is, for example, business documents from that sector, such as natural-language documents like reports. 【0015】 The storage unit 121 may store both training data for the target task and training data for the domain. The training data may be created manually or using AI technology. The training data may be in any format. Examples of training data formats are given below.
• Text data (including natural language data)
• Image data (including video and still image data)
• Audio data
• Music data
• Time series data
• Sensor data
• Control signal data
• Symbol data
• Log data
• Transaction data
【0016】 The training data need not consist of a single one of the above formats; it may include data in multiple formats, and may also be a combination of data in multiple formats. 【0017】 The storage unit 121 can be composed of any commonly used storage medium, such as flash memory, a memory card, RAM (Random Access Memory), an HDD (Hard Disk Drive), or an optical disc. 【0018】 The display unit 122 is a display device, such as a liquid crystal display, for displaying various information. For example, the display unit 122 displays various information under the control of the output control unit 111. 【0019】 The acquisition unit 101 acquires various types of information used by the information processing device 100. The acquisition unit 101 may acquire information by any method, for example by receiving information from an external device via a network or by reading information from a storage medium. 【0020】 For example, the acquisition unit 101 acquires one or more task vectors. Note that a specialized model can also be generated by combining multiple task vectors corresponding to multiple tasks with a base model.
For example, by combining the task vector for a document generation task and the task vector for an image detection task, a specialized model tailored to these two tasks can be generated. In such cases, two or more task vectors may be acquired. 【0021】 The task vector can be obtained in any way; for example, the following methods can be used.
• Retrieve a pre-generated task vector (such as a publicly available task vector).
• Calculate the task vector as the difference between the parameters of a pre-trained base model and the parameters of a model that has been fine-tuned for a particular task. The base model and the fine-tuned model may be publicly available models or models obtained by any other method.
【0022】 Each model (such as a base model) is, for example, a neural network model. The following mainly describes an example in which the model is a neural network model containing multiple layers. The model parameters can be represented in any structure. The following mainly describes an example that uses parameters represented in matrix form (parameter matrices) for each of the multiple layers. The task vector has the same structure as the model because it is the difference in parameters between two models. 【0023】 Figure 2 shows an example of the relationship between the base model, task vector, and parameter matrix. The base model 201 is a neural network model with N layers (where N is an integer greater than or equal to 2). Ln (where 1 ≤ n ≤ N) represents the parameter matrix of the nth layer of the base model 201. Task vector 202 is the task vector corresponding to the base model 201. ΔW_n represents the parameter matrix of the nth layer of task vector 202. Parameter matrix 203 is an example of the parameter matrix of a single layer. 【0024】 Returning to the explanation of Figure 1, the division unit 102 divides each of the one or more task vectors into multiple blocks.
Any method of division may be used; for example, the following methods can be applied. (D1) Divide into blocks, one for each layer. (D2) Divide into blocks based on the characteristics of the layers. For example, divide into two blocks: the task vector for lower layers and the task vector for higher layers. (D3) Divide the parameter matrix into blocks by rows, by columns, or by both. 【0025】 Figure 3 is a diagram showing an example of a division method. Division example 301 corresponds to (D1) above. Division example 302 corresponds to (D3) above. In the present embodiment, the coefficient λ is adjusted for each divided block. In division example 301, the coefficients λ_1 to λ_N for the respective layers are adjusted. In division example 302, the coefficients λ_11 to λ_lm for the respective row and column blocks of the parameter matrix are adjusted. Then, for example, a specialized model is generated by adding to the base model the task vector multiplied by the coefficients. Division examples 301 and 302 can also be interpreted as representing the generated specialized models. 【0026】 The more finely the task vector is divided, the more coefficients λ there are to adjust, which makes it possible to generate a specialized model better suited to the target task. Note that the task vector need not be divided; a single coefficient for the entire task vector may be adjusted instead. In this case, the division unit 102 may be omitted. 【0027】 Returning to the description of Figure 1, the refining unit 103 executes, for each of the one or more task vectors, a refining process that corrects to a specified value the value of any parameter whose importance (usefulness, score) is lower than that of other parameters among the plurality of parameters included in the task vector. The specified value is, for example, zero. When the task vector is divided into blocks, the refining unit 103 may execute the refining process for each of the one or more task vectors and for each of the plurality of blocks.
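The layer-wise division (D1) and the per-block coefficient λ described above can be illustrated with a minimal sketch. It assumes that the task vector is the fine-tuned parameters minus the base parameters and that the specialized parameters are obtained by addition; the layer names, shapes, values, and the helper names `task_vector` and `specialize` are hypothetical, and plain Python lists are used only to keep the sketch dependency-free.

```python
def task_vector(base, finetuned):
    """Layer-wise difference dW_n = (fine-tuned W_n) - (base W_n)."""
    return {
        name: [[f - b for f, b in zip(f_row, b_row)]
               for f_row, b_row in zip(finetuned[name], base[name])]
        for name in base
    }


def specialize(base, delta, coeffs):
    """Specialized parameters W_n + lambda_n * dW_n, one coefficient per layer block (D1)."""
    return {
        name: [[b + coeffs[name] * d for b, d in zip(b_row, d_row)]
               for b_row, d_row in zip(base[name], delta[name])]
        for name in base
    }


# Toy two-layer parameter matrices (hypothetical values).
base = {"layer1": [[0.0, 0.0], [0.0, 0.0]], "layer2": [[1.0, 1.0, 1.0]]}
finetuned = {"layer1": [[0.5, 0.5], [0.5, 0.5]], "layer2": [[3.0, 3.0, 3.0]]}

delta = task_vector(base, finetuned)
coeffs = {"layer1": 1.0, "layer2": 0.5}  # one lambda per layer block
special = specialize(base, delta, coeffs)
print(special)
```

Under division (D3), the only change in this sketch would be that `coeffs` holds one value per row or column block of each matrix rather than one value per layer.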
【0028】 The importance is calculated by, for example, any one of the following calculation methods (CM1) to (CM4).
(CM1) The refining unit 103 calculates, as the importance, the absolute value of the component of the base model corresponding to the component of the task vector, or the absolute value of the component of the task vector. For example, when a component of the parameters of the base model is θ_i and the corresponding component of the parameters of the task vector is φ_i, the importance s(φ_i) is calculated by s(φ_i) = |θ_i| or s(φ_i) = |φ_i|.
(CM2) The refining unit 103 calculates, as the importance, the absolute value of the sum or difference of the component of the task vector and the corresponding component of the base model. For example, the importance s(φ_i) is calculated by s(φ_i) = |θ_i + φ_i| or s(φ_i) = |θ_i - φ_i|.
(CM3) The refining unit 103 calculates, as the importance, the change in loss due to inference using a specialized model. For example, when D is appropriate training data, L is the loss function, θ is the parameters (parameter matrix) of the base model, θ_i is each component of those parameters, and φ_i is each component of the parameters of the task vector, the importance s(φ_i) is calculated by s(φ_i) = |(∂L(D; θ) / ∂θ_i) × φ_i|.
(CM4) The refining unit 103 calculates, as the importance, the result of comparing the sign of the component of the base model corresponding to the component of the task vector with the sign of the component of the task vector. For example, the importance s(φ_i) is calculated by s(φ_i) = (θ_i / |θ_i|) × (φ_i / |φ_i|).
【0029】 Parameters with lower importance than other parameters are determined, for example, as follows.
• In the case of (CM1) to (CM3): a certain percentage (γ%) of the parameters taken in ascending order of importance, a certain number of parameters taken in ascending order of importance, or parameters whose importance is less than a threshold.
• In the case of (CM4): parameters whose importance s(φ_i) is -1, that is, parameters whose sign differs from that of the corresponding component of the base model.
【0030】 The adjustment unit 104 uses the training data to learn (adjust) the coefficient for each block of the task vector so that the task vector is optimized for the target task. For example, for each of the one or more task vectors, the adjustment unit 104 uses the training data to adjust the coefficients for correcting the task vector into an optimal task vector optimized for the target task. 【0031】 More specifically, the adjustment unit 104 uses the training data to train a model obtained by adding the parameters of the refined task vector to the parameters of the base model. During training, the adjustment unit 104 adjusts the coefficient for each block of the task vector while keeping the parameters of the base model and the task vector fixed. 【0032】 Any learning method may be used; for example, a method that repeatedly adjusts the coefficient values so as to minimize the error (loss) between the model output for the training data and the correct data can be applied. Since the adjusted coefficients are used to generate the specialized model, adjusting the coefficients can be interpreted as equivalent to training the specialized model. 【0033】 The task vector generation unit 105 generates an optimal task vector by combining the refined task vector with the coefficients obtained as a result of the adjustment. For example, the task vector generation unit 105 generates one or more optimal task vectors by multiplying each task vector by its coefficients. It is also possible to use only the refinement of the task vector by setting the coefficients to 1. 【0034】 The model generation unit 106 generates a specialized model optimized for the target task using a base model and one or more optimal task vectors. For example, the model generation unit 106 generates a specialized model by combining the optimal task vectors and the base model using operations such as addition and subtraction.
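The importance calculations and the selection rules of the refining process can be sketched as follows. This is an illustrative sketch, not the embodiment's implementation: it covers the task-vector variant of (CM1), plus (CM3) and (CM4), treats each parameter set as a flat list, and takes γ as a fraction between 0 and 1. The function names and the toy values are hypothetical, and the gradient for (CM3) is assumed to be supplied by the caller.

```python
def sign(x):
    """Return +1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def importance_cm1(phi):
    """(CM1), task-vector variant: s(phi_i) = |phi_i|."""
    return [abs(p) for p in phi]


def importance_cm3(grad, phi):
    """(CM3): s(phi_i) = |dL/dtheta_i * phi_i|; grad holds dL/dtheta_i."""
    return [abs(g * p) for g, p in zip(grad, phi)]


def importance_cm4(theta, phi):
    """(CM4): product of signs, +1 if they agree and -1 if they differ.

    Zero components give 0 here; the source formula leaves that case undefined.
    """
    return [sign(t) * sign(p) for t, p in zip(theta, phi)]


def refine_lowest_fraction(phi, scores, gamma):
    """Set the lowest `gamma` fraction of components (by score) to zero."""
    k = int(gamma * len(phi))
    lowest = set(sorted(range(len(phi)), key=lambda i: scores[i])[:k])
    return [0.0 if i in lowest else p for i, p in enumerate(phi)]


def refine_sign_conflicts(phi, theta):
    """Set components whose (CM4) importance is -1 to zero."""
    scores = importance_cm4(theta, phi)
    return [0.0 if s == -1 else p for s, p in zip(scores, phi)]


# Toy flattened parameters (hypothetical values).
theta = [1.0, 1.0, -1.0, -1.0]   # base model components
phi = [0.1, -2.0, 0.5, -0.05]    # task vector components

refined_by_magnitude = refine_lowest_fraction(phi, importance_cm1(phi), 0.5)
refined_by_sign = refine_sign_conflicts(phi, theta)
print(refined_by_magnitude, refined_by_sign)
```

A threshold-based rule would replace the sorted prefix in `refine_lowest_fraction` with a simple comparison of each score against the threshold.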
The specialized model can also be interpreted as being generated by weighted addition using the base model, task vectors, and adjusted coefficients. The base model may be the same model used in learning (for example, the model used to calculate the task vectors) or a different model. 【0035】 As described above, specialized models can also be generated by combining (multi-combining) multiple task vectors with a base model. In such cases, multiple task vectors are combined according to the corresponding blocks among multiple blocks. 【0036】 The patterns for combining optimal task vectors include, for example, the following patterns: (P1) A pattern that combines one task vector with one base model. (P2) A pattern that combines multiple task vectors for a single base model. (P3) A pattern in which one task vector is combined for each of multiple base models. (P4) A pattern that combines multiple task vectors for multiple base models. 【0037】 The output control unit 111 controls the output of various types of information used by the information processing device 100. For example, the output control unit 111 outputs the generated specialized model (parameters of the specialized model) to an external device that uses the specialized model. The output control unit 111 also displays a screen for checking the generated specialized model, for example, on the display unit 122. The method of outputting information can be any method, but for example, methods such as transmitting information to an external device via a network and displaying it on the display unit 122 can be applied. 【0038】 At least a portion of each of the above parts (acquisition unit 101, division unit 102, refining unit 103, adjustment unit 104, task vector generation unit 105, model generation unit 106, and output control unit 111) may be implemented by one or more processing units. Each of the above parts may be implemented by, for example, one or more processors. 
For example, each of the above parts may be implemented by having a processor such as a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit) execute a program, i.e., by software. Each of the above parts may be implemented by a processor such as a dedicated IC (Integrated Circuit), i.e., by hardware. Each of the above parts may be implemented by using both software and hardware. When multiple processors are used, each processor may implement one of the above parts, or two or more of the above parts. 【0039】 The information processing device 100 may be composed of one physical device or multiple physical devices. For example, the information processing device 100 may be built on a cloud environment. Furthermore, each part of the information processing device 100 may be distributed and provided on multiple devices. 【0040】 Next, the model generation process by the information processing device 100 of the first embodiment will be described. Figure 4 is a flowchart showing an example of the model generation process in the first embodiment. 【0041】 The acquisition unit 101 acquires the task vector (parameters of the task vector) (step S101). For example, the acquisition unit 101 acquires the parameter matrix of the task vector by calculating the difference between the parameter matrix of the pre-trained base model and the parameter matrix of the model in which the base model has been fine-tuned. 【0042】 The division unit 102 divides the parameter matrix of the task vector into multiple blocks (step S102). 【0043】 The refining unit 103 refines the task vector (step S103). For example, the refining unit 103 determines the parameters with high importance based on the magnitude of the absolute values of each component (parameter) of the parameter matrix of the task vector, and corrects the components of the parameters with low importance to 0. 【0044】 The acquisition unit 101 acquires training data (step S104). 
For example, the acquisition unit 101 may acquire different training data for each iterative process of adjusting the coefficients (learning) (steps S104 to S106). 【0045】 The adjustment unit 104 uses the training data to adjust the coefficients for each block of the task vector to optimize it for the target task (step S105). For example, the adjustment unit 104 uses the training data to train a model (training model) which is obtained by adding the parameters of the refined task vector and the parameters of the base model. During training, for example, the coefficients for each block are adjusted to minimize the loss (error relative to the ground truth data, etc.) while keeping the parameters of the task vector and the parameters of the base model fixed. 【0046】 The adjustment unit 104 determines whether or not to terminate the adjustment of the coefficient (step S106). For example, the adjustment unit 104 determines to terminate the adjustment when the number of iterations reaches the upper limit or when the loss falls below a threshold. If the adjustment is not terminated (step S106: No), the process returns to step S104 and is repeated. 【0047】 If the adjustment is to be completed (step S106: Yes), the task vector generation unit 105 generates an optimal task vector by combining (for example, multiplying) the refined task vector and the coefficients for each block (step S107). 【0048】 The model generation unit 106 generates a specialized model by combining the generated optimal task vector with the base model (step S108). For example, the model generation unit 106 generates a specialized model by adding or subtracting the parameter matrix of the optimal task vector with the parameter matrix of the base model. 【0049】 Thus, the information processing device of the first embodiment refines the task vector to retain parameters of high importance that are useful for the target task, and generates a specialized model using the refined task vector. 
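Steps S104 to S108 can be illustrated with a toy sketch. It assumes a single linear model whose weight vector is split into blocks, a mean squared error loss, and plain gradient descent; none of these choices is prescribed by the embodiment. Only the per-block coefficients are updated, while the base parameters and the task vector stay fixed. All names and values are hypothetical.

```python
def predict(x, base, delta, lam, block_of):
    """Linear model whose weights are base_j + lam[block(j)] * delta_j."""
    return sum((b + lam[block_of[j]] * d) * xj
               for j, (b, d, xj) in enumerate(zip(base, delta, x)))


def adjust_coefficients(data, base, delta, block_of, n_blocks,
                        lr=0.3, max_iters=500, tol=1e-12):
    """Gradient descent on the block coefficients only (base and delta are frozen)."""
    lam = [1.0] * n_blocks                      # start from plain addition
    for _ in range(max_iters):                  # stop at the iteration limit (S106)
        grad = [0.0] * n_blocks
        loss = 0.0
        for x, y in data:
            err = predict(x, base, delta, lam, block_of) - y
            loss += err * err / len(data)
            for j, (d, xj) in enumerate(zip(delta, x)):
                grad[block_of[j]] += 2.0 * err * d * xj / len(data)
        if loss < tol:                          # or when the loss is small enough (S106)
            break
        lam = [l - lr * g for l, g in zip(lam, grad)]
    return lam


# Toy setup: two weights, each its own block (hypothetical values).
base = [1.0, 1.0]
delta = [1.0, 1.0]                              # refined task vector
block_of = [0, 1]
# Ground truth generated with target weights [3.0, 1.5].
data = [((1.0, 0.0), 3.0), ((0.0, 1.0), 1.5), ((1.0, 1.0), 4.5)]

lam = adjust_coefficients(data, base, delta, block_of, n_blocks=2)
optimal = [lam[block_of[j]] * d for j, d in enumerate(delta)]   # S107
special = [b + o for b, o in zip(base, optimal)]                # S108
print(lam, special)
```

In this toy case the number of trainable values equals the number of blocks (two), which is the source of the cost saving relative to tuning every model parameter.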
As a result, it is possible to generate a model that is more specialized for the target task and can achieve higher inference performance. 【0050】 Furthermore, because it allows for adjustment of coefficients for each block of the task vector rather than tuning all the model parameters, computational costs can be reduced. Also, because the model is generated using task vectors, it can mitigate catastrophic forgetting compared to techniques such as continuous learning. 【0051】 (Second embodiment) The information processing device of the second embodiment adjusts the coefficients of the task vector using elements that represent the characteristics of the training data (feature elements) from among the multiple elements included in the training data. 【0052】 Figure 5 is a block diagram showing an example of the configuration of the information processing device 100-2 according to the second embodiment. As shown in Figure 5, the information processing device 100-2 includes a storage unit 121, a display unit 122, an acquisition unit 101, a division unit 102, a refining unit 103, an adjustment unit 104-2, a task vector generation unit 105, a model generation unit 106, an extraction unit 107-2, and an output control unit 111. 【0053】 In the second embodiment, the extraction unit 107-2 is added, and the function of the adjustment unit 104-2 differs from that of the first embodiment. The other configurations and functions are the same as those in Figure 1, which is a block diagram of the information processing device 100 of the first embodiment, so they are denoted by the same reference numerals and their description is omitted here. 【0054】 The extraction unit 107-2 extracts feature elements from among multiple elements contained in the training data that represent the characteristics of the training data. These feature elements can be interpreted as elements that represent the characteristics of the domain to which the training data belongs. 
【0055】 An element corresponds to the smallest meaningful unit of the training data. For example, if the training data is text data, an element is a token containing one or more morphemes. If the training data is image data, an element is a pixel or a unit containing multiple pixels (such as a patch). If the training data is audio data, an element is a phoneme. 【0056】 Any method can be used to extract the feature elements; for example, the following extraction methods (EM1) to (EM3) can be used.
(EM1) A method using a reference model. For example, the extraction unit 107-2 inputs the training data into the reference model and obtains, for each element of the training data, the inference loss of the reference model as the baseline loss (first loss). Next, for each iteration of adjusting the coefficients (training the model) using the training data, the extraction unit 107-2 calculates the difference between the current loss (second loss), which is the inference loss of the model under training, and the baseline loss. The extraction unit 107-2 extracts, as feature elements, elements whose difference is greater than that of other elements. Elements whose difference is greater than that of other elements are, for example, the top k% of elements in descending order of difference, a certain number of elements in descending order of difference, or elements whose difference is greater than a threshold.
(EM2) A method that uses a system (such as a publicly available tool) with a function for extracting feature elements. For example, an automated term extraction tool (such as TermExtract) that extracts technical terms from document data can be used. The extracted technical terms correspond to feature elements.
(EM3) A method of evaluating the importance of each element and extracting elements on that basis. For example, a metric such as TF-IDF (Term Frequency-Inverse Document Frequency), which is used in natural language processing to measure the importance of words, can be used as the importance. From among the multiple elements of training data that is natural language data containing multiple words as elements, the extraction unit 107-2 extracts, as feature elements, elements with a higher importance than other elements. Elements with a higher importance than other elements are, for example, the top k% of elements in descending order of importance, a certain number of elements in descending order of importance, or elements whose importance is greater than a threshold.
【0057】 The adjustment unit 104-2 adjusts the coefficients using the feature elements. For example, the adjustment unit 104-2 adjusts the coefficients by prioritizing the feature elements. For example, while keeping the parameters of the base model and the task vector fixed, the adjustment unit 104-2 adjusts the coefficient for each block of the task vector by prioritizing the loss incurred in estimating the feature elements (estimation loss). 【0058】 The following methods can be used to prioritize the estimation loss of feature elements.
• Use only the estimation loss of feature elements.
• Scale the estimation loss of feature elements before use. For example, use an estimation loss that has been enlarged or reduced by multiplying it by a scale value.
• Use the estimation loss LA for feature elements and the estimation loss LB for non-feature elements in a weighted manner. For example, a high weight is set for the estimation loss LA of feature elements, and a low weight is set for the estimation loss LB of non-feature elements.
【0059】 Next, the model generation process by the information processing device 100-2 of the second embodiment will be explained using Figure 6. Figure 6 is a flowchart showing an example of the model generation process in the second embodiment.
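Extraction method (EM1) and the weighted use of the losses LA and LB can be sketched as follows. The sketch assumes that per-element losses are already available as lists, selects the top fraction of elements by loss difference, and combines the losses as a weighted mean; the function names, the weights, and the toy loss values are hypothetical.

```python
def select_feature_elements(current_loss, baseline_loss, top_fraction):
    """(EM1): indices of the elements with the largest (current - baseline) loss gap."""
    gaps = [c - b for c, b in zip(current_loss, baseline_loss)]
    k = int(top_fraction * len(gaps))
    order = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return set(order[:k])


def weighted_loss(current_loss, feature_idx, w_feature=1.0, w_other=0.1):
    """Weighted mean of per-element losses.

    Feature elements (loss LA) get w_feature, the others (loss LB) get w_other.
    Setting w_other to 0 corresponds to using only the feature-element loss.
    """
    total = 0.0
    weight_sum = 0.0
    for i, loss in enumerate(current_loss):
        w = w_feature if i in feature_idx else w_other
        total += w * loss
        weight_sum += w
    return total / weight_sum


# Toy per-token losses for a four-token sentence (hypothetical values).
tokens = ["the", "valve", "is", "corroded"]
baseline = [0.2, 1.0, 0.3, 1.5]   # loss of the reference model
current = [0.3, 3.0, 0.3, 4.0]    # loss of the model under training

feature_idx = select_feature_elements(current, baseline, top_fraction=0.5)
print([tokens[i] for i in sorted(feature_idx)])
print(weighted_loss(current, feature_idx, w_feature=1.0, w_other=0.0))
```

The value returned by `weighted_loss` would then take the place of the plain training loss when the coefficient of each block is adjusted.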
【0060】 Steps S201 to S204 are the same as steps S101 to S104 in the information processing device 100 of the first embodiment, so their explanation will be omitted. 【0061】 In this embodiment, during the iterative process of adjusting the coefficients, the extraction unit 107-2 extracts feature elements from the training data (step S205). 【0062】 Let's explain an example where the specialized model to be generated is a language model. The training data will be text data, the reference model will be a large-scale language model (LLM) such as Llama (Large Language Model Meta AI), and the loss function will be cross-entropy loss. 【0063】 The extraction unit 107-2 inputs the training data (text data) into the LLM and obtains the inference loss by the LLM as the baseline loss for each token in the input text data. Next, the extraction unit 107-2 calculates the difference between the current loss, which is the loss by the model under training, and the baseline loss for each token. The extraction unit 107-2 sorts the differences and, for example, extracts the top k% of elements (tokens) in descending order of difference as domain tokens of interest in the domain of the training data, i.e., feature elements. 【0064】 The adjustment unit 104-2 adjusts the coefficients of the task vector using the training data and extracted feature elements (step S206). For example, the adjustment unit 104-2 adjusts the coefficients of each block of the task vector by utilizing the estimation loss of the extracted feature elements. 【0065】 Steps S207 to S209 are the same as steps S106 to S108 in the information processing device 100 of the first embodiment, so their explanation will be omitted. 【0066】 In this embodiment, the task vector refinement process (step S203 in Figure 6) does not need to be performed. 
Even with this configuration, the function to adjust the coefficients using the feature elements of the training data makes it possible to generate a model that can achieve higher inference performance. In other words, in this embodiment, the coefficients of the task vector can be adjusted using the feature elements of the training data. To put it another way, in this embodiment, domain information can be efficiently incorporated by adjusting the coefficients of the task vector by focusing on the feature elements. Therefore, for example, it is possible to generate a model that can achieve higher inference performance for the domain of the training data. 【0067】 (Third embodiment) The information processing device of the third embodiment adjusts the coefficients of the task vector using inference output information from an inference model separate from the base model and the specialized model to be generated. 【0068】 Figure 7 is a block diagram showing an example of the configuration of the information processing device 100-3 according to the third embodiment. As shown in Figure 7, the information processing device 100-3 includes a storage unit 121, a display unit 122, an acquisition unit 101, a division unit 102, a refining unit 103, an adjustment unit 104-3, a task vector generation unit 105, a model generation unit 106, an inference unit 108-3, and an output control unit 111. 【0069】 In the third embodiment, the inference unit 108-3 is added, and the function of the adjustment unit 104-3 differs from that of the first embodiment. The other configurations and functions are the same as those in Figure 1, which is a block diagram of the information processing device 100 of the first embodiment, so they are denoted by the same reference numerals and their description is omitted here. 
【0070】 The inference unit 108-3 obtains inference output information for the training data using an inference model that takes training data as input and outputs inference output information. Inference output information is information output during inference by the inference model. For example, inference output information includes the output of the final layer of the inference model (e.g., logits), the value obtained by applying a predetermined function (e.g., the softmax function) to the output of the final layer, and the output of an intermediate layer of the inference model. 【0071】 The inference model can be any model; for example, a representative foundation model for the domain related to the target task, or a model specific to that domain, can be used. If the domain is language processing, the inference model could be a closed LLM such as GPT-4 (Generative Pre-trained Transformer 4) or an open-source LLM such as Llama. If the domain is image analysis, the inference model could be an image analysis model such as a residual neural network (ResNet) or a Vision Transformer. 【0072】 The adjustment unit 104-3 adjusts the coefficients using the training data and the inference output information. For example, the adjustment unit 104-3 adjusts the coefficients by knowledge distillation (knowledge distillation learning), using a loss that brings the output of the specialized model closer to the inference output information, which is the output of the inference model. 【0073】 More specifically, the adjustment unit 104-3 adjusts the coefficient for each block of the task vector by using the loss between the output distributions of the inference model and the model under training, so that the output distribution of the model under training (student model) matches the desired output distribution, which is the inference output information of the inference model (teacher model).
The loss between the output distributions can be, for example, the Kullback-Leibler divergence. 【0074】 Next, the model generation process performed by the information processing device 100-3 of the third embodiment will be described with reference to Figure 8. Figure 8 is a flowchart showing an example of the model generation process in the third embodiment. 【0075】 Steps S301 to S304 are the same as steps S101 to S104 in the information processing device 100 of the first embodiment, so their description is omitted. 【0076】 In this embodiment, during the iterative process of adjusting the coefficients, the inference unit 108-3 inputs the training data into the inference model and obtains the inference output information output by the inference model (step S305). For example, the inference unit 108-3 inputs text data, which is the training data, into an LLM (the inference model) and obtains the logits output by the LLM as the inference output information. 【0077】 The adjustment unit 104-3 adjusts the coefficients of the task vector using the training data and the inference output information (step S306). For example, the adjustment unit 104-3 performs knowledge distillation using the inference output information and adjusts the coefficient of each block of the refined task vector. 【0078】 Steps S307 to S309 are the same as steps S106 to S108 in the information processing device 100 of the first embodiment, so their description is omitted. 【0079】 In this embodiment, the task vector refinement process (step S303 in Figure 8) does not necessarily have to be performed. Even in that configuration, performing knowledge distillation using the inference output information of the inference model makes it possible to generate a model that can achieve higher inference performance.
In other words, in this embodiment, the performance of the generated specialized model can be improved by adjusting the coefficients of the task vector while incorporating the knowledge of the inference model through knowledge distillation. 【0080】 (Example of a user interface) Next, examples of user interfaces applicable to each of the above embodiments (the first to third embodiments) will be described. 【0081】 Figure 9 shows an example of an operation screen 900 that can be used for generating specialized models and verifying the generated specialized models. For example, the output control unit 111 displays the operation screen 900 shown in Figure 9 on the display unit 122. 【0082】 The operation screen 900 includes a model selection field 901, a vector selection field 902, a coefficient change field 903, a coefficient display field 904, an input field 905, and an output field 906. 【0083】 The model selection field 901 is a field for selecting the base model. The vector selection field 902 is a field for selecting the task vector to be combined with the base model. In the model selection field 901 and the vector selection field 902, the desired option may be selected from multiple options using checkboxes, or using another format such as a pull-down menu. 【0084】 The coefficient change field 903 is a field that allows further modification of the coefficients adjusted by the method of each embodiment. The operation screen 900 shows an example of the coefficient change field 903 for changing eight coefficients λ1 to λ8 corresponding to eight layers of the base model. In this example the coefficients are changed with slider bars, but the coefficient change field 903 may change the coefficients in any other form. 【0085】 Here, min and max are, for example, the predetermined minimum and maximum values of a coefficient.
The output control unit 111 sets the coefficient values obtained by the method of each embodiment as the initial values of the coefficients displayed on the slider bars of the coefficient change field 903. 【0086】 The coefficient display field 904 is a field for displaying the status of the coefficients. For example, by looking at the coefficients displayed in the coefficient display field 904, the user can check whether the modification of the task vector has been optimized. 【0087】 When a coefficient is changed in the coefficient change field 903, the specialized model is modified according to the changed coefficient. For example, the task vector generation unit 105 generates an optimal task vector by multiplying the refined task vector by the changed coefficient. The model generation unit 106 generates a specialized model using the base model and the optimal task vector. 【0088】 The input field 905 is a field for the user to specify the input to the modified specialized model. When an input to the specialized model is specified in the input field 905, for example, the output control unit 111 inputs the specified input to the specialized model and obtains the output of the specialized model. 【0089】 The output field 906 is a field for displaying the output of the specialized model. 【0090】 Next, an example of a model generation and verification process using the operation screen 900 shown in Figure 9 will be described. Figure 10 is a flowchart showing an example of the model generation and verification process. 【0091】 The output control unit 111 displays the operation screen 900 shown in Figure 9 on, for example, the display unit 122. The user selects a base model on the operation screen 900. The output control unit 111 accepts the selection of the base model (step S401). The user also selects a task vector on the operation screen 900. The output control unit 111 accepts the selection of the task vector (step S402).
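The regeneration described in paragraph 【0087】 can be sketched as follows: whenever a slider value changes, the refined task vector is rescaled layer by layer and added back to the base model. This is a minimal illustration under assumptions, not the device's actual code. The function names, the list-of-lists parameter layout, the clamping of slider values to the min and max range, and the default range of 0.0 to 1.0 are all hypothetical.

```python
def clamp(value, lower, upper):
    """Keep a slider value inside the [lower, upper] range (min and max)."""
    return max(lower, min(upper, value))


def build_specialized_model(base_layers, refined_task_layers, coeffs,
                            coeff_min=0.0, coeff_max=1.0):
    """Return base + coefficient * refined task vector, layer by layer.

    coeff_min and coeff_max are hypothetical defaults for the slider range.
    """
    if not (len(base_layers) == len(refined_task_layers) == len(coeffs)):
        raise ValueError("one coefficient is required per layer")
    model = []
    for base, tau, lam in zip(base_layers, refined_task_layers, coeffs):
        lam = clamp(lam, coeff_min, coeff_max)
        model.append([b + lam * t for b, t in zip(base, tau)])
    return model


# Eight layers, matching the eight sliders (lambda 1 to lambda 8) on the screen.
base_layers = [[1.0, 2.0] for _ in range(8)]
refined_task_layers = [[0.5, -0.5] for _ in range(8)]

# Initial slider positions: coefficients from the adjustment step (made up here).
coeffs = [0.5] * 8
model = build_specialized_model(base_layers, refined_task_layers, coeffs)

# The user drags the third slider past the maximum; the value is clamped.
coeffs[2] = 3.0
model = build_specialized_model(base_layers, refined_task_layers, coeffs)
```

Because the base model and the refined task vector are left untouched, each slider change costs only one multiply-add pass over the parameters, with no retraining.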
【0092】 After this, the model generation process according to the method of each embodiment is executed (step S403). The output control unit 111 may display the coefficient values obtained in the model generation process in the coefficient change field 903 (initial value) and the coefficient display field 904. 【0093】 If the coefficient is changed in the coefficient change field 903 on the operation screen 900, the output control unit 111 accepts the coefficient change (step S404). When the coefficient is changed by the coefficient change field 903, the specialized model is changed according to the changed coefficient. 【0094】 If an input for a specialized model is specified in the input field 905, the output control unit 111 accepts the input for the specialized model (step S405). The output control unit 111 inputs the specified input to the specialized model and obtains the inference result, which is the output of the specialized model (step S406). The output control unit 111 displays the inference result in the output field 906 (step S407) and terminates the model generation and verification process. 【0095】 As described above, according to the first to third embodiments, it is possible to generate a model that can achieve higher inference performance for a desired task. 【0096】 Next, the hardware configuration of the information processing device according to the first to third embodiments will be described using Figure 11. Figure 11 is an explanatory diagram showing examples of the hardware configuration of the information processing device according to the first to third embodiments. 【0097】 The information processing devices of the first to third embodiments include a control device such as a CPU (Central Processing Unit) 51, a storage device such as a ROM (Read Only Memory) 52 and a RAM (Random Access Memory) 53, a communication interface 54 for communication via a network, and a bus 61 for connecting the various parts. 
【0098】 The programs executed by the information processing devices of the first to third embodiments are provided pre-installed in the ROM 52 or the like. 【0099】 The programs executed by the information processing devices of the first to third embodiments may be provided as computer program products by recording them, in an installable or executable file format, on a computer-readable recording medium such as a CD-ROM (Compact Disk Read Only Memory), a flexible disk (FD), a CD-R (Compact Disk Recordable), or a DVD (Digital Versatile Disk). 【0100】 Furthermore, the programs executed by the information processing devices of the first to third embodiments may be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. Alternatively, the programs executed by the information processing devices of the first to third embodiments may be provided or distributed via a network such as the Internet. 【0101】 The programs executed by the information processing devices of the first to third embodiments can cause a computer to function as the units of the information processing device described above. In this computer, the CPU 51 can read the program from a computer-readable storage medium into main memory and execute it. 【0102】 While several embodiments of the present invention have been described, these embodiments are presented as examples only and are not intended to limit the scope of the invention. These novel embodiments can be carried out in a variety of other forms, and various omissions, substitutions, and modifications can be made without departing from the spirit of the invention. These embodiments and their variations are included in the scope and spirit of the invention, as well as in the invention described in the claims and its equivalents.
[Explanation of Symbols] 【0103】
100, 100-2, 100-3 Information processing device
101 Acquisition unit
102 Division unit
103 Refining unit
104, 104-2, 104-3 Adjustment unit
105 Task vector generation unit
106 Model generation unit
107-2 Extraction unit
108-3 Inference unit
111 Output control unit
121 Storage unit
122 Display unit
Claims
[Claim 1] A program for causing a computer to execute: a refining step of performing, for each of one or more task vectors, a refining process that adjusts, to a specified value, the value of a parameter whose importance is lower than that of other parameters among the multiple parameters included in the task vector; an adjustment step of adjusting, for each of the task vectors and using training data, a coefficient for modifying the task vector into an optimal task vector optimized for a target task; a task vector generation step of generating one or more optimal task vectors by multiplying each of the task vectors by the coefficient; and a model generation step of generating a specialized model optimized for the target task using a base model and the one or more optimal task vectors.
[Claim 2] The program according to claim 1, further causing the computer to execute a division step of dividing each of the one or more task vectors into multiple blocks, wherein the coefficient is determined for each of the multiple blocks.
[Claim 3] The program according to claim 1, further causing the computer to execute an extraction step of extracting feature elements that represent the characteristics of the training data from among a plurality of elements included in the training data, wherein the adjustment step adjusts the coefficient using the training data and the feature elements.
[Claim 4] The program according to claim 3, wherein the adjustment step adjusts the coefficient by preferentially using the feature elements.
[Claim 5] The program according to claim 3, wherein the extraction step extracts, as the feature elements, those elements among the plurality of elements included in the training data for which the difference between a first loss in inference by a reference model using the training data and a second loss in inference by the specialized model is greater than for the other elements.
[Claim 6] The program according to claim 3, wherein the training data is natural language data containing multiple words as elements, and the extraction step extracts, as the feature elements, phrases that have greater importance than other phrases from among a plurality of phrases included in the training data.
[Claim 7] The program according to claim 1, further causing the computer to execute an inference step of obtaining inference output information for the training data using an inference model that takes the training data as input and outputs the inference output information, wherein the adjustment step adjusts the coefficient using the training data and the inference output information.
[Claim 8] The program according to claim 7, wherein the adjustment step adjusts the coefficient by knowledge distillation using a loss for bringing the inference output information, which is the output of the inference model, and the output of the specialized model closer together.
[Claim 9] The program according to claim 1, wherein the specified value is zero.
[Claim 10] The program according to claim 1, wherein the refining step calculates, as the importance, the absolute value of the component of the base model corresponding to a component of the task vector, or the absolute value of the component of the task vector.
[Claim 11] The program according to claim 1, wherein the refining step calculates, as the importance, the absolute value of the sum of, or the difference between, the component of the base model corresponding to a component of the task vector and that component of the task vector.
[Claim 12] The program according to claim 1, wherein the refining step calculates, as the importance, the change in loss based on inference using the specialized model.
[Claim 13] The program according to claim 1, wherein the refining step calculates, as the importance, the result of comparing the sign of the component of the base model corresponding to a component of the task vector with the sign of that component of the task vector.
[Claim 14] The program according to claim 1, wherein the refining step performs the refining process for one task vector, the adjustment step adjusts the coefficient for the one task vector, the task vector generation step generates one optimal task vector by multiplying the one task vector by the coefficient, and the model generation step generates the specialized model using the base model and the one optimal task vector.
[Claim 15] The program according to claim 1, wherein the refining step performs the refining process for each of a plurality of task vectors, the adjustment step adjusts the coefficient for each of the plurality of task vectors, the task vector generation step generates a plurality of optimal task vectors by multiplying each of the plurality of task vectors by the coefficient, and the model generation step generates the specialized model using the base model and the plurality of optimal task vectors.
[Claim 16] A program for causing a computer to execute: an extraction step of extracting feature elements that represent the characteristics of training data from among multiple elements contained in the training data; a refining step of performing, for each of one or more task vectors, a refining process that adjusts, to a specified value, the value of a parameter whose importance is lower than that of other parameters among the multiple parameters included in the task vector; an adjustment step of adjusting, for each of the task vectors and using the training data and the feature elements, a coefficient for modifying the task vector into an optimal task vector optimized for a target task; a task vector generation step of generating one or more optimal task vectors by multiplying each of the task vectors by the coefficient; and a model generation step of generating a specialized model optimized for the target task using a base model and the one or more optimal task vectors.
[Claim 17] A program for causing a computer to execute: an inference step of obtaining inference output information for training data using an inference model that takes the training data as input and outputs the inference output information; a refining step of performing, for each of one or more task vectors, a refining process that adjusts, to a specified value, the value of a parameter whose importance is lower than that of other parameters among the multiple parameters included in the task vector; an adjustment step of adjusting, for each of the task vectors and using the training data and the inference output information, a coefficient for modifying the task vector into an optimal task vector optimized for a target task; a task vector generation step of generating one or more optimal task vectors by multiplying each of the task vectors by the coefficient; and a model generation step of generating a specialized model optimized for the target task using a base model and the one or more optimal task vectors.
[Claim 18] An information processing device comprising: a refining unit that performs, for each of one or more task vectors, a refining process that adjusts, to a specified value, the value of a parameter whose importance is lower than that of other parameters among the multiple parameters included in the task vector; an adjustment unit that adjusts, for each of the task vectors and using training data, a coefficient for modifying the task vector into an optimal task vector optimized for a target task; a task vector generation unit that generates one or more optimal task vectors by multiplying each of the task vectors by the coefficient; and a model generation unit that generates a specialized model optimized for the target task using a base model and the one or more optimal task vectors.
[Claim 19] An information processing method performed by an information processing device, the method comprising: a refining step of performing, for each of one or more task vectors, a refining process that adjusts, to a specified value, the value of a parameter whose importance is lower than that of other parameters among the multiple parameters included in the task vector; an adjustment step of adjusting, for each of the task vectors and using training data, a coefficient for modifying the task vector into an optimal task vector optimized for a target task; a task vector generation step of generating one or more optimal task vectors by multiplying each of the task vectors by the coefficient; and a model generation step of generating a specialized model optimized for the target task using a base model and the one or more optimal task vectors.
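For illustration only, the refining step recited in claims 1 and 9 to 11 can be sketched in Python as below. This is a hedged sketch rather than the claimed implementation: the claims do not say how many parameters count as less important, so the `keep_ratio` threshold is an assumption, as are the function names, the `mode` strings, and the flat-list representation of the task vector.

```python
def importance_scores(task_vector, base_params=None, mode="task"):
    """Importance per component, following the options in claims 10 and 11.

    mode="task": |task vector component|
    mode="base": |corresponding base model component|
    mode="sum":  |base model component + task vector component|
    """
    if mode == "task":
        return [abs(t) for t in task_vector]
    if base_params is None:
        raise ValueError("base_params is required for this mode")
    if mode == "base":
        return [abs(b) for b in base_params]
    if mode == "sum":
        return [abs(b + t) for b, t in zip(base_params, task_vector)]
    raise ValueError(f"unknown mode: {mode}")


def refine_task_vector(task_vector, scores, keep_ratio=0.5, specified_value=0.0):
    """Set the less important components to the specified value (zero in claim 9).

    Assumption: the top keep_ratio fraction by importance is kept unchanged.
    """
    n_keep = max(1, int(len(task_vector) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [t if i in keep else specified_value for i, t in enumerate(task_vector)]


# Hypothetical six-component task vector and base model parameters.
tau = [0.9, -0.05, 0.4, -0.7, 0.01, 0.2]
base = [-0.9, 2.0, 0.0, 0.7, 0.0, 0.0]

refined_by_task = refine_task_vector(tau, importance_scores(tau))
refined_by_sum = refine_task_vector(tau, importance_scores(tau, base, mode="sum"))
```

The two calls keep different components because the importance measure differs: the first ranks by the magnitude of the task vector alone, the second by the magnitude of the resulting specialized-model weight.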