Retraining device, retraining method, and retraining program
The retraining apparatus and method efficiently improve learning model accuracy by grouping training data, identifying contributing data, and selectively retraining specific groups, addressing resource and accuracy challenges in existing retraining methods.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- HITACHI LTD
- Filing Date
- 2024-12-20
- Publication Date
- 2026-07-02
AI Technical Summary
Existing methods for retraining generative AI models, such as large language models (LLMs), face challenges in determining when to perform retraining, require significant GPU resources, and may not achieve sufficient accuracy due to the quality of training datasets, while existing analysis processes increase costs.
A retraining apparatus and method that divides training data into groups based on predetermined conditions, identifies contributing data for inference results, calculates contribution frequency and accuracy change, and selectively retrains specific groups using retraining data to improve model accuracy.
Enables efficient retraining that enhances the accuracy of learning models by focusing on data that frequently contributes to outputs and improves inference accuracy, reducing resource usage and data requirements.
Description
Technical Field
[0001] This disclosure relates to the retraining of learning models.
Background Art
[0002] Many companies are attempting to use large language models (LLMs) in their operations. However, each company has its own unique business, and if a general-purpose LLM is used as is, it may not provide information with sufficient accuracy for the business. Therefore, there is a need for an LLM adapted to each company's respective business.
[0003] In order to adapt an LLM to a business through fine-tuning with supervised learning data, cleansing and annotation of large amounts of data through questions and answers (Q&A) are required, so it is not easy to prepare a necessary and sufficient amount of data.
[0004] As a method for retraining generative artificial intelligence models (generative AI models) such as LLMs, there is a known method of improving the generative AI model while it is in operation by retraining it using logs of the questions put to the model and the answers obtained from it. However, with this method it is difficult to determine the appropriate timing for retraining.
[0005] Also, generally, retraining of generative AI models requires a large amount of GPU (Graphics Processing Unit) resources. However, even if retraining is performed using a large amount of resources, sufficient accuracy may not be obtained depending on the quality of the training dataset.
[0006] Patent Document 1 discloses a method for a configuration in which two models each perform inference. When the inference accuracy of a model decreases, it is evaluated whether the tendency of the data has changed. If the tendency of the data has changed, retraining is performed with priorities assigned to the models.
Prior Art Documents
[0007] [Patent Document 1] International Publication No. WO2022/113175
Summary of the Invention
Problems to Be Solved by the Invention
[0008] However, the method disclosed in Patent Document 1 requires analyzing all data to determine whether or not to perform retraining, and this analysis process increases the cost required for retraining.
[0009] One object of this disclosure is to provide a technology that enables retraining that efficiently improves the accuracy of a learning model.
Means for Solving the Problem
[0010] A retraining apparatus according to one aspect of this disclosure is a retraining apparatus for retraining a learning model, comprising a storage device and a processor. The storage device stores training data divided into two or more groups based on predetermined conditions, and a learning model trained using the training data. When the learning model performs inference in response to an input, the processor identifies which group of data contributed to the output of the inference result, and associates the identified group with the input and output data. For each group, the processor calculates a contribution frequency, which is the frequency with which data corresponding to that group contributed to the output of inference results by the learning model, and an accuracy change, which is the change from the inference accuracy measured on the training data to the inference accuracy measured on test data created from the input and output data associated with that group. Based on the contribution frequency and accuracy change of each group, the processor selects a group to be retrained from among the groups. The processor creates retraining data based on the input and output data that correspond to the group to be retrained according to the conditions, and retrains the learning model based on the retraining data.
[0011] A retraining method according to one aspect of this disclosure is a retraining method for retraining a learning model with an apparatus comprising a storage device and a processor. The storage device stores training data divided into two or more groups based on predetermined conditions, and a learning model trained using the training data. When the learning model performs inference in response to an input, the processor identifies which group of data contributed to the output of the inference result, and associates the identified group with the input and output data. For each group, the processor calculates a contribution frequency, which is the frequency with which data corresponding to that group contributed to the output of inference results by the learning model, and an accuracy change, which is the change from the inference accuracy measured on the training data to the inference accuracy measured on test data created from the input and output data associated with that group. Based on the contribution frequency and accuracy change of each group, the processor selects a group to be retrained from among the groups. The processor creates retraining data based on the input and output data that correspond to the group to be retrained according to the conditions, and retrains the learning model based on the retraining data.
[0012] A retraining program according to one aspect of this disclosure is a retraining program for retraining a learning model on an apparatus comprising a storage device and a processor. The storage device stores training data divided into two or more groups based on predetermined conditions, and a learning model trained using the training data. The program causes the processor to execute the following. When the learning model performs inference in response to an input, identify which group of data contributed to the output of the inference result, and associate the identified group with the input and output data. For each group, calculate a contribution frequency, which is the frequency with which data corresponding to that group contributed to the output of inference results by the learning model, and an accuracy change, which is the change from the inference accuracy measured on the training data to the inference accuracy measured on test data created from the input and output data associated with that group. Select a group to be retrained from among the groups based on the contribution frequency and accuracy change of each group. Create retraining data based on the input and output data that correspond to the group to be retrained according to the conditions. Retrain the learning model based on the retraining data.
Advantages of the Invention
[0013] According to one aspect of this disclosure, retraining that efficiently improves the accuracy of a learning model becomes possible.
Brief Description of the Drawings
[0014] [Figure 1] A conceptual diagram showing a configuration example of an AI system according to an embodiment of the present disclosure.
[Figure 2] A conceptual diagram showing an example of processing by a data preparation device according to an embodiment of the present disclosure.
[Figure 3] A conceptual diagram showing an example of processing by an inference device according to an embodiment of the present disclosure.
[Figure 4] A conceptual diagram showing an example of processing by a retraining device according to an embodiment of the present disclosure.
[Figure 5] A flowchart illustrating processing by a data splitting unit according to an embodiment of the present disclosure.
[Figure 6] A flowchart illustrating processing by a data selection unit according to an embodiment of the present disclosure.
[Figure 7] A flowchart illustrating processing by a data reclassification unit according to an embodiment of the present disclosure.
[Figure 8] A conceptual diagram showing a configuration example of a learning model according to an embodiment of the present disclosure.
[Figure 9] A block diagram showing a hardware configuration example of various devices or systems according to an embodiment of the present disclosure.
Mode for Carrying Out the Invention
[0015] Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0016] Figure 1 is a conceptual diagram showing a configuration example of an AI system according to an embodiment of the present disclosure.
[0017] The AI system 100 includes a data preparation device 120, a learning device 140, an inference device 160, and a retraining device 180.
[0018] The data preparation device 120 includes a data splitting unit 122, an ID assignment unit 124, and a storage device 126. The data splitting unit 122 has the function of splitting the input data 102. The data 102 includes text such as design documents, manuals, procedures, and conversation histories; structured documents such as JSON (JavaScript Object Notation) and HTML (HyperText Markup Language); and data that can be converted into text, such as character information and transcripts extracted from images and audio. The data splitting unit 122 may split the data based on the similarity of documents, the similarity of answers, the version of a document, and so on. The splitting may be mechanical, based on structural information in the data 102 such as punctuation, paragraphs, chapters, pages, tables, and bullet points, or it may take into account the semantic similarity of words contained in a block of text and the semantic connections between preceding and succeeding words that can be estimated from prefixes and the like. The splitting may be performed separately under two or more different rules. Furthermore, splitting here is not limited to physically dividing the data; it may also be done by storing addresses that indicate data boundaries, such as file names, character counts, paragraphs, and byte positions. In this way, data can be logically divided at various boundaries. Punctuation, paragraphs, chapters, pages, tables, bullet points, and semantic proximity serve as the division rules 216. While the boundary conditions for division based on punctuation and the like are clear, boundary conditions such as semantic proximity are not always clear.
Therefore, in this disclosure, the criterion for determining which data group a piece of data belongs to when it falls within, or close to, two or more data groups is called a scoring pattern. Examples of scoring patterns include patterns that use a certain string as a boundary condition, and patterns that calculate the vector distance between the multidimensional vector of the data in question and the multidimensional vector of the center of a data group (or of the data at the boundary of a data group) and assign the data to the nearest data group. The ID assignment unit 124 has the function of assigning IDs to the data that has been divided by the data splitting unit 122. If the data has been divided under two or more different rules as described above, the ID may be managed so that it includes an ID indicating the division pattern and an ID indicating the physically or logically divided data within that pattern. The divided data to which IDs have been assigned is stored in the storage device 126.
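The vector-distance scoring pattern described above can be illustrated with a minimal sketch. It assumes each piece of data already has an embedding vector and each group is represented by a center vector; the function names, the Euclidean metric, and the two-dimensional sample vectors are illustrative assumptions, not details given in the disclosure.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_to_nearest_group(vector, centroids):
    """Return the ID of the group whose center vector is closest to `vector`.

    `centroids` maps a (hypothetical) group ID to that group's center vector.
    """
    return min(centroids, key=lambda gid: euclidean(vector, centroids[gid]))

# Hypothetical 2-D vectors; real embeddings would have many more dimensions.
centroids = {1: [0.0, 0.0], 2: [10.0, 10.0]}
print(assign_to_nearest_group([1.0, 2.0], centroids))
print(assign_to_nearest_group([8.0, 9.5], centroids))
```

A boundary-based variant would compare against the vectors of data at the edge of each group instead of the center, as the paragraph also allows.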
[0019] The learning device 140 comprises a learning processing unit 142, an evaluation processing unit 144, a model deployment unit 146, and a storage device 148. The learning processing unit 142 has the function of performing learning processing using segmented data acquired from the data preparation device 120. Here, the learning process is a process that patterns the features of input data and makes them recognizable, classifiable, transformable, and inferable, such as machine learning, deep learning, or large language models, and is defined by a program or data. The model generated by the learning process has the characteristic of producing results of recognition, classification, transformation, and inference as output for inputs that do not deviate from the range of the learned data, based on the characteristics of the learned data. The evaluation processing unit 144 has the function of evaluating the learning results by the learning processing unit 142. The evaluation of the learning results here is a process to verify whether the model possesses the characteristics of the input data used for learning. This involves using evaluation data similar to the input data and ground truth data related to the results of recognition, classification, transformation, and inference. When the evaluation data is processed by the model, it is verified whether ground truth data or output that can be considered ground truth data is obtained. This includes calculating the percentage of ground truth data or output that can be considered ground truth data obtained, and the characteristics of the input data from which ground truth data or output that can be considered ground truth data was not obtained. The model deployment unit 146 has the function of deploying models that have been evaluated favorably by the evaluation processing unit 144. 
Deployment here refers to the process of placing a model stored as data or a file onto a computer in an executable state, making it accessible and able to process requests. Examples include deploying an application on an OS (Operating System) or deploying a container in a container environment. The storage device 148 has the function of storing the model and other necessary data. Specific examples of the storage device 148 include database servers, file servers, and database services and file store services on the cloud. The storage device 148 stores models obtained through learning by the learning processing unit 142, as well as models acquired from the retraining device 180, which will be described later. The models stored in the storage device 148 are loaded into the learning processing unit 142 when model training is performed. The models stored in the storage device 148 are passed to the model deployment unit 146 when the models are to be used.
[0020] The inference device 160 comprises an AI model 162, an ID assignment unit 164, an inference execution unit 166, and a storage device 168. The AI model 162 is a model deployed to the inference device 160 by the model deployment unit 146. The ID assignment unit 164 has the function of assigning IDs to the input / output data of the AI model 162. Here, the ID assigned to the input / output data of the AI model 162 is a unique ID assigned to processing requests from a user or system, and this ID is stored in association with the ID of the user or system that made the processing request. Furthermore, processing requests may be stored in association with various call processes performed within the AI model 162. Specific examples of this ID include, when the AI model 162 is divided into multiple AI models and multiple calls are made between AI models for a single processing request, an ID that identifies processing requests to other AI models; when there is an application at the front end of the AI model 162 that aggregates processing requests to the AI model 162, or when processing requests are made from such applications to other applications or systems, an ID that identifies the processing request. The inference execution unit 166 has the function of performing inference using input and output data of the AI model 162 based on instructions from the user 104. Here, the inference process is the process of processing input data with the model and performing recognition, classification, transformation, and inference. For example, if Y is the output, X is the input, and A is the model, then the process would be such that Y = AX. The storage device 168 stores log data of the executed inference process. 
Here, the log data includes an ID indicating the user or system performing this process, input data, an ID indicating the input data, output data, an ID indicating the output data, as well as the intermediate state when the AI model 162 processes the input and output data, communication logs with other AI models or applications if the AI model 162 is composed of multiple models or applications, an ID indicating the communication log, and the relationship between the communication ID and the ID indicating the input and output data or the ID indicating the user or system performing this process.
[0021] The retraining device 180 retrains the learning model. The retraining device 180 comprises a storage device and a processor. The storage device stores training data divided into two or more groups based on predetermined conditions, and a learning model trained using the training data. When the learning model performs inference in response to an input, the processor identifies which group of data contributed to the output of the inference result, and associates the identified group with the input and output data. For each group, the retraining device 180 calculates the contribution frequency, which is the frequency with which the data corresponding to that group contributed to the output of inference results by the learning model, and the accuracy change, which is the change from the inference accuracy measured on the training data to the inference accuracy measured on test data created from the input and output data associated with that group. As an example of this process, in this embodiment, the data of a trained base model is divided into multiple groups based on features, and training is performed for each group. In this training, an additional submodule is trained without changing the parameters of the base model itself, and the results of the base model and the submodule are combined at the output stage to reflect the additional training. This is generally a technique called adapter tuning. In this process, the training data and the additional training portions correspond to each other, and by including some information about the underlying data source in the model's output, it is possible to estimate which dataset contributed to the output.
In this way, by repeatedly running inputs through the model and collecting statistics on the relationship between the inputs and outputs and the data that contributed to each output, it is possible to calculate which data contributed most to generating outputs in the use case in which the model is used. These statistics give the contribution frequency. Next, in the testing phase, if a test dataset is generated based on the assumed use case and a test is performed, the results show, through the correspondence between the training data and the additional training portions, whether the correct output was obtained for each additional training portion. In other words, the inference accuracy is derived for each additional training portion. In actual use, however, the inputs and outputs differ from those in the test, so the inference accuracy for each additional training portion changes when logs from actual use are evaluated. Therefore, by comparing the correct answers in the test data for each additional training portion with the tendency of the outputs to which that additional training portion contributes strongly, it is possible to determine whether the outputs were close to the correct answers. This judgment may be made by a human who checks whether each log entry is correct, or by a separate LLM or program for accuracy evaluation. The resulting change in inference accuracy for each additional training portion over a certain unit of time is the accuracy change. The retraining device 180 selects a group to be retrained from among the groups based on the contribution frequency and accuracy change of each group. This selection is performed by the data selection unit 186. The retraining device 180 creates retraining data based on the input and output data that correspond to the group to be retrained according to the data conditions.
The retraining data may be a combination of questions and answers. In many cases, however, a question does not contain enough information, so the AI model 162 and the application used in combination with it may perform inference after converting the question into a form that includes the information needed for the answer. For example, if the support desk receives an inquiry about an error message and its solution, and the AI model 162 is to generate an answer, information missing from the question will be needed, such as the product name, the conditions under which the error message occurs, a manual describing the error message, or the product specifications. A question and answer combination containing this missing information may be used as training data. Alternatively, for frequently occurring input and output cases where the model did not give the correct answer, a new correct answer to the question may be created and used as training data. This process may also involve instructing a human to write an answer to a question and then importing the result. The data selection unit 186 creates this retraining data. The learning processing unit 182 retrains the learning model based on the retraining data. The retrained learning model is stored in the storage device 148 of the learning device 140.
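As a rough illustration of the two per-group statistics described in this paragraph, the following sketch computes a contribution frequency and an accuracy change from inference logs. The log layout (`group_id`, `correct`), the use of a share of all logged inferences as the "frequency", and the baseline accuracy values are assumptions made for the example; the disclosure does not fix a concrete formula.

```python
from collections import defaultdict

def contribution_frequency(logs):
    """Share of logged inferences attributed to each group ID."""
    counts = defaultdict(int)
    for entry in logs:
        counts[entry["group_id"]] += 1
    total = len(logs)
    return {gid: n / total for gid, n in counts.items()}

def accuracy_change(logs, baseline_accuracy):
    """Per-group accuracy on logged data minus the accuracy measured at training time.

    A negative value means the group performs worse in actual use.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for entry in logs:
        seen[entry["group_id"]] += 1
        correct[entry["group_id"]] += int(entry["correct"])
    return {gid: correct[gid] / seen[gid] - baseline_accuracy[gid] for gid in seen}

# Hypothetical logs: whether each answer was judged correct (by a human or an evaluator).
logs = [
    {"group_id": 1, "correct": True},
    {"group_id": 1, "correct": False},
    {"group_id": 2, "correct": True},
    {"group_id": 2, "correct": True},
]
baseline = {1: 0.75, 2: 0.75}  # assumed accuracy measured on the training data
freq = contribution_frequency(logs)
change = accuracy_change(logs, baseline)
print(freq, change)
```

In this toy data, group 1 contributes as often as group 2 but its accuracy has dropped, which is the kind of group the selection step would favor for retraining.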
[0022] The retraining device 180 includes a learning processing unit 182, an evaluation processing unit 184, a data selection unit 186, and a data reclassification unit 190. The processor functionally implements the learning processing unit 182, the evaluation processing unit 184, the data selection unit 186, and the data reclassification unit 190 by executing a program stored in the storage device.
[0023] The learning processing unit 182 has the function of performing learning processing using the learning data selected by the data selection unit 186. The evaluation processing unit 184 has the function of evaluating the learning results by the learning processing unit 182. The data selection unit 186 has the function of selecting a group to be retrained from among the groups based on the contribution frequency and accuracy change of each group. The data selection unit 186 also has the function of creating retraining data based on input and output data that correspond to the group to be retrained according to the data conditions. The data reclassification unit 190 has the function of dividing a group into two or more groups for each characteristic if there are two or more input groups with different characteristics among the inputs of the input and output data associated with a group.
[0024] Figure 2 is a conceptual diagram showing an example of processing by a data preparation device according to the embodiment of this disclosure.
[0025] The data splitting unit 122 performs a data splitting process 214 on the data 210, which is based on the data 102, to generate divided data 212. The splitting algorithm used here is based on the division rules 216, which are information that defines predetermined rules.
[0026] The ID assignment unit 124 performs an ID assignment process 232 on the divided data 212 to generate divided data 234 with assigned IDs. In this example, the ID 1 is assigned to texts A to C, which are included in the first group of the divided data 212. Similarly, the ID 2 is assigned to texts D to L, which are included in the second group. Thereafter, the IDs for each group formed by the division are assigned to the divided data 212 accordingly. The ID assignment process 232 may perform the above ID assignment process by referring to the information of the division rule 216. The divided data 234 with assigned IDs is stored in the storage device 126.
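The ID assignment in this example can be sketched as follows. The record layout and the `pattern_id` value are hypothetical; the only behavior taken from the text is that every item in the first group receives ID 1, every item in the second group receives ID 2, and so on, optionally together with an ID for the division pattern.

```python
def assign_ids(groups, pattern_id):
    """Attach a (pattern ID, group ID) pair to every text in every group.

    `groups` is a list of lists of texts; group IDs start at 1 in list order.
    """
    records = []
    for group_id, texts in enumerate(groups, start=1):
        for text in texts:
            records.append(
                {"pattern_id": pattern_id, "group_id": group_id, "text": text}
            )
    return records

# Shortened version of the example: texts A to C in group 1, texts D and E in group 2.
divided = [["text A", "text B", "text C"], ["text D", "text E"]]
records = assign_ids(divided, pattern_id="P1")
for record in records:
    print(record)
```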
[0027] Figure 3 is a conceptual diagram showing an example of processing by the inference device according to the embodiment of this disclosure.
[0028] User 104 inputs question 302 via their own terminal device. In this example, question 302 is "I bought a TV made by Company A, but the TV doesn't work."
[0029] The application 310 implemented in the inference execution unit 166 acquires the information from question 302. The application 310 sends data 312, which is the question 302, i.e., the inquiry content, with related FAQ information for product A added, to the ID assignment processing unit 332.
[0030] The ID assignment processing unit 332 performs ID assignment processing, for example, using the AI model 162. The ID assignment processing unit 332 inputs data to the AI model 162, which includes the inquiry content and related FAQ information for product A, with additional instructions for assigning an ID. The AI model 162 then outputs data 336, which includes the answer and the assigned ID. Based on this data 336, the ID assignment processing unit 332 generates log data 338 and stores the log data in the storage device 340. The log data 338 includes the question, the answer, and the assigned ID as information items.
[0031] The set of question, answer, and ID may be stored as a knowledge database in the storage device 314 on the inference execution unit 166 side. This knowledge database constitutes the related FAQ information mentioned above. The application 310 may obtain the answer 316 and output the answer 316 to the user, but since the ID mentioned above is unnecessary information for the user, the ID does not need to be included in the output data to the user. Note that the storage device 314 may be the storage device 168 shown in Figure 3. The data stored in the storage device 314 may be transmitted to the storage device 168 shown in Figure 3.
[0032] As described above, the user input is given to the learning model together with an instruction to add, to the output, identification information indicating which group of data contributed to the output of the inference result. Which group of data contributed is then identified from the identification information in the output, and the user output obtained by removing the identification information from the learning model's output is presented to the user. The user input here corresponds to question 302 in the figure. The identification information corresponds to the ID. The learning model corresponds to the AI model 162.
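One way to picture this flow is the sketch below, which builds a prompt carrying the ID instruction and then separates the ID from the text shown to the user. The instruction wording, the `[GROUP_ID: n]` tag format, and the function names are assumptions for illustration; the disclosure does not specify how the ID is encoded in the output, and no real model is called here.

```python
import re

# Hypothetical instruction and tag format; not taken from the disclosure.
ID_INSTRUCTION = (
    "Answer the question. At the end of your answer, append the ID of the "
    "data group you relied on most, formatted as [GROUP_ID: <number>]."
)
ID_PATTERN = re.compile(r"\s*\[GROUP_ID:\s*(\d+)\]\s*$")

def build_prompt(question, faq_context):
    """Combine the ID instruction, related FAQ information, and the user question."""
    return f"{ID_INSTRUCTION}\n\nContext:\n{faq_context}\n\nQuestion:\n{question}"

def split_answer_and_id(model_output):
    """Return (answer to show the user, group ID or None if no tag was found)."""
    match = ID_PATTERN.search(model_output)
    if match is None:
        return model_output.strip(), None
    return model_output[: match.start()].strip(), int(match.group(1))

# A made-up model output standing in for data 336.
raw_output = "Please check that the power cable is connected. [GROUP_ID: 2]"
answer, group_id = split_answer_and_id(raw_output)
log_entry = {"question": "The TV doesn't work.", "answer": answer, "id": group_id}
print(answer)
print(log_entry)
```

The log entry keeps the ID for later aggregation, while only `answer` would be returned to the user.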
[0033] Figure 4 is a conceptual diagram showing an example of processing by a retraining device according to the embodiment of this disclosure.
[0034] The storage device 168 stores a set 402 of questions, answers, and log IDs. The data selection unit 186 performs aggregation processing 410 to aggregate the set 402 for each log ID. The aggregation result 412 includes aggregate values for each log ID. For example, if the set 402 contains three data entries with log ID 1, the aggregate value for log ID=1 will be 3.
[0035] The data selection unit 186 selects data to be used for fine-tuning 432 based on the aggregated values of the aggregation result 412. For example, fine-tuning 432 may be performed based on data with aggregated values of 2 or more.
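The aggregation and threshold selection described in these two paragraphs amount to counting entries per log ID and keeping the entries whose ID occurs often enough. A minimal sketch, assuming each entry of set 402 is a dictionary with a `log_id` key (a hypothetical layout):

```python
from collections import Counter

def aggregate_by_log_id(log_set):
    """Count how many entries of the set carry each log ID."""
    return Counter(entry["log_id"] for entry in log_set)

def select_for_fine_tuning(log_set, min_count=2):
    """Keep entries whose log ID has an aggregate value of at least `min_count`."""
    counts = aggregate_by_log_id(log_set)
    return [entry for entry in log_set if counts[entry["log_id"]] >= min_count]

# Hypothetical contents of set 402: three entries with log ID 1, one with 2, two with 3.
set_402 = [
    {"question": "q1", "answer": "a1", "log_id": 1},
    {"question": "q2", "answer": "a2", "log_id": 1},
    {"question": "q3", "answer": "a3", "log_id": 1},
    {"question": "q4", "answer": "a4", "log_id": 2},
    {"question": "q5", "answer": "a5", "log_id": 3},
    {"question": "q6", "answer": "a6", "log_id": 3},
]
counts = aggregate_by_log_id(set_402)
selected = select_for_fine_tuning(set_402, min_count=2)
print(counts)
print(len(selected))
```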
[0036] The data generation unit 434 generates data to be used for fine-tuning 432 based on the data stored in set 402.
[0037] Fine-tuning 432 is performed on the AI model 452 stored in the storage device 148 to obtain the tuned AI model 454. This fine-tuning 432 is performed by inputting the data used for fine-tuning into the AI model and performing an evaluation process on the output data obtained.
[0038] Figure 5 is a flowchart illustrating the processing performed by the data splitting unit according to an embodiment of the present disclosure.
[0039] The data splitting unit 122 performs data splitting (step 502). The data splitting unit 122 selects a scoring pattern (step 504). The scoring pattern is determined by the division rules 216 applied by the data splitting unit 122. Since the data splitting unit 122 can logically or physically differentiate data using multiple scoring patterns, two or more scoring patterns can be set. This step selects one of these patterns.
[0040] The data splitting unit 122 scores the data based on the selected scoring pattern (step 506). Next, the data splitting unit 122 performs clustering (step 508).
[0041] The data splitting unit 122 determines whether the data will be divided into two or more groups as a result of clustering (step 510). If it is determined that the data will be divided into two or more groups, the process proceeds to step 512. If it is determined that the data will not be divided into two or more groups, the process returns to step 504.
[0042] In step 512, the data splitting unit 122 divides the data into two or more groups. Then, it assigns a group ID and a splitting ID to the divided data (step 514).
[0043] The data splitting unit 122 determines whether processing has been completed for all scoring patterns (step 516). If processing is completed for all patterns, the process shown in Figure 5 ends. If processing is not completed for all patterns, the process returns to step 504 to select the next scoring pattern.
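The loop of steps 504 to 516 can be sketched as below. The gap-based one-dimensional clustering is only a stand-in chosen to keep the example self-contained; the disclosure does not name a clustering algorithm, and the scoring functions, `max_gap`, and the record layout are all hypothetical.

```python
def cluster_by_gap(scored, max_gap):
    """Group (score, item) pairs: a new cluster starts whenever the gap between
    consecutive sorted scores exceeds `max_gap`. A stand-in for real clustering."""
    clusters = []
    previous = None
    for score, item in sorted(scored, key=lambda pair: pair[0]):
        if previous is None or score - previous > max_gap:
            clusters.append([])
        clusters[-1].append(item)
        previous = score
    return clusters

def split_with_patterns(items, scoring_patterns, max_gap):
    """Apply every scoring pattern in turn and record group and division IDs.

    `scoring_patterns` maps a hypothetical division ID to a scoring function.
    Patterns that do not yield two or more groups are skipped.
    """
    results = []
    for division_id, score_fn in scoring_patterns.items():        # step 504
        scored = [(score_fn(item), item) for item in items]       # step 506
        clusters = cluster_by_gap(scored, max_gap)                # step 508
        if len(clusters) < 2:                                     # step 510
            continue
        for group_id, members in enumerate(clusters, start=1):    # steps 512 and 514
            for item in members:
                results.append(
                    {"division_id": division_id, "group_id": group_id, "item": item}
                )
    return results                                                # step 516: all patterns done

items = ["a", "bb", "ccccccccc", "dddddddddd"]
patterns = {"by_length": len, "constant": lambda text: 0}
results = split_with_patterns(items, patterns, max_gap=3)
for row in results:
    print(row)
```

Here the `constant` pattern produces a single cluster and is skipped, mirroring the return to step 504 when the data cannot be divided into two or more groups.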
[0044] Figure 6 is a flowchart illustrating the processing performed by the data selection unit according to an embodiment of the present disclosure.
[0045] The data selection unit 186 selects a scoring pattern (step 602). The data selection unit 186 calculates the aggregated value of the segmented IDs (see Figure 4) and sorts them in descending order of aggregated value (step 604).
[0046] The data selection unit 186 selects the target data (step 606). The data selection unit 186 may select the target data based on aggregate values, for example, by selecting data with an aggregate value of 2 or more.
[0047] The data selection unit 186 calculates the contribution and determines whether the contribution is at or above a predetermined threshold (step 608). One way to determine the contribution threshold is to consider the variance of the input data. For example, if the characteristics of the input data are expressed in terms of the standard deviation, the threshold may be set to approximately 95% or more, which corresponds to 2σ. Alternatively, the threshold may be determined based on the accuracy target. For example, if the contribution of the top five data groups exceeds the accuracy target required for the task, the threshold may be set so as to adopt groups up to the fifth. If the contribution is at or above the threshold, the process proceeds to step 610. Otherwise, the process proceeds to step 618. The contribution is a value that indicates which group of data contributed to the output of the inference result by the learning model.
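One possible reading of the threshold rule is to adopt groups in descending order of contribution until their cumulative share reaches a target such as 95%. The sketch below implements that reading only; treating the contribution as a count and the threshold as a cumulative share are assumptions, and the function name and sample values are hypothetical.

```python
def groups_up_to_target(contributions, target_share):
    """Return group IDs, highest contribution first, until their cumulative
    share of the total contribution reaches `target_share` (for example 0.95)."""
    total = sum(contributions.values())
    selected = []
    cumulative = 0
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    for group_id, value in ranked:
        selected.append(group_id)
        cumulative += value
        if cumulative >= target_share * total:
            break
    return selected

# Hypothetical contribution counts per group.
contributions = {"g1": 50, "g2": 30, "g3": 15, "g4": 5}
adopted = groups_up_to_target(contributions, target_share=0.9)
print(adopted)
```

With these sample counts the first three groups together cover 95 of 100 contributions, so the fourth group falls below the cut.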
[0048] In step 610, the data selection unit 186 evaluates the change in inference accuracy. In step 612, the data selection unit 186 determines whether or not there is a decrease in accuracy based on the evaluation described above. If there is a decrease in accuracy, the process proceeds to step 616. If there is no decrease in accuracy, the process proceeds to step 614.
[0049] In step 614, the data selection unit 186 determines whether or not an improvement in accuracy can be expected based on the evaluation of the accuracy change described above. Here, an improvement can be expected when input and output data that occur frequently within a certain unit of time are evaluated as incorrect and this produces the accuracy change. In such a case, as mentioned above, accuracy can be expected to improve by having humans create correct answers as retraining data and training the model on them. If an improvement in accuracy can be expected, the process proceeds to step 616. If an improvement in accuracy cannot be expected, the process returns to step 606.
[0050] In step 616, the data selection unit 186 selects the target data selected in step 606 as the data to be retrained. Then it returns to step 606 and selects the next target data.
[0051] In step 618, the data selection unit 186 determines whether processing for all scoring patterns has been completed. If it has been completed, the process shown in Figure 6 ends. If it has not been completed, the process returns to step 602 and the next scoring pattern is selected.
[0052] As described above, the processor may calculate the contribution frequency and accuracy change of each group using two or more classification methods, and select the group to be retrained using a classification method selected from among them. The classification method referred to here corresponds to the scoring pattern in Figure 6. The contribution frequency corresponds to the contribution in step 608 of Figure 6. The accuracy change corresponds to the change in inference accuracy in step 610.
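The selection flow of Figure 6 for a single scoring pattern can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the data selection unit 186. The GroupStats record, its field names, and the default threshold values are hypothetical, and the accuracy evaluation of steps 610 to 614 is assumed to have been computed beforehand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroupStats:
    group_id: str
    aggregate: int          # aggregated value used for sorting (step 604)
    contribution: float     # contribution checked in step 608
    accuracy_change: float  # negative means inference accuracy dropped (step 610)
    improvable: bool        # True if retraining is expected to help (step 614)


def select_retraining_targets(
    groups: List[GroupStats],
    min_aggregate: int = 2,
    contribution_threshold: float = 0.1,
) -> List[str]:
    """Walk one scoring pattern's groups and return the IDs chosen for retraining."""
    targets: List[str] = []
    # Step 604: process groups in descending order of aggregated value.
    for g in sorted(groups, key=lambda s: s.aggregate, reverse=True):
        # Step 606: only groups with a large enough aggregated value are candidates.
        if g.aggregate < min_aggregate:
            continue
        # Step 608: a contribution that is not above the threshold ends this pattern.
        if g.contribution <= contribution_threshold:
            break
        # Steps 612 and 614: select on an accuracy drop or an expected improvement.
        if g.accuracy_change < 0 or g.improvable:
            targets.append(g.group_id)  # step 616
    return targets
```

Running this once per scoring pattern and comparing the results would correspond to the outer loop closed by step 618.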
[0053] Figure 7 is a flowchart illustrating the processing performed by the data reclassification unit according to an embodiment of this disclosure.
[0054] The data reclassification unit 190 selects a scoring pattern (step 702). The data reclassification unit 190 then selects the questions for each splitting ID (step 704).
[0055] The data reclassification unit 190 scores the data based on the selected scoring pattern (step 706). Next, the data reclassification unit 190 performs clustering (step 708).
[0056] The data reclassification unit 190 determines whether the data will be divided into two or more groups as a result of clustering (step 710). If it is determined that the data will be divided into two or more groups, the process proceeds to step 712. If it is determined that the data will not be divided into two or more groups, the process proceeds to step 718.
[0057] In step 712, the data reclassification unit 190 divides the data according to the clusters of the questions. If such division is possible (step 714: Yes), the process proceeds to step 716. If such division is not possible (step 714: No), the process proceeds to step 718.
[0058] In step 716, the data reclassification unit 190 divides the data and assigns new IDs.
[0059] In step 718, the data reclassification unit 190 determines whether processing for all IDs has been completed. If it has been completed, the process proceeds to step 720. If it has not been completed, the process returns to step 704.
[0060] In step 720, the data reclassification unit 190 determines whether processing has been completed for all scoring patterns. If completed, the process shown in Figure 7 ends. If not completed, the process returns to step 702.
[0061] The data reclassification unit 190 may reclassify the data based on the similarity of the questions. More specifically, the data reclassification unit 190 may calculate the degree of similarity between the inputs of the input and output data. If the inputs of the input and output data associated with a classification, i.e., a group, contain two or more input groups whose similarity is at or above a predetermined value, the unit may divide the group into two or more groups, one for each input group.
[0062] For example, if a data point is associated with multiple sets of questions, the data set may be split. Similarly, if multiple data points are associated with a specific data point, the data may be merged. Vector distances between data points may be used to determine the similarity between questions and data.
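As a rough illustration of splitting a group by question similarity, the sketch below groups question vectors with a simple greedy rule: each vector joins the first cluster whose first member is similar enough, otherwise it starts a new cluster. The greedy rule, the use of cosine similarity, the threshold of 0.8, and the function names are all assumptions. The disclosure only states that vector distances may be used.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_by_similarity(vectors: Sequence[Sequence[float]], threshold: float = 0.8) -> List[List[int]]:
    """Return clusters of vector indices; two or more clusters suggest a split."""
    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            # Compare against the cluster's first member as its representative.
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found, start a new one
    return clusters
```

If the function returns two or more clusters for the questions tied to one group, that group is a candidate for division into one new group per cluster.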
[0063] Figure 8 is a conceptual diagram showing an example of the configuration of a learning model according to the present disclosure.
[0064] The learning model may include one base model for all groups and multiple partitioned models for each of the groups. The dataset used for training is divided into multiple partitioned datasets. Each partitioned model is then trained using its respective partitioned dataset.
[0065] The processor may retrain only the partitioned models for the target group based on the retraining data.
[0066] The processor may exclude from the learning model the partitioned model for any group whose contribution frequency is zero. If the log data shows that certain data is not related to the questions, that data and the partitioned model (partial LLM) generated from it may be excluded from the system or moved to a low-priority knowledge database. This streamlines both the dataset and the model.
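The structure of one base model plus per-group partitioned models, with selective retraining and pruning, might be represented as in the sketch below. The class and method names are hypothetical, and the retrain method only increments a version counter as a stand-in for an actual fine-tuning step, which the disclosure does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PartitionedModel:
    group_id: str
    version: int = 0  # incremented each time this partition is retrained


@dataclass
class LearningModel:
    base: str  # identifier of the shared base model
    partitions: Dict[str, PartitionedModel] = field(default_factory=dict)

    def retrain(self, group_id: str, retraining_data: List[str]) -> None:
        """Retrain only the partitioned model of the target group."""
        part = self.partitions[group_id]
        # Placeholder: a real system would fine-tune `part` on `retraining_data`.
        part.version += 1

    def prune_unused(self, contribution_frequency: Dict[str, int]) -> List[str]:
        """Drop partitioned models whose group never contributed to an output."""
        removed = [g for g in self.partitions if contribution_frequency.get(g, 0) == 0]
        for g in removed:
            del self.partitions[g]
        return removed
```

Keeping the partitions in a dictionary keyed by group ID makes both operations local to one entry, which mirrors the point that the other groups are left untouched.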
[0067] Figure 9 is a block diagram showing examples of hardware configurations of various devices or systems according to the embodiments of this disclosure.
[0068] The hardware configuration of the device shown in Figure 9 may correspond to any of the following: data preparation device 120, learning device 140, inference device 160, retraining device 180, or AI system 100. In other words, each of these devices may have the hardware shown in Figure 9. Furthermore, two or more of these devices may be integrated into a single device.
[0069] The device shown in Figure 9 comprises a processor 201, main memory 202, storage device 203, communication device 204, input device 205, display device 206, and bus 207. The processor 201, main memory 202, storage device 203, communication device 204, input device 205, and display device 206 are connected to each other via bus 207.
[0070] The processor 201 functionally implements various processes performed by the device by loading and executing programs and data stored in the main memory 202 or storage device 203. The main memory 202 and storage device 203 store programs and various data used by the device. The communication device 204 has the function of communicating with external devices. The input device 205 consists of, for example, a keyboard, mouse, or touch panel, and accepts user input. The display device 206 consists of, for example, a display, and displays information to the user.
[0071] The processor may assign a weight to each of the divided data groups according to the contribution frequency of that group, and create retraining data of an amount corresponding to the weight of the group to be retrained, based on the input and output data corresponding to the group to be retrained. Such processing may be performed, for example, by the retraining device 180.
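One simple way to turn contribution frequencies into weights and a data amount is to use each group's share of all contributions, as sketched below. The proportional weighting, the notion of a total sample budget, and the function names are assumptions for illustration. The disclosure does not fix a particular weighting formula.

```python
from typing import Dict


def group_weights(freq: Dict[str, int]) -> Dict[str, float]:
    """Weight each group by its share of all recorded contributions."""
    total = sum(freq.values())
    return {g: (count / total if total else 0.0) for g, count in freq.items()}


def retraining_quota(freq: Dict[str, int], target_group: str, budget: int) -> int:
    """Number of retraining samples to create for `target_group`,
    proportional to its weight, out of a total `budget` of samples."""
    total = sum(freq.values())
    if total == 0:
        return 0
    # Integer arithmetic avoids floating-point rounding in the final count.
    return budget * freq[target_group] // total
```

For example, with contribution frequencies of 6, 3, and 1 and a budget of 200 samples, the most frequently contributing group would receive 120 samples.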
[0072] The ID may be stored in an external database located outside the device. In this configuration, the storage device further stores an external database in which identification information identifying each group is set. The processor has the learning model perform inference and identifies which group's data contributed to the output of the inference result. The processor then refers to the external database to obtain the identification information of the identified group and associates that identification information with the input and output data.
[0073] The identification information that identifies each group may be a character string that does not appear in user input, and may be expressed using special symbols that do not appear in user input. The identification information referred to here corresponds to the aforementioned ID.
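Identification information built from symbols that users do not type can be located and removed from model output with a pattern match. The sketch below assumes a hypothetical marker format, <<GID:...>>, which is not taken from the disclosure. It returns the group IDs found and the text with the markers removed.

```python
import re
from typing import List, Tuple

# Hypothetical marker format; any symbol sequence absent from user input would do.
ID_MARKER = re.compile(r"<<GID:([A-Za-z0-9_-]+)>>")


def split_model_output(model_output: str) -> Tuple[List[str], str]:
    """Extract group IDs from the output and return them with the cleaned text."""
    group_ids = ID_MARKER.findall(model_output)
    # Remove the markers, then collapse the whitespace they leave behind.
    visible = " ".join(ID_MARKER.sub("", model_output).split())
    return group_ids, visible
```

The extracted IDs would be stored with the input and output data, while only the cleaned text is shown to the user.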
[0074] The embodiments of the present invention described above are illustrative for the purpose of explaining the present invention and are not intended to limit the scope of the present invention to those embodiments only. Those skilled in the art can implement the present invention in various other forms without departing from the scope of the present invention. Furthermore, embodiments of the present invention include the following matters. However, the matters included in embodiments of the present invention are not limited to those listed below.
[0075] (Item 1) A retraining device that retrains a learning model, comprising a storage device and a processor, wherein the storage device stores training data divided into two or more groups based on predetermined conditions, and a learning model trained using the training data, and wherein the processor: when the learning model performs inference in response to an input, identifies which group's data contributed to the output of the inference result by the learning model, and associates the identified group with the input and output data; calculates, for each of the groups, a contribution frequency, which is the frequency with which data corresponding to that group contributed to the output of the inference result by the learning model, and an accuracy change, which is the change from the inference accuracy measured on the training data to the inference accuracy measured on test data created based on the input and output data associated with that group; selects a group to be retrained from among the groups based on the contribution frequency and accuracy change of each group; creates retraining data based on the input and output data corresponding to the group to be retrained; and retrains the learning model based on the retraining data. This enables efficient retraining that improves the accuracy of the learning model, because retraining is performed using data that frequently contributes to the outputs of the learning model and for which retraining is expected to improve inference accuracy.
[0076] (Item 2) In the apparatus described in item 1, if there are two or more input groups with different characteristics among the inputs of the input / output data associated with the group, the processor divides the group into two or more groups for each characteristic. This allows for further division of the group, reducing the amount of data required for retraining and improving the efficiency of retraining.
[0077] (Item 3) In the apparatus described in item 1, the processor calculates the contribution frequency and accuracy change of each group using two or more classification methods, and selects the group to be retrained using one of the selected classification methods. This allows for more effective retraining by selecting and using the most suitable classification method from among multiple options.
[0078] (Item 4) In the apparatus described in item 1, the learning model includes one base model for all groups and multiple partitioned models for each of the groups, and the processor retrains only the partitioned model relating to the group to be retrained, based on the retraining data. This allows for efficient retraining because only the partitioned model of the target group is retrained using the retraining data for that group.
[0079] (Item 5) In the apparatus described in item 1, the processor assigns to each of the groups a weight corresponding to its contribution frequency, and creates, based on the input and output data corresponding to the group to be retrained, retraining data in an amount corresponding to the weight of that group. This allows for the creation of retraining data that takes into account the contribution frequency of each group, enabling highly accurate retraining.
[0080] (Item 6) In the apparatus described in item 2, the processor calculates the degree of similarity between the inputs of the input and output data, and if there are two or more input groups with a similarity of a predetermined value or higher among the inputs of the input and output data associated with the group, it divides the group into two or more groups for each input group. This allows the group to be further divided based on similarity, reducing the amount of data required for retraining and improving the efficiency of retraining.
[0081] (Item 7) In the apparatus described in item 4, if there is a group with a contribution frequency of zero, the processor excludes the partitioned model for that group from the learning model. This reduces the amount of data and the number of models that need to be stored.
[0082] (Item 8) In the apparatus described in item 1, the processor: adds, to the user input provided by the user, an instruction to attach to the output identification information indicating which group's data contributed to the inference result, and inputs the result to the learning model; identifies, from the identification information output by the learning model, which group's data contributed to the output of the inference result; and presents to the user a user output obtained by removing the identification information from the output of the learning model. This makes it possible to obtain the group identification information without compromising user convenience.
[0083] (Item 9) In the apparatus described in item 1, the storage device further stores an external database in which identification information identifying each group is set, and the processor has the learning model perform inference, identifies which group's data contributed to the output of the inference result, and then, by referring to the external database, obtains the identification information of the identified group and associates the identification information with the input and output data. Storing the identification information in an external database allows the groups that contributed to the output of the inference results to be identified accurately.
[0084] (Item 10) In the apparatus described in item 1, the identification information that identifies each of the groups is a character string that does not appear in user input, and is expressed using special symbols that do not appear in user input. This makes it possible to avoid confusion between identification information and user input, and to accurately identify identification information in the output of the learning model.
[0085] (Item 11) A retraining method for retraining a learning model using a device comprising a storage device and a processor, wherein the storage device stores training data divided into two or more groups based on predetermined conditions, and a learning model trained using the training data, and wherein the processor: when the learning model performs inference in response to an input, identifies which group's data contributed to the output of the inference result by the learning model, and associates the identified group with the input and output data; calculates, for each of the groups, a contribution frequency, which is the frequency with which data corresponding to that group contributed to the output of the inference result by the learning model, and an accuracy change, which is the change from the inference accuracy measured on the training data to the inference accuracy measured on test data created based on the input and output data associated with that group; selects a group to be retrained from among the groups based on the contribution frequency and accuracy change of each group; creates retraining data based on the input and output data corresponding to the group to be retrained; and retrains the learning model based on the retraining data. This enables efficient retraining that improves the accuracy of the learning model, because retraining is performed using data that frequently contributes to the outputs of the learning model and for which retraining is expected to improve inference accuracy.
[0086] (Item 12) A retraining program for retraining a learning model on a device comprising a storage device and a processor, wherein the storage device stores training data divided into two or more groups based on predetermined conditions, and a learning model trained using the training data, the program causing the processor to: when the learning model performs inference in response to an input, identify which group's data contributed to the output of the inference result by the learning model, and associate the identified group with the input and output data; calculate, for each of the groups, a contribution frequency, which is the frequency with which data corresponding to that group contributed to the output of the inference result by the learning model, and an accuracy change, which is the change from the inference accuracy measured on the training data to the inference accuracy measured on test data created based on the input and output data associated with that group; select a group to be retrained from among the groups based on the contribution frequency and accuracy change of each group; create retraining data based on the input and output data corresponding to the group to be retrained; and retrain the learning model based on the retraining data. This enables efficient retraining that improves the accuracy of the learning model, because retraining is performed using data that frequently contributes to the outputs of the learning model and for which retraining is expected to improve inference accuracy.
[Explanation of symbols]
[0087] 100…AI system, 102…Data, 104…User, 120…Data preparation device, 122…Data splitting unit, 124…ID assignment unit, 126…Storage device, 140…Learning device, 142…Learning processing unit, 144…Evaluation processing unit, 146…Model deployment unit, 148…Storage device, 160…Inference device, 162…AI model, 164…ID assignment unit, 166…Inference execution unit, 168…Storage device, 180…Retraining device, 182…Learning processing unit, 184…Evaluation processing unit, 186…Data selection unit, 190…Data reclassification unit, 201…Processor, 202…Main memory, 203…Storage device, 204…Communication device, 205…Input device, 206…Display device, 207…Bus, 314…Storage device, 332…ID assignment processing unit, 338…Log data, 340…Storage device, 434…Data generation unit, 452…AI model, 454…AI model
Claims
1. A retraining device for retraining a learning model, comprising a storage device and a processor, wherein the storage device stores training data divided into two or more groups based on predetermined conditions, and a learning model trained using the training data, and wherein the processor: when the learning model performs inference in response to an input, identifies which group's data contributed to the output of the inference result by the learning model, and associates the identified group with the input and output data; calculates, for each of the groups, a contribution frequency, which is the frequency with which data corresponding to that group contributed to the output of the inference result by the learning model, and an accuracy change, which is the change from the inference accuracy measured on the training data to the inference accuracy measured on test data created based on the input and output data associated with that group; selects a group to be retrained from among the groups based on the contribution frequency and accuracy change of each group; creates retraining data based on the input and output data corresponding to the group to be retrained; and retrains the learning model based on the retraining data.
2. The retraining device according to claim 1, wherein, if the inputs of the input and output data associated with the group include two or more input groups with different characteristics, the processor divides the group into two or more groups according to the characteristics.
3. The retraining device according to claim 1, wherein the processor calculates the contribution frequency and accuracy change of each group using two or more classification methods, and selects the group to be retrained using a classification method selected from among them.
4. The retraining device according to claim 1, wherein the learning model includes one base model for all groups and multiple partitioned models for each of the groups, and the processor retrains only the partitioned model relating to the group to be retrained, based on the retraining data.
5. The retraining device according to claim 1, wherein the processor assigns to each of the groups a weight corresponding to its contribution frequency, and creates, based on the input and output data corresponding to the group to be retrained, retraining data in an amount corresponding to the weight of the group to be retrained.
6. The retraining device according to claim 2, wherein the processor calculates the degree of similarity between the inputs of the input and output data, and if there are two or more input groups with a similarity of a predetermined value or higher among the inputs of the input and output data associated with the group, divides the group into two or more groups for each input group.
7. The retraining device according to claim 4, wherein, if there is a group whose contribution frequency is zero, the processor excludes the partitioned model relating to that group from the learning model.
8. The retraining device according to claim 1, wherein the processor: adds, to the user input provided by the user, an instruction to attach to the output identification information indicating which group's data contributed to the inference result, and inputs the result to the learning model; identifies, from the identification information output by the learning model, which group's data contributed to the output of the inference result; and presents to the user a user output obtained by removing the identification information from the output of the learning model.
9. The retraining device according to claim 1, wherein the storage device further stores an external database in which identification information identifying each group is set, and the processor, after the learning model performs inference and the processor identifies which group's data contributed to the output of the inference result, refers to the external database to obtain the identification information of the identified group and associates the identification information with the input and output data.
10. The retraining device according to claim 1, wherein the identification information that identifies each of the groups is a character string that does not appear in user input, and the identification information is expressed using special symbols that do not appear in user input.
11. A retraining method for retraining a learning model using a device comprising a storage device and a processor, wherein the storage device stores training data divided into two or more groups based on predetermined conditions, and a learning model trained using the training data, and wherein the processor: when the learning model performs inference in response to an input, identifies which group's data contributed to the output of the inference result by the learning model, and associates the identified group with the input and output data; calculates, for each of the groups, a contribution frequency, which is the frequency with which data corresponding to that group contributed to the output of the inference result by the learning model, and an accuracy change, which is the change from the inference accuracy measured on the training data to the inference accuracy measured on test data created based on the input and output data associated with that group; selects a group to be retrained from among the groups based on the contribution frequency and accuracy change of each group; creates retraining data based on the input and output data corresponding to the group to be retrained; and retrains the learning model based on the retraining data.
12. A retraining program for retraining a learning model on a device comprising a storage device and a processor, wherein the storage device stores training data divided into two or more groups based on predetermined conditions, and a learning model trained using the training data, the program causing the processor to: when the learning model performs inference in response to an input, identify which group's data contributed to the output of the inference result by the learning model, and associate the identified group with the input and output data; calculate, for each of the groups, a contribution frequency, which is the frequency with which data corresponding to that group contributed to the output of the inference result by the learning model, and an accuracy change, which is the change from the inference accuracy measured on the training data to the inference accuracy measured on test data created based on the input and output data associated with that group; select a group to be retrained from among the groups based on the contribution frequency and accuracy change of each group; create retraining data based on the input and output data corresponding to the group to be retrained; and retrain the learning model based on the retraining data.