Learning apparatus, method and program

The learning device addresses computational cost and accuracy issues in machine learning by using scaling coefficients to selectively use training data based on errors, reducing costs and maintaining accuracy.

JP2026100172A · Pending · Publication Date: 2026-06-19 · DENSO TEN LTD


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: DENSO TEN LTD
Filing Date: 2024-12-09
Publication Date: 2026-06-19


IPC: G06N3/08

AI Tagging

Application Domain

Neural learning methods


Abstract

To reduce the computational cost of machine learning while maintaining its accuracy. [Solution] Machine learning is performed on the target model for multiple epochs based on the training data set. Reference training data is set from all the training data. In the first epoch, all the training data is set as reference training data (S14). From the second epoch onward, only a portion of all the training data is set as reference training data. In each epoch, the training error for each reference training data is scaled using the corresponding scaling coefficient, and the training parameters of the target model are updated based on the result (S22-S24). From the second epoch onward, based on the latest training error, a portion of all the training data is set as reference training data for the next epoch, and the scaling coefficient corresponding to the reference training data for the next epoch is updated.


Description

[Technical Field]

[0001] This invention relates to a learning device, method, and program.

[Background Art]

[0002] Techniques that enable models composed of DNNs (Deep Neural Networks) to perform various tasks through machine learning are being actively developed. In recent years, with the emergence of large language models and foundation models, the size of models (the number of DNN parameters) has increased dramatically. As a result, the computational cost during training has also increased significantly, and techniques to reduce computational cost have been proposed (see Patent Documents 1 and 2 below).

[Prior Art Documents]

[Patent Documents]

[0003] [Patent Document 1] Japanese Patent Publication No. 2023-13082
[Patent Document 2] Japanese Patent Publication No. 2022-158224

[Summary of the Invention]

[Problems to Be Solved by the Invention]

[0004] Since computational cost is proportional to the amount of data used for training, reducing the amount of data used for training is effective in reducing computational cost. A simple method (first reference method) is to reduce the training data by random sampling. However, the first reference method may remove data that is important for training, raising concerns about a decrease in accuracy. Another method (second reference method) is active learning. In active learning, training data is scored using an existing pre-trained model, and data is removed in order of lowest score (corresponding to the importance of training), thereby reducing the amount of training data while suppressing a decrease in accuracy. However, the second reference method requires the creation of a pre-trained model for scoring in advance and the processing of inference on all training data for scoring. Additional calculations are required to satisfy these requirements (incurring additional computational costs).

[0005] The present invention aims to provide a technology that reduces the computational cost of machine learning while maintaining the accuracy of machine learning.

[Means for Solving the Problem]

[0006] The learning device according to the present invention is a learning device that performs machine learning on a target model for multiple epochs based on a set of training data, and includes a controller. The controller assigns a scaling coefficient to each piece of training data that constitutes the set of training data. In each epoch, the controller derives a training error for each piece of reference training data selected from the set of training data, and then updates the learning parameters of the target model based on scaling error information obtained by scaling each derived training error using the corresponding scaling coefficient. In the first epoch, the controller sets each piece of training data as reference training data for the first epoch and sets the scaling coefficient for each piece of reference training data to a reference value. After the j-th epoch and before the (j+1)th epoch, the controller sets a portion of the training data as reference training data for the (j+1)th epoch and updates the scaling coefficients corresponding to the reference training data for the (j+1)th epoch, based on the latest training errors derived for each piece of training data that constitutes the set of training data, where j is an integer of 1 or more. The controller derives the scaling error information in the (j+1)th epoch by multiplying the training error derived for the reference training data for the (j+1)th epoch by the corresponding scaling coefficient.

[Effects of the Invention]

[0007] A portion of the training data is set as reference training data for the (j+1)th epoch, and in the (j+1)th epoch, the training of the target model proceeds by deriving the training error for the reference training data. That is, in the (j+1)th epoch, only a portion of the total training data is used for training, and the remainder is excluded from the data used for training. This reduces the computational cost of machine learning. Since the data to be excluded from training (the removed data) is identified based on the training error that is inherently derived in machine learning, no particular additional computational cost is incurred. There is a concern that excluding the removed data may introduce some bias into the data used for training, but the effect of this bias can be reduced by scaling with the scaling coefficient described above. By reducing the effect of bias, degradation of the accuracy of machine learning (the inference accuracy of the trained model) can be suppressed.

[Brief Description of the Drawings]

[0008]
[Figure 1] This is an overall configuration diagram of a learning system according to an embodiment of the present invention.
[Figure 2] This figure shows the input and output data of a model according to an embodiment of the present invention.
[Figure 3] This figure shows an image dataset, which is an example of a training data set, according to an embodiment of the present invention.
[Figure 4] This figure shows how training error, scaling coefficient, and removal flag are associated with training data according to an embodiment of the present invention.
[Figure 5] This is a structural diagram of a table referenced in a machine learning process according to an embodiment of the present invention.
[Figure 6] This is a flowchart of the machine learning process in a learning device according to an embodiment of the present invention.
[Figure 7] This is a flowchart of the machine learning process in a learning device according to an embodiment of the present invention.
[Figure 8] This is a flowchart of the removal process according to a first embodiment belonging to the embodiments of the present invention.
[Figure 9] This figure shows a specific example of the flow of a machine learning process according to a first embodiment belonging to the embodiments of the present invention.
[Figure 10] This figure shows a specific example of the flow of a machine learning process according to a first embodiment belonging to the embodiments of the present invention.
[Figure 11] This is a conceptual diagram of scaling relating to a first embodiment of the present invention.
[Figure 12] This figure shows a sorting array for multiple evaluation training errors, relating to a first embodiment of the present invention.
[Figure 13] This is a flowchart of the removal process according to a second embodiment belonging to the embodiments of the present invention.
[Figure 14] This is a flowchart of the removal process according to a third embodiment of the present invention.
[Figure 15] This figure relates to a third embodiment belonging to the embodiments of the present invention, and illustrates the first and second examples of specific numerical ranges.
[Figure 16] This is a flowchart of the removal process according to a fourth embodiment belonging to the embodiments of the present invention.
[Figure 17] This is a functional block diagram of a controller relating to a sixth embodiment of the present invention.

[Modes for Carrying Out the Invention]

[0009] Hereinafter, examples of embodiments of the present invention will be specifically described with reference to the drawings. In each of the referenced drawings, the same parts are denoted by the same reference numerals, and redundant descriptions relating to the same parts are omitted as a general rule. In addition, in this specification, for the sake of simplification of the description, symbols or reference numerals that refer to information, signals, physical quantities, functional parts, circuits, elements, or components may be indicated, and the names of the information, signals, physical quantities, functional parts, circuits, elements, or components corresponding to such symbols or reference numerals may be omitted or abbreviated.

[0010] Figure 1 shows the overall configuration of a learning system (machine learning system) according to an embodiment of the present invention. The learning system in Figure 1 comprises a learning device 10, which is a machine learning device, and a database 20. The learning device 10 is connected to a communication network including the Internet and intranets. The learning device 10 may be composed of one or more computer devices (server devices) connected to the communication network. The learning device 10 may also be configured using cloud computing. The learning device 10 comprises a controller 11, a memory 12, and a communication unit 13. The learning described in this embodiment is machine learning.

[0011] The controller 11 comprehensively controls the operation of each part of the learning device 10. The controller 11 is equipped with a processing unit including a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and an NPU (Neural Processing Unit) as hardware resources. The controller 11 may realize the functions described below by executing a program recorded in the memory 12, the database 20, or any other recording medium (not shown).

[0012] The memory 12 comprises non-volatile memory such as ROM (Read-Only Memory) or flash memory, and volatile memory such as RAM (Random Access Memory). The memory 12 stores various data referenced by the controller 11, as well as various programs to be executed by the controller 11.

[0013] The communication unit 13 transmits and receives arbitrary signals between the learning device 10 and a different counterpart device. The counterpart device for the communication unit 13 includes the database 20, as well as any computer device connected to the communication network. The controller 11 can transmit and receive arbitrary information with the counterpart device using the communication unit 13, but the description of the communication unit 13 may be omitted below.

[0014] The database 20 is a large-capacity recording medium that stores and holds the training data set 21. The training data set 21 comprises multiple training data E. Let the multiple training data E be M training data E, and let the M training data E be denoted as training data E[1] to E[M]. M represents an integer greater than or equal to 2, and typically has a value much larger than 2 (for example, several hundred thousand to several billion). The database 20 is connected to the learning device 10 by wire or wirelessly. The controller 11 can freely read any data held in the database 20. Furthermore, the controller 11 may also be able to freely write any data to the database 20. The database 20 may also be a collection of multiple physically separate recording media, in which case the training data set 21 is held across the multiple recording media. Part or all of the database 20 may be provided in the learning device 10.

[0015] Model 110 is constructed in Controller 11. Model 110 is the model (target model) that will be subjected to machine learning. Since machine learning is performed on Model 110, Model 110 can also be called the learning model. Figure 2 shows the input and output data of Model 110. Model 110 is an algorithm that generates model output data from model input data. Model input data and model output data may be referred to by the codes "Din" and "Dout", respectively. Model 110 is composed of DNN 120. DNN 120 is a deep neural network. DNN 120 is configured using hardware resources within Controller 11. Controller 11 performs machine learning on Model 110 using the training data set 21. Note that machine learning on Model 110 can also be called machine learning on DNN 120. In this embodiment, the machine learning performed in Controller 11 is assumed to be supervised machine learning.

[0016] Model 110, which has undergone machine learning, is a trained model (inference model) that performs predetermined inferences. The trained model may be, for example, an object detector that performs object detection as an inference. In object detection, the region in which an object image exists and the type of that object are estimated (in other words, detected) within the two-dimensional image input to Model 110.

[0017] When object detection is performed as inference, the training data set 21 is the image dataset 21a shown in Figure 3. The image dataset 21a consists of a large amount of training data, which includes image data of training images and ground truth data for those training images. In the image dataset 21a, ground truth data is attached to each training image. Each training image in the image dataset 21a is a two-dimensional image containing images of various types of objects. The ground truth data for a given training image includes information indicating the type of object present in that training image, and information identifying the location and shape of the region where the image of the object exists in that training image. The controller 11 can construct a model 110 that performs object detection as inference by running machine learning on the model 110 using the image dataset 21a. One training data E is formed by a pair of one training image and one ground truth data that correspond to each other. Therefore, the image dataset 21a has M pairs, each consisting of one training image and the one ground truth data corresponding to it.
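As a rough illustration of the pairing described in paragraph [0017], the following Python sketch represents one training data E as a training image together with its ground truth data. The class and field names are hypothetical, and the bounding-box tuple is an assumed stand-in for the "location and shape of the region", which the text does not specify further.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ObjectAnnotation:
    """Ground truth for one object (hypothetical layout)."""
    object_type: str                    # type of the object in the image
    region: Tuple[int, int, int, int]   # assumed (x, y, width, height) box


@dataclass
class TrainingData:
    """One training data E: a training image paired with its ground truth."""
    image: List[List[int]]              # stand-in 2D image as rows of pixel values
    ground_truth: List[ObjectAnnotation]


def build_dataset(images, annotations) -> List[TrainingData]:
    """Pair each training image with its ground truth data, giving M pairs."""
    if len(images) != len(annotations):
        raise ValueError("each training image needs exactly one ground truth entry")
    return [TrainingData(img, gt) for img, gt in zip(images, annotations)]
```

A real image dataset would hold image tensors and possibly segmentation masks rather than nested lists; the point here is only the one-to-one pairing.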

[0018] The process in which machine learning is performed is called the machine learning process. The machine learning process is executed by the controller 11. The machine learning process when building a model 110 for object detection using machine learning will be described below. For the sake of clarity, the training image of interest will be referred to as the "interest image". In the machine learning process, the controller 11 inputs the image data of the interest image as model input data Din to the model 110. The model 110 then estimates the region in which the image of the object exists and the type of the object based on the model input data Din, and generates model output data Dout that shows the estimation result. The ground truth data for the interest image indicates the correct content that should be estimated by the model 110.

[0019] In the machine learning process, the controller 11 derives a training error, which represents the error between the model output data Dout and the ground truth data for the image of interest. The controller 11 then adjusts the parameters of the model 110 using backpropagation to reduce the value of the training error. The parameters of the model 110 are the parameters of the DNN 120, including the weight and bias parameters of the DNN 120. The above adjustment is repeated until a predetermined learning termination condition is met, at which point the machine learning process ends. After the machine learning process, the model 110 functions as an object detector (inference model) that performs object detection as inference. Hereinafter, the parameters of the model 110 that are adjusted in the machine learning process will be referred to as learning parameters.

[0020] Object detection is a type of image recognition. The inference performed in Model 110 can be any type of image recognition (such as image classification or semantic segmentation). If the inference performed in Model 110 is image recognition, a Convolutional Neural Network can be used as DNN 120. In the following, for the sake of concreteness, unless otherwise specified, we will assume that the training data set 21 is an image dataset 21a, and that Model 110 functions as an object detector after machine learning.

[0021] Referring to Figure 4, the controller 11 associates each training data E with a training error, a scaling coefficient, and a removal flag, one for each. Hereafter, the training error will be referred to as "L" as appropriate, the scaling coefficient as "r" as appropriate, and the removal flag as "FG" as appropriate. The scaling coefficient r is a coefficient (scalar quantity) with a positive value. The removal flag FG is binary data with a value of "0" or "1".

[0022] Figure 5 shows the structure of the table TBL referenced by the controller 11 during the machine learning process. The table TBL is stored in memory 12 or database 20. The table TBL stores (in other words, holds) the training error L, scaling coefficient r, and removal flag FG corresponding to each of the training data E[1] to E[M]. The training error L, scaling coefficient r, and removal flag FG corresponding to the training data E[i] will be denoted below as training error L[i], scaling coefficient r[i], and removal flag FG[i], respectively. i represents any integer. The integer i in the training error L[i], scaling coefficient r[i], and removal flag FG[i] represents an integer between 1 and M (the same applies to the integer i in the training data E[i]).

[0023] The training errors L[1]~L[M] corresponding to the training data E[1]~E[M] are stored in table TBL. Similarly, the scaling coefficients r[1]~r[M] corresponding to the training data E[1]~E[M] are stored in table TBL. Similarly, the removal flags FG[1]~FG[M] corresponding to the training data E[1]~E[M] are stored in table TBL. During the machine learning process, the training errors L[1]~L[M], scaling coefficients r[1]~r[M], and removal flags FG[1]~FG[M] in table TBL are sequentially updated to the latest versions. Note that for the sake of illustration and explanation, Figure 5 is shown as if the training data E[1]~E[M] is stored in table TBL, but in reality, the training data E[1]~E[M] is not included in table TBL (although it may be included). The table TBL only needs to show that the training error L[i], scaling coefficient r[i], and removal flag FG[i] correspond to the training data E[i]. The same applies to the other diagrams shown later that illustrate the state of the table TBL.

[0024] Figures 6 and 7 show flowcharts of the machine learning process. The machine learning process includes the processes in steps S11 to S16, as well as the epoch process in step S20 and the removal process in step S30. The processes and steps in steps S11 to S16, S20 and S30 are executed by the controller 11. The controller 11 executes the processes and steps in steps S11 to S16, S20 and S30 by executing a learning program triggered by a predetermined operation or other input from the operator to the learning device 10. The learning program may be a program recorded in memory 12, database 20, or any other recording medium (not shown). The learning program may consist of multiple programs, and the processes and steps shown in Figures 6 and 7 may be realized by the controller 11 through the execution of these multiple programs.

[0025] In the machine learning process, first, in step S11, the controller 11 initializes the model 110. Initialization of the model 110 sets its state to a predetermined initial state. In this initial state, the model 110's learning parameters have predetermined initial parameter values. At step S11, the model 110 is an untrained model before machine learning is performed. After step S11, the process proceeds to step S12.

[0026] In step S12, the controller 11 initializes each scaling coefficient r and each removal flag FG in the table TBL. Specifically, in step S12, the controller 11 initializes the scaling coefficients r[1] to r[M] in the table TBL to the reference value VAL_REF. The reference value VAL_REF is 1. However, the reference value VAL_REF may have a positive value other than 1. Also, in step S12, the controller 11 sets the removal flags FG[1] to FG[M] in table TBL to "0". After step S12, the process proceeds to step S13.
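The table TBL and its initialization in step S12 can be pictured with a small Python sketch. This is only one possible in-memory layout under stated assumptions: the names `Table` and `init_table` are hypothetical, training data E[i] is mapped to list index i-1, and `None` is used as a placeholder for training errors that have not yet been derived.

```python
from dataclasses import dataclass
from typing import List, Optional

VAL_REF = 1.0  # reference value for the scaling coefficient (1 in the description)


@dataclass
class Table:
    """Hypothetical stand-in for table TBL; index i-1 holds the entry for E[i]."""
    training_error: List[Optional[float]]  # L[i]; None until first derived
    scaling: List[float]                   # r[i]; positive scalar
    removal_flag: List[int]                # FG[i]; 0 or 1


def init_table(m: int, val_ref: float = VAL_REF) -> Table:
    """Step S12: set every r[i] to the reference value and every FG[i] to 0."""
    if m < 2:
        raise ValueError("M must be an integer of 2 or more")
    if val_ref <= 0:
        raise ValueError("the reference value must be positive")
    return Table(
        training_error=[None] * m,
        scaling=[val_ref] * m,
        removal_flag=[0] * m,
    )
```

For example, `init_table(5)` yields five scaling coefficients equal to 1.0 and five removal flags equal to 0.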

[0027] In step S13, the controller 11 assigns "1" to the variable j that it manages. Note that the execution order of steps S11 to S13 is arbitrary, and they may be performed simultaneously. After step S13, the process proceeds to step S14.

[0028] In step S14, the controller 11 sets all of the training data E[1] to E[M] as reference training data. In the epoch process described later, machine learning of the model 110 is performed based on the reference training data. The collection of all reference training data is called the reference data group P[j]. At the stage of step S14, "j=1", so the reference data group P[1] is set in step S14. The reference data group P[1] is a collection of M reference training data, which are the training data E[1] to E[M]. Note that the reference training data corresponding to training data E[i] is sometimes written as reference training data E[i]. After step S14, the process proceeds to step S20.

[0029] In step S20, the controller 11 executes the epoch process. In the machine learning process shown in Figures 6 and 7, the controller 11 performs machine learning for multiple epochs on the model 110 based on the training data set 21. Performing machine learning for one epoch on the model 110 is equivalent to executing the epoch process in step S20 once. In the machine learning process, the epoch process in step S20 is repeatedly executed until a predetermined learning completion condition is met. Therefore, in the machine learning process, machine learning for multiple epochs is performed on the model 110.

[0030] The epoch process of step S20 consists of the processes of steps S21 to S25, starting with the process of step S21. Therefore, after step S14, the process proceeds to step S21. In this specification, among epoch processes that are executed multiple times, the i-th epoch process is referred to as the i-th epoch process or simply the i-th epoch. Accordingly, in step S20 when "j=1", the first epoch process is executed; in step S20 when "j=2", the second epoch process is executed; and in step S20 when "j=3", the third epoch process is executed. The same applies when "j≧4".

[0031] In step S21, the controller 11 samples a mini-batch of reference training data from the reference data group P[j]. Mini-batch learning is used for the machine learning in the controller 11. A mini-batch is a portion of the data in the reference data group P[j] that has a predetermined mini-batch size. For example, if the reference data group P[j] contains image data of 10,000 training images, the mini-batch contains image data of 100 of those training images, and also contains the ground truth data for those 100 training images. The sampled mini-batch of reference training data is temporarily stored in memory 12 and used for subsequent processing. After step S21, the process proceeds to step S22.

[0032] In step S22, the controller 11 derives a training error L for each reference training data sampled in step S21, and stores the derived training error L in table TBL, associating it with the corresponding training data E. More specifically, in step S22, the controller 11 inputs the image data of the training images included in the reference training data as model input data Din to the model 110 for each reference training data sampled in step S21. The model 110 generates model output data Dout from the model input data Din using the current training parameters. For each reference training data, the ground truth data included in the reference training data corresponds to the ground truth value for the model output data Dout. For each reference training data, the controller 11 derives a training error L as data representing the error ERR between the model output data Dout based on the reference training data and the ground truth data. The training error L has a value that indicates the magnitude of the error ERR, and this value (scalar quantity) is used for comparison with a threshold described later.

[0033] The table TBL stores the latest training error L derived in step S22 for each training data E. That is, if a training error L[i] is derived in step S22 of the j-th epoch, the training error L[i] in the table TBL is updated with the training error L[i] derived in step S22 of the j-th epoch. For example, if a training error L[1] is derived in step S22 of the first epoch, the training error L[1] derived in step S22 of the first epoch is set as the training error L[1] in the table TBL. Subsequently, if a training error L[1] is derived in step S22 of the second epoch, the training error L[1] in the table TBL is updated with the training error L[1] derived in step S22 of the second epoch. The same applies to the third epoch and subsequent epochs, and to training errors other than training error L[1]. After step S22, the process proceeds to step S23.

[0034] In step S23, the controller 11 performs a scaling process on each training error L derived in step S22. In the scaling process, each training error L derived in step S22 is multiplied by the corresponding scaling coefficient r. For the sake of clarity, the training error L[i] before the scaling process (the training error L[i] derived in step S22 itself) is specifically represented by the symbol "La[i]". The training error L[i] after the scaling process is specifically represented by the symbol "Lb[i]". Then, in step S23, the training error Lb[i] is derived according to the following equation (10).

Lb[i] = r[i] × La[i] ... (10)

After step S23, the process proceeds to step S24.
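Equation (10) amounts to an element-wise multiplication over the sampled mini-batch. A minimal Python sketch follows; the function name and the 0-based indexing (index i-1 for training data E[i]) are assumptions made for illustration.

```python
from typing import List, Sequence


def scale_training_errors(
    indices: Sequence[int],
    raw_errors: Sequence[float],
    scaling: Sequence[float],
) -> List[float]:
    """Step S23 / equation (10): Lb[i] = r[i] * La[i] for each sampled datum.

    indices    -- 0-based positions of the sampled reference training data
    raw_errors -- La values derived in step S22, in the same order as indices
    scaling    -- r values for all M training data
    """
    if len(indices) != len(raw_errors):
        raise ValueError("one raw training error is needed per sampled index")
    return [scaling[i] * la for i, la in zip(indices, raw_errors)]
```

With every coefficient at the reference value 1, the scaled errors equal the raw errors, so the first epoch behaves like ordinary unscaled training.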

[0035] In step S24, the controller 11 updates the learning parameters of the model 110 using backpropagation based on each training error L after scaling. After step S24, the process proceeds to step S25. The information consisting of each training error L after scaling is referred to as scaling error information. The series of processes consisting of steps S21 to S24 is referred to as mini-batch unit learning. In the j-th epoch, mini-batch unit learning is performed on all the reference training data in the reference data group P[j].

[0036] Specifically, in step S25, the controller 11 checks whether mini-batch unit learning has been completed for all the reference training data that make up the reference data group P[j]. If mini-batch unit learning has been completed for all the reference training data that make up the reference data group P[j] (Y in step S25), the process proceeds from step S25 to step S15. If mini-batch unit learning has not been completed for all the reference training data that make up the reference data group P[j] (N in step S25), the process returns from step S25 to step S21 and the mini-batch unit learning is repeated. However, among the multiple executions of step S21 within one epoch process, the reference training data sampled differ from one execution to another.

[0037] For example, consider the case where the reference data group P[j] consists of training data E[1]~E[10000] and the mini-batch size is 100 units of training data E. In this case, in the j-th epoch, mini-batch unit learning is repeated 100 times, and training errors L[1]~L[10000] are derived for the training data E[1]~E[10000]. In the j-th epoch, scaling is performed on the training errors L[1]~L[10000], and scaling error information consisting of training errors Lb[1]~Lb[10000] is obtained. Then, in the j-th epoch, when the learning parameters of model 110 have been updated based on the scaling error information (training errors Lb[1]~Lb[10000]), the j-th epoch ends and the process proceeds from step S25 to step S15.
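The epoch process of steps S21 to S25 can be sketched in Python as follows. This is a simplified illustration, not the described implementation: the model is abstracted behind two hypothetical callbacks (`derive_errors` for step S22 and `update_parameters` for step S24), indices are 0-based, and shuffling the reference data before slicing it into mini-batches is an assumed way of ensuring that each execution of step S21 samples different data.

```python
import random
from typing import Callable, List, Sequence


def run_epoch(
    reference: Sequence[int],
    batch_size: int,
    scaling: Sequence[float],
    table_errors: List[float],
    derive_errors: Callable[[List[int]], List[float]],
    update_parameters: Callable[[List[float]], None],
    rng: random.Random,
) -> int:
    """Run one epoch over the reference data group; return the mini-batch count."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    remaining = list(reference)
    rng.shuffle(remaining)                  # assumption: random order, no repeats
    steps = 0
    while remaining:                        # S25: stop once all data has been used
        batch = remaining[:batch_size]      # S21: sample one mini-batch
        remaining = remaining[batch_size:]
        raw = derive_errors(batch)          # S22: derive training errors La
        for i, err in zip(batch, raw):
            table_errors[i] = err           # keep the latest error in the table
        scaled = [scaling[i] * err for i, err in zip(batch, raw)]  # S23
        update_parameters(scaled)           # S24: update from scaled errors
        steps += 1
    return steps
```

With 10,000 reference training data and a mini-batch size of 100, the loop body runs 100 times, matching the worked example in paragraph [0037].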

[0038] In step S15, the controller 11 determines whether a predetermined learning termination condition has been met. The learning termination condition is met when learning converges. For example, in step S15, the controller 11 determines that learning has converged when the average value of the training errors L[1] to L[M] in the table TBL is less than or equal to a predetermined small value. The machine learning process in Figures 6 and 7 may be executed for a maximum of J epochs. J represents any integer greater than or equal to 2. Therefore, for example, if "j=J" is met in step S15, it may be determined that the learning termination condition has been met. Alternatively, for example, it may be determined that the learning termination condition has been met when a predetermined maximum learning time has elapsed from the start time of the machine learning process. If the learning termination condition is met in step S15 (Y in step S15), the machine learning process is terminated. The model 110 with the learning parameters after the machine learning process is a trained model and functions as an object detector capable of performing good inference. If the learning termination condition is not met in step S15 (N in step S15), the process proceeds from step S15 to step S30 (see Figure 7).
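The termination check of step S15 could look roughly like the Python sketch below. The description presents the three conditions as alternatives; combining them with a logical OR, the parameter names, and the use of `j >= max_epochs` rather than strict equality are assumptions made for illustration.

```python
import time
from typing import Optional, Sequence


def learning_finished(
    errors: Sequence[float],
    j: int,
    eps: float,
    max_epochs: Optional[int] = None,
    started_at: Optional[float] = None,
    max_seconds: Optional[float] = None,
    now: Optional[float] = None,
) -> bool:
    """Step S15: return True when a learning termination condition is met."""
    # Convergence: the average of L[1]..L[M] is at or below a small value.
    if errors and sum(errors) / len(errors) <= eps:
        return True
    # Epoch budget: stop once the J-th epoch has been executed.
    if max_epochs is not None and j >= max_epochs:
        return True
    # Time budget: stop once the maximum learning time has elapsed.
    if started_at is not None and max_seconds is not None:
        current = time.monotonic() if now is None else now
        if current - started_at >= max_seconds:
            return True
    return False
```

The optional `now` argument exists only so the time-based branch can be exercised deterministically; in normal use it is left unset and the monotonic clock is read.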

[0039] In step S30, the controller 11 performs the removal process. The removal process in step S30 consists of the processes in steps S31 to S34, starting with the process in step S31. Therefore, if the learning termination condition is not met in step S15 (N in step S15), the process proceeds to step S31.

[0040] In step S31, the controller 11 reinitializes each scaling coefficient r and each removal flag FG in the table TBL. Specifically, in step S31, the controller 11 resets the scaling coefficients r[1] to r[M] in the table TBL to the reference value VAL_REF. As described above, the reference value VAL_REF is 1. Also, in step S31, the controller 11 sets the removal flags FG[1] to FG[M] in the table TBL to "0". After step S31, the process proceeds to step S32.

[0041] In step S32, the controller 11 sets the data to be removed from the training data E[1] to E[M] based on the training error L for each data in the table TBL. Some of the training data E[1] to E[M] are set as the data to be removed. The table TBL stores the latest training error L derived for each of the training data E[1] to E[M]. Therefore, the controller 11 in step S32 sets the data to be removed based on the latest training error L derived for each of the training data E[1] to E[M]. In step S32, the controller 11 sets the removal flag FG corresponding to the training data E set as the data to be removed to "1". The setting for the removal flag FG is reflected in the table TBL. That is, for example, if training data E[i] is set as the data to be removed in step S32, the value of the removal flag FG[i] in the table TBL changes from "0" to "1". After step S32, the process proceeds to step S33.

[0042] In step S33, the controller 11 sets, among the training data E[1] to E[M], every training data E other than the data to be removed as the reference training data for the next epoch. As can be understood from this, the data to be removed refers to the training data E, among the training data E[1] to E[M], that is excluded from the reference training data for the next epoch. Therefore, in steps S32 and S33, each of the training data E[1] to E[M] is classified as either data to be removed or reference training data for the next epoch based on the training error L in the table TBL. "FG[i]=1" indicates that the training data E[i] is data to be removed, and "FG[i]=0" indicates that the training data E[i] is not data to be removed.

[0043] After the j-th epoch process is completed and before the (j+1)th epoch process begins, the next epoch refers to the (j+1)th epoch. Therefore, in step S33, reference learning data for the (j+1)th epoch is set. That is, for example, after the first epoch process is completed and before the second epoch process begins, reference learning data for the second epoch is set in step S33. Similarly, for example, after the second epoch process is completed and before the third epoch process begins, reference learning data for the third epoch is set in step S33. The same applies after the third epoch process is completed. The collection of reference learning data for the (j+1)th epoch is denoted as reference data group P[j+1]. For example, consider the case where "M=10000" and, in step S32 immediately after the completion of the first epoch process, only the learning data E[1] to E[1000] are set (in other words, classified) as the data to be removed. In this case, the training data E[1001] to E[10000] are the reference training data for the second epoch and constitute the reference data group P[2]. Alternatively, consider the case where "M=10000" and, in step S32 immediately after the end of the second epoch process, only the training data E[1001] to E[2000] are set (in other words, classified) as the data to be removed. In this case, the training data E[1] to E[1000] and E[2001] to E[10000] are the reference training data for the third epoch and constitute the reference data group P[3]. Specific examples of how to set the data to be removed will be described later. After step S33, the process proceeds to step S34.
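The relationship between the removal flags FG and the reference data group can be illustrated with a short sketch. It reproduces the second numerical example above (E[1001] to E[2000] removed after the second epoch); the helper name `reference_group` and the list-based representation of the table TBL are assumptions made for illustration.

```python
def reference_group(flags):
    """Return the 1-based indices i with FG[i] == 0, i.e. the group P[j+1]."""
    return [i for i, fg in enumerate(flags, start=1) if fg == 0]

M = 10000
flags = [0] * M                 # state after step S31: all FG[i] = 0
for i in range(1001, 2001):     # step S32: E[1001]..E[2000] set as data to be removed
    flags[i - 1] = 1

p3 = reference_group(flags)     # step S33: reference data group P[3]
```

Here `p3` holds the indices 1 to 1000 and 2001 to 10000, that is, 9000 reference training data.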

[0044] In step S34, the controller 11 performs an update process for the scaling coefficient r. In this update process, depending on the processing content of steps S32 and S33, a value greater than the reference value VAL_REF is set for some of the scaling coefficients r[1] to r[M] (details are described later). After step S34, the process proceeds to step S16.

[0045] In step S16, the controller 11 adds "1" to the variable j. After step S16, the process returns to step S20 (see Figure 6), and the j-th epoch process for the incremented variable j is performed. Thus, for example, after the end of the first epoch process and before the start of the second epoch process, in step S33 of Figure 7, the reference learning data for the second epoch is set to form the reference data group P[2]. Thereafter, in the second epoch process, each of the processes of steps S21 to S24 is executed for the reference data group P[2]. After the end of the second epoch process and before the start of the third epoch process, in step S33 of Figure 7, the reference learning data for the third epoch is set to form the reference data group P[3]. Thereafter, in the third epoch process, each of the processes of steps S21 to S24 is executed for the reference data group P[3]. The same applies after the end of the third epoch process. The reference data group P[1] includes the learning data E[1] to E[M] as M reference learning data, but the reference data group P[j] satisfying "j≧2" includes only a part of the learning data E[1] to E[M] as reference learning data. When j_A and j_B represent mutually different integers of 2 or more, the reference learning data constituting the reference data group P[j_A] and the reference learning data constituting the reference data group P[j_B] may coincide with each other or may differ from each other.
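The overall control flow of Figures 6 and 7 can be summarized in a skeleton like the one below. It is only a sketch under several assumptions: the callables `compute_error`, `update_params`, and `removal_process` are hypothetical stand-ins for the model-specific processing, per-sample processing is used instead of any batching the embodiment may employ, and only two of the termination conditions of step S15 are checked.

```python
def run_training(M, J, compute_error, update_params, removal_process, eps):
    """Run up to J epochs and return the size of each reference data group P[j]."""
    r = [1.0] * M          # step S12: scaling coefficients start at VAL_REF = 1
    fg = [0] * M           # step S12: removal flags start at 0
    errors = [0.0] * M     # latest training error per training data (table TBL)
    sizes = []
    j = 1
    while True:
        group = [i for i in range(M) if fg[i] == 0]    # reference data group P[j]
        sizes.append(len(group))
        scaled = []
        for i in group:                                # j-th epoch process (step S20)
            errors[i] = compute_error(i)               # derive and store L[i]
            scaled.append(r[i] * errors[i])            # scale with r[i]
        update_params(scaled)                          # update learning parameters
        if sum(errors) / M <= eps or j == J:           # step S15
            return sizes
        r, fg = removal_process(errors)                # step S30 (S31 to S34)
        j += 1                                         # step S16
```

Note that data excluded from P[j] keeps its previously stored error in the table, which matches the description that the table holds the latest training error derived for each training data.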

[0046] Hereinafter, among a plurality of practical examples, some specific operation examples, application technologies, modification technologies, etc. related to the learning system will be described. The matters described above in this embodiment are applied to the following respective examples unless otherwise specified and without contradiction. In each example, if there is a matter conflicting with the above matters, the description in each example may be prioritized. Also, without contradiction, among the following plurality of examples, the matters described in any one example can be applied to any other example (that is, it is also possible to combine any two or more of the plurality of examples).

[0047] <<First Embodiment>> The first embodiment will now be described. Figure 8 is a flowchart of the removal process according to the first embodiment. In addition to the flowchart of the removal process, Figure 8 also includes a conceptual diagram CC1 of the data classification method according to the first embodiment. The removal process in Figure 8 consists of steps S131 to S138, starting from step S131. The removal process in Figure 8 can be used as the removal process for step S30 in Figure 7. In this case, if the learning termination condition is not met in step S15 of Figure 6 (N in step S15), the process proceeds to step S131 in Figure 8, continues through to step S138 in Figure 8, and then proceeds to step S16 in Figure 7. Step S131 corresponds to step S31 in Figure 7. Steps S132 to S137 constitute steps S32 and S33 in Figure 7. Step S138 corresponds to step S34 in Figure 7.

[0048] In the removal process shown in Figure 8, the process in step S131 is performed first. The process in step S131 is the same as the process in step S31 in Figure 7. That is, in step S131, the controller 11 initializes the scaling coefficients r[1] to r[M] in the table TBL with the reference value VAL_REF. As described above, the reference value VAL_REF is 1. Also, in step S131, the controller 11 sets the removal flags FG[1] to FG[M] in the table TBL to "0". After step S131, the process proceeds to step S132.

[0049] In step S132, the controller 11 treats each training error L in the table TBL as an evaluation training error L_EV and sets the threshold TH against which each evaluation training error L_EV is compared. Note that the training error L[i] treated as the evaluation training error L_EV is sometimes denoted as evaluation training error L_EV[i] (the same applies to the other examples described later). As mentioned above, the table TBL stores the latest training error L derived for each of the training data E[1] to E[M]. Therefore, the evaluation training error L_EV[i] is the latest training error L derived for the training data E[i]. In step S132, the controller 11 can, for example, derive the average of the training errors L[1] to L[M] in the table TBL and set it as the threshold TH. After step S132, the process proceeds to step S133.

[0050] In step S133, for each training data E that constitutes the training data set 21, the controller 11 determines whether the corresponding evaluation training error L_EV satisfies the first inequality, and thereby classifies the training data E as either candidate data or first introduction data. The candidate data is training data E that is a candidate for the data to be removed; the data to be removed is therefore selected from the group of candidate data. The first inequality is given by equation (1a) below. Alternatively, the first inequality may be given by equation (1b) below.
L_EV < TH ··· (1a)
L_EV ≤ TH ··· (1b)

[0051] If the evaluation training error L_EV[i] corresponding to the training data E[i] satisfies the first inequality, the training data E[i] is classified as candidate data. If the evaluation training error L_EV[i] corresponding to the training data E[i] does not satisfy the first inequality, the training data E[i] is classified as first introduction data. Therefore, if the first inequality is equation (1a), the controller 11 in step S133 classifies the training data E[i] as candidate data when the evaluation training error L_EV[i] is smaller than the threshold TH, and as first introduction data when the evaluation training error L_EV[i] is greater than or equal to the threshold TH. If the first inequality is equation (1b), the controller 11 in step S133 classifies the training data E[i] as candidate data when the evaluation training error L_EV[i] is less than or equal to the threshold TH, and as first introduction data when the evaluation training error L_EV[i] is greater than the threshold TH. In this way, training data E whose evaluation training error L_EV exactly matches the threshold TH is classified as either candidate data or first introduction data depending on whether the first inequality is equation (1a) or (1b). In the first embodiment, it is assumed that equation (1a) is used as the first inequality. After step S133, the process proceeds to step S134.
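The classification of step S133 can be sketched as a single pass over the evaluation training errors. The function name `classify_first` and the `strict` switch between equations (1a) and (1b) are illustrative assumptions, not names from the embodiment.

```python
def classify_first(errors_ev, th, strict=True):
    """Split 1-based indices into (candidate data, first introduction data).

    strict=True applies equation (1a), L_EV < TH;
    strict=False applies equation (1b), L_EV <= TH.
    """
    candidates, first_intro = [], []
    for i, l_ev in enumerate(errors_ev, start=1):
        satisfied = l_ev < th if strict else l_ev <= th
        (candidates if satisfied else first_intro).append(i)
    return candidates, first_intro
```

The two variants differ only for training data whose error equals the threshold exactly, which is the point made at the end of the paragraph above.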

[0052] In step S134, the controller 11 randomly selects (N × 100)% of the candidate data from all candidate data. N is a positive value less than 1. Preferably, N is 0.5 or less. A predetermined appropriate value is set for N. For example, (N × 100)% is 20%, 30%, 40%, or 50%. For example, if only the training data E[1] to E[5000] are classified as candidate data and N=0.2, then 1000 candidate data, corresponding to 20%, are randomly selected from the 5000 candidate data corresponding to the training data E[1] to E[5000]. The controller 11 generates or obtains random numbers and uses these random numbers to perform random selection. Known methods can be used as methods for generating or obtaining random numbers. For example, the controller 11 may generate random numbers (pseudorandom numbers) using a pseudorandom function, or it may obtain random numbers using environmental noise. After step S134, the process proceeds to step S135.
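The random selection of step S134 might look like the following when a pseudorandom generator is used. The function name `select_removal` is hypothetical, and rounding the selection count to the nearest integer is an assumption, since the text does not say how a fractional count is handled.

```python
import random

def select_removal(candidates, n, rng=None):
    """Randomly pick (n * 100)% of the candidate indices as data to be removed."""
    rng = rng or random.Random()
    k = round(len(candidates) * n)     # assumed rounding rule for the count
    return set(rng.sample(candidates, k))

candidates = list(range(1, 5001))      # E[1]..E[5000] classified as candidate data
removed = select_removal(candidates, 0.2, random.Random(0))
```

With 5000 candidates and N=0.2, `removed` contains 1000 indices; the remaining 4000 candidates become second introduction data in step S136.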

[0053] In step S135, the controller 11 classifies and sets each candidate data extracted in step S134 as data to be removed. In step S135, the controller 11 sets the removal flag FG corresponding to the training data E set as data to be removed to "1". The setting for the removal flag FG is reflected in the table TBL. That is, in step S135, if candidate data corresponding to training data E[i] is set as data to be removed, the value of the removal flag FG[i] in the table TBL changes from "0" to "1". After step S135, the process proceeds to step S136.

[0054] In step S136, the controller 11 classifies all candidate data except for the data to be removed as second introduction data. For example, consider the case where only the training data E[1] to E[5000] are classified as candidate data, and the 1000 candidate data corresponding to the training data E[1] to E[1000] are classified as data to be removed. In this case, the 4000 candidate data corresponding to the training data E[1001] to E[5000] are classified as second introduction data. After step S136, the process proceeds to step S137.

[0055] In step S137, the controller 11 sets the first introduction data and the second introduction data as reference learning data for the next epoch. There are multiple items of each of the first introduction data, the second introduction data, and the removal data. Therefore, the controller 11 sets each item of first introduction data and each item of second introduction data as reference learning data for the next epoch, and excludes each item of removal data from the reference learning data for the next epoch. In steps S132 to S137, each of the learning data E[1] to E[M] is classified as either removal data or reference learning data for the next epoch based on the training error L in the table TBL.

[0056] As described above, after the completion of the j-th epoch process and before the start of the (j+1)th epoch process, the next epoch refers to the (j+1)th epoch. Therefore, in step S137, reference learning data for the (j+1)th epoch is set. As described above, the collection of reference learning data for the (j+1)th epoch is denoted as reference data group P[j+1]. After step S137, the process proceeds to step S138.

[0057] In step S138, the controller 11 updates the scaling coefficient r. Specifically, the controller 11 sets the updated value V_UD for the scaling coefficient r corresponding to each reference learning data classified as second introduction data among the reference learning data for the (j+1)th epoch. The updated value V_UD is (1 / (1-N)) times the reference value VAL_REF. That is, "V_UD = VAL_REF × (1 / (1-N))". Here, since "VAL_REF = 1", if "N=0.2", for example, then "V_UD = 1 × (1 / (1-0.2)) = 1.25".

[0058] However, the updated value V_UD may differ slightly from (1 / (1-N)) times the reference value VAL_REF; for example, "V_UD = ΔV + VAL_REF × (1 / (1-N))" is also acceptable. ΔV is a predetermined small value that is positive or negative. In either case, however, the updated value V_UD is larger than the reference value VAL_REF.

[0059] In step S138, the scaling coefficient r corresponding to each reference learning data that is not classified as second introduction data is maintained at the reference value VAL_REF. Therefore, as a result of the removal process, the reference value VAL_REF is set for the scaling coefficient r corresponding to each reference learning data classified as first introduction data among the reference learning data for the (j+1)th epoch. As described above, after step S138, the process proceeds to step S16 in Figure 7.
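The scaling coefficient update of step S138 can be sketched as below. The function name `update_scaling`, the list representation of r[1] to r[M], and the parameter `delta_v` (for the optional small offset ΔV) are assumptions for illustration.

```python
def update_scaling(M, second_intro, n, val_ref=1.0, delta_v=0.0):
    """Return r[1]..r[M] as a list after step S138.

    second_intro: 1-based indices classified as second introduction data
    n:            removal ratio N (0 < N < 1)
    delta_v:      optional small offset dV; 0.0 gives V_UD = VAL_REF / (1 - N)
    """
    v_ud = delta_v + val_ref * (1.0 / (1.0 - n))
    r = [val_ref] * M                  # state after step S131
    for i in second_intro:
        r[i - 1] = v_ud                # raised only for second introduction data
    return r

r = update_scaling(10, [3, 4], 0.2)    # r[3] and r[4] become about 1.25
```

All other coefficients, including those of first introduction data and of removal data, stay at the reference value.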

[0060] A concrete example of the machine learning process flow will be described with reference to Figures 9 and 10 (see also Figures 6 to 8 as appropriate). In the examples in Figures 9 and 10, it is assumed that "M=10000" and "N=0.2", and that the reference value VAL_REF is 1. Then "V_UD = VAL_REF × (1 / (1-N)) = 1 × (1 / (1-0.2)) = 1.25", so the updated value V_UD is 1.25 (step S138). Figure 9 shows the table TBL in state ST1. State ST1 is the state during the execution period of the first epoch process. Due to the processing in step S12, which is performed before the first epoch process (see Figure 6), in the table TBL in state ST1, the scaling coefficients r[1] to r[10000] all have a value of "1" and the removal flags FG[1] to FG[10000] all have a value of "0". The reference data group P[1], which is a collection of reference learning data for the first epoch, consists of the learning data E[1] to E[10000] (step S14).

[0061] In the first epoch process, the training errors L[1] to L[10000] for the training data E[1] to E[10000] are derived and stored in the table TBL (step S22). Since all scaling coefficients r have a value of "1" in the first epoch process, the scaling error information corresponds to the training errors L[1] to L[10000] derived in step S22.

[0062] Figure 9 also shows the table TBL in state ST2. State ST2 is the state immediately before the execution of the second epoch process, or the state during the execution of the second epoch process. State ST2 is reached after the completion of the first epoch process and the execution of the removal process between the first and second epoch processes. Assume that in the removal process between the first and second epoch processes, the learning data E[1] to E[5000] are classified as candidate data, and the learning data E[5001] to E[10000] are classified as first introduction data (step S133). In addition, assume that of the candidate data, the learning data E[1] to E[1000] are classified as removal data and the learning data E[1001] to E[5000] are classified as second introduction data (steps S134 to S136). Consequently, in the removal process between the first and second epoch processes, only removal flags FG[1] to FG[1000] in the table TBL are set to "1", while the other removal flags FG have a value of "0". In addition, in the removal process between the first and second epoch processes, only scaling coefficients r[1001] to r[5000] in the table TBL are set to "1.25", while the other scaling coefficients r have a value of "1". The reference data group P[2], which is a collection of reference learning data for the second epoch, is composed of the first and second introduction data (step S137). Therefore, in the example in Figure 9, the reference data group P[2] has learning data E[1001] to E[10000] as reference learning data for the second epoch.

[0063] In the second epoch process, the training errors L[1001] to L[10000] for the training data E[1001] to E[10000] are derived and stored in the table TBL (step S22). In the example in Figure 9, the training data E[1] to E[1000] correspond to the data removed in the second epoch and are not included in the reference data group P[2]. Therefore, the training errors L for the training data E[1] to E[1000] are not derived in the second epoch process.

[0064] In the second epoch process, the training errors L[1001] to L[10000] derived during the second epoch process are scaled using the scaling coefficients r[1001] to r[10000] in the table TBL. As described above, the training error L[i] before scaling is denoted as training error La[i], and the training error L[i] after scaling is denoted as training error Lb[i]. In the second epoch process related to the example in Figure 9, for integers i satisfying "1001≦i≦5000", "Lb[i]=1.25×La[i]", and for integers i satisfying "5001≦i≦10000", "Lb[i]=1×La[i]=La[i]".
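The scaling performed in the second epoch of the Figure 9 example can be reproduced numerically. The sketch below builds the scaling coefficients and removal flags of state ST2 and applies "Lb[i] = r[i] × La[i]"; the function name `scale_errors`, the dictionary keyed by 1-based index, and the dummy error value 0.4 are assumptions for illustration only.

```python
M = 10000
r = [1.0] * M
for i in range(1001, 5001):            # second introduction data: r[i] = 1.25
    r[i - 1] = 1.25
fg = [1 if i <= 1000 else 0 for i in range(1, M + 1)]   # E[1]..E[1000] removed

def scale_errors(la, r, fg):
    """Return {i: Lb[i]} for reference training data only (FG[i] == 0)."""
    return {i: r[i - 1] * la[i] for i in la if fg[i - 1] == 0}

la = {i: 0.4 for i in range(1001, M + 1)}   # dummy pre-scaling errors La[i]
lb = scale_errors(la, r, fg)
```

With these dummy values, Lb[i] is 0.5 for 1001≦i≦5000 and remains 0.4 for 5001≦i≦10000, and no entry exists for the removed data.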

[0065] Figure 10 shows the table TBL in state ST3. State ST3 is the state immediately after the processing of step S131 is executed in the removal process between the second and third epoch processes. Therefore, in state ST3, all scaling coefficients r in table TBL have a value of "1" and all removal flags FG in table TBL have a value of "0".

[0066] Figure 10 also shows the table TBL in state ST4. State ST4 is the state immediately before the execution of the third epoch process, or the state during the execution of the third epoch process. State ST4 is reached after the completion of the second epoch process and after the removal process between the second and third epoch processes is executed. Assume that in the removal process between the second and third epoch processes, the training data E[1] to E[5000] are classified as candidate data, and the training data E[5001] to E[10000] are classified as first introduction data (step S133). In addition, assume that among the candidate training data E[1] to E[5000], the training data E[4001] to E[5000] are classified as removal data and the training data E[1] to E[4000] are classified as second introduction data (steps S134 to S136). Consequently, in the removal process between the second and third epochs, only removal flags FG[4001] to FG[5000] in the table TBL are set to "1", while the other removal flags FG have a value of "0". In addition, in the removal process between the second and third epochs, only scaling coefficients r[1] to r[4000] in the table TBL are set to "1.25", while the other scaling coefficients r have a value of "1". The reference data group P[3], which is a collection of reference learning data for the third epoch, is composed of the first and second introduction data (step S137). Therefore, the reference data group P[3] in the example in Figure 10 has the learning data E[1] to E[4000] and E[5001] to E[10000] as reference learning data for the third epoch.

[0067] In the third epoch process, the training errors L[1] to L[4000] and L[5001] to L[10000] for the training data E[1] to E[4000] and E[5001] to E[10000] are derived and stored in the table TBL (step S22). In the example in Figure 10, the training data E[4001] to E[5000] correspond to the data removed in the third epoch and are not included in the reference data group P[3]. Therefore, the training errors L for the training data E[4001] to E[5000] are not derived in the third epoch process.

[0068] In the third epoch process, the training errors L[1] to L[4000] and L[5001] to L[10000] derived during the third epoch process are scaled using the scaling coefficients r[1] to r[4000] and r[5001] to r[10000] in the table TBL. As described above, the training error L[i] before scaling is denoted as training error La[i], and the training error L[i] after scaling is denoted as training error Lb[i]. In the third epoch process in the example shown in Figure 10, for integers i satisfying "1≦i≦4000", "Lb[i]=1.25×La[i]", and for integers i satisfying "5001≦i≦10000", "Lb[i]=1×La[i]=La[i]".

[0069] Each removal process and each epoch process from the third epoch onward proceeds in the same manner. In the examples of Figures 9 and 10, the candidate data set in the removal process between the first and second epoch processes is the same as the candidate data set in the removal process between the second and third epoch processes, but the former and the latter may differ.

[0070] The significance of setting the data to be removed and of the scaling process will be explained with reference to Figures 11(a) to (c). The histogram 610 in Figure 11(a) represents the distribution of the training errors L derived in the first epoch process. Training data E corresponding to training errors L greater than the threshold TH contains model input data that is relatively difficult for model 110 (problems where errors are likely to occur). On the other hand, training data E corresponding to training errors L smaller than the threshold TH contains model input data that is relatively easy for model 110 (problems where errors are less likely to occur). For model input data that is relatively easy for model 110, model 110 can readily derive the correct answer without further training. For this reason, training data E corresponding to training errors L smaller than the threshold TH is likely to be unnecessary for training.

[0071] Excluding training data E corresponding to training errors L smaller than the threshold TH from the data used for training (reference data group P[j]) reduces the computational cost of machine learning. However, uniformly excluding all training data E corresponding to training errors L smaller than the threshold TH from the data used for training (reference data group P[j]) introduces bias into the training data. This bias means that model input data that is relatively difficult for model 110 is predominantly introduced into the training data, rather than model input data that is relatively easy for model 110. The training data E corresponding to training errors L smaller than the threshold TH often includes data that is useful for training model 110, so excessive bias is undesirable.

[0072] Therefore, in the first embodiment, some of the training data E corresponding to training errors L smaller than the threshold TH (or less than or equal to the threshold TH) are randomly extracted as removal data. Then, in the second epoch process, the removal data are excluded from the data used for training (reference data group P[2]). Histogram 620 in Figure 11(b) shows the distribution of the training errors L that remain after removing the training errors L corresponding to the removal data from the training errors L derived in the first epoch process. Compared to histogram 610, in histogram 620 the frequencies corresponding to training errors L smaller than the threshold TH have decreased by (100 × N)% overall.

[0073] Because some of the training data E corresponding to training errors L smaller than the threshold TH are randomly extracted as removal data, the above bias partially remains, and the sum of the training errors L derived in the second epoch process decreases. To compensate for this decrease, scaling is performed in this embodiment. That is, some of the data judged to be unnecessary is classified as removal data, while the remainder is classified as second introduction data. Then, the scaling coefficient r for the second introduction data is raised above the reference value VAL_REF so that the decrease in training error L corresponding to the removal data is compensated for (step S138).

[0074] Histogram 630 in Figure 11(c) is obtained by applying the following scaling transformation to histogram 620 in Figure 11(b). This scaling transformation multiplies by (1 / (1-N)) each training error L that was derived in the first epoch process for training data E corresponding to the second introduction data. For example, consider the case where the training errors L[1] to L[5000] derived in the first epoch process are smaller than the threshold TH, and therefore the training data E[1] to E[5000] are classified as candidate data (see Figure 9). Furthermore, suppose that of the training data E[1] to E[5000], the training data E[1] to E[1000] are classified as removal data and the training data E[1001] to E[5000] are classified as second introduction data (see Figure 9). Under these circumstances, all frequencies below the threshold TH in histogram 610 correspond to the training errors L[1] to L[5000], and all frequencies below the threshold TH in histogram 620 correspond to the training errors L[1001] to L[5000]. Compared to histogram 610, the sum of training errors L below the threshold TH decreases by (100 × N)% in histogram 620 (although some deviation may occur). To compensate for this decrease, the scaling coefficients r[1001] to r[5000] corresponding to the training errors L[1001] to L[5000] are set to 1.25 (see state ST2 in Figure 9). As a result, each training error L distributed below the threshold TH in histogram 620 is multiplied by 1.25, and consequently histogram 620 is transformed into histogram 630. In the examples in Figures 11(a) to (c), many of the training errors L that were distributed below the threshold TH in histogram 620 are distributed above the threshold TH in histogram 630.
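The compensation argument can be made explicit with a short expected-value calculation. This is an illustrative sketch under two assumptions not stated in the text: the removal data are drawn uniformly at random from the candidate data, and the training errors are held at their first-epoch values. The symbols C (set of candidate indices), R (randomly chosen removal subset of C), and S_C are introduced here for illustration only.

```latex
S_C = \sum_{i \in C} L[i], \qquad
\mathbb{E}\Big[\sum_{i \in C \setminus R} L[i]\Big] = (1-N)\, S_C
\\[4pt]
\mathbb{E}\Big[\sum_{i \in C \setminus R} \frac{1}{1-N}\, L[i]\Big]
  = \frac{1}{1-N}\,(1-N)\, S_C = S_C
\\[4pt]
N = 0.2:\quad \frac{1}{1-0.2} = 1.25, \qquad 0.8 \times 1.25 = 1
```

Each candidate survives the random selection with probability (1-N), so scaling the surviving errors by 1 / (1-N) restores the candidate group's expected contribution to the error sum. For any single random draw the restored sum can deviate from S_C, which is consistent with the remark that some deviation may occur.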

[0075] In the first embodiment, training data E containing model input data that is relatively easy for model 110 (problems that are less likely to generate errors) is treated as unnecessary data, and a portion of it is excluded from the data used for training as removal data. This reduces the computational cost of machine learning. Since the removal data is identified using the training error L that is derived in machine learning anyway, no additional computational cost is incurred to derive it. The influence of the bias that arises from setting the removal data can be reduced through the scaling process described above. Reducing the influence of the bias suppresses deterioration in the accuracy of machine learning (the inference accuracy of the trained model).

[0076] In step S132 of Figure 8, the training error L[i] in the table TBL is treated as the evaluation training error L_EV[i]. In step S132, the controller 11 can set, as the threshold TH, a statistical value obtained by statistically processing the evaluation training errors L_EV[1] to L_EV[M]. This allows for the setting of an appropriate threshold TH that takes into account the distribution of the training errors L that were actually derived.

[0077] Specifically, for example, the statistical value may be the average value of the evaluation training errors L_EV[1] to L_EV[M]. That is, in step S132, the controller 11 may derive the average value of the evaluation training errors L_EV[1] to L_EV[M] and set the derived average value as the threshold TH. This makes it easy to distinguish between training data E whose model input data is relatively difficult for model 110 and training data E whose model input data is relatively easy for model 110. Instead of selecting data to be removed in ascending order of training error L, the average value is used as the threshold TH to identify candidate data, from which the data to be removed is set. With this approach, there is no need to sort the evaluation training errors L_EV[1] to L_EV[M] in ascending or descending order (sorting is essential in the method of selecting data to be removed in ascending order of training error L). If the training data set 21 is large, the computational cost of sorting cannot be ignored.
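Using the average as the threshold needs only one pass over the errors, as the following sketch shows. The function name `mean_threshold` and the three sample error values are made up for illustration.

```python
from statistics import fmean

def mean_threshold(errors_ev):
    """Threshold TH as the average of L_EV[1]..L_EV[M]; one pass, no sorting."""
    return fmean(errors_ev)

errors_ev = [0.2, 0.4, 0.9]            # illustrative evaluation training errors
th = mean_threshold(errors_ev)
# Candidate data under equation (1a): 1-based indices with L_EV[i] < TH.
candidates = [i for i, l_ev in enumerate(errors_ev, start=1) if l_ev < th]
```

Both the averaging and the comparison are linear in M, whereas a sort-based selection would typically cost on the order of M log M.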

[0078] Alternatively, for example, the statistical value may be the median of the evaluation training errors L_EV[1] to L_EV[M]. That is, in step S132, the controller 11 may derive the median of the evaluation training errors L_EV[1] to L_EV[M] and set the derived median as the threshold TH. This also allows for a simple distinction between training data E whose model input data is relatively difficult for model 110 and training data E whose model input data is relatively easy for model 110. However, deriving the median requires a sorting process that arranges the evaluation training errors L_EV[1] to L_EV[M] in ascending or descending order.

[0079] Alternatively, for example, the statistical value may be the value at a specific position in the array obtained by performing the sorting process described above. That is, for example, in step S132, the controller 11 generates a sorted array by performing the sorting process described above and, based on the sorted array, identifies the top Q% of the evaluation training errors L_EV[1] to L_EV[M]. Figure 12 shows an example of the generated sorted array. Here, in the sorted array, among the evaluation training errors L_EV[1] to L_EV[M], an evaluation training error L_EV with a relatively large value is assigned a higher position than an evaluation training error L_EV with a relatively small value. Q has a positive value less than 100 and is selected from a numerical range of, for example, 30 to 70. In step S132, the controller 11 can set, as the statistical value and the threshold TH, the smallest evaluation training error L_EV among the top Q% of the evaluation training errors L_EV. For example, if "M=10000" and "Q=40", the 1st to 4000th largest evaluation training errors L_EV among the evaluation training errors L_EV[1] to L_EV[M] are the top Q% of the evaluation training errors L_EV, and the 4000th largest evaluation training error L_EV can be set as the threshold TH. In the example in Figure 12, the 4000th largest evaluation training error L_EV is the evaluation training error L_EV[4960], so the value of the evaluation training error L_EV[4960] (0.92) can be set as the threshold TH.
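The top-Q% rule can be sketched with a descending sort. The function name `top_q_threshold` is hypothetical, and rounding the position down (with a minimum of 1) is an assumption for cases where M × Q / 100 is not an integer.

```python
import math

def top_q_threshold(errors_ev, q):
    """Return the smallest value among the top q% largest evaluation errors."""
    ordered = sorted(errors_ev, reverse=True)          # largest error first
    k = max(1, math.floor(len(ordered) * q / 100))     # assumed rounding rule
    return ordered[k - 1]                              # k-th largest value

th = top_q_threshold([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 40)   # 4th largest value
```

For M=10000 and Q=40 the function returns the 4000th largest error, matching the position used in the Figure 12 example.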

[0080] Alternatively, for example, in step S132, the controller 11 may use a value pre-set as a hyperparameter before executing the machine learning process as the threshold TH. This eliminates the need to derive statistical values in the machine learning process. However, in this case, a separate process to find an appropriate threshold TH may be required before the machine learning process.

[0081] <<Second Embodiment>> The second embodiment will now be described. It is empirically known that when the scale of model 110 is small (i.e., when the total number of learning parameters of model 110 is small), increasing the amount of training data containing model input data that is relatively difficult for model 110 can actually be harmful. A decrease in the inference accuracy of the trained model is one example of such harm. Taking this into consideration, training data E corresponding to training errors L greater than the threshold TH may be targeted for removal.

[0082] Figure 13 is a flowchart of the removal process according to the second embodiment. In addition to the flowchart of the removal process, Figure 13 also includes a conceptual diagram CC2 of the data classification method according to the second embodiment. The second embodiment is an embodiment obtained by modifying a part of the first embodiment, and unless otherwise specified in the second embodiment, the description of the first embodiment applies to the second embodiment as long as there is no contradiction. Figures 11(a) to (c) correspond to the technical content shown in the first embodiment and do not correspond to the second embodiment.

[0083] The removal process in Figure 13 consists of steps S131, S132, S133a and S134-S138, starting with step S131. By replacing step S133 in the removal process in Figure 8 with step S133a, the removal process in Figure 13 can be obtained. Except for this replacement, the content of the removal process is common between the first and second embodiments. The removal process in Figure 13 can be used as the removal process for step S30 in Figure 7. In this case, if the learning termination condition is not met in step S15 of Figure 6 (N of step S15), the process proceeds to step S131 in Figure 13, then to step S138 in Figure 13, and then to step S16 in Figure 7. Step S131 in Figure 13 corresponds to step S31 in Figure 7. Steps S132, S133a and S134-S137 in Figure 13 constitute steps S32 and S33 in Figure 7. Step S138 in Figure 13 corresponds to step S34 in Figure 7.

[0084] The differences from the removal process of the first embodiment will be explained. In the removal process shown in Figure 13, the process proceeds through steps S131 and S132 to step S133a. The details of the processes in steps S131 and S132 are the same as those shown in the first embodiment. The method for setting the threshold TH is also the same as that shown in the first embodiment.

[0085] In step S133a, for each training data E constituting the training data set 21, the controller 11 classifies the training data E as either candidate data or first introduction data by determining whether the corresponding evaluation training error L_EV satisfies the second inequality. The candidate data is the training data E that is a candidate for removal data; the removal data is therefore set from among the group of candidate data. The second inequality is the following formula (2a). Alternatively, the second inequality may be the following formula (2b).
L_EV > TH ···(2a)
L_EV ≧ TH ···(2b)

[0086] If the evaluation training error L_EV[i] corresponding to the training data E[i] satisfies the second inequality, the training data E[i] is classified as candidate data. If the evaluation training error L_EV[i] corresponding to the training data E[i] does not satisfy the second inequality, the training data E[i] is classified as first introduction data. Therefore, when the second inequality is formula (2a), the controller 11 in step S133a classifies the training data E[i] as candidate data when the evaluation training error L_EV[i] is greater than the threshold TH, and as first introduction data when L_EV[i] is less than or equal to the threshold TH. When the second inequality is formula (2b), the controller 11 in step S133a classifies the training data E[i] as candidate data when the evaluation training error L_EV[i] is greater than or equal to the threshold TH, and as first introduction data when L_EV[i] is less than the threshold TH. In this way, each training data E whose evaluation training error L_EV equals the threshold TH is classified as either candidate data or first introduction data, depending on whether the second inequality is formula (2a) or (2b).
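The classification of step S133a might be sketched as follows. The function name `classify_by_second_inequality`, the `inclusive` flag that switches between formulas (2a) and (2b), and the use of 0-based list indices in place of the 1-based indices E[1] to E[M] are all assumptions made for illustration.

```python
def classify_by_second_inequality(errors, th, inclusive=False):
    """Split sample indices into (candidate, first_introduction).

    inclusive=False applies formula (2a): L_EV > TH.
    inclusive=True applies formula (2b): L_EV >= TH.
    Indices are 0-based here (an assumption of this sketch).
    """
    candidate, first_introduction = [], []
    for i, err in enumerate(errors):
        satisfied = err >= th if inclusive else err > th
        if satisfied:
            candidate.append(i)
        else:
            first_introduction.append(i)
    return candidate, first_introduction
```

The only difference between the two formulas is where a sample whose error equals TH ends up, which is why a single boolean is enough to select between them in this sketch.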

[0087] After step S133a, the process proceeds to step S134, and steps S134 to S138 are executed sequentially. The details of each step S134 to S138 are as shown in the first embodiment. After step S138, the process proceeds to step S16 in Figure 7.

[0088] In the second embodiment, a portion of the training data E containing model input data that is relatively difficult for model 110 is excluded, as removal data, from the data set used for training. This reduces the computational cost of machine learning without degrading its accuracy (the inference accuracy of the trained model), especially when the scale of model 110 is small. Since the removal data is identified using the training error L that is inherently derived in machine learning, no special additional computational cost is incurred. At this time, the influence of the bias arising from the setting of the removal data can be reduced through the scaling process described above. By reducing the influence of the bias, the degradation of machine learning accuracy can be suppressed.

[0089] <<Third Example>> A third embodiment will now be described. Figure 14 is a flowchart of the removal process according to the third embodiment. The third embodiment is an embodiment obtained by modifying a part of the first embodiment, and unless otherwise specifically described in the third embodiment, the description of the first embodiment applies to the third embodiment as well, insofar as there is no contradiction. Note that Figures 11(a) to (c) correspond to the technical content shown in the first embodiment and do not correspond to the third embodiment.

[0090] The removal process in Figure 14 consists of steps S131, S132b, S133b, and S134 to S138, starting with step S131. The removal process in Figure 14 is obtained by replacing steps S132 and S133 in the removal process in Figure 8 with steps S132b and S133b. Except for this replacement, the content of the removal process is common between the first and third embodiments. The removal process in Figure 14 can be used as the removal process for step S30 in Figure 7. In this case, if the learning termination condition is not met in step S15 of Figure 6 (N in step S15), the process proceeds to step S131 in Figure 14 and, after step S138 in Figure 14, to step S16 in Figure 7. Step S131 in Figure 14 corresponds to step S31 in Figure 7. Steps S132b, S133b, and S134 to S137 in Figure 14 constitute steps S32 and S33 in Figure 7. Step S138 in Figure 14 corresponds to step S34 in Figure 7.

[0091] The differences from the removal process of the first embodiment will be described. In the removal process in Figure 14, the process proceeds to step S132b after step S131. The details of step S131 are as shown in the first embodiment.

[0092] In step S132b, the controller 11 treats each training error L in the table TBL as the evaluation training error L_EV and compares each evaluation training error L_EV with the thresholds TH1 and TH2. TH1 and TH2 are positive thresholds that satisfy "TH1 < TH2". After step S132b, the process proceeds to step S133b.

[0093] In step S133b, for each training data E constituting the training data set 21, the controller 11 classifies the training data E as either candidate data or first introduction data by determining whether the corresponding evaluation training error L_EV belongs to a specific numerical range RNG. The candidate data is the training data E that is a candidate for removal data; the removal data is therefore set from among the group of candidate data. The controller 11 determines whether each evaluation training error L_EV belongs to the specific numerical range RNG based on the results of comparing it with the thresholds TH1 and TH2. Either numerical range RNG1 or numerical range RNG2 can be used as the specific numerical range RNG. As shown in Figure 15(a), numerical range RNG1 is the range from the threshold TH1 to the threshold TH2, inclusive. As shown in Figure 15(b), numerical range RNG2 is the composite of the range at or below the threshold TH1 and the range at or above the threshold TH2.

[0094] If the evaluation training error L_EV[i] corresponding to the training data E[i] belongs to the specific numerical range RNG, the training data E[i] is classified as candidate data. If the evaluation training error L_EV[i] corresponding to the training data E[i] falls outside the specific numerical range RNG, the training data E[i] is classified as first introduction data. Therefore, when the specific numerical range RNG is the numerical range RNG1, the controller 11 in step S133b classifies the training data E[i] corresponding to the evaluation training error L_EV[i] as candidate data when "TH1≦L_EV[i]≦TH2" holds, and as first introduction data when "TH1≦L_EV[i]≦TH2" does not hold. When the specific numerical range RNG is the numerical range RNG1, "TH1≦L_EV[i]≦TH2" holding corresponds to the evaluation training error L_EV[i] belonging to the specific numerical range RNG, and "TH1≦L_EV[i]≦TH2" not holding corresponds to the evaluation training error L_EV[i] falling outside the specific numerical range RNG.

[0095] When the specific numerical range RNG is the numerical range RNG2, the controller 11 in step S133b classifies the training data E[i] corresponding to the evaluation training error L_EV[i] as candidate data when either "L_EV[i]≦TH1" or "TH2≦L_EV[i]" holds, and as first introduction data when "TH1<L_EV[i]<TH2" holds. When the specific numerical range RNG is the numerical range RNG2, "L_EV[i]≦TH1" or "TH2≦L_EV[i]" holding corresponds to the evaluation training error L_EV[i] belonging to the specific numerical range RNG, and "TH1<L_EV[i]<TH2" holding corresponds to the evaluation training error L_EV[i] falling outside the specific numerical range RNG.
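The membership test for the specific numerical range RNG can be written compactly. The sketch below follows the inequalities stated for RNG1 and RNG2; the function name `in_specific_range` and the string argument used to select the range are hypothetical.

```python
def in_specific_range(err, th1, th2, rng_kind="RNG1"):
    """Return True if err belongs to the selected numerical range.

    RNG1: TH1 <= err <= TH2 (the band between the thresholds).
    RNG2: err <= TH1 or TH2 <= err (the two outer ranges combined).
    The selector argument rng_kind is an assumption of this sketch.
    """
    if not 0 < th1 < th2:
        raise ValueError("thresholds must satisfy 0 < TH1 < TH2")
    if rng_kind == "RNG1":
        return th1 <= err <= th2
    if rng_kind == "RNG2":
        return err <= th1 or th2 <= err
    raise ValueError("rng_kind must be 'RNG1' or 'RNG2'")
```

A sample for which the function returns True would be treated as candidate data, and any other sample as first introduction data. Note that a value exactly equal to TH1 or TH2 belongs to both RNG1 and RNG2 under the stated inequalities.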

[0096] After step S133b, it proceeds to step S134, and the processes of steps S134 to S138 are sequentially executed. The processing contents of each of steps S134 to S138 are as shown in the first embodiment. After step S138, it proceeds to step S16 in FIG. 7.

[0097] Also according to the third embodiment, the calculation cost in machine learning can be reduced. By appropriately setting the specific numerical range RNG according to the scale and configuration of the model 110, etc., the deterioration of the accuracy of machine learning associated with the setting of the removal data can be suppressed. Since the removal data is specified using the training error L originally derived in machine learning, no special additional calculation cost occurs. At this time, through the above-mentioned scaling process, the influence of the bias generated along with the setting of the removal data can be reduced. By reducing the influence of the bias, the deterioration of the accuracy of machine learning can be suppressed.

[0098] Two predetermined values may be set as the thresholds TH1 and TH2. Alternatively, the first and second statistical values obtained by statistically processing the evaluation training errors L_EV[1] to L_EV[M] may be set as the thresholds TH1 and TH2.

[0099] The first and second statistical values may be the values at first and second specific positions in the sorted array obtained by the sorting process described above. That is, for example, in step S132b, the controller 11 generates a sorted array by performing the sorting process described above and, based on the sorted array, identifies the top Q1% and the top Q2% of the evaluation training errors L_EV[1] to L_EV[M]. Figure 12 shows an example of a sorted array. In the sorted array, among the evaluation training errors L_EV[1] to L_EV[M], an evaluation training error L_EV with a relatively large value is assigned a higher position than an evaluation training error L_EV with a relatively small value. Q1 and Q2 have positive values less than 100 and are selected from a numerical range of, for example, 10 to 90. Here, "Q1 > Q2" holds; for example, "Q1 = 80" and "Q2 = 20". The top Q1% of the evaluation training errors L_EV are, in other words, the evaluation training errors L_EV that remain after excluding the bottom (100-Q1)%.

[0100] In step S132b, the controller 11 may set the smallest evaluation training error L_EV among the top Q1% as the first statistical value and as the threshold TH1. Likewise, in step S132b, the controller 11 may set the smallest evaluation training error L_EV among the top Q2% as the second statistical value and as the threshold TH2. For example, if "M=10000" and "Q1=80", the 1st to 8000th largest evaluation training errors L_EV among L_EV[1] to L_EV[M] are the top Q1%, and the 8000th largest evaluation training error L_EV may be set as the threshold TH1. Similarly, if "M=10000" and "Q2=20", the 1st to 2000th largest evaluation training errors L_EV among L_EV[1] to L_EV[M] are the top Q2%, and the 2000th largest evaluation training error L_EV may be set as the threshold TH2.
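The derivation of TH1 and TH2 from two rank positions could look like the following sketch. The function name `thresholds_from_ranks` and the truncation of the counts are assumptions; the specification only states that the smallest value of the top Q1% and of the top Q2% may be used.

```python
def thresholds_from_ranks(errors, q1_percent, q2_percent):
    """Return (TH1, TH2) taken from a descending sort of the errors.

    TH1 is the smallest error in the top Q1%, TH2 the smallest in the
    top Q2%. Requires Q1 > Q2 so that TH1 <= TH2.
    """
    if not 0 < q2_percent < q1_percent < 100:
        raise ValueError("expected 0 < Q2 < Q1 < 100")
    ranked = sorted(errors, reverse=True)  # larger errors rank higher

    def smallest_of_top(q_percent):
        # Assumed: truncate the count, but keep at least one entry.
        count = max(1, int(len(ranked) * q_percent / 100))
        return ranked[count - 1]

    return smallest_of_top(q1_percent), smallest_of_top(q2_percent)
```

Because Q1 > Q2, the position used for TH1 lies further down the sorted array than the position used for TH2, so the condition "TH1 < TH2" follows whenever the two selected values differ.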

[0101] <<Fourth Example>> A fourth embodiment will now be described. Depending on the scale of Model 110, it may be appropriate to combine the method shown in the first embodiment and the method shown in the second embodiment. The technology related to this combination will be explained in the fourth embodiment.

[0102] Figure 16 is a flowchart of the removal process according to the fourth embodiment. In addition to the flowchart of the removal process, Figure 16 also includes a conceptual diagram CC4 of the data classification method according to the fourth embodiment. The removal process in Figure 16 consists of steps S131, S132, S133c, S134c, S135c, S136c, S137, and S138c, starting from step S131. The removal process in Figure 16 can be obtained by replacing steps S133-S136 and S138 in the removal process in Figure 8 with steps S133c-S136c and S138c, respectively. Except for this replacement, the content of the removal process is common between the first and fourth embodiments. The removal process in Figure 16 can be used as the removal process for step S30 in Figure 7. In this case, if the learning termination condition is not met in step S15 of Figure 6 (N of step S15), the process proceeds to step S131 of Figure 16, then to step S138 of Figure 16, and finally to step S16 of Figure 7. Step S131 of Figure 16 corresponds to step S31 of Figure 7. Steps S132, S133c to S136c and S137 of Figure 16 constitute steps S32 and S33 of Figure 7. Step S138c of Figure 16 corresponds to step S34 of Figure 7.

[0103] In the removal process in Figure 16, step S132 is performed after step S131. Steps S131 and S132 in Figure 16 are the same as steps S131 and S132 in Figure 8. The method for setting the threshold TH is also as shown in the first embodiment. In the fourth embodiment, however, the process proceeds to step S133c after step S132.

[0104] In step S133c, for each training data E constituting the training data set 21, the controller 11 classifies the training data E as either first candidate data or second candidate data by determining whether the corresponding evaluation training error L_EV satisfies the first inequality. Each candidate data is training data E that is a candidate for removal data. The removal data is therefore set from the group of first candidate data and the group of second candidate data. The first inequality used in step S133c is the above formula (1a), that is, "L_EV < TH". Alternatively, the first inequality may be the above formula (1b), that is, "L_EV ≦ TH".

[0105] If the evaluation training error L_EV[i] corresponding to the training data E[i] satisfies the first inequality, the training data E[i] is classified as first candidate data. If the evaluation training error L_EV[i] corresponding to the training data E[i] does not satisfy the first inequality, the training data E[i] is classified as second candidate data. Therefore, when the first inequality is formula (1a), the controller 11 in step S133c classifies the training data E[i] as first candidate data when the evaluation training error L_EV[i] is smaller than the threshold TH, and as second candidate data when L_EV[i] is greater than or equal to the threshold TH. When the first inequality is formula (1b), the controller 11 in step S133c classifies the training data E[i] as first candidate data when the evaluation training error L_EV[i] is less than or equal to the threshold TH, and as second candidate data when L_EV[i] is greater than the threshold TH. In this way, each training data E whose evaluation training error L_EV equals the threshold TH is classified as either first candidate data or second candidate data, depending on whether the first inequality is formula (1a) or (1b). After step S133c, the process proceeds to step S134c.

[0106] In step S134c, the controller 11 randomly extracts (Na × 100)% of the first candidate data from all the first candidate data, and randomly extracts (Nb × 100)% of the second candidate data from all the second candidate data. Na and Nb each have a positive value less than 1, preferably 0.5 or less, and are set to predetermined appropriate values. For example, (Na × 100)% is 20%, 30%, 40%, or 50%, and (Nb × 100)% is 20%, 30%, 40%, or 50%. The values of Na and Nb may be the same or different. For example, consider the case where the training data E[1] to E[5000] are classified as first candidate data, the training data E[5001] to E[10000] are classified as second candidate data, and Na = Nb = 0.2. In this case, 1,000 first candidate data, representing 20%, are randomly extracted from the 5,000 first candidate data corresponding to the training data E[1] to E[5000]. Similarly, 1,000 second candidate data, representing 20%, are randomly extracted from the 5,000 second candidate data corresponding to the training data E[5001] to E[10000]. The controller 11 generates or obtains random numbers and uses them to perform the random extraction. Known methods can be used to generate or obtain the random numbers. For example, the controller 11 may generate random numbers (pseudorandom numbers) using a pseudorandom function, or it may obtain random numbers using environmental noise. After step S134c, the process proceeds to step S135c.
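The random extraction of step S134c might be sketched as follows, using Python's standard pseudorandom generator as one of the known methods mentioned above. The function name `extract_removal_data`, the rounding of the extraction count, and the fixed seed are assumptions made so the sketch is reproducible.

```python
import random


def extract_removal_data(candidate_ids, ratio, rng):
    """Randomly pick ratio * len(candidate_ids) ids without replacement.

    ratio plays the role of Na or Nb (positive, less than 1).
    Rounding the count to the nearest integer is an assumption.
    """
    if not 0 < ratio < 1:
        raise ValueError("ratio must be positive and less than 1")
    count = round(len(candidate_ids) * ratio)
    return set(rng.sample(candidate_ids, count))


rng = random.Random(0)                  # seeded pseudorandom generator
first_candidates = list(range(1, 5001))        # E[1] to E[5000]
second_candidates = list(range(5001, 10001))   # E[5001] to E[10000]
first_removal = extract_removal_data(first_candidates, 0.2, rng)    # Na = 0.2
second_removal = extract_removal_data(second_candidates, 0.2, rng)  # Nb = 0.2
```

In this usage each call extracts 1,000 ids from its group of 5,000, matching the numbers of the worked example.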

[0107] In step S135c, the controller 11 classifies and sets each first candidate data extracted in step S134c as first removal data, and classifies and sets each second candidate data extracted in step S134c as second removal data. In step S135c, the controller 11 sets the removal flag FG corresponding to the training data E set as first removal data to "1", and sets the removal flag FG corresponding to the training data E set as second removal data to "1". The settings for the removal flag FG are reflected in the table TBL. That is, in step S135c, if the first candidate data corresponding to the training data E[i] is set as first removal data, the value of the removal flag FG[i] in the table TBL changes from "0" to "1". Similarly, in step S135c, if the second candidate data corresponding to the training data E[i] is set as second removal data, the value of the removal flag FG[i] in the table TBL changes from "0" to "1". After step S135c, proceed to step S136c.

[0108] In step S136c, the controller 11 classifies all first candidate data other than the first removal data as first introduction data, and all second candidate data other than the second removal data as second introduction data. Assume that the training data E[1] to E[5000] are classified as first candidate data and the training data E[5001] to E[10000] are classified as second candidate data. Under this assumption, consider the case where the 1,000 first candidate data corresponding to the training data E[1] to E[1000] are classified as first removal data and the 1,000 second candidate data corresponding to the training data E[5001] to E[6000] are classified as second removal data. In this case, the 4,000 first candidate data corresponding to the training data E[1001] to E[5000] are classified as first introduction data, and the 4,000 second candidate data corresponding to the training data E[6001] to E[10000] are classified as second introduction data. After step S136c, the process proceeds to step S137.

[0109] The process in step S137 in Figure 16 is the same as the process in step S137 in Figure 8. In step S137, the controller 11 sets the first introduction data and the second introduction data as reference training data for the next epoch. There are multiple items of each of the first introduction data, second introduction data, first removal data, and second removal data. Therefore, the controller 11 sets each first introduction data and each second introduction data as reference training data for the next epoch, and excludes each first removal data and each second removal data from the reference training data for the next epoch. In steps S132, S133c to S136c, and S137, each of the training data E[1] to E[M] is classified as first removal data, second removal data, or reference training data for the next epoch, based on the training error L in the table TBL.

[0110] As described above, after the completion of the j-th epoch process and before the start of the (j+1)th epoch process, the next epoch refers to the (j+1)th epoch. Therefore, in step S137, the reference training data for the (j+1)th epoch is set. As described above, the collection of reference training data for the (j+1)th epoch is denoted as reference data group P[j+1]. After step S137, the process proceeds to step S138c.

[0111] In step S138c, the controller 11 updates the scaling coefficients r. Specifically, in step S138c, the controller 11 sets the scaling coefficient r corresponding to each reference training data classified as first introduction data, among the reference training data for the (j+1)th epoch, to an updated value Va_UD. The updated value Va_UD is (1 / (1-Na)) times the reference value VAL_REF. That is, "Va_UD = VAL_REF × (1 / (1-Na))". Here, since "VAL_REF = 1", if for example "Na = 0.2", then "Va_UD = 1 × (1 / (1 - 0.2)) = 1.25". However, the updated value Va_UD may differ slightly from (1 / (1-Na)) times the reference value VAL_REF; for example, "Va_UD = ΔVa + VAL_REF × (1 / (1-Na))" is also acceptable. ΔVa has a predetermined small positive or negative value. In either case, the updated value Va_UD is larger than the reference value VAL_REF.

[0112] Furthermore, in step S138c, the controller 11 sets the scaling coefficient r corresponding to each reference training data classified as second introduction data, among the reference training data for the (j+1)th epoch, to an updated value Vb_UD. The updated value Vb_UD is (1 / (1-Nb)) times the reference value VAL_REF. That is, "Vb_UD = VAL_REF × (1 / (1-Nb))". Here, since "VAL_REF = 1", if for example "Nb = 0.2", then "Vb_UD = 1 × (1 / (1 - 0.2)) = 1.25". However, the updated value Vb_UD may differ slightly from (1 / (1-Nb)) times the reference value VAL_REF; for example, "Vb_UD = ΔVb + VAL_REF × (1 / (1-Nb))" is also acceptable. ΔVb has a predetermined small positive or negative value. In either case, the updated value Vb_UD is larger than the reference value VAL_REF.

[0113] Furthermore, in step S138c, the scaling coefficient r corresponding to each first or second removal data is maintained at the reference value VAL_REF. After step S138c, the process proceeds to step S16 in Figure 7.
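The updated values of step S138c follow directly from the stated formula, and a short sketch makes the arithmetic concrete. The function name `updated_scaling_value` and its keyword arguments are hypothetical; `delta` stands in for ΔVa or ΔVb, and the final check mirrors the statement that the updated value is larger than the reference value.

```python
def updated_scaling_value(removal_ratio, val_ref=1.0, delta=0.0):
    """Return delta + VAL_REF * (1 / (1 - removal_ratio)).

    removal_ratio plays the role of Na or Nb; delta plays the role of
    the small offset (hypothetical parameter names).
    """
    if not 0 < removal_ratio < 1:
        raise ValueError("removal_ratio must be positive and less than 1")
    value = delta + val_ref * (1.0 / (1.0 - removal_ratio))
    if value <= val_ref:
        raise ValueError("updated value must exceed the reference value")
    return value


va_ud = updated_scaling_value(0.2)  # Na = 0.2 gives approximately 1.25
vb_ud = updated_scaling_value(0.2)  # Nb = 0.2 gives approximately 1.25
```

One way to read the factor 1 / (1-Na): if 4,000 of 5,000 first candidate data remain and each is weighted by 1.25, their combined weight is again 5,000, the same as before removal.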

[0114] Thus, in the fourth embodiment, a part of the training data E having model input data that is relatively easy for the model 110 is excluded, as first removal data, from the data set used for training. In addition, a part of the training data E having model input data that is relatively difficult for the model 110 is excluded, as second removal data, from the data set used for training. By appropriately setting the values of Na and Nb, which correspond to the removal rates, it is possible to reduce the computational cost of machine learning without degrading its accuracy (the inference accuracy of the trained model). Since the removal data is specified using the training error L originally derived in machine learning, no special additional computational cost is incurred. At this time, through the above-described scaling process, the influence of the bias generated by the setting of the removal data can be reduced. By reducing the influence of the bias, the degradation of the accuracy of machine learning can be suppressed.

[0115] The values of Na and Nb may be fixed at predetermined values. Alternatively, based on the concept of curriculum learning, the controller 11 may dynamically change the values of Na and Nb as the machine learning progresses. In this case, as the variable j in the flowcharts of Figures 6 and 7 increases, the value of Na may be gradually increased and the value of Nb may be gradually decreased. For example, if "Na = Na1" and "Nb = Nb1" before the j_A-th epoch process and "Na = Na2" and "Nb = Nb2" after the j_B-th epoch process, then "Na1 < Na2" and "Nb1 > Nb2" may be set. Here, j_A and j_B are integers that satisfy "1 < j_A < j_B".
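A schedule of this kind could be sketched as below. The specification only fixes the values before the j_A-th epoch and after the j_B-th epoch; the linear interpolation between them, the function name `removal_rates`, and the tuple layout are assumptions made for illustration.

```python
def removal_rates(j, j_a, j_b, early, late):
    """Return (Na, Nb) for epoch j.

    early = (Na1, Nb1) applies before epoch j_a.
    late = (Na2, Nb2) applies after epoch j_b.
    Between j_a and j_b the values are interpolated linearly, which is
    an assumption of this sketch rather than part of the specification.
    """
    if not 1 < j_a < j_b:
        raise ValueError("expected 1 < j_A < j_B")
    if j < j_a:
        return early
    if j > j_b:
        return late
    t = (j - j_a) / (j_b - j_a)
    return tuple(a + t * (b - a) for a, b in zip(early, late))


# Example values chosen so that Na1 < Na2 and Nb1 > Nb2.
rates_early = removal_rates(1, 3, 7, (0.2, 0.5), (0.5, 0.2))
rates_late = removal_rates(9, 3, 7, (0.2, 0.5), (0.5, 0.2))
```

Any other monotonic transition, such as a step change at a chosen epoch, would satisfy the stated inequalities equally well.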

[0116] <<Fifth Example>> A fifth embodiment will be described. In the fifth embodiment, supplementary explanations are added to the above-described technology.

[0117] The controller 11 assigns a scaling coefficient r individually to each of the training data E[1] to E[M] (see Figure 5). In each epoch (i.e., in each epoch process), the controller 11 derives a training error L for each reference training data among the training data E[1] to E[M] (step S22 in Figure 6). Then, the controller 11 updates the training parameters of the model 110 based on the scaling error information obtained by scaling each derived training error L using the corresponding scaling coefficient r (steps S23 and S24 in Figure 6).

[0118] In the first epoch (i.e., the first epoch process), the controller 11 sets each of the training data E[1] to E[M] as reference training data for the first epoch (step S14 in Figure 6). In the first epoch, the controller 11 sets the scaling coefficient r for each reference training data to the reference value VAL_REF (step S12 in Figure 6). However, after the j-th epoch and before the (j+1)th epoch, the controller 11 updates the necessary scaling coefficients r through the removal process (see Figure 7). At this time, the controller 11 sets a portion of the training data E[1] to E[M] as reference training data for the (j+1)th epoch based on the latest training errors L derived for the training data E[1] to E[M] (step S33 in Figure 7). Then, the controller 11 updates the scaling coefficient r corresponding to the reference training data for the (j+1)th epoch (see Figure 7). After that, in the (j+1)th epoch, the controller 11 derives the scaling error information by multiplying the training error L derived for each reference training data for the (j+1)th epoch by the corresponding scaling coefficient r (steps S22 to S24 in Figure 6). Here, j represents any integer greater than or equal to 1.
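The scaling of step S23 amounts to a per-sample multiplication, which the following sketch illustrates. The function names `scaled_errors` and `mean_scaled_error`, the dictionary layout keyed by sample index, and in particular the averaging used to combine the scaled errors into a single value are assumptions; the specification states only that each training error L is multiplied by its scaling coefficient r.

```python
def scaled_errors(errors, coeffs, reference_ids):
    """Multiply each reference sample's training error by its coefficient.

    errors and coeffs map a sample index i to L[i] and r[i].
    Samples outside reference_ids are skipped entirely.
    """
    return {i: coeffs[i] * errors[i] for i in reference_ids}


def mean_scaled_error(scaled):
    """Combine scaled errors into one value (averaging is an assumption)."""
    return sum(scaled.values()) / len(scaled)


errors = {1: 0.4, 2: 0.8, 3: 0.2}
coeffs = {1: 1.0, 2: 1.25, 3: 1.0}
# Sample 3 is removal data in this example, so only 1 and 2 are referenced.
scaled = scaled_errors(errors, coeffs, [1, 2])
```

A parameter update would then be driven by a quantity such as `mean_scaled_error(scaled)`; how that quantity feeds an optimizer is outside the scope of this sketch.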

[0119] A portion of the training data E[1]~E[M] is set as reference training data for the (j+1)th epoch, and in the (j+1)th epoch, the training of model 110 proceeds through the derivation of the training error L for the reference training data. That is, in the (j+1)th epoch, only a portion of the training data E[1]~E[M] is used for training, and the remainder is excluded from the data used for training. This reduces the computational cost in machine learning. Since the data to be excluded from the data used for training (removed data) is identified based on the training error L that is inherently derived in machine learning, no particular additional computational cost is incurred. There is a concern that setting the removed data may introduce some bias into the data used for training. In this embodiment, the effect of bias can be reduced by scaling. By reducing the effect of bias, the deterioration of the accuracy of machine learning (inference accuracy of the trained model) can be suppressed.

[0120] Specifically, the controller 11 executes the removal process after the j-th epoch and before the (j+1)-th epoch (see Figure 7). That is, the controller 11 executes the removal process between the j-th epoch process and the (j+1)-th epoch process. Specific examples of the removal process are shown in the first to fourth embodiments. Of these, the removal methods according to the first to third embodiments (Figures 8, 13, and 14) are classified as first-type removal methods, and the removal method according to the fourth embodiment (Figure 16) is classified as second-type removal methods.

[0121] Of these, the first type of removal method (Figures 8, 13, and 14) will be explained below. For convenience of description, the removal process performed after the j-th epoch and before the (j+1)th epoch is denoted as the removal process [j:j+1]. In the removal process [j:j+1], the controller 11 treats the latest training errors L derived for the training data E[1] to E[M] as the evaluation training errors L_EV and compares the evaluation training errors L_EV[1] to L_EV[M] with a threshold (steps S132, S132b). Based on the comparison results, the controller 11 individually classifies the training data E[1] to E[M] as candidate data or first introduction data (steps S133, S133a, S133b). Furthermore, the controller 11 randomly classifies some of the candidate data as removal data and classifies the candidate data other than the removal data as second introduction data (steps S134 to S136). Because the value of M is very large, there are, except in extremely exceptional cases, multiple items of each of the removal data, first introduction data, and second introduction data. In the removal process [j:j+1], the controller 11 excludes each removal data from the reference training data for the (j+1)th epoch and sets each first introduction data and each second introduction data as reference training data for the (j+1)th epoch (step S137). In the removal process [j:j+1], the controller 11 sets the scaling coefficient r corresponding to each reference training data classified as second introduction data to an updated value V_UD, thereby raising it above the reference value VAL_REF (step S138; see state ST4 in Figure 10). For each reference training data classified as first introduction data among the reference training data for the (j+1)th epoch, the corresponding scaling coefficient r is set to the reference value VAL_REF (step S131; see state ST4 in Figure 10).
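The first type of removal process can be summarized in a short sketch. Everything named here is hypothetical: `removal_process`, the `is_candidate` predicate that stands in for the threshold comparison of whichever embodiment is used, the `ratio` of candidates to remove, and the caller-supplied `v_ud`. Resetting every coefficient to the reference value before the update is likewise an assumption about step S131.

```python
import random


def removal_process(errors, is_candidate, ratio, v_ud, rng, val_ref=1.0):
    """Return (reference_ids, coeffs) for the next epoch.

    errors: latest training errors, treated as evaluation errors.
    is_candidate: predicate on one error (threshold comparison).
    ratio: fraction of candidate data removed at random.
    v_ud: updated coefficient for second introduction data (> val_ref).
    """
    candidates = [i for i, err in enumerate(errors) if is_candidate(err)]
    removed = set(rng.sample(candidates, round(len(candidates) * ratio)))
    # Assumed reading of step S131: start from the reference value.
    coeffs = [val_ref] * len(errors)
    # Step S138: surviving candidates (second introduction data) get v_ud.
    for i in candidates:
        if i not in removed:
            coeffs[i] = v_ud
    # Step S137: everything except removal data is referenced next epoch.
    reference_ids = [i for i in range(len(errors)) if i not in removed]
    return reference_ids, coeffs
```

For example, passing `lambda e: e < 0.5` as `is_candidate` treats low-error samples as candidates, while `lambda e: e > 0.5` treats high-error samples as candidates in the manner of formula (2a).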

[0122] Setting removal data reduces the computational cost of machine learning. Because the data to be excluded from training (the removal data) is identified from the training error L, which is derived in machine learning anyway, no significant additional computational cost is incurred. There is a concern that setting removal data may introduce some bias into the data used for training. Making the scaling coefficient r of the second introduction data, which is the part of the candidate data that is incorporated into the reference training data, larger than the reference value VAL_REF reduces the impact of this bias. Reducing the impact of bias helps to suppress degradation of machine learning accuracy (the inference accuracy of the trained model).
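One way to see why a larger coefficient can offset the bias is the following short calculation. It is an illustration under assumptions, not a statement from the embodiments: C (the set of candidate data) and S (the candidate data kept as second introduction data) are notation introduced here, each candidate is assumed to be kept with probability 1-N under uniform random removal, and the coefficient is assumed to be VAL_REF / (1-N) as in claim 6.

```latex
% C: candidate data, S \subseteq C: second introduction data (kept after random removal)
% Assumptions: \Pr(i \in S) = 1 - N for every i \in C, and r[i] = VAL_REF / (1 - N) for i \in S
\mathbb{E}\!\left[\sum_{i \in S} \frac{\mathrm{VAL}_{\mathrm{REF}}}{1-N}\, L[i]\right]
  = \frac{\mathrm{VAL}_{\mathrm{REF}}}{1-N} \sum_{i \in C} \Pr(i \in S)\, L[i]
  = \mathrm{VAL}_{\mathrm{REF}} \sum_{i \in C} L[i]
```

Under these assumptions, the expected scaled error contributed by the kept candidate data equals the contribution that all candidate data would have made at the reference value.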

[0123] <<Sixth Example>> A sixth embodiment will now be described. Figure 17 shows a functional block diagram of the controller 11. The controller 11 is provided with functional blocks F1 to F6. Each of the functions of functional blocks F1 to F6 may be realized by executing a program recorded in memory 12, database 20, or any other recording medium (not shown) on the controller 11.

[0124] Functional block F1 is the training error derivation unit and performs the processing shown in step S22 of Figure 6. Functional block F2 is the scaling unit and performs the processing shown in step S23 of Figure 6 (scaling processing). Functional block F3 is the parameter update unit and performs the processing shown in step S24 of Figure 6.
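The division of work among functional blocks F1 to F3 can be illustrated with a toy example. This is a minimal sketch under assumptions not taken from the source: the model is a one-parameter linear model y = w * x, the training error is the squared error, the scaled errors are averaged over the reference training data before the gradient step, and all names (train_one_epoch, lr, and so on) are hypothetical.

```python
def train_one_epoch(w, data, reference, scaling, lr=0.01):
    """Toy sketch of steps S22 to S24 for a model y = w * x.

    data      : list of (x, y) pairs, i.e. the training data E[i]
    reference : indices of the reference training data for this epoch
    scaling   : dict mapping index i to the scaling coefficient r[i]
    Returns (updated w, per-data training errors, sum of scaled errors).
    """
    errors = {}
    scaled_total = 0.0
    grad = 0.0
    for i in reference:
        x, y = data[i]
        residual = w * x - y
        errors[i] = residual ** 2                 # F1 / S22: training error L[i]
        scaled_total += scaling[i] * errors[i]    # F2 / S23: scale by r[i]
        grad += scaling[i] * 2.0 * residual * x   # gradient of r[i] * L[i] w.r.t. w
    # F3 / S24: update the parameter using the mean scaled gradient (assumed)
    w_new = w - lr * grad / len(reference)
    return w_new, errors, scaled_total
```

The per-data errors returned here are the values that a subsequent removal process would treat as evaluation training errors.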

[0125] Functional blocks F4, F5, and F6 are the data classification unit, the reference training data setting unit, and the scaling coefficient setting unit, respectively. The data classification unit F4, the setting unit F5, and the setting unit F6 execute the removal process in step S30 of Figure 7. In the removal process of Figure 8, the data classification unit F4 performs steps S132 to S136, the setting unit F5 performs step S137, and the setting unit F6 performs steps S131 and S138. In the removal process of Figure 13, the data classification unit F4 performs steps S132, S133a, and S134 to S136, the setting unit F5 performs step S137, and the setting unit F6 performs steps S131 and S138. In the removal process of Figure 14, the data classification unit F4 performs steps S132b, S133b, and S134 to S136, the setting unit F5 performs step S137, and the setting unit F6 performs steps S131 and S138. In the removal process of Figure 16, the data classification unit F4 performs steps S132 and S133c to S136c, the setting unit F5 performs step S137, and the setting unit F6 performs steps S131 and S138c.

[0126] <<Seventh Example>> A seventh embodiment will be described.

[0127] A trained model (i.e., a model 110 having learned parameters after undergoing a machine learning process) may be applied to an in-vehicle device (not shown). An in-vehicle device is a type of electronic device mounted in a vehicle such as an automobile. The trained model may be used to perform predetermined inferences in the in-vehicle device, and the inference results may be used for autonomous driving or driver assistance that can be implemented in the vehicle. The learning device 10 itself may also be an in-vehicle device.

[0128] If the training data set 21 is an image dataset 21a (see Figure 3), each training image may be an image taken by an in-vehicle camera. The in-vehicle camera is a camera installed in any vehicle and may be a camera that constitutes a drive recorder or a camera connected to a drive recorder. The ground truth data in the training data may be generated using an existing object detector (not shown) or generated manually by a human. An existing image dataset for machine learning may be used as the image dataset 21a when executing the machine learning according to this embodiment.

[0129] The learning device 10 and the present invention embodied in the learning device 10 can also be applied to any application other than in-vehicle applications.

[0130] While the embodiments mainly describe building an object detection model 110 through machine learning, the task (inference) performed by the model 110 is arbitrary and may be a clustering task, a generative task, a natural language processing task, a regression task, a reinforcement learning task, etc. Therefore, for example, a trained model (i.e., the model 110 with learned parameters after the machine learning process) could be a generative AI that generates text or images. AI is an abbreviation for Artificial Intelligence.

[0131] A program that causes a computer device to execute any method described in each embodiment of the present invention, and a non-volatile recording medium on which such program is recorded, are included within the scope of the embodiments of the present invention. The program that causes a computer (computer device) to execute any method described in the embodiments of the present invention may be a subprogram incorporated into any main program or called by any main program. The learning device 10 includes a computer capable of executing any program. The controller 11 itself or the arithmetic processing unit provided in the controller 11 may be considered a computer. In the machine learning process, the method executed by the learning device 10 (controller 11) can be called a learning method, and the program that causes a computer to execute such learning method can be called a learning program. Any processing in the embodiments of the present invention may be realized by hardware such as semiconductor integrated circuits, software corresponding to the above program, or a combination of hardware and software.

[Explanation of symbols]

[0132]
10 Learning device
11 Controller
12 Memory
13 Communication unit
20 Database
21 Training data set
21a Image dataset
110 Model
E, E[i] Training data
L, L[i] Training error
r, r[i] Scaling coefficient
FG, FG[i] Removal flag
F1 Training error derivation unit
F2 Scaling unit
F3 Parameter update unit
F4 Data classification unit
F5 Reference training data setting unit
F6 Scaling coefficient setting unit

Claims

1. A learning device that performs machine learning on a target model for multiple epochs based on a training data set, the learning device comprising a controller, wherein: the controller assigns a scaling coefficient to each individual training data constituting the training data set; in each epoch, the controller derives a training error for each reference training data set from among all the training data, and updates the training parameters of the target model based on scaling error information obtained by scaling each derived training error using the corresponding scaling coefficient; in the first epoch, the controller sets each of the training data as reference training data for the first epoch and sets the scaling coefficient for each reference training data to a reference value; after the j-th epoch and before the (j+1)-th epoch, the controller, based on the latest training error derived for each training data constituting the training data set, sets a portion of all the training data as reference training data for the (j+1)-th epoch and updates the scaling coefficients corresponding to the reference training data for the (j+1)-th epoch; j represents an integer greater than or equal to 1; and in the (j+1)-th epoch, the controller derives the scaling error information by multiplying each training error derived in the (j+1)-th epoch by the corresponding scaling coefficient.

2. The learning device according to claim 1, wherein the controller, after the j-th epoch and before the (j+1)-th epoch: classifies each training data constituting the training data set as candidate data or first introduction data by comparing, with a threshold, each of multiple evaluation training errors, which are the latest training errors derived for the respective training data constituting the training data set; randomly classifies a portion of all the candidate data as removal data and classifies the remaining candidate data as second introduction data; excludes each removal data from the reference training data for the (j+1)-th epoch and sets each first introduction data and each second introduction data as reference training data for the (j+1)-th epoch; and, for the (j+1)-th epoch, sets the scaling coefficient corresponding to each reference training data classified as first introduction data to the reference value and sets the scaling coefficient corresponding to each reference training data classified as second introduction data to a value greater than the reference value.

3. The learning device according to claim 2, wherein the controller, after the j-th epoch and before the (j+1)-th epoch, classifies, from among all the training data, training data corresponding to evaluation training errors smaller than the threshold as candidate data, training data corresponding to evaluation training errors larger than the threshold as first introduction data, and training data corresponding to evaluation training errors equal to the threshold as either candidate data or first introduction data.

4. The learning device according to claim 2, wherein the controller, after the j-th epoch and before the (j+1)-th epoch, classifies, from among all the training data, training data corresponding to evaluation training errors larger than the threshold as candidate data, training data corresponding to evaluation training errors smaller than the threshold as first introduction data, and training data corresponding to evaluation training errors equal to the threshold as either candidate data or first introduction data.

5. The learning device according to claim 2, wherein the controller, after the j-th epoch and before the (j+1)-th epoch: determines, by comparing each evaluation training error with multiple thresholds, whether each evaluation training error falls within a specific numerical range; and classifies, from among all the training data, training data corresponding to evaluation training errors that fall within the specific numerical range as candidate data and training data corresponding to evaluation training errors outside the specific numerical range as first introduction data.

6. The learning device according to any one of claims 2 to 5, wherein the controller, after the j-th epoch and before the (j+1)-th epoch: classifies (N × 100)% of all the candidate data as the removal data; and sets the scaling coefficient corresponding to each reference training data classified as second introduction data to 1 / (1-N) times the reference value, where N has a positive value less than 1.

7. The learning device according to claim 1, wherein the controller, after the j-th epoch and before the (j+1)-th epoch: classifies each training data constituting the training data set as either first candidate data or second candidate data by comparing, with a threshold, each of multiple evaluation training errors, which are the latest training errors derived for the respective training data constituting the training data set; randomly classifies a portion of all the first candidate data as first removal data and classifies the first candidate data other than the first removal data as first introduction data; randomly classifies a portion of all the second candidate data as second removal data and classifies the second candidate data other than the second removal data as second introduction data; excludes each first removal data and each second removal data from the reference training data for the (j+1)-th epoch and sets each first introduction data and each second introduction data as reference training data for the (j+1)-th epoch; and, for the (j+1)-th epoch, sets the scaling coefficient corresponding to each reference training data classified as first introduction data and the scaling coefficient corresponding to each reference training data classified as second introduction data to values greater than the reference value.

8. The learning device according to claim 7, wherein the controller, after the j-th epoch and before the (j+1)-th epoch, classifies, from among all the training data, training data corresponding to evaluation training errors smaller than the threshold as first candidate data, training data corresponding to evaluation training errors larger than the threshold as second candidate data, and training data corresponding to evaluation training errors equal to the threshold as either first candidate data or second candidate data.

9. The learning device according to claim 7 or 8, wherein the controller, after the j-th epoch and before the (j+1)-th epoch: classifies (Na × 100)% of all the first candidate data as the first removal data; sets the scaling coefficient corresponding to each reference training data classified as first introduction data to 1 / (1-Na) times the reference value; classifies (Nb × 100)% of all the second candidate data as the second removal data; and sets the scaling coefficient corresponding to each reference training data classified as second introduction data to 1 / (1-Nb) times the reference value, where Na and Nb each have a positive value less than 1.

10. The learning device according to any one of claims 2 to 4, 7, and 8, wherein the controller sets, as the threshold, a statistical value obtained by statistically processing the multiple evaluation training errors.

11. The learning device according to any one of claims 2 to 4, 7, and 8, wherein the controller sets, as the threshold, the average value of the multiple evaluation training errors.

12. A learning method in which a controller of a learning device performs machine learning on a target model for multiple epochs based on a training data set, wherein: the controller assigns a scaling coefficient to each individual training data constituting the training data set; in each epoch, the controller derives a training error for each reference training data set from among all the training data, and updates the training parameters of the target model based on scaling error information obtained by scaling each derived training error using the corresponding scaling coefficient; in the first epoch, the controller sets each of the training data as reference training data for the first epoch and sets the scaling coefficient for each reference training data to a reference value; after the j-th epoch and before the (j+1)-th epoch, the controller, based on the latest training error derived for each training data constituting the training data set, sets a portion of all the training data as reference training data for the (j+1)-th epoch and updates the scaling coefficients corresponding to the reference training data for the (j+1)-th epoch; j represents an integer greater than or equal to 1; and in the (j+1)-th epoch, the controller derives the scaling error information by multiplying each training error derived in the (j+1)-th epoch by the corresponding scaling coefficient.

13. A learning program that causes the controller to execute the learning method described in claim 12.
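For illustration only, the second-type removal recited in claims 7 to 9 might be sketched as follows. This is a hedged sketch, not the claimed implementation: the function and variable names are hypothetical, VAL_REF is assumed to be 1.0, the threshold is assumed to be the mean of the evaluation training errors (claim 11), and data with errors below the threshold are assumed to be the first candidate data (claim 8).

```python
import random

VAL_REF = 1.0  # assumed reference value of the scaling coefficient


def removal_process_second_type(errors, na, nb, rng=None):
    """Hypothetical sketch: thin out both candidate groups and rescale both.

    errors : evaluation training errors, one per training data
    na, nb : removal fractions Na and Nb (each a positive value below 1)
    Returns (reference_indices, scaling) for the (j+1)-th epoch.
    """
    rng = rng or random.Random()
    threshold = sum(errors) / len(errors)  # assumption: mean error as threshold

    first_cand = [i for i, e in enumerate(errors) if e < threshold]
    second_cand = [i for i, e in enumerate(errors) if e >= threshold]

    scaling = {}
    for group, fraction in ((first_cand, na), (second_cand, nb)):
        # Randomly pick the removal data within this candidate group
        removed = set(rng.sample(group, int(len(group) * fraction)))
        # Remaining data become introduction data, scaled by 1 / (1 - fraction)
        for i in group:
            if i not in removed:
                scaling[i] = VAL_REF / (1.0 - fraction)

    reference = sorted(scaling)
    return reference, scaling
```

Unlike the first-type sketch, no group keeps the reference value here, since data are removed from both groups and both sets of introduction data receive coefficients above the reference value.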