Data deletion system, data deletion method, and data deletion program
Patent Information
- Authority / Receiving Office
- WO
- Patent Type
- Applications
- Current Assignee / Owner
- DENSO CORP
- Filing Date
- 2024-12-17
- Publication Date
- 2026-06-25
Figure: abstract drawing (JP2024044617)
Abstract
Claims
1. A data deletion system for deleting some data from a dataset used to train a machine learning model, comprising: an information acquisition unit (20); and an information processing unit (30), wherein the information acquisition unit acquires the dataset, initial model parameters obtained by training the machine learning model using the dataset, the range of variation of weight components corresponding to the variation in the data distribution under covariate shift, and a target number of data; the target number of data is set to be smaller than the number of data remaining when data is deleted from the dataset while the performance of the machine learning model is maintained; and the information processing unit includes a deletion unit (32) that identifies, within the range of variation of the weight components, the weight component that maximizes the upper limit value of the range of variation of the model parameters obtained by training the machine learning model when data is deleted from the dataset, leaves the target number of data as non-deletion targets, and deletes the data other than the non-deletion targets.
2. The data deletion system according to claim 1, wherein the model parameter that is the dual variable of the initial model parameters is defined as the initial dual model parameter; the difference between the evaluation value obtained by applying the initial model parameters to the weighted loss function of the non-deletion data and the evaluation value obtained by applying the initial dual model parameter to the weighted loss function of the non-deletion data is defined as the dual gap; and the information processing unit leaves, as the target number of non-deletion data, data that reduce the dual gap.
3. The data deletion system according to claim 1 or 2, wherein the information processing unit includes an evaluation unit (34) that evaluates, within the range of variation of the weight components, the performance of the machine learning model trained using the non-deletion data.
4. A data deletion system for deleting some data from a dataset used to train a machine learning model, comprising: an information acquisition unit (20); and an information processing unit (30), wherein the information acquisition unit acquires the dataset, initial model parameters obtained by training the machine learning model using the dataset, the range of variation of weight components corresponding to the variation of the data distribution under covariate shift, and a target performance of the machine learning model; the information processing unit includes: a deletion unit (32) that identifies, within the range of variation of the weight components, the weight component that maximizes the upper limit value of the range of variation of the model parameters obtained by training the machine learning model when data is deleted from the dataset, leaves, as non-deletion data, a predetermined number of data for which the upper limit value under the identified weight component is small, and deletes the data other than the non-deletion data; and an evaluation unit (34) that evaluates, within the range of variation of the weight components, the performance of the machine learning model trained using the non-deletion data; and the deletion unit selects the non-deletion data such that the number of deleted data is maximized within the range in which the evaluation result of the evaluation unit satisfies the target performance.
5. A data deletion method for deleting some data from a dataset used to train a machine learning model, comprising: obtaining the dataset, initial model parameters obtained by training the machine learning model using the dataset, the range of variation of weight components corresponding to fluctuations in the data distribution under covariate shift, and a target number of data; identifying, within the range of variation of the weight components, the weight component that maximizes the upper limit of the range of variation of the model parameters obtained by training the machine learning model when data is deleted from the dataset; identifying the weight component that maximizes the upper limit as the number of data used to train the machine learning model changes; and retaining, as non-deletion data, the target number of data for which the upper limit under the identified weight component becomes smaller, and deleting the data other than the non-deletion data, wherein the target number of data is smaller than the number of data remaining when data is deleted from the dataset while the performance of the machine learning model is maintained.
6. A data deletion method for deleting some data from a dataset used to train a machine learning model, comprising: obtaining the dataset, initial model parameters obtained by training the machine learning model using the dataset, the range of variation of weight components corresponding to fluctuations in the data distribution under covariate shift, and a target performance of the machine learning model; identifying, within the range of variation of the weight components, the weight component that maximizes the upper limit of the range of variation of the model parameters obtained by training the machine learning model when data is deleted from the dataset; retaining, as non-deletion data, a predetermined number of data for which the upper limit under the identified weight component is small, and deleting the data other than the non-deletion data as deletion data; and evaluating, within the range of variation of the weight components, the performance of the machine learning model trained using the non-deletion data, wherein, in retaining the non-deletion data, the non-deletion data are selected such that the number of deleted data is maximized while the evaluation result of the machine learning model satisfies the target performance.
7. A data deletion program for removing some data from a dataset used to train a machine learning model, the program causing a computer to execute: an acquisition process of acquiring the dataset, initial model parameters obtained by training the machine learning model using the dataset, the range of variation of weight components corresponding to fluctuations in the data distribution under covariate shift, and a target number of data set to be smaller than the number of data remaining when data is removed from the dataset while the performance of the machine learning model is maintained; and a deletion process of identifying, within the range of variation of the weight components, the weight component that maximizes the upper limit value of the range of variation of the model parameters obtained by training the machine learning model when data is removed from the dataset, leaving the target number of data as non-deletion data, and deleting the data other than the non-deletion data.
8. A data deletion program for removing some data from a dataset used to train a machine learning model, the program causing a computer to execute: an acquisition process of acquiring the dataset, initial model parameters obtained by training the machine learning model using the dataset, the range of variation of weight components corresponding to fluctuations in the data distribution under covariate shift, and a target performance of the machine learning model; a deletion process of identifying, within the range of variation of the weight components, the weight component that maximizes the upper limit of the range of variation of the model parameters obtained by training the machine learning model when data is removed from the dataset, keeping, as non-deletion targets, a predetermined number of data for which the upper limit under the identified weight component is small, and deleting the data other than the non-deletion targets; and an evaluation process of evaluating, within the range of variation of the weight components, the performance of the machine learning model trained using the non-deletion targets, wherein the deletion process selects the non-deletion targets such that the number of deleted data is maximized while the evaluation result of the evaluation process satisfies the target performance.
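Claims 1 and 7 leave open how the "upper limit" is computed and which samples are kept. The following is a minimal sketch under assumptions that are not in the claims: the model is a weighted ridge regression with regularization strength `lam`, the range of variation of the weight components is a box `[lo_i, hi_i]` of per-sample weights, and the upper limit is taken to be the radius ||β̂ − (1/λ) Σ_{i kept} c_i r_i x_i||, where r_i = y_i − x_iᵀβ̂ are the residuals of the initial model. That radius is convex in the weights, so its maximum over the box sits at a vertex; the sketch enumerates all 2^n vertices, which is only practical for tiny datasets. Samples are then kept greedily, following the retention rule of claim 5 (keep data whose upper limit becomes smaller). The functions `radius`, `worst_case_weights`, and `select_keep` are hypothetical.

```python
from itertools import product
from math import sqrt


def radius(X, y, beta, lam, c, keep):
    """Hypothetical upper limit on how far retrained parameters can move:
    ||beta - (1/lam) * sum_{i in keep} c_i * r_i * x_i||, with r_i = y_i - x_i . beta."""
    d = len(beta)
    v = [0.0] * d
    for i in keep:
        r = y[i] - sum(X[i][j] * beta[j] for j in range(d))  # residual of initial model
        for j in range(d):
            v[j] += c[i] * r * X[i][j]
    return sqrt(sum((beta[j] - v[j] / lam) ** 2 for j in range(d)))


def worst_case_weights(X, y, beta, lam, lo, hi, keep):
    """Vertex of the weight box [lo, hi] that maximizes the radius.
    The radius is convex in c, so checking the 2**n vertices is exact (small n only)."""
    best_c, best_r = None, -1.0
    for c in product(*zip(lo, hi)):  # every combination of lo_i / hi_i
        rad = radius(X, y, beta, lam, c, keep)
        if rad > best_r:
            best_c, best_r = list(c), rad
    return best_c, best_r


def select_keep(X, y, beta, lam, lo, hi, k):
    """Identify the worst-case weights once, then greedily keep k samples."""
    n = len(X)
    c, _ = worst_case_weights(X, y, beta, lam, lo, hi, range(n))
    keep = []
    for _ in range(k):
        # Add the sample whose inclusion yields the smallest radius under c.
        candidates = [i for i in range(n) if i not in keep]
        best = min(candidates, key=lambda i: radius(X, y, beta, lam, c, keep + [i]))
        keep.append(best)
    deleted = [i for i in range(n) if i not in keep]
    return sorted(keep), deleted


# Toy usage: 1-D ridge regression, initial parameters fit on the full dataset.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.1, 1.9, 3.2, 3.9]
lam = 1.0
beta = [sum(x[0] * t for x, t in zip(X, y)) / (sum(x[0] ** 2 for x in X) + lam)]
lo, hi = [0.5] * 4, [1.5] * 4
keep, deleted = select_keep(X, y, beta, lam, lo, hi, k=2)
print("keep:", keep, "delete:", deleted)
```

Greedy addition is a heuristic chosen here for brevity; the claims do not say how the kept set is searched.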
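Claim 2 defines the dual gap only abstractly. As an illustration, and assuming a weighted ridge loss that the claims do not specify, let c_i be the weight of sample i, m_i ∈ {0, 1} indicate whether it is kept, β̂ the initial model parameters, and α̂_i = y_i − x_iᵀβ̂ the initial dual variables. The primal and dual objectives, the resulting gap, and the ball it implies for the retrained parameters β*_{c,m} are:

```latex
\begin{aligned}
P_{c,m}(\beta) &= \sum_{i=1}^{n} c_i m_i\,\tfrac{1}{2}\bigl(y_i - x_i^{\top}\beta\bigr)^2 + \tfrac{\lambda}{2}\lVert\beta\rVert^2,\\
D_{c,m}(\alpha) &= \sum_{i=1}^{n} c_i m_i\bigl(y_i\alpha_i - \tfrac{1}{2}\alpha_i^2\bigr) - \tfrac{1}{2\lambda}\Bigl\lVert\sum_{i=1}^{n} c_i m_i\,\alpha_i x_i\Bigr\rVert^2,\\
G_{c,m} &= P_{c,m}(\hat\beta) - D_{c,m}(\hat\alpha)
         = \tfrac{1}{2\lambda}\Bigl\lVert\lambda\hat\beta - \sum_{i=1}^{n} c_i m_i\,\hat\alpha_i x_i\Bigr\rVert^2,
         \qquad \hat\alpha_i = y_i - x_i^{\top}\hat\beta,\\
\lVert\beta^{*}_{c,m} - \hat\beta\rVert &\le \sqrt{2G_{c,m}/\lambda}
         = \Bigl\lVert\hat\beta - \tfrac{1}{\lambda}\sum_{i=1}^{n} c_i m_i\,\hat\alpha_i x_i\Bigr\rVert.
\end{aligned}
```

The gap vanishes at c_i = m_i = 1 because λβ̂ = Σ_i α̂_i x_i is the ridge optimality condition, and the last inequality combines the λ-strong convexity of P_{c,m} with weak duality, P_{c,m}(β*_{c,m}) ≥ D_{c,m}(α̂). In this toy setting, keeping data that reduce the dual gap means choosing samples whose terms c_i α̂_i x_i bring the sum closer to λβ̂.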
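Claims 3, 4, 6 and 8 evaluate performance "within the range of variation of the weight components" without saying how. A hedged sketch, assuming the model is refit as a ridge regression on the retained (non-deletion) samples and scored by its worst-case weighted mean squared error over a box `[lo_i, hi_i]` of per-sample weights on the full dataset. For a weighted mean Σ c_i ℓ_i / Σ c_i the maximizer assigns hi_i to the largest losses and lo_i to the rest, so scanning the n + 1 thresholds of the sorted losses is exact (this needs lo_i ≥ 0). The names `fit_ridge`, `worst_case_weighted_mse`, and `evaluate` are hypothetical.

```python
def fit_ridge(X, y, lam, keep):
    """Ridge regression on kept samples: solve (X_k^T X_k + lam*I) beta = X_k^T y_k."""
    d = len(X[0])
    A = [[(lam if j == k else 0.0) + sum(X[i][j] * X[i][k] for i in keep)
          for k in range(d)] for j in range(d)]
    b = [sum(X[i][j] * y[i] for i in keep) for j in range(d)]
    # Gaussian elimination with partial pivoting; A is positive definite for lam > 0.
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for k in range(col, d):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    beta = [0.0] * d
    for r in reversed(range(d)):  # back substitution
        beta[r] = (b[r] - sum(A[r][k] * beta[k] for k in range(r + 1, d))) / A[r][r]
    return beta


def worst_case_weighted_mse(losses, lo, hi):
    """Max over c in [lo, hi] of sum(c_i * l_i) / sum(c_i), via sorted thresholds."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    best = float("-inf")
    for t in range(len(order) + 1):
        # The t largest losses get their upper weight, the rest their lower weight.
        c = list(lo)
        for i in order[:t]:
            c[i] = hi[i]
        total = sum(c)
        if total > 0:
            best = max(best, sum(ci * li for ci, li in zip(c, losses)) / total)
    return best


def evaluate(X, y, lam, keep, lo, hi):
    """Refit on the kept samples, then score worst-case weighted MSE on all samples."""
    beta = fit_ridge(X, y, lam, keep)
    losses = [(y[i] - sum(a * b for a, b in zip(X[i], beta))) ** 2 for i in range(len(X))]
    return worst_case_weighted_mse(losses, lo, hi)


X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.1, 1.9, 3.2, 3.9]
lam = 1.0
lo, hi = [0.5] * 4, [1.5] * 4
print("worst-case weighted MSE:", evaluate(X, y, lam, [0, 2], lo, hi))
```

Lower is better for this score; a caller comparing it with a target performance would check `score <= target`.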
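The closing clause of claims 4, 6 and 8 describes a constrained search: delete as much as possible while the evaluation still meets the target performance. A minimal sketch, assuming the selection and evaluation steps are supplied as callables (hypothetical `select_keep(k)` and `evaluate(keep)`, where a higher score means better performance) and making no monotonicity assumption, so every kept-set size is tried from smallest to largest:

```python
from typing import Callable, List, Optional, Sequence, Tuple


def max_deletion(
    n: int,
    select_keep: Callable[[int], Sequence[int]],
    evaluate: Callable[[Sequence[int]], float],
    target: float,
) -> Optional[Tuple[List[int], List[int]]]:
    """Return (keep, deleted) with the fewest kept samples whose score meets target."""
    for k in range(1, n + 1):  # fewer kept samples means more deleted samples
        keep = sorted(select_keep(k))
        if evaluate(keep) >= target:
            kept = set(keep)
            return keep, [i for i in range(n) if i not in kept]
    return None  # even the full dataset misses the target


# Toy usage with stand-in callables: the score grows with the number kept.
result = max_deletion(10, lambda k: list(range(k)), lambda keep: len(keep) / 10, 0.25)
print(result)
```

If the score were known to be monotone in the number of kept samples, a binary search over `k` could replace the linear scan. For an error metric where lower is better, passing its negation as `evaluate` and the negated threshold as `target` keeps the same logic.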
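Claim 5 also identifies the worst-case weight component again as the number of training data changes. A hedged sketch of that loop, with its own illustrative assumptions (weighted ridge regression, a box `[lo_i, hi_i]` of per-sample weights, and the radius ||β̂ − (1/λ) Σ_{i kept} c_i r_i x_i|| with residuals r_i = y_i − x_iᵀβ̂ standing in for the upper limit): after every sample added to the kept set, the vertex search is rerun on the current kept set, and the next sample is chosen to minimize the radius under the newly identified weights. The names `select_keep_adaptive`, `radius`, and `worst_case_weights` are hypothetical, and vertex enumeration limits this to very small n.

```python
from itertools import product
from math import sqrt


def radius(X, y, beta, lam, c, keep):
    """Hypothetical upper limit: ||beta - (1/lam) * sum_{i in keep} c_i * r_i * x_i||."""
    d = len(beta)
    v = [0.0] * d
    for i in keep:
        r = y[i] - sum(X[i][j] * beta[j] for j in range(d))  # residual of initial model
        for j in range(d):
            v[j] += c[i] * r * X[i][j]
    return sqrt(sum((beta[j] - v[j] / lam) ** 2 for j in range(d)))


def worst_case_weights(X, y, beta, lam, lo, hi, keep):
    """Vertex of the box [lo, hi] maximizing the (convex) radius; exact but O(2**n)."""
    best_c, best_r = None, -1.0
    for c in product(*zip(lo, hi)):
        rad = radius(X, y, beta, lam, c, keep)
        if rad > best_r:
            best_c, best_r = list(c), rad
    return best_c, best_r


def select_keep_adaptive(X, y, beta, lam, lo, hi, k):
    """Greedy retention that re-identifies the worst-case weights as the kept set grows."""
    n = len(X)
    keep = []
    c, _ = worst_case_weights(X, y, beta, lam, lo, hi, range(n))  # start from full data
    history = [c]
    for _ in range(k):
        candidates = [i for i in range(n) if i not in keep]
        best = min(candidates, key=lambda i: radius(X, y, beta, lam, c, keep + [i]))
        keep.append(best)
        # The training-set size changed, so identify the worst-case weights again.
        c, _ = worst_case_weights(X, y, beta, lam, lo, hi, keep)
        history.append(c)
    deleted = [i for i in range(n) if i not in keep]
    return sorted(keep), deleted, history


X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.1, 1.9, 3.2, 3.9]
lam = 1.0
# Initial parameters: closed-form 1-D ridge fit on the full dataset.
beta = [sum(x[0] * t for x, t in zip(X, y)) / (sum(x[0] ** 2 for x in X) + lam)]
keep, deleted, history = select_keep_adaptive(X, y, beta, lam, [0.5] * 4, [1.5] * 4, k=2)
print("keep:", keep, "delete:", deleted, "worst-case weights per step:", history)
```

Each step repeats a full vertex search, so a practical version would need a cheaper maximizer of the upper limit; the claims do not specify one.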
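Claim 7 frames the method as a program with an acquisition process feeding a deletion process. One way to sketch the acquisition side is a container that checks the acquired items are mutually consistent before deletion runs. The class `AcquiredInputs`, the function `deletion_process`, and in particular the field `reference_count` (standing in for "the number of data remaining when data is removed while the performance is maintained") are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class AcquiredInputs:
    """Items obtained by the acquisition process (all names are illustrative)."""
    X: Sequence[Sequence[float]]     # dataset features, one row per sample
    y: Sequence[float]               # dataset targets
    initial_params: Sequence[float]  # parameters trained on the full dataset
    weight_lo: Sequence[float]       # lower end of each weight component's range
    weight_hi: Sequence[float]       # upper end of each weight component's range
    target_count: int                # number of samples to keep (non-deletion data)
    reference_count: int             # hypothetical: count at which performance is still maintained

    def __post_init__(self) -> None:
        n = len(self.X)
        if not (len(self.y) == len(self.weight_lo) == len(self.weight_hi) == n):
            raise ValueError("dataset, targets and weight ranges must have equal length")
        if any(len(row) != len(self.initial_params) for row in self.X):
            raise ValueError("feature dimension must match the initial parameters")
        if any(not 0.0 <= lo <= hi for lo, hi in zip(self.weight_lo, self.weight_hi)):
            raise ValueError("each weight range must satisfy 0 <= lo <= hi")
        # Claim 7: the target number is set below the performance-preserving count.
        if not 0 < self.target_count < self.reference_count <= n:
            raise ValueError("require 0 < target_count < reference_count <= len(X)")


def deletion_process(inputs: AcquiredInputs,
                     choose_keep: Callable[[AcquiredInputs], Sequence[int]]) -> List[int]:
    """Keep the samples picked by a (hypothetical) selection rule; return the deleted ones."""
    keep = set(choose_keep(inputs))
    if len(keep) != inputs.target_count:
        raise ValueError("selection rule must return exactly target_count indices")
    return [i for i in range(len(inputs.X)) if i not in keep]


inputs = AcquiredInputs(X=[[1.0], [2.0], [3.0]], y=[1.0, 2.0, 3.0], initial_params=[0.9],
                        weight_lo=[0.5] * 3, weight_hi=[1.5] * 3,
                        target_count=1, reference_count=2)
print(deletion_process(inputs, lambda inp: [0]))
```

Validating at construction time means an inconsistent acquisition fails before any model is retrained.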