Model continuous learning optimization method, electronic device, and storage medium

By acquiring an initial sample dataset and a newly added sample dataset, the core gradient space of the diagnostic model is determined and the model is optimized using gradient transition values. This solves the catastrophic forgetting problem of deep learning models under newly distributed data and improves model compatibility and diagnostic performance without increasing training cost or workload.

CN116029355B · Active · Publication Date: 2026-06-26 · SOUTHERN UNIVERSITY OF SCIENCE AND TECHNOLOGY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
SOUTHERN UNIVERSITY OF SCIENCE AND TECHNOLOGY
Filing Date
2023-01-10
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing deep learning models cannot be effectively updated when faced with new data distributions, which leads to catastrophic forgetting. Furthermore, existing training methods are costly and labor-intensive in clinical applications and struggle to remain compatible with different data distributions.

Method used

By acquiring the initial sample dataset and the newly added sample dataset, the core gradient space of the diagnostic model is determined, and the model is optimized using gradient transition values to avoid catastrophic forgetting, maintain the diagnostic effectiveness on historical data, and improve the diagnostic capability on new data.

Benefits of technology

Without increasing training costs and workload, this method enhances the model's compatibility with different data, meets clinical needs, and improves the overall optimization effect of the diagnostic model.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides a model continuous learning optimization method, an electronic device and a storage medium. By inputting an initial sample dataset into a diagnostic model, the core gradient space of the initial sample dataset during training of the diagnostic model is obtained. When the model is subsequently optimized with a newly added sample dataset, the diagnostic effect on historical data is maintained, the diagnostic capability on new data is improved, catastrophic forgetting in the diagnostic model is avoided, and the training cost and workload of the model are not increased. The compatibility of the model with different data is therefore improved while clinical requirements are met. A new sample dataset is obtained from the initial sample dataset and the newly added sample dataset. A gradient transition value is obtained from the new sample dataset, the core gradient space and the newly added sample dataset. The target gradient of the diagnostic model is determined from the gradient transition value, so that the direction of diagnostic model optimization can be distinguished and the overall optimization effect of the diagnostic model is improved.

Description

Technical Field

[0001] This application relates to, but is not limited to, the field of artificial intelligence technology, and particularly to a method for continuous model learning optimization, an electronic device, and a storage medium.

Background Technology

[0002] Currently, most deep learning models cannot dynamically update according to the environment. Therefore, when data from a new distribution is input, deep learning models need to be retrained to adapt to the change in distribution. While training the model on all historical data can achieve the best results across the overall data distribution, this training method is impractical in real-world scenarios. Training the model only on the newly distributed data, on the other hand, can lead to catastrophic forgetting, where the model learns to process new data but forgets old data. Specifically, variations in the distribution of imaging data due to changes in clinical scenarios can cause diagnostic models developed on training data to deviate when processing clinical data, affecting diagnostic accuracy. Furthermore, directly using specific clinical data to optimize diagnostic models can hinder the diagnosis of other data, while jointly training with all training data and a large amount of clinical data is too costly and labor-intensive for clinical needs. Therefore, how to improve the model's compatibility with different data while meeting clinical requirements is a pressing issue that needs to be addressed.

Summary of the Invention

[0003] This application provides a model continuous learning optimization method, electronic device, and storage medium, which can improve the model's compatibility with different data while meeting clinical needs.

[0004] In a first aspect, embodiments of this application provide a method for continuous model learning optimization, including:

[0005] Obtain the initial sample dataset and the newly added sample dataset;

[0006] The initial sample dataset is input into the diagnostic model to obtain the core gradient space of the initial sample dataset during the training of the diagnostic model.

[0007] A new sample dataset is obtained based on the initial sample dataset and the newly added sample dataset;

[0008] The gradient transition value is obtained based on the new sample dataset, the core gradient space, and the newly added sample dataset;

[0009] The target gradient of the diagnostic model is determined based on the gradient transition value, and the diagnostic model is optimized using the target gradient.

[0010] Optionally, in one embodiment of this application, the step of inputting the initial sample dataset into the diagnostic model to obtain the core gradient space of the initial sample dataset during training of the diagnostic model includes:

[0011] The initial sample dataset is input into the diagnostic model, and singular value decomposition is performed on the target sample values in the initial sample dataset to obtain the left singular matrix and the diagonal matrix.

[0012] The target sample values are filtered based on the diagonal matrix to obtain target singular values;

[0013] Based on the target singular values and the left singular matrix, the core gradient space for training the diagnostic model using the initial sample dataset is obtained.

[0014] Optionally, in one embodiment of this application, obtaining the core gradient space for training the diagnostic model using the initial sample dataset based on the target singular values and the left singular matrix includes:

[0015] Determine the number of the target singular values;

[0016] Multiple feature vectors are obtained from the left singular matrix according to that number;

[0017] The core gradient space of the initial sample dataset during training of the diagnostic model is constructed from the multiple feature vectors.

[0018] Optionally, in one embodiment of this application, obtaining the gradient transition value based on the new sample dataset, the core gradient space, and the newly added sample dataset includes:

[0019] Calculate the first gradient direction of the new sample dataset;

[0020] Based on the first gradient direction and the core gradient space, a first projection of the first gradient direction in the core gradient space is obtained;

[0021] The gradient transition value is obtained based on the first projection and the newly added sample dataset.

[0022] Optionally, in one embodiment of this application, obtaining the gradient transition value based on the first projection and the newly added sample dataset includes:

[0023] Determine the loss value and the first gradient based on the newly added sample dataset;

[0024] The second gradient direction is obtained by multiplying the loss value and the first gradient.

[0025] The gradient transition value is obtained by multiplying the first projection and the second gradient direction.

[0026] Optionally, in one embodiment of this application, determining the target gradient of the diagnostic model based on the gradient transition value includes:

[0027] When the gradient transition value is greater than or equal to a preset threshold, the second gradient direction is determined as the target gradient of the diagnostic model.

[0028] Optionally, in one embodiment of this application, determining the target gradient of the diagnostic model based on the gradient transition value includes:

[0029] When the gradient transition value is less than a preset threshold, the second projection of the second gradient direction in the core gradient space is determined based on the second gradient direction and the core gradient space.

[0030] The difference between the second gradient direction and the second projection is determined as the target gradient of the diagnostic model.

[0031] Optionally, in one embodiment of this application, obtaining a new sample dataset based on the initial sample dataset and the newly added sample dataset includes:

[0032] Determine a subset of sample data based on the initial sample dataset;

[0033] A new sample dataset is obtained by taking the union of the subset of sample data and the newly added sample dataset.

[0034] In a second aspect, embodiments of this application also provide an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the model continuous learning optimization method described in the first aspect above.

[0035] In a third aspect, embodiments of this application also provide a computer-readable storage medium storing computer-executable instructions for performing the model continuous learning optimization method as described in the first aspect above.

[0036] In a fourth aspect, embodiments of this application also provide a computer program product, including a computer program or computer instructions, wherein the computer program or computer instructions are stored in a computer-readable storage medium, a processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium, and the processor executes the computer program or computer instructions, causing the computer device to perform the model continuous learning optimization method as described in the first aspect above.

[0037] The embodiments of this application include the following: by inputting an initial sample dataset into a diagnostic model, the core gradient space of the initial sample dataset during training of the diagnostic model is obtained. When the model is subsequently optimized with a newly added sample dataset, this maintains diagnostic performance on historical data while improving diagnostic capability on new data. This avoids catastrophic forgetting in the diagnostic model and does not increase the training cost or workload. Therefore, it can improve the model's compatibility with different data while meeting clinical needs. Furthermore, a new sample dataset is obtained from the initial sample dataset and the newly added sample dataset, a gradient transition value is obtained from the new sample dataset, the core gradient space, and the newly added sample dataset, and the target gradient of the diagnostic model is determined from the gradient transition value. This allows the direction of diagnostic model optimization to be distinguished, thereby improving the overall optimization effect of the diagnostic model.

Attached Figure Description

[0038] Figure 1 is a schematic diagram of data distribution provided in one embodiment of this application;

[0039] Figure 2 is a flowchart of a model continuous learning optimization method provided in one embodiment of this application;

[0040] Figure 3 is a flowchart of a specific method for step S120 in Figure 2;

[0041] Figure 4 is a flowchart of a specific method for step S230 in Figure 3;

[0042] Figure 5 is a schematic diagram of data distribution provided in another embodiment of this application;

[0043] Figure 6 is a flowchart of a specific method for step S140 in Figure 2;

[0044] Figure 7 is a flowchart of a specific method for step S430 in Figure 6;

[0045] Figure 8 is a schematic diagram of data distribution provided in another embodiment of this application;

[0046] Figure 9 is a schematic diagram of the positive direction of replay-guided optimization provided in one embodiment of this application;

[0047] Figure 10 is a schematic diagram of the negative direction of replay-guided optimization provided in one embodiment of this application;

[0048] Figure 11 is a schematic diagram of the structure of an electronic device provided in one embodiment of this application.

Detailed Implementation

[0049] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0050] It should be noted that although a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than that shown in the flowchart. The terms "first," "second," etc., in the specification, claims, and the aforementioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

[0051] This application provides a method, electronic device, and storage medium for continuous model learning optimization. By inputting an initial sample dataset into a diagnostic model, the core gradient space of the initial sample dataset during model training is obtained. When the model is subsequently optimized with a newly added sample dataset, this maintains diagnostic effectiveness on historical data while improving diagnostic capability on new data. This avoids catastrophic forgetting in the diagnostic model without increasing training costs or workload. Therefore, it enhances the model's compatibility with different data while meeting clinical needs. Furthermore, a new sample dataset is obtained from the initial sample dataset and the newly added sample dataset. Based on the new sample dataset, the core gradient space, and the newly added sample dataset, a gradient transition value is derived. The target gradient of the diagnostic model is determined from this gradient transition value, which distinguishes the direction of diagnostic model optimization and improves the overall optimization effect.

[0052] The embodiments of this application will be further described below with reference to the accompanying drawings.

[0053] In one embodiment, the different data distributions collected are defined in chronological order as $D_t$, where $t = 1, 2, \dots, n$ and $n$ is a positive integer. Assume $D_1$ is the data distribution of pre-collected normal images used to develop the diagnostic model. When the diagnostic model is optimized with data from a new distribution $D_T$ ($T \ge 2$), catastrophic forgetting degrades the model's performance on the previous data distributions $D_t$ ($t \le T-1$). To avoid catastrophic forgetting, the diagnostic model is optimized with the goal of achieving the best performance on the joint distribution $D_J = D_1 \cup D_2 \cup \dots \cup D_T$ of all existing data. The following analysis uses a single optimization of the diagnostic model as an example to illustrate the optimization gradient.

[0054] For example, as shown in Figure 1, which is a schematic diagram of data distribution provided in one embodiment of this application, the previous data distribution $D_1$ is used as the initial distribution, and the new data distribution $D_2$ is used as the incremental distribution. If fine-tuning is used to optimize the diagnostic model based only on $D_2$, the performance of the optimized diagnostic model will be severely degraded on $D_1$.

[0055] Alternatively, if joint training is used to merge the data from $D_1$ and $D_2$ to optimize the diagnostic model, the optimized diagnostic model is compatible with both data distributions and reaches its performance upper bound on $D_{1 \cup 2}$. However, training with all past data is not practical in clinical applications.

[0056] Based on this, in order to achieve good diagnostic results in clinical applications, this application proposes to improve the optimization method of the diagnostic model by using continuous learning technology, thereby improving the optimization effect of the model.

[0057] The following presents various embodiments of the model continuous learning optimization method of this application.

[0058] Referring to Figure 2, which is a flowchart of a model continuous learning optimization method provided in one embodiment of this application, the model continuous learning optimization method includes, but is not limited to, steps S110, S120, S130, S140 and S150.

[0059] Step S110: Obtain the initial sample dataset and the newly added sample dataset.

[0060] In this step, the initial sample dataset is a collection of historical data, and the newly added sample dataset is a collection of newly acquired data. The initial sample dataset may include one or more sample data sets, and similarly, the newly added sample dataset may include one or more sample data sets. This application embodiment does not impose specific limitations on this.

[0061] Step S120: Input the initial sample dataset into the diagnostic model to obtain the core gradient space of the initial sample dataset when training the diagnostic model.

[0062] In this step, the core gradient space characterizes the gradients of the diagnostic model when it is trained on the initial sample dataset.

[0063] Step S130: Obtain a new sample dataset based on the initial sample dataset and the newly added sample dataset.

[0064] In one embodiment, a subset of sample data can be determined based on the initial sample dataset, and then a new sample dataset can be obtained by taking the union of the subset of sample data and the newly added sample dataset. Specifically, a replay strategy can be used to retain a minimal subset (i.e., the subset of sample data) from the initial sample dataset, and then the union of the minimal subset and the newly added sample dataset can be taken to obtain a new sample dataset. This embodiment does not impose specific limitations here. It should be noted that the replay strategy is a class incremental optimization strategy in continuous learning, used to handle unknown class labels contained in new data.
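The construction of the new sample dataset described in this step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_new_sample_dataset`, the array layout, and the use of uniform random sampling to pick the retained subset are all assumptions, since the text only states that a minimal subset of the initial sample dataset is retained and united with the newly added sample dataset.

```python
import numpy as np

def build_new_sample_dataset(initial_x, initial_y, added_x, added_y,
                             subset_size, seed=0):
    """Union of a small replay subset of the initial data and the newly added data.

    Assumption: the replay subset is drawn uniformly at random; the source
    does not say how the minimal subset is chosen.
    """
    rng = np.random.default_rng(seed)
    n_keep = min(subset_size, len(initial_x))
    keep = rng.choice(len(initial_x), size=n_keep, replace=False)
    new_x = np.concatenate([initial_x[keep], added_x], axis=0)
    new_y = np.concatenate([initial_y[keep], added_y], axis=0)
    return new_x, new_y

# Toy data: 10 historical samples (label 0) and 4 newly added samples (label 1).
initial_x = np.arange(20, dtype=float).reshape(10, 2)
initial_y = np.zeros(10, dtype=int)
added_x = np.ones((4, 2))
added_y = np.ones(4, dtype=int)

new_x, new_y = build_new_sample_dataset(
    initial_x, initial_y, added_x, added_y, subset_size=3
)
print(new_x.shape, new_y.shape)
```

Keeping the replay subset small is what keeps the storage and training cost close to that of training on the newly added data alone.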

[0065] Step S140: Obtain the gradient transition value based on the new sample dataset, the core gradient space, and the newly added sample dataset.

[0066] In this step, the gradient transition value can be used to determine the positive and negative directions of the diagnostic model optimization.

[0067] Step S150: Determine the target gradient of the diagnostic model based on the gradient transition value, and optimize the diagnostic model using the target gradient.

[0068] In this embodiment, by employing the model continuous learning optimization method including steps S110 to S150, the initial sample dataset is input into the diagnostic model to obtain the core gradient space of the initial sample dataset during the training of the diagnostic model. This allows the model to maintain its diagnostic performance on historical data while improving its diagnostic capability on new data when it is subsequently optimized with a newly added sample dataset, avoiding catastrophic forgetting in the diagnostic model. Moreover, it does not increase the training cost or workload of the model. Therefore, it can improve the model's compatibility with different data while meeting clinical needs. Furthermore, by obtaining a new sample dataset from the initial sample dataset and the newly added sample dataset, and by obtaining the gradient transition value from the new sample dataset, the core gradient space, and the newly added sample dataset, the target gradient of the diagnostic model can be determined from the gradient transition value. This allows the direction of the diagnostic model optimization to be distinguished, thereby improving the overall optimization effect of the diagnostic model.

[0069] Referring to Figure 3, in one embodiment of this application, step S120 is further described. Step S120 may include, but is not limited to, steps S210, S220 and S230.

[0070] Step S210: Input the initial sample dataset into the diagnostic model, perform singular value decomposition on the target sample values in the initial sample dataset, and obtain the left singular matrix and the diagonal matrix.

[0071] Step S220: Filter the target sample values according to the diagonal matrix to obtain the target singular values.

[0072] In this step, the non-zero elements on the diagonal of the diagonal matrix are singular values. The number of target singular values can be one or more, which can be set according to actual needs. This application embodiment does not impose specific restrictions on this.

[0073] Step S230: Based on the target singular values and the left singular matrix, obtain the core gradient space for training the diagnostic model using the initial sample dataset.

[0074] In this embodiment, by employing the model continuous learning optimization method including steps S210 to S230, the initial sample dataset can be input into the diagnostic model. Singular value decomposition is performed on the target sample values in the initial sample dataset to obtain the left singular matrix and the diagonal matrix. Then, the target sample values are filtered based on the diagonal matrix to obtain the target singular values. Finally, based on the target singular values and the left singular matrix, the core gradient space of the initial sample dataset for training the diagnostic model is obtained. This ensures that when the model is subsequently optimized using new sample datasets, the diagnostic effect on historical data is maintained while the diagnostic capability for new data is improved, avoiding the catastrophic forgetting problem of the diagnostic model. Moreover, it does not increase the training cost or workload of the model. Therefore, it can improve the model's compatibility with different data while meeting clinical needs.

[0075] Referring to Figure 4, in one embodiment of this application, step S230 is further described. Step S230 may include, but is not limited to, steps S310, S320 and S330.

[0076] Step S310: Determine the number of target singular values.

[0077] Step S320: Obtain multiple feature vectors from the left singular matrix according to that number.

[0078] It is understood that the number of feature vectors can be the same as or different from the number of target singular values (for example, the number of feature vectors can be a multiple of the number of target singular values). This can be set according to the actual situation, and the embodiments of this application do not impose specific restrictions on this.

[0079] Step S330: Construct, from the multiple feature vectors, the core gradient space of the initial sample dataset during training of the diagnostic model.

[0080] In this embodiment, by employing the model continuous learning optimization method including steps S310 to S330, the number of target singular values can be determined. Then, multiple feature vectors are obtained from the left singular matrix based on that number. Finally, the core gradient space for training the diagnostic model using the initial sample dataset is constructed from the multiple feature vectors. This ensures that when the model is subsequently optimized using new sample datasets, the diagnostic effect on historical data is maintained while the diagnostic capability for new data is improved, avoiding the catastrophic forgetting problem of the diagnostic model. Moreover, it does not increase the training cost or workload of the model. Therefore, it can meet clinical needs while improving the model's compatibility with different data.

[0081] In one embodiment, as shown in Figure 5, assume the initial sample dataset is D1 and the input to the l-th layer of the diagnostic model is the target sample value. Singular value decomposition is performed on the target sample value to obtain a left singular matrix and a diagonal matrix.

[0082] Next, the target sample value is filtered based on the diagonal matrix to obtain the top k target singular values. This can be expressed by the following formula:

[0083]

[0084] Here, the formula uses a preset threshold, $\|\cdot\|_F$ denotes the Frobenius norm, and the remaining term denotes the first k target singular values of the diagonal matrix obtained by singular value decomposition of the target sample value.
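The selection rule described in the preceding paragraphs can be written in one common form as shown below. This is a hedged reconstruction, not the formula from the original filing, which is not reproduced in this text. All symbols are assumptions introduced here for illustration: $X_l$ for the target sample value at layer $l$, $U_l$ and $\Sigma_l$ for the left singular matrix and the diagonal matrix, $\sigma_i$ for the singular values in descending order, $(X_l)_k$ for the rank-$k$ truncation of $X_l$, and $\epsilon$ for the preset threshold. Whether the norms are squared is also an assumption.

```latex
X_l = U_l \Sigma_l V_l^{\top},
\qquad
\left\| (X_l)_k \right\|_F^2 \;\ge\; \epsilon \, \left\| X_l \right\|_F^2,
\qquad
\left\| (X_l)_k \right\|_F^2 = \sum_{i=1}^{k} \sigma_i^2 .
```

Under this reading, $k$ is the smallest number of leading singular values whose squared sum reaches the fraction $\epsilon$ of the total.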

[0085] Then, the number k of target singular values is determined, k feature vectors are obtained from the left singular matrix, and these k feature vectors are used to construct the core gradient space, i.e., the projection matrix, of the initial sample dataset during training of the diagnostic model. It should be noted that this projection matrix is used to characterize the core gradient space of the diagnostic model when it is trained on the initial sample dataset.
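The steps of this embodiment (singular value decomposition, selection of the top k singular values, and construction of the projection matrix from k columns of the left singular matrix) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the function name `core_gradient_space`, the (features, samples) layout of the layer input, and the squared-energy form of the threshold test are not taken from the source.

```python
import numpy as np

def core_gradient_space(x, threshold=0.95):
    """Return the first k left singular vectors of x as columns.

    x: (features, samples) matrix of layer inputs (the target sample value).
    threshold: assumed fraction of squared singular-value energy to retain,
        in (0, 1]. The exact test used in the source is not available.
    """
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest k whose leading singular values reach the threshold.
    k = int(np.searchsorted(energy, threshold)) + 1
    k = min(k, len(s))
    return u[:, :k]

# Toy layer input of rank 2 living in a 5-dimensional feature space.
rng = np.random.default_rng(0)
basis = rng.normal(size=(5, 2))
x = basis @ rng.normal(size=(2, 20))

p = core_gradient_space(x, threshold=0.999)
print(p.shape)
```

Because the columns come from the left singular matrix, they are orthonormal, so `p @ p.T` acts as an orthogonal projector onto the retained subspace.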

[0086] Referring to Figure 6, in one embodiment of this application, step S140 is further described. Step S140 may include, but is not limited to, steps S410, S420 and S430.

[0087] Step S410: Calculate the first gradient direction of the new sample dataset.

[0088] In this step, the first gradient direction of the new sample dataset is calculated. Specifically, this can be done by calculating the loss value of the new sample dataset, then calculating the gradient of the new sample dataset, and finally using the product of the loss value and the gradient of the new sample dataset as the first gradient direction. It should be noted that the loss value of the new sample dataset can be calculated using any loss function such as the cross-entropy loss function or the L1 loss function (i.e., Manhattan distance), and this embodiment does not impose any specific restrictions on this.
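As a small illustration of "loss value times gradient", the sketch below uses the L1 loss mentioned above with a hypothetical linear model. The linear model, the function names, and the toy data are assumptions chosen only to keep the example self-contained; the source does not specify the model architecture.

```python
import numpy as np

def l1_loss_and_gradient(w, x, y):
    """Mean absolute error of the linear model x @ w and its gradient w.r.t. w."""
    residual = x @ w - y
    loss = np.mean(np.abs(residual))
    grad = x.T @ np.sign(residual) / len(y)
    return loss, grad

def gradient_direction(w, x, y):
    """Gradient direction as described in the text: loss value times gradient."""
    loss, grad = l1_loss_and_gradient(w, x, y)
    return loss * grad

# Toy new sample dataset with two samples and two features.
x_new = np.array([[1.0, 0.0], [0.0, 1.0]])
y_new = np.array([1.0, -1.0])
w = np.zeros(2)

first_direction = gradient_direction(w, x_new, y_new)
print(first_direction)
```

Scaling the gradient by the loss value makes the direction larger when the model fits the dataset poorly and smaller when it already fits well.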

[0089] Step S420: Based on the first gradient direction and the core gradient space, obtain the first projection of the first gradient direction in the core gradient space.

[0090] In this step, based on the first gradient direction and the core gradient space, the first projection of the first gradient direction in the core gradient space is obtained. Specifically, the first projection can be obtained by multiplying the product of the core gradient space and its transpose by the first gradient direction.
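The projection described in this step can be sketched as follows, assuming the core gradient space is stored as a matrix `p` with orthonormal columns and the gradient direction is flattened to a vector. The names `project_onto_core_space`, `p`, and `g1` are illustrative.

```python
import numpy as np

def project_onto_core_space(p, g):
    """Project g onto the column space of p, i.e. compute (p @ p.T) @ g.

    Evaluating p @ (p.T @ g) avoids forming the full d-by-d matrix p @ p.T.
    """
    return p @ (p.T @ g)

# Core gradient space spanned by the first two coordinate axes of R^3.
p = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
g1 = np.array([2.0, -1.0, 5.0])

first_projection = project_onto_core_space(p, g1)
print(first_projection)  # the component along the third axis is removed
```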

[0091] Step S430: Obtain the gradient transition value based on the first projection and the newly added sample dataset.

[0092] In this embodiment, by employing the model continuous learning optimization method including steps S410 to S430, the first gradient direction of the new sample dataset can be calculated. Then, based on the first gradient direction and the core gradient space, the first projection of the first gradient direction in the core gradient space is obtained. Finally, based on the first projection and the newly added sample dataset, the gradient transition value is obtained. This allows the direction of the diagnostic model optimization to be distinguished by the gradient transition value, rather than directly updating the diagnostic model, thereby improving the overall effect of the diagnostic model optimization.

[0093] Referring to Figure 7, in one embodiment of this application, step S430 is further described. Step S430 may include, but is not limited to, steps S510, S520 and S530.

[0094] Step S510: Determine the loss value and the first gradient based on the newly added sample dataset.

[0095] In this step, the loss value can be calculated by any loss function such as the cross-entropy loss function or the L1 loss function (i.e., Manhattan distance). This application embodiment does not impose specific restrictions on this.

[0096] Step S520: Obtain the second gradient direction based on the product of the loss value and the first gradient.

[0097] Step S530: Obtain the gradient transition value based on the product of the first projection and the second gradient direction.

[0098] In this embodiment, by employing the model continuous learning optimization method including steps S510 to S530, the loss value and the first gradient can be determined based on the newly added sample dataset. Then, the second gradient direction is obtained based on the product of the loss value and the first gradient. Finally, the gradient transition value is obtained based on the product of the first projection and the second gradient direction. This allows the direction of the diagnostic model optimization to be distinguished by the gradient transition value, rather than directly updating the diagnostic model, thereby improving the overall effect of the diagnostic model optimization.

[0099] In one embodiment, step S150 will be further described, and step S150 may include, but is not limited to, the following steps:

[0100] When the gradient transition value is greater than or equal to a preset threshold, the second gradient direction is determined as the target gradient of the diagnostic model.

[0101] The preset threshold can be 0, 1, ... or 8, etc., and can be set according to the actual situation. No specific restrictions are made here.

[0102] In another embodiment, step S150 is further described, and step S150 may include, but is not limited to, the following steps:

[0103] When the gradient transition value is less than a preset threshold, the second projection of the second gradient direction in the core gradient space is determined based on the second gradient direction and the core gradient space.

[0104] The difference between the second gradient direction and the second projection is determined as the target gradient of the diagnostic model.

[0105] The preset threshold can be 0, 1, ... or 8, etc., and can be set according to the actual situation. No specific restrictions are made here.

[0106] In one embodiment, a replay strategy can be used to retain a very small subset μ1 (i.e., the subset of sample data) from the initial sample dataset, and the newly added sample dataset D2 is introduced to optimize the diagnostic model while maintaining the model's diagnostic performance on the initial sample dataset D1. As shown in Figure 8, combining the replay-guided strategy with gradient orthogonal projection (i.e., the core gradient space) makes it possible to distinguish the direction of model optimization, further improving the overall effect of model updates. To avoid overfitting caused by the minimal subset, the union of the minimal subset and the newly added sample dataset is taken to obtain the new sample dataset. The new sample dataset is then used to choose the positive or negative direction of model optimization rather than to update the model directly.

[0107] Specifically, the first step is to calculate the first gradient direction of the new sample dataset: the loss value of the new sample dataset is calculated, then the gradient of the new sample dataset is calculated, and the product of the loss value and the gradient is used as the first gradient direction.

[0108] Next, the gradient transition value can be expressed by the following formula:

[0109]

[0110] Here, $S_l$ is the gradient transition value and $P_l$ is the core gradient space. The remaining terms of the formula are the first projection of the first gradient direction onto the core gradient space, the newly added sample dataset D2, the loss value for the newly added sample dataset, and the second gradient direction.
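Read together with steps S420 and S530, the gradient transition value can be written in the following form. This is a hedged reconstruction rather than the formula from the original filing, which is not reproduced in this text. The symbols $g_1$ and $g_2$ for the first and second gradient directions, $\tilde{D}$ for the new sample dataset (the union of the replay subset and $D_2$), $W_l$ for the layer parameters, and the reading of "product" as an inner product are all assumptions.

```latex
g_1 = \mathcal{L}(\tilde{D}) \, \nabla_{W_l} \mathcal{L}(\tilde{D}),
\qquad
g_2 = \mathcal{L}(D_2) \, \nabla_{W_l} \mathcal{L}(D_2),
\qquad
S_l = \left\langle P_l P_l^{\top} g_1 ,\; g_2 \right\rangle .
```

With an inner product, the sign of $S_l$ is the sign of the cosine of the angle between the first projection and the second gradient direction, which matches the 90-degree criterion used to separate positive and negative optimization directions.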

[0111] As shown in Figure 9, when S_l ≥ 0, the angle between the second gradient direction and the projection of the first gradient direction in the core gradient space is less than or equal to 90 degrees, and the update is considered a positive optimization direction. In this case, the second gradient direction can be determined as the target gradient of the diagnostic model, and this target gradient is used directly to update the diagnostic model.

[0112] As shown in Figure 10, when S_l < 0, the angle between the second gradient direction and the projection of the first gradient direction in the core gradient space is greater than 90 degrees, and the update is considered a negative optimization direction. In this case, the second projection of the second gradient direction onto the core gradient space is determined based on the second gradient direction and the core gradient space, and the difference between the second gradient direction and the second projection is used as the target gradient of the diagnostic model and applied directly to update the model. In this way, when the diagnostic model is optimized based on a new sample dataset, only the gradient component orthogonal to the core gradient space is retained, which maintains the diagnostic effectiveness on historical data while improving the diagnostic capability on new data. It should be noted that selecting the optimization direction based on orthogonal gradient projection maintains the model's performance on the initial sample dataset D1 during updates, and that by fixing the model update direction, the model is more likely to converge to a suboptimal position for D1∪D2.

[0113] The target gradient can be expressed as:

[0114]

[0115] Here, D2 is the newly added sample dataset, and the remaining symbols denote, in order, the loss value on the newly added sample dataset, the second gradient, the second gradient direction, and the second projection of the second gradient direction onto the core gradient space.
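The selection of the target gradient in the two cases can be sketched as below. As before, the orthonormal-basis representation of the core gradient space, the flattened gradient vectors, and the function names are assumptions for illustration; the default threshold of 0 follows the S_l ≥ 0 and S_l < 0 cases described in the text.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(vec, basis):
    """Project vec onto the subspace spanned by an orthonormal basis."""
    out = [0.0] * len(vec)
    for u in basis:
        c = dot(u, vec)
        for j, uj in enumerate(u):
            out[j] += c * uj
    return out


def target_gradient(first_dir, second_dir, basis, threshold=0.0):
    """Choose the target gradient from the gradient transition value.

    Positive direction (value >= threshold): use the second gradient direction.
    Negative direction (value <  threshold): remove its component inside the
    core gradient space, keeping only the orthogonal remainder.
    """
    s = dot(project(first_dir, basis), second_dir)
    if s >= threshold:
        return list(second_dir)
    second_projection = project(second_dir, basis)
    return [g - p for g, p in zip(second_dir, second_projection)]


basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # assumed core gradient space
first_dir = [1.0, 2.0, 3.0]

negative_case = target_gradient(first_dir, [2.0, -3.0, 5.0], basis)
positive_case = target_gradient(first_dir, [2.0, 1.0, 5.0], basis)
```

In the negative case the returned vector has no component along any basis vector of the core gradient space, so a step along it leaves the directions that matter for the initial dataset untouched, to first order.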

[0116] It should be noted that all the l mentioned above correspond to the l-th layer of the diagnostic model, and l can be any positive integer without any specific restrictions.

[0117] Understandably, the aforementioned continuous learning optimization method updates each layer of the model, which improves the optimization effect on the newly added sample dataset D2 while maintaining the model's performance on the initial sample dataset D1. This enhances the compatibility of the updated model with both the initial and newly added sample datasets, so that the model achieves better results on the combined dataset D1∪D2.
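A layer-by-layer update along these lines might look like the sketch below. It assumes one orthonormal basis per layer, flattened per-layer weights and gradient directions, and a plain gradient-descent step with a hypothetical learning rate `lr`; the text does not specify the optimizer, so the update rule here is an assumption.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(vec, basis):
    """Project vec onto the subspace spanned by an orthonormal basis."""
    out = [0.0] * len(vec)
    for u in basis:
        c = dot(u, vec)
        for j, uj in enumerate(u):
            out[j] += c * uj
    return out


def update_layers(weights, first_dirs, second_dirs, bases, lr=0.5, threshold=0.0):
    """Apply the target gradient of every layer l with a plain descent step."""
    updated = []
    for w, g1, g2, basis in zip(weights, first_dirs, second_dirs, bases):
        s = dot(project(g1, basis), g2)  # gradient transition value of layer l
        if s >= threshold:
            target = g2                  # positive optimization direction
        else:                            # negative: keep the orthogonal part
            target = [g - p for g, p in zip(g2, project(g2, basis))]
        updated.append([wi - lr * ti for wi, ti in zip(w, target)])
    return updated


# Two toy layers with different sizes and core gradient spaces.
weights = [[1.0, 1.0], [0.0, 0.0, 0.0]]
bases = [[[1.0, 0.0]], [[0.0, 0.0, 1.0]]]
first_dirs = [[1.0, 0.0], [0.0, 0.0, 2.0]]
second_dirs = [[-2.0, 4.0], [1.0, 1.0, 1.0]]
new_weights = update_layers(weights, first_dirs, second_dirs, bases)
```

Here the first layer falls into the negative case, so its weight along the protected axis is unchanged, while the second layer falls into the positive case and is updated with the full second gradient direction.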

[0118] In addition, one embodiment of this application also provides a model optimization apparatus, which includes:

[0119] The data acquisition module is used to acquire the initial sample dataset and the newly added sample dataset;

[0120] The core gradient space generation module is used to input the initial sample dataset into the diagnostic model and obtain the core gradient space of the initial sample dataset when training the diagnostic model.

[0121] The sample dataset generation module is used to generate a new sample dataset based on the initial sample dataset and the newly added sample dataset;

[0122] The gradient transition value generation module is used to obtain gradient transition values based on the new sample dataset, the core gradient space, and the newly added sample dataset.

[0123] The model optimization module is used to determine the target gradient of the diagnostic model based on the gradient transition value, and to optimize the diagnostic model using the target gradient.

[0124] Furthermore, the core gradient space generation module specifically includes:

[0125] The singular value decomposition unit is used to input the initial sample dataset into the diagnostic model, perform singular value decomposition on the target sample values in the initial sample dataset, and obtain the left singular matrix and the diagonal matrix.

[0126] The singular value generation unit is used to filter target sample values based on the diagonal matrix to obtain target singular values.

[0127] The core gradient space generation unit is used to obtain the core gradient space of the initial sample dataset for training the diagnostic model based on the target singular values and the left singular matrix.

[0128] Furthermore, the core gradient space generation unit specifically includes:

[0129] The quantity generation unit is used to determine the number of target singular values;

[0130] The eigenvector acquisition unit is used to obtain multiple eigenvectors from the left singular matrix according to the quantity.

[0131] The core gradient space generation subunit is used to construct, using the multiple eigenvectors, the core gradient space of the initial sample dataset when training the diagnostic model.
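The three units above (count the target singular values, take that many vectors from the left singular matrix, use them as the core gradient space) can be sketched as follows. The sketch assumes that the singular value decomposition has already been computed elsewhere, that the singular values are sorted in descending order, and that target singular values are chosen by a cumulative energy threshold. The threshold criterion, its default of 0.95, and the function names are assumptions, since this passage does not state how the diagonal matrix is filtered.

```python
def count_target_singular_values(singular_values, energy=0.95):
    """Smallest k whose leading singular values hold `energy` of the total.

    Energy is measured as the sum of squared singular values (assumed rule).
    """
    total = sum(s * s for s in singular_values)
    running = 0.0
    for k, s in enumerate(singular_values, start=1):
        running += s * s
        if running >= energy * total:
            return k
    return len(singular_values)


def core_gradient_space(left_singular, singular_values, energy=0.95):
    """Return the first k columns of the left singular matrix as basis vectors.

    left_singular is a row-major matrix (list of rows).
    """
    k = count_target_singular_values(singular_values, energy)
    return [[row[j] for row in left_singular] for j in range(k)]


# Toy decomposition: identity left singular matrix, decaying singular values.
u = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
sigma = [10.0, 3.0, 0.5]
k = count_target_singular_values(sigma)
basis = core_gradient_space(u, sigma)
```

Because the columns of a left singular matrix are orthonormal, the returned vectors can be used directly as a projection basis without further normalization.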

[0132] Furthermore, the gradient transition value generation module specifically includes:

[0133] The first gradient direction generation unit is used to calculate the first gradient direction of the new sample dataset.

[0134] The first projection generation unit is used to obtain the first projection of the first gradient direction in the core gradient space based on the first gradient direction and the core gradient space.

[0135] The gradient transition value generation unit is used to obtain gradient transition values based on the first projection and the newly added sample dataset.

[0136] Furthermore, the gradient transition value generation unit specifically includes:

[0137] The first determination module is used to determine the loss value and the first gradient based on the newly added sample dataset;

[0138] The second gradient direction generation module is used to obtain the second gradient direction based on the product between the loss value and the first gradient.

[0139] The gradient transition value generation sub-unit is used to obtain the gradient transition value based on the product of the first projection and the second gradient direction.

[0140] Furthermore, the model optimization module specifically includes:

[0141] The second determining module is used to determine the second gradient direction as the target gradient of the diagnostic model when the gradient transition value is greater than or equal to a preset threshold.

[0142] Furthermore, the model optimization module specifically includes:

[0143] The third determining module is used to determine the second projection of the second gradient direction in the core gradient space based on the second gradient direction and the core gradient space when the gradient transition value is less than a preset threshold.

[0144] The fourth determination module is used to determine the difference between the second gradient direction and the second projection as the target gradient of the diagnostic model.

[0145] Furthermore, the sample dataset generation module specifically includes:

[0146] The fifth determination module is used to determine a subset of sample data based on the initial sample dataset;

[0147] The merge module is used to obtain a new sample dataset based on the union of a subset of sample data and the newly added sample dataset.

[0148] The aforementioned model optimization device and the model continuous learning optimization method are based on the same inventive concept. By inputting the initial sample dataset into the diagnostic model, the core gradient space of the initial sample dataset during the training of the diagnostic model is obtained. When the model is subsequently optimized using the newly added sample dataset, it can maintain the diagnostic effect on historical data while improving the diagnostic ability on new data, avoiding the catastrophic forgetting problem of the diagnostic model without increasing the training cost or workload of the model. Therefore, it can improve the model's compatibility with different data while meeting clinical needs. Furthermore, by obtaining a new sample dataset based on the initial sample dataset and the newly added sample dataset, and obtaining gradient transition values based on the new sample dataset, the core gradient space, and the newly added sample dataset, the target gradient of the diagnostic model can be determined based on the gradient transition values. This makes it possible to distinguish the direction of the diagnostic model optimization and thus improve the overall effect of the diagnostic model optimization.

[0149] Additionally, referring to Figure 11, an embodiment of this application also provides an electronic device 200, which includes a memory 202, a processor 201, and a computer program stored in the memory 202 and executable on the processor 201.

[0150] The processor 201 and the memory 202 can be connected via a bus or other means.

[0151] Memory 202, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory 202 may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 202 may optionally include memory remotely located relative to processor 201, and these remote memories can be connected to processor 201 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0152] The non-transitory software programs and instructions required to implement the model continuous learning optimization method of the above embodiments are stored in the memory 202. When executed by the processor 201, they perform the model continuous learning optimization method of the above embodiments, for example, method steps S110 to S150 in Figure 2, method steps S210 to S230 in Figure 3, method steps S310 to S330 in Figure 4, method steps S410 to S430 in Figure 6, and method steps S510 to S530 in Figure 7.

[0153] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0154] Furthermore, one embodiment of this application also provides a computer-readable storage medium storing computer-executable instructions that are executed by a processor or controller, for example, by a processor in the above-described device embodiment, causing the processor to execute the model continuous learning optimization method in the above-described embodiment, for example, method steps S110 to S150 in Figure 2, method steps S210 to S230 in Figure 3, method steps S310 to S330 in Figure 4, method steps S410 to S430 in Figure 6, and method steps S510 to S530 in Figure 7.

[0155] Furthermore, one embodiment of this application also provides a computer program product, including a computer program or computer instructions, which are stored in a computer-readable storage medium. A processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the model continuous learning optimization method described in the above embodiments, for example, method steps S110 to S150 in Figure 2, method steps S210 to S230 in Figure 3, method steps S310 to S330 in Figure 4, method steps S410 to S430 in Figure 6, and method steps S510 to S530 in Figure 7.

[0156] It will be understood by those skilled in the art that all or some of the steps and systems in the methods disclosed above can be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components can be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software can be distributed on a computer-readable medium, which can include computer storage media (or non-transitory media) and communication media (or transient media). As is known to those skilled in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and is accessible to a computer. Furthermore, as is known to those skilled in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.

Claims

1. A method for continuous learning and optimization of a model, comprising: obtaining an initial sample dataset and a newly added sample dataset, both of which are clinical image datasets; inputting the initial sample dataset into a diagnostic model, performing singular value decomposition on target sample values in the initial sample dataset to obtain a left singular matrix and a diagonal matrix, filtering the target sample values according to the diagonal matrix to obtain target singular values, and constructing, using the eigenvectors of the left singular matrix corresponding to the target singular values, the core gradient space of the initial sample dataset when training the diagnostic model; determining a subset of sample data based on the initial sample dataset; obtaining a new sample dataset as the union of the subset of sample data and the newly added sample dataset; calculating the first gradient direction of the new sample dataset; obtaining, based on the first gradient direction and the core gradient space, the first projection of the first gradient direction in the core gradient space; obtaining the second gradient direction based on the newly added sample dataset, and obtaining the gradient transition value based on the product of the first projection and the second gradient direction; and determining the target gradient of the diagnostic model based on the gradient transition value, and optimizing the diagnostic model using the target gradient;
wherein determining the target gradient of the diagnostic model based on the gradient transition value includes: when the gradient transition value is greater than or equal to a preset threshold, determining the second gradient direction as the target gradient of the diagnostic model; and when the gradient transition value is less than the preset threshold, determining the difference between the second gradient direction and the second projection of the second gradient direction in the core gradient space as the target gradient.

2. The model continuous learning optimization method according to claim 1, characterized in that constructing, using the eigenvectors of the left singular matrix corresponding to the target singular values, the core gradient space of the initial sample dataset when training the diagnostic model includes: determining the number of the target singular values; obtaining multiple eigenvectors from the left singular matrix according to that number; and constructing, using the multiple eigenvectors, the core gradient space of the initial sample dataset when training the diagnostic model.

3. The model continuous learning optimization method according to claim 1, characterized in that obtaining the second gradient direction based on the newly added sample dataset includes: determining a loss value and a first gradient based on the newly added sample dataset; and obtaining the second gradient direction based on the product of the loss value and the first gradient.

4. An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the model continuous learning optimization method as described in any one of claims 1 to 3.

5. A computer-readable storage medium storing computer-executable instructions for performing the model continuous learning optimization method according to any one of claims 1 to 3.