Network model training method and device, data processing method and device, and electronic device
By iteratively updating the network model and exploiting the difference between pseudo-data and real training data, the method addresses the performance degradation of the network model as the data expands, enabling real-time learning while conserving resources.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- IFLYTEK SOUTH CHINA ARTIFICIAL INTELLIGENCE RES INST GUANGZHOU CO LTD
- Filing Date
- 2023-01-29
- Publication Date
- 2026-06-16
AI Technical Summary
Existing network models cannot maintain excellent performance as the data continues to expand, and retraining on the entire dataset is inefficient, consumes computing and storage resources, and makes it difficult to learn new data in real time.
By iteratively updating the network model and utilizing the differences between pseudo-data and real training data, the network model can learn in real time, and the information from real training data can be losslessly compressed into pseudo-data, avoiding the consumption of computing and storage resources.
This approach enables network models to effectively balance performance and training costs while learning new data, avoiding the waste of computing and storage resources.
Figure CN115983346B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a network model training method and apparatus, a data processing method and apparatus, and an electronic device.
Background Technology
[0002] Most artificial intelligence algorithms or products catastrophically forget previously learned knowledge when learning new knowledge. For example, a network model is trained using all available data, but as the number of users and the usage time of the network model increase, the available data keeps expanding, and the network model cannot maintain its original superior performance on the expanded data. Existing methods generally retrain the network model using all known data (historical training data + new data), but this is very inefficient: it severely hinders the network model from learning new data in real time and also consumes computing and storage resources.
Summary of the Invention
[0003] In view of this, this application provides a network model training method and apparatus, a data processing method and apparatus, and an electronic device, which not only enable the network model to learn new data in real time but also avoid occupying computing and storage resources, thereby effectively balancing the performance of the network model and the cost of training.
[0004] According to a first aspect of the embodiments of this application, a method for training a network model is provided, comprising: a) based on the pseudo data of the (i-1)th iteration, predicting, through the (i-1)th network model, the (i-1)th prediction result of the pseudo data of the (i-1)th iteration, and updating the (i-1)th network model based on the difference between that (i-1)th prediction result and the forged target result, thereby obtaining an updated (i-1)th network model, wherein the pseudo data of the (i-1)th iteration does not have a definite target result, and the forged target result is obtained by forgery; b) based on real training data, predicting, through the updated (i-1)th network model, the (i-1)th prediction result of the real training data, and updating the updated (i-1)th network model and the pseudo data of the (i-1)th iteration based on the difference between the (i-1)th prediction result of the real training data and the real target result, to obtain the i-th network model and the pseudo data of the i-th iteration, wherein the real training data has a definite target result; and iteratively executing steps a) and b) until the j-th iteration to obtain the (j+1)th network model and the pseudo data of the (j+1)th iteration, where i is an integer greater than or equal to 2 and less than or equal to j+1, and j is an integer greater than or equal to 2.
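As an illustration only, if both updates are taken to be plain gradient-descent steps (the text does not specify the optimizer), steps a) and b) can be written as below. Here f_θ is the network model with parameters θ, P_{i-1} the pseudo data of the (i-1)th iteration, ỹ the forged target result, (X, y) the real training data and its real target result, L a loss function, and α, β learning rates; these symbols are introduced for this sketch and do not appear in the original text.

```latex
\begin{aligned}
\text{a)}\quad & \tilde{\theta}_{i-1} = \theta_{i-1} - \alpha\, \nabla_{\theta}\, \mathcal{L}\big(f_{\theta_{i-1}}(P_{i-1}),\, \tilde{y}\big) \\
\text{b)}\quad & \theta_{i} = \tilde{\theta}_{i-1} - \beta\, \nabla_{\tilde{\theta}}\, \mathcal{L}\big(f_{\tilde{\theta}_{i-1}}(X),\, y\big),
\qquad
P_{i} = P_{i-1} - \beta\, \nabla_{P_{i-1}}\, \mathcal{L}\big(f_{\tilde{\theta}_{i-1}}(X),\, y\big)
\end{aligned}
```

The gradient with respect to P_{i-1} in step b) is nonzero only if θ̃_{i-1} is treated as a function of P_{i-1} through step a). Differentiating through that step, as in dataset-distillation methods, is one assumed way to realize it.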
[0005] In one embodiment, when real training data is replaced or expanded by data of the same category to obtain training data of the same category, the training method further includes: c) predicting the j+1th prediction result of the pseudo data of the j+1th iteration using the j+1th network model, and updating the j+1th network model based on the difference between the j+1th prediction result of the pseudo data of the j+1th iteration and the fake target result, to obtain the updated j+1th network model; d) predicting the j+1th prediction result of the training data of the same category using the updated j+1th network model, and updating the updated j+1th network model and the pseudo data of the j+1th iteration based on the difference between the j+1th prediction result of the training data of the same category and the real target result, to obtain the j+2th network model and the pseudo data of the j+2th iteration; iterating through steps c) and d) to the nth iteration to obtain the n+1th network model and the pseudo data of the n+1th iteration, where n is an integer greater than or equal to j+1.
[0006] In one embodiment, when training data of different categories is obtained by replacing or expanding the real training data with data of different categories, the different categories are added to the pseudo data of the (j+1)th iteration to obtain expanded pseudo data of the (j+1)th iteration. The training method further includes: e) predicting, through the (j+1)th network model, the (j+1)th prediction result of the expanded pseudo data of the (j+1)th iteration, and updating the (j+1)th network model based on the difference between that prediction result and the fake expanded target result, to obtain the updated (j+1)th network model, wherein when j+1 = 3, the expanded pseudo data of the (j+1)th iteration is the initially expanded pseudo data, and the fake expanded target result is obtained by adding labels of the different categories to the fake target result; f) based on the training data of different categories, predicting, through the updated (j+1)th network model, the (j+1)th prediction result of the training data of different categories, and updating the updated (j+1)th network model and the expanded pseudo data of the (j+1)th iteration based on the difference between that prediction result and the real target result, to obtain the (j+2)th network model and the expanded pseudo data of the (j+2)th iteration; and iterating steps e) and f) until the n-th iteration to obtain the (n+1)th network model and the pseudo data of the (n+1)th iteration, where n is an integer greater than or equal to j+1.
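A minimal sketch of how the pseudo data and its forged labels might be expanded when new categories arrive, assuming the pseudo data is stored as a list of rows and each new category receives a fixed number of randomly initialized rows. The helper name `expand_pseudo_data`, the Gaussian initialization, and `rows_per_label` are hypothetical; the text only states that the categories are added and that the fake expanded target result is obtained by adding the new category labels.

```python
import random


def expand_pseudo_data(pseudo, fake_labels, new_labels, rows_per_label, seed=0):
    """Append pseudo-data rows for new categories and forge their labels (hypothetical helper).

    pseudo: list of rows (the pseudo-data matrix); fake_labels: one forged label per row.
    New rows are drawn from a standard normal distribution, which is an assumption.
    """
    rng = random.Random(seed)
    dim = len(pseudo[0])
    new_rows = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                for _ in range(rows_per_label * len(new_labels))]
    # Fake expanded target result = existing forged labels + labels of the new categories.
    new_fake = [label for label in new_labels for _ in range(rows_per_label)]
    # Concatenation builds new lists, so the caller's pseudo data is left untouched.
    return pseudo + new_rows, fake_labels + new_fake


# Usage: three existing rows labelled 0, 1, 2; add categories 3 and 4 with two rows each.
old = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
expanded, labels = expand_pseudo_data(old, [0, 1, 2], [3, 4], rows_per_label=2)
```

Keeping the original rows in front means the forged labels already learned for the old categories stay aligned with their rows.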
[0007] In one embodiment, iteratively executing steps a) and b) up to the j-th iteration to obtain the (j+1)-th network model and the (j+1)-th iteration pseudo data includes: stopping the iteration when the difference between the j-th prediction result of the real training data and the real target result, and/or the difference between the j-th prediction result of the pseudo data of the j-th iteration and the fake target result, meets a preset condition, thereby obtaining the (j+1)-th network model and the (j+1)-th iteration pseudo data.
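The stopping rule can be sketched as follows, assuming the preset condition is that a loss value falls to or below a threshold. The function name, the threshold value, and the `mode` switch between "and" and "or" are illustrative assumptions.

```python
def should_stop(real_loss, fake_loss, threshold=1e-3, mode="and"):
    """Return True when the (assumed) preset condition is met.

    real_loss: difference between the prediction on real training data and the real target.
    fake_loss: difference between the prediction on pseudo data and the fake target.
    mode="and" requires both to be small enough; mode="or" requires either one.
    """
    real_ok = real_loss <= threshold
    fake_ok = fake_loss <= threshold
    if mode == "and":
        return real_ok and fake_ok
    if mode == "or":
        return real_ok or fake_ok
    raise ValueError(f"unknown mode: {mode!r}")
```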
[0008] In one embodiment, the pseudo data for the (i-1)th iteration is a matrix. When the training method is applied to the application scenario of educational score grading, the fake target result includes at least one score, and each score is obtained by forging score labels for a preset number of elements in the matrix.
[0009] According to a second aspect of the embodiments of this application, a data processing method is provided, comprising: inputting data to be classified into a network model, wherein the network model is trained by the training method of the network model described in any of the above embodiments; and using the network model to classify the data to be classified.
[0010] According to a third aspect of the embodiments of this application, a training apparatus for a network model is provided, comprising: a compressed training module configured to: a) based on the pseudo data of the (i-1)th iteration, predict, through the (i-1)th network model, the (i-1)th prediction result of the pseudo data of the (i-1)th iteration, and update the (i-1)th network model based on the difference between that prediction result and the forged target result, thereby obtaining an updated (i-1)th network model, wherein when i-1 = 1, the pseudo data of the (i-1)th iteration is a randomly initialized matrix or an initialized matrix obtained based on historical training data, and the (i-1)th network model is an initialized network model; the pseudo data of the (i-1)th iteration does not have a definite target result, and the forged target result is obtained by forgery; a regular training module configured to: b) based on the real training data, predict, through the updated (i-1)th network model, the (i-1)th prediction result of the real training data, and update the updated (i-1)th network model and the pseudo data of the (i-1)th iteration based on the difference between that prediction result and the real target result, to obtain the i-th network model and the pseudo data of the i-th iteration, wherein the real training data has a definite target result; and an iteration module configured to iteratively execute steps a) and b) up to the j-th iteration to obtain the (j+1)th network model and the pseudo data of the (j+1)th iteration, where i is an integer greater than or equal to 2 and less than or equal to j+1, and j is an integer greater than or equal to 2.
[0011] In one embodiment, the training apparatus further includes modules for performing the respective steps of the network model training method described in the above embodiments.
[0012] According to a fourth aspect of the embodiments of this application, a data processing apparatus is provided, comprising: an input module configured to input data to be classified into a network model, wherein the network model is trained by the training method of the network model described in any of the above embodiments; and a classification module configured to classify the data to be classified using the network model.
[0013] In one embodiment, the apparatus further includes modules for performing the respective steps of the data processing method described in the above embodiments.
[0014] According to a fifth aspect of the embodiments of this application, an electronic device is provided, including a memory and a processor, wherein executable code is stored in the memory and the processor is configured to execute the executable code to implement the method as described in the first or second aspect.
[0015] According to a sixth aspect of the embodiments of this application, a computer-readable storage medium is provided having executable code stored thereon, which, when executed, enables the implementation of the method as described in the first or second aspect.
[0016] This application provides a training scheme for a network model. While the network model learns real training data in real time, the pseudo data is also continuously updated using the real training data, which compresses the information contained in the real training data into smaller and denser pseudo data without loss. This not only enables the network model to learn real training data in real time, but also avoids occupying computing and storage resources, thereby effectively balancing the performance of the network model and the cost of training.
Attached Figure Description
[0017] The above and other objects, features, and advantages of this application will become more apparent from the more detailed description of the embodiments of this application in conjunction with the accompanying drawings. The drawings are provided to further illustrate the embodiments of this application and form part of the specification. They are used together with the embodiments of this application to explain this application and do not constitute a limitation thereof. In the drawings, the same reference numerals generally represent the same components or steps.
[0018] Figure 1 is a schematic diagram of the system architecture of an application scenario of the network model training method provided in one embodiment of this application.
[0019] Figure 2 is a flowchart of a network model training method provided in one embodiment of this application.
[0020] Figure 3 is a flowchart of a network model training method provided in another embodiment of this application.
[0021] Figure 4 is a flowchart of a network model training method provided in another embodiment of this application.
[0022] Figure 5 is a schematic diagram of the process of training a network model according to another embodiment of this application.
[0023] Figure 6 is a schematic flowchart of a data processing method provided in one embodiment of this application.
[0024] Figure 7 is a schematic structural diagram of a network model training apparatus provided in one embodiment of this application.
[0025] Figure 8 is a schematic structural diagram of a data processing apparatus provided in one embodiment of this application.
[0026] Figure 9 is a schematic structural diagram of an electronic device provided in one embodiment of this application.
Detailed Implementation
[0027] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0028] The technical solutions of this application are applicable to any type of incremental learning task application scenario. By adopting the technical solutions of this application, not only can the network model learn real training data in real time, but it can also avoid occupying computing and storage resources, thereby effectively balancing the performance of the network model and the cost of training.
[0029] Humans possess the lifelong ability to acquire, adjust, and transfer knowledge. Typically, accumulated knowledge helps us master new knowledge more quickly; this learning ability is known as incremental learning. However, most artificial intelligence algorithms or products catastrophically forget previously learned knowledge when learning new knowledge.
[0030] The following example uses an educational score grading task as the main demonstration case, which is a natural testing scenario for incremental learning techniques. For instance, an educational score grading task consists of three tasks, with training data being short-answer questions. For the first task, the training data is labeled 0 points; for the second task, 1 point; and for the third task, 2 points. When each task has only one score range in its training data, the grading model is likely to learn incorrect knowledge. For example, if the third task only has a score range of 2 points, the model might predict a score of 2 regardless of the input data, thus fitting the training data for 2 points. However, during testing, the grading model needs to process data from all score ranges, not just the 2-point range. Therefore, the grading model needs to learn from training data of different score ranges and ensure that it doesn't forget previously learned knowledge when learning new knowledge.
[0031] Specifically, a grading model (model_0) is trained using all the current data (referred to as data_0) and deployed on a learning machine. However, as the number of users and the usage time increase, user feedback accumulates, and the online question bank expands with contributions from teachers, past exam papers, and so on, so the current data grows from data_0 to data_1. Obviously, the grading model model_0 will not be able to maintain its excellent performance on data_1. In a more extreme case, if users report a grading error today and find, when testing again after a period of time, that the grading is still incorrect, the user experience suffers greatly. However, launching a fine-tuning process for the grading model for every small batch of user feedback is costly and inefficient. Therefore, how to balance the performance of the grading model with the training cost is an urgent problem to be solved.
[0032] Incremental learning techniques can be broadly categorized into three types:
[0033] a) Data replay (Data_replay), which revisits historical training data; the key to this approach is selecting representative historical data for training.
[0034] b) Limiting the magnitude of parameter updates in the grading model to prevent knowledge loss, at the cost of the grading model fitting new knowledge relatively poorly.
[0035] c) Expanding a sub-grading model with its own parameters for each task. This minimizes interference between the knowledge of different tasks, but at the cost of additional computation and storage overhead for the grading model. Moreover, as the number of tasks increases, and since the score itself is what the grading model must predict, it is impossible to determine at input time which sub-grading model to use; therefore, this approach does not fit the actual use case.
[0036] Based on the above analysis, the simplest and most direct solution to catastrophic forgetting is Data_replay, which retrains the parameters of the grading model using all known data (representative historical training data + current task training data). However, this method is extremely inefficient and significantly hinders the grading model from learning the current task's training data in real time. Since the main goal of incremental learning is to find the optimal balance between computing and storage resources and the performance of the grading model, Data_replay methods primarily focus on data selection strategies, choosing historical data that is as representative as possible. However, this is precisely the limitation of the Data_replay method, as detailed below:
[0037] a) Relying solely on historical training data, even with the best selection strategy, is insufficient;
[0038] b) There is no theoretical guarantee for the selection target, that is, which data counts as representative, and how much representative training data must be selected to be equivalent to using all historical training data;
[0039] c) No data selection strategy is guaranteed to always outperform random sampling; after many data selection strategies have been tried, random selection may still turn out to be preferable.
[0040] To address the aforementioned issues, this application provides a network model training scheme. While the network model learns from real training data in real time, it also continuously updates the pseudo-data using the real training data. This losslessly compresses the information contained in the real training data into a smaller and denser set of pseudo-data, thereby enabling the network model to learn from real training data in real time while avoiding the occupation of computing and storage resources. This effectively balances the performance of the network model with the cost of training.
[0041] Since the embodiments of this application involve applications of neural networks, for ease of understanding, the relevant terms and concepts such as neural networks that may be involved in the embodiments of this application will be briefly introduced below.
[0042] A neural network is a computational model composed of numerous interconnected nodes (or neurons). Each node corresponds to a policy function, and the connection between any two nodes represents a weight applied to the signal passing through that connection. A neural network typically consists of multiple cascaded layers: the output of the i-th layer is connected to the input of the (i+1)-th layer, the output of the (i+1)-th layer is connected to the input of the (i+2)-th layer, and so on. After a training sample is input into the cascaded layers, each layer produces an output that becomes the input of the next layer, until the final output is computed. The prediction of the output layer is compared with the actual target value, and the weight matrix and policy function of each layer are adjusted based on the difference between the two. The neural network repeats this adjustment process with training samples, refining the weights and other parameters until its predicted output matches the actual target value. This process is called the training process of the neural network, and after training a neural network model is obtained.
[0043] During neural network training, to make the output as close as possible to the value that is actually desired, the network's prediction is compared with the target value, and the weight vector of each layer is updated according to the difference between them (there is usually an initialization step before the first update, in which parameters are preconfigured for each layer). For example, if the prediction is too high, the weight vector is adjusted so that the network predicts a lower value. This adjustment continues until the neural network predicts the target value or a value very close to it. A loss function or objective function is therefore used to measure the difference between the predicted and target values. Taking the loss function as an example, a higher loss value indicates a greater difference, so training the neural network becomes a process of minimizing this loss.
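The following minimal sketch illustrates this adjustment for a single weight, assuming a squared-error loss and plain gradient descent (both are assumptions made for illustration). Starting from a prediction that is too high, each step lowers the weight, and the loss shrinks toward zero.

```python
def train_single_weight(w, x, target, lr=0.1, steps=50):
    """Minimise the squared loss (w * x - target) ** 2 by gradient descent."""
    losses = []
    for _ in range(steps):
        pred = w * x
        losses.append((pred - target) ** 2)
        # d/dw of (w * x - target) ** 2 is 2 * (w * x - target) * x. A prediction that is
        # too high gives a positive gradient, so the weight is pushed down.
        w -= lr * 2 * (pred - target) * x
    return w, losses


# The initial prediction 3.0 is too high for the target 1.0.
w_final, losses = train_single_weight(w=3.0, x=1.0, target=1.0)
```

With these settings the gap between w and the ideal weight 1.0 shrinks by a factor of 0.8 per step, so the loss decreases monotonically.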
[0044] The system architecture of the application scenario of the network model training method provided in the embodiments of this application is described in detail below with reference to Figure 1. As shown in Figure 1, the application scenario provided in this embodiment involves a server 140 and multiple terminal devices 110, 120, and 130.
[0045] Terminal devices 110, 120, and 130 can be mobile terminal devices such as mobile phones, game consoles, tablets, and in-vehicle computers; alternatively, they can be personal computers (PCs), such as laptops and desktop computers. Those skilled in the art will understand that the types of the aforementioned terminal devices 110, 120, and 130 can be the same or different, and their number can be more or less. For example, there can be one of each of the aforementioned terminals, or dozens or hundreds of terminals, or even more. This application does not limit the number or type of terminals in its embodiments.
[0046] Terminal devices 110, 120, and 130 are connected to server 140 via a communication network. Optionally, the communication network can be a wired network or a wireless network. Optionally, server 140 can be a single server, a combination of several servers, or a cloud computing service center.
[0047] To illustrate the data interaction method in the application scenario of this application, the following explanation uses the education score grading task as the main demonstration case. However, it should be noted that the application scenario of this application is applicable to any type of incremental learning task, and is not limited to education score grading tasks.
[0048] In one embodiment, terminal devices 110, 120, and 130 are used to receive real training data input by the user, such as test questions, their corresponding standard answers, and student answers, and send it to server 140. After receiving the real training data, server 140 performs the following steps: a) based on the pseudo data of the (i-1)th iteration, predicting, through the (i-1)th network model, the (i-1)th prediction result of the pseudo data of the (i-1)th iteration, and updating the (i-1)th network model based on the difference between that prediction result and the fake target result, to obtain the updated (i-1)th network model; b) based on the real training data, predicting, through the updated (i-1)th network model, the (i-1)th prediction result of the real training data, and updating the updated (i-1)th network model and the pseudo data of the (i-1)th iteration based on the difference between that prediction result and the real target result, to obtain the i-th network model and the pseudo data of the i-th iteration; and iterating steps a) and b) until the j-th iteration to obtain the (j+1)th network model and the pseudo data of the (j+1)th iteration. This not only enables the network model to learn new data in real time but also avoids consuming computing and storage resources, thus effectively balancing the performance of the network model and the cost of training.
[0049] Figure 2 shows a flowchart of a network model training method provided in an embodiment of this application. For example, this training method can be executed by the server 140 shown in Figure 1 or by another type of electronic device with data processing capabilities. As shown in Figure 2, the method includes the following steps.
[0050] Step S210: Based on the pseudo data of the (i-1)th iteration, predict the (i-1)th prediction result of the pseudo data of the (i-1)th iteration through the (i-1)th network model, and update the (i-1)th network model based on the difference between the (i-1)th prediction result of the pseudo data of the (i-1)th iteration and the fake target result, to obtain the updated (i-1)th network model.
[0051] The (i-1)th network model can be built from a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), or the like; the embodiments of this application do not limit its type. Optionally, the network structure of the (i-1)th network model can be designed independently based on the computer vision task, can adopt at least part of an existing network structure such as ResNet, ResNeXt, or DenseNet, or can be an SVM classifier, a linear regression classifier, or the like. The embodiments of this application do not limit the network structure of the (i-1)th network model.
[0052] The pseudo-data in the (i-1)th iteration can be a matrix. The pseudo-data in the (i-1)th iteration does not have a definite target result; therefore, the target result corresponding to the pseudo-data in the (i-1)th iteration is a forged target result, i.e., obtained through forgery. For example, when this training method is applied to educational score grading, the forged target result includes at least one score, each obtained by forging score labels for a predetermined number of elements in the matrix.
[0053] For example, if the matrix has 100 elements, i.e., its size is [100, 512], where 100 is the number of elements (also the number of pseudo data items) and 512 is the dimension of the feature vector of the (i-1)th network model, then the score labels of the 1st to 20th elements can be forged as 1 point, those of the 21st to 60th elements as 2 points, and those of the 61st to 100th elements as 3 points.
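A small sketch of the forged labels in this example, assuming the pseudo-data matrix is held as a list of 100 rows of 512 values and is randomly initialized (one of the two initialization options mentioned for the first iteration). `forge_labels` and `init_pseudo_matrix` are hypothetical helper names.

```python
import random


def forge_labels(spans):
    """Expand (score, count) pairs into one forged score label per pseudo-data row."""
    labels = []
    for score, count in spans:
        labels.extend([score] * count)
    return labels


def init_pseudo_matrix(rows, dim, seed=0):
    """Randomly initialised pseudo-data matrix of shape [rows, dim] (standard normal entries)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


pseudo = init_pseudo_matrix(100, 512)
# Rows 1-20 -> 1 point, rows 21-60 -> 2 points, rows 61-100 -> 3 points (1-based, as in the text).
fake_scores = forge_labels([(1, 20), (2, 40), (3, 40)])
```

Passing different counts per score would correspond to assigning a preset number of pseudo data items to each score label according to its importance.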
[0054] However, it should be noted that the embodiments of this application do not specifically limit the number of pseudo data maintained for each score label. One pseudo data can maintain only one score label, or multiple pseudo data can maintain one score label, or a preset number of pseudo data can be assigned to each score label according to its importance.
[0055] In one embodiment, by inputting the pseudo-data from the (i-1)th iteration into the (i-1)th network model, the (i-1)th prediction result of the pseudo-data from the (i-1)th iteration can be predicted. For example, the (i-1)th prediction result of the pseudo-data from the (i-1)th iteration includes a prediction score of 2 for the 1st to 20th elements, a prediction score of 3 for the 21st to 60th elements, and a prediction score of 4 for the 61st to 100th elements.
[0056] In one embodiment, based on the difference between the (i-1)th prediction result of the pseudo data in the (i-1)th iteration and the fake target result, the loss function value for the (i-1)th iteration is determined using a loss function. This loss function value is then backpropagated to update the parameters of the (i-1)th network model, resulting in an updated (i-1)th network model. For example, the loss function value for the (i-1)th iteration is determined based on the differences between the predicted scores of elements 1 to 20 (2 points) and 1 point, the predicted scores of elements 21 to 60 (3 points) and 2 points, and the predicted scores of elements 61 to 100 (4 points) and 3 points.
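A minimal sketch of one compressed-training step, under stated assumptions: the network model is replaced by a hypothetical linear scorer, the loss is the mean squared error, and the update is a single plain gradient step. Only the model weights change; the pseudo data stays fixed. The sketch also reproduces the loss value for the example above, where every prediction is exactly one point above its forged label.

```python
def predict(rows, w):
    """Hypothetical linear scorer standing in for the (i-1)th network model."""
    return [sum(x * wk for x, wk in zip(row, w)) for row in rows]


def mse(preds, targets):
    """Mean squared error, assumed here as the loss function."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def compressed_step(w, pseudo, fake_targets, lr):
    """Step S210: one gradient step on the model weights; the pseudo data is not modified."""
    n = len(pseudo)
    residuals = [p - t for p, t in zip(predict(pseudo, w), fake_targets)]
    # Gradient of the mean squared error with respect to w: (2 / n) * P^T r.
    grad = [2.0 / n * sum(residuals[a] * pseudo[a][k] for a in range(n))
            for k in range(len(w))]
    return [wk - lr * gk for wk, gk in zip(w, grad)]


# Loss for the example in [0055]-[0056]: every prediction is one point above its forged label.
example_loss = mse([2] * 20 + [3] * 40 + [4] * 40, [1] * 20 + [2] * 40 + [3] * 40)  # 1.0

# One compressed-training step on a toy 3 x 2 pseudo-data matrix with forged targets.
pseudo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fake = [1.0, 2.0, 3.0]
w0 = [0.0, 0.0]
w1 = compressed_step(w0, pseudo, fake, lr=0.1)
```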
[0057] The process of obtaining the updated (i-1)th network model from the pseudo data of the (i-1)th iteration can be regarded as the compressed training stage of the overall training method.
[0058] Step S220: Based on the real training data, predict the i-1th prediction result of the real training data through the updated i-1th network model, and update the updated i-1th network model and the pseudo data of the i-1th iteration based on the difference between the i-1th prediction result of the real training data and the real target result, so as to obtain the i-th network model and the pseudo data of the i-th iteration.
[0059] Real training data has a clear target result. For example, real training data can be the test questions, the corresponding standard answers, and the students' answers to a certain exam. However, it should be noted that the embodiments of this application do not specifically limit the number of real training data. For example, there can be multiple real training data, namely X1 to Xm, where each X is a test question, its corresponding standard answer, and the students' answers, and each X corresponds to a real target result.
[0060] For example, m is 2, X1 is test question 1, its corresponding standard answer 1, and student answer 1. The actual target result corresponding to X1 is 6 points obtained by manual scoring. X2 is test question 2, its corresponding standard answer 2, and student answer 2. The actual target result corresponding to X2 is 3 points obtained by manual scoring.
[0061] In one embodiment, by inputting real training data into the updated (i-1)th network model, the (i-1)th prediction result of the real training data can be predicted. For example, for real training data X1, the corresponding (i-1)th prediction result is 5 points, and for real training data X2, the corresponding (i-1)th prediction result is 2 points.
[0062] In one embodiment, based on the difference between the (i-1)th prediction result and the actual target result of the real training data, the loss function value for the (i-1)th iteration is determined using a loss function. This loss function value is then backpropagated to update the pseudo-data and the parameters of the updated (i-1)th network model for the (i-1)th iteration, respectively, to obtain the pseudo-data and the i-th network model for the i-th iteration. For example, the loss function value for the (i-1)th iteration is determined based on the difference between the (i-1)th prediction result of 5 points corresponding to the real training data X1 and the actual target result of 6 points (obtained by manual scoring) corresponding to X1, and the difference between the (i-1)th prediction result of 2 points corresponding to the real training data X2 and the actual target result of 3 points (obtained by manual scoring) corresponding to X2.
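Updating the pseudo data from the real-data loss requires that loss to depend on the pseudo data. One way to achieve this, borrowed from dataset distillation and assumed here rather than stated in the text, is to differentiate through the compressed-training step of Step S210. The sketch below does this by hand for a hypothetical linear scorer with mean squared error, recomputing Step S210 from the model weights before that step (`w_prev`). The feature vectors standing in for X1 and X2 are made up; only their targets (6 and 3 points) come from the example.

```python
def predict(rows, w):
    """Hypothetical linear scorer standing in for the network model."""
    return [sum(x * wk for x, wk in zip(row, w)) for row in rows]


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def compressed_update(w_prev, pseudo, fake_targets, alpha):
    """Step a): one gradient step on the pseudo data; returns (updated weights, residuals)."""
    n = len(pseudo)
    r = [p - t for p, t in zip(predict(pseudo, w_prev), fake_targets)]
    w_upd = [w_prev[k] - 2.0 * alpha / n * sum(r[a] * pseudo[a][k] for a in range(n))
             for k in range(len(w_prev))]
    return w_upd, r


def real_loss_after_compress(w_prev, pseudo, fake_targets, real_x, real_y, alpha):
    """Real-data loss of the updated model, viewed as a function of the pseudo data."""
    w_upd, _ = compressed_update(w_prev, pseudo, fake_targets, alpha)
    return mse(predict(real_x, w_upd), real_y)


def regular_step(w_prev, pseudo, fake_targets, real_x, real_y, alpha, beta):
    """Step b): update both the model and the pseudo data from the real-data loss."""
    n, m, d = len(pseudo), len(real_x), len(w_prev)
    c = 2.0 * alpha / n
    w_upd, r = compressed_update(w_prev, pseudo, fake_targets, alpha)
    e = [p - t for p, t in zip(predict(real_x, w_upd), real_y)]
    # g = dL_real / dw_upd = (2 / m) * X^T e
    g = [2.0 / m * sum(e[s] * real_x[s][k] for s in range(m)) for k in range(d)]
    # Chain rule through step a): dL_real / dP[a][b] = -c * (r[a] * g[b] + (P g)[a] * w_prev[b])
    pg = predict(pseudo, g)
    grad_p = [[-c * (r[a] * g[b] + pg[a] * w_prev[b]) for b in range(d)] for a in range(n)]
    w_next = [wk - beta * gk for wk, gk in zip(w_upd, g)]
    pseudo_next = [[pseudo[a][b] - beta * grad_p[a][b] for b in range(d)] for a in range(n)]
    return w_next, pseudo_next, grad_p


# Toy run: two real samples with manual scores 6 and 3 (made-up features), three pseudo rows.
w0 = [0.5, -0.2]
pseudo = [[1.0, 0.3], [0.2, 1.0], [0.7, 0.7]]
fake = [1.0, 2.0, 3.0]
real_x = [[1.0, 2.0], [2.0, 0.5]]
real_y = [6.0, 3.0]
w1, pseudo1, grad_p = regular_step(w0, pseudo, fake, real_x, real_y, alpha=0.05, beta=0.01)
```

A framework with automatic differentiation would normally compute `grad_p` instead of the hand-derived formula; the manual version is shown only to keep the sketch dependency-free.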
[0063] The process of obtaining the pseudo-data for the i-th iteration and the i-th network model from real training data can be considered as the regular training of the entire training method. In the regular training phase, the pseudo-data for the (i-1)-th iteration is learned and will be updated to the pseudo-data for the i-th iteration.
[0064] In compressed training, a mapping relationship can be obtained between the pseudo-data and the fake target result in the (i-1)th iteration, but this mapping relationship is actually faked. Through regular training, a mapping relationship can be obtained between the real training data and the real target result; this mapping relationship truly exists. For example, through the mapping relationship in compressed training, the features corresponding to the fake 1 point can be determined; through the mapping relationship in regular training, the features corresponding to the real 1 point can be determined. The features corresponding to the real 1 point are then updated in the pseudo-data of the (i-1)th iteration. Thus, in the i-th iteration of compressed training, the i-th network model can learn the features corresponding to the real 1 point.
[0065] Step S230: Iterate through steps S210 and S220 until the j-th iteration to obtain the (j+1)-th network model and the pseudo data for the (j+1)-th iteration.
[0066] i is an integer greater than or equal to 2 and less than or equal to j+1, and j is an integer greater than or equal to 2. That is to say, the above steps S210 and S220 are executed at least twice, and the resulting (j+1)th network model can be understood as a trained network model.
[0067] The above iterative process includes at least two iterations. First iteration: Based on the pseudo-data from the first iteration, the first network model predicts the first prediction result of the pseudo-data from the first iteration. Based on the difference between the first prediction result of the pseudo-data from the first iteration and the fake target result, the first network model is updated to obtain an updated first network model. Based on the real training data, the updated first network model predicts the first prediction result of the real training data. Based on the difference between the first prediction result of the real training data and the real target result, the updated first network model and the pseudo-data from the first iteration are updated to obtain a second network model and pseudo-data for the second iteration. Second iteration: Based on the pseudo-data from the second iteration, the second network model predicts the second prediction result of the pseudo-data from the second iteration. Based on the difference between the second prediction result of the pseudo-data from the second iteration and the fake target result, the second network model is updated to obtain the updated second network model. Based on the real training data, the updated second network model predicts the second prediction result of the real training data. Based on the difference between the second prediction result of the real training data and the real target result, the updated second network model and the pseudo-data from the second iteration are updated to obtain the third network model and the pseudo-data from the third iteration. This process is repeated for j iterations to obtain the (j+1)th network model and the (j+1)th iteration pseudo-data.
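A minimal sketch of steps S210 to S230 follows, under several assumptions not stated in the text: a linear scorer stands in for the network model, the loss is mean squared error, updates are plain gradient descent with hypothetical learning rates `lr` and `lr_x`, and the real-data loss reaches the pseudo-data by differentiating through the one-step compressed update (the approach used in dataset distillation). All function names and the two-dimensional features for X1 and X2 are hypothetical.

```python
import random


def predict(w, b, x):
    # Linear scorer standing in for the network model: score = w . x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mse_grads(w, b, xs, ys):
    # MSE loss, its gradients with respect to (w, b), and the per-sample residuals
    n = len(xs)
    res = [predict(w, b, x) - y for x, y in zip(xs, ys)]
    loss = sum(r * r for r in res) / n
    gw = [2.0 / n * sum(r * x[d] for r, x in zip(res, xs)) for d in range(len(w))]
    gb = 2.0 / n * sum(res)
    return loss, gw, gb, res


def pseudo_grad(w0, b0, xp, yp, xr, yr, lr):
    # Compressed step: M0 -> M1 using only (X', Y'); X' itself is not changed here.
    loss_p, gw, gb, res_p = mse_grads(w0, b0, xp, yp)
    w1 = [wi - lr * g for wi, g in zip(w0, gw)]
    b1 = b0 - lr * gb
    # Real-data loss of M1 and its gradient (Gw, Gb) with respect to M1's parameters
    loss_r, Gw, Gb, _ = mse_grads(w1, b1, xr, yr)
    # Chain rule through M1 = M0 - lr * grad(loss'): d loss_r / d X'
    k = len(xp)
    grads = []
    for x, r in zip(xp, res_p):
        s = sum(g * xi for g, xi in zip(Gw, x)) + Gb
        grads.append([-lr * 2.0 / k * (s * w0[d] + r * Gw[d]) for d in range(len(x))])
    return grads, w1, b1, Gw, Gb, loss_p, loss_r


def iteration(w0, b0, xp, yp, xr, yr, lr=0.05, lr_x=0.05):
    # S210 (compressed training) followed by S220 (regular training)
    gx, w1, b1, Gw, Gb, loss_p, loss_r = pseudo_grad(w0, b0, xp, yp, xr, yr, lr)
    w2 = [wi - lr * g for wi, g in zip(w1, Gw)]  # M1 -> M2
    b2 = b1 - lr * Gb
    xp2 = [[xi - lr_x * g for xi, g in zip(x, row)] for x, row in zip(xp, gx)]  # X' updated
    return w2, b2, xp2, loss_p, loss_r


def train(xp, yp, xr, yr, w, b, max_iters=500, tol=1e-4):
    # S230: alternate S210 and S220 until the summed loss meets the preset condition
    for _ in range(max_iters):
        w, b, xp, loss_p, loss_r = iteration(w, b, xp, yp, xr, yr)
        if loss_p + loss_r < tol:
            break
    return w, b, xp, loss_p, loss_r


random.seed(0)
X_real, Y_real = [[1.0, 0.5], [0.2, 1.0]], [6.0, 3.0]  # hypothetical features for X1, X2
X_fake = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # pseudo-data X'
Y_fake = [1.0, 2.0, 3.0]  # fake target result Y'
w, b, X_fake, loss_p, loss_r = train(X_fake, Y_fake, X_real, Y_real, [0.0, 0.0], 0.0)
```

The chain rule in `pseudo_grad` is what lets the real-data loss of the regular step move X': because M1 depends on X' through the compressed step, lowering the real-data loss of M1 pushes X' toward values whose fake targets agree with the real mapping. A framework with automatic differentiation could compute this gradient instead of the hand-derived formula.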
[0068] It should be noted that the method steps shown in Figure 2 can be understood as being performed for any given task. When the task is the initial task (the first task on which the network model is trained) and i-1 = 1, the pseudo-data of the (i-1)th iteration is a randomly initialized matrix or a matrix initialized from historical training data, and the (i-1)th network model is a randomly initialized network model. This embodiment does not, however, limit the type of task: it may also be a subsequent task other than the initial task. In either case, the (j+1)th network model and the pseudo-data of the (j+1)th iteration are the trained network model and trained pseudo-data obtained for this task.
[0069] This application provides a training scheme for a network model. In each iteration, the network model learns the real training data in real time, and the pseudo data is continuously updated using the real training data. This compresses the information contained in the real training data into a smaller, denser set of pseudo data without loss. This not only enables the network model to learn real training data in real time but also avoids occupying computing and storage resources, thereby effectively balancing the performance of the network model against the cost of training.
[0070] If a new task emerges in which the real training data is replaced or expanded with data of the same categories, training data of the same category is obtained. In other words, the real training data shown in Figure 2 contains the same categories as the training data of the same category. For example, if the real training data contains categories 1, 2, and 3, the training data of the same category also contains categories 1, 2, and 3. The new task can be performed by the method shown in Figure 3, which includes the following steps:
[0071] Step S310: Based on the pseudo data of the (j+1)th iteration, predict the (j+1)th prediction result of the pseudo data of the (j+1)th iteration through the (j+1)th network model, and update the (j+1)th network model based on the difference between the (j+1)th prediction result of the pseudo data of the (j+1)th iteration and the fake target result, to obtain the updated (j+1)th network model.
[0072] This new task builds on the pseudo data of the (j+1)th iteration and the (j+1)th network model obtained by the method steps shown in Figure 2. The fake target result of the pseudo data of the (j+1)th iteration is the same as the fake target result of the pseudo data of the (i-1)th iteration.
[0073] The specific implementation details of step S310 are the same as those of step S210, and will not be repeated here. Please refer to step S210.
[0074] Step S320: Based on the training data of the same category, predict the (j+1)th prediction result of the training data of the same category through the updated (j+1)th network model, and update the updated (j+1)th network model and the pseudo data of the (j+1)th iteration based on the difference between the (j+1)th prediction result of the training data of the same category and the actual target result, to obtain the (j+2)th network model and the pseudo data of the (j+2)th iteration.
[0075] The specific implementation details of step S320 are the same as those of step S220, and will not be repeated here. Please refer to step S220.
[0076] Step S330: Iterate through steps S310 and S320 until the nth iteration to obtain the (n+1)th network model and the pseudo data for the (n+1)th iteration.
[0077] Step S310 starts from the (j+1)th iteration, and steps S310 and S320 are iterated up to the nth iteration, where n is an integer greater than or equal to j+1. This means that steps S310 and S320 are executed at least twice, and the resulting (n+1)th network model can be considered a trained network model. The iterative process is the same as that of the method shown in Figure 2 and is not repeated here.
[0078] It should be noted that the method steps shown in Figure 3 can be understood as being performed for subsequent tasks other than the initial task. The (n+1)th network model and the pseudo data of the (n+1)th iteration are the trained network model and trained pseudo data obtained for this subsequent task.
[0079] Furthermore, if a new task emerges in which the real training data is replaced or expanded with data of different categories, training data of different categories is obtained. In other words, the categories contained in the real training data shown in Figure 2 differ from those contained in the training data of different categories. In this case, the different categories are added to the pseudo data of the (j+1)th iteration to obtain the expanded pseudo data of the (j+1)th iteration.
[0080] For example, the real training data contains the categories 1, 2, and 3 points, whereas the training data of different categories contains the categories 1, 2, 3, and 3.5 points; 3.5 points is clearly a different category. This category must be added to the pseudo data of the (j+1)th iteration to obtain the expanded pseudo data of the (j+1)th iteration. For example, if the pseudo data of the (j+1)th iteration has size [3, 512], the expanded pseudo data of the (j+1)th iteration has size [4, 512].
[0081] The new task can be performed by the method shown in Figure 4, which includes the following steps:
[0082] Step S410: Based on the expanded pseudo data of the (j+1)th iteration, predict the (j+1)th prediction result of the expanded pseudo data of the (j+1)th iteration through the (j+1)th network model, and update the (j+1)th network model based on the difference between this prediction result and the fake expanded target result, to obtain the updated (j+1)th network model.
[0083] This new task also builds on the (j+1)th network model obtained by the method steps shown in Figure 2. Unlike the method steps shown in Figure 3, however, it does not use the pseudo data of the (j+1)th iteration directly but uses the expanded pseudo data of the (j+1)th iteration.
[0084] Because this embodiment uses the expanded pseudo data of the (j+1)th iteration, the labels of the different categories are added to the fake target result used in the method shown in Figure 2, yielding the fake expanded target result.
[0085] When j+1 = 3, the pseudo data of the (j+1)th iteration is the initialized expanded pseudo data: either only the different categories newly added to the expanded pseudo data of the (j+1)th iteration are initialized, or all categories in it are initialized. The (j+1)th network model is the model whose parameters have been initialized.
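Expanding the pseudo data for a new category can be sketched as below, assuming (as in the [3, 512] to [4, 512] example) one row of pseudo data per category and Gaussian initialization for new rows; the function name and the initialization scheme are hypothetical. The `reinit_all` flag covers the two options described above: initializing only the newly added categories, or all categories.

```python
import random


def expand_pseudo_data(pseudo, labels, task_labels, dim, reinit_all=False, seed=0):
    """Add one learnable row per unseen category to X' and its label to the fake target result.

    reinit_all=False keeps the existing rows and initializes only the new ones;
    reinit_all=True re-initializes every row of the expanded pseudo data.
    """
    rng = random.Random(seed)
    added = [lab for lab in task_labels if lab not in labels]

    def rand_row():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    if reinit_all:
        rows = [rand_row() for _ in range(len(labels) + len(added))]
    else:
        rows = [row[:] for row in pseudo] + [rand_row() for _ in added]
    return rows, labels + added


old = [[0.0] * 512 for _ in range(3)]  # pseudo data of the (j+1)th iteration, size [3, 512]
old_labels = [1.0, 2.0, 3.0]           # fake target result (one score per row)
new, new_labels = expand_pseudo_data(old, old_labels, [1.0, 2.0, 3.0, 3.5], 512)
print(len(new), len(new[0]), new_labels)  # 4 512 [1.0, 2.0, 3.0, 3.5]
```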
[0086] The specific implementation details of step S410 are the same as those of step S210, and will not be repeated here. Please refer to step S210.
[0087] Step S420: Based on the training data of different categories, predict the (j+1)th prediction result of the training data of different categories through the updated (j+1)th network model, and update the updated (j+1)th network model and the expanded pseudo data of the (j+1)th iteration based on the difference between this prediction result and the real target result, to obtain the (j+2)th network model and the expanded pseudo data of the (j+2)th iteration.
[0088] The specific implementation details of step S420 are the same as those of step S220, and will not be repeated here. Please refer to step S220.
[0089] Step S430: Iterate through steps S410 and S420 until the nth iteration to obtain the (n+1)th network model and the pseudo data for the (n+1)th iteration.
[0090] Step S410 also starts from the (j+1)th iteration, and steps S410 and S420 are iterated up to the nth iteration, where n is an integer greater than or equal to j+1. This means that steps S410 and S420 are executed at least twice, and the resulting (n+1)th network model can be understood as a trained network model. The iterative process is the same as that of the method shown in Figure 2 and is not repeated here.
[0091] It should be noted that the method steps shown in Figure 4 can be understood as being performed for subsequent tasks other than the initial task. The (n+1)th network model and the pseudo data of the (n+1)th iteration are the trained network model and trained expanded pseudo data obtained for this subsequent task.
[0092] As the above analysis shows, the method of this embodiment resembles the Data_replay method in that it reuses historical training data (i.e., real training data) so that the network model does not forget knowledge. Unlike Data_replay, however, it does not select a representative portion of the historical training data (e.g., using 10% of the historical training data to represent the remaining 90%). Instead, it compresses the information contained in all of the real training data into a small amount of pseudo-data without loss: at a computing and storage cost equivalent to only 10% of the real training data, it compresses 100% of the information.
[0093] Specifically, let the current task be task_i with real training data data_i. Both the Data_replay method and the method of this embodiment train on real training data. The Data_replay method, however, selects a certain proportion (e.g., one-tenth) of the historical training data from the historical tasks task_{0,1…i-1}, so its total training data can be written as data_i + 0.1 × data_{0,1…i-1}. The method of this embodiment instead uses a lossless compression strategy: it first compresses all of the training data data_{0,1…i-1} into pseudo-data x' (e.g., the pseudo-data of the (i-1)th iteration), so its total training data can be written as data_i + x'. When training on a new task, the network model first ensures that it does not catastrophically forget the knowledge stored in x'.
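The two training-set compositions above can be written out directly. In this sketch the data items are placeholder strings, the sizes are hypothetical, the helper names are hypothetical, and uniform random sampling is assumed for the replay subset (the text leaves the selection strategy open):

```python
import random


def replay_training_set(current, history, ratio=0.1, seed=0):
    # Data_replay baseline: data_i + ratio * data_{0..i-1}, a random subset of stored history
    rng = random.Random(seed)
    return current + rng.sample(history, int(len(history) * ratio))


def compressed_training_set(current, pseudo):
    # Compression approach: data_i + x', where x' stands in for all of data_{0..i-1}
    return current + pseudo


history = [f"hist_{k}" for k in range(1000)]  # hypothetical data_{0..i-1}
current = [f"cur_{k}" for k in range(200)]    # hypothetical data_i
pseudo = [f"x'_{k}" for k in range(6)]        # hypothetical pseudo-data rows
print(len(replay_training_set(current, history)), len(compressed_training_set(current, pseudo)))  # 300 206
```

The replay subset grows with the amount of historical data, whereas x' grows only when new categories are added.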
[0094] In summary, the method employed in this application addresses the shortcomings of the Data_replay method in terms of data selection strategy. Even with the best selection strategy, the number of selected samples is ultimately limited, and its representativeness of the original data distribution is questionable. In contrast, the method employed in this application uses data compression, distilling all the information from the training data into a smaller, denser pseudo-data set.
[0095] To facilitate understanding, an incremental-task application scenario is described below, and its key optimization points are illustrated with reference to the diagram shown in Figure 5.
[0096] Assume the current task is task_i, the network model is M0, and the pseudo-data is X'. The update order of the current task task_i is as follows: network model M0 first fits the pseudo-data X', and then fits the real training data X1 to Xn of the current task. Specifically, the pseudo-data X' is input into network model M0 to obtain the prediction result y', and the loss function value loss' between the prediction result y' and the fake target result Y' is calculated. The loss function value loss' is used to update network model M0 to network model M1. The real data X1 to Xn are input into network model M1 to obtain the prediction results y1 to yn, and the loss function values loss1 to lossn between the prediction results y1 to yn and the corresponding target results Y1 to Yn are calculated. The loss function values loss1 to lossn are used to update network model M1 to network model M2.
[0097] In fact, the above process is performed for every historical task preceding the current task, so the pseudo-data participates in the training of all historical tasks (task_{0:i-1}), and the network model learns by reviewing historical knowledge. The network model actually used to fit the data of the current task is M1, which has been updated with the pseudo-data. That is, when network model M1 converges normally, the pseudo-data should not harm the current task. Intuitively, the update from network model M0 to network model M1 should at least not impair the network model's fit to the current task. Otherwise, as compressed training and regular training alternate, the network model would never settle on the correct convergence direction, i.e., it would be unable to fit the current task.
[0098] In summary, the network model fits both the pseudo-data and the real training data of the current task. Because the pseudo-data has taken part in the training of all historical tasks, the network model can only reach its final convergence by relying on both the pseudo-data, which stores the historical training data, and the real training data of the current task. The technical solution of this application therefore integrates learnable pseudo-data deeply into the training of every task, forcing the network model to compress historical information as efficiently as possible while it converges.
[0099] In implementation, the compressed training phase updates only the parameters of the network model; that is, loss' updates only the parameters of network model M0 to obtain network model M1. The regular training phase, in contrast, updates the network model and the pseudo-data simultaneously: loss1 to lossn are used both to update the parameters of network model M1 to obtain network model M2 and to update the pseudo-data X' to obtain the updated pseudo-data X'. If loss' were also used to update the input directly, X' would become a free variable for the network model: loss' could be driven ever lower by adjusting X' instead of the network model's parameters. This degrades how the network model fits the data, because updated pseudo-data could then be obtained and the current task fitted without the historical training data ever being compressed.
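The degeneration described above can be reproduced with a frozen linear model (a hypothetical stand-in for M0, with made-up parameters and pseudo-data; the helper names are also hypothetical): if gradient descent on loss' is allowed to move X' instead of the model, loss' falls to almost zero while the model learns nothing, which is why the compressed phase updates only the model parameters.

```python
def loss_prime(w, b, xp, yp):
    # loss' between the model's predictions on pseudo data X' and the fake targets Y'
    preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in xp]
    return sum((p - y) ** 2 for p, y in zip(preds, yp)) / len(xp)


def fit_by_moving_inputs(w, b, xp, yp, lr=0.1, steps=500):
    # Counter-example: gradient descent on loss' with respect to X' while (w, b) stay frozen
    xp = [row[:] for row in xp]
    k = len(xp)
    for _ in range(steps):
        for x, y in zip(xp, yp):
            r = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for d in range(len(x)):
                x[d] -= lr * 2.0 / k * r * w[d]
    return xp


w, b = [0.5, -0.3], 0.0  # model parameters M0, never updated below
xp = [[0.2, 0.1], [0.0, 0.4], [-0.3, 0.5]]
yp = [1.0, 2.0, 3.0]
moved = fit_by_moving_inputs(w, b, xp, yp)
print(loss_prime(w, b, xp, yp) > 5, loss_prime(w, b, moved, yp) < 1e-6)  # True True
```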
[0100] The method steps shown in Figures 2 to 4 are illustrated below with an overall example. Assume there are initially three tasks, task_{0,1,2}, with training data x_{0,1,2} and corresponding target score labels (i.e., target results) {0, 1, 2}. A new task is then added, with training data x_3 and target score labels (i.e., target results) {3.5, 7, 10}. The full set of target score labels is therefore {0, 1, 2, 3.5, 7, 10}.
[0101] Training on task_0: randomly initialize the pseudo-data X' with size [3, 512] and corresponding score labels [0, 1, 2]; randomly initialize the network model M_random; and perform the method steps shown in Figure 2, alternating compressed training and regular training (compressed, regular, compressed, regular, and so on) until loss' and loss both reach their minimum. Training then ends, yielding the pseudo-data X'_task0 and the network model M_task0.
[0102] Training on task_1: the pseudo-data and the network model are no longer randomly initialized. Instead, the network model M_task0 and the pseudo-data X'_task0 are used directly, and the method steps shown in Figure 2 are performed, alternating compressed training and regular training until loss' and loss both reach their minimum. Training then ends, yielding the pseudo-data X'_task1 and the network model M_task1.
[0103] Training on task_2: the network model M_task1 and the pseudo-data X'_task1 are used directly, and the method steps shown in Figure 2 are performed, alternating compressed training and regular training until loss' and loss both reach their minimum. Training then ends, yielding the pseudo-data X'_task2 and the network model M_task2.
[0104] Training on task_3: because new categories are introduced, the pseudo-data X'_task2 must be expanded. Specifically, the new category labels are added and learnable parameters are appended to X'_task2 to obtain the expanded pseudo-data. Either only the expanded portion of the pseudo-data is randomly initialized, or the entire expanded pseudo-data is randomly initialized. The expanded pseudo-data has size [6, 512], with corresponding score labels [0, 1, 2, 3.5, 7, 10]. Starting from the parameters of the network model M_task2, the method steps shown in Figure 2 are performed, alternating compressed training and regular training until loss' and loss both reach their minimum. Training then ends, yielding the pseudo-data X'_task3 and the network model M_task3.
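The task-by-task bookkeeping of this example can be sketched as follows. `run_tasks` and `train_fn` are hypothetical; `train_fn` stands in for the alternating compressed and regular training of Figure 2, and the default simply records which tasks the model has been trained on. Only the expanded rows are randomly initialized here, which is one of the two options described above.

```python
import random


def run_tasks(task_labels, dim=512, seed=0, train_fn=None):
    """Carry the model and pseudo data X' from task to task, expanding X' for new categories.

    train_fn(model, pseudo, labels, task) is a hypothetical placeholder for the alternating
    compressed/regular training of Figure 2; the default only records which tasks were seen.
    """
    rng = random.Random(seed)

    def record_only(model, pseudo, labels, task):
        return (model or []) + [task], pseudo

    train_fn = train_fn or record_only
    model, pseudo, labels, shapes = None, [], [], []
    for task, task_lab in enumerate(task_labels):
        new = [lab for lab in task_lab if lab not in labels]
        # Only the expanded rows are randomly initialized; earlier rows are kept.
        pseudo = pseudo + [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in new]
        labels = labels + new
        model, pseudo = train_fn(model, pseudo, labels, task)
        shapes.append((len(pseudo), len(pseudo[0])))
    return model, pseudo, labels, shapes


model, pseudo, labels, shapes = run_tasks([[0, 1, 2], [0, 1, 2], [0, 1, 2], [3.5, 7, 10]])
print(shapes)  # [(3, 512), (3, 512), (3, 512), (6, 512)]
print(labels)  # [0, 1, 2, 3.5, 7, 10]
```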
[0105] The methods shown in Figures 2 to 4 yield a trained network model and trained pseudo-data. In this embodiment, the point at which the iterative process stops is determined as follows.
[0106] The following uses the method shown in Figure 2 as an example to illustrate when to stop iterating; the stopping criterion for the methods shown in Figures 3 and 4 is the same and is not repeated. Iteration stops when the difference between the j-th prediction result of the real training data and the real target result, and/or the difference between the j-th prediction result of the pseudo-data of the j-th iteration and the fake target result, meets a preset condition, yielding the (j+1)th network model and the pseudo-data of the (j+1)th iteration.
[0107] The difference can refer to the loss function value between the predicted result and the target result, or it can refer to the accuracy of the predicted result compared to the target result. This application does not specifically limit this.
[0108] For example, iteration may stop when the difference between the j-th prediction result of the real training data and the real target result meets a preset condition. Alternatively, iteration may stop when the difference between the j-th prediction result of the pseudo-data of the j-th iteration and the fake target result meets a preset condition. Alternatively, iteration may stop when the sum of these two differences meets a preset condition.
[0109] The embodiments of this application do not specifically limit the preset conditions. For example, it can be a preset threshold. When the difference is greater than or equal to the preset threshold, it means that the preset conditions are not met and the iteration continues. When the difference is less than the preset threshold, it means that the preset conditions are met and the iteration stops.
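The preset condition can be expressed as a small predicate. The threshold form follows [0109]; the function name and mode names are hypothetical, and reading "and" as requiring both differences to fall below the threshold is an assumption:

```python
def should_stop(loss_real, loss_pseudo, threshold, mode="sum"):
    # mode selects which difference is tested against the preset threshold
    if mode == "real":
        diff = loss_real
    elif mode == "pseudo":
        diff = loss_pseudo
    elif mode == "both":
        diff = max(loss_real, loss_pseudo)  # both differences must be below the threshold
    elif mode == "sum":
        diff = loss_real + loss_pseudo
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Below the threshold: condition met, stop. Otherwise keep iterating.
    return diff < threshold


print(should_stop(0.01, 0.02, 0.05))               # True  (sum 0.03)
print(should_stop(0.04, 0.02, 0.05))               # False (sum 0.06)
print(should_stop(0.04, 0.02, 0.05, mode="both"))  # True
```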
[0110] In other words, compressed training and regular training are performed alternately until the difference meets the preset conditions, at which point the update stops.
[0111] Figure 6 is a schematic flowchart of a data processing method provided in an embodiment of this application. For example, the method may be performed by the server 140 shown in Figure 1 or by another type of electronic device with data processing capabilities. As shown in Figure 6, the method includes the following steps.
[0112] Step 610: Input the data to be classified into the network model, wherein the network model is trained using the network model training method described in any of the above embodiments.
[0113] The data to be classified can be data to be classified in any type of incremental learning task, such as test questions and their corresponding student answers.
[0114] Step 620: Use the network model to classify the data to be classified.
[0115] In one embodiment, the network model is trained on answers from various score ranges, such as 0 to 10. After a test question and the corresponding student answer are input into the network model, the network model can classify them into any of these score ranges.
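Step 620 can be sketched as picking the score category with the highest model output, assuming the trained model emits one output per score category for the 0 to 10 example; the `classify` helper and the output values are hypothetical:

```python
def classify(outputs, score_labels):
    # Pick the score category whose model output is largest (argmax)
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    return score_labels[best]


score_labels = list(range(11))  # score categories 0 to 10
outputs = [0.1, 0.3, 2.5, 0.2, 0.0, -1.0, 0.4, 0.1, 0.0, 0.2, 0.1]  # hypothetical model outputs
print(classify(outputs, score_labels))  # 2
```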
[0116] According to the technical solution provided in the embodiments of this application, the network model trained by the training method described in the above embodiments is used to classify the data to be classified, thereby improving the classification accuracy of the network model.
[0117] The method embodiments of this application have been described in detail above with reference to Figures 2 to 6. The apparatus embodiments of this application are described in detail below with reference to Figures 7 and 8. It should be understood that the descriptions of the method embodiments correspond to those of the apparatus embodiments; for any part not described in detail, refer to the foregoing method embodiments.
[0118] Figure 7 is a schematic structural diagram of a network model training apparatus provided in an embodiment of this application. As shown in Figure 7, the training apparatus 700 may include a compressed training module 710, a regular training module 720, and an iteration module 730. These modules are described in detail below.
[0119] The compressed training module 710 is configured to: a) based on the pseudo data of the (i-1)th iteration, predict the (i-1)th prediction result of the pseudo data of the (i-1)th iteration using the (i-1)th network model, and update the (i-1)th network model based on the difference between that prediction result and the fake target result, to obtain the updated (i-1)th network model. The pseudo data of the (i-1)th iteration has no explicit target result; the fake target result is obtained by forgery.
[0120] The regular training module 720 is configured to: b) based on the real training data, predict the (i-1)th prediction result of the real training data using the updated (i-1)th network model, and update the updated (i-1)th network model and the pseudo data of the (i-1)th iteration based on the difference between that prediction result and the real target result, to obtain the i-th network model and the pseudo data of the i-th iteration, where the real training data has an explicit target result.
[0121] The iteration module 730 is configured to iteratively execute the above steps a) and b) up to the j-th iteration to obtain the (j+1)-th network model and the pseudo data of the (j+1)-th iteration, where i is an integer greater than or equal to 2 and less than or equal to j+1, and j is an integer greater than or equal to 2.
[0122] This application provides a training scheme for a network model. While the network model learns the real training data in real time, the pseudo data is also continuously updated using the real training data, which compresses the information contained in the real training data into a smaller, denser set of pseudo data without loss. This not only enables the network model to learn real training data in real time but also avoids occupying computing and storage resources, thereby effectively balancing the performance of the network model against the cost of training.
[0123] In one embodiment, when the real training data is replaced or expanded with data of the same categories to obtain training data of the same category, the compressed training module 710 is further configured to: c) based on the pseudo data of the (j+1)th iteration, predict the (j+1)th prediction result of the pseudo data of the (j+1)th iteration using the (j+1)th network model, and update the (j+1)th network model based on the difference between that prediction result and the fake target result, to obtain the updated (j+1)th network model. The regular training module 720 is further configured to: d) based on the training data of the same category, predict the (j+1)th prediction result of the training data of the same category using the updated (j+1)th network model, and update the updated (j+1)th network model and the pseudo data of the (j+1)th iteration based on the difference between that prediction result and the real target result, to obtain the (j+2)th network model and the pseudo data of the (j+2)th iteration. The iteration module 730 is further configured to iteratively execute steps c) and d) up to the nth iteration to obtain the (n+1)th network model and the pseudo data of the (n+1)th iteration, where n is an integer greater than or equal to j+1.
[0124] In one embodiment, when the real training data is replaced or expanded with data of different categories to obtain training data of different categories, the different categories are added to the pseudo data of the (j+1)th iteration to obtain the expanded pseudo data of the (j+1)th iteration. The compressed training module 710 is further configured to: e) based on the expanded pseudo data of the (j+1)th iteration, predict the (j+1)th prediction result of the expanded pseudo data of the (j+1)th iteration using the (j+1)th network model, and update the (j+1)th network model based on the difference between that prediction result and the fake expanded target result, to obtain the updated (j+1)th network model, where, when j+1 = 3, the pseudo data of the (j+1)th iteration is the initialized expanded pseudo data, and the fake expanded target result is obtained by adding the labels of the different categories to the fake target result. The regular training module 720 is further configured to: f) based on the training data of different categories, predict the (j+1)th prediction result of the training data of different categories using the updated (j+1)th network model, and update the updated (j+1)th network model and the expanded pseudo data of the (j+1)th iteration based on the difference between that prediction result and the real target result, to obtain the (j+2)th network model and the expanded pseudo data of the (j+2)th iteration. The iteration module 730 is further configured to iteratively execute steps e) and f) up to the nth iteration to obtain the (n+1)th network model and the pseudo data of the (n+1)th iteration, where n is an integer greater than or equal to j+1.
[0125] In one embodiment, the iteration module 730 is further configured to stop the iteration when the difference between the j-th prediction result of the real training data and the real target result, and / or the difference between the j-th prediction result of the pseudo data of the j-th iteration and the fake target result, meets a preset condition, thereby obtaining the (j+1)-th network model and the (j+1)-th iteration pseudo data.
[0126] In one embodiment, the pseudo data for the (i-1)th iteration is a matrix. When the training method is applied to the application scenario of educational score correction, the fake target result includes at least one score. Each score is obtained by forging score labels for a preset number of elements in the matrix.
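Forging the fake target result can be sketched as below, assuming that the "preset number of elements" means a preset number of rows of the pseudo-data matrix per score (with one row per score, this reproduces the [0, 1, 2] labels of the [3, 512] example); the helper name is hypothetical:

```python
def forge_score_labels(scores, rows_per_score):
    # Each score labels `rows_per_score` consecutive rows of the pseudo-data matrix
    return [s for s in scores for _ in range(rows_per_score)]


print(forge_score_labels([0, 1, 2], rows_per_score=1))  # [0, 1, 2]
print(forge_score_labels([0, 1, 2], rows_per_score=2))  # [0, 0, 1, 1, 2, 2]
```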
[0127] Figure 8 is a schematic structural diagram of a data processing apparatus provided in an embodiment of this application. As shown in Figure 8, the apparatus 800 may include an input module 810 and a classification module 820. These modules are described in detail below.
[0128] The input module 810 is configured to input the data to be classified into the network model, wherein the network model is trained by the training method of the network model described in any of the above embodiments.
[0129] The classification module 820 is configured to use the network model to classify the data to be classified.
[0130] According to the technical solution provided in the embodiments of this application, the network model trained by the training method described in the above embodiments is used to classify the data to be classified, thereby improving the classification accuracy of the network model.
[0131] Figure 9 is a schematic structural diagram of an electronic device 900 provided in one embodiment of this application. The electronic device 900 may be, for example, a computing device with computing capabilities, such as a server. The electronic device 900 may include a memory 910 and a processor 920. The memory 910 may be used to store executable code. The processor 920 may be used to execute the executable code stored in the memory 910 to implement the steps in the various methods described above. In some embodiments, the electronic device 900 may further include a network interface 930, through which the processor 920 can exchange data with external devices.
[0132] The above embodiments may be implemented, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another via wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital video discs (DVDs)), or semiconductor media (e.g., solid-state drives (SSDs)).
[0133] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments of this application can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0134] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or in other forms.
[0135] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0136] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0137] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A method for training a network model, characterized in that the method is applied to educational score grading and comprises: a) Based on the pseudo data of the (i-1)th iteration, predict the (i-1)th prediction result of the pseudo data of the (i-1)th iteration using the (i-1)th network model, and update the (i-1)th network model based on the difference between the (i-1)th prediction result of the pseudo data of the (i-1)th iteration and the forged target result, to obtain an updated (i-1)th network model. The pseudo data of the (i-1)th iteration has no definite target result, and the forged target result is obtained through forgery. The (i-1)th prediction result of the pseudo data of the (i-1)th iteration includes at least one prediction score, each being a prediction score for a predetermined number of elements in the pseudo data of the (i-1)th iteration. The forged target result includes at least one score, each obtained by forging a score label for a predetermined number of elements in the pseudo data of the (i-1)th iteration.
b) Based on the real training data, predict the (i-1)th prediction result of the real training data using the updated (i-1)th network model, and update the updated (i-1)th network model and the pseudo data of the (i-1)th iteration based on the difference between the (i-1)th prediction result of the real training data and the actual target result, to obtain the i-th network model and the pseudo data of the i-th iteration, wherein the real training data has a definite target result; the real training data includes test questions together with their standard answers and student answers; the (i-1)th prediction result of the real training data includes the predicted score for the real training data; and the actual target result includes the manual score for the real training data. Iteratively execute steps a) and b) above up to the j-th iteration to obtain the (j+1)th network model and the pseudo data of the (j+1)th iteration, where i is an integer greater than or equal to 2 and less than or equal to j+1, and j is an integer greater than or equal to 2.
2. The training method according to claim 1, characterized in that, when the real training data is replaced or expanded with data of the same category to obtain training data of the same category, the training method further comprises: c) Based on the pseudo data of the (j+1)th iteration, predict the (j+1)th prediction result of the pseudo data of the (j+1)th iteration through the (j+1)th network model, and update the (j+1)th network model based on the difference between the (j+1)th prediction result of the pseudo data of the (j+1)th iteration and the forged target result, to obtain the updated (j+1)th network model. d) Based on the training data of the same category, predict the (j+1)th prediction result of the training data of the same category through the updated (j+1)th network model, and update the updated (j+1)th network model and the pseudo data of the (j+1)th iteration based on the difference between the (j+1)th prediction result of the training data of the same category and the actual target result, to obtain the (j+2)th network model and the pseudo data of the (j+2)th iteration. Iteratively execute steps c) and d) above up to the nth iteration to obtain the (n+1)th network model and the pseudo data of the (n+1)th iteration, where n is an integer greater than or equal to j+1.
3. The training method according to claim 1, characterized in that, when the real training data is replaced or expanded with data of different categories to obtain training data of different categories, the different categories are added to the pseudo data of the (j+1)th iteration to obtain the expanded pseudo data of the (j+1)th iteration, and the training method further comprises: e) Based on the expanded pseudo data of the (j+1)th iteration, predict the (j+1)th prediction result of the expanded pseudo data of the (j+1)th iteration through the (j+1)th network model, and update the (j+1)th network model based on the difference between the (j+1)th prediction result of the expanded pseudo data of the (j+1)th iteration and the forged expanded target result, to obtain the updated (j+1)th network model, wherein, when j+1 = 3, the pseudo data of the (j+1)th iteration is the initial expanded pseudo data, and the forged expanded target result is obtained by adding the labels of the different categories to the forged target result. f) Based on the training data of different categories, predict the (j+1)th prediction result of the training data of different categories through the updated (j+1)th network model, and update the updated (j+1)th network model and the expanded pseudo data of the (j+1)th iteration based on the difference between the (j+1)th prediction result of the training data of different categories and the real target result, to obtain the (j+2)th network model and the expanded pseudo data of the (j+2)th iteration. Iteratively execute steps e) and f) above up to the nth iteration to obtain the (n+1)th network model and the pseudo data of the (n+1)th iteration, where n is an integer greater than or equal to j+1.
4. The training method according to any one of claims 1 to 3, characterized in that iteratively executing steps a) and b) above up to the j-th iteration to obtain the (j+1)th network model and the pseudo data of the (j+1)th iteration comprises: when the difference between the j-th prediction result of the real training data and the real target result and/or the difference between the j-th prediction result of the pseudo data of the j-th iteration and the forged target result satisfies a preset condition, stopping the iteration and obtaining the (j+1)th network model and the pseudo data of the (j+1)th iteration.
5. The training method according to any one of claims 1 to 3, characterized in that the pseudo data of the (i-1)th iteration is a matrix, and, when the training method is applied to the educational score correction scenario, the forged target result includes at least one score, each obtained by forging a score label for a preset number of elements in the matrix.
6. A data processing method, characterized in that it comprises: inputting data to be classified into a network model, wherein the network model is trained by the training method of any one of claims 1 to 5; and classifying the data to be classified using the network model.
7. A training device for a network model, characterized in that the device is applied to educational score grading and comprises: a compressed training module configured to: a) based on the pseudo data of the (i-1)th iteration, predict the (i-1)th prediction result of the pseudo data of the (i-1)th iteration using the (i-1)th network model, and update the (i-1)th network model based on the difference between the (i-1)th prediction result of the pseudo data of the (i-1)th iteration and the forged target result, to obtain an updated (i-1)th network model, wherein the pseudo data of the (i-1)th iteration has no definite target result and the forged target result is obtained through forgery; the (i-1)th prediction result of the pseudo data of the (i-1)th iteration includes at least one prediction score, each being a prediction score for a predetermined number of elements in the pseudo data of the (i-1)th iteration; and the forged target result includes at least one score, each obtained by forging a score label for a predetermined number of elements in the pseudo data of the (i-1)th iteration; a standard training module configured to: b) based on the real training data, predict the (i-1)th prediction result of the real training data using the updated (i-1)th network model, and update the updated (i-1)th network model and the pseudo data of the (i-1)th iteration based on the difference between the (i-1)th prediction result of the real training data and the actual target result, to obtain the i-th network model and the pseudo data of the i-th iteration, wherein the real training data has a definite target result; the real training data includes test questions together with their standard answers and student answers; the (i-1)th prediction result of the real training data includes the predicted score for the real training data; and the actual target result includes the manual score for the real training data;
and an iteration module configured to iteratively execute steps a) and b) above up to the j-th iteration to obtain the (j+1)th network model and the pseudo data of the (j+1)th iteration, wherein i is an integer greater than or equal to 2 and less than or equal to j+1, and j is an integer greater than or equal to 2.
8. A data processing apparatus, characterized in that it comprises: an input module configured to input data to be classified into a network model, wherein the network model is trained using the training method of any one of claims 1 to 5; and a classification module configured to use the network model to classify the data to be classified.
9. An electronic device, characterized in that it comprises a memory and a processor, wherein the memory stores executable code and the processor is configured to execute the executable code to implement the method of any one of claims 1 to 6.
10. A storage medium, characterized in that the storage medium stores executable code which, when executed by a processor, implements the method according to any one of claims 1 to 6.
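Steps a) and b) of claim 1 can be sketched with a toy model. Everything concrete here is an assumption, not the patented implementation: a linear model y = w·x with mean squared error stands in for the score-prediction network, plain gradient descent is used, and the learning rates `alpha`, `beta`, `gamma` and all function names are hypothetical. The claims do not say how the real-data difference updates the pseudo data; this sketch differentiates the real-data loss through the step a) update (a bilevel technique familiar from dataset distillation), which is one plausible reading.

```python
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def mse(w: Vec, xs: List[Vec], ys: List[float]) -> float:
    """Mean squared error of the linear model y = w . x."""
    return sum((dot(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad_w(w: Vec, xs: List[Vec], ys: List[float]) -> Vec:
    """Gradient of mse() with respect to the weights w."""
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        r = dot(w, x) - y
        for a in range(len(w)):
            g[a] += 2.0 * r * x[a] / len(xs)
    return g


def train_step(w, pseudo, forged, real_x, real_y, alpha, beta, gamma) -> Tuple[Vec, List[Vec]]:
    """One pass of steps a) and b) for the toy linear model."""
    # Step a): update the model on the pseudo data against the forged targets.
    g_pseudo = grad_w(w, pseudo, forged)
    w_upd = [wi - alpha * gi for wi, gi in zip(w, g_pseudo)]

    # Step b): gradient of the real-data loss, taken at the updated model.
    g_real = grad_w(w_upd, real_x, real_y)

    # The real-data loss depends on each pseudo row x_k only through w_upd.
    # Chain rule for this linear model, with r_k = w . x_k - t_k and m rows:
    #   dL_real/dx_k = -alpha * (2/m) * (w * (x_k . g_real) + r_k * g_real)
    m = len(pseudo)
    new_pseudo = []
    for x, t in zip(pseudo, forged):
        r = dot(w, x) - t
        xg = dot(x, g_real)
        dx = [-alpha * 2.0 / m * (w[a] * xg + r * g_real[a]) for a in range(len(x))]
        new_pseudo.append([xa - gamma * da for xa, da in zip(x, dx)])

    # The updated model also takes an ordinary gradient step on the real data.
    new_w = [wi - beta * gi for wi, gi in zip(w_upd, g_real)]
    return new_w, new_pseudo


def train(w, pseudo, forged, real_x, real_y, iterations, alpha=0.1, beta=0.1, gamma=0.1):
    """Repeat steps a) and b); returns the final model and pseudo data."""
    for _ in range(iterations):
        w, pseudo = train_step(w, pseudo, forged, real_x, real_y, alpha, beta, gamma)
    return w, pseudo


w, pseudo = train([0.0, 0.0], [[1.0, 2.0], [0.5, -1.0]], [1.0, 0.0],
                  [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [2.0, -1.0, 1.0], iterations=5)
```

For the same-category case of claim 2, the same `train` call would simply be repeated with the returned model and pseudo data and the new real training data; only the pseudo data, not the full historical dataset, is carried forward.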