Method and apparatus for evaluating generality of continuous learning model, and electronic device
By testing the continuous learning model on target classification and probe task sets, the classification accuracy and general text representation results are obtained, which solves the problem that existing technologies cannot explain the changes in the general representation of language models, and realizes accurate evaluation and multi-task adaptability improvement of the continuous learning model.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents(China)
- Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date: 2023-03-02
- Publication Date: 2026-06-26
AI Technical Summary
Existing technologies, when evaluating continuous learning models, neglect the growth potential and general knowledge storage of large-scale language models in continuous learning scenarios, and cannot explain the changes in general language representations, thus limiting the exploration and improvement of continuous learning scenarios.
By testing continuously learned language models and single-task language models using target classification tasks and probe task sets, we can obtain classification accuracy and general text representation results, determine the general evaluation results of the target, and indicate the correlation between the general representation ability of the initial pre-trained language model after continuous learning and the non-continuous learning model.
It enables accurate evaluation of continuous learning models, can explain their variations in classification tasks and general text representations, improves the accuracy of models on multiple classification tasks, and increases the diversity and flexibility of model applications.
Figure CN118586444B_ABST
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a method, apparatus, and electronic device for evaluating the generality of a continuous learning model.
Background Technology
[0002] Current evaluations of continuous learning models primarily assess how well the models perform on previously learned tasks, neglecting the growth potential and general knowledge storage of large-scale language models in continuous learning scenarios. They also fail to explain how general language representations change during continuous learning, which limits the exploration and improvement of this scenario.
Summary of the Invention
[0003] In view of the aforementioned technical problems, this application proposes a general evaluation method, apparatus, and electronic device for continuous learning models.
[0004] According to one aspect of this application, a general evaluation method for a continuous learning model is provided, the method comprising:
[0005] The continuously learned language model and the single-task language model are tested on the target classification task set corresponding to the target classification task, respectively, to obtain the first classification accuracy of the continuously learned language model and the second classification accuracy of the single-task language model. The continuously learned language model is the language model after the initial pre-trained language model has continuously learned the target classification task. The single-task language model is the language model after the initial pre-trained language model has learned the target classification task alone. The target classification task is any one of multiple classification tasks used for continuous learning.
[0006] The general text representations of the continuously learned language model and the initial pre-trained language model are tested using a probe task set to obtain a first test result corresponding to the continuously learned language model and a second test result corresponding to the initial pre-trained language model.
[0007] Based on the difference between the first classification accuracy and the second classification accuracy, and the difference between the first test result and the second test result, a target general evaluation result is determined; the target general evaluation result is used to indicate the correlation between the general representation ability of the initial pre-trained language model after continuous learning of the multiple classification tasks and the general representation ability of the non-continuous learning model, wherein the non-continuous learning model includes the initial pre-trained language model and the single-task language model.
[0008] According to another aspect of this application, a general evaluation apparatus for a continuous learning model is provided, the apparatus comprising:
[0009] The classification task testing module is used to perform classification task testing on the continuously learned language model and the single-task language model respectively using the target test task set corresponding to the target classification task, to obtain the first classification accuracy corresponding to the continuously learned language model and the second classification accuracy corresponding to the single-task language model; the continuously learned language model is the language model after the initial pre-trained language model has continuously learned the target classification task; the single-task language model is the language model after the initial pre-trained language model has learned the target classification task alone; the target classification task is any one of multiple classification tasks used for continuous learning;
[0010] The general representation testing module is used to test the general text representations of the continuously learned language model and the initial pre-trained language model using a probe task set, respectively, to obtain a first test result corresponding to the continuously learned language model and a second test result corresponding to the initial pre-trained language model.
[0011] A general evaluation module is used to determine a target general evaluation result based on the difference between the first classification accuracy and the second classification accuracy, and the difference between the first test result and the second test result. The target general evaluation result is used to indicate the correlation between the general representation ability of the initial pre-trained language model after continuously learning the multiple classification tasks and the general representation ability of the non-continuous learning model. The non-continuous learning model includes the initial pre-trained language model and the single-task language model.
[0012] According to another aspect of this application, an electronic device is provided, comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the above-described method.
[0013] According to another aspect of this application, a non-volatile computer-readable storage medium is provided, on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the above-described method.
[0014] According to another aspect of this application, a computer program product is provided, including computer instructions that, when executed by a processor, cause a computer to perform the above-described method.
[0015] The continuously learned language model and the single-task language model are each tested on the classification task set corresponding to the target classification task, which yields the difference in classification accuracy between the two models. In addition, the general text representations of the continuously learned language model and the initial pre-trained language model are tested using a probe task set, which yields the difference in general text representation ability between these two models. Based on these two differences, the target general evaluation result of the continuously learned language model for the target classification task can be determined. This result reflects both the change in general representation on classification tasks between continuous and non-continuous learning, and the change in general text representation between the continuously learned language model and the initial pre-trained language model. The target general evaluation result is therefore more precise and can explain the changes in generality of the continuously learned model more accurately and effectively. On this basis, when a single model is used to implement multiple classification functions, the target general evaluation result can be used to control the general text representation capability of that model effectively and flexibly. This increases the diversity of single-model applications, avoids training a separate model for each classification task, and meets the continuous learning model's requirements for general text representation across multiple classification tasks, thereby improving its classification accuracy on those tasks.
[0016] Other features and aspects of this application will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
[0017] The accompanying drawings, which are included in and form part of this specification, illustrate exemplary embodiments, features, and aspects of this application together with the specification and serve to explain the principles of this application.
[0018] Figure 1 illustrates an application system provided according to an embodiment of this application.
[0019] Figure 2 is a flowchart of a generality evaluation method for a continuous learning model provided according to an embodiment of this application.
[0020] Figure 3 is a flowchart of a classification task testing process according to an embodiment of this application.
[0021] Figure 4 illustrates a process, according to an embodiment of this application, in which a probe task set is used to test the general text representations of a continuously learned language model and an initial pre-trained language model to obtain a first test result corresponding to the continuously learned language model and a second test result corresponding to the initial pre-trained language model.
[0022] Figure 5 illustrates a syntactic representation testing process according to an embodiment of this application.
[0023] Figure 6 illustrates a semantic representation testing process according to an embodiment of this application.
[0024] Figure 7 is a flowchart for evaluating the generality of a continuous learning model according to an embodiment of this application.
[0025] Figure 8 illustrates classification accuracy and target general evaluation results in continuous learning according to an embodiment of this application.
[0026] Figure 9 is a block diagram of a generality evaluation apparatus for a continuous learning model according to an embodiment of this application.
[0027] Figure 10 is a block diagram of an electronic device for general evaluation of a continuous learning model according to an exemplary embodiment.
Detailed Description
[0028] Various exemplary embodiments, features, and aspects of this application will now be described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of the embodiments are shown in the drawings, they are not necessarily drawn to scale unless specifically indicated otherwise.
[0029] The term “exemplary” as used herein means “serving as an example, embodiment, or illustration.” Any embodiment illustrated herein as “exemplary” is not necessarily to be construed as superior to or better than other embodiments.
[0030] Furthermore, to better illustrate this application, numerous specific details are provided in the following detailed embodiments. Those skilled in the art should understand that this application can be implemented without certain specific details. In some instances, methods, means, components, and circuits well-known to those skilled in the art have not been described in detail in order to highlight the main points of this application.
[0031] Artificial intelligence (AI) comprises the theory, methods, technology, and application systems that use digital computers or computer-controlled machines to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. AI software technology mainly includes computer vision, speech processing, natural language processing, and machine learning / deep learning.
[0032] In recent years, with the research and progress of artificial intelligence technology, it has been widely applied in many fields. The solutions provided in this application involve technologies such as natural language processing and machine learning / deep learning, which are specifically illustrated through the following embodiments:
[0033] Referring to Figure 1, Figure 1 illustrates an application system according to an embodiment of this application. The application system can be used for the generality evaluation method of the continuous learning model of this application. As shown in Figure 1, the application system may include at least server 01 and terminal 02.
[0034] In this embodiment of the application, the server 01 can be used for the general evaluation processing of the continuous learning model. The server 01 may include an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
[0035] In this embodiment, the terminal 02 can be used to trigger the execution of the general evaluation process and receive and display the target general evaluation results. It can also collect language text for the server 01 to construct test task sets and probe task sets. The terminal 02 can include physical devices such as smartphones, desktop computers, tablets, laptops, smart speakers, digital assistants, augmented reality (AR) / virtual reality (VR) devices, and smart wearable devices. Physical devices can also include software running on them, such as applications. In this embodiment, the operating system running on the terminal 02 can include, but is not limited to, Android, iOS, Linux, and Windows.
[0036] In the embodiments described in this specification, the terminal 02 and the server 01 can be directly or indirectly connected through wired or wireless communication, and this application does not limit this connection.
[0037] In a specific embodiment, when server 01 is a distributed system, this distributed system can be a blockchain system. In that case, it can be formed by multiple nodes (any form of computing device connected to the network, such as servers or user terminals) that together form a peer-to-peer (P2P) network. The P2P protocol is an application layer protocol running on top of the Transmission Control Protocol (TCP). Any machine, such as a server or terminal, can join the distributed system and become a node. A node includes a hardware layer, a middleware layer, an operating system layer, and an application layer. Specifically, the functions of each node in the blockchain system may include:
[0038] 1) Routing: A basic function of nodes used to support communication between nodes.
[0039] In addition to routing capabilities, nodes can also have the following functions:
[0040] 2) Applications are deployed in the blockchain to implement specific business needs. They record data related to the implementation of functions to form record data, carry digital signatures in the record data to indicate the source of the task data, and send the record data to other nodes in the blockchain system. When other nodes successfully verify the source and integrity of the record data, they add the record data to a temporary block.
[0041] It should be noted that in the specific implementation of this application, user-related data is involved. When the following embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
[0042] Figure 2 is a flowchart of a generality evaluation method for a continuous learning model according to an embodiment of this application. As shown in Figure 2, the generality evaluation method for this continuous learning model may include:
[0043] S201, using the target test task set corresponding to the target classification task, the continuous learning language model and the single-task language model are respectively subjected to classification task testing to obtain the first classification accuracy of the continuous learning language model and the second classification accuracy of the single-task language model.
[0044] In the embodiments of this specification, the target classification task can be any one of multiple classification tasks used for continuous learning. For example, the number of classification tasks can be N, where N is an integer greater than or equal to 2. This application does not limit the order in which these classification tasks are learned. Each time a classification task has been learned, a continuously learned language model can be obtained, namely the initial pre-trained language model after it has learned that classification task. That is, the continuously learned language model can be the language model obtained after the initial pre-trained language model has continuously learned the target classification task. On this basis, N continuously learned language models corresponding to the N classification tasks can be obtained. The initial pre-trained language model can be, for example, BERT (Bidirectional Encoder Representations from Transformers) or DistilBERT (a model obtained by knowledge distillation of BERT); this application does not limit this.
[0045] As an example, classification tasks can include classifying the emotional expression of a text, classifying the subject matter of a text, or classifying the content theme of a text, etc. This application does not limit this to any particular task.
[0046] For example, with N=3, and the sequential learning order being classification task A, classification task B, and classification task C, after the initial pre-trained language model learns classification task A, we obtain a sequentially learned language model corresponding to classification task A. This model can then classify classification task A and can be called sequential learning model A. After the initial pre-trained language model sequentially learns classification tasks A and B, we obtain a sequential learning language model AB, which can classify both classification tasks A and B. Finally, after the initial pre-trained language model sequentially learns classification tasks A, B, and C, we obtain a sequential learning model ABC, which can classify all three tasks. In this sequential learning process, we obtain sequential learning models A, AB, and ABC.
[0047] The continuously learned language model can be obtained by training the current continuous learning model on the sample text and task labels of the classification task being learned. For example, after classification tasks A and B have been continuously learned, classification task C can be learned as follows. The sample text and task labels (such as topic labels) corresponding to classification task C can be obtained. The sample text can then be input into the continuous learning model AB for text representation prediction processing to obtain text prediction features, and these features can be input into the initial task classifier for topic classification processing to obtain topic prediction information. Loss information can then be determined from the topic labels and the topic prediction information, gradient information can be calculated from the loss information, and gradient backpropagation can be performed to adjust the parameters of the continuous learning model AB and of the initial task classifier until the iteration conditions are met. The continuous learning model AB that meets the iteration conditions can then be taken as the continuous learning model ABC, and the initial task classifier that meets the iteration conditions can be taken as the first classifier corresponding to the continuous learning model ABC. The iteration conditions can be a threshold on the number of iterations, a loss threshold, and so on, which this application does not limit.
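The loop described above (predict, compute loss, compute gradients, update parameters, stop when an iteration condition is met) can be sketched as follows. This is a minimal illustration, not the patented implementation: it trains only a tiny logistic classifier on fixed one-dimensional features, whereas the text also updates the language model's own parameters. The function name, learning rate, thresholds, and toy data are all assumptions made for the example.

```python
import math

def train_until_condition(features, labels, max_iters=500, loss_threshold=0.05, lr=0.5):
    """Gradient descent on a logistic classifier (hypothetical stand-in for the task classifier).

    Stops when either iteration condition holds: the mean loss drops below
    loss_threshold, or max_iters iterations have been run.
    """
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    loss = float("inf")
    for step in range(1, max_iters + 1):
        grad_w = [0.0] * dim
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            for i, xi in enumerate(x):
                grad_w[i] += (p - y) * xi  # gradient of cross-entropy w.r.t. weight i
            grad_b += p - y
        loss /= n
        if loss < loss_threshold:  # loss-threshold iteration condition
            return weights, bias, loss, step
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias, loss, max_iters  # iteration-count condition

# Toy data: one feature per sample, binary task label (assumed for illustration).
toy_features = [[0.0], [1.0], [2.0], [3.0]]
toy_labels = [0, 0, 1, 1]
weights, bias, final_loss, steps = train_until_condition(toy_features, toy_labels)
print(f"stopped after {steps} iterations with loss {final_loss:.4f}")
```

Either stopping rule ends the loop, which mirrors the statement that the iteration conditions can be an iteration-count threshold or a loss threshold.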
[0048] For single-task language models, a single-task language model can be a language model that has been trained on an initial pre-trained language model to perform a target classification task. Accordingly, taking the example of classification tasks A, B, and C when N=3, a single-task language model can include single-task language model A, single-task language model B, and single-task language model C. Specifically, single-task language model A can refer to a language model that has been trained on classification task A alone, used to classify task A; single-task language model B can refer to a language model that has been trained on classification task B alone, used to classify task B alone; and single-task language model C can refer to a language model that has been trained on classification task C alone, used to classify task C alone.
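The relationship between the continuously learned models (A, AB, ABC) and the single-task models (A, B, C) can be sketched as follows. This is only a bookkeeping illustration: a "model" here is a tuple recording which tasks it has learned, and `learn_task` and `build_models` are hypothetical names, not part of the application.

```python
from typing import Dict, List, Tuple

Model = Tuple[str, ...]  # toy stand-in: the tuple of tasks a model has learned

def learn_task(model: Model, task: str) -> Model:
    """Placeholder for fine-tuning `model` on `task`; returns a new model."""
    return model + (task,)

def build_models(initial_model: Model, tasks: List[str]) -> Tuple[Dict[str, Model], Dict[str, Model]]:
    # Continuous learning: each task is learned on top of the previous checkpoint.
    continual: Dict[str, Model] = {}
    current = initial_model
    for task in tasks:
        current = learn_task(current, task)
        continual["".join(current)] = current
    # Single-task learning: every task starts again from the initial model.
    single = {task: learn_task(initial_model, task) for task in tasks}
    return continual, single

continual_models, single_task_models = build_models((), ["A", "B", "C"])
print(list(continual_models))    # names of the continuously learned checkpoints
print(single_task_models["B"])   # model that learned only task B
```

The key difference the sketch makes visible is the starting point: continual checkpoints chain from one another, while each single-task model branches directly from the initial pre-trained model.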
[0049] In the embodiments described in this specification, the target test task set can be any one of multiple test task sets. These multiple test task sets can correspond to multiple classification tasks and can be used to test the accuracy of the model (continuous learning language model and single-task language model) on multiple classification tasks. As an example, the test task set may include text data used for testing.
[0050] It should be noted that the single-task language model can be trained in advance or trained in parallel with the continuously learned language model; this application does not limit when the single-task language model is trained. The single-task language model can be obtained through supervised learning based on sample text data and corresponding classification task labels. Taking single-task language model C as an example, sample text data and the classification task label of each piece of sample text data can be obtained. If classification task C is to classify the text's subject matter, the classification task label can be a text subject matter label, such as prose or non-prose. The sample text data can then be input into the initial pre-trained language model to obtain text vector representations, and the text vector representations can be classified using a pre-defined classifier to obtain the predicted text subject matter. Loss information can be determined from the predicted text subject matter and the text subject matter labels, gradient information can be calculated from the loss information, and gradient backpropagation can be performed to adjust the parameters of the initial pre-trained language model and of the pre-defined classifier until the iteration conditions are met. The initial pre-trained language model that meets the iteration conditions can then be used as single-task language model C, and the pre-defined classifier that meets the iteration conditions can be used as the second classifier corresponding to single-task language model C, as shown in Figure 3. The iteration conditions can be a threshold on the number of iterations, a loss threshold, and so on, which this application does not limit. With this training method, single-task language models A, B, and C can be obtained, together with their corresponding second classifiers.
[0051] In the embodiments of this specification, since the classification task testing of the continuously learned language model is performed during the continuous learning process and before the next classification task is learned, the classification task testing of the continuously learned language model can be performed using the target test task set corresponding to the target classification task after the initial pre-trained language model has continuously learned the target classification task and completed the learning. As for the timing of using the target test task set corresponding to the target classification task to perform classification task testing on the single-task language model, this application does not limit this, as long as it is obtained before step S203 requires it. Furthermore, the test task sets corresponding to N classification tasks can be used to perform classification task testing on the corresponding single-task language model to obtain N second classification accuracies.
[0052] Specifically, an output layer can be connected to the output side of the pre-trained language model to perform classification for the multiple classification tasks. The initial state of this output layer can be an initial task classifier (e.g., a multilayer perceptron). The pre-trained language model and the initial task classifier can then be trained on sample text to obtain a continuously learned language model and the first classifier corresponding to it, as shown in Figure 3. The specific training process is the same as the training process of the continuously learned language model described above and is not repeated here. Accordingly, taking the continuous learning of three classification tasks A, B, and C as an example: after learning classification task A, continuous learning model A and its corresponding first classifier can be obtained; after learning classification tasks A and B, continuous learning model AB and its corresponding first classifier can be obtained; and after learning classification tasks A, B, and C, continuous learning model ABC and its corresponding first classifier can be obtained.
[0053] As shown in Figure 3, the text data used for testing in the target test task set can be input into the continuously learned language model for text feature extraction to obtain the first text feature. The first text feature can then be input into the first classifier for text classification to obtain the first task classification result, which can be compared with the task labels in the target test task set to obtain the first classification accuracy. The first classification accuracy can be the ratio of the number of first task classification results that match the task label to the total number of first task classification results. For example, suppose the target test task set for classification task m contains 100 pieces of text data for testing, and after this text data is processed by the continuously learned language model and the first classifier, 90 of the first task classification results match the task label. Since each piece of text data corresponds to one first task classification result, the total number of first task classification results is 100, so the first classification accuracy is 90 / 100 = 90%. That is, the continuously learned language model that has continuously learned classification tasks 1 to m has a first classification accuracy of 90% on classification task m.
[0054] Accordingly, when testing a single-task language model, the text data used for testing in the target test task set can be input into the single-task language model for text feature extraction to obtain second text features. Further, the second text features can be input into a second classifier for text classification to obtain the second task classification result. The second task classification result can then be compared with the task labels in the target test task set to obtain the second classification accuracy. The second classification accuracy can be the ratio of the number of second task classification results that match the task labels to the total number of second task classification results.
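The accuracy ratio defined above applies in the same way to both models, and a short sketch makes the arithmetic concrete. The 90-out-of-100 figure comes from the example in the text; the second model's 95-out-of-100 figure, the label names, and the function name are assumptions made purely for illustration.

```python
from typing import Sequence

def classification_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Ratio of task classification results that match the task label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    matches = sum(1 for predicted, label in zip(predictions, labels) if predicted == label)
    return matches / len(predictions)

# 100 test texts, all labelled "positive" for simplicity (assumed toy data).
task_labels = ["positive"] * 100
# Continuously learned model: 90 results match, as in the worked example.
first_results = ["positive"] * 90 + ["negative"] * 10
# Single-task model: 95 matches is a hypothetical figure, not taken from the source.
second_results = ["positive"] * 95 + ["negative"] * 5

first_accuracy = classification_accuracy(first_results, task_labels)
second_accuracy = classification_accuracy(second_results, task_labels)
accuracy_difference = first_accuracy - second_accuracy
print(first_accuracy, second_accuracy, round(accuracy_difference, 4))
```

The difference between the two accuracies is the first of the two quantities that the method later combines into the target general evaluation result.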
[0055] S203, using the probe task set, test the general text representations of the continuously learned language model and the initial pre-trained language model respectively, to obtain the first test result corresponding to the continuously learned language model and the second test result corresponding to the initial pre-trained language model.
[0056] In the embodiments of this specification, the probe task set can refer to a task set used to test the general text representations of a continuously learned language model and an initial pre-trained language model. It may include general test text data. This disclosure does not limit the general test text data, as long as it can effectively test the general text representations of the model. As an example, the general test text data may include syntactic test text data and semantic test text data. For example, syntactic test text data may include text data used to test whether two consecutive markers in a sentence are reversed, the maximum depth of the sentence's syntactic tree, and whether the object and subject of a sentence are singular or plural. Semantic test text data may include text data used to test whether the order of two coordinating conjunctions in a sentence is reversed, whether the main verb of a sentence is marked as present or past tense, and whether each pair captures paraphrasing / semantic equivalence relations. For example, the maximum depth of the syntactic tree can be indicated using the textbf method.
[0057] In the embodiments of this specification, general test text data can be input into a continuously learned language model and an initial pre-trained language model respectively for general feature extraction. The extracted general features can then be input into a trained general feature classifier for classification prediction, resulting in a first classification prediction result corresponding to the continuously learned language model and a second classification prediction result corresponding to the initial pre-trained language model. Therefore, the first classification prediction result can be determined as the first test result, and the second classification prediction result can be determined as the second test result.
[0058] In one possible implementation, step S203 may include:
[0059] S401, input the general test text data from the probe task set into the continuous learning language model and the initial pre-trained language model respectively, perform general text feature extraction processing, and obtain the first general text feature corresponding to the continuous learning language model and the second general text feature corresponding to the initial pre-trained language model.
[0060] The first and second general text features can be features that characterize syntax or semantics, and this application does not limit them.
[0061] S403, input the first general text feature into the general feature classifier, perform general feature classification processing, and obtain the first test result;
[0062] S405, input the second general text feature into the general feature classifier, perform general feature classification processing, and obtain the second test result.
[0063] The general feature classifier can be obtained by training an initial classifier based on sample probe task data and corresponding general feature classification labels, with the parameters of the continuously learned language model fixed. The specific training process is described in detail below and will not be repeated here.
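Steps S401 to S405 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the names `probe_test_result`, `extract_features`, and `classifier` are hypothetical, each model is represented as a plain callable that maps text to a feature vector, and the test result is summarized as probe accuracy, which the specification does not mandate.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def probe_test_result(
    extract_features: Callable[[str], Features],
    classifier: Callable[[Features], int],
    texts: Sequence[str],
    labels: Sequence[int],
) -> float:
    """Run S401 and then S403/S405 for one model; return probe accuracy.

    Assumption: the test result is summarized as the fraction of general
    test texts whose predicted label matches the probe label.
    """
    if len(texts) != len(labels) or not texts:
        raise ValueError("texts and labels must be non-empty and of equal length")
    correct = 0
    for text, label in zip(texts, labels):
        features = extract_features(text)   # S401: general text feature extraction
        prediction = classifier(features)   # S403 / S405: general feature classification
        correct += int(prediction == label)
    return correct / len(texts)
```

Calling the function once with the continuously learned model and once with the initial pre-trained model, using the same general feature classifier, gives the first and second test results respectively.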
[0064] In one possible implementation, the probe task set may include a syntactic task set and a semantic task set, and the general test text data may include syntactic test text data from the syntactic task set and semantic test text data from the semantic task set; correspondingly, the first general text feature may include a first syntactic feature and a first semantic feature; the aforementioned general feature classifier may include a syntactic classifier and a semantic classifier, as shown in Figure 5 and Figure 6. Accordingly, the step of inputting the first general text feature into the general feature classifier for general feature classification processing to obtain the first test result may include: inputting the first syntactic feature into the syntactic classifier for syntactic classification task processing to obtain the first syntactic classification result; inputting the first semantic feature into the semantic classifier for semantic classification task processing to obtain the first semantic classification result; and using the first syntactic classification result and the first semantic classification result as the first test result.
[0065] Referring to Figure 5 and Figure 6, the syntactic test text data and the semantic test text data can be input into the initial pre-trained language model separately for syntactic and semantic feature extraction, resulting in a second general text feature corresponding to the initial pre-trained language model. This second general text feature can include a second syntactic feature and a second semantic feature. Correspondingly, the second general text feature can be input into the general feature classifier for general feature classification to obtain a second test result. This can include inputting the second syntactic feature into the syntactic classifier for syntactic classification to obtain a second syntactic classification result, as shown in Figure 5. Furthermore, the second semantic feature can be input into the semantic classifier for semantic classification to obtain the second semantic classification result; thus, the second syntactic classification result and the second semantic classification result can be used as the second test result.
[0066] The first and second syntactic classification results can include whether the object and subject of the sentence are singular or plural; the first and second semantic classification results can include whether the order of the two coordinating conjunctions is reversed or not reversed. This application does not impose any limitations on these.
[0067] S205, based on the difference between the first classification accuracy and the second classification accuracy, and the difference between the first test result and the second test result, determine the target general evaluation result.
[0068] The general evaluation result can be used to indicate the correlation between the general representation ability of the initial pre-trained language model after continuously learning multiple classification tasks and the general representation ability of the non-continuous learning model, such as the difference in general representation ability and the trend of change in general representation ability. This application does not limit this. The non-continuous learning model here can include the initial pre-trained language model and the single-task language model.
[0069] In one possible implementation, step S205 may include: determining a first general evaluation result based on the difference between the first classification accuracy and the second classification accuracy; for example, the difference between the first classification accuracy and the second classification accuracy can be used as the first general evaluation result. A second general evaluation result may also be determined based on the difference between the first test result and the second test result; for example, the difference between the first test result and the second test result can be used as the second general evaluation result. The first and second general evaluation results for each of the multiple classification tasks can then be statistically analyzed to obtain the target general evaluation result. For example, the statistical results of the first and second general evaluation results for each of the multiple classification tasks can be used as the target general evaluation result. The statistical results here can be averages, sums, or other statistical results, and this application does not limit this.
[0070] In one optional implementation, determining a second general evaluation result based on the difference between the first test result and the second test result may include: using the difference between the first test result and the second test result as general difference information; and determining the ratio of the general difference information to the second test result as the second general evaluation result.
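The per-task quantities in step S205 can be illustrated with a short sketch. The function names are hypothetical. The sign convention (second minus first, so that a larger value indicates a larger loss of generality) and the use of the mean as the statistic are assumptions, chosen to agree with the ratio described in this paragraph.

```python
from statistics import mean
from typing import Sequence


def first_general_evaluation(first_accuracy: float, second_accuracy: float) -> float:
    """Accuracy gap between the single-task model (second) and the
    continuously learned model (first). Sign convention is an assumption."""
    return second_accuracy - first_accuracy


def second_general_evaluation(first_test_result: float, second_test_result: float) -> float:
    """Ratio of the general difference information to the second test result."""
    if second_test_result == 0:
        raise ValueError("second test result must be non-zero")
    general_difference = second_test_result - first_test_result
    return general_difference / second_test_result


def target_statistic(per_task_values: Sequence[float]) -> float:
    """One possible statistic over the classification tasks: the average."""
    return mean(per_task_values)
```

For instance, a probe accuracy that falls from 0.8 for the initial pre-trained model to 0.6 after continuous learning gives a second general evaluation result of about 0.25.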
[0071] Using the test task set corresponding to the target classification task, classification task testing is performed on both the continuously learned language model and the single-task language model, which yields the difference in classification accuracy between the two models. In addition, the general text representations of the continuously learned language model and the initial pre-trained language model are tested using a probe task set, which yields the difference in general text representation ability between these two models. Based on these two differences, the target general evaluation result of the continuously learned language model for the target classification task can be determined. This result reflects both the change in general representation on classification tasks between continuous and non-continuous learning, and the change in general text representation between the continuously learned language model and the initial pre-trained language model. The target general evaluation result is therefore more precise and can more accurately and effectively explain the changes in generality of the continuously learned model. On this basis, when a single model is used to implement multiple classification functions, the target general evaluation result can be used to control the general text representation capability of that model effectively and flexibly. This increases the diversity of single-model applications, avoids training a separate model for each classification task, and meets the requirements of the continuous learning model for general text representation across multiple classification tasks, thereby improving the classification accuracy of the continuous learning model on multiple classification tasks.
[0072] In the embodiments of this specification, the general feature classifier corresponds to the classification task. After continuously learning any classification task, the parameters of the continuously learned language model can be fixed. In this case, the initial classifier is trained based on the sample probe task data and the corresponding general feature classification labels.
[0073] Taking the three classification tasks mentioned above as examples, three continuously learned language models can be obtained during the continuous learning process: continuously learned model A, continuously learned model AB, and continuously learned model ABC. This yields three general feature classifiers corresponding to continuously learned model A, continuously learned model AB, and continuously learned model ABC, respectively. For example, general feature classifier A corresponds to continuously learned model A, general feature classifier AB corresponds to continuously learned model AB, and general feature classifier ABC corresponds to continuously learned model ABC. In this process, for instance, after the pre-trained language model has continuously learned classification tasks A and B, resulting in continuously learned model AB, the model parameters of continuously learned model AB can be fixed, and subsequent classification task C is not learned. In this case, an initial classifier (such as an initial multilayer perceptron) can be connected to the output side of continuously learned model AB. Then, the initial classifier is trained using sample probe task data and corresponding general feature classification labels to obtain a general feature classifier AB that satisfies the iteration conditions and corresponds to the continuously learned model AB.
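The correspondence between continual-learning stages (A, AB, ABC) and their general feature classifiers can be sketched as a simple loop. Everything here is a hypothetical illustration: `learn_task` and `train_probe` are placeholder callbacks, and `learn_task` is assumed to return a new model snapshot rather than modify its input in place.

```python
from typing import Callable, Dict, Sequence, Tuple, TypeVar

Model = TypeVar("Model")
Probe = TypeVar("Probe")


def probes_per_stage(
    initial_model: Model,
    task_names: Sequence[str],
    learn_task: Callable[[Model, str], Model],
    train_probe: Callable[[Model], Probe],
) -> Dict[str, Tuple[Model, Probe]]:
    """Return {stage name: (frozen model snapshot, its general feature classifier)}."""
    stages: Dict[str, Tuple[Model, Probe]] = {}
    model = initial_model
    stage_name = ""
    for task in task_names:
        model = learn_task(model, task)   # continue learning the next task
        stage_name += task                # "A", then "AB", then "ABC"
        # The snapshot is treated as frozen; only the probe is trained on top of it.
        stages[stage_name] = (model, train_probe(model))
    return stages
```

With tasks named "A", "B", and "C", the returned mapping has the keys "A", "AB", and "ABC", one general feature classifier per continuously learned model.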
[0074] Based on the above introduction, taking the training process of a general feature classifier for any classification task (target task) as an example, the general feature classifier can be trained through the following steps:
[0075] If an initial pre-trained language model continuously learns and completes the target classification task, a continuously learned language model corresponding to the target classification task can be obtained. For example, the model parameters of the initial pre-trained language model that has continuously learned the target classification task can be frozen to obtain the continuously learned language model corresponding to the target classification task.
[0076] The sample probe task data can then be input into the continuously learned language model for general text feature extraction to obtain sample general features, and the sample general features can be classified by the initial classifier to obtain sample general feature classification results.
[0077] Furthermore, loss information can be determined based on the general feature classification results and the general feature classification labels corresponding to the sample probe task data. For example, the general feature classification results can be compared with the general feature classification labels, and the classification error rate can be used as loss information; or the loss between the general feature classification results and the general feature classification labels can be calculated using a preset loss function to obtain loss information. This application does not limit the preset loss function.
[0078] Finally, the parameters of the initial classifier can be adjusted using the loss information until the training iteration conditions are met, resulting in a general feature classifier. For example, if the training iteration conditions are not met, gradient information can be determined based on the loss information, and gradient backpropagation can be used to adjust the parameters of the initial classifier. This process can then be repeated by returning to the steps described above, where sample probe task data is input into the continuously learning language model, iterating the training process until the training iteration conditions are met. The initial classifier that meets the training iteration conditions can then be used as the general feature classifier.
[0079] It should be noted that the parameters of the continuously learned language model are not adjusted during the training of the general feature classifier.
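The training procedure in the preceding paragraphs can be sketched in plain Python. This is a minimal sketch under stated assumptions: the probe is a binary logistic-regression classifier rather than a multilayer perceptron, the loss is cross-entropy, and the training iteration condition is a loss threshold or a maximum number of epochs. The names `train_probe` and `frozen_encoder` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def train_probe(
    frozen_encoder: Callable[[str], Sequence[float]],
    samples: Sequence[Tuple[str, int]],
    max_epochs: int = 500,
    learning_rate: float = 0.5,
    loss_threshold: float = 0.05,
) -> Callable[[Sequence[float]], int]:
    """Train a binary probe on top of a frozen encoder and return it."""
    # The encoder is frozen, so features can be extracted once and reused.
    features: List[List[float]] = [list(frozen_encoder(text)) for text, _ in samples]
    labels = [label for _, label in samples]
    n, dim = len(features), len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(max_epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            # Cross-entropy loss between the prediction and the probe label.
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1.0 - p + 1e-12)
            for i, xi in enumerate(x):
                grad_w[i] += (p - y) * xi
            grad_b += p - y
        if loss / n < loss_threshold:   # training iteration condition met
            break
        # Only the classifier parameters are updated; the encoder is untouched.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n

    def probe(x: Sequence[float]) -> int:
        return int(sum(w * xi for w, xi in zip(weights, x)) + bias >= 0.0)

    return probe
```

Because the encoder never changes during probe training, caching its features up front is equivalent to re-running it in every iteration.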
[0080] In one optional implementation, the sample probe task data may include sample syntactic data and / or sample semantic data; based on this, the trained general feature classifier may include a syntactic classifier and / or a semantic classifier. Specifically, in this case, the training process for the syntactic classifier may include: inputting the sample syntactic data into a continuously learning language model, performing syntactic feature extraction processing to obtain sample syntactic features; thereby, the sample syntactic features can be classified based on the initial classifier to obtain sample syntactic classification results; and determining loss information based on the sample syntactic classification results and the syntactic classification labels corresponding to the sample syntactic data. Further, the loss information can be used to adjust the parameters of the initial classifier until the training iteration conditions are met, thus obtaining the syntactic classifier.
[0081] Based on a training process similar to that of a syntactic classifier, an initial classifier can be trained using sample semantic data to obtain a semantic classifier. Specifically, sample semantic data can be input into a continuously learning language model for semantic feature extraction, resulting in sample semantic features. These features can then be used to perform semantic feature classification based on the initial classifier, yielding a sample semantic classification result. Furthermore, loss information can be determined based on the sample semantic classification result and the corresponding semantic classification labels. The loss information can then be used to adjust the parameters of the initial classifier until the training iteration conditions are met, resulting in the final semantic classifier.
[0082] The syntactic classification labels can include labels such as the inversion of two consecutive markers in a sentence, the non-inversion of two consecutive markers in a sentence, the maximum depth of the syntactic tree, whether the object and subject of the sentence are singular, or whether the object and subject of the sentence are plural. The semantic classification labels can include labels such as the inversion of the order of two coordinating conjunctions, the non-inversion of the order of two coordinating conjunctions, whether the main verb of the sentence is marked as present tense, or whether the main verb of the sentence is marked as past tense. This application does not limit the syntactic and semantic classification labels; sample syntactic data and sample semantic data can be set according to the syntactic and semantic representations to be evaluated, thereby setting corresponding syntactic and semantic classification labels for the sample syntactic and sample semantic data.
[0083] It should be noted that the initial classifier mentioned above is connected to the last layer of the continuous learning language model. Optionally, an initial classifier can also be connected to each layer of the continuous learning language model to train a general feature classifier for each layer. For example, if the BERT model has 12 layers, then 12 general feature classifiers can be trained for the target classification task. For the training process of the general feature classifier of each layer, reference can be made to the training process of the general feature classifier described above. That is, in each iteration, 12 pieces of loss information can be obtained. Based on these 12 pieces of loss information, the model parameters of the corresponding layer in the continuous learning language model and the parameters of the 12 initial classifiers can be adjusted respectively. This will not be elaborated further here. Accordingly, when the general feature classifier includes a syntactic classifier and a semantic classifier, based on this training method of connecting an initial classifier to each layer of the continuous learning language model, 12 syntactic classifiers and 12 semantic classifiers can be obtained after learning each classification task.
[0084] Referring to Figure 7, in one application example, suppose there are multiple classification tasks, numbered 1 to N, where N is greater than or equal to 2. Take classification task m as an example, where m is any value from 1 to N. The test task set corresponding to classification task m can be used to perform classification task testing on the continuously learned language model and the single-task language model, respectively, to obtain the first classification accuracy corresponding to the continuously learned language model and the second classification accuracy corresponding to the single-task language model. Here, the continuously learned language model can refer to the pre-trained language model that has continuously learned classification tasks 1 to m. Specifically, the test text data in the test task set corresponding to classification task m can be input into the continuously learned language model for text representation processing to obtain the first text features. These first text features can then be input into the first classifier for classification task prediction processing to obtain the first task classification result. The first task classification result can then be compared with the task labels corresponding to the test text data in the test task set corresponding to classification task m to obtain the first classification accuracy. The first classification accuracy can be the ratio of the number of first task classification results that match the task labels to the total number of first task classification results. Accordingly, when testing the single-task language model, the test text data in the test task set corresponding to classification task m can be input into the single-task language model (a pre-trained language model that has only learned classification task m) for text feature extraction to obtain the second text features. Further, the second text features can be input into a second classifier for text classification to obtain the second task classification result. The second task classification result can then be compared with the task labels to obtain the second classification accuracy. The second classification accuracy can be the ratio of the number of second task classification results that match the task labels to the total number of second task classification results.
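The accuracy definition used for both models (matching results divided by total results) can be written out as a small helper. The function name is hypothetical, and the labels are assumed to be directly comparable values such as integers or strings.

```python
from typing import Hashable, Sequence


def classification_accuracy(
    predictions: Sequence[Hashable], task_labels: Sequence[Hashable]
) -> float:
    """Ratio of task classification results that match their task labels
    to the total number of task classification results."""
    if len(predictions) != len(task_labels) or not task_labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    matches = sum(1 for pred, label in zip(predictions, task_labels) if pred == label)
    return matches / len(task_labels)
```

Applying it to the outputs of the first classifier gives the first classification accuracy, and applying it to the outputs of the second classifier gives the second classification accuracy.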
[0085] In the embodiments of this specification, general test text data from the probe task set can be input into the continuously learned language model and the initial pre-trained language model, respectively, to perform general text feature extraction processing, obtaining a first general text feature corresponding to the continuously learned language model and a second general text feature corresponding to the initial pre-trained language model. Specifically, the general test text data can include syntactic test text data and semantic test text data. Based on this, the syntactic test text data can be input into the continuously learned language model for syntactic representation processing to obtain a first syntactic feature; then, the first syntactic feature can be input into a syntactic classifier for syntactic classification prediction processing to obtain a first syntactic classification result. Furthermore, the semantic test text data can be input into the continuously learned language model for semantic representation processing to obtain a first semantic feature, which can then be input into a semantic classifier for semantic classification task processing to obtain a first semantic classification result; then, the first syntactic classification result and the first semantic classification result are used as the first test result.
[0086] Furthermore, a first general evaluation result can be determined based on the difference between the first classification accuracy and the second classification accuracy; and a second general evaluation result can be determined based on the difference between the first test result and the second test result. Thus, a target general evaluation result can be obtained based on the first and second general evaluation results. In one example, the target general evaluation result can be calculated using the following formula, whereby the target general evaluation result may include GD, SynF, and SemF as described below.
[0087]
[0088]
[0089]
[0090] Where GD represents the first general evaluation result, and N represents the number of classification tasks; R_{m,m} represents the second classification accuracy for classification task m, and R^{p}_{m,m} represents the first classification accuracy corresponding to classification task m under continuous learning. SynF and SemF represent the second general evaluation results: SynF can represent the syntactic general evaluation result, and SemF can represent the semantic general evaluation result. s can represent a set of probe tasks. p^{Syn} can represent the syntactic test text data, for which the formula uses the second syntactic classification result and the first syntactic classification result corresponding to classification task m learned in continuous learning. p^{Sem} can represent the semantic test text data, for which the formula uses the second semantic classification result and the first semantic classification result corresponding to classification task m learned in continuous learning. |p^{Syn}| can represent the number of syntactic tasks that the syntactic task set can test, that is, the number of types of tasks that test syntax; |p^{Sem}| can represent the number of semantic tasks that the semantic task set can test, that is, the number of types of tasks that test semantics.
[0091] Based on the above formulas, syntax and semantics can be statistically analyzed separately. For example, formula (2) above can be used to calculate the difference between the second syntactic classification result and the first syntactic classification result, and then to calculate the first ratio of this difference to the second syntactic classification result. This first ratio can be statistically analyzed across the multiple classification tasks to obtain the average of the first ratios, which serves as the general syntactic evaluation result. Correspondingly, the general semantic evaluation result can be calculated according to formula (3) above. For example, the difference between the second semantic classification result and the first semantic classification result can be calculated, and then the second ratio of this difference to the second semantic classification result can be calculated. This second ratio can be statistically analyzed across the multiple classification tasks to obtain the average of the second ratios, which serves as the general semantic evaluation result. Thus, the general syntactic evaluation result and the general semantic evaluation result can be used as the second general evaluation result, and the first general evaluation result and the second general evaluation result together can be used as the target general evaluation result.
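One form of the three quantities that is consistent with the verbal description above is sketched below. It is not a reproduction of the original formulas. The notation is assumed: A^{0}_{k} (hypothetical symbol) denotes the second classification result of the initial pre-trained language model on probe task k, A^{m}_{k} (hypothetical symbol) denotes the first classification result after continuously learning up to classification task m, and averaging over all N classification tasks with a factor of 1/N is also an assumption.

```latex
\begin{align}
\mathrm{GD}   &= \frac{1}{N}\sum_{m=1}^{N}\left(R_{m,m} - R^{p}_{m,m}\right) \tag{1}\\
\mathrm{SynF} &= \frac{1}{N}\sum_{m=1}^{N}\frac{1}{|p^{Syn}|}
                 \sum_{k \in p^{Syn}}\frac{A^{0}_{k} - A^{m}_{k}}{A^{0}_{k}} \tag{2}\\
\mathrm{SemF} &= \frac{1}{N}\sum_{m=1}^{N}\frac{1}{|p^{Sem}|}
                 \sum_{k \in p^{Sem}}\frac{A^{0}_{k} - A^{m}_{k}}{A^{0}_{k}} \tag{3}
\end{align}
```

Under this sign convention, larger values of GD, SynF, and SemF indicate a larger loss of generality relative to the non-continuous learning models.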
[0092] Optionally, the target general evaluation result can be used to analyze the changes in the general representation ability of the pre-trained language model during continuous learning and to generate a trend graph or change information, which can be fed back to the terminal for display and notification. Alternatively, a general representation threshold can be preset to indicate the critical value at which the general representation ability meets the general requirements. Based on this, after a classification task has been continuously learned, the obtained target general evaluation result is compared with the general representation threshold. If the target general evaluation result is greater than or equal to the general representation threshold, continuous learning can be stopped, because the general representation ability has decreased and either no longer meets the general requirements or only just meets them; if the target general evaluation result is less than the general representation threshold, the pre-trained language model still meets the general requirements and can therefore continue learning other classification tasks. This can effectively balance the number of classification tasks and the general representation ability during continuous learning. The general representation threshold may include a GD threshold, a SynF threshold, and a SemF threshold. Based on this, the target general evaluation result being greater than or equal to the general representation threshold may mean that at least one of GD, SynF, and SemF is greater than or equal to its corresponding threshold; the target general evaluation result being less than the general representation threshold may mean that GD, SynF, and SemF are all less than the corresponding GD threshold, SynF threshold, and SemF threshold. This application does not limit this.
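The threshold comparison described above can be sketched as follows. The class and function names are hypothetical, and the example threshold values are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneralEvaluation:
    """Holds GD, SynF, and SemF, either as measured values or as thresholds."""
    gd: float
    synf: float
    semf: float


def should_stop_continual_learning(
    result: GeneralEvaluation, thresholds: GeneralEvaluation
) -> bool:
    """Stop when at least one metric reaches or exceeds its threshold;
    continue only when all three metrics are below their thresholds."""
    return (
        result.gd >= thresholds.gd
        or result.synf >= thresholds.synf
        or result.semf >= thresholds.semf
    )


# Illustrative values only (not taken from the specification).
thresholds = GeneralEvaluation(gd=0.05, synf=0.10, semf=0.10)
keep_going = not should_stop_continual_learning(
    GeneralEvaluation(gd=0.01, synf=0.04, semf=0.03), thresholds
)
```

Checking this after each newly learned classification task gives a simple rule for deciding whether further tasks can be added.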
[0093] As shown in Table 1 below, GD, SemF, and SynF exhibit a positive correlation for different pre-trained language models, indicating that different evaluation metrics can corroborate each other's changes in model generality.
[0094] Table 1
[0095]
[0096] Referring to Figure 8, ACC represents the classification accuracy of a continuously learned language model after learning multiple classification tasks, i.e., the average accuracy. Various continuous learning methods can effectively mitigate catastrophic forgetting, allowing the model to maintain high classification accuracy even after continuous learning. These methods include BERT-FT, BERT-LwF, BERT-ER, and BERT-DERPP. BERT-FT adds a linear classification layer to BERT and directly optimizes the model on the training task; this method serves as the baseline model. BERT-LwF builds on BERT-FT and mitigates catastrophic forgetting by constraining the model's parameters to prevent significant shifts. BERT-ER builds on BERT-FT and mitigates catastrophic forgetting by replaying memorized data. BERT-DERPP combines the two strategies above. As can be seen from Figure 8, the generality evaluation method for the continuous learning model in this application can accurately determine the changes in general knowledge under different continuous learning methods. During the continuous learning process, the number of classification tasks can be controlled based on these changes in general knowledge, so that the continuously learned language model retains its ability to handle more downstream tasks.
[0097] Figure 9 illustrates a block diagram of a generality evaluation apparatus for a continuous learning model according to an embodiment of this application. As shown in Figure 9, the apparatus may include:
[0098] The classification task testing module 901 is used to perform classification task testing on the continuously learned language model and the single-task language model respectively using the target test task set corresponding to the target classification task, to obtain the first classification accuracy corresponding to the continuously learned language model and the second classification accuracy corresponding to the single-task language model; the continuously learned language model is the language model after the initial pre-trained language model has continuously learned the target classification task; the single-task language model is the language model after the initial pre-trained language model has learned the target classification task alone; the target classification task is any one of multiple classification tasks used for continuous learning.
[0099] The general representation testing module 903 is used to test the general text representations of the continuously learned language model and the initial pre-trained language model using a probe task set, respectively, to obtain a first test result corresponding to the continuously learned language model and a second test result corresponding to the initial pre-trained language model.
[0100] A general evaluation module 905 is used to determine a target general evaluation result based on the difference between the first classification accuracy and the second classification accuracy, and the difference between the first test result and the second test result. The target general evaluation result is used to indicate the correlation between the general representation ability of the initial pre-trained language model after continuously learning the multiple classification tasks and the general representation ability of the non-continuous learning model. The non-continuous learning model includes the initial pre-trained language model and the single-task language model.
[0101] Using the test task set corresponding to the target classification task, classification task testing is performed on both the continuously learned language model and the single-task language model, which yields the difference in classification accuracy between the two models. In addition, the general text representations of the continuously learned language model and the initial pre-trained language model are tested using a probe task set, which yields the difference in general text representation ability between these two models. Based on these two differences, the target general evaluation result of the continuously learned language model for the target classification task can be determined. This result reflects both the change in general representation on classification tasks between continuous and non-continuous learning, and the change in general text representation between the continuously learned language model and the initial pre-trained language model. The target general evaluation result is therefore more precise and can more accurately and effectively explain the changes in generality of the continuously learned model. On this basis, when a single model is used to implement multiple classification functions, the target general evaluation result can be used to control the general text representation capability of that model effectively and flexibly. This increases the diversity of single-model applications, avoids training a separate model for each classification task, and meets the requirements of the continuous learning model for general text representation across multiple classification tasks, thereby improving the classification accuracy of the continuous learning model on multiple classification tasks.
[0102] In one possible implementation, the general representation testing module 903 described above may include:
[0103] A general feature extraction unit is used to input the general test text data in the probe task set into the continuous learning language model and the initial pre-trained language model respectively, and perform general text feature extraction processing to obtain the first general text feature corresponding to the continuous learning language model and the second general text feature corresponding to the initial pre-trained language model.
[0104] The first test unit is used to input the first text general features into a general feature classifier, perform general feature classification processing, and obtain the first test result; the general feature classifier is obtained by training an initial classifier based on sample probe task data and corresponding general feature classification labels while fixing the parameters of the continuous learning language model.
[0105] The second test unit is used to input the second text general features into the general feature classifier, perform general feature classification processing, and obtain the second test result.
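As a non-limiting illustration, the probe test of paragraphs [0103] to [0105] can be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation of this application: the general feature classifier is assumed to be a linear layer, the test result is assumed to be a classification accuracy, and the function name `probe_accuracy`, the weights and the toy feature values are all hypothetical. The same classifier is applied to the features of both models, as described above.

```python
import numpy as np


def probe_accuracy(features, labels, weights, bias):
    """Classify fixed text features with a linear probe and return the accuracy."""
    logits = features @ weights + bias
    predictions = logits.argmax(axis=1)
    return float((predictions == labels).mean())


# Hypothetical general text features for four probe sentences.
# In practice these would come from the two language models (not shown here).
first_features = np.array([[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.6, 0.4]])   # continuously learned model
second_features = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.3, 0.7]])  # initial pre-trained model
labels = np.array([0, 0, 1, 1])  # general feature classification labels

# Hypothetical parameters of an already trained general feature classifier.
weights = np.eye(2)
bias = np.zeros(2)

first_test_result = probe_accuracy(first_features, labels, weights, bias)
second_test_result = probe_accuracy(second_features, labels, weights, bias)
print(first_test_result, second_test_result)
```

Because the language models only supply features and are never updated, any gap between the two accuracies can be attributed to the representations rather than to the classifier.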
[0106] In one possible implementation, the probe task set includes a syntactic task set and a semantic task set; the general test text data includes syntactic test text data from the syntactic task set and semantic test text data from the semantic task set; correspondingly, the first general text feature includes a first syntactic feature and a first semantic feature; the general feature classifier includes a syntactic classifier and a semantic classifier; the aforementioned first test unit may include:
[0107] The first syntactic classification subunit is used to input the first syntactic feature into the syntactic classifier, perform syntactic classification task processing, and obtain the first syntactic classification result;
[0108] The first semantic classification subunit is used to input the first semantic feature into the semantic classifier, perform semantic classification task processing, and obtain the first semantic classification result.
[0109] The first test subunit is used to take the first syntactic classification result and the first semantic classification result as the first test result.
[0110] In one possible implementation, the second general text feature includes a second syntactic feature and a second semantic feature; the aforementioned second test unit may include:
[0111] The second syntactic classification subunit is used to input the second syntactic feature into the syntactic classifier, perform syntactic classification task processing, and obtain the second syntactic classification result;
[0112] The second semantic classification subunit is used to input the second semantic feature into the semantic classifier, perform semantic classification task processing, and obtain the second semantic classification result.
[0113] The second test subunit is used to take the second syntactic classification result and the second semantic classification result as the second test result.
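As a non-limiting illustration, the way each test result combines a syntactic and a semantic classification result, as in paragraphs [0106] to [0113], can be sketched as follows. The sketch assumes that each classification result is expressed as an accuracy; the names `ProbeTestResult` and `accuracy` and the toy predictions are hypothetical and do not come from this application.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProbeTestResult:
    """A test result made of one syntactic and one semantic classification result."""
    syntactic_accuracy: float
    semantic_accuracy: float


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical classifier outputs for the continuously learned language model.
first_test_result = ProbeTestResult(
    syntactic_accuracy=accuracy([1, 0, 1, 1], [1, 0, 0, 1]),
    semantic_accuracy=accuracy([2, 2, 0], [2, 1, 0]),
)

# Hypothetical classifier outputs for the initial pre-trained language model.
second_test_result = ProbeTestResult(
    syntactic_accuracy=accuracy([1, 0, 0, 1], [1, 0, 0, 1]),
    semantic_accuracy=accuracy([2, 1, 0], [2, 1, 0]),
)
print(first_test_result, second_test_result)
```

Keeping the syntactic and semantic results as separate fields allows the later comparison between the two models to be made per probe type.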
[0114] In one possible implementation, the device may further include the following modules for training a general feature classifier:
[0115] The continuous learning language model acquisition module is used to acquire the continuous learning language model corresponding to the target classification task when the initial pre-trained language model continuously learns the target classification task and completes the learning.
[0116] The feature extraction module is used to input the sample probe task data into the continuously learned language model, perform general text feature extraction processing, and obtain sample general features.
[0117] A general feature classification module is used to perform general feature classification processing on the sample general features based on the initial classifier, and obtain a sample general feature classification result.
[0118] The loss information determination module is used to determine loss information based on the sample general feature classification result and the general feature classification labels corresponding to the sample probe task data;
[0119] The parameter adjustment module is used to adjust the parameters of the initial classifier using the loss information until the training iteration conditions are met, thereby obtaining the general feature classifier.
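As a non-limiting illustration, the training of the general feature classifier in paragraphs [0115] to [0119] can be sketched as follows. The sketch makes several assumptions that are not specified in this application: the initial classifier is a linear softmax layer, the loss information is a cross-entropy loss, and the training iteration condition is either a maximum number of steps or a loss threshold. The function name `train_probe_classifier` and the toy data are hypothetical. The features are computed once and never updated, which corresponds to keeping the parameters of the language model fixed.

```python
import numpy as np


def train_probe_classifier(features, labels, num_classes, lr=0.5, max_steps=500, tol=1e-3):
    """Train a linear softmax classifier on fixed (frozen) sample general features."""
    rng = np.random.default_rng(0)
    weights = rng.normal(scale=0.01, size=(features.shape[1], num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(max_steps):
        # Classification of the sample general features by the current classifier.
        logits = features @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Loss information: cross-entropy against the classification labels.
        loss = -np.log(probs[np.arange(len(labels)), labels]).mean()
        if loss < tol:  # assumed training iteration condition
            break
        # Only the classifier parameters are adjusted.
        grad = (probs - one_hot) / len(labels)
        weights -= lr * (features.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias


# Hypothetical sample general features and labels.
sample_features = np.array([[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [0.1, 1.8]])
sample_labels = np.array([0, 0, 1, 1])

weights, bias = train_probe_classifier(sample_features, sample_labels, num_classes=2)
predictions = (sample_features @ weights + bias).argmax(axis=1)
print(predictions)
```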
[0120] In one possible implementation, the general evaluation module described above may include:
[0121] The first general evaluation result determination unit is used to determine the first general evaluation result based on the difference between the first classification accuracy and the second classification accuracy.
[0122] The second general evaluation result determination unit is used to determine a second general evaluation result based on the difference between the first test result and the second test result;
[0123] The target general evaluation result acquisition unit is used to statistically analyze the first general evaluation result and the second general evaluation result corresponding to each of the multiple classification tasks to obtain the target general evaluation result.
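As a non-limiting illustration, the statistical analysis over multiple classification tasks in paragraphs [0121] to [0123] can be sketched as follows. The sketch assumes that the first general evaluation result is the plain accuracy difference and that the statistic is an arithmetic mean over tasks; neither choice is stated in the paragraphs above, and the function names and numeric values are hypothetical.

```python
from statistics import mean


def first_general_evaluation(first_accuracy: float, second_accuracy: float) -> float:
    """Assumed form: accuracy of the continuously learned model minus that of the single-task model."""
    return first_accuracy - second_accuracy


def target_general_evaluation(first_results, second_results):
    """Assumed statistic: average each kind of per-task result over all classification tasks."""
    return {
        "classification": mean(first_results),
        "representation": mean(second_results),
    }


# Hypothetical per-task accuracies: (continuously learned model, single-task model).
accuracies = [(0.80, 0.85), (0.90, 0.90), (0.70, 0.75)]
first_results = [first_general_evaluation(a, b) for a, b in accuracies]

# Hypothetical per-task second general evaluation results from the probe tests.
second_results = [-0.10, -0.05, -0.15]

target_result = target_general_evaluation(first_results, second_results)
print(target_result)
```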
[0124] In one possible implementation, the aforementioned second general evaluation result determination unit may include:
[0125] The general difference information determination subunit is used to take the difference between the first test result and the second test result as general difference information;
[0126] The second general evaluation result determination subunit is used to determine the ratio of the general difference information to the second test result as the second general evaluation result.
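As a non-limiting illustration, the ratio described in paragraphs [0125] and [0126] can be sketched as follows. The sketch assumes that both test results are scalar accuracies; the function name `second_general_evaluation` and the numeric values are hypothetical.

```python
def second_general_evaluation(first_test_result: float, second_test_result: float) -> float:
    """Relative change of the probe result with respect to the initial pre-trained model."""
    if second_test_result == 0:
        raise ValueError("second test result must be non-zero to form the ratio")
    # General difference information: first test result minus second test result.
    general_difference = first_test_result - second_test_result
    # Ratio of the general difference information to the second test result.
    return general_difference / second_test_result


# Hypothetical probe accuracies: 0.72 after continuous learning, 0.80 before.
result = second_general_evaluation(0.72, 0.80)
print(round(result, 4))
```

Under these assumptions, a negative value indicates that the general text representation has degraded relative to the initial pre-trained language model, and dividing by the second test result makes values comparable across probe tasks with different baseline accuracies.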
[0127] Regarding the apparatus in the above embodiments, the specific manner in which each module and unit performs its operations has been described in detail in the embodiments related to the method, and will not be elaborated upon here.
[0128] Figure 10 is a block diagram of an electronic device for evaluating the generality of a continuous learning model according to an embodiment of this application. The electronic device may be a server, and its internal structure may be as shown in Figure 10. The electronic device includes a processor 1001, a memory, and a network interface 1004 connected via a system bus 1002. The processor 1001 provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory 1003. The non-volatile storage medium stores an operating system 1005 and a computer program 1006. The internal memory 1003 provides an environment for running the operating system 1005 and the computer program 1006 stored in the non-volatile storage medium. The network interface 1004 is used to communicate with external terminals via a network connection. When executed by the processor, the computer program implements a generality evaluation method for a continuous learning model.
[0129] Those skilled in the art will understand that the structure shown in Figure 10 is merely a block diagram of the part of the structure related to the present application and does not constitute a limitation on the electronic device to which the present application is applied. A specific electronic device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
[0130] In an exemplary embodiment, an electronic device is also provided, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement a general evaluation method for a continuous learning model as described in the embodiments of this application.
[0131] In an exemplary embodiment, a storage medium is also provided, wherein when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the generality evaluation method of the continuous learning model in the embodiments of this application.
[0132] In an exemplary embodiment, a computer program product containing instructions is also provided, which, when run on a computer, causes the computer to execute the generality evaluation method of the continuous learning model in the embodiments of this application.
[0133] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. This computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
[0134] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this application are indicated by the following claims.
[0135] It should be understood that this application is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.
Claims
1. A general evaluation method for a continuous learning model, characterized in that the method includes:
testing a continuously learned language model and a single-task language model respectively on a target classification task set corresponding to a target classification task, to obtain a first classification accuracy of the continuously learned language model and a second classification accuracy of the single-task language model; wherein the continuously learned language model is the language model obtained after an initial pre-trained language model has continuously learned the target classification task, the single-task language model is the language model obtained after the initial pre-trained language model has learned the target classification task alone, and the target classification task is any one of multiple classification tasks used for continuous learning;
testing the general text representations of the continuously learned language model and the initial pre-trained language model respectively using a probe task set, to obtain a first test result corresponding to the continuously learned language model and a second test result corresponding to the initial pre-trained language model;
determining a target general evaluation result based on the difference between the first classification accuracy and the second classification accuracy, and the difference between the first test result and the second test result; wherein the target general evaluation result is used to indicate the correlation between the general representation ability of the initial pre-trained language model after continuous learning of the multiple classification tasks and the general representation ability of a non-continuous learning model, and the non-continuous learning model includes the initial pre-trained language model and the single-task language model.
2. The method according to claim 1, characterized in that the step of testing the general text representations of the continuously learned language model and the initial pre-trained language model using a probe task set, to obtain a first test result corresponding to the continuously learned language model and a second test result corresponding to the initial pre-trained language model, includes:
inputting the general test text data in the probe task set into the continuously learned language model and the initial pre-trained language model respectively, and performing general text feature extraction processing, to obtain a first general text feature corresponding to the continuously learned language model and a second general text feature corresponding to the initial pre-trained language model;
inputting the first general text feature into a general feature classifier for general feature classification processing, to obtain the first test result; wherein the general feature classifier is obtained by training an initial classifier based on sample probe task data and corresponding general feature classification labels while keeping the parameters of the continuously learned language model fixed;
inputting the second general text feature into the general feature classifier for general feature classification processing, to obtain the second test result.
3. The method according to claim 2, characterized in that the probe task set includes a syntactic task set and a semantic task set; the general test text data includes syntactic test text data in the syntactic task set and semantic test text data in the semantic task set; correspondingly, the first general text feature includes a first syntactic feature and a first semantic feature; the general feature classifier includes a syntactic classifier and a semantic classifier; and the step of inputting the first general text feature into a general feature classifier for general feature classification processing, to obtain the first test result, includes:
inputting the first syntactic feature into the syntactic classifier to perform syntactic classification task processing, to obtain a first syntactic classification result;
inputting the first semantic feature into the semantic classifier to perform semantic classification task processing, to obtain a first semantic classification result;
taking the first syntactic classification result and the first semantic classification result as the first test result.
4. The method according to claim 3, characterized in that the second general text feature includes a second syntactic feature and a second semantic feature; and the step of inputting the second general text feature into the general feature classifier for general feature classification processing, to obtain the second test result, includes:
inputting the second syntactic feature into the syntactic classifier to perform syntactic classification task processing, to obtain a second syntactic classification result;
inputting the second semantic feature into the semantic classifier to perform semantic classification task processing, to obtain a second semantic classification result;
taking the second syntactic classification result and the second semantic classification result as the second test result.
5. The method according to claim 2, characterized in that the general feature classifier is obtained through the following steps:
when the initial pre-trained language model has finished continuously learning the target classification task, obtaining the continuously learned language model corresponding to the target classification task;
inputting the sample probe task data into the continuously learned language model to perform general text feature extraction processing, to obtain sample general features;
performing general feature classification processing on the sample general features based on the initial classifier, to obtain a sample general feature classification result;
determining loss information based on the sample general feature classification result and the general feature classification labels corresponding to the sample probe task data;
adjusting the parameters of the initial classifier using the loss information until a training iteration condition is met, to obtain the general feature classifier.
6. The method according to claim 1, characterized in that the step of determining the target general evaluation result based on the difference between the first classification accuracy and the second classification accuracy, and the difference between the first test result and the second test result, includes:
determining a first general evaluation result based on the difference between the first classification accuracy and the second classification accuracy;
determining a second general evaluation result based on the difference between the first test result and the second test result;
statistically analyzing the first general evaluation result and the second general evaluation result corresponding to each of the multiple classification tasks, to obtain the target general evaluation result.
7. The method according to claim 6, characterized in that the step of determining the second general evaluation result based on the difference between the first test result and the second test result includes:
taking the difference between the first test result and the second test result as general difference information;
determining the ratio of the general difference information to the second test result as the second general evaluation result.
8. A general evaluation apparatus for a continuous learning model, characterized in that it includes:
a classification task testing module, used to perform classification task testing on a continuously learned language model and a single-task language model respectively using a target test task set corresponding to a target classification task, to obtain a first classification accuracy corresponding to the continuously learned language model and a second classification accuracy corresponding to the single-task language model; wherein the continuously learned language model is the language model obtained after an initial pre-trained language model has continuously learned the target classification task, the single-task language model is the language model obtained after the initial pre-trained language model has learned the target classification task alone, and the target classification task is any one of multiple classification tasks used for continuous learning;
a general representation testing module, used to test the general text representations of the continuously learned language model and the initial pre-trained language model respectively using a probe task set, to obtain a first test result corresponding to the continuously learned language model and a second test result corresponding to the initial pre-trained language model;
a general evaluation module, used to determine a target general evaluation result based on the difference between the first classification accuracy and the second classification accuracy, and the difference between the first test result and the second test result; wherein the target general evaluation result is used to indicate the correlation between the general representation ability of the initial pre-trained language model after continuously learning the multiple classification tasks and the general representation ability of a non-continuous learning model, and the non-continuous learning model includes the initial pre-trained language model and the single-task language model.
9. An electronic device, characterized in that it includes: a processor; and a memory used to store processor-executable instructions; wherein the processor is configured to execute the executable instructions to implement the method according to any one of claims 1 to 7.
10. A non-volatile computer-readable storage medium storing computer program instructions thereon, characterized in that, when the computer program instructions are executed by a processor, they implement the method according to any one of claims 1 to 7.