A task processing method, a model training method, and related apparatus
By generating common prompt vectors for multiple tasks, the problem of limited user input length caused by excessively long prompt words is solved, thus improving task processing efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SANGFOR TECH INC
- Filing Date
- 2024-12-31
- Publication Date
- 2026-06-30
AI Technical Summary
In existing language models, the length of prompt words limits the length of user input data, which affects task processing efficiency.
By leveraging the commonalities of multiple tasks to generate task prompt vectors, which are then fused and input into the task analysis model to obtain execution results, the limitation on user input length caused by excessively long prompts is avoided.
It improves task processing efficiency, brings convenience to users, and avoids the limitation on input length caused by excessively long prompts.
Figure CN122309692A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of multi-task learning, and more particularly to a task processing method, a model training method, and related apparatus.
Background Technology
[0002] To meet diverse needs, prompts need to be designed to adapt to various tasks. Prompt design refers to adding instructions and related content before the user's input, and then feeding this combination into a language model to obtain the desired output. Typically, the model learns from various manually designed examples using contextual prompts to arrive at the task output.
[0003] However, in existing solutions, the prompts used for in-context learning are quite long. Since the maximum input length of a language model is usually fixed, the prompt length limits how much data the user can input, which reduces efficiency and inconveniences the user.
Summary of the Invention
[0004] This application provides a task processing method, a model training method, and related apparatus to improve task processing efficiency.
[0005] The first aspect of this application provides a task processing method, including:
[0006] Based on the task information of the target task, the task prompt vector corresponding to the target task is determined from the pre-trained task analysis model, wherein the task prompt vector is a vector generated by utilizing the commonalities of multiple tasks;
[0007] The fused prompt vector is determined based on the target task corresponding text and the task prompt vector, and then the fused prompt vector is fed into the task analysis model to obtain the execution result of the target task.
[0008] Optionally, the task analysis model includes multiple task modules and a language module; each task module has its own task prompt vector; determining the task prompt vector corresponding to the target task from the pre-trained task analysis model based on the task information of the target task includes:
[0009] Based on the task information of the target task, a target task module is determined from the plurality of task modules, and a task prompt vector corresponding to the target task is determined based on the target task module;
[0010] Accordingly, the step of feeding the fused prompt vector into the task analysis model to obtain the execution result of the target task includes:
[0011] The fused prompt vector is input into the language module to obtain the execution result of the target task.
[0012] Optionally, determining the fused prompt vector based on the target task corresponding text and the task prompt vector includes:
[0013] The text corresponding to the target task is transformed to obtain the corresponding target question vector;
[0014] The target question vector and the task prompt vector are fused to obtain the fused prompt vector corresponding to the target task.
[0015] Optionally, determining the fused prompt vector based on the target task corresponding text and the task prompt vector includes:
[0016] The target task corresponding text and the task prompt vector are concatenated or added together to obtain the fused prompt vector.
[0017] The second aspect of this application provides a model training method, including:
[0018] The text of the sample task and the task prompt vector are fused to determine the fused prompt vector; wherein, the task prompt vector is a vector generated by utilizing the commonalities of multiple tasks;
[0019] The task analysis model is trained by using multiple fused prompt vectors to obtain a well-trained task analysis model.
[0020] Optionally, each sample task has its own corresponding initialization prompt vector, and multiple sample tasks also have a shared initialization prompt vector. Before fusing the text of the sample task and the task prompt vector to determine the fused prompt vector, the method further includes:
[0021] The result of adding or concatenating the initialization prompt vector of each sample task with a preset vector is determined as the task prompt vector of each sample task. The preset vector is the vector obtained by operating on the initialization prompt vector and the initialization shared prompt vector.
[0022] A third aspect of this application provides a task processing apparatus, including:
[0023] The determining unit is used to determine the task prompt vector corresponding to the target task from a pre-trained task analysis model based on the task information of the target task, wherein the task prompt vector is a vector generated by utilizing the commonalities of multiple tasks;
[0024] The processing unit is configured to determine the fused prompt vector based on the target task corresponding text and the task prompt vector, and to send the fused prompt vector into the task analysis model to obtain the execution result of the target task.
[0025] A fourth aspect of this application provides a model training apparatus, comprising:
[0026] The fusion unit is used to fuse the text of the sample task and the task prompt vector to determine the fused prompt vector; wherein the task prompt vector is a vector generated by utilizing the commonalities of multiple tasks;
[0027] The training unit is used to train the task analysis model based on multiple fused prompt vectors to obtain a trained task analysis model.
[0028] A fifth aspect of this application provides a task processing apparatus, comprising:
[0029] Central processing unit, memory, and input / output interfaces;
[0030] The memory is either a short-term storage memory or a persistent storage memory;
[0031] The central processing unit is configured to communicate with the memory and execute instructions in the memory to perform the aforementioned method.
[0032] A sixth aspect of this application provides a model training apparatus, comprising:
[0033] Central processing unit, memory, and input / output interfaces;
[0034] The memory is either a short-term storage memory or a persistent storage memory;
[0035] The central processing unit is configured to communicate with the memory and execute instructions in the memory to perform the aforementioned method.
[0036] A seventh aspect of this application provides a computer-readable storage medium including instructions that, when executed on a computer, cause the computer to perform the aforementioned method.
[0037] An eighth aspect of this application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the aforementioned method.
[0038] As can be seen from the above technical solutions, the embodiments of this application have the following advantages:
[0039] First, based on the task information of the target task, the task prompt vector corresponding to the target task is determined from the pre-trained task analysis model. Then, based on the corresponding text of the target task and the task prompt vector, a fused prompt vector is determined. This fused prompt vector is then fed into the task analysis model to obtain the execution result of the target task. Since the task prompt vector is generated by utilizing the commonalities of multiple tasks, the fused prompt vector also involves the commonalities of the tasks. Vectors utilizing commonalities are shorter, which can, to some extent, avoid the problem of limiting the length of user input and bring convenience to the user.
Attached Figure Description
[0040] Figure 1 is a schematic diagram of an embodiment of a task processing method disclosed in this application;
[0041] Figure 2 is a schematic diagram of another embodiment of a task processing method disclosed in this application;
[0042] Figure 3 is a schematic diagram of an embodiment of a model training method disclosed in this application;
[0043] Figure 4 is a schematic diagram of another embodiment of a model training method disclosed in this application;
[0044] Figure 5 is a schematic diagram illustrating one implementation of the task prompt vector disclosed in this application;
[0045] Figure 6 is a schematic diagram of one embodiment of a task processing device disclosed in this application;
[0046] Figure 7 is a schematic diagram of another embodiment of a task processing device disclosed in this application;
[0047] Figure 8 is a schematic diagram of an embodiment of a model training device disclosed in this application;
[0048] Figure 9 is a schematic diagram of another embodiment of a model training device disclosed in this application.
Detailed Implementation
[0049] The present application will be further described in detail below with reference to the accompanying drawings.
[0050] This application provides a task processing method, a model training method, and related apparatus to improve task processing efficiency.
[0051] To adapt to various task types, instructions and examples are added before the text input to obtain the desired output. Existing in-context learning uses manually provided instructions and examples to obtain the corresponding task output. However, in existing schemes, the maximum input length of the language model is fixed, while the prompts used for in-context learning are relatively long. This limits the length of user input, reducing task processing efficiency and inconveniencing users. To address these issues, this application provides a task processing method and apparatus based on multi-task learning. It uses the commonalities of multiple tasks to optimize the prompt prefix, resulting in shorter task prompt vectors. This can, to some extent, avoid limiting the length of user input and is more convenient for users. The goal of multi-task learning is to leverage the relevant information contained in multiple tasks to obtain more accurate results for each task.
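The length constraint described above can be illustrated with simple arithmetic. The following is a minimal sketch, assuming the prompt and the user input share one fixed input window (as is the case when they are concatenated); the function name and all numbers are hypothetical and are not taken from this application.

```python
def remaining_user_budget(max_input_len: int, prompt_len: int) -> int:
    """Positions left for user input once the prompt has been placed."""
    if prompt_len > max_input_len:
        raise ValueError("prompt alone exceeds the model's input limit")
    return max_input_len - prompt_len


# Hypothetical numbers, chosen only for illustration.
MAX_INPUT_LEN = 2048
LONG_TEXT_PROMPT_LEN = 1500    # instructions plus hand-written examples
SHORT_PROMPT_VECTOR_LEN = 20   # learned task prompt vectors

budget_with_text_prompt = remaining_user_budget(MAX_INPUT_LEN, LONG_TEXT_PROMPT_LEN)
budget_with_prompt_vectors = remaining_user_budget(MAX_INPUT_LEN, SHORT_PROMPT_VECTOR_LEN)
```

Under these assumed numbers, shortening the prompt frees most of the window for the user's own text. When the prompt is fused by addition rather than concatenation, it occupies no additional positions at all.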
[0052] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0053] The terms "first," "second," "third," "fourth," etc., used in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0054] The following describes one task processing method according to this application. Referring to Figure 1, one embodiment of a task processing method according to this application includes:
[0055] 101. Determine the task prompt vector corresponding to the target task from the pre-trained task analysis model based on the task information of the target task;
[0056] Based on the task information of the target task, the corresponding task prompt vector is determined from a pre-trained task analysis model. This task prompt vector is generated using the commonalities of multiple tasks. The target task is the task input by the user and represents the user's needs. Task information mainly includes the task type of the target task. Task types include classification, parameter extraction, pronoun resolution, and so on, and are not specifically limited here. Specifically, the user can send the relevant needs to the task processing device from a mobile device over the network, and the task processing device receives the task information of the target task (specifically, the task type). Alternatively, the user can send the relevant needs to a server, which then sends the task information of the target task to the task processing device. This can be set according to actual needs and is not specifically limited here. The task analysis model holds task prompt vectors for the various task types and looks up the corresponding task prompt vector based on the task information of the target task.
[0057] 102. Determine the fused prompt vector based on the target task's corresponding text and the task prompt vector, and then feed the fused prompt vector into the task analysis model to obtain the target task's execution result.
[0058] The fused prompt vector is determined based on the target task's corresponding text and the task prompt vector. This fused prompt vector is then fed into the task analysis model to obtain the execution result of the target task. The target task's corresponding text is the text input by the user describing the specific content of the target task. Specifically, the target task's corresponding text and the task prompt vector are processed according to a preset algorithm to obtain the fused prompt vector. The preset algorithm can be set according to actual needs and is not limited here. The task analysis model has corresponding neural units for each task type. Different task types activate different neural units, meaning that different task types have different computational paths. For example, the neural units involved in classification tasks and parameter extraction tasks are not entirely the same, and the output results will naturally be different.
[0059] In this embodiment, the task prompt vector corresponding to the target task is first determined from a pre-trained task analysis model based on the task information of the target task. Then, the fused prompt vector is determined based on the corresponding text of the target task and the task prompt vector. This fused prompt vector is then fed into the task analysis model to obtain the execution result of the target task. Since the task prompt vector is a vector generated using the commonalities of multiple tasks, the fused prompt vector also involves the commonalities of multiple tasks. Vectors utilizing commonalities are shorter, which can, to some extent, avoid the problem of limiting the length of user input and bring convenience to the user.
[0060] The task processing method of this application is described in detail below. Referring to Figure 2, in another embodiment of the task processing method of this application, the task analysis model includes multiple task modules and a language module, and each task module has its own task prompt vector. The method includes:
[0061] 201. Based on the task information of the target task, determine the target task module from multiple task modules, and determine the task prompt vector corresponding to the target task based on the target task module;
[0062] Based on the task information of the target task, the target task module is determined from the multiple task modules, and the corresponding task prompt vector is determined based on the target task module. Task information mainly includes the task type of the target task. Task types include classification, parameter extraction, pronoun resolution, and so on, and are not specifically limited here. Each task module of the task analysis model has its own task prompt vector, which is a vector generated using the commonalities of multiple tasks, i.e., a vector related to the prompt word prefix for a certain task type, on the premise that each task can still be completed as required. Each task module has its own output, which serves as the input to the language module. Different inputs to the language module activate the corresponding neural units in the language module to output their respective results. Specifically, one task module corresponds to one and only one task type, so there is a one-to-one correspondence between task modules and task types. For example, assuming the task analysis model has only two task modules, a classification task module and a parameter extraction task module, the classification task module has its own classification prompt vector, and the parameter extraction task module has its own parameter extraction prompt vector. The target task module is found by traversing the task modules and searching by the task type of the target task. First, the task type of each task module is determined; task types generally have corresponding identifiers, so the specific type can be determined from the table that stores the identifiers. Next, the task types are traversed to find the one that matches the task type of the target task. The corresponding task module is the target task module for the target task.
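The traversal described in step 201 can be sketched as follows. This is an illustrative sketch only: the `TaskModule` structure, the `TYPE_NAMES` identifier table, and the vector values are hypothetical, since the application states only that task types have identifiers stored in a table.

```python
from typing import Dict, List, NamedTuple


class TaskModule(NamedTuple):
    type_id: int                 # identifier of the module's single task type
    prompt_vector: List[float]   # the module's own task prompt vector


# Hypothetical identifier table mapping type identifiers to task types.
TYPE_NAMES: Dict[int, str] = {0: "classification", 1: "parameter_extraction"}


def find_target_module(modules: List[TaskModule], target_type: str) -> TaskModule:
    """Traverse the task modules and return the one whose type matches."""
    for module in modules:
        if TYPE_NAMES[module.type_id] == target_type:
            return module
    raise KeyError(f"no task module for task type {target_type!r}")


modules = [
    TaskModule(type_id=0, prompt_vector=[0.1, 0.2]),
    TaskModule(type_id=1, prompt_vector=[0.3, 0.4]),
]
target = find_target_module(modules, "parameter_extraction")
```

Because task modules and task types correspond one to one, the first match is the only match. A dictionary keyed by task type would avoid the traversal, but the loop mirrors the wording of the description.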
[0063] 202. Determine the fused prompt vector based on the target task's corresponding text and the task prompt vector;
[0064] The fused prompt vector is determined based on the target task's corresponding text and the task prompt vector. The target task's corresponding text refers to the text input by the user describing the specific content of the target task. In one implementation, the target task's corresponding text can first be converted to obtain a corresponding target question vector. Then, the target question vector and the task prompt vector are fused to obtain the fused prompt vector corresponding to the target task. Specifically, the task analysis model has a preset conversion algorithm that can convert text into vectors, such as the BGE algorithm, Word2Vec, or BERT. The target task's corresponding text can be converted into a corresponding target question vector. The target question vector and the task prompt vector are then fused. Specifically, the target question vector can be concatenated or added to the task prompt vector of the target task module to obtain the corresponding fused prompt vector for the target task. It is understood that the operations between the target question vector and the task prompt vector can be set according to actual needs to obtain the required fused prompt vector; specific details are not limited here.
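The two fusion operations named in this implementation, concatenation and addition, can be sketched as below. The `toy_embed` function is a hypothetical stand-in for a real text encoder such as BGE, Word2Vec, or BERT, and placing the prompt in front of the question during concatenation is an assumption based on the description of the prompt as a prefix.

```python
from typing import List

Vector = List[float]


def toy_embed(text: str, dim: int = 4) -> Vector:
    """Hypothetical stand-in for a real text encoder; not a real embedding."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def fuse_concat(question: Vector, prompt: Vector) -> Vector:
    """Fusion by concatenation: the prompt is placed in front of the question."""
    return prompt + question


def fuse_add(question: Vector, prompt: Vector) -> Vector:
    """Fusion by element-wise addition; both vectors need the same length."""
    if len(question) != len(prompt):
        raise ValueError("addition requires vectors of equal length")
    return [q + p for q, p in zip(question, prompt)]


task_prompt = [0.5, -0.5, 0.25, 0.0]          # illustrative values
question = toy_embed("classify this alert")   # target question vector
fused_by_concat = fuse_concat(question, task_prompt)
fused_by_add = fuse_add(question, task_prompt)
```

Concatenation lengthens the input by the length of the prompt, while addition keeps the length of the question vector unchanged but requires matching dimensions.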
[0065] In another implementation, the target task's corresponding text and the task prompt vector can be concatenated or added together to obtain a fused prompt vector. Specifically, the target task's corresponding text and the task prompt vector can be directly concatenated or added together to obtain the fused prompt vector. The specific calculation method can also be set according to actual needs and is not limited here.
[0066] 203. Input the fused prompt vector into the language module to obtain the execution result of the target task.
[0067] The fused prompt vectors are input into the language module to obtain the execution result of the target task. The pre-trained language module is used to activate corresponding neural units based on the input of different task types, so that different task types have different computational paths. For example, the neural units involved in classification tasks and parameter extraction tasks are not exactly the same, and the output results will naturally be different.
[0068] In this embodiment, the target task module is first determined from multiple task modules based on the task information of the target task. Then, the corresponding task prompt vector is determined based on the target task module. Next, the fused prompt vector is determined based on the corresponding text of the target task and the task prompt vector. Finally, the fused prompt vector is input to the language module to obtain the execution result of the target task. Since the task prompt vector is generated using the commonalities of multiple tasks, the fused prompt vector also involves these commonalities. Vectors utilizing commonalities are shorter, which can, to some extent, avoid the problem of limiting user input length and provide convenience for users.
[0069] Before the task analysis model is applied, it needs to be trained. The model training method of this application is described below. Referring to Figure 3, one embodiment of the model training method of this application includes:
[0070] 301. Fuse the text of the sample task and the task prompt vector to determine the fused prompt vector;
[0071] The text of each sample task and the task prompt vector are fused to determine the fused prompt vector, where the task prompt vector is generated by utilizing the commonalities of multiple tasks. Specifically, an initialized task analysis model and multiple sample tasks are first obtained. Then, based on the task information of each sample task, the task prompt vector for each sample task is obtained. Finally, the text and task prompt vector of each sample task are fused to obtain the fused prompt vector.
[0072] 302. Train the task analysis model based on multiple fused prompt vectors to obtain a trained task analysis model.
[0073] Specifically, the fused prompt vectors can be processed first to obtain the corresponding predicted task processing results. Then, based on a preset loss function, the loss value between the actual task processing result label and the predicted task processing result for each task can be calculated. Next, the task analysis model can be updated using the loss function and a preset gradient function, combined with the commonalities of multiple tasks, until the loss value obtained from the fused prompt vectors meets the preset loss thresholds for each task. Finally, if the loss value of each task meets the preset loss thresholds for each task, the task analysis model training is considered complete, and the task prompt vector update is complete.
[0074] When processing the fused prompt vector to obtain the corresponding prediction task processing result, specifically for each sample task, the fused prompt vector can be processed to obtain the corresponding prediction task processing result.
[0075] When calculating the loss value between the real task processing result label and the predicted task processing result for each task according to the preset loss function, as an optional embodiment, the simplest loss function takes the difference between the real task processing result label and the predicted task processing result as the loss value; other forms may also be used and are not limited here.
[0076] The task analysis model is updated by using the loss function and the preset gradient function, combined with the commonalities of multiple tasks, until the loss value obtained from the fused prompt vector meets the preset loss threshold for each task. Specifically, it can be determined whether the loss value of each sample task is less than its corresponding preset loss threshold. If all of them are, the predicted results for every sample task are relatively accurate and no further update is needed. If not (i.e., at least one sample task has a loss value greater than or equal to its corresponding loss threshold), the predicted results for some sample tasks are inaccurate, and another round of updates is required so that the predicted results for all sample tasks become accurate.
[0077] If the loss value for each task meets its respective preset loss threshold, training of the task analysis model is considered complete, and the task prompt vector update is complete. Specifically, if all loss values are less than their respective loss thresholds, no further update of the task prompt vector is needed. Otherwise, the task prompt vector is updated and the check is performed again.
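The stopping condition described above, in which every task must fall below its own threshold, can be sketched as follows. The task names, loss values, and thresholds are hypothetical and serve only to illustrate the comparison.

```python
from typing import Dict


def training_complete(losses: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """True only when every task's loss is strictly below its own threshold."""
    return all(losses[task] < thresholds[task] for task in thresholds)


# Hypothetical per-task thresholds and per-round losses.
thresholds = {"classification": 0.10, "parameter_extraction": 0.20}
round_1 = {"classification": 0.08, "parameter_extraction": 0.20}  # one task still at its threshold
round_2 = {"classification": 0.07, "parameter_extraction": 0.15}  # every task below its threshold

needs_another_round = not training_complete(round_1, thresholds)
finished = training_complete(round_2, thresholds)
```

A loss equal to its threshold does not satisfy the condition, which matches the statement that a loss greater than or equal to the threshold triggers another round of updates.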
[0078] In this embodiment, the text of the sample task and the task prompt vector are first fused to determine the fused prompt vector. Then, the task analysis model is trained based on multiple fused prompt vectors to obtain a trained task analysis model. The task prompt vector is updated by using the commonalities of multiple tasks until multiple tasks meet the requirements, at which point the update of the task prompt vector stops, and the update is complete. This allows the task prompt vector to be updated using the commonalities of multiple types of tasks. The final task prompt vector is obtained by using the commonalities of multiple tasks, so in application, the prompt words are shorter, thereby improving task efficiency.
[0079] The model training method of this application is described in detail below. Referring to Figure 4, in another embodiment of the model training method of this application, the task analysis model includes at least multiple initialization task modules and a trained language module, wherein the outputs of the multiple task modules are the inputs of the language module. The method includes:
[0080] 401. The result of adding or concatenating the initialization prompt vector of each sample task with the preset vector is determined as the task prompt vector of each sample task;
[0081] The task prompt vector for each sample task is determined by adding or concatenating the initialization prompt vector with a preset vector. The preset vector is the vector obtained by operating on the initialization prompt vector and the initialization shared prompt vector. Specifically, an initialized task analysis model is first obtained. Then, multiple sample tasks of various types corresponding to the multiple initialization task modules are obtained. Finally, based on the task type of each sample task, the initialization prompt vector of each sample task and the initialization shared prompt vector are input into the corresponding initialization task module for fusion to obtain the task prompt vector for each sample task.
[0082] When obtaining the initial task analysis model, the initial task analysis model includes at least several initial task modules to be trained and a trained language module. The parameters of the language module have already been trained and do not need to be trained again. That is, during the training of the task analysis model, the parameters of the language module do not need to be trained; only the parameters of the task modules need to be trained. Specifically, the task analysis model can be obtained from the server or obtained through user input; the specific method is not limited here.
[0083] When acquiring multiple sample tasks of various types corresponding to multiple initialization task modules, each sample task has its own initialization prompt vector, task type, task text, and real model training result label. Furthermore, these multiple sample tasks also share an initialization prompt vector. Specifically, each initialization task module initializes a prompt vector for one task type. Each sample task has its own corresponding initialization prompt vector; that is, each sample task has a task module of the same task type, and the initialization prompt vector of that task module corresponds to that sample task. Task types include classification, parameter extraction, and pronoun resolution, etc., which are not specifically limited here. The task text is the input text describing the specific content of the task. The real model training result label is a relatively accurate result calculated manually, used for subsequent comparison and updating. Each task module also has an initialization shared prompt vector, which is identical across task modules and can be updated later. The initialization vector can be generated using algorithms such as Xavier or Kaiming, etc., which are not specifically limited here.
[0084] When the initialization prompt vector of each sample task and the initialization shared prompt vector are input into the corresponding initialization task module according to the task type of each task and fused, each sample task has its own task type, and each initialization task module also has its own task type. The initialization prompt vector of the task module corresponding to each sample task is fused with the initialization shared prompt vector to obtain the task prompt vector of that sample task.
[0085] Specifically, based on the task type of each task, the initialization prompt vector and the initialization shared prompt vector for each sample task are input to the corresponding initialization task module. The result of adding or concatenating the initialization prompt vector with a preset vector is determined as the task prompt vector for each sample task. The preset vector is the vector obtained by multiplying the initialization prompt vector by the initialization shared prompt vector. Simply put, in each task module, the initialization prompt vector and the initialization shared prompt vector are first multiplied to obtain the preset vector, and then the preset vector is added to or concatenated with the initialization prompt vector to obtain the task prompt vector. An example of addition is given below with reference to Figure 5. Assume there are only two task modules, with task types of classification and parameter extraction respectively, and that the two task modules share one initialization shared prompt vector. The initialization classification prompt vector is multiplied by the initialization shared prompt vector, and the initialization classification prompt vector is then added to the product to obtain the classification task prompt vector. Likewise, the initialization parameter extraction prompt vector is multiplied by the initialization shared prompt vector, and the initialization parameter extraction prompt vector is then added to the product to obtain the parameter extraction task prompt vector.
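The construction in step 401 can be sketched as below for the two-module example. The application says only that the two vectors are multiplied; treating this as an element-wise product is an assumption, and the function names and vector values are hypothetical.

```python
from typing import List

Vector = List[float]


def preset_vector(init_prompt: Vector, shared_prompt: Vector) -> Vector:
    """Element-wise product of a task's own prompt vector and the shared one."""
    if len(init_prompt) != len(shared_prompt):
        raise ValueError("vectors must have equal length")
    return [p * s for p, s in zip(init_prompt, shared_prompt)]


def prompt_by_addition(init_prompt: Vector, shared_prompt: Vector) -> Vector:
    """Addition variant: init + init * shared."""
    preset = preset_vector(init_prompt, shared_prompt)
    return [p + m for p, m in zip(init_prompt, preset)]


def prompt_by_concatenation(init_prompt: Vector, shared_prompt: Vector) -> Vector:
    """Concatenation variant: init followed by init * shared."""
    return init_prompt + preset_vector(init_prompt, shared_prompt)


shared = [0.5, 2.0]                 # one shared vector for both task modules
classification_init = [1.0, 3.0]
extraction_init = [4.0, -1.0]

classification_prompt = prompt_by_addition(classification_init, shared)
extraction_prompt = prompt_by_addition(extraction_init, shared)
classification_concat = prompt_by_concatenation(classification_init, shared)
```

Because the shared vector enters every task's result, an update to it during training affects all task modules at once, which is one way the commonalities of multiple tasks can be carried into each task's prompt.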
[0086] 402. Fuse the text of the sample task and the task prompt vector to determine the fused prompt vector;
[0087] Specifically, for each sample task, the task prompt vector obtained by fusing the initialization prompt vector and the initialization shared prompt vector is fused with the corresponding text to obtain the fused prompt vector. The task text of each sample task is first converted to obtain the corresponding training question vector; the specific algorithm for text-to-vector conversion is not limited here. Then, for each sample task, the task prompt vector is added to or concatenated with the corresponding training question vector, and the resulting fused prompt vector is input into the language module. Alternatively, the task text can be directly concatenated with or added to the task prompt vector to obtain the fused prompt vector.
[0088] 403. Train the task analysis model based on multiple fused prompt vectors to obtain a trained task analysis model.
[0089] Specifically, the fused prompt vectors are first input into the language module to obtain the predicted task processing result for each task. Then, according to the preset loss function, the loss value between the actual task processing result label and the predicted task processing result is calculated for each task. Next, the initial shared prompt vector and the initial prompt vectors of the multiple initial task modules are updated using the loss function and the preset gradient function, until the loss value obtained for each task, when its task text is input into the language module together with the fused prompt vector, meets its respective preset loss threshold. Finally, the final task prompt vector of each task module is determined.
[0090] When the fused prompt vectors are input into the language module to obtain the prediction model training results output by each language module, the fused prompt vectors are directly input into the language module. The language module calculates the prediction model training results corresponding to the input vectors based on the preset parameters. The task type of each prompt vector is different, and the part of the language module activated is also different.
[0091] When calculating the loss value between the true model training result label and the predicted model training result for each task according to the preset loss function, the simplest loss function can be the difference between the true model training result label and the predicted model training result, or other forms can be used; no specific limitation is made here. The loss value represents the difference between the true model training result label and the predicted model training result. The larger the loss value, the less accurate the predicted model training result; the smaller the loss value, the more accurate the predicted model training result.
[0092] When the initial shared prompt vector and the initial prompt vectors of the multiple initial task modules are updated using the loss function and the preset gradient function, until the loss value obtained for each sample task meets its respective preset loss threshold, the procedure is as follows. First, it is checked whether the loss value of each sample task is less than its corresponding preset loss threshold. If all are, the predicted result of each sample task is relatively accurate and no further update is needed. If not (i.e., the loss value of at least one sample task is greater than or equal to its corresponding loss threshold), the predicted result of some sample task is inaccurate, and a further round of updates is required so that the predicted results of all sample tasks become accurate. The update process is as follows: each task module corresponds to one loss function. First, the gradient of each loss function with respect to the shared prompt vector is calculated. Then all gradients are weighted by preset weights and summed to obtain the current total gradient. Using the gradient function, with the preset step size, the total gradient, and the current shared prompt vector as inputs, the updated shared prompt vector is calculated. The updated task prompt vector is then calculated from the current initial prompt vector and the updated shared prompt vector, fused with the training question vector, and input into the language module. This process is repeated until the loss values of all sample tasks are less than their respective loss thresholds, at which point the final task prompt vector is determined.
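The update loop of [0092] can be sketched as follows, under several loudly stated assumptions: the language module is replaced by a toy function that sums its input, each task uses a squared-error loss, the fusion is the element-wise addition variant, the question vector is omitted for brevity, and a `max_rounds` cap is added for safety. The update uses the usual descent sign, which corresponds to the form P' = P + γ * β given in [0095] with γ = -step. All names and values are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def fuse(initial: Vector, shared: Vector) -> Vector:
    """Addition variant: initial + initial * shared (element-wise)."""
    return [c + c * p for c, p in zip(initial, shared)]


def toy_language_module(fused: Vector) -> float:
    """Hypothetical stand-in for the language module: sums its input."""
    return sum(fused)


def loss_and_grad(initial: Vector, shared: Vector, target: float):
    """Squared-error loss and its gradient w.r.t. the shared vector.

    For the toy module, d(output)/d(shared[i]) = initial[i].
    """
    error = toy_language_module(fuse(initial, shared)) - target
    return error ** 2, [2.0 * error * c for c in initial]


def train_shared(initials: Dict[str, Vector], targets: Dict[str, float],
                 thresholds: Dict[str, float], weights: Dict[str, float],
                 shared: Vector, step: float = 0.1, max_rounds: int = 1000):
    """Update the shared prompt vector until every task loss is below
    its threshold, or max_rounds is reached (a safety cap, assumed)."""
    rounds = 0
    while True:
        losses, grads = {}, {}
        for task, initial in initials.items():
            losses[task], grads[task] = loss_and_grad(
                initial, shared, targets[task])
        done = all(losses[t] < thresholds[t] for t in initials)
        if done or rounds >= max_rounds:
            return shared, losses, rounds
        # Weighted sum of the per-task gradients gives the total gradient.
        total = [sum(weights[t] * grads[t][i] for t in initials)
                 for i in range(len(shared))]
        # Descent step; equals P' = P + gamma * beta with gamma = -step.
        shared = [p - step * g for p, g in zip(shared, total)]
        rounds += 1


initials = {"classification": [1.0, 2.0], "extraction": [0.5, -1.0]}
targets = {"classification": 4.5, "extraction": -0.75}
thresholds = {"classification": 1e-4, "extraction": 1e-4}
weights = {"classification": 0.5, "extraction": 0.5}

final_shared, final_losses, rounds = train_shared(
    initials, targets, thresholds, weights, shared=[0.0, 0.0])
# Task prompt vectors recomputed from the initial vectors and the final
# shared vector, following the reading of [0092] adopted here.
final_prompts = {t: fuse(c, final_shared) for t, c in initials.items()}
```

Only the shared vector is updated in this sketch; each task prompt vector is recomputed from its initial vector after every update, which is one possible reading of the paragraph above.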
[0093] In determining the final task prompt vector for each task module, in one implementation, if the loss value obtained by inputting the task text of each task into the language module meets the preset corresponding loss threshold, then the updated prompt vector for each task module is determined as the final task prompt vector for that task module. Specifically, if the loss value of each sample task is less than its corresponding loss threshold, it indicates that the training has reached the standard, and the previously updated prompt vector is determined as the final task prompt vector.
[0094] In another implementation, if the loss value obtained when inputting the task text of each task into the language module meets the preset corresponding loss threshold, then the prompt vector from before the last update of each task module is determined as the final task prompt vector for that task module. That is, if the loss value of each sample task is less than its corresponding loss threshold, training has reached the standard, and the prompt vector as of the penultimate update is determined as the final task prompt vector.
[0095] This embodiment takes the former implementation as an example. Assume the task analysis model has only two task modules (classification and parameter extraction), which are initialized separately. The classification task module has an initialized classification prompt vector C1, and the parameter extraction task module has an initialized parameter extraction prompt vector E1. The two task modules also share an initialized shared prompt vector P1. In one implementation, there are two sample tasks. The task text of the classification task is converted into a classification question vector Q1, and the task text of the parameter extraction task is converted into a parameter extraction question vector Q2. First, C1 and P1 are fused to obtain the fused prompt vector C2, and E1 and P1 are fused to obtain the fused prompt vector E2, for example C2 = C1 + C1 * P1 and E2 = E1 + E1 * P1. Then C2 and E2, fused with Q1 and Q2 respectively as described above, are input into the language module for processing to obtain two outputs, Y11 and Y21. Assume the true result of the classification task is Z1 and that of the parameter extraction task is Z2. The classification loss value S11 is obtained from Y11 and Z1 using the classification loss function f1, and the parameter extraction loss value S21 is obtained from Y21 and Z2 using the parameter extraction loss function f2. Assume the classification loss threshold is SP1 and the parameter extraction loss threshold is SP2. If S11 < SP1 and S21 < SP2, P1 is not updated; C2 is determined as the final prompt vector of the classification task module, and E2 as the final prompt vector of the parameter extraction task module. If S11 < SP1 and S21 < SP2 do not both hold, P1 is updated. In one implementation, the gradient β1 of f1 with respect to the shared prompt vector and the gradient β2 of f2 with respect to the shared prompt vector are first calculated, and β1 and β2 are then weighted by preset weights and summed to obtain β. Assuming the gradient function is P' = P + γ * β, where γ is a preset step size, then P2 = P1 + γ * β. P2 is then used to obtain new task prompt vectors, for example C3 = C2 + C2 * P2 and E3 = E2 + E2 * P2, which are input into the language module, and so on.
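To make the arithmetic of this example concrete, the following Python sketch evaluates one update round using the formulas exactly as written above. Every number is illustrative: the vectors, the gradients β1 and β2, the weights, and the step size γ are assumptions, and the multiplication is taken to be element-wise.

```python
from typing import List

Vector = List[float]


def fuse(task_vec: Vector, shared: Vector) -> Vector:
    """Addition variant from the text: v + v * P (element-wise, assumed)."""
    return [v + v * p for v, p in zip(task_vec, shared)]


# Illustrative values only; none of these numbers come from the source.
C1, E1, P1 = [1.0, 2.0], [0.5, -1.0], [0.0, 1.0]
C2, E2 = fuse(C1, P1), fuse(E1, P1)

# Assumed gradients of f1 and f2 with respect to the shared prompt vector.
beta1, beta2 = [-2.0, -4.0], [1.0, -2.0]
w1, w2 = 0.5, 0.5        # preset weights (assumed)
gamma = -0.25            # preset step size (assumed; negative to descend)

# Weighted sum of the two gradients.
beta = [w1 * b1 + w2 * b2 for b1, b2 in zip(beta1, beta2)]
# Gradient function from the text: P' = P + gamma * beta.
P2 = [p + gamma * b for p, b in zip(P1, beta)]
# Next-round task prompt vectors, as written in the text.
C3, E3 = fuse(C2, P2), fuse(E2, P2)
```

With these assumed inputs, β is [-0.5, -3.0] and P2 is [0.125, 1.75]. In a real system the gradients would come from differentiating the loss functions through the language module rather than being fixed constants.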
[0096] In this embodiment, the shared prompt vector is updated based on the commonalities of multiple tasks. The prompt vectors of tasks that have not yet been updated are then updated based on the shared prompt vector until multiple tasks meet the requirements. At this point, the shared prompt vector stops updating, and the prompt vectors for each task module are also updated. This allows for updating the shared prompt vector using the commonalities of multiple task types, and then updating the task prompt vectors based on the shared prompt vector. Ultimately, the final task prompt vector in each task module is obtained using the commonalities of multiple tasks. Therefore, in application, the prompts are shorter, thereby improving task efficiency.
[0097] The task processing method and model training method in the embodiments of this application have been described above. The following describes a task processing apparatus in the embodiments of this application. Referring to Figure 6, one embodiment of a task processing device in this application includes:
[0098] The determining unit 601 is used to determine the task prompt vector corresponding to the target task from a pre-trained task analysis model based on the task information of the target task, wherein the task prompt vector is a vector generated by utilizing the commonalities of multiple tasks;
[0099] The processing unit 602 is used to determine the fused prompt vector based on the target task corresponding text and the task prompt vector, so as to send the fused prompt vector into the task analysis model to obtain the execution result of the target task.
[0100] In this embodiment, the determining unit 601 first determines the task prompt vector corresponding to the target task from a pre-trained task analysis model based on the task information of the target task. The processing unit 602 then determines the fused prompt vector based on the corresponding text of the target task and the task prompt vector, and sends the fused prompt vector into the task analysis model to obtain the execution result of the target task. Since the task prompt vector is a vector generated using the commonalities of multiple tasks, the fused prompt vector also involves the commonalities of the tasks. Vectors utilizing commonalities are shorter, which can, to some extent, avoid the problem of limiting the length of user input and bring convenience to the user.
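The division of work between the determining unit and the processing unit can be sketched in Python as below. This is only an illustration: it assumes the task information is a task-type string, that fusion is concatenation with the prompt first, and that the language module can be stood in for by the built-in `sum`. The class `TaskAnalysisModel` and its method names are hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]


class TaskAnalysisModel:
    """Toy model: per-task prompt vectors plus one shared language module."""

    def __init__(self, task_prompts: Dict[str, Vector],
                 language_module: Callable[[Vector], float]) -> None:
        self.task_prompts = task_prompts
        self.language_module = language_module

    def determine_prompt(self, task_type: str) -> Vector:
        """Role of the determining unit: pick the trained task prompt vector."""
        if task_type not in self.task_prompts:
            raise KeyError(f"no task module for task type {task_type!r}")
        return self.task_prompts[task_type]

    def process(self, task_type: str, question: Vector) -> float:
        """Role of the processing unit: fuse, then run the language module."""
        fused = self.determine_prompt(task_type) + question  # concatenation
        return self.language_module(fused)


model = TaskAnalysisModel(
    task_prompts={"classification": [1.5, 3.0], "extraction": [0.75, -1.5]},
    language_module=sum,  # hypothetical stand-in for a real language model
)
result = model.process("classification", question=[0.25, 0.25])
```

Because each task type maps to a short trained vector rather than a long textual prompt, the fused input grows only by the length of that vector.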
[0101] The task processing apparatus of this application is described in detail below. Another embodiment of the task processing apparatus of this application includes:
[0102] The determining unit is used to determine the task prompt vector corresponding to the target task from a pre-trained task analysis model based on the task information of the target task, wherein the task prompt vector is a vector generated by utilizing the commonalities of multiple tasks;
[0103] The processing unit is configured to determine the fused prompt vector based on the target task corresponding text and the task prompt vector, and to send the fused prompt vector into the task analysis model to obtain the execution result of the target task.
[0104] The task analysis model includes multiple task modules and a language module; each task module has its own task prompt vector; the determining unit is specifically used for:
[0105] Based on the task information of the target task, a target task module is determined from the plurality of task modules, and a task prompt vector corresponding to the target task is determined based on the target task module;
[0106] Accordingly, the processing unit is specifically used for:
[0107] The fused prompt vector is input into the language module to obtain the execution result of the target task.
[0108] The processing unit is specifically used for:
[0109] The text corresponding to the target task is transformed to obtain the corresponding target question vector;
[0110] The target question vector and the task prompt vector are fused to obtain the fused prompt vector corresponding to the target task.
[0111] The processing unit is specifically used for:
[0112] The target task corresponding text and the task prompt vector are concatenated or added together to obtain the fused prompt vector.
[0113] The functions and processes performed by each unit in the task processing device of this embodiment are similar to those performed by the task processing device in the embodiments shown in Figures 1 to 2, and will not be described in detail here.
[0114] Figure 7 is a schematic structural diagram of a task processing device provided in an embodiment of this application. The task processing device 700 may include one or more central processing units (CPUs) 701 and a memory 705, in which one or more applications or data are stored.
[0115] The memory 705 can be volatile or persistent storage. The program stored in the memory 705 can include one or more modules, each module including a series of instruction operations on the task processing device 700. Furthermore, the central processing unit 701 can be configured to communicate with the memory 705 and execute the series of instruction operations in the memory 705 on the task processing device 700.
[0116] The task processing device 700 may also include one or more power supplies 702, one or more wired or wireless network interfaces 703, one or more input / output interfaces 704, and / or one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
[0117] The central processing unit 701 can perform the operations performed by the task processing device in the embodiments shown in Figures 1 to 2, which will not be described in detail here.
[0118] The task processing apparatus of this application has been described above; the model training apparatus of this application will be described below. Referring to Figure 8, one embodiment of the model training apparatus of this application includes:
[0119] The fusion unit 801 is used to fuse the text of the sample task and the task prompt vector to determine the fused prompt vector; wherein the task prompt vector is a vector generated by utilizing the commonalities of multiple tasks.
[0120] Training unit 802 is used to train the task analysis model based on multiple fused prompt vectors to obtain a trained task analysis model.
[0121] In this embodiment, the fusion unit 801 first fuses the text of the sample task and the task prompt vector to determine the fused prompt vector. The training unit 802 then trains the task analysis model based on multiple fused prompt vectors to obtain a trained task analysis model. The shared prompt vector is updated by utilizing the commonalities of multiple tasks, and the task prompt vectors that have not yet been updated are updated based on the shared prompt vector until multiple tasks meet the requirements, at which point the shared prompt vector stops updating, and the task prompt vector for each task module is also updated. This allows for updating the shared prompt vector using the commonalities of multiple task types, and then updating the task prompt vector based on the shared prompt vector. Ultimately, the final task prompt vector in each task module is obtained using the commonalities of multiple tasks, resulting in shorter prompts and improved task efficiency.
[0122] The model training apparatus of this application will now be described in detail. Another embodiment of the model training apparatus of this application includes:
[0123] The fusion unit is used to fuse the text of the sample task and the task prompt vector to determine the fused prompt vector; wherein the task prompt vector is a vector generated by utilizing the commonalities of multiple tasks;
[0124] The training unit is used to train the task analysis model based on multiple fused prompt vectors to obtain a trained task analysis model.
[0125] The model training device also includes a computing unit, specifically used for:
[0126] The result of adding or concatenating the initialization prompt vector of each sample task with a preset vector is determined as the task prompt vector of each sample task. The preset vector is the vector obtained by operating on the initialization prompt vector and the initialization shared prompt vector.
[0127] The functions and processes performed by each unit in the model training device of this embodiment are similar to those performed by the model training device in the embodiments shown in Figures 3 to 4, and will not be described in detail here.
[0128] Figure 9 is a schematic structural diagram of a model training device provided in an embodiment of this application. The model training device 900 may include one or more central processing units (CPUs) 901 and a memory 909, in which one or more applications or data are stored.
[0129] The memory 909 can be volatile or persistent storage. The program stored in the memory 909 can include one or more modules, each module including a series of instruction operations on the model training device 900. Furthermore, the central processing unit 901 can be configured to communicate with the memory 909 and execute the series of instruction operations in the memory 909 on the model training device 900.
[0130] The model training device 900 may also include one or more power supplies 902, one or more wired or wireless network interfaces 903, one or more input / output interfaces 904, and / or one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
[0131] The central processing unit 901 can perform the operations performed by the model training device in the embodiments shown in Figures 3 to 4, which will not be described here.
[0132] This application also provides a computer-readable storage medium including instructions that, when executed on a computer, cause the computer to perform the methods described in the foregoing embodiments.
[0133] This application also provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the methods described in the foregoing embodiments.
[0134] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
[0135] It should be noted that although the steps in the flowcharts of the various embodiments are drawn sequentially according to the arrows, unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the various embodiments may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages in other steps.
[0136] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection between apparatuses or units through some interfaces, and may be electrical, mechanical, or other forms.
[0137] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0138] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0139] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
Claims
1. A task processing method, characterized by comprising: determining, based on the task information of a target task, the task prompt vector corresponding to the target task from a pre-trained task analysis model, wherein the task prompt vector is a vector generated by utilizing the commonalities of multiple tasks; and determining a fused prompt vector based on the text corresponding to the target task and the task prompt vector, and feeding the fused prompt vector into the task analysis model to obtain the execution result of the target task.
2. The task processing method according to claim 1, characterized in that the task analysis model includes multiple task modules and a language module, each task module having its own task prompt vector; determining the task prompt vector corresponding to the target task from the pre-trained task analysis model based on the task information of the target task includes: determining a target task module from the multiple task modules based on the task information of the target task, and determining the task prompt vector corresponding to the target task based on the target task module; accordingly, feeding the fused prompt vector into the task analysis model to obtain the execution result of the target task includes: inputting the fused prompt vector into the language module to obtain the execution result of the target task.
3. The task processing method according to claim 1, characterized in that determining the fused prompt vector based on the text corresponding to the target task and the task prompt vector includes: converting the text corresponding to the target task to obtain a corresponding target question vector; and fusing the target question vector and the task prompt vector to obtain the fused prompt vector corresponding to the target task.
4. The task processing method according to claim 1, characterized in that determining the fused prompt vector based on the text corresponding to the target task and the task prompt vector includes: concatenating or adding the text corresponding to the target task and the task prompt vector to obtain the fused prompt vector.
5. A model training method, characterized by comprising: fusing the text of a sample task and a task prompt vector to determine a fused prompt vector, wherein the task prompt vector is a vector generated by utilizing the commonalities of multiple tasks; and training a task analysis model based on multiple fused prompt vectors to obtain a trained task analysis model.
6. The model training method according to claim 5, characterized in that each sample task has its own corresponding initialization prompt vector, and multiple sample tasks also share an initialization shared prompt vector; before fusing the text of the sample task and the task prompt vector to determine the fused prompt vector, the method further includes: determining, as the task prompt vector of each sample task, the result of adding or concatenating the initialization prompt vector of that sample task with a preset vector, wherein the preset vector is the vector obtained by operating on the initialization prompt vector and the initialization shared prompt vector.
7. A task processing apparatus, characterized by comprising: a determining unit, configured to determine the task prompt vector corresponding to a target task from a pre-trained task analysis model based on the task information of the target task, wherein the task prompt vector is a vector generated by utilizing the commonalities of multiple tasks; and a processing unit, configured to determine a fused prompt vector based on the text corresponding to the target task and the task prompt vector, and to send the fused prompt vector into the task analysis model to obtain the execution result of the target task.
8. A task processing device, characterized by comprising: a central processing unit, a memory, and an input / output interface; wherein the memory is a short-term storage memory or a persistent storage memory; and the central processing unit is configured to communicate with the memory and execute instructions in the memory to perform the method according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized by including instructions that, when executed on a computer, cause the computer to perform the method according to any one of claims 1 to 6.
10. A computer program product comprising instructions that, when run on a computer, cause the computer to perform the method as described in any one of claims 1 to 6.