A model training method, a task execution time prediction method and device

By training a prediction model using resource consumption data from similar historical tasks, the problems of high intrusiveness and unreasonable resource allocation in existing technologies are solved. This achieves efficient and accurate prediction of model training task execution time, improving the utilization and efficiency of training resources.

CN119623548B · Active · Publication Date: 2026-06-19 · ZHEJIANG LAB

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHEJIANG LAB
Filing Date
2024-11-01
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies for predicting the execution time of model training tasks are highly intrusive, affect training efficiency and security, and lack rational resource allocation, making it difficult to predict execution time efficiently and accurately.

Method used

Historical resource consumption data is acquired for historical model training tasks, similar historical tasks are selected from a database, and both are used to train a prediction model. The model determines resource consumption feature data and predicts the execution time from it, using self-attention and cross-attention processing combined with fully connected layers and linear regression.

Benefits of technology

It enables safe, reliable, and efficient prediction of model training task execution time, improves training resource utilization, avoids resource waste, and enhances overall training efficiency.

✦ Generated by Eureka AI based on patent content.

Figure CN119623548B_ABST

Abstract

This specification discloses a model training method, a task execution time prediction method, and an apparatus, specifically including: filtering similar historical model training tasks based on historical resource usage data of historical model training tasks; inputting the historical resource usage data of historical model training tasks and similar historical model training tasks into a prediction model to determine the resource usage characteristic data corresponding to the historical model training tasks, thereby determining the predicted execution time corresponding to the historical model training tasks; training based on the predicted execution time and the execution time of historical tasks; and determining the predicted execution time of the target model training task based on the resource usage data of the target model training task. The method in this specification has higher prediction efficiency and greater accuracy. This effectively improves the utilization rate of training resources during subsequent resource allocation, avoiding resource waste and idleness, and greatly improving the overall training efficiency of the training process.

Description

Technical Field

[0001] This specification relates to the field of model training, and in particular to a model training method, a task execution time prediction method and apparatus.

Background Technology

[0002] With the continuous development and progress of science and technology, the importance of intelligent models in the world today is self-evident. Before a model is officially deployed, it needs to undergo an extremely rigorous training process to ensure its actual working ability. How to reasonably control the resource costs spent during the training process and the overall allocation of resources are even more important issues.

[0003] By accurately predicting and controlling the execution time of model training tasks, the overall efficiency of resource utilization can be greatly improved, reducing resource waste and downtime. Currently, most methods for predicting the execution time of model training tasks rely on task awareness. This approach often requires modifications to the model's code during training to obtain training data in real time. For example, adding interfaces to the code can collect training data such as the model's internal structure, number of parameters, and batch size. This highly intrusive data collection method can easily affect the model's training progress and performance, and may even lead to data leakage security issues due to the additional interface code. Furthermore, this method has limitations and may not be suitable for all types of model structures. In summary, predicting the execution time of training tasks through task awareness significantly impacts the overall training efficiency and security, extending the execution time of model training tasks and making the allocation and use of training resources less rational.

[0004] Therefore, how to predict the execution time of model training tasks efficiently without affecting the model training process, and thus rationally allocate training resources, is a very important issue.

Summary of the Invention

[0005] This specification provides a model training method, a task execution time prediction method and apparatus, to partially solve the aforementioned problems existing in the prior art.

[0006] The following technical solution is adopted in this specification:

[0007] This specification provides a model training method, including:

[0008] Obtain historical resource consumption data and historical task execution time corresponding to historical model training tasks;

[0009] Based on the historical resource usage data, similar historical model training tasks corresponding to the historical model training tasks are selected from the preset database, and historical resource usage data corresponding to the similar historical model training tasks are obtained.

[0010] The historical resource consumption data corresponding to the historical model training task and the historical resource consumption data corresponding to the similar historical model training task are input into the prediction model to be trained, so that the prediction model determines the resource consumption feature data corresponding to the historical model training task based on both sets of historical resource consumption data, and determines the predicted execution time corresponding to the historical model training task based on the resource consumption feature data.

[0011] Based on the predicted execution time and the historical task execution time, a loss value is determined for the prediction model, and the prediction model is trained based on the loss value, wherein the magnitude of the loss value is negatively correlated with the similarity between the predicted execution time and the historical task execution time.
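The specification only requires that the loss shrink as the predicted execution time approaches the historical task execution time; it does not name a loss function. As a minimal sketch, assuming mean squared error is an acceptable choice (the function name `mse_loss` and the use of seconds as the unit are illustrative assumptions, not taken from the specification):

```python
def mse_loss(predicted, actual):
    """Mean squared error between predicted and actual execution times.

    Assumption: MSE is used only as one example of a loss that decreases
    as the prediction gets closer to the recorded execution time.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared_errors) / len(squared_errors)


# A closer prediction yields a smaller loss value.
close = mse_loss([3590.0], [3600.0])
far = mse_loss([3000.0], [3600.0])
```

Other regression losses, such as mean absolute error, satisfy the same negative correlation and could be substituted.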

[0012] Optionally, before filtering similar historical model training tasks corresponding to the historical model training tasks from a preset database based on the historical resource usage data, and obtaining the historical resource usage data corresponding to the similar historical model training tasks, the method further includes:

[0013] The historical resource usage data corresponding to the historical model training task is preprocessed to obtain the preprocessed historical resource usage data corresponding to the historical model training task. The data preprocessing includes at least outlier removal, data noise reduction, and missing data filling for the historical resource usage data corresponding to the historical model training task.

[0014] Based on the historical resource usage data, similar historical model training tasks corresponding to the historical model training tasks are selected from a preset database, and historical resource usage data corresponding to the similar historical model training tasks is obtained, specifically including:

[0015] Based on the preprocessed historical resource usage data corresponding to the historical model training task, similar historical model training tasks corresponding to the historical model training task are selected from the database, and historical resource usage data corresponding to the similar historical model training tasks are obtained.

[0016] Optionally, based on the historical resource usage data, similar historical model training tasks corresponding to the historical model training tasks are selected from a preset database, specifically including:

[0017] For each candidate historical model training task in the database, the similarity between the historical resource usage data corresponding to the candidate historical model training task and the historical resource usage data corresponding to the historical model training task is determined based on the historical resource usage data corresponding to the historical model training task.

[0018] When the similarity is greater than a preset similarity threshold, the candidate historical model training task is taken as the similar historical model training task corresponding to the historical model training task.

[0019] Optionally, the historical resource usage data includes resource usage data for different resource types corresponding to the historical model training task, and the resource types include at least CPU utilization, GPU utilization, memory usage, and disk throughput.

[0020] For each candidate historical model training task in the database, the similarity between the historical resource usage data corresponding to the candidate historical model training task and the historical resource usage data corresponding to the historical model training task is determined based on the historical resource usage data corresponding to the historical model training task. Specifically, this includes:

[0021] For each type of resource usage data in the historical resource usage data corresponding to the candidate historical model training task, determine the similarity between the resource usage data of the candidate historical model training task and the resource usage data of the same type of resource usage data corresponding to the historical model training task.

[0022] The similarity of resource usage data of each resource type corresponding to the candidate historical model training task is weighted to determine the comprehensive similarity corresponding to the candidate historical model training task, and this comprehensive similarity is used as the similarity between the historical resource usage data corresponding to the candidate historical model training task and the historical resource usage data corresponding to the historical model training task.
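The weighting and threshold steps described above can be sketched as follows. This is an illustrative example only: the resource-type keys, the weight values, the normalization by total weight, and the function names `comprehensive_similarity` and `select_similar` are assumptions made for the sketch, and the per-type similarities are taken as already computed.

```python
def comprehensive_similarity(per_type_similarity, weights):
    """Weighted combination of per-resource-type similarities.

    Assumption: the weighted sum is normalized by the total weight so the
    result stays on the same scale as the individual similarities.
    """
    total_weight = sum(weights[r] for r in per_type_similarity)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    weighted = sum(weights[r] * s for r, s in per_type_similarity.items())
    return weighted / total_weight


def select_similar(candidates, weights, threshold):
    """Return the ids of candidate tasks whose comprehensive similarity
    is greater than the preset similarity threshold."""
    return [
        task_id
        for task_id, sims in candidates.items()
        if comprehensive_similarity(sims, weights) > threshold
    ]


# Hypothetical weights and candidates, for illustration only.
weights = {"cpu": 1.0, "gpu": 1.0, "memory": 1.0, "disk": 1.0}
candidates = {
    "task_a": {"cpu": 0.9, "gpu": 0.9, "memory": 0.9, "disk": 0.9},
    "task_b": {"cpu": 0.1, "gpu": 0.1, "memory": 0.1, "disk": 0.1},
}
similar = select_similar(candidates, weights, threshold=0.5)
```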

[0023] Optionally, the historical resource usage data is time-series data composed of resource usage data of the historical model training task at various time points collected at preset time intervals;

[0024] The historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training tasks are input into the prediction model to be trained, so that the prediction model can determine the resource usage feature data corresponding to the historical model training task based on the historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training tasks, specifically including:

[0025] The historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training task are input into the prediction model to be trained. This allows the prediction model to determine the data similarity between the resource usage data corresponding to each time point in each historical resource usage data and the resource usage data corresponding to other time points in the historical resource usage data. Based on the data similarity, the attention weight corresponding to that time point is determined. Based on the attention weight corresponding to each time point, the historical resource usage data is subjected to self-attention weighted processing to obtain the self-attention feature sequence corresponding to the historical resource usage data.

[0026] Based on the self-attention feature sequence corresponding to the historical model training task and the self-attention feature sequence corresponding to the similar historical model training task, the resource consumption feature data corresponding to the historical model training task is determined.
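The self-attention weighting over time points can be illustrated with a small sketch. It assumes scaled dot-product similarity followed by a softmax, treats each time point as a vector of resource readings, includes the time point itself among the compared points, and omits the learned query, key, and value projections a trained model would normally have. None of these choices is stated in the specification.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_attention(sequence):
    """Return a self-attention feature sequence of the same shape.

    sequence: list of time points, each a list of resource readings.
    Assumption: similarity is a scaled dot product, with no learned weights.
    """
    dim = len(sequence[0])
    scale = math.sqrt(dim)
    output = []
    for query in sequence:
        # Attention weights of this time point over all time points.
        weights = softmax([dot(query, key) / scale for key in sequence])
        # Weighted sum of all time points, feature by feature.
        output.append(
            [sum(w * value[k] for w, value in zip(weights, sequence))
             for k in range(dim)]
        )
    return output


features = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```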

[0027] Optionally, based on the self-attention feature sequence corresponding to the historical model training task and the self-attention feature sequence corresponding to the similar historical model training task, the resource consumption feature data corresponding to the historical model training task is determined, specifically including:

[0028] For the self-attention feature data at each time point in the self-attention feature sequence corresponding to the historical model training task, the feature similarity is determined between that feature data and the self-attention feature data at the time point occupying the same position in the self-attention feature sequence corresponding to the similar historical model training task, and the cross-attention weight corresponding to the self-attention feature data at that time point is determined based on the feature similarity.

[0029] Based on the cross-attention weights corresponding to the self-attention feature data at each time point in the self-attention feature sequence corresponding to the historical model training task, cross-attention weighting processing is performed on the self-attention feature sequence corresponding to the historical model training task and the self-attention feature sequence corresponding to the similar historical model training task to perform feature fusion. The fused feature data of the historical model training task and the similar historical model training task is used as the resource consumption feature data corresponding to the historical model training task.
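A heavily simplified sketch of this fusion step is given below. It is not the specification's cross-attention layer: it assumes cosine similarity as the feature similarity, maps it to a weight in [0, 1], and fuses the two sequences position by position with a weighted average. The function names and the fusion formula are assumptions chosen to keep the example short, and sequences of unequal length are truncated to the shorter one.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def fuse(task_seq, similar_seq):
    """Position-wise similarity-weighted fusion of two feature sequences.

    Assumption: the cross-attention weight at each position is the cosine
    similarity rescaled from [-1, 1] to [0, 1].
    """
    fused = []
    for a, b in zip(task_seq, similar_seq):
        weight = (1.0 + cosine(a, b)) / 2.0
        # The more similar the aligned features, the more the similar
        # task contributes to the fused feature.
        fused.append([(x + weight * y) / (1.0 + weight) for x, y in zip(a, b)])
    return fused


fused = fuse([[1.0, 0.0]], [[0.0, 1.0]])
```

A learned cross-attention layer, with queries from the task sequence and keys and values from the similar task sequence, would be the more conventional realization in a trained model.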

[0030] This specification provides a method for predicting task execution time, including:

[0031] Monitor and collect resource usage data corresponding to the prediction time after the target model training task starts execution;

[0032] Based on the resource usage data corresponding to the target model training task, similar historical model training tasks corresponding to the target model training task are selected from a preset database, and historical resource usage data corresponding to the similar historical model training tasks are obtained.

[0033] The resource usage data corresponding to the target model training task and the historical resource usage data corresponding to the similar historical model training tasks are input into the prediction model, so that the prediction model determines the target feature data corresponding to the target model training task based on the resource usage data corresponding to the target model training task and the historical resource usage data corresponding to the similar historical model training tasks, and determines the predicted task execution time corresponding to the target model training task based on the target feature data. The prediction model is obtained through the model training method described above.

[0034] This specification provides a model training apparatus, including:

[0035] The acquisition module is used to acquire historical resource usage data and historical task execution time corresponding to historical model training tasks;

[0036] The filtering module is used to filter similar historical model training tasks corresponding to the historical model training tasks from a preset database based on the historical resource usage data, and to obtain the historical resource usage data corresponding to the similar historical model training tasks.

[0037] The prediction module is used to input the historical resource consumption data corresponding to the historical model training task and the historical resource consumption data corresponding to the similar historical model training task into the prediction model to be trained, so that the prediction model determines the resource consumption feature data corresponding to the historical model training task based on both sets of historical resource consumption data, and determines the predicted execution time corresponding to the historical model training task based on the resource consumption feature data.

[0038] The training module is used to determine the loss value for the prediction model based on the predicted execution time and the historical task execution time, and to train the prediction model based on the loss value, wherein the magnitude of the loss value is negatively correlated with the similarity between the predicted execution time and the historical task execution time.

[0039] This specification provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described model training method and task execution time prediction method.

[0040] This specification provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the above-described model training method and task execution time prediction method.

[0041] The above-mentioned technical solutions adopted in this specification can achieve the following beneficial effects:

[0042] As can be seen from the above method, in the model training method and task execution time prediction method provided in this specification, similar historical model training tasks can be selected from the database based on the historical resource consumption data corresponding to the acquired historical model training tasks. Then, the historical resource consumption data corresponding to the historical model training tasks and similar historical model training tasks are input into the prediction model to be trained, enabling the prediction model to determine the resource consumption feature data corresponding to the historical model training tasks. Based on the resource consumption feature data, the predicted execution time corresponding to the historical model training tasks is then determined. The prediction model is trained based on the predicted execution time and the historical task execution time. The trained prediction model can then determine the predicted execution time of the target model training task based on the resource consumption data of the target model training task within a preset time.

[0043] As can be seen from the above, the model training method and task execution time prediction method provided in this specification can match similar historical model training tasks from the database based on the historical resource usage data corresponding to historical model training tasks. Then, the prediction model is trained based on the corresponding historical resource usage data. The trained prediction model can accurately predict the execution time required for the target model training task from start to finish based on the resource usage data corresponding to the target model training task within a preset time. Compared with the task-aware methods in the prior art for predicting training task execution time, the method proposed in this specification is safer, more reliable, more efficient, and provides more accurate prediction results. It provides an accurate basis for the subsequent reasonable allocation of training resources, thereby effectively improving the utilization rate of training resources, avoiding resource waste and idleness, and greatly improving the overall training efficiency, thus accelerating the execution of subsequent tasks or services based on the trained model.

Attached Figure Description

[0044] The accompanying drawings, which are included to provide a further understanding of this specification and form part of this specification, illustrate exemplary embodiments and are used to explain this specification, but do not constitute an undue limitation thereof. In the drawings:

[0045] Figure 1 is a flowchart of a model training method provided in this specification;

[0046] Figure 2 is a schematic diagram of an example model structure of a prediction model provided in this specification;

[0047] Figure 3 is a flowchart of a task execution time prediction method provided in this specification;

[0048] Figure 4 is a schematic diagram of a model training apparatus provided in this specification;

[0049] Figure 5 is a schematic diagram of a task execution time prediction apparatus provided in this specification;

[0050] Figure 6 is a schematic structural diagram of an electronic device, provided in this specification, corresponding to Figure 1 and Figure 3.

Detailed Implementation

[0051] To make the objectives, technical solutions, and advantages of this specification clearer, the technical solutions of this specification will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this specification, and not all of them. Based on the embodiments in this specification, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this specification.

[0052] The technical solutions provided in the various embodiments of this specification are described in detail below with reference to the accompanying drawings.

[0053] Figure 1 is a flowchart of a model training method provided in this specification, which includes the following steps:

[0054] S101: Obtain historical resource usage data and historical task execution time corresponding to historical model training tasks.

[0055] S102: Based on the historical resource usage data, filter out similar historical model training tasks corresponding to the historical model training tasks from the preset database, and obtain the historical resource usage data corresponding to the similar historical model training tasks.

[0056] Currently, technologies related to intelligent AI models have increasingly become the mainstream development direction in various fields of modern science and technology, and the corresponding training process is of paramount importance in ensuring the final model's capabilities. By reasonably predicting the execution time of model training tasks, the rational allocation and use of training resources can be effectively controlled, thereby effectively improving model training efficiency and saving training costs.

[0057] Currently, most methods for predicting the execution time of model training tasks rely on task-aware technologies. This approach requires acquiring internal model data during the training task to predict execution time. This method is somewhat intrusive and may impact data security and training efficiency. Furthermore, the need to collect internal model data can limit or alter the overall model structure, leading to an inefficient allocation of training resources and time costs. Therefore, finding a way to efficiently, accurately, and securely predict the execution time of model training tasks is crucial.

[0058] Therefore, this specification provides a model training method and a task execution time prediction method. The execution subject of the methods provided can be a terminal device such as a desktop computer or laptop computer, or a server. In addition, the execution subject can also be software, such as a client installed on a terminal device. For ease of explanation and understanding, this specification will only use a terminal device as the execution subject to describe the provided model training method and task execution time prediction method.

[0059] Based on this, a terminal device applying the model training method and task execution time prediction method provided in this specification can determine the corresponding similar historical model training tasks from the database based on the historical resource usage data corresponding to the acquired historical model training tasks. Then, the terminal device can train and optimize the prediction model to be trained based on the historical resource usage data corresponding to the historical model training tasks and similar historical model training tasks, respectively, so as to obtain a prediction model capable of predicting the execution time of the model training task based on the resource usage data.

[0060] Through the above training process, the terminal device can use the trained prediction model to predict the total time spent on the target model training task from start to finish, based on the resource usage data within a preset time after the target model training task begins execution.

[0061] The actual application scenarios corresponding to the trained prediction models can be determined based on specific application needs. For example, in scenarios where training resources are allocated rationally, the terminal device can determine the prediction execution time for the target model training task based on the resource usage data within a preset time period corresponding to the target model training task and historical model training tasks in the database. Technical personnel responsible for resource allocation can then rationally adjust the resources used and training time of the target model training task based on the determined prediction execution time, and pre-plan the training process for subsequent model training tasks. This effectively improves the utilization of training resources, greatly ensuring their efficient use and avoiding resource idleness or waste.

[0062] For example, in the design of automated execution for model training tasks, the terminal device can use a trained prediction model to predict the execution time required for the successful completion of the target model training task, based on resource usage data within the preset duration of the target model training task and historical model training tasks recorded in the database. Then, based on the predicted execution time corresponding to each model training task, the actual training process of the model training task can be rationally allocated resources and divided into training time segments, ensuring the reasonable implementation of the overall automated model training process, thereby effectively improving the execution efficiency and resource utilization of the overall model training process.

[0063] In this specification, the terminal device can obtain historical resource usage data and historical task execution time corresponding to historical model training tasks from a preset database. Then, based on the historical resource usage data of the historical model training tasks, the terminal device can filter similar historical model training tasks corresponding to the historical model training tasks from the database, and can also obtain the historical resource usage data corresponding to those similar historical model training tasks.

[0064] The historical resource usage data can include resource usage data for different resource types during the historical execution of the corresponding historical model training task. Resource types can specifically include, at a minimum, CPU utilization, GPU utilization, memory usage, and disk throughput. The specific data format for each resource type can be time-series data, with the time series length equal to the historical task execution duration. The data content can be resource usage data collected at different time points according to preset time intervals. The terminal device can use the time-series resource usage data for each resource type corresponding to the historical model training task as the historical resource usage data for that historical model training task.
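One way to picture this data layout is a record holding one time series per resource type, all sampled at the same interval. The sketch below is illustrative: the class name `ResourceUsageRecord`, its field names, and the resource-type keys are assumptions, not structures defined in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceUsageRecord:
    """Hypothetical container for one task's resource usage time series."""
    task_id: str
    interval_seconds: float  # preset sampling interval
    series: Dict[str, List[float]] = field(default_factory=dict)

    def num_samples(self) -> int:
        """Number of time points; all resource types must agree."""
        lengths = {len(values) for values in self.series.values()}
        if len(lengths) > 1:
            raise ValueError("resource series have different lengths")
        return lengths.pop() if lengths else 0

    def duration_seconds(self) -> float:
        """Approximate span covered by the samples."""
        return self.num_samples() * self.interval_seconds


record = ResourceUsageRecord(
    task_id="task_001",
    interval_seconds=5.0,
    series={
        "cpu": [0.42, 0.55, 0.61, 0.58],
        "gpu": [0.80, 0.92, 0.95, 0.93],
        "memory": [0.30, 0.31, 0.33, 0.33],
        "disk": [120.0, 118.0, 95.0, 97.0],
    },
)
```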

[0065] Regarding how the terminal device specifically selects similar historical model training tasks from the database based on the historical resource usage data corresponding to the historical model training tasks, this specification states that for each candidate historical model training task recorded in the preset database, the terminal device can determine the similarity between the historical resource usage data corresponding to the candidate historical model training task and the historical resource usage data corresponding to the historical model training task.

[0066] Then, the terminal device can determine whether the candidate historical model training task is a similar historical model training task corresponding to the historical model training task based on the relationship between the similarity value determined above and the preset similarity threshold.

[0067] Specifically, when determining whether each candidate historical model training task in the database is a similar historical model training task, the terminal device can, for each resource type in the historical resource usage data corresponding to the candidate historical model training task, determine the similarity between the candidate task's time-series data for that resource type and the historical model training task's time-series data for the same resource type.

[0068] Then, the terminal device can perform weighted summation of the similarity of resource usage data of each resource type corresponding to the candidate historical model training task, thereby determining the comprehensive similarity of the candidate historical model training task relative to the historical model training task, which is used as the similarity between the historical resource usage data corresponding to the candidate historical model training task and the historical resource usage data corresponding to the historical model training task.

[0069] The specific implementation method for determining the similarity of resource usage data of the same resource type between the candidate historical model training task and the historical model training task is not strictly limited in this specification. It can be implemented by using algorithms such as Dynamic Time Warping (DTW).

[0070] Specifically, the terminal device can calculate the DTW distance between time series data corresponding to resource occupancy data of the same resource type based on the dynamic time warping algorithm. The smaller the DTW distance, the more similar the two time series data are. By pre-setting corresponding threshold conditions for the DTW distance, the terminal device can determine the similarity between the resource occupancy data of the same resource type in the candidate historical model training task and the historical model training task through the dynamic time warping algorithm, and thus determine the similar historical model training tasks corresponding to the historical model training task. In addition, other methods for calculating time series similarity can also be used to determine similar historical model training tasks, such as the Fast Dynamic Time Warping (FastDTW) algorithm and the Cross-Correlation Function, which can be flexibly selected and set according to the actual application scenario and application requirements.
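For reference, the classic dynamic-programming form of the DTW distance between two one-dimensional series can be sketched as below. The absolute difference as the local cost and the conversion `1 / (1 + distance)` from distance to similarity are assumptions made for this example; the specification only states that a smaller DTW distance means greater similarity.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def dtw_similarity(a, b):
    """Map a DTW distance to (0, 1]; smaller distance gives higher similarity.

    Assumption: this particular mapping is illustrative only.
    """
    return 1.0 / (1.0 + dtw_distance(a, b))


# The second series repeats one sample, which warping absorbs.
d = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```

This version runs in O(n·m) time and memory, which is the cost that approximations such as FastDTW aim to reduce for long series.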

[0071] It should be noted that, to ensure the safety and stability of the subsequent training process and the accuracy and effectiveness of the historical resource usage data corresponding to the historical model training tasks, in this specification, before the terminal device filters out similar historical model training tasks from the database based on the historical resource usage data, it can also perform data preprocessing on the historical resource usage data corresponding to the historical model training tasks. The terminal device can use the preprocessed historical resource usage data corresponding to the historical model training tasks for the subsequent process of determining similar historical model training tasks from the database. Furthermore, after determining the similar historical model training tasks corresponding to the historical model training tasks, the terminal device can also perform the same preprocessing on the historical resource usage data corresponding to the obtained similar historical model training tasks, thereby ensuring the accuracy and effectiveness of the subsequent training process.

[0072] The specific data preprocessing methods for historical resource occupancy data described in this specification may include, but are not limited to, outlier removal, noise reduction, and missing data imputation. The specific preprocessing methods for historical resource occupancy data are not strictly limited in this specification and can be flexibly set according to actual application scenarios and needs.
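As one possible illustration of these three preprocessing steps, the sketch below applies forward filling for missing values, median-based outlier replacement, and a moving average for noise reduction to a single resource usage series. The specific techniques, function names, and default parameters are assumptions; the specification leaves the choice of method open.

```python
import statistics


def fill_missing(series):
    """Replace None entries with the nearest earlier value (forward fill);
    leading gaps take the first observed value."""
    observed = [x for x in series if x is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled, last = [], observed[0]
    for x in series:
        if x is not None:
            last = x
        filled.append(last)
    return filled


def replace_outliers(series, k=3.0):
    """Replace points farther than k median absolute deviations (MAD)
    from the median with the median itself."""
    med = statistics.median(series)
    mad = statistics.median([abs(x - med) for x in series])
    if mad == 0:
        return list(series)
    return [x if abs(x - med) <= k * mad else med for x in series]


def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

Applying the same functions, with the same parameters, to both the historical task and its similar tasks keeps the two inputs of the prediction model on a comparable footing.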

[0073] S103: Input the historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training task into the prediction model to be trained, so that the prediction model determines the resource usage feature data corresponding to the historical model training task based on the historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training task, and determines the prediction execution time corresponding to the historical model training task based on the resource usage feature data.

[0074] In this specification, the terminal device can use the historical resource usage data corresponding to a historical model training task, together with the historical resource usage data corresponding to its similar historical model training tasks, as the training dataset for that historical model training task, and input it into the prediction model to be trained. This allows the prediction model to determine the resource usage feature data corresponding to the historical model training task based on these data. The prediction model can then determine the predicted execution time of the historical model training task based on the resource usage feature data.

[0075] Specifically, as mentioned earlier, historical resource usage data can be time-series data composed of the resource usage data of the corresponding model training task at various time points, collected at preset time intervals. The terminal device can input the historical resource usage data, in the form of time-series data, corresponding to the historical model training task and the similar historical model training tasks into the prediction model to be trained, so that the prediction model can determine, for each time point in each historical resource usage data, the data similarity between the resource usage data at that time point and the resource usage data at the other time points in the same historical resource usage data.

[0076] Then, the terminal device can use a prediction model to determine the attention weights corresponding to the resource usage data at other time points based on the data similarity between the resource usage data at that time point and the resource usage data at other time points in the historical resource usage data. The prediction model can then perform self-attention weighting on the resource usage data at that time point based on the attention weights and the corresponding historical resource usage data, obtaining the self-attention feature data for that time point. Therefore, based on the self-attention feature data corresponding to each time point in the historical resource usage data, the self-attention feature sequence corresponding to that historical resource usage data is determined.
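The self-attention weighting described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the scaled dot product is used as the data similarity, softmax turns similarities into attention weights, and the raw resource usage vectors serve directly as queries, keys, and values with no learned projections. A trained prediction model would normally learn such projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def self_attention(sequence):
    """sequence: list of T vectors, one per time point (for example
    [cpu, gpu, memory, disk]). Returns (features, weights), where
    features[t] is the self-attention feature data for time point t."""
    dim = len(sequence[0])
    scale = math.sqrt(dim)
    features, all_weights = [], []
    for query in sequence:
        # Similarity between this time point and every time point.
        scores = [dot(query, key) / scale for key in sequence]
        weights = softmax(scores)
        # Weighted sum of the usage vectors, one value per dimension.
        feature = [sum(w * value[d] for w, value in zip(weights, sequence))
                   for d in range(dim)]
        features.append(feature)
        all_weights.append(weights)
    return features, all_weights
```

The list `features` plays the role of the self-attention feature sequence: it has one entry per time point and the same dimensionality as the input.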

[0077] Using the above method, the terminal device can determine the self-attention feature sequences corresponding to historical model training tasks and similar historical model training tasks through the prediction model to be trained. Then, the terminal device can perform cross-attention processing on the self-attention feature sequences corresponding to historical model training tasks and similar historical model training tasks through the prediction model to obtain the resource consumption feature data corresponding to the historical model training tasks.

[0078] Specifically, the terminal device can use the prediction model to determine the feature similarity between the self-attention feature data at each time point in the self-attention feature sequence corresponding to the historical model training task and the self-attention feature sequence corresponding to similar historical model training tasks, based on the self-attention feature sequence corresponding to the historical model training task and the self-attention feature data at time points in the same time order in the self-attention feature sequence corresponding to similar historical model training tasks.

[0079] Then, the prediction model can determine the cross-attention weights corresponding to the self-attention feature data at that time point. Based on these cross-attention weights and the self-attention feature data of similar historical model training tasks arranged in the same time order, the model performs cross-attention weighting on the self-attention feature data at that time point to obtain the cross-attention feature data corresponding to that time point. Subsequently, the terminal device can use the prediction model to determine the cross-attention feature data corresponding to each time point in the self-attention feature sequence of the historical model training task. This allows for feature fusion of the historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to similar historical model training tasks to obtain the resource usage feature data corresponding to the historical model training task.
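A rough sketch of this cross-attention step is shown below. It assumes the standard formulation in which each time point of the historical task's self-attention feature sequence acts as a query, and the similar task's sequence supplies keys and values; the specification's exact pairing of time points may differ, and learned projections are again omitted for brevity. The function name `cross_attention` is hypothetical.

```python
import math


def cross_attention(task_seq, similar_seq):
    """Fuse a task's self-attention feature sequence with that of one
    similar task. Both arguments are lists of equal-dimension vectors;
    the two sequences may have different lengths."""
    dim = len(task_seq[0])
    scale = math.sqrt(dim)
    fused = []
    for query in task_seq:
        # Feature similarity between this time point and each time point
        # of the similar task.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in similar_seq]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # cross-attention weights
        fused.append([sum(w * v[d] for w, v in zip(weights, similar_seq))
                      for d in range(dim)])
    return fused
```

When several similar tasks are available, one simple option (also an assumption) is to run this function once per similar task and average the resulting sequences.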

[0080] After determining the resource usage feature data of the historical model training task, the terminal device can use the prediction model to be trained to determine the predicted execution time of the historical model training task based on that resource usage feature data.

[0081] Regarding the actual processing of the predicted execution time, specifically, in this specification, the terminal device can input the resource consumption feature data corresponding to the historical model training task into the preset fully connected layer in the prediction model. This allows the fully connected layer to perform linear transformation processing on the resource consumption feature data based on the preset linear regression function and the resource consumption feature data corresponding to the historical model training task determined through the above steps, thereby obtaining the predicted execution time corresponding to the historical model training task.
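As a minimal sketch of this final step, the code below averages the per-time-point feature vectors into one vector and applies a single linear transformation. The averaging step, the function names, and the numeric weights and bias are assumptions for illustration; in practice the weights and bias would be learned during training.

```python
def mean_pool(sequence):
    """Average a list of T feature vectors into a single vector."""
    steps = len(sequence)
    dim = len(sequence[0])
    return [sum(v[d] for v in sequence) / steps for d in range(dim)]


def linear_head(features, weights, bias):
    """Fully connected layer with one output: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Illustrative values only.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
predicted_time = linear_head(pooled, weights=[10.0, 20.0], bias=5.0)
```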

[0082] In addition to the above, other methods can be used to predict the execution time based on resource consumption characteristics data corresponding to the historical model training task, such as decision tree regression analysis, recurrent neural networks (RNNs), etc. This specification does not strictly limit the specific implementation method and process of determining the predicted execution time based on the resource consumption characteristics data corresponding to the historical model training task; it can be flexibly set according to the actual application scenario and requirements.

[0083] It should also be noted that this specification does not strictly limit the specific model category and structure of the prediction model mentioned above. It can be, for example, a Transformer model. In a Transformer-based prediction model, the historical resource usage data corresponding to the historical model training task and the similar historical model training tasks first undergoes a dimension transformation, and positional encoding is then applied to each time point. Self-attention and cross-attention processing similar to the methods described above are then performed, and the resulting resource usage feature data corresponding to the historical model training task is normalized and averaged. Finally, the preset linear regression transformation in the fully connected layer yields the predicted execution time corresponding to the historical model training task.
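The specification does not say which positional encoding is used. As one common choice, the sketch below computes the sinusoidal encoding from the original Transformer design and adds it to each time point's feature vector; treating this as the encoding used here is an assumption, and the function names are hypothetical.

```python
import math


def positional_encoding(num_steps, dim):
    """Sinusoidal table: even indices use sine, odd indices use cosine,
    with wavelengths forming a geometric progression."""
    table = []
    for pos in range(num_steps):
        row = []
        for d in range(dim):
            angle = pos / (10000 ** ((2 * (d // 2)) / dim))
            row.append(math.sin(angle) if d % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_position(sequence):
    """Add the positional encoding to a list of T feature vectors."""
    table = positional_encoding(len(sequence), len(sequence[0]))
    return [[x + p for x, p in zip(vec, pe)]
            for vec, pe in zip(sequence, table)]
```

Without some positional signal, attention treats the time points as an unordered set, so the order of resource usage samples would be lost.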

[0084] In addition, in this specification, the prediction model can also be a deep learning model based on a multi-layer neural network. For the specific process by which a prediction model with this structure determines the predicted execution time corresponding to the historical model training task, refer to Figure 2.

[0085] Figure 2 is a schematic diagram of an example model structure corresponding to one of the prediction models provided in this specification.

[0086] As shown in Figure 2, the terminal device can input the training dataset corresponding to the historical model training task, that is, the historical resource usage data corresponding to the historical model training task and the similar historical model training tasks, into the prediction model with a multi-layer neural network structure. The prediction model can extract and learn the resource usage feature data corresponding to the historical model training task and at least one similar historical model training task in the training dataset, and, after extraction and fusion by the multi-layer neural network, finally determine the predicted execution time corresponding to the historical model training task.

[0087] In addition to the implementation methods of the Transformer model and the examples of deep learning models based on multilayer neural networks mentioned above, the prediction model in this specification can also be a multilayer perceptron (MLP) model. The terminal device can input historical resource usage data corresponding to historical model training tasks and similar historical model training tasks into the prediction model, which has a multilayer perceptron model structure. This allows the prediction model to extract the dependency relationship between resource usage data and time at each time point in the historical resource usage data through the hidden layer structure within the model, thereby predicting the prediction execution time corresponding to the historical model training task. Furthermore, the prediction model in this specification can also be, for example, a recurrent neural network (RNN), a random forest (RF), or a decision tree (DT) model. The specific model structure can be selected and set according to the actual application scenario and requirements.

[0088] S104: Determine the loss value for the prediction model based on the predicted execution time and the historical task execution time, and train the prediction model based on the loss value, wherein the magnitude of the loss value is negatively correlated with the similarity between the predicted execution time and the historical task execution time.

[0089] In this specification, the terminal device can determine the loss value for the prediction model based on the prediction execution time determined by the prediction model to be trained and the historical task execution times corresponding to the previously acquired historical model training tasks. Then, based on the determined loss value, the terminal device can train and optimize the prediction model to be trained. The magnitude of the determined loss value is negatively correlated with the similarity between the prediction execution time and the historical task execution time; the closer the prediction execution time and the historical task execution time are, the smaller the loss value, and vice versa.
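The specification only requires that the loss shrink as the predicted and actual execution times get closer. Mean squared error is one loss with that property; the sketch below uses it, together with plain stochastic gradient descent on a single linear layer, purely as an assumed example. The function names, learning rate, and toy data are all hypothetical.

```python
def mse_loss(predicted, actual):
    """Mean squared error between predicted and actual execution times."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def predict(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear(samples, labels, lr=0.01, epochs=200):
    """Fit w and b by per-sample gradient descent on the squared error."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict(w, b, x) - y
            # Gradient of err**2 is 2*err*x for w and 2*err for b.
            w = [wi - lr * 2 * err * xi for wi, xi in zip(w, x)]
            b -= lr * 2 * err
    return w, b


# Toy data: one feature per task, label is the execution time.
samples = [[1.0], [2.0], [3.0]]
labels = [2.0, 4.0, 6.0]
initial_loss = mse_loss([0.0, 0.0, 0.0], labels)
w, b = train_linear(samples, labels)
final_loss = mse_loss([predict(w, b, x) for x in samples], labels)
```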

[0090] Regarding the training process for the prediction model, it should be noted that, in this specification, the execution time of historical model training tasks can be used as training annotation labels for the corresponding training datasets of those historical model training tasks, thereby guiding the subsequent training process for the prediction model. Furthermore, training a prediction model often requires a large amount of training data to optimize the model parameters. When multiple historical model training tasks exist, each historical model training task can also become a member of the training datasets corresponding to other historical model training tasks.

[0091] For example, suppose the terminal device determines the training dataset A corresponding to the historical model training task A through the above steps. The training dataset A contains the historical resource usage data corresponding to the historical model training task A and its similar historical model training tasks. The training dataset B corresponding to the historical model training task B contains the historical resource usage data corresponding to the historical model training task B and its similar historical model training tasks.

[0092] If the historical resource usage data corresponding to historical model training task A and the historical resource usage data corresponding to historical model training task B have a high degree of similarity, the terminal device can include historical model training task B as a member of the similar historical model training tasks of historical model training task A in the training dataset A during the matching process of the above similar historical model training tasks. At the same time, the terminal device can also include historical model training task A as a member of the similar historical model training tasks of historical model training task B in the training dataset B.

[0093] However, the training dataset A corresponding to historical model training task A has a training label that corresponds to the historical task execution time 'a', while the training dataset B corresponding to historical model training task B has a training label that corresponds to the historical task execution time 'b'. During subsequent training, when the terminal device inputs training dataset A into the prediction model to be trained, the prediction model can predict only the execution time of historical model training task A and determine the loss value based on the historical task execution time 'a'. This allows the model to learn the subtle feature differences between historical model training tasks A and B. Similarly, for training dataset B, the prediction model can predict only the execution time of historical model training task B and determine the loss value based on the historical task execution time 'b'.
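The relationship between training datasets A and B can be illustrated with the following sketch. Each task gets its own dataset whose label is its own execution time, while other sufficiently similar tasks appear only as context. The record layout, the function name `build_training_datasets`, the toy similarity measure, and all numeric values are assumptions for illustration.

```python
def build_training_datasets(tasks, similarity, threshold):
    """tasks: dict mapping task name -> {"usage": [...], "exec_time": float}.
    similarity: callable taking two usage series and returning a float.
    Returns one training dataset per task."""
    datasets = {}
    for name, task in tasks.items():
        similar = [
            other for other, candidate in tasks.items()
            if other != name
            and similarity(task["usage"], candidate["usage"]) > threshold
        ]
        datasets[name] = {
            "target": name,
            "similar": similar,          # context only, never the label
            "label": task["exec_time"],  # the target task's own time
        }
    return datasets


def toy_similarity(u, v):
    """Placeholder similarity based on the difference of the means."""
    return 1.0 / (1.0 + abs(sum(u) / len(u) - sum(v) / len(v)))


tasks = {
    "A": {"usage": [0.5, 0.6], "exec_time": 3600.0},
    "B": {"usage": [0.5, 0.7], "exec_time": 4000.0},
    "C": {"usage": [0.1, 0.1], "exec_time": 900.0},
}
datasets = build_training_datasets(tasks, toy_similarity, threshold=0.9)
```

Here tasks A and B appear in each other's datasets, yet dataset A is labeled with A's execution time and dataset B with B's.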

[0094] In summary, although different historical model training tasks may appear among each other's similar historical model training tasks, the training process for a given training dataset still targets only the historical model training task to which that dataset corresponds. This training method enables the trained prediction model to distinguish between historical model training tasks whose historical resource usage data is highly similar. In practical applications, this allows the execution time of each target model training task to be predicted more accurately and efficiently, supporting the rational allocation and utilization of overall training resources.

[0095] The method described in this specification is mainly divided into two stages: the model training stage and the practical application stage. The model training stage, as described above, is primarily used to obtain a predictive model that, after training, has the ability to predict the execution time of the model training task. Then, in the practical application stage, the terminal device can use the trained predictive model to predict the execution time of the target model training task, thereby rationally allocating overall training resources and planning subsequent model training tasks based on the predicted execution time of the target model training task.

[0096] To facilitate explanation, the task execution time prediction method is described below with reference to the flowchart shown in Figure 3.

[0097] Figure 3 is a flowchart illustrating a task execution time prediction method provided in this specification, including the following steps:

[0098] S301: Monitor and collect resource usage data corresponding to the prediction time after the target model training task starts execution.

[0099] S302: Based on the resource usage data corresponding to the target model training task, filter out similar historical model training tasks corresponding to the target model training task from a preset database, and obtain the historical resource usage data corresponding to the similar historical model training tasks.

[0100] With the continuous advancement of science and technology, intelligent models have been widely adopted in many fields. However, the resource cost and time required to train such models are extremely high, so it is crucial to allocate training resources rationally and make efficient use of limited resources. Currently, the mainstream methods for predicting the execution time of model training tasks mostly rely on task-aware technology. These methods require in-depth data mining of the model during the training process, which may affect the model's training performance and reduce overall training efficiency. Therefore, how to predict the execution time of model training tasks efficiently, accurately, and safely, and thus rationally allocate training resources and formulate training plans, is an extremely important issue.

[0101] Therefore, this specification provides a method for predicting task execution time. The execution subject of this method can be a terminal device such as a desktop computer or laptop computer, or a server. Alternatively, the execution subject can be software, such as a client installed on a terminal device. For ease of explanation and understanding, this specification will only use a terminal device as the execution subject to describe the provided task execution time prediction method.

[0102] Based on this, a terminal device using the task execution time prediction method provided in this specification can predict the execution time of the target model training task from start to finish based on the resource usage data corresponding to the target model training task within a preset time period. This allows the terminal device or relevant developers to reasonably allocate training resources and plan subsequent model training tasks based on the predicted execution time of the target model training task.

[0103] The specific application scenarios for predicting the execution time of a target model training task on the terminal device can be determined according to actual needs. For example, in scenarios where limited resources need to be rationally allocated, the terminal device using the method in this specification can accurately predict the predicted execution time that the target model training task may take to complete based on the resource usage data corresponding to the target model training task within a preset time period. This allows relevant developers to accurately control the allocation and use of limited resources based on the predicted execution time of the target model training task, thereby effectively avoiding resource waste and idleness, improving resource utilization efficiency, and greatly improving the overall training efficiency of the model training task.

[0104] For example, in scenarios involving energy constraints and management during model training, terminal devices using the methods described in this specification can accurately determine the estimated execution time required for the target model training task from start to successful completion based on resource usage data within a preset time period. This allows relevant developers and resource management personnel to rationally allocate and limit resources based on the amount of resources and time consumed by the target model to complete the training task. This effectively ensures that resources are not wasted while maximizing the fulfillment of the resource requirements of the target model training task.

[0105] In this specification, the terminal device can continuously monitor the resource usage data of the target model training task after it begins execution, and collect the resource usage data corresponding to the prediction duration of the target model training task after execution begins. Then, based on the resource usage data corresponding to the target model training task within the prediction duration, the terminal device can filter and determine similar historical model training tasks corresponding to the target model training task from a preset database, and simultaneously obtain the historical resource usage data corresponding to the similar historical model training tasks.
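The monitoring step can be pictured with the short sketch below, which samples resource usage at a fixed interval for the prediction duration. The callable `read_usage` is hypothetical and stands in for whatever monitoring interface is actually available; the specification does not name one.

```python
import time


def collect_usage(read_usage, prediction_duration, interval, sleep=time.sleep):
    """Sample resource usage at a fixed interval for a fixed duration.

    read_usage is a hypothetical callable returning one sample, for
    example a dict with CPU, GPU, memory, and disk readings.
    sleep is injectable so the loop can be exercised without waiting.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    num_samples = int(prediction_duration // interval)
    samples = []
    for _ in range(num_samples):
        samples.append(read_usage())
        sleep(interval)
    return samples
```

The returned list is the time series that is then compared against the historical records in the database.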

[0106] The specific process for determining the similar historical model training tasks is basically the same as the process, described in the model training method above, for determining the similar historical model training tasks corresponding to a historical model training task, and is not repeated here. The data format and content of the resource usage data corresponding to the target model training task within the prediction duration, and of the historical resource usage data corresponding to the similar historical model training tasks, are likewise basically consistent with those described in the model training method above.

[0107] S303: Input the resource usage data corresponding to the target model training task and the historical resource usage data corresponding to the similar historical model training task into the prediction model, so that the prediction model determines the target feature data corresponding to the target model training task based on the resource usage data corresponding to the target model training task and the historical resource usage data corresponding to the similar historical model training task, and determines the predicted task execution time corresponding to the target model training task based on the target feature data.

[0108] In this specification, the terminal device can input the resource usage data corresponding to the target model training task within a preset time period and the historical resource usage data corresponding to similar historical model training tasks into the prediction model. This allows the prediction model to determine the target feature data corresponding to the target model training task based on the resource usage data corresponding to the target model training task and the historical resource usage data corresponding to similar historical model training tasks. Subsequently, the terminal device can use the prediction model to determine the predicted task execution time corresponding to the target model training task based on that target feature data.
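Steps S302 and S303 can be summarized in the following sketch. Both `similarity` and `model` are hypothetical callables supplied by the caller, and comparing the observed data only against the matching prefix of each historical series is an assumption of this sketch rather than something the specification states.

```python
def predict_execution_time(target_usage, database, similarity, threshold, model):
    """Filter similar historical tasks, then ask the model for a prediction.

    database: list of dicts, each with a "usage" time series for one
    historical model training task.
    """
    prefix_len = len(target_usage)
    similar_usage = [
        record["usage"]
        for record in database
        # Compare only the part of each historical series that overlaps
        # the observed prefix (an assumption of this sketch).
        if similarity(target_usage, record["usage"][:prefix_len]) > threshold
    ]
    return model(target_usage, similar_usage)
```

Note that the full historical series, not just the prefix, is passed to the model, so the model can draw on how similar tasks behaved after the observed period.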

[0109] The resource usage data corresponding to the target model training task covers only a preset time period, so it is smaller in size than the historical resource usage data corresponding to similar historical model training tasks. This small size does not significantly affect the final prediction results. In this specification, the prediction model trained using the aforementioned model training method can effectively capture the mapping relationship between resource usage data and actual execution time. Therefore, even when execution time is predicted from the small amount of data corresponding to the target model training task, the feature data from multiple similar historical model training tasks can supplement the missing data content and corresponding feature data, helping to ensure the accuracy of the final prediction results.

[0110] As can be seen from the above, the model training method and task execution time prediction method provided in this specification can match similar historical model training tasks from the database based on the historical resource consumption data corresponding to historical model training tasks. Then, the prediction model to be trained is trained based on the historical resource consumption data corresponding to the historical model training tasks and similar historical model training tasks. The trained prediction model can accurately predict the execution time required for the target model training task from start to finish based on the resource consumption data corresponding to the target model training task within a preset time period.

[0111] Compared to current task-aware methods for predicting training task execution time, the method described in this specification is safer, more reliable, more efficient, and provides more accurate predictions. This allows both terminal devices and developers to have a reliable basis for allocating training resources. It effectively improves training resource utilization, avoids waste and delays, and significantly enhances the overall training efficiency, thereby accelerating the execution of subsequent tasks or business processes based on the trained model.

[0112] The above describes one or more implementations of the methods described in this specification. Based on the same approach, this specification also provides a corresponding model training device, as shown in Figure 4.

[0113] Figure 4 is a schematic diagram of a model training device provided in this specification, including:

[0114] The acquisition module 401 is used to acquire historical resource usage data and historical task execution time corresponding to historical model training tasks;

[0115] The filtering module 402 is used to filter similar historical model training tasks corresponding to the historical model training tasks from a preset database based on the historical resource usage data, and to obtain the historical resource usage data corresponding to the similar historical model training tasks.

[0116] The prediction module 403 is used to input the historical resource consumption data corresponding to the historical model training task and the historical resource consumption data corresponding to the similar historical model training task into the prediction model to be trained, so that the prediction model determines the resource consumption feature data corresponding to the historical model training task based on the historical resource consumption data corresponding to the historical model training task and the historical resource consumption data corresponding to the similar historical model training task, and determines the prediction execution time corresponding to the historical model training task based on the resource consumption feature data.

[0117] The training module 404 is used to determine the loss value for the prediction model based on the predicted execution time and the historical task execution time, and to train the prediction model based on the loss value, wherein the magnitude of the loss value is negatively correlated with the similarity between the predicted execution time and the historical task execution time.

[0118] Optionally, the acquisition module 401 is specifically used to perform data preprocessing on the historical resource usage data corresponding to the historical model training task to obtain the preprocessed historical resource usage data corresponding to the historical model training task. The data preprocessing includes at least outlier deletion processing, data noise reduction processing, and missing data filling processing on the historical resource usage data corresponding to the historical model training task.

[0119] The filtering module 402 is specifically used to filter out similar historical model training tasks corresponding to the historical model training tasks from the database based on the preprocessed historical resource usage data corresponding to the historical model training tasks, and obtain the historical resource usage data corresponding to the similar historical model training tasks.

[0120] Optionally, the filtering module 402 is specifically used to, for each candidate historical model training task in the database, determine the similarity between the historical resource usage data corresponding to the candidate historical model training task and the historical resource usage data corresponding to the historical model training task; when the similarity is greater than a preset similarity threshold, the candidate historical model training task is regarded as a similar historical model training task corresponding to the historical model training task.

[0121] Optionally, the historical resource usage data includes resource usage data for different resource types corresponding to the historical model training task, and the resource types include at least CPU utilization, GPU utilization, memory usage, and disk throughput.

[0122] The filtering module 402 is specifically used to: for each type of resource usage data in the historical resource usage data corresponding to the candidate historical model training task, determine the similarity between the resource usage data of the candidate historical model training task and the resource usage data of the same type of resource usage data corresponding to the historical model training task; perform weighted processing on the similarity of the resource usage data of each resource type corresponding to the candidate historical model training task to determine the comprehensive similarity corresponding to the candidate historical model training task, and use it as the similarity between the historical resource usage data of the candidate historical model training task and the historical resource usage data corresponding to the historical model training task.

[0123] Optionally, the historical resource usage data is time-series data composed of resource usage data of the historical model training task at various time points collected at preset time intervals;

[0124] The prediction module 403 is specifically used to input the historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training tasks into the prediction model to be trained, so that the prediction model determines the data similarity between the resource usage data corresponding to each time point in each historical resource usage data and the resource usage data corresponding to other time points in the historical resource usage data, and determines the attention weight corresponding to the time point based on the data similarity, performs self-attention weighted processing on the historical resource usage data based on the attention weights corresponding to each time point, and obtains the self-attention feature sequence corresponding to the historical resource usage data; and determines the resource usage feature data corresponding to the historical model training task based on the self-attention feature sequence corresponding to the historical model training task and the self-attention feature sequence corresponding to the similar historical model training tasks.

[0125] Optionally, the prediction module 403 is specifically configured to: determine the feature similarity between the self-attention feature data at each time point in the self-attention feature data of ...

[0126] Based on the same idea, this specification also provides a corresponding task execution time prediction device, as shown in Figure 5.

[0127] Figure 5 is a schematic diagram of a task execution time prediction device provided in this specification, including:

[0128] The acquisition module 501 is used to monitor and collect the resource usage data corresponding to the prediction time after the target model training task starts execution;

[0129] The filtering module 502 is used to filter similar historical model training tasks corresponding to the target model training task from a preset database based on the resource usage data corresponding to the target model training task, and to obtain the historical resource usage data corresponding to the similar historical model training tasks.

[0130] The prediction module 503 is used to input the resource usage data corresponding to the target model training task and the historical resource usage data corresponding to the similar historical model training tasks into the prediction model, so that the prediction model determines the target feature data corresponding to the target model training task based on the resource usage data corresponding to the target model training task and the historical resource usage data corresponding to the similar historical model training tasks, and determines the predicted task execution time corresponding to the target model training task based on the target feature data, wherein the prediction model is obtained through the model training method described above.
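
As a non-limiting illustration, the cooperation of the filtering and prediction modules might be sketched as follows. All names are hypothetical. The similarity function and the "model" are toy stand-ins: the stand-in model simply averages the durations of the similar tasks (sample count multiplied by an assumed sampling interval) and is not the trained prediction model of this specification.

```python
SAMPLING_INTERVAL_SECONDS = 60.0  # assumed collection interval

def select_similar_tasks(target_usage, database, similarity_fn, threshold):
    """Return stored usage series whose similarity to the target exceeds the threshold."""
    return [record for record in database
            if similarity_fn(target_usage, record) > threshold]

def predict_execution_time(target_usage, database, similarity_fn, threshold, model):
    """Filter similar tasks, then hand both inputs to the prediction model."""
    similar = select_similar_tasks(target_usage, database, similarity_fn, threshold)
    return model(target_usage, similar)

def toy_similarity(a, b):
    """1 / (1 + mean absolute difference) over the overlapping prefix."""
    n = min(len(a), len(b))
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a[:n], b[:n])) / n
    return 1.0 / (1.0 + mean_abs_diff)

def baseline_model(target_usage, similar_usages):
    """Stand-in for the trained model: mean duration of the similar tasks."""
    if not similar_usages:
        return None
    durations = [len(u) * SAMPLING_INTERVAL_SECONDS for u in similar_usages]
    return sum(durations) / len(durations)

# Hypothetical database of completed tasks (one utilization series per task).
database = [
    [0.2, 0.4, 0.6, 0.8, 0.8, 0.8],
    [0.9, 0.9, 0.9],
    [0.2, 0.4, 0.6, 0.7],
]
# Usage collected so far from the running target task.
target = [0.2, 0.4, 0.6]
predicted_seconds = predict_execution_time(
    target, database, toy_similarity, 0.9, baseline_model)
```

Passing the similarity function and the model as arguments keeps the filtering logic independent of how similarity is measured and of which model is plugged in.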

[0131] This specification also provides a computer-readable storage medium storing a computer program that can be used to execute the model training method provided in Figure 1 and the task execution time prediction method provided in Figure 3.

[0132] This specification also provides a schematic structural diagram, shown in Figure 6, of an electronic device corresponding to Figure 1 and Figure 3. As shown in Figure 6, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may also include other hardware required for business operations. The processor reads the corresponding computer program from the non-volatile storage into memory and then runs it to implement the model training method described in Figure 1 and the task execution time prediction method described in Figure 3.

[0133] In the 1990s, improvements to a technology could be clearly distinguished as either hardware improvements (e.g., improvements to the circuit structure of diodes, transistors, switches, etc.) or software improvements (improvements to the methodology). However, with technological advancements, many methodological improvements today can be considered direct improvements to the hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved methodology into the hardware circuit. Therefore, it cannot be said that a methodological improvement cannot be implemented using hardware physical modules. For example, a Programmable Logic Device (PLD) (such as a Field Programmable Gate Array (FPGA)) is such an integrated circuit whose logic function is determined by the user programming the device. Designers can program and "integrate" a digital system onto a PLD themselves, without needing chip manufacturers to design and manufacture dedicated integrated circuit chips. Furthermore, nowadays, instead of manually manufacturing integrated circuit chips, this programming is mostly implemented using "logic compiler" software. Similar to the software compiler used in program development, the original code before compilation must be written in a specific programming language, called a Hardware Description Language (HDL). There are many HDLs, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language). Currently, the most commonly used are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should understand that by simply performing some logic programming on the method flow using one of these hardware description languages and programming it into an integrated circuit, the hardware circuit implementing the logical method flow can be easily obtained.

[0134] The controller can be implemented in any suitable manner. For example, it can take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, and embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art will also recognize that, in addition to implementing the controller in purely computer-readable program code form, the same functionality can be achieved by logically programming the method steps to make the controller take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, such a controller can be considered a hardware component, and the means included therein for implementing various functions can also be considered as structures within the hardware component. Alternatively, the means for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.

[0135] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.

[0136] For ease of description, the above devices are described in terms of function, divided into various units. Of course, in implementing this specification, the functions of each unit can be implemented in one or more pieces of software and/or hardware.

[0137] Those skilled in the art will understand that embodiments of this specification can be provided as methods, systems, or computer program products. Therefore, this specification may take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this specification may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0138] This specification is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this specification. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0139] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0140] These computer program instructions may also be loaded onto a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0141] In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

[0142] Memory may include forms of computer-readable media such as volatile memory, for example random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory. Memory is an example of computer-readable media.

[0143] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0144] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0145] Those skilled in the art will understand that the embodiments of this specification can be provided as methods, systems, or computer program products. Therefore, this specification may take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this specification may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0146] This specification can be described in the general context of computer-executable instructions that are executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform a specific task or implement a specific abstract data type. This specification can also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.

[0147] The various embodiments in this specification are described in a progressive manner. For parts that are identical or similar between embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are basically similar to the method embodiments, so their description is relatively brief; for relevant details, refer to the descriptions in the method embodiments.

[0148] The above description is merely an embodiment of this specification and is not intended to limit this specification. Various modifications and variations can be made to this specification by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this specification should be included within the scope of the claims of this specification.

Claims

1. A model training method for training a prediction model, characterized in that the method comprises: obtaining historical resource usage data and a historical task execution time corresponding to a historical model training task; based on the historical resource usage data, selecting similar historical model training tasks corresponding to the historical model training task from a preset database, and obtaining the historical resource usage data corresponding to the similar historical model training tasks; inputting the historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training tasks into the prediction model to be trained, so that the prediction model determines the resource usage feature data corresponding to the historical model training task based on the historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training tasks, and determines the predicted execution time corresponding to the historical model training task based on the resource usage feature data; and determining a loss value for the prediction model based on the predicted execution time and the historical task execution time, and training the prediction model based on the loss value, wherein the magnitude of the loss value is negatively correlated with the similarity between the predicted execution time and the historical task execution time.
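
As a non-limiting illustration of the final training step, the sketch below uses a squared-error loss, which is one loss whose magnitude shrinks as the predicted execution time approaches the historical task execution time. The claim does not fix a particular loss or optimizer; the squared error, the plain gradient-descent update, the linear prediction head, and all names and values are assumptions of this sketch.

```python
def predict(weights, bias, features):
    """Linear regression head: predicted execution time from a feature vector."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def squared_error(predicted, actual):
    """Loss that decreases as the prediction gets closer to the actual time."""
    return (predicted - actual) ** 2

def train_step(weights, bias, features, actual, learning_rate=0.1):
    """One gradient-descent update on the squared-error loss."""
    predicted = predict(weights, bias, features)
    error = predicted - actual
    # d(loss)/d(w_i) = 2 * error * x_i and d(loss)/d(bias) = 2 * error
    new_weights = [w - learning_rate * 2.0 * error * x
                   for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * 2.0 * error
    return new_weights, new_bias, squared_error(predicted, actual)

# Hypothetical fused feature vector and historical execution time (hours).
features = [0.5, 0.2]
actual_hours = 3.0
weights, bias = [0.0, 0.0], 0.0
losses = []
for _ in range(20):
    weights, bias, loss = train_step(weights, bias, features, actual_hours)
    losses.append(loss)
```

In this toy setting the recorded loss falls at every step, reflecting the stated negative correlation between the loss value and the closeness of the two execution times.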

2. The method as described in claim 1, characterized in that, before selecting the similar historical model training tasks corresponding to the historical model training task from the preset database based on the historical resource usage data and obtaining the historical resource usage data corresponding to the similar historical model training tasks, the method further comprises: preprocessing the historical resource usage data corresponding to the historical model training task to obtain preprocessed historical resource usage data corresponding to the historical model training task, wherein the preprocessing includes at least outlier removal, noise reduction, and missing-data filling for the historical resource usage data corresponding to the historical model training task; and selecting the similar historical model training tasks corresponding to the historical model training task from the preset database based on the historical resource usage data and obtaining the historical resource usage data corresponding to the similar historical model training tasks specifically comprises: based on the preprocessed historical resource usage data corresponding to the historical model training task, selecting the similar historical model training tasks corresponding to the historical model training task from the database, and obtaining the historical resource usage data corresponding to the similar historical model training tasks.
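
As a non-limiting illustration, the three preprocessing operations named above might be sketched as follows. The concrete techniques are assumptions chosen for brevity: outliers are detected by deviation from the median, missing samples are forward-filled, and noise is reduced with a centered moving average. Missing samples are represented as `None`, and all names and thresholds are hypothetical.

```python
import statistics

def remove_outliers(series, max_deviations=3.0):
    """Mark samples far from the median (in units of the median absolute deviation) as missing."""
    present = [x for x in series if x is not None]
    median = statistics.median(present)
    mad = statistics.median([abs(x - median) for x in present])
    if mad == 0.0:
        return list(series)
    return [None if x is not None and abs(x - median) > max_deviations * mad else x
            for x in series]

def fill_missing(series):
    """Forward-fill missing samples; leading gaps take the first valid sample."""
    last = next(x for x in series if x is not None)
    filled = []
    for x in series:
        if x is not None:
            last = x
        filled.append(last)
    return filled

def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def preprocess(series):
    """Outlier removal, then missing-data filling, then noise reduction."""
    return moving_average(fill_missing(remove_outliers(series)))

# Hypothetical GPU utilization samples with one gap and one spurious reading.
raw_gpu_utilization = [0.80, 0.82, None, 0.81, 9.99, 0.79, 0.80]
clean = preprocess(raw_gpu_utilization)
```

Removing outliers before filling means a spurious reading is never propagated into a neighboring gap, and smoothing last keeps the spike from leaking into adjacent samples.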

3. The method as described in claim 1, characterized in that selecting the similar historical model training tasks corresponding to the historical model training task from the preset database based on the historical resource usage data specifically comprises: for each candidate historical model training task in the database, determining, based on the historical resource usage data corresponding to the historical model training task, the similarity between the historical resource usage data corresponding to the candidate historical model training task and the historical resource usage data corresponding to the historical model training task; and when the similarity is greater than a preset similarity threshold, taking the candidate historical model training task as a similar historical model training task corresponding to the historical model training task.

4. The method as described in claim 3, characterized in that the historical resource usage data includes resource usage data for different resource types corresponding to the historical model training task, the resource types including at least CPU utilization, GPU utilization, memory usage, and disk throughput; and determining, for each candidate historical model training task in the database and based on the historical resource usage data corresponding to the historical model training task, the similarity between the historical resource usage data corresponding to the candidate historical model training task and the historical resource usage data corresponding to the historical model training task specifically comprises: for each resource type in the historical resource usage data corresponding to the candidate historical model training task, determining the similarity between the candidate historical model training task's resource usage data of that type and the historical model training task's resource usage data of the same type; and weighting the per-type similarities corresponding to the candidate historical model training task to determine the comprehensive similarity corresponding to the candidate historical model training task, and using the comprehensive similarity as the similarity between the historical resource usage data corresponding to the candidate historical model training task and the historical resource usage data corresponding to the historical model training task.

5. The method as described in claim 1, characterized in that the historical resource usage data is time-series data composed of resource usage data of the historical model training task at various time points, collected at preset time intervals; and inputting the historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training tasks into the prediction model to be trained, so that the prediction model determines the resource usage feature data corresponding to the historical model training task based on the historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training tasks, specifically comprises: inputting the historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training tasks into the prediction model to be trained, so that the prediction model, for each time point in each piece of historical resource usage data, determines the data similarity between the resource usage data at that time point and the resource usage data at the other time points in the same historical resource usage data, determines the attention weight corresponding to that time point based on the data similarity, and performs self-attention weighted processing on the historical resource usage data based on the attention weights corresponding to the time points to obtain the self-attention feature sequence corresponding to that historical resource usage data; and determining the resource usage feature data corresponding to the historical model training task based on the self-attention feature sequence corresponding to the historical model training task and the self-attention feature sequences corresponding to the similar historical model training tasks.

6. The method as described in claim 5, characterized in that determining the resource usage feature data corresponding to the historical model training task based on the self-attention feature sequence corresponding to the historical model training task and the self-attention feature sequence corresponding to the similar historical model training task specifically comprises: for the self-attention feature data at each time point in the self-attention feature sequence corresponding to the historical model training task, determining the feature similarity between the self-attention feature data at that time point and the self-attention feature data at the time point in the same ordinal position in the self-attention feature sequence corresponding to the similar historical model training task, and determining the cross-attention weight corresponding to the self-attention feature data at that time point based on the feature similarity; based on the cross-attention weights corresponding to the self-attention feature data at each time point in the self-attention feature sequence corresponding to the historical model training task, performing cross-attention weighted processing on the self-attention feature sequence corresponding to the historical model training task and the self-attention feature sequence corresponding to the similar historical model training task to perform feature fusion; and using the fused feature data of the historical model training task and the similar historical model training task as the resource usage feature data corresponding to the historical model training task.
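
As a non-limiting illustration, one possible reading of the position-wise cross-attention fusion is sketched below. The claim does not specify how the feature similarity is turned into a weight or how the weighted features are combined; the scaled dot product, the sigmoid squashing, the convex blend, and the truncation to the shorter sequence are all assumptions of this sketch, and the names and values are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def cross_attention_fuse(target_features, similar_features):
    """Fuse two self-attention feature sequences position by position.

    Both arguments are lists of equal-dimension feature vectors; positions
    beyond the shorter sequence are ignored (an assumption of this sketch).
    """
    fused = []
    for target_vec, similar_vec in zip(target_features, similar_features):
        # Feature similarity between the features at the same position.
        similarity = (sum(a * b for a, b in zip(target_vec, similar_vec))
                      / math.sqrt(len(target_vec)))
        # Cross-attention weight for this time point, squashed into (0, 1).
        weight = sigmoid(similarity)
        # Weighted blend of the two features at this position.
        fused.append([(1.0 - weight) * a + weight * b
                      for a, b in zip(target_vec, similar_vec)])
    return fused

# Hypothetical self-attention feature sequences for the two tasks.
target_features = [[0.2, 0.9], [0.4, 0.7], [0.8, 0.3]]
similar_features = [[0.3, 0.8], [0.5, 0.6], [0.7, 0.2]]
fused_features = cross_attention_fuse(target_features, similar_features)
```

With this blend, a position where the two tasks' features agree more strongly draws more of its fused value from the similar task, and every fused value lies between the two inputs at that position.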

7. A method for predicting task execution time, characterized in that the method comprises: monitoring and collecting the resource usage data corresponding to the prediction time after a target model training task starts execution; based on the resource usage data corresponding to the target model training task, selecting similar historical model training tasks corresponding to the target model training task from a preset database, and obtaining the historical resource usage data corresponding to the similar historical model training tasks; and inputting the resource usage data corresponding to the target model training task and the historical resource usage data corresponding to the similar historical model training tasks into the prediction model, so that the prediction model determines the target feature data corresponding to the target model training task based on the resource usage data corresponding to the target model training task and the historical resource usage data corresponding to the similar historical model training tasks, and determines the predicted task execution time corresponding to the target model training task based on the target feature data, wherein the prediction model is obtained by the method described in any one of claims 1 to 6.

8. A model training apparatus for training a prediction model, characterized in that the apparatus comprises: an acquisition module, used to acquire historical resource usage data and a historical task execution time corresponding to a historical model training task; a filtering module, used to select similar historical model training tasks corresponding to the historical model training task from a preset database based on the historical resource usage data, and to obtain the historical resource usage data corresponding to the similar historical model training tasks; a prediction module, used to input the historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training tasks into the prediction model to be trained, so that the prediction model determines the resource usage feature data corresponding to the historical model training task based on the historical resource usage data corresponding to the historical model training task and the historical resource usage data corresponding to the similar historical model training tasks, and determines the predicted execution time corresponding to the historical model training task based on the resource usage feature data; and a training module, used to determine a loss value for the prediction model based on the predicted execution time and the historical task execution time, and to train the prediction model based on the loss value, wherein the magnitude of the loss value is negatively correlated with the similarity between the predicted execution time and the historical task execution time.

9. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method described in any one of claims 1 to 7.

10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method described in any one of claims 1 to 7.