A cost-based workflow scheduling method in an IaaS environment
A dynamic budget allocation model and a task priority ranking are used to allocate workflow tasks in an IaaS environment, reducing workflow execution time and cost and minimizing the earliest completion time of the workflow under a budget constraint.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHONGQING UNIV OF POSTS & TELECOMM
- Filing Date
- 2023-12-22
- Publication Date
- 2026-06-30
Description
Technical Field
[0001] This invention belongs to the field of cloud computing, specifically relating to a cost-based workflow scheduling method in an IaaS environment.
Background Technology
[0002] As data volumes continue to grow, the demands on computing environments are increasing. Traditional self-built computing clusters are less practical due to their high cost and inability to adapt to fragmented resource needs. Infrastructure as a Service (IaaS), a key service model in cloud computing, offers a pay-as-you-go service model. Users can flexibly configure computing, storage, and network resources to handle computing applications, eliminating the cost of purchasing physical machines and the hassle of maintaining them. Many applications deployed in IaaS environments consist of scientific workflows, such as CyberShake, Montage, Inspiral, Epigenomics, and Sipht.
[0003] The workflow allocation problem in cloud computing environments aims to provide a method for mapping resources to each task in a workflow, and is generally considered NP-hard. Workflows are typically represented as a directed acyclic graph (DAG), in which tasks are non-preemptive and there are predecessor-successor dependencies between tasks, meaning that a subtask can only begin execution after all its parent tasks have been processed.
[0004] However, as workflows grow in scale, executing workflows in an IaaS environment becomes increasingly time-consuming and costly. Typically, users have a defined budget for executing workflows in an IaaS environment, and allocating that budget reasonably to each task to reduce the overall workflow completion time presents a significant challenge.
Summary of the Invention
[0005] To address the problems existing in the prior art, this invention proposes a cost-based workflow scheduling method in an IaaS environment, specifically including the following steps:
[0006] S1. Obtain the execution time of each task in the workflow on different virtual machines, as well as the data communication time between different virtual machines;
[0007] S2. Based on execution time and data communication time, and combined with the total cost constraints set by the user, a dynamic budget allocation model is constructed;
[0008] S3. After converting the workflow into a directed acyclic graph D, perform a single parent-child task pair merging process to obtain a new directed acyclic graph D';
[0009] S4. Calculate the priority of each task in the new directed acyclic graph D', and sort the tasks according to their priorities to obtain a sorted list;
[0010] S5. Calculate the cost constraint for each task in turn according to the sorted list, and obtain the current allowable budget when scheduling the task according to the dynamic budget allocation model;
[0011] S6. Calculate the earliest completion time of the task before and after copying the critical parent task on different virtual machines, and select the virtual machine corresponding to the smallest earliest completion time to place the task.
[0012] S7. Repeat steps S5-S6 until the task scheduling of the entire workflow is completed.
[0013] Furthermore, step S3 performs a single parent-child task pair merging process on the directed acyclic graph D to obtain a new directed acyclic graph D', including:
[0014] S31. Traverse the directed acyclic graph D to obtain all relationships between tasks; for each task, a task with an edge pointing to it is its parent task, and a task that it points to is its child task.
[0015] S32. If task v_i has exactly one parent task v_j, and task v_j has exactly one child task v_i, then task v_i and task v_j are called a single parent-child pair and are merged into one task, expressed as:
[0016]
[0017]
[0018]
[0019] where the two symbols denote, respectively, the set of parent tasks of task v_i and the set of child tasks of task v_j;
[0020] S33. Merge each single parent-child pair in the directed acyclic graph D into a new task, and finally obtain a new directed acyclic graph D'.
[0021] Furthermore, the priorities of all tasks in the new directed acyclic graph D' are calculated, and the tasks are sorted according to their priorities to obtain a sorted list, including:
[0022] S41. Calculate the priority score of each task in the new directed acyclic graph D' using the upward sorting method, expressed as:
[0023]
[0024] S42. Collect the tasks that have the same priority score into a set, and calculate a priority degree score for each of these tasks, expressed as:
[0025]
[0026] S43. Determine the priority of each task based on the priority score and priority degree score, and sort them in descending order of priority to obtain a sorted list, represented as follows:
[0027]
[0028] where the symbols denote, in order: the priority score of task v_i obtained with the upward sorting method; the average execution time of task v_i across all virtual machines; the average communication time of edge e_ia, which connects task v_i and task v_a, over all types of network links; the set of child tasks of task v_i; the priority degree score of task v_i; the data communication time between task v_i and task v_a; the set of tasks with the same priority score obtained with the upward sorting method; and the priority of task v_i.
[0029] Furthermore, step S5 obtains the current allowed budget for the task based on the dynamic budget allocation model, expressed as:
[0030]
[0031] where the symbols denote, in order: the current allowable budget when task v_i is scheduled; the cost that has already been used; the pre-allocated budget of task v_i; and the total cost constraint set by the user.
[0032] Furthermore, the pre-allocated budget of task v_i is calculated as:
[0033]
[0034] where the symbols denote, in order: the average execution time of task v_i across all virtual machines; the set of parent tasks of task v_i; the set of tasks that have not yet been scheduled; and the data communication time between two tasks.
[0035] Furthermore, in step S6, calculating the earliest completion time of task v_i before and after copying its critical parent task on different virtual machines, and selecting the virtual machine with the smallest earliest completion time to place the task, includes:
[0036] S61. Determine whether task v_i has traversed the whole virtual machine cluster. If not, proceed to step S62; if so, proceed to step S64.
[0037] S62. Calculate the earliest completion time of task v_i before and after copying the critical parent task on the current virtual machine; then calculate the cost of pre-allocating task v_i to the current virtual machine; determine whether this cost is less than the minimum cost recorded so far for task v_i. If so, record it as the updated minimum cost of task v_i and then execute step S63; otherwise, execute step S63 directly.
[0038] S63. Determine whether the current allowable budget of task v_i exceeds the pre-allocation cost. If so, add the current virtual machine to the candidate virtual machine set of task v_i and return to step S61; otherwise, return directly to step S61.
[0039] S64. Determine whether the candidate virtual machine set of task v_i is empty. If so, add the virtual machine with the minimum cost for task v_i to the candidate virtual machine set of task v_i and then execute step S65; otherwise, execute step S65 directly.
[0040] S65. From the candidate virtual machine set of task v_i, the virtual machines that give the minimum earliest completion time before copying the critical parent task form a first candidate set, and the virtual machines that give the minimum earliest completion time after copying the critical parent task form a second candidate set. Then determine whether the minimum earliest completion time of task v_i before copying the critical parent task is less than that after copying it. If so, proceed to step S66; otherwise, determine whether the two are equal. If so, proceed to step S67; otherwise, proceed to step S68.
[0041] S66. Select the virtual machine that becomes idle latest from the first candidate set of task v_i and assign task v_i to it; then proceed to step S69.
[0042] S67. Determine whether task v_i has more than one child task. If so, select the virtual machine that becomes idle latest from the second candidate set of task v_i, copy the critical parent task of task v_i to this virtual machine, assign task v_i to it, and proceed to step S69; otherwise, select the virtual machine that becomes idle latest from the first candidate set of task v_i, assign task v_i to it, and proceed to step S69.
[0043] S68. Randomly select a virtual machine from the second candidate set of task v_i, copy the critical parent task of task v_i to this virtual machine, and assign task v_i to it; then proceed to step S69.
[0044] S69. Determine whether the virtual machine has been used. If so, do not perform any operation; otherwise, mark the virtual machine as used.
[0045] Furthermore, in step S62, the earliest completion time of task v_i before and after copying the critical parent task on the current virtual machine is calculated as:
[0046]
[0047]
[0048] where the symbols denote, in order: the earliest completion time of task v_i on the virtual machine before copying the critical parent task; the earliest completion time of task v_i on the virtual machine after copying the critical parent task; the execution time of task v_i on the virtual machine; the earliest start time of task v_i on the virtual machine before copying the critical parent task; and the earliest start time of task v_i on the virtual machine after copying the critical parent task, which is calculated as:
[0049]
[0050] where the symbols denote, in order: the critical parent task of task v_i, i.e., the parent task that finishes last among all parent tasks of task v_i; the earliest start time of the critical parent task of task v_i on the virtual machine before copying; the execution time of the critical parent task of task v_i on the virtual machine; and the condition that task v_j is not on the virtual machine.
[0051] Furthermore, the cost incurred when task v_i is pre-allocated to the m-th virtual machine is calculated as:
[0052]
[0053] where the symbols denote, in order: the unit of time; x, the number of virtual machines currently in use; the rental price per unit time of the resources of the j-th virtual machine; and the total usage time of the j-th virtual machine from the start of its use to the present, which is calculated as:
[0054]
[0055] where v_to denotes the first task executed on the virtual machine and v_from denotes the last task executed on it; the remaining two symbols denote the earliest completion time of task v_to on the virtual machine before copying the critical parent task, and the earliest start time of task v_from on the virtual machine before copying the critical parent task.
[0056] The beneficial effects of this invention are:
[0057] This invention constructs a dynamic budget allocation model, thereby more accurately describing the cost generated during task execution.
[0058] This invention considers the average communication time and execution time of subsequent tasks between different nodes during the work process, calculates the priority of the tasks, and sets the scheduling order for the tasks without violating the task dependencies, so that the scheduling result is closer to the global optimal solution.
[0059] Based on the cost weights of tasks and workflows, the cost of a task is pre-calculated, and the cost constraint of the task is set by this parameter. This ensures that the task does not exceed the cost constraint. By combining task merging and task replication methods, the virtual machine that minimizes the earliest completion time of the task is selected to allocate the task. Under the condition of meeting budget constraints, the completion time of the workflow is effectively reduced.
Attached Figure Description
[0060] Figure 1 is a flowchart of a cost-based scientific workflow scheduling method in an IaaS environment according to the present invention;
[0061] Figure 2 is a workflow scheduling model diagram in the IaaS environment of this invention;
[0062] Figure 3 shows the directed acyclic graph before and after merging a single parent-child pair in this invention;
[0063] Figure 4 is a graph showing the task priority score results of an embodiment of the present invention;
[0064] Figure 5 is a flowchart of the overall algorithm of the present invention.
Detailed Implementation
[0065] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0066] The workflow scheduling model constructed in the IaaS environment by this invention is shown in Figure 2. The user sends a task to the workflow scheduling center. The workflow scheduling center obtains the resource list from the cloud resource terminal and sends the scheduling task to the virtual machine cluster. The workflow scheduling center generates a scheduling scheme to complete the workflow's computation job and finally returns the execution result to the user.
[0067] Based on the above architecture, this invention provides a cost-based workflow scheduling method in an IaaS environment which, as shown in Figure 1, includes the following steps:
[0068] S1. Obtain the execution time of each task in the workflow on different virtual machines, as well as the data communication time between different virtual machines.
[0069] Specifically, in this embodiment of the invention, each virtual machine in the virtual machine set is identified by its index n and its resource type k (the n-th virtual machine is of the k-th resource type), where C represents the computing power of the resource, measured in millions of floating-point operations per second (MFLOPS); B represents the bandwidth of the resource, measured in megabytes per second (MB/s); and Price represents the rental price of the resource per unit time. For brevity, a virtual machine is also written in abbreviated form.
[0070] S2. Based on execution time and task data communication time, and combined with the total cost constraints set by the user, a dynamic budget allocation model is constructed.
[0071] S3. After transforming the workflow into a directed acyclic graph D, perform a single parent-child task pair merging process to obtain a new directed acyclic graph D'.
[0072] Specifically, the workflow is represented as a directed acyclic graph D = (V, E), where V represents the set of nodes, each node representing a task, and E represents the set of edges.
[0073] Specifically, step S3 performs a single parent-child task pair merging process on the directed acyclic graph D to obtain a new directed acyclic graph D', including:
[0074] S31. Traverse the directed acyclic graph D to obtain all relationships between tasks; for each task, a task with an edge pointing to it is its parent task, and a task that it points to is its child task.
[0075] S32. If task v_i has exactly one parent task v_j, and task v_j has exactly one child task v_i, then task v_i and task v_j are called a single parent-child pair and are merged into one task, expressed as:
[0076]
[0077]
[0078]
[0079] where the two symbols denote, respectively, the set of parent tasks of task v_i and the set of child tasks of task v_j;
[0080] S33. Merge each single parent-child pair in the directed acyclic graph D into a new task, and finally obtain a new directed acyclic graph D'.
[0081] Specifically, as shown in Figure 3, Figure 3(a) is the directed acyclic graph obtained by converting the acquired workflow in one embodiment. In it, task v5 and task v7 form a single parent-child pair and can therefore be merged into a new task v'5, giving the directed acyclic graph shown in Figure 3(b). After merging, the parent tasks v2, v3 and v4 of task v5 become the parent tasks of task v'5, and the child tasks v8 and v9 of task v7 become the child tasks of task v'5.
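The merging rule of S31 to S33 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `merge_single_pairs`, the edge-list representation, and the naming of the merged task (the parent's name with a trailing prime, e.g. `v5'`) are assumptions made here. It also assumes that chains of single parent-child pairs are collapsed repeatedly, which the text does not state explicitly. The example edges are only the relations mentioned for Figure 3(a), not the full graph.

```python
from collections import defaultdict


def merge_single_pairs(edges):
    """Return the edge set of D' after merging single parent-child pairs.

    edges: iterable of (parent, child) tuples describing the DAG D.
    A pair is merged when the child is the parent's only child and the
    parent is the child's only parent (assumed reading of S32).
    """
    edges = set(edges)
    while True:
        children = defaultdict(set)
        parents = defaultdict(set)
        for p, c in edges:
            children[p].add(c)
            parents[c].add(p)
        # Look for one single parent-child pair; sorted() keeps runs deterministic.
        pair = next(
            ((p, c) for p, c in sorted(edges)
             if len(children[p]) == 1 and len(parents[c]) == 1),
            None,
        )
        if pair is None:
            return edges
        p, c = pair
        merged = p if p.endswith("'") else p + "'"  # hypothetical naming
        new_edges = set()
        for a, b in edges:
            if (a, b) == (p, c):
                continue  # the edge inside the pair disappears
            a2 = merged if a in (p, c) else a
            b2 = merged if b in (p, c) else b
            new_edges.add((a2, b2))
        edges = new_edges


# Fragment of Figure 3(a): only the relations named in the text.
fig3a = [("v2", "v5"), ("v3", "v5"), ("v4", "v5"), ("v5", "v7"),
         ("v7", "v8"), ("v7", "v9"), ("v6", "v8")]
print(sorted(merge_single_pairs(fig3a)))
```

In this fragment only v5 and v7 qualify: v5 is the sole parent of v7 and v7 is the sole child of v5, so the merged task inherits v2, v3, v4 as parents and v8, v9 as children.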
[0082] S4. Calculate the priority of each task in the new directed acyclic graph D', and sort the tasks according to their priorities to obtain a sorted list.
[0083] Specifically, in order to schedule tasks in the workflow sequentially, this invention sorts the tasks by calculating the priority of all tasks in the new directed acyclic graph D' to obtain a sorted list, including:
[0084] S41. Calculate the priority score of each task in the new directed acyclic graph D' using the upward sorting method, expressed as:
[0085]
[0086] where the symbols denote, in order: the priority score of task v_i obtained with the upward sorting method; the average execution time of task v_i across all virtual machines (in Figure 3, the number in parentheses within each circle is this average; for example, 5 (3) in Figure 3(a) means that the average execution time of task v5 across all virtual machines is 3 time units); the average communication time of edge e_ia, which connects task v_i and task v_a, over all types of network links; and the set of child tasks of task v_i.
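The quantities listed for S41 match the upward rank commonly used in list-scheduling heuristics. As a hedged restatement, with symbol names chosen here for illustration (rank_u, \overline{w_i}, \overline{c_{ia}} and succ(v_i) are assumed names and may differ from the original formula):

```latex
rank_u(v_i) = \overline{w_i} + \max_{v_a \in succ(v_i)} \left( \overline{c_{ia}} + rank_u(v_a) \right)
```

For a task with no child tasks the maximum is taken as 0, so its score equals its average execution time.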
[0087] S42. Because the priority scores calculated in S41 can be equal for several tasks, the tasks with the same priority score are collected into a set, and a priority degree score is calculated for each of them as follows:
[0088]
[0089] where the symbols denote, in order: the priority degree score of task v_i; the data communication time between task v_i and task v_a (for example, in Figure 3(a) the data communication time between task v6 and task v8 is 6 time units; note that if two tasks are on the same virtual machine, the data communication time between them is 0); and the set of tasks with the same priority score obtained with the upward sorting method, which is initially an empty set.
[0090] S43. Determine the priority of each task based on the priority score and priority degree score, and sort them in descending order of priority to obtain a sorted list, represented as follows:
[0091]
[0092] where the symbol denotes the priority of task v_i. Applying this sorting to Figure 3(b) gives the priority scores and priority degree scores shown in Figure 4, from which the priority of each task is finally determined.
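A short Python sketch of the ordering in S41 and S43 follows. It assumes the priority score is the standard upward rank, and it breaks ties by task name as a placeholder, because the priority degree score of S42 is not reproduced in this text. The function names and the four-task example graph are hypothetical.

```python
from functools import lru_cache


def upward_ranks(avg_exec, avg_comm):
    """Compute an upward rank for every task.

    avg_exec: {task: average execution time over all VMs}
    avg_comm: {(parent, child): average communication time of that edge}
    """
    children = {}
    for parent, child in avg_comm:
        children.setdefault(parent, []).append(child)

    @lru_cache(maxsize=None)
    def rank(task):
        succ = children.get(task, [])
        if not succ:  # exit task: rank is its own average execution time
            return avg_exec[task]
        return avg_exec[task] + max(avg_comm[(task, a)] + rank(a) for a in succ)

    return {t: rank(t) for t in avg_exec}


def sorted_task_list(avg_exec, avg_comm):
    """Sort tasks by descending rank; ties fall back to the task name
    (placeholder for the S42 priority degree score)."""
    ranks = upward_ranks(avg_exec, avg_comm)
    return sorted(ranks, key=lambda t: (-ranks[t], t))


# Hypothetical four-task diamond graph.
avg_exec = {"a": 2, "b": 3, "c": 1, "d": 4}
avg_comm = {("a", "b"): 1, ("a", "c"): 5, ("b", "d"): 2, ("c", "d"): 1}
print(upward_ranks(avg_exec, avg_comm))   # {'a': 13, 'b': 9, 'c': 6, 'd': 4}
print(sorted_task_list(avg_exec, avg_comm))  # ['a', 'b', 'c', 'd']
```

Sorting by descending upward rank always places a parent before its children, so the resulting list respects the task dependencies.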
[0093] S5. Calculate the cost constraint for each task in turn according to the sorted list, and obtain the current allowable budget when scheduling the task based on the dynamic budget allocation model.
[0094] Specifically, in order to ensure that each task has a corresponding cost constraint, when task v_i is scheduled, its pre-allocated budget is calculated as:
[0095]
[0096] where the symbols denote, in order: the average execution time of task v_i across all virtual machines; the set of parent tasks of task v_i; the set of tasks that have not yet been scheduled; the data communication time between task v_j and task v_i; the cost that has already been used; the total cost constraint set by the user; the average execution time of task v_u across all virtual machines; and the data communication time between task v_k and task v_u.
[0097] Based on the pre-allocated budget, the current allowed budget for this task is obtained according to the dynamic budget allocation model, and is expressed as:
[0098]
[0099] where the symbols denote, in order: the current allowable budget when task v_i is scheduled; the cost that has already been used; the pre-allocated budget of task v_i; and the total cost constraint set by the user.
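The budget formulas themselves are not reproduced in this text, so the following Python sketch is only one plausible reading of the listed quantities. It assumes (1) that a task's weight is its average execution time plus the communication time from its parents, (2) that the pre-allocated budget is the remaining budget shared in proportion to that weight among the unscheduled tasks, and (3) that the current allowable budget is the used cost plus the pre-allocated budget, capped at the total. All three are assumptions, as are the function names.

```python
def pre_allocated_budget(task, unscheduled, avg_exec, comm, parents,
                         total_budget, used_cost):
    """Assumed proportional share of the remaining budget for `task`.

    unscheduled must contain `task` itself.
    comm: {(parent, child): data communication time}
    parents: {task: iterable of parent tasks}
    """
    def weight(t):
        return avg_exec[t] + sum(comm[(p, t)] for p in parents.get(t, ()))

    total_weight = sum(weight(u) for u in unscheduled)
    remaining = total_budget - used_cost
    return remaining * weight(task) / total_weight


def current_allowed_budget(pre_budget, used_cost, total_budget):
    """Assumed cumulative cap when scheduling the task: cost spent so far
    plus the task's share, never above the user's total budget."""
    return min(used_cost + pre_budget, total_budget)


# Hypothetical two-task workflow a -> b with a total budget of 60.
avg_exec = {"a": 2, "b": 3}
comm = {("a", "b"): 1}
parents = {"b": ["a"]}
share_a = pre_allocated_budget("a", ["a", "b"], avg_exec, comm, parents, 60, 0)
print(share_a, current_allowed_budget(share_a, 0, 60))  # 20.0 20.0
```

Because the share is recomputed from the remaining budget each time, any budget a task leaves unspent is automatically redistributed to the tasks scheduled after it, which is the sense in which the allocation is dynamic.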
[0100] S6. Calculate the earliest completion time of the task before and after copying the critical parent task on different virtual machines, and select the virtual machine corresponding to the earliest completion time to place the task.
[0101] Specifically, the critical parent task of task v_i is the parent task that finishes execution latest among all parent tasks of task v_i. Copying the critical parent task of task v_i means copying it into an idle time slot of the virtual machine currently being considered for task v_i, which reduces the data communication time between the critical parent task and task v_i and therefore the earliest completion time of task v_i on that virtual machine. The earliest completion time of task v_i before copying the critical parent task on the current virtual machine is the time at which task v_i finishes when it is pre-allocated to the current virtual machine directly, without copying the critical parent task. The earliest completion time of task v_i after copying the critical parent task on the current virtual machine is the time at which task v_i finishes when its critical parent task is first copied to the current virtual machine and executed there once, and task v_i is then executed.
[0102] Specifically, as shown in Figure 5, in step S6, calculating the earliest completion time of task v_i before and after copying its critical parent task on different virtual machines, and selecting the virtual machine with the smallest earliest completion time to place the task, includes:
[0103] S61. Determine whether task v_i has traversed the whole virtual machine cluster. If not, proceed to step S62; if so, proceed to step S64. The task v_i currently being scheduled can be assigned either to virtual machines that have already been used or to any of the k types of virtual machines that have not yet been used. The symbols used here denote the set of virtual machines to which task v_i can be assigned; the set of virtual machines that have already been used, which is initially an empty set; and a virtual machine of type k that has not yet been used.
[0104] S62. Calculate the earliest completion time of task v_i before and after copying the critical parent task on the current virtual machine; then calculate the cost of pre-allocating task v_i to the current virtual machine; determine whether this cost is less than the minimum cost of task v_i (i.e., less than every previously calculated cost). If so, record it as the updated minimum cost of task v_i and then proceed to step S63; otherwise, proceed directly to step S63.
[0105] Specifically, the earliest completion time of different tasks before and after copying the critical parent task on different virtual machines is expressed as:
[0106]
[0107]
[0108] where the symbols denote, in order: the earliest completion time of task v_i on the virtual machine before copying the critical parent task; the earliest completion time of task v_i on the virtual machine after copying the critical parent task; the execution time of task v_i on the virtual machine; the earliest start time of task v_i on the virtual machine before copying the critical parent task; and the earliest start time of task v_i on the virtual machine after copying the critical parent task (if the critical parent task of task v_i is already on the virtual machine, no copying operation is needed), which is calculated as:
[0109]
[0110] where the symbols denote, in order: the critical parent task of task v_i, i.e., the parent task that finishes execution latest among all parent tasks of task v_i (an example under homogeneous conditions is given in Figure 3(a)); the earliest start time of the critical parent task of task v_i on the virtual machine before copying; the execution time of the critical parent task of task v_i on the virtual machine; and the condition that task v_j is not on the virtual machine.
[0111] Specifically, the cost incurred when task v_i is pre-allocated to the m-th virtual machine is calculated as:
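The two completion times can be illustrated with the Python sketch below. It assumes that a task can start once the virtual machine is free and the data from every parent has arrived, that communication time is 0 when parent and task share a virtual machine (as stated for S42), and that the start time of the copied critical parent on the target virtual machine is supplied by the caller as `copy_start`. The function name and argument layout are hypothetical.

```python
def eft_before_and_after(task, vm, parents, exec_time, comm, finish,
                         placed_on, vm_ready, copy_start):
    """Return (EFT without copying, EFT with the critical parent copied to vm).

    exec_time: {(task, vm): execution time}
    comm: {(parent, task): data communication time across VMs}
    finish: {parent: actual finish time}; placed_on: {parent: vm}
    vm_ready: time at which vm is free for `task`
    copy_start: earliest start of the critical parent's copy on vm (assumed input)
    """
    w = exec_time[(task, vm)]
    # Data arrival time from each parent; no transfer cost on the same VM.
    arrivals = {
        p: finish[p] + (0 if placed_on[p] == vm else comm[(p, task)])
        for p in parents
    }
    est_before = max([vm_ready, *arrivals.values()])
    eft_before = est_before + w
    if not parents:
        return eft_before, eft_before
    # Critical parent: the parent that finishes last.
    critical = max(parents, key=lambda p: finish[p])
    if placed_on[critical] == vm:
        return eft_before, eft_before  # already local, nothing to copy
    copy_finish = copy_start + exec_time[(critical, vm)]
    others = [arrivals[p] for p in parents if p != critical]
    est_after = max([copy_finish, *others])
    return eft_before, est_after + w


# Hypothetical case: p1 ran on VM "A", p2 on VM "B"; task t is evaluated on "B".
print(eft_before_and_after(
    "t", "B", ["p1", "p2"],
    exec_time={("t", "B"): 4, ("p1", "B"): 3},
    comm={("p1", "t"): 8, ("p2", "t"): 1},
    finish={"p1": 10, "p2": 6},
    placed_on={"p1": "A", "p2": "B"},
    vm_ready=6, copy_start=6,
))  # (22, 13)
```

In this example, re-running p1 locally for 3 time units is cheaper than waiting 8 time units for its output to be transferred, so copying lowers the completion time from 22 to 13.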
[0112]
[0113] where the symbols denote, in order: the unit of time; x, the number of virtual machines currently in use (that is, the number of virtual machines that have been assigned tasks between the scheduling of the first task in the workflow and the scheduling of the current task v_i); the rental price per unit time of the resources of the j-th virtual machine; and the total usage time of the j-th virtual machine from the start of its use to the present, which is calculated as:
[0114]
[0115] where v_to denotes the first task executed on the virtual machine and v_from denotes the last task executed on it; the remaining two symbols denote the earliest completion time of task v_to on the virtual machine before copying the critical parent task, and the earliest start time of task v_from on the virtual machine before copying the critical parent task.
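The cost described here is a sum over the virtual machines in use of rental price times billed usage. The Python sketch below assumes that usage is measured from the start of the first task on a virtual machine to the finish of its last task, and that it is rounded up to whole billing units; the rounding and the function name are assumptions, since the formula itself is not reproduced in this text.

```python
import math


def total_rental_cost(vm_usage, unit):
    """Sum the rental cost of all virtual machines currently in use.

    vm_usage: list of (price_per_unit, first_start, last_finish) tuples,
              one per used VM.
    unit: length of one billing unit (same time scale as the timestamps).
    Usage is rounded up to whole units (assumed pay-per-unit billing).
    """
    return sum(
        price * math.ceil((last_finish - first_start) / unit)
        for price, first_start, last_finish in vm_usage
    )


# Two hypothetical VMs billed per 60 time units.
print(total_rental_cost([(2.0, 0, 90), (5.0, 30, 90)], unit=60))  # 9.0
```

The first virtual machine is used for 90 time units and is billed for 2 units, and the second is used for 60 time units and is billed for 1 unit, giving 2.0 × 2 + 5.0 × 1 = 9.0.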
[0116] S63. Determine whether the cost of pre-allocating task v_i is less than the current allowable budget. If so, add the current virtual machine to the candidate virtual machine set of task v_i (which is initially empty) and return to step S61; otherwise, return directly to step S61.
[0117] S64. Determine whether the candidate virtual machine set of task v_i is empty. If so, add the virtual machine with the minimum cost for task v_i, as recorded in step S62, to the candidate virtual machine set of task v_i and then execute step S65; otherwise, execute step S65 directly.
[0118] Specifically, the candidate virtual machine set of task v_i is the set of virtual machines that satisfy the pre-allocated budget; if assigning task v_i to any virtual machine fails to meet the pre-allocated budget, the candidate virtual machine set of task v_i is the set of virtual machines with the minimum cost, denoted as:
[0119]
[0120] where the symbols denote, in order: the set of virtual machines to which task v_i can be assigned; the set of virtual machines that have already been used, which is initially an empty set; a virtual machine of type k that has not yet been used; the minimum cost incurred when the task is pre-allocated; and the candidate virtual machine set of the task, which is the set of virtual machines that satisfy the task's budget constraint or, when no virtual machine satisfies the budget constraint, the set of virtual machines with the minimum cost when the task is pre-allocated.
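Steps S61 to S64 amount to a budget filter with a cheapest-option fallback, which can be sketched in Python as follows. The function name and the dictionary input are hypothetical; the strict comparison follows the wording of S63, and returning every virtual machine that ties for the minimum cost is an assumption.

```python
def candidate_vms(costs, allowed_budget):
    """Select candidate VMs for one task.

    costs: {vm: cost if the task is pre-allocated to that vm}
    Returns the VMs whose cost is below the current allowable budget, or,
    if none qualifies, the VM(s) with the minimum cost (S64 fallback).
    """
    within_budget = [vm for vm, c in costs.items() if c < allowed_budget]
    if within_budget:
        return within_budget
    cheapest = min(costs.values())
    return [vm for vm, c in costs.items() if c == cheapest]


costs = {"vm1": 12, "vm2": 9, "vm3": 15}
print(candidate_vms(costs, allowed_budget=13))  # ['vm1', 'vm2']
print(candidate_vms(costs, allowed_budget=5))   # ['vm2']
```

The fallback guarantees that every task can still be placed, at the price of exceeding its own share by as little as possible.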
[0121] S65. From the candidate virtual machine set of task v_i, the virtual machines that give the minimum earliest completion time before copying the critical parent task form the first candidate set, and the virtual machines that give the minimum earliest completion time after copying the critical parent task form the second candidate set. Because the earliest completion time of the same task may be identical on several virtual machines (here "earliest completion time" covers both the value before and the value after copying the critical parent task), more than one virtual machine may attain the minimum. Then determine whether the minimum earliest completion time of task v_i before copying the critical parent task is less than that after copying it. If so, proceed to step S66; otherwise, determine whether the two are equal. If so, proceed to step S67; otherwise, proceed to step S68.
[0122] Specifically, the minimum earliest completion times of task v_i before and after copying the critical parent task, selected from the candidate virtual machine set of task v_i, are expressed as:
[0123]
[0124]
[0125] where the two symbols denote, respectively, the minimum earliest completion time of task v_i that satisfies the budget constraint before copying the critical parent task, and the minimum earliest completion time of task v_i that satisfies the budget constraint after copying the critical parent task.
[0126] S66. Select the virtual machine that becomes idle latest from the first candidate set of task v_i and assign task v_i to it; then proceed to step S69.
[0127] Specifically, step S66 applies when the minimum earliest completion time before copying the critical parent task is less than the minimum earliest completion time after copying it, and can be expressed as:
[0128]
[0129]
[0130] where the symbols denote, in order: the virtual machine assigned to task v_i; the first candidate set of task v_i; the indication that the critical parent task of task v_i is not copied; and the latest idle time of a virtual machine.
[0131] S67. Determine whether task v_i has more than one child task. If so, select the virtual machine that becomes idle latest from the second candidate set of task v_i, copy the critical parent task of task v_i to this virtual machine, assign task v_i to it, and proceed to step S69; otherwise, select the virtual machine that becomes idle latest from the first candidate set of task v_i, assign task v_i to it, and proceed to step S69.
[0132] Specifically, step S67 applies when the minimum earliest completion time before copying the critical parent task is equal to the minimum earliest completion time after copying it, and can be expressed as:
[0133]
[0134]
[0135] where the symbols denote, in order: the second candidate set of task v_i; the indication that the critical parent task of task v_i is copied; and the number of child tasks of task v_i.
[0136] S68. Randomly select a virtual machine from the second candidate set of task v_i, copy the critical parent task of task v_i to this virtual machine, and assign task v_i to it; then proceed to step S69.
[0137] Specifically, step S68 applies when the minimum earliest completion time before copying the critical parent task is greater than the minimum earliest completion time after copying it, and can be expressed as:
[0138]
[0139]
[0140] In this embodiment of the invention, the minimum earliest completion time of task v_i may still correspond to multiple virtual machines. Each virtual machine must finish its previously assigned tasks before it becomes idle and can begin executing task v_i. For example, if virtual machine x can start task v_i at time t while virtual machine y can only start task v_i at time t+100, then virtual machine y is the one that becomes idle latest. The latest-idle virtual machine is chosen because, when the minimum earliest completion time of task v_i is the same on all of these virtual machines, assigning task v_i to the one that becomes idle last leaves the virtual machines that become idle earlier available for scheduling subsequent tasks.
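The branching in S65 to S68 can be sketched in Python as below. This is an illustrative reading of the steps, not the patented implementation: the function name, the dictionary inputs, and the returned `(vm, copy_parent)` pair are assumptions made here.

```python
import random


def choose_vm(candidates, eft_before, eft_after, idle_time, n_children,
              rng=random):
    """Pick a VM for one task and decide whether to copy its critical parent.

    eft_before / eft_after: {vm: earliest completion time without / with copy}
    idle_time: {vm: time at which the vm becomes idle}
    Returns (vm, copy_parent) where copy_parent is True if the critical
    parent task should be copied to the chosen vm.
    """
    best_before = min(eft_before[vm] for vm in candidates)
    best_after = min(eft_after[vm] for vm in candidates)
    first_set = [vm for vm in candidates if eft_before[vm] == best_before]
    second_set = [vm for vm in candidates if eft_after[vm] == best_after]

    def latest_idle(vms):
        # Keep early-idle VMs free for the tasks scheduled later.
        return max(vms, key=lambda vm: idle_time[vm])

    if best_before < best_after:      # S66: copying does not help
        return latest_idle(first_set), False
    if best_before == best_after:     # S67: tie, decide by number of children
        if n_children > 1:
            return latest_idle(second_set), True
        return latest_idle(first_set), False
    return rng.choice(second_set), True  # S68: copying is strictly better


# Virtual machines x and y tie before copying; y becomes idle later.
print(choose_vm(["x", "y"], {"x": 20, "y": 20}, {"x": 25, "y": 30},
                {"x": 5, "y": 105}, n_children=1))  # ('y', False)
```

With equal completion times on x and y, the sketch returns y, matching the example in which the virtual machine that becomes idle at t+100 is preferred over the one that is idle at t.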
[0141] S69. Determine whether the virtual machine has been used. If so, do not perform any operation; otherwise, mark the virtual machine as used.
[0142] The final earliest completion time of task v_i is the smaller of the minimum earliest completion time before copying the critical parent task and the minimum earliest completion time after copying the critical parent task, expressed as:
[0143]
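The selection logic of steps S65 to S68, together with the final earliest completion time, can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Candidate` record, its field names, and the `select_vm` function are hypothetical, and the sketch assumes that the candidate virtual machine set and both earliest completion times have already been computed in steps S61 to S64.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one virtual machine in the candidate set."""
    vm: str
    eft_before: float  # earliest completion time without copying the critical parent
    eft_after: float   # earliest completion time with the critical parent copied
    idle_at: float     # time at which the VM finishes its previously assigned tasks


def select_vm(candidates, num_subtasks, rng=random):
    """Return (vm, copy_parent, final_eft) following steps S65 to S68."""
    min_before = min(c.eft_before for c in candidates)
    min_after = min(c.eft_after for c in candidates)
    # S65: first and second candidate sets.
    first = [c for c in candidates if c.eft_before == min_before]
    second = [c for c in candidates if c.eft_after == min_after]

    def last_idle(group):
        # Tie-break: prefer the VM that becomes idle last.
        return max(group, key=lambda c: c.idle_at)

    if min_before < min_after:        # S66: copying does not help
        chosen, copy_parent = last_idle(first), False
    elif min_before == min_after:     # S67: decide by the number of subtasks
        if num_subtasks > 1:
            chosen, copy_parent = last_idle(second), True
        else:
            chosen, copy_parent = last_idle(first), False
    else:                             # S68: copying is strictly better
        chosen, copy_parent = rng.choice(second), True

    # Final earliest completion time: the smaller of the two minima.
    return chosen.vm, copy_parent, min(min_before, min_after)
```

For the example in the text, where virtual machines x and y give the same earliest completion time but y becomes idle 100 time units later, `select_vm([Candidate("x", 10, 12, 0), Candidate("y", 10, 12, 100)], 1)` selects y without copying the critical parent task.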
[0144] S7. Repeat steps S5-S6 until the task scheduling of the entire workflow is completed. Output the scheduling arrangement and scheduling length of the entire workflow. The scheduling arrangement includes the scheduling start time, scheduling completion time, and allocated virtual machine for each task in the workflow.
[0145] Specifically, the earliest completion time of workflow D is the final earliest completion time of its end task, expressed as:
[0146]
[0147] where the symbols in the above expression denote, respectively, the earliest completion time of the workflow and the end task in workflow D.
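As a hedged illustration of the output of step S7, the following Python sketch stores the scheduling arrangement as one record per task and reads the scheduling length from the end task. The `Slot` record, the `children` mapping, and both function names are hypothetical, and the sketch assumes the workflow has exactly one end task.

```python
from typing import Dict, List, NamedTuple


class Slot(NamedTuple):
    """Hypothetical record of one scheduled task."""
    start: float   # scheduling start time of the task
    finish: float  # scheduling completion time (final earliest completion time)
    vm: str        # virtual machine allocated to the task


def find_end_task(children: Dict[str, List[str]]) -> str:
    """Return the single task that has no subtasks (the end task)."""
    ends = [task for task, kids in children.items() if not kids]
    if len(ends) != 1:
        raise ValueError("this sketch assumes exactly one end task")
    return ends[0]


def schedule_length(schedule: Dict[str, Slot],
                    children: Dict[str, List[str]]) -> float:
    """Scheduling length of the workflow: completion time of its end task."""
    return schedule[find_end_task(children)].finish
```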
[0148] In this invention, unless otherwise explicitly specified and limited, the terms "installation," "setting," "connection," "fixing," "rotation," etc., should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral part; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; they can refer to the internal communication of two components or the interaction between two components. Unless otherwise explicitly limited, those skilled in the art can understand the specific meaning of the above terms in this invention according to the specific circumstances.
[0149] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A cost-based workflow scheduling method in an IaaS environment, characterized in that it includes the following steps:
S1. Obtain the execution time of each task in the workflow on different virtual machines, as well as the data communication time between different virtual machines;
S2. Based on the execution time and the data communication time, and combined with the total cost constraint set by the user, construct a dynamic budget allocation model;
S3. After converting the workflow into a directed acyclic graph D, perform single parent-child task pair merging to obtain a new directed acyclic graph D';
S4. Calculate the priority of each task in the new directed acyclic graph D', and sort the tasks according to their priorities to obtain a sorted list;
S5. Calculate the cost constraint for each task in turn according to the sorted list, obtaining the current allowable budget when scheduling the task according to the dynamic budget allocation model;
S6. Calculate the earliest completion time of the task before and after copying the critical parent task on different virtual machines, and select the virtual machine corresponding to the minimum earliest completion time to place the task;
wherein step S6, for a task v_i, includes:
S61. Determine whether task v_i has traversed the virtual machine cluster; if not, perform step S62; if so, perform step S64;
S62. Calculate the earliest completion time of task v_i before and after copying the critical parent task on the current virtual machine; then calculate the cost of pre-allocating task v_i to the current virtual machine; determine whether this cost is less than the minimum consumption cost of task v_i; if so, update the minimum consumption cost of task v_i and then perform step S63; otherwise, perform step S63 directly;
S63. Determine whether the current allowable budget of task v_i exceeds the consumption cost; if so, add the current virtual machine to the candidate virtual machine set of task v_i and return to step S61; otherwise, return directly to step S61;
S64. Determine whether the candidate virtual machine set of task v_i is empty; if so, add the virtual machine with the minimum consumption cost for task v_i to the candidate virtual machine set of task v_i and then perform step S65; otherwise, perform step S65 directly;
S65. From the candidate virtual machine set of task v_i, form a first candidate set from the virtual machines corresponding to the minimum earliest completion time of task v_i before copying the critical parent task, and form a second candidate set from the virtual machines corresponding to the minimum earliest completion time of task v_i after copying the critical parent task; then determine whether the minimum earliest completion time of task v_i before copying the critical parent task is less than the minimum earliest completion time after copying the critical parent task; if so, proceed to step S66; otherwise, determine whether the two are equal; if so, proceed to step S67; otherwise, proceed to step S68;
S66. Select from the first candidate set of task v_i the virtual machine that becomes idle last, and assign task v_i to it; then proceed to step S69;
S67. Determine whether task v_i has more than one subtask; if so, select from the second candidate set of task v_i the virtual machine that becomes idle last, copy the critical parent task of task v_i to that virtual machine, assign task v_i to it, and proceed to step S69; otherwise, select from the first candidate set of task v_i the virtual machine that becomes idle last, assign task v_i to it, and proceed to step S69;
S68. Randomly select a virtual machine from the second candidate set of task v_i, copy the critical parent task of task v_i to that virtual machine, and assign task v_i to it; then proceed to step S69;
S69. Determine whether the virtual machine has been used; if so, do nothing; otherwise, mark the virtual machine as used;
S7. Repeat steps S5-S6 until the task scheduling of the entire workflow is completed.
2. The cost-based workflow scheduling method in an IaaS environment according to claim 1, characterized in that step S3, performing single parent-child task pair merging on the directed acyclic graph D to obtain a new directed acyclic graph D', includes:
S31. Traverse the directed acyclic graph D to obtain all relationships between tasks; for each task, a task with an edge pointing to it is its parent task, and a task that it points to is its child task (subtask);
S32. If task v_i has only one parent task v_j, and task v_j has only one subtask v_i, then task v_i and task v_j are called a single parent-child pair, and task v_i and task v_j are merged into one task, expressed as:
where the symbols denote, respectively, the parent task set of task v_i and the subtask set of task v_j;
S33. Merge each single parent-child pair in the directed acyclic graph D into a new task, finally obtaining the new directed acyclic graph D'.
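By way of illustration only, the merging of steps S31 to S33 might be sketched in Python as follows. The function name `merge_single_pairs` and the edge-list representation are hypothetical. The sketch also assumes that a chain of single parent-child pairs collapses into one merged task, which the claim text does not state explicitly.

```python
from collections import defaultdict


def merge_single_pairs(tasks, edges):
    """Merge single parent-child pairs of a DAG given as (parent, child) edges.

    Returns (groups, new_edges): groups maps each merged task's representative
    to the original tasks it contains; new_edges are the edges of D'.
    """
    parents = defaultdict(set)
    children = defaultdict(set)
    for p, c in edges:                      # S31: collect all relationships
        parents[c].add(p)
        children[p].add(c)

    merged_into = {}
    for p, c in edges:                      # S32: detect single parent-child pairs
        if len(children[p]) == 1 and len(parents[c]) == 1:
            merged_into[c] = p

    def rep(task):
        # Follow merges up to the representative (handles chains; an assumption).
        while task in merged_into:
            task = merged_into[task]
        return task

    groups = defaultdict(list)
    for task in tasks:                      # S33: build the merged tasks of D'
        groups[rep(task)].append(task)
    new_edges = {(rep(p), rep(c)) for p, c in edges if rep(p) != rep(c)}
    return dict(groups), new_edges
```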
3. The cost-based workflow scheduling method in an IaaS environment according to claim 1, characterized in that calculating the priority of all tasks in the new directed acyclic graph D' and sorting the tasks according to their priorities to obtain a sorted list includes:
S41. Calculate the priority score of each task in the new directed acyclic graph D' using the upward ranking method, expressed as:
S42. Group the tasks that have the same priority score into a set, and calculate the priority degree score of each task in the set, expressed as:
S43. Determine the priority of each task based on the priority score and the priority degree score, and sort the tasks in descending order of priority to obtain a sorted list, expressed as:
where the symbols denote, respectively: the priority score of task v_i obtained using the upward ranking method; the average execution time of task v_i across all virtual machines; the average communication time, over all types of network links, of the edge e_ia connecting task v_i and task v_a; the set of subtasks of task v_i; the priority degree score of task v_i; the data communication time between task v_i and task v_a; the set of tasks with the same priority score obtained using the upward ranking method; and the priority of task v_i; V denotes the set of nodes.
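As a hedged illustration of step S41 only, the Python sketch below uses the conventional upward-rank recursion, in which a task's score is its average execution time plus the largest sum of average communication time and score over its subtasks. The formula of the claim is not reproduced in this text, so this recursion is an assumption that merely matches the symbols listed above; the tie-breaking of steps S42 and S43 is not covered. All names are hypothetical.

```python
from functools import lru_cache


def upward_ranks(avg_exec, avg_comm, children):
    """Compute an upward-rank priority score for every task.

    avg_exec: task -> average execution time across all virtual machines
    avg_comm: (task, subtask) -> average communication time of that edge
    children: task -> list of subtasks (missing key means no subtasks)
    """
    @lru_cache(maxsize=None)
    def rank(task):
        # End tasks have no subtasks, so the tail term defaults to 0.
        tail = max(
            (avg_comm[(task, child)] + rank(child)
             for child in children.get(task, ())),
            default=0.0,
        )
        return avg_exec[task] + tail

    return {task: rank(task) for task in avg_exec}


def sorted_by_score(scores):
    """Sort tasks in descending order of score (ties left in input order)."""
    return sorted(scores, key=lambda task: -scores[task])
```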
4. The cost-based workflow scheduling method in an IaaS environment according to claim 1, characterized in that step S5 obtains the current allowable budget of the task according to the dynamic budget allocation model, expressed as:
where the symbols denote, respectively: the current allowable budget when scheduling task v_i; the cost that has already been used; the pre-allocated budget of task v_i; and the total cost constraint set by the user.
5. The cost-based workflow scheduling method in an IaaS environment according to claim 4, characterized in that the pre-allocated budget of task v_i is calculated as:
where the symbols denote, respectively: the average execution time of task v_i across all virtual machines; the set of parent tasks of task v_i; the set of tasks that have not yet been scheduled; the data communication time between the two tasks indicated in the formula; the average execution time of task v_u across all virtual machines; and the data communication time between task v_k and task v_u.
6. The cost-based workflow scheduling method in an IaaS environment according to claim 1, characterized in that, in step S62, the earliest completion times of task v_i before and after copying the critical parent task on the current virtual machine are calculated as follows:
where the symbols denote, respectively: the earliest completion time of task v_i on the virtual machine before copying the critical parent task; the earliest completion time of task v_i on the virtual machine after copying the critical parent task; the execution time of task v_i on the virtual machine; the earliest start time of task v_i on the virtual machine before copying the critical parent task; and the earliest start time of task v_i on the virtual machine after copying the critical parent task, which is calculated using the following formula:
where the symbols denote, respectively: the critical parent task of task v_i, i.e., the parent task that finishes last among all parent tasks of task v_i; the earliest start time of the critical parent task of task v_i on the virtual machine before copying the critical parent task; the execution time of the critical parent task of task v_i on the virtual machine; the condition that task v_j is not on the virtual machine; and the data communication time between task v_j and task v_i.
7. The cost-based workflow scheduling method in an IaaS environment according to claim 1, characterized in that the cost of pre-allocating task v_i to the m-th virtual machine is calculated as:
where the symbols denote, respectively: a unit of time; x, the number of virtual machines currently in use; the rental price per unit time of the resources of the j-th virtual machine; and the total usage time of the j-th virtual machine from the start of its use to the present, which is calculated as follows:
where v_to denotes the first task executed on the virtual machine and v_from denotes the last task executed on the virtual machine; the remaining symbols denote, respectively, the earliest completion time of task v_to on the virtual machine before copying the critical parent task and the earliest start time of task v_from on the virtual machine before copying the critical parent task.