A multi-device cooperative task execution method, device and readable storage medium
By deploying large models in shards across multiple edge computing devices and optimizing task allocation using Monte Carlo tree search and load-aware strategies, the problem of limited resources on edge computing devices is solved, achieving efficient and high-precision data processing.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- JIANGNAN UNIV
- Filing Date
- 2025-06-23
- Publication Date
- 2026-06-30
AI Technical Summary
In existing technologies, when large models are deployed on edge computing devices, it is impossible to balance data processing accuracy and efficiency, resulting in tasks that cannot be executed efficiently and with high precision.
The large model is divided into multiple model shards and executed collaboratively on multiple edge computing devices. The task allocation is optimized by using the Monte Carlo tree search algorithm and the load-aware upper confidence bound strategy, thereby achieving load balancing and parallel processing of model shards across different devices.
It improves the output accuracy and efficiency of data detection and identification tasks, solves the problem of limited resources of a single edge computing device, and realizes efficient parallel data processing.
Figure CN120745816B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of edge inference technology, and in particular to a method, apparatus and computer-readable storage medium for multi-device collaborative task execution.
Background Technology
[0002] With the rapid development of IoT devices and deep learning technology, deep learning models, through the powerful representation capabilities of multi-layer neural networks, are widely used in computer vision, natural language processing, intelligent control, and other scenarios. Examples include accurate lesion identification in medical images, real-time analysis of complex road conditions by autonomous vehicles, intelligent recommendations on e-commerce platforms, and semantic understanding of user commands by smart home devices. Among various deep learning models, the Transformer architecture based on self-attention mechanisms can significantly improve the model's learning ability on input data when processing large-scale data. For example, large models such as the GPT series, BERT, and T5, with their excellent contextual understanding and generation capabilities, can output results that are more consistent with actual data in applications such as object recognition, image processing, and intelligent question answering. Therefore, in high-precision and high-timeliness applications such as production equipment status and data monitoring in the Industrial Internet of Things (IIoT) and intelligent driving vehicle control, large models are often deployed on edge computing devices. By inputting the text or image data to be processed into the edge computing device, the data processing capabilities of the large model and the efficient data processing characteristics of edge computing can be combined to achieve low-latency data processing and decision-making loops.
[0003] However, edge computing devices are limited by bottlenecks in computing, storage, and memory. For example, a typical edge computing device like the Raspberry Pi 5 has only 8GB of memory. In contrast, the parameter size of large models has exploded in recent years. For instance, the GPT-3 model contains 175 billion trainable parameters, while the parameter size of the GPT-4 model is estimated to be in the trillions, almost dozens of times that of GPT-3. Especially when large models are applied to scenarios with multi-source heterogeneous data characteristics, such as industrial IoT data monitoring, intelligent traffic control, and power grid energy dispatch, the data processing volume of large models is greatly increased. This means that the memory usage of mainstream large models often reaches tens or even hundreds of GB, far exceeding the memory capacity of general edge computing devices, making loading and running large models on these edge computing devices a daunting task.
[0004] To address this issue, two approaches have been proposed in existing technologies. The first reduces the computational load, storage requirements, and complexity of large models through model optimization techniques, including model pruning, compression, and quantization. By removing some neurons, connections, or layers from the large model, encoding or compressing model parameters into a smaller form, or converting floating-point parameters into low-precision integer values, the optimized large model can be deployed on edge computing devices. However, simplifying the model structure inevitably reduces the model's ability to learn from input data. As a result, the optimized large model cannot fully extract the feature information of inputs such as images and text, which yields low-precision processing results. The second approach splits a large model into multiple model fragments, each undertaking a portion of the computation, and deploys these fragments on an edge computing device. Because the model's processing of input data is divided into a series of small-scale processing steps executed serially, the edge computing device only needs to load and process the parameters of a single model fragment in each data processing cycle. This reduces the per-cycle computational load on the edge computing device and avoids the resource consumption caused by running the large model's entire network structure at once. However, this time-for-space strategy lowers the data processing rate and cannot output results in a timely manner, so it cannot meet the high timeliness requirements of scenarios such as intelligent traffic control and production data monitoring.
[0005] In summary, existing technologies using large models to process input data such as images and text cannot balance data processing accuracy and efficiency, thus making it impossible to perform data detection and recognition tasks based on large models efficiently and with high accuracy.
Summary of the Invention
[0006] Therefore, the technical problem to be solved by the present invention is to overcome the problem that when using large models to process input data such as images and text, the data processing accuracy and efficiency cannot be balanced, thus making it impossible to perform data detection, recognition and other tasks based on large models efficiently and with high accuracy.
[0007] To address the aforementioned technical problems, this invention provides a method for multi-device collaborative task execution, comprising:
[0008] S10: Build a large model based on the tasks to be executed, and divide the large model into M model fragments for sequential processing of the tasks to be executed;
[0009] S20: Load and compute each model fragment on each interconnected edge computing device, obtain the loading time, computation time and communication latency of each model fragment on each edge computing device, and thus construct the simulation model of the task to be executed;
[0010] S30: Using the Monte Carlo tree search algorithm and the load-aware upper confidence bound strategy, based on the execution order of model shards and the cumulative reward value of the task allocation strategy in the first n-1 iterations, the edge computing devices are searched iteratively for M model shards to obtain the task allocation strategy for the nth iteration; where n≥1, and when n=1, the reward value of the task allocation strategy in the n-1th iteration is a preset value.
[0011] S40: Using the simulation model of the task to be executed, simulate the task allocation strategy of the nth iteration to obtain its reward value, update n=n+1 and return to step S30 until the preset iteration convergence condition is reached, thereby obtaining the target task allocation strategy.
[0012] S50: Based on the target task allocation strategy, M model shards are deployed on corresponding edge computing devices, thereby utilizing the collaborative model shards on multiple edge computing devices to achieve the task to be executed.
[0013] Preferably, when the task to be performed is an intelligent question-answering task, the large model is an intelligent question-answering robot model, the input of the large model is the query question text, and the output of the large model is the answer text;
[0014] When the task to be performed is an image classification task, the large model is an image classification model. The input of the large model is the image to be classified, and the output of the large model is the image category prediction probability.
[0015] Preferably, step S30 includes:
[0016] S300: Initialize m=1;
[0017] S301: Using the load-aware upper confidence bound strategy, based on the cumulative reward value of each edge computing device under the task allocation strategies of the first n-1 iterations, calculate the priority of allocating the m-th model shard to each edge computing device, and allocate the m-th model shard accordingly.
[0018] S302: Update m=m+1 and return to step S301, repeating until, given the allocation results of the first m-1 model shards, there is an edge computing device that was not selected as the allocation device for the m-th model shard in any of the first n-1 iterations; the m-th model shard is then allocated to that edge computing device.
[0019] S303: Randomly allocate edge computing devices to the (m+1)-th to M-th model shards, thereby obtaining the task allocation strategy of the nth iteration from the allocation results of the 1st to M-th model shards.
[0020] Preferably, step S301 includes:
[0021] Input the number of times each edge computing device was selected and its cumulative reward value under the task allocation strategies of the first n-1 iterations, together with the current load of each edge computing device, into the load-aware upper confidence bound formula to obtain the priority of allocating the m-th model shard to each edge computing device.
[0022] The m-th model fragment is assigned to the highest priority edge computing device.
[0023] Preferably, the load-aware upper confidence bound is calculated by the following formula:
[0024] [formula missing from the source text]
[0025] where the terms of the formula denote, in order: the load-aware upper confidence bound value; the cumulative reward value of the i-th edge computing device under the task allocation strategies of the first n-1 iterations; the number of times the i-th edge computing device was selected under those strategies; the total number of times the parent node was selected when the i-th edge computing device is selected as a child node; the action of assigning the m-th model shard to the i-th edge computing device; the load of the i-th edge computing device after the m-th model shard is assigned to it; and two preset hyperparameters, both positive constants.
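Because the formula itself does not survive in this text, the following Python sketch only illustrates how the quantities defined above could be combined. It assumes the common UCB1 form (mean reward plus an exploration bonus) with a subtractive load penalty; that form, the names `la_ucb_priority` and `pick_device`, and the default values of `c` and `lam` are illustrative assumptions, not the patented formula.

```python
import math


def la_ucb_priority(total_reward, visits, parent_visits, load_after, c=1.4, lam=0.5):
    """Priority of assigning the current shard to one candidate device.

    Assumed form (not taken from the source):
        total_reward / visits + c * sqrt(ln(parent_visits) / visits) - lam * load_after
    """
    if visits == 0:
        return math.inf  # a device never tried at this node is explored first
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore - lam * load_after


def pick_device(stats, parent_visits, loads_after):
    """Return the device with the highest priority.

    stats:       {device: (cumulative_reward, times_selected)}
    loads_after: {device: load of that device if it received the shard}
    """
    return max(
        stats,
        key=lambda d: la_ucb_priority(*stats[d], parent_visits, loads_after[d]),
    )
```

With equal reward statistics, the less loaded device wins: `pick_device({"A": (5.0, 10), "B": (5.0, 10)}, 20, {"A": 0.9, "B": 0.1})` selects `"B"` under these assumptions.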
[0026] Preferably, step S40 includes:
[0027] The task allocation strategy of the nth iteration is simulated using the simulation model of the task to be executed, yielding the actual loading time, actual computation time, and actual communication latency of each edge computing device.
[0028] The actual loading time, actual computing time and actual communication latency of each edge computing device are converted to obtain the loading reward, computing reward and communication latency reward of each edge computing device.
[0029] The reward value of each edge computing device under the task allocation strategy in the nth iteration is obtained by weighted summing of the loading reward, computing reward and communication latency reward of each edge computing device.
[0030] Preferably, the formula for calculating the reward value of each edge computing device under the task allocation strategy in the nth iteration is expressed as:
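[0031] $R_i = \alpha\, f(T_i^{\mathrm{load}}) + \beta\, f(T_i^{\mathrm{comp}}) + \gamma\, f(T_i^{\mathrm{comm}})$,
[0032] where $R_i$ denotes the reward value of the $i$-th edge computing device; $f(\cdot)$ denotes the function that converts a time into a reward; $T_i^{\mathrm{load}}$ denotes the actual loading time of the $i$-th edge computing device, and $\alpha$ denotes the weight of $f(T_i^{\mathrm{load}})$; $T_i^{\mathrm{comp}}$ denotes the actual computation time of the $i$-th edge computing device, and $\beta$ denotes the weight of $f(T_i^{\mathrm{comp}})$; $T_i^{\mathrm{comm}}$ denotes the actual communication latency of the $i$-th edge computing device, and $\gamma$ denotes the weight of $f(T_i^{\mathrm{comm}})$.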
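The weighted sum described in [0028] and [0029] can be sketched in Python as follows. The source does not specify the time-to-reward conversion function or the weight values, so the reciprocal form `1 / (1 + t)` and the default weights below are assumptions chosen only so that shorter times give larger rewards.

```python
def time_to_reward(seconds):
    """Assumed conversion: maps a time in seconds to a reward in (0, 1]."""
    return 1.0 / (1.0 + seconds)


def device_reward(load_s, compute_s, comm_s, w_load=0.3, w_comp=0.5, w_comm=0.2):
    """Weighted sum of the loading, computation and communication rewards
    of one edge computing device (weights are illustrative)."""
    return (
        w_load * time_to_reward(load_s)
        + w_comp * time_to_reward(compute_s)
        + w_comm * time_to_reward(comm_s)
    )
```

Under these assumptions a device that finishes every phase faster always receives a higher reward, which is the property the search relies on.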
[0033] Preferably, the method further includes the following steps before step S30:
[0034] Based on the preset loading time and preset computation time of each model shard, together with the loading time and computation time of each model shard on each edge computing device, a set of available edge computing devices is constructed for each model shard, so that the iterative search for edge computing devices for the M model shards is carried out within each shard's set of available edge computing devices.
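A minimal Python sketch of this filtering step is given below. It assumes that the preset loading time and preset computation time act as per-shard upper limits on the measured values; the source does not state the comparison rule, and the function and parameter names are hypothetical.

```python
def build_candidate_sets(load_time, comp_time, max_load, max_comp):
    """Return, for each shard, the devices allowed to host it.

    load_time[d][m], comp_time[d][m]: measured seconds for device d and shard m.
    max_load[m], max_comp[m]: preset limits for shard m (assumed to be upper bounds).
    """
    candidates = []
    for m in range(len(max_load)):
        allowed = [
            d
            for d in sorted(load_time)  # sorted for a deterministic order
            if load_time[d][m] <= max_load[m] and comp_time[d][m] <= max_comp[m]
        ]
        candidates.append(allowed)
    return candidates
```

Restricting the search to these sets shrinks the branching factor of the tree before any search iteration runs.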
[0035] The present invention also provides a multi-device collaborative task execution device, comprising:
[0036] The model building and sharding module is used to build a large model based on the task to be executed, and divide the large model into M model shards for sequential processing of the task to be executed.
[0037] The modeling and analysis module is used to load and compute each model fragment on each interconnected edge computing device, obtain the loading time, computing time and communication latency of each model fragment on each edge computing device, and thus construct the simulation model of the task to be executed.
[0038] The model allocation module is used to perform iterative search for edge computing devices for M model fragments based on the execution order of model fragments and the cumulative reward value of the task allocation strategy in the first n-1 iterations, using the Monte Carlo tree search algorithm and the load-aware upper confidence boundary strategy. Here, n≥1, and when n=1, the reward value of the task allocation strategy in the n-1th iteration is a preset value.
[0039] The environment simulation module is used to simulate the task allocation strategy of the nth iteration using the simulation model of the task to be executed, obtain the reward value of the task allocation strategy of the nth iteration, update n=n+1 and return to the steps executed by the model allocation module until the preset iteration convergence condition is reached, thereby obtaining the target task allocation strategy.
[0040] The collaborative task execution module is used to deploy M model shards on corresponding edge computing devices based on the target task allocation strategy, thereby utilizing the collaborative model shards on multiple edge computing devices to complete the task to be executed.
[0041] The present invention also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps of the above-described multi-device collaborative task execution method.
[0042] The multi-device collaborative task execution method provided in this application has the following beneficial effects:
[0043] This application distributes the multiple model fragments of a large model across multiple edge computing devices. Data exchange and processing are completed in parallel through collaborative communication between these edge computing devices, alleviating the problem that a single edge computing device cannot handle a large model due to resource constraints. Specifically, a large model is first constructed based on the task to be executed (i.e., feature extraction and detection of input data such as images and text). Model fragmentation technology is then used to divide the large model into multiple model fragments, each of which may contain one or more basic units. Each edge computing device participating in the collaborative task execution and all model fragments are pre-evaluated, quantifying the loading time and computation time of each model fragment, as well as the communication latency between two edge computing devices. Based on these performance parameters of each edge computing device, a simulation model is constructed, which is then used to accurately simulate the actual loading time, computation time, and communication latency during multi-device collaborative task execution under different task allocation strategies. Furthermore, since there are usually data dependencies between model fragments, i.e., the operation process of each model fragment has obvious temporal characteristics, the model fragment allocation problem can be transformed into a temporal decision problem. Considering that each model fragment can be deployed on multiple edge computing devices, the allocation decision space of the model fragments can be abstracted into a multi-branch tree structure. Based on this characteristic, this application uses the Monte Carlo tree search algorithm to allocate the model fragments. On this basis, a load-aware upper confidence bound node selection strategy is used to evaluate the priority of each edge computing device during the allocation process, thereby achieving load balancing among multiple edge computing devices and avoiding the problem of some edge computing devices being overloaded. The solution provided in this application divides the complete data processing process into multiple small-scale data processing tasks. By distributing these small-scale data processing tasks to different edge computing devices, the inherent parallelism of distributed deployment is utilized to achieve a high degree of parallelism in data processing. This not only solves the problem that a single edge computing device cannot support a large model due to limited resources, but also preserves the structure and parameter processing performance of the large model, thereby improving the output accuracy and efficiency of data detection, recognition and other tasks based on the large model.
Attached Figure Description
[0044] To make the content of this invention easier to understand, the invention will be further described in detail below with reference to specific embodiments and accompanying drawings, wherein:
[0045] Figure 1 is a comparison diagram of task execution by a single device and by networked devices provided in this application;
[0046] Figure 2 is a flowchart of the multi-device collaborative task execution method provided in this application;
[0047] Figure 3 is a schematic diagram of the principle of multi-device collaborative task execution provided in this application;
[0048] Figure 4 is a schematic diagram of the lightweight simulation framework for edge deployment of large models provided in this application;
[0049] Figure 5 is a schematic diagram of the task allocation principle based on the Monte Carlo tree search algorithm provided in this application.
Detailed Implementation
[0050] The present invention will be further described below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement the present invention. However, the embodiments described are not intended to limit the present invention.
[0051] Figure 1 compares task execution by a single device with task execution by networked devices (multi-device collaboration) as provided in this application. The large model used to perform a task often needs to process the input data in multiple stages. For example, a large model used for face recognition first preprocesses the input face image, then extracts features from the preprocessed image, and finally performs recognition and classification based on the extracted features to output the results. If the large model is deployed on a single device, the device must load and run the part of the model corresponding to the current step each time it executes a step. This sequential execution results in a long execution time for a single task and places high demands on device resources and running speed.
[0052] Current mainstream large-scale models generally adopt a modular and hierarchical structure, typically composed of a series of sequentially stacked basic units. These units are relatively independent computational units, each responsible for a specific task such as feature extraction or information processing. If a large model is divided into multiple model fragments using model fragmentation techniques, a model fragment may contain one or more basic units, and there are usually data dependencies between model fragments, so the model's operation exhibits obvious temporal characteristics. The model fragments are deployed on different edge computing devices. Taking the dual-device collaborative scenario shown in Figure 1 as an example, in the initial stage, device A and device B load their respective first model fragments (e.g., model fragment 1 and model fragment 2) in parallel. After device A completes the calculation of model fragment 1, it immediately loads model fragment 3. At the same time, device B calculates model fragment 2 based on the calculation result of model fragment 1 output by device A. This process is repeated so that a single task is executed in a pipelined manner. Compared with the traditional single-device task execution method, this multi-device collaborative task execution method significantly reduces the task execution time. Moreover, since the model is distributed and loaded on different edge computing devices, it significantly alleviates the problem that a single device cannot support a large model due to limited resources.
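The pipelining effect described for the dual-device scenario can be checked with a small timing sketch in Python. It assumes that a device either loads or computes at any moment, that a fragment can only be computed after the previous fragment has finished, and that communication latency is zero; the function name `makespan` and the timing values used with it are made up for illustration.

```python
def makespan(assign, load, comp):
    """End-to-end time of one task.

    assign[m]: device that hosts fragment m (fragments run in index order).
    load[m], comp[m]: loading and computation time of fragment m in seconds.
    """
    free = {}        # time at which each device finishes its latest operation
    prev_done = 0.0  # time at which the previous fragment's output is ready
    for m, dev in enumerate(assign):
        loaded = free.get(dev, 0.0) + load[m]  # loading starts once the device is idle
        start = max(loaded, prev_done)         # computing waits for the upstream output
        prev_done = start + comp[m]
        free[dev] = prev_done
    return prev_done
```

With four fragments that each take 2 s to load and 1 s to compute, `makespan(["A"] * 4, [2, 2, 2, 2], [1, 1, 1, 1])` gives 12 s on one device, while alternating devices with `["A", "B", "A", "B"]` gives 7 s, because each device loads its next fragment while the other is computing.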
[0053] Based on the above principles, this application provides a method for multi-device collaborative task execution. As shown in Figure 2, the method specifically includes:
[0054] S10: Build a large model based on the tasks to be executed, and divide the large model into M model fragments for sequential processing of the tasks to be executed.
[0055] S20: Load and compute each model fragment on each interconnected edge computing device, obtain the loading time, computation time and communication latency of each model fragment on each edge computing device, and thus construct the simulation model of the task to be executed.
[0056] S30: Using the Monte Carlo tree search algorithm and the load-aware upper confidence bound strategy, based on the execution order of model shards and the cumulative reward value of the task allocation strategy in the first n-1 iterations, the edge computing devices are searched iteratively for M model shards in sequence to obtain the task allocation strategy for the nth iteration; where n≥1, and when n=1, the reward value of the task allocation strategy in the n-1th iteration is a preset value.
[0057] S40: Use the simulation model of the task to be executed to perform simulation based on the task allocation strategy of the nth iteration, obtain the reward value of the task allocation strategy of the nth iteration, update n=n+1 and return to execute step S30 until the preset iteration convergence condition is reached, and obtain the target task allocation strategy.
[0058] S50: Based on the target task allocation strategy, M model shards are deployed on corresponding edge computing devices, thereby utilizing the collaborative model shards on multiple edge computing devices to achieve the task to be executed.
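The alternation between steps S30 and S40 can be sketched as a short driver loop in Python. The source leaves the convergence condition as "preset", so the sketch assumes it means either a maximum number of iterations or no improvement for `patience` consecutive iterations, and it assumes the target strategy is the best one seen. `propose` and `simulate` are hypothetical callables standing in for the search step and the simulation model.

```python
def optimize(propose, simulate, max_iters=200, patience=20):
    """Iterate S30 (propose a strategy) and S40 (simulate it and score it)."""
    best, best_reward, stale = None, float("-inf"), 0
    history = []  # (strategy, reward) pairs of the previous iterations
    for _ in range(max_iters):
        strategy = propose(history)   # S30: uses results of earlier iterations
        reward = simulate(strategy)   # S40: reward from the simulation model
        history.append((strategy, reward))
        if reward > best_reward:
            best, best_reward, stale = strategy, reward, 0
        else:
            stale += 1
        if stale >= patience:         # assumed convergence condition
            break
    return best, best_reward
```

The returned strategy would then be the one used for deployment in step S50.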
[0059] Specifically, the large model of the task to be executed is divided into M sequentially executed model shards, and the interconnected edge computing devices are indexed in the same way. Because the execution of a model shard on an edge computing device comprises two parts, model shard loading and model shard computation, a loading time and a computation time are defined for each combination of edge computing device and model shard. In addition, communication latency occurs when an edge computing device transmits the calculation result of the current model shard to the edge computing device hosting the next model shard, so a communication latency is defined for each pair of edge computing devices. Given the limited resources of edge computing devices, a single edge computing device can perform only one operation at any given time: either loading or computing a model shard.
[0060] Figure 3 illustrates the principle of multi-device collaborative task execution provided in this application. This application coordinates the memory, computing power, and communication resources of multiple edge computing devices. From the global perspective of "multiple model shards, multiple devices", it constructs a "loading-waiting-computing-communication" workflow at the level of each edge computing device and optimizes the overall inference process, thereby significantly reducing end-to-end inference latency and effectively controlling peak memory usage. Specifically, a lightweight simulation environment is constructed by collecting key parameters of the current edge environment. Within this simulation environment, the task allocation space is explored and searched to quickly find a model shard allocation scheme, which is then applied to the actual deployment environment to drive the collaborative task execution of multiple edge computing devices.
[0061] Specifically, in different application scenarios, the tasks to be executed in step S10 are different, and the large model constructed is also different. When the task to be executed is an intelligent question answering task, the large model is an intelligent question answering robot model. The input of the large model is the query question text, and the output of the large model is the answer text.
[0062] When the task to be performed is an image classification task, the large model is an image classification model, which can be one of Vision Transformer, MSG-Transformer, or Pyramid Vision Transformer. The input of the large model is the image to be classified, and the output of the large model is the image category prediction probability.
[0063] For example, in large-scale autonomous driving control scenarios, a large number of high-definition images must be detected and recognized, and control commands for each autonomous vehicle are then generated based on the results. In such cases the large model has a complex architecture and many parameters and requires a large amount of memory, which exceeds the resources of a single edge computing device. By deploying multiple interconnected edge computing devices, the large model can be sharded and deployed across them. The image to be detected and recognized is input into the device on which the first model shard is deployed, the devices then process the image collaboratively, and the detection and recognition results are finally output.
[0064] Furthermore, the main purpose of step S20 is to pre-evaluate each edge computing device and all model shards participating in the collaborative task execution, quantify the loading time and computation time of each model shard, as well as the communication latency between two edge computing devices, so that a simulation model can be built based on the performance parameters of each edge computing device. Using this simulation model, the actual loading, computation and communication latency in the multi-device collaborative task execution process under different task allocation strategies can be accurately simulated, and effective simulation of diverse and heterogeneous large model edge deployment scenarios can be achieved.
[0065] In addition, not all edge computing devices have the ability to successfully load and run each model shard. Therefore, step S20 can also assess the feasibility of deploying each model shard on each edge computing device based on the memory usage of each model shard.
[0066] Figure 4 shows the lightweight simulation framework for edge deployment of large models provided in this application. In this framework, the model is the large model of the task to be executed, the slice is the basic computing unit or layer from which the large model is built, the device represents an edge computing device used to deploy slices, and the environment represents the global manager of the entire simulation scene. The simulation advances in time order, driven by the multi-device collaborative task execution that follows from the task allocation strategy.
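The roles of slice, device and environment can be mirrored in a compact Python sketch. The class layout, field names and the rule that communication latency is added only when consecutive slices sit on different devices are assumptions made for illustration; the source describes the framework only at the level of Figure 4.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    name: str
    mem_gb: float  # memory needed to hold the slice


@dataclass
class Device:
    name: str
    mem_gb: float
    load_s: dict   # slice name -> measured loading time (s)
    comp_s: dict   # slice name -> measured computation time (s)
    free_at: float = 0.0


class Environment:
    """Global manager: replays one allocation in time order."""

    def __init__(self, slices, devices, comm_s):
        self.slices = slices
        self.devices = {d.name: d for d in devices}
        self.comm_s = comm_s  # (source device, target device) -> latency (s)

    def run(self, allocation):
        """allocation[k] is the device name hosting the k-th slice."""
        for d in self.devices.values():
            d.free_at = 0.0
        ready_at, prev = 0.0, None
        for sl, name in zip(self.slices, allocation):
            dev = self.devices[name]
            if sl.mem_gb > dev.mem_gb:
                raise ValueError(f"{sl.name} does not fit on {name}")
            if prev is not None and prev != name:
                ready_at += self.comm_s[(prev, name)]   # transfer upstream result
            loaded = dev.free_at + dev.load_s[sl.name]  # one operation at a time
            ready_at = max(loaded, ready_at) + dev.comp_s[sl.name]
            dev.free_at = ready_at
            prev = name
        return ready_at  # end-to-end latency of the task
```

The value returned by `run` is the kind of quantity that could be converted into a reward for one candidate allocation.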
[0067] Furthermore, since there are usually data dependencies between model shards, meaning that the model shards execute in a clear temporal sequence, the model shard allocation problem can be transformed into a temporal decision problem. As shown in Figure 5, because each model shard can be deployed on multiple candidate edge computing devices, the allocation decision space of the model shards can be abstracted as a multi-branch tree structure.
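To give a sense of scale, the number of complete allocations in such a tree can be counted as follows. The symbols $C_m$ (the set of candidate devices for the $m$-th shard) and $K$ (the largest such set) are introduced here for illustration only, and the numeric values are an arbitrary example.

```latex
|\mathcal{S}| \;=\; \prod_{m=1}^{M} |C_m| \;\le\; K^{M},
\qquad \text{e.g. } M = 8,\; K = 4 \;\Rightarrow\; K^{M} = 4^{8} = 65536 .
```

Each level of the tree corresponds to one shard and each branch to one candidate device, so the number of leaves grows exponentially with the number of shards, which is why exhaustive enumeration quickly becomes impractical.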
[0068] Monte Carlo Tree Search (MCTS) is a widely used search algorithm in decision-making processes. It cleverly combines the ideas of Monte Carlo simulation and tree search, making it particularly suitable for decision problems with incomplete information, large state spaces, or high complexity. Its core idea is to iteratively evaluate the potential value of the current decision based on simulating potential future behaviors and outcomes. This method allows for efficient exploration and evaluation of each branch of the decision tree without exhaustive search. Therefore, MCTS demonstrates significant efficiency and speed in environments with limited computational resources and time. Furthermore, given the search space characteristics and complexity of the model partitioning assignment problem, this application uses the MCTS algorithm to solve for the allocation strategy of multiple model partitions.
[0069] The Monte Carlo Tree Search algorithm comprises four main steps: selection, expansion, simulation, and backtracking. The selection step starts from the root node of the search tree and traverses downwards according to a specific strategy until a node that has not yet been fully explored is reached. The expansion step adds one or more new child nodes below the selected node if it corresponds to an unexplored state (i.e., there are unevaluated legal actions). The simulation step performs a complete Monte Carlo simulation starting from the newly expanded node using a random strategy to evaluate the node's potential value. The final backtracking (backpropagation) step propagates the reward value obtained from each simulation back along all nodes on the path between the root node and the node from which the simulation started, updating the visit count and the accumulated reward value of each node on the path.
[0070] By iteratively repeating the four steps of selection, expansion, simulation, and backtracking, the search tree is continuously refined and the estimation accuracy of node value is improved. Finally, the optimal path can be obtained based on the converged node statistics.
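The node statistics and the backtracking update can be sketched in Python as follows. The `Node` layout is hypothetical, and reading the final path by following the most visited child at each level is a common convention assumed here; the source only says the path is obtained from the converged node statistics.

```python
class Node:
    def __init__(self, device=None, parent=None):
        self.device = device      # device chosen at this level (None at the root)
        self.parent = parent
        self.children = {}        # device name -> child Node
        self.visits = 0           # visit count
        self.total_reward = 0.0   # accumulated reward value


def backpropagate(leaf, reward):
    """Add one simulation result to every node from the leaf up to the root."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


def best_path(root):
    """Follow the most visited child at each level (assumed read-out rule)."""
    path, node = [], root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
        path.append(node.device)
    return path
```

After enough iterations, `best_path(root)` yields one device per explored tree level, which can serve as the allocation for the corresponding shards.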
[0071] Building upon this, the node traversal strategy in the selection phase of the Monte Carlo Tree Search (MCTS) algorithm is crucial. The core task of this phase is to determine the path downwards in the search tree based on currently known information, until a node requiring further expansion or simulation is encountered. This directly determines the balance between exploration and exploitation, thus profoundly impacting the search direction and efficiency of MCTS. Over-biasing towards exploitation may lead to premature convergence to a local optimum, missing the global optimum; conversely, over-biasing towards exploration may waste excessive computational resources and time on low-value paths, reducing overall search efficiency. Therefore, designing an effective selection strategy to achieve a proper balance between exploration and exploitation is key to ensuring that Monte Carlo Tree Search is both efficient and comprehensive.
[0072] Among the many MCTS node selection strategies, the Upper Confidence Bound (UCB) algorithm is widely used because it balances exploration and exploitation effectively. However, this application finds that applying the UCB algorithm directly to the model shard allocation scenario fails to consider load balancing among edge computing devices. As a result, MCTS can waste significant computation and time during the search exploring allocation paths with uneven load distribution and poor overall performance. To address this issue, this application proposes a novel node selection strategy based on the Load-Aware Upper Confidence Bound (LA-UCB).
[0073] Specifically, when evaluating the selection priority of child nodes in order to allocate edge computing devices to each model shard based on priority, LA-UCB, in addition to inheriting the original mechanism of the UCB algorithm, also introduces an evaluation item specifically for measuring load balancing.
[0074] Specifically, step S30 includes the following steps:
[0075] S300: Initialize m=1.
[0076] S301: Using the load-aware upper confidence bound strategy, calculate the priority of allocating the m-th model shard to each edge computing device based on the cumulative reward value of each edge computing device under the task allocation strategies of the first n-1 iterations, and allocate the m-th model shard accordingly.
[0077] S302: Update m=m+1 and return to step S301 until, given the allocation results of the first m-1 model shards, there is an edge computing device that has not been selected for the m-th model shard in any of the first n-1 iterations; the m-th model shard is then allocated to that edge computing device.
[0078] S303: Randomly allocate edge computing devices to the (m+1)-th to M-th model shards, thereby obtaining the task allocation strategy for the n-th iteration from the allocation results of the 1st to M-th model shards.
[0079] Specifically, step S301 corresponds to the selection step in the MCTS algorithm. This application replaces the upper confidence bound strategy used in the selection step of the traditional MCTS algorithm with a load-aware upper confidence bound strategy, so that load balancing among the edge computing devices is taken into account when a device is selected for each model shard. Step S302 corresponds to the expansion step in the MCTS algorithm, and step S303 corresponds to the simulation step in the MCTS algorithm.
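As an illustration of how steps S300 to S303 fit together, the following Python sketch builds one complete allocation per iteration. It is a minimal sketch under stated assumptions, not the implementation of this application: the function name allocate_one_iteration, the dictionary tree (mapping an allocation prefix to the devices already tried for the next shard), and the score callback are all hypothetical, and the first untried device is chosen at expansion purely for simplicity.

```python
import random


def allocate_one_iteration(num_shards, devices, tree, score, rng=random):
    """Build one complete allocation (one device per shard) for one iteration.

    tree  : dict mapping an allocation prefix (tuple of devices) to the set of
            devices already tried for the next shard (hypothetical structure).
    score : callable (prefix, device) -> selection priority (hypothetical).
    """
    prefix = ()  # S300: start with the first shard, nothing allocated yet
    while len(prefix) < num_shards:
        tried = tree.setdefault(prefix, set())
        untried = [d for d in devices if d not in tried]
        if untried:
            # S302 (expansion): a device never tried for this shard exists.
            choice = untried[0]
            tried.add(choice)
            prefix += (choice,)
            break
        # S301 (selection): all devices tried, pick the highest priority.
        best = max(devices, key=lambda d: score(prefix, d))
        prefix += (best,)
    # S303 (simulation): random devices for all remaining shards.
    while len(prefix) < num_shards:
        prefix += (rng.choice(devices),)
    return list(prefix)


if __name__ == "__main__":
    tree = {}
    flat_score = lambda prefix, device: 0.0  # placeholder priority
    for _ in range(4):
        print(allocate_one_iteration(3, ["ED1", "ED2", "ED3"], tree, flat_score))
```

In a full search the score callback would be the load-aware priority of step S301, and each returned allocation would be passed to the simulation model to obtain its reward.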
[0080] Further, step S301 includes:
[0081] Input the number of times each edge computing device was selected and its cumulative reward value under the task allocation strategies of the first n-1 iterations, together with the load each edge computing device would carry after the m-th model shard is allocated to it, into the load-aware upper confidence bound formula to obtain the priority of allocating the m-th model shard to each edge computing device.
[0082] The m-th model shard is allocated to the edge computing device with the highest priority.
[0083] Specifically, the load-aware upper confidence bound is calculated as follows:
[0084] [Formula not recoverable from the source text.]
[0085] where the formula yields the load-aware upper confidence bound, i.e., the priority of allocating the m-th model shard to a given edge computing device, from the following quantities: the cumulative reward value of that edge computing device under the task allocation strategies of the first n-1 iterations; the number of times that edge computing device was selected under those strategies; the total number of times the parent node was selected, where the edge computing device is taken as a child node; the load of that edge computing device after the m-th model shard is allocated to it; and two preset hyperparameters, both positive constants.
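Because the formula itself is not legible in this text, the following Python sketch only illustrates how the listed quantities could be combined. The specific form is an assumption: it uses the standard UCB1 expression (mean reward plus an exploration bonus) and subtracts a load penalty that is linear in the device load. The function name la_ucb_priority, the parameter names, and the default values of the two positive hyperparameters c and beta are hypothetical.

```python
import math


def la_ucb_priority(cum_reward, visits, parent_visits, load_after,
                    c=1.4, beta=0.5):
    """Priority of allocating the current shard to one device.

    ASSUMED form (not taken from the source):
        cum_reward / visits
        + c * sqrt(ln(parent_visits) / visits)
        - beta * load_after
    c and beta stand in for the two preset positive hyperparameters.
    """
    if visits == 0:
        # A device never selected at this node is explored first.
        return math.inf
    exploitation = cum_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    load_penalty = beta * load_after
    return exploitation + exploration - load_penalty


if __name__ == "__main__":
    # Two devices with identical history; the more heavily loaded one
    # receives the lower priority.
    print(la_ucb_priority(2.0, 2, 4, load_after=0.2))
    print(la_ucb_priority(2.0, 2, 4, load_after=0.9))
```

With this assumed form, two devices with the same reward history are distinguished only by their load, which is the behaviour the load-balancing term is meant to produce.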
[0086] Furthermore, this application uses a simulation model to evaluate the allocation strategy obtained from the simulation step of the MCTS algorithm, thereby obtaining the reward value of the current allocation strategy. This reward value is then used in the backtracking step of the MCTS algorithm: by backpropagating the reward value along the allocation results of the M model shards, the number of times each edge computing device has been selected and its accumulated reward value are updated.
[0087] Specifically, step S40 includes:
[0088] The simulation model of the task to be executed is used to simulate the actual loading time, actual computing time and actual communication latency of each edge computing device under the task allocation strategy of the nth iteration.
[0089] The actual loading time, actual computing time, and actual communication latency of each edge computing device are converted to obtain the loading reward, computing reward, and communication latency reward for each edge computing device.
[0090] The reward value of each edge computing device under the task allocation strategy in the nth iteration is obtained by weighted summing of the loading reward, computing reward and communication latency reward of each edge computing device.
[0091] Specifically, the formula for calculating the reward value of each edge computing device under the task allocation strategy in the nth iteration is expressed as follows:
[0092] $R_k = \omega_1 f(T_k^{load}) + \omega_2 f(T_k^{comp}) + \omega_3 f(T_k^{comm})$,
[0093] where $R_k$ denotes the reward value of the $k$-th edge computing device; $f(\cdot)$ denotes the function that converts time to reward; $T_k^{load}$ denotes the actual loading time of the $k$-th edge computing device and $\omega_1$ denotes the weight of $f(T_k^{load})$; $T_k^{comp}$ denotes the actual computing time of the $k$-th edge computing device and $\omega_2$ denotes the weight of $f(T_k^{comp})$; $T_k^{comm}$ denotes the actual communication latency of the $k$-th edge computing device and $\omega_3$ denotes the weight of $f(T_k^{comm})$.
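The weighted sum described in paragraph [0090] can be sketched in Python as follows. The source does not specify the time-to-reward conversion function or the weight values, so both are assumptions here: time_to_reward uses 1 / (1 + t) simply because it decreases as time grows, and the default weights are placeholders. The function and parameter names are hypothetical.

```python
def time_to_reward(t):
    """ASSUMED conversion: shorter times give rewards closer to 1."""
    return 1.0 / (1.0 + t)


def device_reward(load_time, compute_time, comm_latency,
                  w_load=0.3, w_compute=0.5, w_comm=0.2):
    """Weighted sum of loading, computing and communication rewards.

    The weights are placeholder values, not values from the source.
    """
    return (w_load * time_to_reward(load_time)
            + w_compute * time_to_reward(compute_time)
            + w_comm * time_to_reward(comm_latency))


if __name__ == "__main__":
    # A device that loads, computes and communicates faster scores higher.
    print(device_reward(0.5, 0.2, 0.1))
    print(device_reward(2.0, 1.5, 0.4))
```

Any monotonically decreasing conversion would serve the same purpose; the choice mainly affects how strongly long delays are penalised relative to short ones.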
[0094] The following specific example further illustrates the above task allocation process:
[0095] Suppose a large model is divided into model shard 1, model shard 2, and model shard 3, which are executed sequentially. The computation of model shard 2 depends on the computation result of model shard 1, and the computation of model shard 3 depends on the computation result of model shard 2. Meanwhile, there are edge computing devices ED1, ED2, and ED3 with different attributes such as computing power and memory size. It is necessary to obtain the optimal task allocation strategy to deploy the three model shards to the three edge computing devices, so as to achieve the best output results and execution performance of multi-device collaborative task execution.
[0096] The entire task allocation process can be viewed as a search tree. The root node of the tree represents the initial state where no model shards have been allocated to edge computing devices. The first-level nodes of the tree represent allocating model shard 1 to an edge computing device, the second-level nodes represent allocating model shard 2 to an edge computing device, and the third-level nodes represent allocating model shard 3 to an edge computing device.
[0097] Step 1: Selection. Based on the load-aware upper confidence bound strategy, select an edge computing device for model shard 1. Specifically, the LA-UCB strategy evaluates the priority of assigning model shard 1 to ED1, ED2, and ED3. It considers which edge computing device has performed better in past iterations, gives edge computing devices that have been selected less often a chance to be tried, and also takes load balancing among the devices into account.
[0098] Assuming ED2 is ultimately selected for model shard 1, the algorithm will start from the first-level node representing "model shard 1 is already on ED2" and continue using the LA-UCB strategy to select a device (ED1, ED2, or ED3) for model shard 2 and generate a second-level node. This process continues until a "leaf node" or a node that has not yet been fully explored is reached. For example, when assigning model shard 2, starting from the first-level node "model shard 1 is already on ED2", previous iterations have tried assigning model shard 2 to ED1 and ED2, but never to ED3. That is, the node "model shard 1 is on ED2" currently has only two child nodes, "model shard 2 is on ED1" and "model shard 2 is on ED2", which indicates that a "leaf node" or a node that has not yet been fully explored has been reached.
[0099] Step 2: Expansion. When the algorithm reaches a "leaf node" or a node that has not yet been fully explored, it creates a brand-new child node in the tree. This node represents the decision "given that model shard 1 has already been assigned to ED2, assign model shard 2 to ED3". This adds a new branch at the end of the existing tree. The new child node is the starting point for the subsequent simulation.
[0100] Step 3: Simulation. Starting from the newly created child node, the long-term value of this allocation decision path needs to be evaluated quickly. Specifically, the devices for model shard 1 and model shard 2 have been determined (ED2 and ED3), leaving only model shard 3 unallocated. To obtain a result quickly, the algorithm uses a random strategy to allocate all remaining shards (in this case, only model shard 3).
[0101] Suppose the system randomly selects ED1 for model shard 3, thus obtaining a complete allocation scheme: {model shard 1 → ED2, model shard 2 → ED3, model shard 3 → ED1}. This complete scheme is then executed in the simulation environment and its reward value R is calculated.
[0102] Step 4: Backtracking. The reward value R obtained in the simulation step is backpropagated to update the information of all nodes on the allocation decision path. Specifically, R is passed back to each node on the path. In this example, the information of the following three nodes is updated: 1. the root node (initial state); 2. the "model shard 1 → ED2" node; 3. the "model shard 2 → ED3" node.
[0103] For each node on the current task allocation decision path, its visit count is incremented by 1 and the R obtained in this simulation is added to its cumulative reward value. In effect, if the path {model shard 1 → ED2, model shard 2 → ED3, ...} ultimately yields a high reward (i.e., a short total time), the "value" of the decisions "model shard 1 → ED2" and "model shard 2 → ED3" increases. In future selection steps, the load-aware upper confidence bound strategy will then be more likely to choose this potentially valuable path again.
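The backtracking update in this example can be sketched in Python as follows. This is a minimal sketch, assuming node statistics are kept in a dictionary keyed by the allocation prefix; the name backpropagate and that data layout are hypothetical, and the reward values are illustrative only.

```python
def backpropagate(stats, path, reward):
    """Update every tree node on the decision path with one simulation result.

    stats  : dict mapping an allocation prefix (tuple of devices) to a list
             [visit_count, cumulative_reward] (hypothetical layout).
    path   : devices chosen for the shards that are already tree nodes,
             e.g. ["ED2", "ED3"] for shard 1 -> ED2, shard 2 -> ED3.
    reward : reward value R returned by the simulation.
    """
    for depth in range(len(path) + 1):
        node = tuple(path[:depth])          # depth 0 is the root node
        entry = stats.setdefault(node, [0, 0.0])
        entry[0] += 1                       # visit count + 1
        entry[1] += reward                  # cumulative reward + R


if __name__ == "__main__":
    stats = {}
    backpropagate(stats, ["ED2", "ED3"], 0.5)
    backpropagate(stats, ["ED2", "ED1"], 0.25)
    for node, (visits, total) in stats.items():
        print(node, visits, total)
```

The randomly allocated model shard 3 is deliberately not part of the path, since the example above updates only the root node and the two nodes that exist in the tree.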
[0104] Furthermore, considering the resource limitations and significant heterogeneity of edge computing devices, not all edge computing devices can meet the operational requirements (i.e., successfully load and complete computation) of each model fragment in a given large model. Therefore, ensuring the feasibility of model fragment allocation for each edge computing device is a crucial constraint that must be strictly considered during the model fragment allocation search process. To address this issue, this application applies a feasibility guarantee mechanism based on hierarchical pruning.
[0105] Specifically, before initiating the MCTS-based model shard allocation search, this application first constructs a list based on the evaluation data obtained in step S20, clearly identifying the feasibility relationships between model shards and edge computing devices. This list records which model shards cannot be successfully deployed and executed on specific edge computing devices due to hardware resource limitations. During the MCTS search, when the algorithm evaluates all candidate edge computing devices for a currently unassigned model shard, it queries the pre-generated feasibility list. If a particular edge computing device is marked as infeasible for that model shard, the option of assigning the shard to that device is immediately treated as an illegal action and removed from the current node's possible expansion options. The MCTS child node representing this illegal allocation is therefore never generated and its search branch is pruned, which avoids further exploration and simulation of infeasible paths. In other words, an allocation decision is treated as a legal action, and may enter the search tree expansion and evaluation process, only when the edge computing device is confirmed to be feasible for the model shard.
[0106] Specifically, before step S30, the method further includes: constructing a set of available edge computing devices for each model segment based on the preset loading time and preset computation time of each model segment, as well as the loading time and computation time of each edge computing device for each model segment, so as to iteratively search for edge computing devices for M model segments in the set of available edge computing devices for each model segment.
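The construction of the available-device sets can be sketched in Python as follows. The comparison rule is an assumption: a device is treated as available for a shard when its measured loading time and computing time do not exceed the preset values for that shard. The function name build_feasible_devices and the nested-dictionary layout are hypothetical, and the numbers are illustrative only.

```python
def build_feasible_devices(load_time, compute_time,
                           max_load_time, max_compute_time):
    """Return, for every shard, the list of devices allowed to host it.

    load_time[shard][device], compute_time[shard][device] : measured times.
    max_load_time[shard], max_compute_time[shard]         : preset limits.
    ASSUMED rule: feasible when both measured times are within the limits.
    """
    feasible = {}
    for shard, per_device in load_time.items():
        feasible[shard] = [
            device
            for device, t_load in per_device.items()
            if t_load <= max_load_time[shard]
            and compute_time[shard][device] <= max_compute_time[shard]
        ]
    return feasible


if __name__ == "__main__":
    load_time = {1: {"ED1": 2.0, "ED2": 9.0}, 2: {"ED1": 1.0, "ED2": 1.0}}
    compute_time = {1: {"ED1": 1.0, "ED2": 1.0}, 2: {"ED1": 5.0, "ED2": 0.5}}
    limits_load = {1: 3.0, 2: 3.0}
    limits_compute = {1: 2.0, 2: 2.0}
    print(build_feasible_devices(load_time, compute_time,
                                 limits_load, limits_compute))
```

During the search, only the devices in feasible[shard] would be offered as legal actions for that shard, so infeasible branches are never created.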
[0107] To more clearly demonstrate the technical solution provided in this application, several specific embodiments are provided below to illustrate how to implement the multi-device adaptive collaborative task execution scheme (MAC-IP) based on large model edge inference provided in this application in a real environment. These embodiments demonstrate the feasibility and application effect of the method:
[0108] Embodiment 1 of this application provides a collaborative task execution method based on two edge computing devices. In this embodiment, edge computing device A and edge computing device B are two edge computing devices for collaborative task execution. Each edge computing device has limited computing and memory resources and cannot load and run the entire large model at the same time.
[0109] First, the large model is split into four fragments: model fragment 1, model fragment 2, model fragment 3, and model fragment 4. In order to achieve collaborative task execution, edge computing device A and edge computing device B communicate through the network to exchange computing results and complete the transfer of data dependencies.
[0110] The model fragment allocation strategy obtained in this embodiment is as follows: edge computing device A is responsible for loading and calculating model fragment 1 and model fragment 3, and edge computing device B is responsible for loading and calculating model fragment 2 and model fragment 4.
[0111] Based on the above allocation strategy, the final multi-device collaborative task execution process is as follows: Edge computing device A and edge computing device B load model fragment 1 and model fragment 2 in parallel, with the loading and calculation processes alternating; after loading model fragment 1, edge computing device A begins to calculate model fragment 1, while edge computing device B waits for the calculation result of model fragment 1 after loading model fragment 2; once edge computing device A completes the calculation of model fragment 1, it immediately transmits the calculation result of model fragment 1 to edge computing device B, and simultaneously begins loading model fragment 3; after receiving the calculation result of model fragment 1, edge computing device B continues to calculate model fragment 2, and after completing the calculation, transmits the calculation result of model fragment 2 to edge computing device A, and simultaneously loads model fragment 4; after receiving the calculation result of model fragment 2, edge computing device A continues to calculate model fragment 3, and after completing the calculation, transmits the calculation result of model fragment 3 to edge computing device B; after receiving the calculation result of model fragment 3, edge computing device B continues to calculate model fragment 4, and after completing the calculation, transmits the calculation result of model fragment 4 to edge computing device A.
[0112] Compared with traditional single-device task execution, the collaborative task execution method using two devices significantly improves task execution efficiency and reduces end-to-end latency. At the same time, since the task is distributed to two devices for processing, the memory usage and computing load of each device are effectively balanced, avoiding performance bottlenecks caused by resource overload of a single device.
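The overlap between loading on one device and computing on the other can be illustrated with a small timeline calculation in Python. This is a simplified sketch under stated assumptions, not the simulation model of this application: each device holds one fragment at a time and alternates loading and computing, a single fixed transfer latency applies whenever consecutive fragments sit on different devices, and the return of the final result is ignored. The function name simulate_latency and all time values are hypothetical.

```python
def simulate_latency(assignment, load, compute, comm):
    """End-to-end latency of sequential fragments under one allocation.

    assignment : device for each fragment, in execution order.
    load[i], compute[i] : loading / computing time of fragment i on its device.
    comm : transfer latency between two different devices (ASSUMED constant).
    """
    device_free = {}     # time at which each device finishes its last compute
    input_ready = 0.0    # time at which the input of the next fragment arrives
    finish = 0.0
    for i, device in enumerate(assignment):
        # A device starts loading its next fragment as soon as it is idle.
        load_end = device_free.get(device, 0.0) + load[i]
        # Computing needs both the loaded fragment and the upstream result.
        finish = max(load_end, input_ready) + compute[i]
        device_free[device] = finish
        crosses = i + 1 < len(assignment) and assignment[i + 1] != device
        input_ready = finish + (comm if crosses else 0.0)
    return finish


if __name__ == "__main__":
    load = [2.0, 2.0, 2.0, 2.0]
    compute = [1.0, 1.0, 1.0, 1.0]
    print(simulate_latency(["A", "B", "A", "B"], load, compute, 0.5))  # 7.5
    print(simulate_latency(["A", "A", "A", "A"], load, compute, 0.5))  # 12.0
```

With these illustrative numbers the alternating allocation finishes earlier than the single-device case because each device loads its next fragment while the other device is computing.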
[0113] Embodiment 2 of this application provides a multi-device collaborative task execution method. In this embodiment, edge computing devices A, B, C, and D with heterogeneous performance are used as four edge computing devices for collaborative task execution. Among them, edge computing devices A and B are high-performance devices, and C and D are low-performance devices.
[0114] First, the large model is divided into eight fragments: model fragments 1 to 8; in order to achieve collaborative task execution, edge computing devices A, B, C, and D communicate through the network to exchange computing results and complete the transfer of data dependencies.
[0115] The model fragment allocation strategy obtained in this embodiment is as follows: edge computing device A (high performance) is responsible for loading and computing model fragment 1, model fragment 2, and model fragment 3; edge computing device B (high performance) is responsible for loading and computing model fragment 4, model fragment 5, and model fragment 6; edge computing device C (low performance) is responsible for loading and computing model fragment 7; and edge computing device D (low performance) is responsible for loading and computing model fragment 8.
[0116] This allocation connects consecutive shards sequentially: the input goes through model shard 1→2→3 (edge computing device A), then through model shard 4→5→6 (edge computing device B), then through model shard 7 (edge device C) and model shard 8 (edge device D), forming a pipeline; high-performance devices handle more shards, and low-performance devices handle fewer shards, avoiding resource overload.
[0117] Based on the above allocation strategy, the final multi-device collaborative task execution process is as follows: Edge computing device A loads model fragments 1-3 into memory in parallel; edge computing device B loads model fragments 4-6 in parallel; edge computing device C loads model fragment 7; and edge computing device D loads model fragment 8. Loading can be done in parallel with the previous round of inference computation to reduce latency.
[0118] When a new input arrives, edge computing device A first performs forward computation sequentially according to model fragment 1 → model fragment 2 → model fragment 3 to obtain the third-level output. Edge computing device A transmits the third-level output to edge computing device B, and can simultaneously begin preparing to load model fragment 1 for the next input. After receiving the output, edge computing device B performs computation according to model fragment 4 → model fragment 5 → model fragment 6 to obtain the sixth-level output, and transmits it to edge computing device C. After receiving the output, edge computing device C performs computation on model fragment 7 to obtain the seventh-level output, and transmits it to edge computing device D. After receiving the output, edge computing device D performs computation on model fragment 8, outputs the final result, and returns it to the initiating end. If pipelined parallelism is used, multiple inputs can be processed concurrently among edge computing devices: while edge computing device B is processing the current input, edge computing device A can start the next input's model fragments 1-3, thereby improving throughput.
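The throughput benefit of pipelined parallelism can be estimated with the standard makespan expression for a linear pipeline. The Python sketch below assumes an idealised pipeline: every input takes the same time at each stage, transfer latency is ignored, and each device works on one input at a time. The function names and stage times are hypothetical and are not measurements from this application.

```python
def pipelined_makespan(stage_times, num_inputs):
    """Total time for num_inputs inputs in an ideal linear pipeline.

    The first input traverses every stage; each further input is delayed
    by the slowest (bottleneck) stage.
    """
    return sum(stage_times) + (num_inputs - 1) * max(stage_times)


def sequential_makespan(stage_times, num_inputs):
    """Total time when each input must finish before the next one starts."""
    return num_inputs * sum(stage_times)


if __name__ == "__main__":
    # Illustrative per-input stage times for devices A, B, C and D.
    stages = [3.0, 3.0, 2.0, 2.0]
    print(pipelined_makespan(stages, 5))   # 22.0
    print(sequential_makespan(stages, 5))  # 50.0
```

The expression also shows why a balanced allocation matters: the bottleneck stage sets the steady-state throughput, so an overloaded device slows every subsequent input.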
[0119] It's worth noting that during operation, MAC-IP continuously monitors the CPU / GPU utilization, memory usage, and network latency of each edge computing device. If an edge computing device experiences excessive load or network fluctuations, temporary adjustments can be made: some model shards can be reallocated to idle or high-performance edge computing devices; the number of parallel inputs can be adjusted to alleviate pressure. Through online adjustments, the system ensures continuous, efficient, and stable task execution in heterogeneous and dynamic environments.
[0120] This multi-device collaborative task execution method significantly improves task execution efficiency, especially on resource-constrained edge computing devices, where tasks can be allocated rationally to optimize resource utilization. Because tasks are allocated to each edge computing device according to its resource availability, overall system performance is greatly enhanced. In particular, model loading and computation run efficiently on low-performance edge computing devices, avoiding performance bottlenecks.
[0121] Based on the multi-device collaborative task execution method provided in the above embodiments, this application also provides a multi-device collaborative task execution apparatus, which specifically includes:
[0122] The model building and sharding module is used to build a large model based on the task to be executed, and divide the large model into M model shards for sequential processing of the task to be executed.
[0123] The modeling and analysis module is used to load and compute each model fragment on each interconnected edge computing device, obtain the loading time, computing time and communication latency of each model fragment on each edge computing device, and thus construct the simulation model of the task to be executed.
[0124] The model allocation module is used to iteratively search for edge computing devices for the M model shards, based on the execution order of the model shards and the cumulative reward value of the task allocation strategies in the first n-1 iterations, using the Monte Carlo tree search algorithm and the load-aware upper confidence bound strategy, to obtain the task allocation strategy for the n-th iteration. Here, n≥1, and when n=1, the reward value of the task allocation strategy in the (n-1)-th iteration is a preset value.
[0125] The environment simulation module is used to simulate the task to be executed under the task allocation strategy of the n-th iteration by means of the simulation model of the task to be executed, obtain the reward value of the task allocation strategy of the n-th iteration, update n=n+1, and return to the model allocation module until the preset iteration convergence condition is reached, thereby obtaining the target task allocation strategy.
[0126] The collaborative task execution module is used to deploy the M model shards on the corresponding edge computing devices based on the target task allocation strategy, so that the model shards on multiple edge computing devices cooperate to complete the task to be executed.
[0127] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the multi-device collaborative task execution method described above.
[0128] Furthermore, the Industrial Internet of Things (IIoT) includes numerous production devices and equipment status monitoring sensors. To promptly understand and track production status, intelligent question-answering robots are often used to parse input queries and output results, thereby enabling real-time querying of production data. Due to the heterogeneous nature of multi-source data and the sheer volume of data in the IIoT, the parameters and scale of intelligent question-answering robots are substantial. Therefore, to ensure the timeliness and accuracy of production data queries, this application provides an application of a multi-device collaborative task execution method in intelligent question answering within the IIoT, specifically including:
[0129] Step 1: Construct a large-scale intelligent question-answering robot model that realizes the query tasks of status changes of various production equipment and monitoring data of various sensors in the Industrial Internet of Things; divide the large-scale intelligent question-answering robot model into M model fragments that sequentially process the input query questions.
[0130] Step 2: Load and compute each model fragment on the interconnected edge computing devices in the Industrial Internet of Things, obtain the loading time, computing time and communication latency of each model fragment on each edge computing device, and thus construct a simulation model of the intelligent question-answering robot.
[0131] Specifically, each edge computing device is deployed on an edge computing node of the Industrial Internet of Things (IIoT) and is directly connected to the production equipment and sensor network in the IIoT.
[0132] Step 3: Using the Monte Carlo tree search algorithm and the load-aware upper confidence bound strategy, based on the execution order of the model shards and the cumulative reward value of the task allocation strategy in the first n-1 iterations, iteratively search for edge computing devices for the M model shards to obtain the task allocation strategy for the nth iteration; where n≥1, and when n=1, the reward value of the task allocation strategy in the n-1th iteration is a preset value.
[0133] Step 4: Use the intelligent question-answering robot simulation model to simulate the input query question based on the task allocation strategy of the nth iteration, obtain the reward value of the task allocation strategy of the nth iteration, update n=n+1 and return to execute step 3, until the preset iteration convergence condition is reached, and obtain the target task allocation strategy.
[0134] Step 5: Based on the target task allocation strategy, deploy the M model shards on the corresponding edge computing devices, thereby using the model shards on multiple edge computing devices to answer the input query.
[0135] In a specific example, the task to be performed (i.e., the input query question) is a query about the electricity consumption of production equipment O in the Industrial Internet of Things (IIoT). The intelligent question-answering robot model is divided into model segments 1 to 4, which perform the following tasks: model segment 1 performs natural language parsing on the input query question; model segment 2 matches production equipment O in the IIoT based on the parsed question; model segment 3 obtains parameters such as the usage time, operating power, voltage, and current of production equipment O; and model segment 4 calculates the electricity consumption of production equipment O from the parameters output by model segment 3.
[0136] This industrial IoT includes edge computing device A and edge computing device B. The intelligent question-answering task execution flow obtained in this embodiment is as follows:
[0137] Edge computing device A and edge computing device B load model fragment 1 and model fragment 2 in parallel, with the loading process alternating with the computing process.
[0138] Edge computing device A starts calculating model fragment 1 immediately after loading it, while edge computing device B, having loaded model fragment 2, waits for edge computing device A to complete its calculation and deliver the calculation result of model fragment 1.
[0139] After edge computing device A completes the calculation of model fragment 1, it transmits the calculation result to edge computing device B through the network and starts loading model fragment 3 at the same time.
[0140] Edge computing device B begins to use the calculation results of model fragment 1 transmitted from edge computing device A to perform calculations for model fragment 2.
[0141] After edge computing device B completes the calculation of model fragment 2, it transmits the calculation result to edge computing device A and simultaneously loads model fragment 4.
[0142] After receiving the calculation result of model fragment 2, edge computing device A continues to calculate model fragment 3 and prepares to transmit its calculation result to edge computing device B.
[0143] After edge computing device A completes the calculation of model fragment 3, it transmits the calculation result to edge computing device B. Edge computing device B continues to process model fragment 4 and transmits the final output result back to edge computing device A for further processing or feedback to the user.
[0144] By distributing the large-scale intelligent question-answering robot model across multiple edge computing devices, not only is the response speed of intelligent question answering improved, but it can also handle larger-scale data, ensuring that the large-scale intelligent question-answering robot model can acquire massive amounts of data from production equipment and sensors in real time and provide timely feedback. Furthermore, collaborative inference across multiple edge computing devices can ensure effective management of the memory and computing resources of each edge computing device through load balancing, thereby avoiding system crashes or response failures due to resource overload.
[0145] This application utilizes distributed inference and dynamic task allocation to fragment and distribute the computational tasks of large models across multiple edge computing devices. This achieves a high degree of parallelism in model loading, computation, and communication, significantly improving inference speed and reducing latency between devices through pipelined parallelism. This substantially reduces end-to-end inference latency and fully leverages the potential of edge computing environments. Furthermore, by splitting large models into multiple sub-tasks and assigning them to different devices for processing, this approach effectively alleviates resource bottlenecks on single devices, enabling edge computing devices to collaboratively complete large model inference tasks without exceeding resource limitations. In addition, this technical solution not only enables efficient deployment of large model inference in existing edge computing environments but also possesses high scalability. As more edge devices are added, MAC-IP can flexibly adjust inference strategies based on device heterogeneity and load conditions, further enhancing system computing power and inference efficiency to adapt to application needs of different scales.
[0146] In addition, when allocating multiple model shards, the use of a load-aware upper confidence bound (LA-UCB) strategy can effectively balance the computational load among devices, avoid overloading of devices, improve the overall performance of the system, ensure that the computing power and load of each device are fully considered when allocating tasks, and optimize resource utilization.
[0147] In summary, the MAC-IP technology solution provided in this application demonstrates significant advantages in improving the efficiency of large-model edge inference, reducing resource consumption, and enhancing system stability and scalability. It fills the gap in existing technologies for deploying large models in edge computing environments and exhibits strong technological advancement. Furthermore, experimental data from simulation and real-world environments verify that the solution provided in this application can achieve a data processing speedup of 1.81 to 2.53 times for large models, effectively reducing peak memory usage and achieving a memory compression ratio of 1.34 to 1.79 times, greatly alleviating the memory pressure on resource-constrained edge computing devices.
[0148] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0149] This application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0150] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0151] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0152] Obviously, the above embodiments are merely illustrative examples for clear explanation and are not intended to limit the implementation. Those skilled in the art will recognize that other variations or modifications can be made based on the above description. It is neither necessary nor possible to exhaustively list all possible implementations here. However, obvious variations or modifications derived therefrom are still within the scope of protection of this invention.
Claims
1. A method for multi-device collaborative task execution, characterized in that it comprises:
S10: building a large model based on the task to be executed, and dividing the large model into M model shards that process the task to be executed sequentially;
S20: loading and computing each model shard on each of the interconnected edge computing devices, obtaining the loading time, computation time, and communication latency of each model shard on each edge computing device, and thereby constructing a simulation model of the task to be executed;
S30: using the Monte Carlo tree search algorithm and a load-aware upper confidence bound strategy, iteratively searching for edge computing devices for the M model shards based on the execution order of the model shards and the cumulative reward values of the task allocation strategies of the first n-1 iterations, to obtain the task allocation strategy of the nth iteration; where n≥1, and when n=1, the reward value of the task allocation strategy of the (n-1)th iteration is a preset value; step S30 specifically comprises:
S300: initializing m=1;
S301: using the load-aware upper confidence bound strategy, calculating, based on the cumulative reward value of each edge computing device under the task allocation strategies of the first n-1 iterations, the priority of allocating the m-th model shard to each edge computing device, and allocating the m-th model shard accordingly; specifically: inputting, for each edge computing device, the number of times it was selected under the task allocation strategies of the first n-1 iterations, its cumulative reward value, and its load after the m-th model shard is allocated to it, into the load-aware upper confidence bound calculation formula to obtain the priority of allocating the m-th model shard to that edge computing device; and allocating the m-th model shard to the edge computing device with the highest priority;
the load-aware upper confidence bound calculation formula is expressed in terms of the following quantities: the load-aware upper confidence bound value; the cumulative reward value of a given edge computing device under the task allocation strategies of the first n-1 iterations; the number of times that edge computing device was selected under the task allocation strategies of the first n-1 iterations; the total number of times the parent node was selected, under the task allocation strategies of the first n-1 iterations, when that edge computing device is selected as a child node; the allocation of the m-th model shard to that edge computing device; the load of that edge computing device after the m-th model shard is allocated to it; and preset hyperparameters that are positive constants;
S302: updating m=m+1 and returning to step S301, until, given the allocation results of the first m-1 model shards, there exists an edge computing device that was not selected as the allocation device of the m-th model shard in the first n-1 iterations, whereupon the m-th model shard is allocated to that edge computing device;
S303: randomly allocating edge computing devices to the (m+1)th to Mth model shards, and obtaining the task allocation strategy of the nth iteration from the allocation results of the 1st to Mth model shards;
S40: using the simulation model of the task to be executed to simulate the task allocation strategy of the nth iteration, obtaining the reward value of the task allocation strategy of the nth iteration, updating n=n+1, and returning to step S30 until a preset iteration convergence condition is reached, thereby obtaining the target task allocation strategy;
S50: based on the target task allocation strategy, deploying the M model shards on the corresponding edge computing devices, so that the model shards on the multiple edge computing devices cooperate to carry out the task to be executed.
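For orientation, steps S30 and S40 can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the priority form in `la_ucb` (mean reward plus exploration bonus minus load penalty), the load measure (fraction of shards already placed on a device), the fixed iteration count used as the convergence condition, and the toy reward in `toy_simulate` are all hypothetical.

```python
import math
import random


class Node:
    """One tree node: a device chosen for a shard, with search statistics."""

    def __init__(self):
        self.visits = 0      # times this node was selected
        self.reward = 0.0    # cumulative reward backed up through it
        self.children = {}   # device index -> Node for the next shard


def la_ucb(child, parent_visits, load_after, c, lam):
    # ASSUMED score form: mean reward + exploration bonus - load penalty.
    mean = child.reward / child.visits
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return mean + bonus - lam * load_after


def search(num_shards, num_devices, simulate, iterations=100,
           c=1.4, lam=0.1, seed=0):
    rng = random.Random(seed)
    root = Node()
    best_plan, best_reward = None, -math.inf
    for _ in range(iterations):          # fixed budget stands in for S40's stop rule
        node, path, plan = root, [root], []
        load = [0] * num_devices
        while len(plan) < num_shards:
            untried = [d for d in range(num_devices) if d not in node.children]
            if untried:
                # S302 exit: a device not yet tried for this shard is chosen.
                dev = rng.choice(untried)
                node.children[dev] = Node()
            else:
                # S301: allocate the shard to the highest-priority device.
                dev = max(
                    range(num_devices),
                    key=lambda d: la_ucb(node.children[d], node.visits,
                                         (load[d] + 1) / num_shards, c, lam),
                )
            plan.append(dev)
            load[dev] += 1
            node = node.children[dev]
            path.append(node)
            if untried:
                break
        # S303: remaining shards are allocated at random.
        plan += [rng.randrange(num_devices)
                 for _ in range(num_shards - len(plan))]
        # S40: simulate the plan and back the reward up along the path.
        reward = simulate(plan)
        for visited in path:
            visited.visits += 1
            visited.reward += reward
        if reward > best_reward:
            best_plan, best_reward = plan, reward
    return best_plan, best_reward


def toy_simulate(plan):
    # Toy reward: negative shard count of the busiest device.
    return -float(max(plan.count(d) for d in set(plan)))
```

Calling `search(2, 2, toy_simulate)` returns a plan that places the two shards on different devices, since that plan has the highest toy reward. In practice `simulate` would be replaced by the simulation model built in step S20.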
2. The multi-device collaborative task execution method according to claim 1, characterized in that: when the task to be executed is an intelligent question answering task, the large model is an intelligent question answering robot model, the input of the large model is the text of the query question, and the output of the large model is the answer text; when the task to be executed is an image classification task, the large model is an image classification model, the input of the large model is the image to be classified, and the output of the large model is the predicted probability of each image category.
3. The multi-device collaborative task execution method according to claim 1, characterized in that step S40 comprises: simulating the task allocation strategy of the nth iteration using the simulation model of the task to be executed, to obtain the actual loading time, actual computation time, and actual communication latency of each edge computing device; converting the actual loading time, actual computation time, and actual communication latency of each edge computing device into a loading reward, a computation reward, and a communication latency reward for that edge computing device; and obtaining the reward value of each edge computing device under the task allocation strategy of the nth iteration as the weighted sum of its loading reward, computation reward, and communication latency reward.
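4. The multi-device collaborative task execution method according to claim 3, characterized in that the reward value of each edge computing device under the task allocation strategy of the nth iteration is calculated as follows:
$$R_i = w_{\mathrm{load}}\, f\!\left(T_i^{\mathrm{load}}\right) + w_{\mathrm{comp}}\, f\!\left(T_i^{\mathrm{comp}}\right) + w_{\mathrm{comm}}\, f\!\left(T_i^{\mathrm{comm}}\right)$$
where $R_i$ denotes the reward value of the $i$-th edge computing device; $f(\cdot)$ denotes the function that converts a time into a reward; $T_i^{\mathrm{load}}$ denotes the actual loading time of the $i$-th edge computing device and $w_{\mathrm{load}}$ denotes the weight of $f(T_i^{\mathrm{load}})$; $T_i^{\mathrm{comp}}$ denotes the actual computation time of the $i$-th edge computing device and $w_{\mathrm{comp}}$ denotes the weight of $f(T_i^{\mathrm{comp}})$; $T_i^{\mathrm{comm}}$ denotes the actual communication latency of the $i$-th edge computing device and $w_{\mathrm{comm}}$ denotes the weight of $f(T_i^{\mathrm{comm}})$.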
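As an illustration of the weighted reward in claims 3 and 4, the following Python sketch converts the three measured times into rewards and sums them with weights. The conversion function `time_to_reward` (a decreasing map 1/(1+t)) and the default weights are assumptions chosen for the example; the claims only require some time-to-reward conversion and some set of weights.

```python
def time_to_reward(t):
    """ASSUMED conversion: shorter times give rewards closer to 1."""
    return 1.0 / (1.0 + t)


def device_reward(load_time, comp_time, comm_latency,
                  weights=(0.3, 0.5, 0.2)):
    """Weighted sum of loading, computation, and communication rewards.

    The weights are hypothetical; times are assumed to be non-negative
    and expressed in the same unit (for example, seconds).
    """
    w_load, w_comp, w_comm = weights
    return (w_load * time_to_reward(load_time)
            + w_comp * time_to_reward(comp_time)
            + w_comm * time_to_reward(comm_latency))
```

Under this choice, a device that loads, computes, and communicates faster always receives a higher reward, so the search is steered toward allocations with lower latency.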
5. The multi-device collaborative task execution method according to claim 1, characterized in that step S30 further comprises: based on the preset loading time and preset computation time of each model shard, and on the loading time and computation time of each model shard on each edge computing device, constructing a set of available edge computing devices for each model shard, so that the iterative search for edge computing devices for the M model shards is performed within the set of available edge computing devices of each model shard.
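The construction of the available device sets in claim 5 might look like the following Python sketch. It assumes that the preset loading and computation times act as upper bounds that a device must not exceed for a given shard; that interpretation, and all names used here, are hypothetical.

```python
def available_devices(load_time, comp_time, max_load, max_comp):
    """Return, for each shard, the indices of devices that meet both limits.

    load_time[m][d], comp_time[m][d]: measured times of shard m on device d.
    max_load[m], max_comp[m]: preset limits for shard m (ASSUMED upper bounds).
    """
    return [
        [
            d
            for d in range(len(load_time[m]))
            if load_time[m][d] <= max_load[m] and comp_time[m][d] <= max_comp[m]
        ]
        for m in range(len(load_time))
    ]
```

Restricting the search to these sets shrinks the tree explored in step S30, because devices that cannot meet a shard's limits are never considered for it.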
6. A multi-device collaborative task execution apparatus, characterized in that the apparatus is used to implement the multi-device collaborative task execution method according to any one of claims 1 to 5, and comprises:
a model building and sharding module, used to build a large model based on the task to be executed and to divide the large model into M model shards that process the task to be executed sequentially;
a modeling and analysis module, used to load and compute each model shard on each of the interconnected edge computing devices, obtain the loading time, computation time, and communication latency of each model shard on each edge computing device, and thereby construct a simulation model of the task to be executed;
a model allocation module, used to iteratively search for edge computing devices for the M model shards, using the Monte Carlo tree search algorithm and the load-aware upper confidence bound strategy, based on the execution order of the model shards and the cumulative reward values of the task allocation strategies of the first n-1 iterations, to obtain the task allocation strategy of the nth iteration; where n≥1, and when n=1, the reward value of the task allocation strategy of the (n-1)th iteration is a preset value;
an environment simulation module, used to simulate the task allocation strategy of the nth iteration using the simulation model of the task to be executed, obtain the reward value of the task allocation strategy of the nth iteration, update n=n+1, and return to the step performed by the model allocation module until a preset iteration convergence condition is reached, thereby obtaining the target task allocation strategy;
a collaborative task execution module, used to deploy the M model shards on the corresponding edge computing devices based on the target task allocation strategy, so that the model shards on the multiple edge computing devices cooperate to complete the task to be executed.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the multi-device collaborative task execution method according to any one of claims 1 to 5.