Task processing method and apparatus, agent, device, medium, and program product

By dynamically adjusting the loading order and storage location of operator weights based on hardware resources and task attribute information, the problem of matching hardware resources with complex AI-generated content tasks is solved, achieving efficient and flexible task processing.

CN122309168A · Pending · Publication Date: 2026-06-30 · BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
Filing Date
2026-04-03
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, hardware resources are insufficient to handle complex, large-scale AI-generated content tasks, resulting in excessively long inference times and increased hardware investment costs.

Method used

By determining the loading method of the operator weight set based on hardware and task attribute information, and using partial loading and multi-stream parallelism, the loading order and storage location of the operator weights are dynamically adjusted to adapt to hardware resources.

Benefits of technology

It improves the flexibility and efficiency of task processing, makes reasonable use of hardware resources, and reduces processing latency and hardware costs.

✦ Generated by Eureka AI based on patent content.

Abstract

This disclosure provides a task processing method, apparatus, intelligent agent, electronic device, storage medium, and program product, relating to the field of artificial intelligence technology, in particular to large models, deep learning, and image processing. The specific implementation is as follows: in response to receiving a target task, loading method information is determined based on hardware attribute information of the hardware unit used to process the target task and task attribute information of the target task; if the loading method information indicates partial loading of the operator weight set, the hardware unit processes the data to be processed corresponding to the target task based on a first operator weight of the operator weight set; and a second operator weight of the operator weight set is loaded into the target memory.

Description

Technical Field

[0001] This disclosure relates to the field of artificial intelligence technology, particularly to the fields of large models, deep learning, and image processing, and specifically to task processing methods, devices, intelligent agents, electronic devices, storage media, and program products.

Background Technology

[0002] With the rapid iteration of artificial intelligence technology, especially the large-scale application of Artificial Intelligence Generated Content (AIGC) technology, task processing scenarios are rapidly evolving towards greater complexity, diversification, and efficiency. Hardware resources have become a new challenge and requirement for the development of current task processing technologies.

Summary of the Invention

[0003] This disclosure provides a task processing method, apparatus, intelligent agent, electronic device, storage medium, and program product.

[0004] According to one aspect of this disclosure, a task processing method is provided, comprising: in response to receiving a target task, determining loading method information based on hardware attribute information of a hardware unit for processing the target task and task attribute information of the target task, wherein the loading method information indicates a method for loading an operator weight set of a target model for processing the target task into a target memory of the hardware unit; when the loading method information indicates that the operator weight set is partially loaded, processing data to be processed corresponding to the target task using the hardware unit based on a first operator weight of the operator weight set, wherein the first operator weight has been loaded into the target memory; and loading a second operator weight of the operator weight set into the target memory, wherein the execution order of the first operator corresponding to the first operator weight in the target model is earlier than the execution order of the second operator corresponding to the second operator weight in the target model.

[0005] According to another aspect of this disclosure, a task processing apparatus is provided, comprising: a loading determination module, configured to, in response to receiving a target task, determine loading method information based on hardware attribute information of a hardware unit for processing the target task and task attribute information of the target task, wherein the loading method information indicates a method for loading an operator weight set of a target model for processing the target task into a target memory of the hardware unit; a data processing module, configured to, when the loading method information indicates that the operator weight set is partially loaded, process data to be processed corresponding to the target task using the hardware unit based on a first operator weight of the operator weight set, wherein the first operator weight has been loaded into the target memory; and a weight loading module, configured to load a second operator weight of the operator weight set into the target memory, wherein the execution order of the first operator corresponding to the first operator weight in the target model is earlier than the execution order of the second operator corresponding to the second operator weight in the target model.

[0006] According to another aspect of this disclosure, an intelligent agent is provided, comprising: an input module for receiving input information; a processing module for determining a target task based on the input information received by the input module, determining a large model based on the target task, and obtaining output information by calling the large model to execute the method described above; and an output module for outputting the output information obtained by the processing module.

[0007] According to another aspect of this disclosure, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method described above.

[0008] According to another aspect of this disclosure, a non-transitory computer-readable storage medium is provided storing computer instructions, wherein the computer instructions are used to cause the computer to perform the method described above.

[0009] According to another aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method described above.

[0010] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.

Attached Figure Description

[0011] The accompanying drawings are provided to better understand this solution and do not constitute a limitation of this disclosure. Wherein:

[0012] Figure 1A schematically shows an exemplary system architecture to which the task processing method and apparatus can be applied according to an embodiment of the present disclosure;

[0013] Figure 1B schematically shows a server in a distributed cluster according to an embodiment of the present disclosure;

[0014] Figure 2 schematically shows a flowchart of a task processing method according to an embodiment of the present disclosure;

[0015] Figure 3A schematically shows full loading of operator weights according to an embodiment of the present disclosure;

[0016] Figure 3B schematically shows partial loading of operator weights according to an embodiment of the present disclosure;

[0017] Figure 4 schematically shows the determination of a target model according to an embodiment of the present disclosure;

[0018] Figure 5 schematically shows the loading of the operator weight set of the target model into a preload memory according to an embodiment of the present disclosure;

[0019] Figure 6 schematically shows a third-party function call according to an embodiment of the present disclosure;

[0020] Figure 7 schematically shows a block diagram of a task processing apparatus according to an embodiment of the present disclosure;

[0021] Figure 8 schematically shows the structure of an intelligent agent according to an embodiment of the present disclosure; and

[0022] Figure 9 schematically shows a block diagram of an electronic device suitable for implementing a task processing method according to an embodiment of the present disclosure.

Detailed Implementation

[0023] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0024] In text-to-video and image-to-video applications, model inference takes too long. For example, multimodal large models that use a diffusion model as the inference framework have large and complex network structures, so hardware resources struggle to match the inference models, which increases hardware investment costs.

[0025] In view of this, the present disclosure provides a task processing method, comprising: in response to receiving a target task, determining loading method information based on hardware attribute information of a hardware unit used to process the target task and task attribute information of the target task, wherein the loading method information indicates a method for loading an operator weight set of a target model used to process the target task into a target memory of the hardware unit; when the loading method information indicates that the operator weight set is partially loaded, processing the data to be processed corresponding to the target task using the hardware unit based on the first operator weight of the operator weight set, wherein the first operator weight has been loaded into the target memory; and loading the second operator weight of the operator weight set into the target memory, wherein the execution order of the first operator corresponding to the first operator weight in the target model is earlier than the execution order of the second operator corresponding to the second operator weight in the target model.

[0026] The task processing method provided in this disclosure can determine the loading method information of the operator weight set of the target model based on the task attribute information of the target task and the hardware attribute information of the hardware unit. This adapts the loading method of the operator weight set into the target memory to the hardware unit, improving the flexibility of task processing. Furthermore, in the case of partial loading, a multi-stream parallel loading method is adopted. While processing the data to be processed based on the first operator weight, the operation of loading the second operator weight is performed, thereby improving processing efficiency.

[0027] Figure 1A schematically shows an exemplary system architecture to which the task processing method and apparatus can be applied according to embodiments of this disclosure.

[0028] It should be noted that Figure 1A shows only one example of a system architecture to which the embodiments of this disclosure can be applied, in order to help those skilled in the art understand the technical content of this disclosure. It does not mean that the embodiments of this disclosure cannot be used in other devices, systems, environments, or scenarios.

[0029] As shown in Figure 1A, the system architecture 100 according to this embodiment may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, and 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links.

[0030] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages. Various communication client applications can be installed on terminal devices 101, 102, and 103, for example knowledge reading applications, web browser applications, search applications, instant messaging tools, email clients, and/or social platform software.

[0031] Terminal devices 101, 102, and 103 can be various electronic devices with displays and web browsing capabilities, including but not limited to smartphones, tablets, laptops, and desktop computers.

[0032] Server 105 can be a server that provides various services, such as a backend management server that supports the content browsed by users using terminal devices 101, 102, and 103 (for example only). The backend management server can analyze and process data such as received user requests, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal devices.

[0033] A server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system. It solves the shortcomings of traditional physical hosts and VPS services ("Virtual Private Server", or simply "VPS"), such as high management difficulty and weak business scalability. A server can also be a distributed cluster server or a server combined with blockchain.

[0034] It should be noted that the task processing method provided in the embodiments of this disclosure can generally be executed by terminal devices 101, 102, or 103. Accordingly, the task processing apparatus provided in the embodiments of this disclosure can also be disposed in terminal devices 101, 102, or 103.

[0035] Alternatively, the task processing method provided in this embodiment can generally be executed by server 105. Correspondingly, the task processing apparatus provided in this embodiment can generally be located in server 105. The task processing method provided in this embodiment can also be executed by a server or server cluster that is different from server 105 and capable of communicating with terminal devices 101, 102, 103 and/or server 105. Correspondingly, the task processing apparatus provided in this embodiment can also be located in a server or server cluster that is different from server 105 and capable of communicating with terminal devices 101, 102, 103 and/or server 105.

[0036] It should be understood that the numbers of terminal devices, networks, and servers shown in Figure 1A are merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.

[0037] Figure 1B schematically shows a server in a distributed cluster according to an embodiment of the present disclosure.

[0038] As shown in Figure 1B, server 105 can be a distributed cluster of servers, which may include intermediate server 1051 and server clusters 1052, 1053, 1054, and 1055.

[0039] Parallel inference for the target task can be performed using server clusters 1052, 1053, 1054, and 1055.

[0040] Optionally, the parallel inference strategy, for example sequence parallelism or pipeline parallelism, can be determined based on the hardware attribute information of each server unit in the server cluster, such as the number of processors and the storage space configured on each processor. This ensures that the parallel inference strategy is adapted to the hardware configuration of the server unit.

[0041] Optionally, different types of tasks can be assigned to each server cluster based on its hardware attribute information. For example, server cluster 1052, which has fewer processors, can be used to perform text generation tasks, while server cluster 1053, which has more processors and larger storage space, can be used to perform video generation tasks.

[0042] The intermediate server 1051 can receive the target task from the terminal device. Based on the task attribute information of the target task, the intermediate server determines the target server cluster from multiple server clusters to execute the target task. For example, if the target task is a text generation task, the intermediate server 1051 can assign the target task to server cluster 1052, but it is not limited to this. It can also assign the target task to multiple server clusters based on the task attribute information to perform distributed parallel processing using multiple server clusters.
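The routing performed by the intermediate server can be sketched as a simple lookup. This is only an illustration under stated assumptions: the task type strings, the cluster identifiers (named after the reference numerals in Figure 1B), and the error handling for unknown task types are hypothetical and are not specified in this disclosure.

```python
# Hypothetical routing table: task type -> server clusters that should run it.
# A task may map to several clusters when distributed parallel processing is wanted.
ROUTES = {
    "text_generation": ["cluster_1052"],
    "video_generation": ["cluster_1053"],
}


def assign_clusters(task_type: str) -> list[str]:
    """Return the target server cluster(s) for a task, based on its task type."""
    try:
        return list(ROUTES[task_type])
    except KeyError:
        # Assumption: unknown task types are rejected rather than routed by default.
        raise ValueError(f"no server cluster configured for task type {task_type!r}") from None


targets = assign_clusters("text_generation")
```

A fuller dispatcher would also weigh the hardware attribute information of each cluster, as described above, rather than rely on a fixed table.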

[0043] In the technical solutions disclosed herein, the collection, storage, use, processing, transmission, provision, disclosure, and application of any type of information, such as user personal information, comply with the provisions of relevant laws and regulations, necessary confidentiality measures have been taken, and they do not violate public order and good morals.

[0044] In the technical solution disclosed herein, the user's authorization or consent is obtained before acquiring or collecting the user's personal information.

[0045] It should be noted that the sequence numbers of the operations in the following methods are for descriptive purposes only and should not be considered as indicating the execution order of the operations. Unless explicitly stated otherwise, the method does not need to be executed in the exact order shown.

[0046] Figure 2 schematically shows a flowchart of a task processing method according to an embodiment of the present disclosure.

[0047] As shown in Figure 2, the method includes operations S210 to S230.

[0048] In operation S210, in response to receiving the target task, loading method information is determined based on the hardware attribute information of the hardware unit used to process the target task and the task attribute information of the target task.

[0049] In operation S220, when the loading method information indicates that the operator weight set is partially loaded, the hardware unit processes the data to be processed corresponding to the target task based on the first operator weight of the operator weight set.

[0050] In operation S230, the second operator weight of the operator weight set is loaded into the target memory.

[0051] The target task can include online human-computer interaction tasks, such as inference tasks, but is not limited to these. It can be any task for which the operator weight set of the target model used to process it has a data volume greater than a predetermined data volume threshold. For example, a task that uses a large model as the target model can serve as the target task. A large model can include a large language model (LLM) or a multimodal large language model (MLLM).

[0052] The hardware unit may include a processor, such as at least one of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Neural Processing Unit (NPU). Taking a GPU as an example, the target memory can be the memory configured on the GPU.

[0053] The hardware attribute information of a hardware unit can include basic information, status information, and functional information. For example, basic information may include name, model, and manufacturing process. Status information may include available resources, power consumption, and latency. Functional information may include parallel processing capabilities, acceleration engines, and security features.

[0054] For example, hardware attribute information may include information such as whether parallel processing is supported and available storage space.

[0055] The target task's attribute information can include basic task information and task requirements. Basic task information may include task type, type of data to be processed, and task computational complexity. Task requirements may include real-time performance, reliability, security, and expected results.

[0056] For example, task attribute information may include the amount of data to be processed, expected results, etc.

[0057] The target model used to process the target task can include multiple operators, each of which can indicate the smallest unit of computation, such as addition, subtraction, multiplication, etc. However, it is not limited to these. It can also include operations such as convolution or softmax, as well as operations such as pooling or transpose.

[0058] Operator weights can include the parameter matrix of the function corresponding to the operator.

[0059] Taking a target model that includes convolutional layers as an example, the operators can include convolution operators, and the operator weights of the convolution operators can include a parameter matrix used to perform convolution operations on the data to be processed.

[0060] Before processing the target task using the target model, the loading method information of the operator weight set of the target model can be determined based on task attribute information and hardware attribute information. This allows the operator weights in the operator weight set to be loaded into the target memory according to the loading method information, improving the adaptability of the loading method to the hardware unit.

[0061] The loading method information can indicate how the operator weight set of the target model used to process the target task is loaded into the target memory of the hardware unit.

[0062] Specifically, the loading method information can indicate that the operator weight set is fully loaded, or that it is partially loaded. Full loading means that all operator weights have been loaded into the target memory before the data to be processed is processed based on the operator weight set of the target model. Partial loading means that, while the data to be processed is being processed based on the first operator weight in the operator weight set, some operator weights, such as the second operator weight, have not yet been loaded into the target memory.

[0063] The execution order of the first operator corresponding to the first operator weight in the target model is earlier than the execution order of the second operator corresponding to the second operator weight in the target model. For example, the target model includes a stacked first convolutional layer, a second convolutional layer, and a third convolutional layer. The execution order of the first convolutional operator corresponding to the first convolutional layer is earlier than the execution order of the second convolutional operator corresponding to the second convolutional layer and the execution order of the third convolutional operator corresponding to the third convolutional layer.

[0064] A multi-stream parallel loading approach can be adopted. The weights of the first operator have already been loaded into the target memory. The data to be processed is first processed based on the first operator weights of the first operator that have already been loaded into the target memory. Asynchronously, the process of loading the second operator weights of the second operator into the target memory can be executed. This achieves parallel processing of loading and data processing.
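The overlap of loading and computation described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: a background thread stands in for a separate loading stream, each "operator" is a toy scalar multiplication, and the names `load_weight`, `run_pipeline`, and `preload_store` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def load_weight(store, name):
    """Stand-in for copying one operator weight into the target memory."""
    return store[name]


def run_pipeline(order, store, x):
    """Apply operators in execution order while prefetching the next weight.

    The first operator weight is assumed to be resident already; every later
    weight is loaded on a background worker while the current operator runs.
    """
    events = []
    with ThreadPoolExecutor(max_workers=1) as loader:
        current = load_weight(store, order[0])
        for i, name in enumerate(order):
            pending = None
            if i + 1 < len(order):
                # Start loading the second operator weight asynchronously.
                pending = loader.submit(load_weight, store, order[i + 1])
                events.append(("prefetch", order[i + 1]))
            x = x * current  # toy computation for operator `name`
            events.append(("compute", name))
            if pending is not None:
                current = pending.result()  # wait only if the load is not finished
    return x, events


preload_store = {"conv1": 2, "conv2": 3, "conv3": 5}
result, events = run_pipeline(["conv1", "conv2", "conv3"], preload_store, 1)
```

On an accelerator, the background thread would typically be replaced by a dedicated copy stream so that the transfer and the computation can genuinely run at the same time.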

[0065] According to embodiments of this disclosure, the loading method information of the operator weight set of the target model can be determined based on the task attribute information of the target task and the hardware attribute information of the hardware unit. This adapts the method of loading the operator weight set into the target memory to the hardware unit, improving the flexibility of task processing. Furthermore, in the case of partial loading, a multi-stream parallel loading method is adopted. While processing the data to be processed based on the first operator weight, the operation of loading the second operator weight is performed, thereby improving processing efficiency.

[0066] According to embodiments of this disclosure, operation S210 shown in Figure 2, in which loading method information is determined based on the hardware attribute information of the hardware unit used to process the target task and the task attribute information of the target task, may include: identifying the hardware unit based on the hardware attribute information and the task attribute information to obtain an identification result. If the identification result indicates that the available resources of the hardware unit are greater than the resources to be occupied by the target task, the loading method information indicates that the operator weight set is loaded into the target memory before the data to be processed is processed based on the operator weight set of the target model. If the identification result indicates that the available resources of the hardware unit are less than or equal to the resources to be occupied by the target task, the loading method information indicates that the operator weight set is partially loaded.

[0067] Available resources may include the available storage space of the target memory, and the resources to be occupied may include the space to be occupied in the target memory.

[0068] If the identification result indicates that the available resources of the hardware unit are greater than the resources to be occupied by the target task, the available storage space is greater than the space to be occupied. In this case, it can be determined that the loading method information indicates that the operator weight set is loaded into the target memory before the data to be processed is processed based on the operator weight set of the target model. Otherwise, the available storage space is less than or equal to the space to be occupied, and it can be determined that the loading method information indicates that the operator weight set is partially loaded. This means that some or all of the operator weights reside in other storage spaces, such as host memory, and are loaded into the target memory only when they are needed for computation. Once the computation is complete, they can be unloaded from the target memory to free storage space for operator weights to be computed subsequently.
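The comparison described above can be sketched as a small decision function. The names `LoadingMethod` and `determine_loading_method` and the use of byte counts are illustrative assumptions; the disclosure only specifies the comparison between available storage space and space to be occupied.

```python
from enum import Enum


class LoadingMethod(Enum):
    FULL = "full"        # whole operator weight set is loaded before processing
    PARTIAL = "partial"  # weights are loaded (and unloaded) as operators run


def determine_loading_method(available_bytes: int, required_bytes: int) -> LoadingMethod:
    """Pick the loading method from available space and space to be occupied."""
    if available_bytes > required_bytes:
        return LoadingMethod.FULL
    # Available space is less than or equal to the space to be occupied.
    return LoadingMethod.PARTIAL


method = determine_loading_method(available_bytes=16 * 2**30, required_bytes=24 * 2**30)
```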

[0069] The dynamic hardware awareness method provided in this disclosure supports dynamically adjustable loading of operator weights to adapt to different types of hardware units, thereby making reasonable use of hardware resources and improving the flexibility of task processing.

[0070] According to embodiments of this disclosure, before performing operation S210 shown in Figure 2, the task processing method may further include: determining the task attribute information of the target task based on the data volume information of the data to be processed corresponding to the target task and the data volume information of the operator weight set of the target model.

[0071] The target memory needs to store not only the operator weight set but also the data to be processed. Therefore, the resources occupied, such as the space to be occupied, include not only the storage space for the operator weight set but also the storage space for the data to be processed.

[0072] Therefore, the task attribute information determined by combining the data volume information of the data to be processed and the data volume information of the operator weight set of the target model is accurate and effective.
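As a small worked sketch of this sizing step, the space to be occupied can be computed as the sum of the operator weight sizes plus the size of the data to be processed. The `TaskAttributes` class, its field names, and the byte figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaskAttributes:
    """Data volume information for one target task (illustrative fields)."""
    weight_bytes: dict = field(default_factory=dict)  # operator name -> weight size in bytes
    data_bytes: int = 0                               # size of the data to be processed

    @property
    def space_to_be_occupied(self) -> int:
        # The target memory must hold both the operator weights and the data.
        return sum(self.weight_bytes.values()) + self.data_bytes


task = TaskAttributes(weight_bytes={"conv1": 400, "conv2": 600}, data_bytes=250)
required = task.space_to_be_occupied
```

Counting only the weights would underestimate the requirement here: the weights alone need 1000 bytes, while the task as a whole needs 1250.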

[0073] According to embodiments of this disclosure, after performing operation S220 shown in Figure 2, the task processing method may further include an unloading operation. For example, in response to the completion of the operation corresponding to the first operator, the first operator weight is unloaded from the target memory.

[0074] The offload method can be used to reduce the storage space requirements of the target memory of the hardware unit. For example, after the operation corresponding to the first operator is completed, the first operator weight can be deleted from the target memory to free storage space, which can then hold the weights of subsequent operators that have not yet been loaded.

[0075] According to embodiments of this disclosure, by unloading the first operator weight of the first operator that has been executed, the storage space of the target memory is released, the storage space requirement of the target memory of the hardware unit is reduced, and the ability to make reasonable use of hardware resources is improved.
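The effect of unloading on the storage requirement can be sketched with a simulated target memory of fixed capacity. The `TargetMemory` class, the `run_with_offload` function, and the sizes are hypothetical; the sketch tracks sizes only and performs no real computation.

```python
class TargetMemory:
    """Simulated target memory that tracks resident operator weights by size."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = {}  # operator name -> weight size
        self.peak = 0       # highest usage observed so far

    @property
    def used(self):
        return sum(self.resident.values())

    def load(self, name, size):
        if self.used + size > self.capacity:
            raise MemoryError(f"cannot load {name!r}: capacity exceeded")
        self.resident[name] = size
        self.peak = max(self.peak, self.used)

    def unload(self, name):
        self.resident.pop(name)


def run_with_offload(order, sizes, memory):
    """Run operators in order, loading the next weight and unloading finished ones."""
    memory.load(order[0], sizes[order[0]])  # first operator weight is resident
    for i, name in enumerate(order):
        if i + 1 < len(order):
            nxt = order[i + 1]
            memory.load(nxt, sizes[nxt])    # second operator weight
        # ... the operator `name` would process the data here ...
        memory.unload(name)                 # free space once the operator is done
    return memory.peak


sizes = {"conv1": 4, "conv2": 4, "conv3": 4}
memory = TargetMemory(capacity=8)
peak = run_with_offload(["conv1", "conv2", "conv3"], sizes, memory)
```

In this example the three weights total 12 units, which would not fit in a capacity of 8 under full loading, while the offload schedule never holds more than two weights at once.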

[0076] Figure 3A schematically shows full loading of operator weights according to an embodiment of the present disclosure.

[0077] As shown in Figure 3A, the operator weight set is loaded from the preload memory into the target memory. If, based on the identification result, it is determined that the available storage space in the target memory is greater than the space to be occupied, the entire operator weight set can be loaded from the preload memory into the target memory. This improves processing stability and avoids processing failures caused by the failure of any loading task in the parallel processing threads.

[0078] Figure 3B schematically shows partial loading of operator weights according to an embodiment of the present disclosure.

[0079] The loading method shown in Figure 3B differs from that shown in Figure 3A. In Figure 3B, it is determined based on the identification result that the available storage space in the target memory is less than or equal to the space to be occupied. In this case, the operator weight set can be handled using an offload method. Some operator weights are loaded from the preload memory into the target memory. After the operator corresponding to an operator weight has completed its operation, that operator weight can be unloaded from the target memory. This reduces the storage space requirements of the target memory of the hardware unit and improves the rational utilization of hardware resources.

[0080] The above text explained how to load and unload operator weights. The following text will explain how to determine the weights of the second operator.

[0081] According to embodiments of this disclosure, before performing operation S230 shown in Figure 2, the task processing method may further include: determining the second operator weight from the operator weight set of the target model. The second operator weight may be the operator weight of the operator whose execution order follows that of the first operator. However, it is not limited to this; it may also be determined from a candidate operator weight set. The candidate operator weight set is the set of operator weights that have not been pre-loaded into the target memory before the target task is received.

[0082] According to embodiments of this disclosure, determining a second operator weight from the operator weight set of the target model may include: determining, from the candidate operator weight set of the target model, target candidate operator weights that are not currently loaded into the target memory; and determining, from the target candidate operator weights, the operator weight of the operator whose execution order is after, and closest to, that of the first operator, as the second operator weight.

[0083] Taking a target model that includes an attention mechanism as an example, the operator weight set includes convolution operator weights, multiplication operator weights, a query (Q) matrix, a key (K) matrix, and a value (V) matrix. Some operator weights, such as the convolution operator weights and multiplication operator weights, can be pre-stored in the target memory. The Q matrix, K matrix, and V matrix, as the candidate operator weight set, are stored in the preload memory.

[0084] While the Q matrix operation is being performed, the next operator weight to be computed, such as the K matrix, can be determined from the target candidate operator weights in the candidate operator weight set that are not currently loaded into the target memory, and the K matrix weight is copied to the target memory.
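The selection rule described above (the nearest later operator whose weight is in the candidate set and not yet in the target memory) can be sketched as follows. The function name and the operator names are hypothetical and chosen only to mirror the Q/K/V example.

```python
from typing import Optional, Sequence, Set


def pick_second_operator(
    execution_order: Sequence[str],
    candidate_set: Set[str],
    loaded: Set[str],
    first_operator: str,
) -> Optional[str]:
    """Return the candidate operator closest after `first_operator` whose
    weight is not yet in the target memory, or None if there is none."""
    start = execution_order.index(first_operator) + 1
    for name in execution_order[start:]:
        if name in candidate_set and name not in loaded:
            return name
    return None


order = ["conv", "q_proj", "k_proj", "v_proj", "mul"]
candidates = {"q_proj", "k_proj", "v_proj"}   # kept in preload memory
resident = {"conv", "mul", "q_proj"}          # already in target memory
print(pick_second_operator(order, candidates, resident, "q_proj"))  # k_proj
```

Scanning only the candidate set, rather than every operator weight, is what narrows the search in this sketch.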

[0085] Compared with determining the second operator weight from the full operator weight set, determining it from the candidate operator weight set narrows the search range and improves the efficiency and accuracy of the determination.

[0086] According to an optional embodiment of this disclosure, the operator weights of operators that come later in the execution order can be placed in the candidate operator weight set. This ensures that processing of the data to be processed can start immediately upon receiving the target task, avoiding delays caused by executing loading tasks.

[0087] According to another optional embodiment of this disclosure, the processing time information of each operator can be determined based on the data type information of the data to be processed and the operator type information of each operator in the target model. Based on the processing time information of each operator, a candidate operator weight set is determined from the operator weight set.

[0088] The processing time of an operator can be determined based on the operator type information. If the processing time is less than or equal to the loading time and the operator weight is not pre-loaded into the target memory, the computation may not be executed in time because loading takes too long, leading to task latency. Therefore, such operator weights can be pre-loaded into the target memory.

[0089] Taking a large language model as the target model as an example, the amount of data to be processed in a large language model is relatively small, and the processing time of the operators corresponding to the Q, K, and V matrices in the attention mechanism is short, less than the loading time. Therefore, if these weights are not loaded in time, interaction latency may result.

[0090] If the processing time of an operator is greater than the loading time, the loading time of the operator weight will not affect the execution of the computation task. Therefore, the storage space requirement of the target memory can be reduced by temporarily loading the operator weight. Thus, the operator weight of this operator can be added to the candidate operator weight set.

[0091] Taking a multimodal large model as the target model as an example, in image processing or video generation tasks, the processing time of the operators corresponding to the Q, K, and V matrices in the attention mechanism is relatively long, much longer than the loading time. Therefore, temporary loading has no impact on the computation.
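The timing rule in the preceding paragraphs can be sketched as a simple partition: weights whose operators compute longer than the weights take to load become candidates for on-demand loading, and the rest are pre-loaded. The function name and all timing values below are made up for illustration.

```python
from typing import Dict, Set, Tuple


def split_by_timing(
    processing_ms: Dict[str, float], loading_ms: Dict[str, float]
) -> Tuple[Set[str], Set[str]]:
    """Return (preload, candidates) as sets of operator names."""
    preload: Set[str] = set()
    candidates: Set[str] = set()
    for op, proc in processing_ms.items():
        if proc > loading_ms[op]:
            candidates.add(op)   # compute hides the copy, so load on demand
        else:
            preload.add(op)      # the copy would stall compute, so keep resident
    return preload, candidates


# Illustrative, made-up timings in milliseconds.
loading = {"q": 1.5, "k": 1.5, "v": 1.5}
llm = {"q": 0.2, "k": 0.2, "v": 0.2}        # short attention compute
video = {"q": 40.0, "k": 40.0, "v": 40.0}   # long attention compute
print(split_by_timing(llm, loading))
print(split_by_timing(video, loading))
```

With these numbers the language-model case pre-loads Q, K, and V, while the video case treats all three as candidates.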

[0092] For example, under the precision requirements of the 8-bit floating-point data format (Floating-Point 8, FP8), enabling the offload loading mode for the Q, K, and V weights can reduce the storage space requirement of the target memory by about 20 GB, which allows the video generation task to run on hardware units with lower hardware specifications.

[0093] According to embodiments of this disclosure, the candidate operator weights in the candidate operator weight set of the target model are dynamically adjusted using the processing time information of the operators. This reduces the storage requirements on the hardware unit while taking into account the network structure of the target model and the data to be processed in the target task, thereby improving processing efficiency and reducing the processing latency caused by loading.

[0094] Optionally, determining the candidate operator weight set from the operator weight set based on the processing time information of each operator may include: determining the candidate operator weight set from the operator weight set based on the processing time information of each operator and the coupling relationship between multiple operators.

[0095] Coupling refers to whether operators depend on one another during processing. If such a dependency exists and the operator weights of some of the coupled operators are not loaded in time, computation latency is affected.

[0096] According to embodiments of this disclosure, a candidate operator weight set is determined from the operator weight set based on the processing time information of each operator and the coupling relationship between multiple operators. This can expand the reference factors used to determine the candidate operator weight set, improve the effectiveness of determining the candidate operator weight set, and ensure the stability and reliability of task processing.
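The disclosure does not spell out how the coupling relationship is combined with processing time. One possible reading, shown here purely as an assumption, is to treat each coupled group as a unit: the group is loaded on demand only if every member's compute time exceeds its loading time; otherwise the whole group is pre-loaded. The function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Set


def candidates_with_coupling(
    processing_ms: Dict[str, float],
    loading_ms: Dict[str, float],
    coupled_groups: Iterable[Sequence[str]],
) -> Set[str]:
    """Return operators whose weights may be loaded on demand."""
    # Start from the pure timing rule.
    candidates = {op for op, t in processing_ms.items() if t > loading_ms[op]}
    # Assumed rule: a coupled group stays together, so one "fast" member
    # forces the whole group to be pre-loaded.
    for group in coupled_groups:
        members = set(group)
        if not members <= candidates:
            candidates -= members
    return candidates


proc = {"q": 5.0, "k": 0.5, "v": 5.0, "ffn": 9.0}
load = {"q": 1.0, "k": 1.0, "v": 1.0, "ffn": 1.0}
print(sorted(candidates_with_coupling(proc, load, [("q", "k")])))  # ['ffn', 'v']
```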

[0097] The above explained how to determine the second operator weight. The following explains, with reference to Figure 4, how to determine the target model used to process the target task.

[0098] According to embodiments of this disclosure, before performing operation S210 shown in Figure 2, the task processing method may further include: determining a target model for processing the target task.

[0099] Determining the target model for processing the target task may include: determining the target adapter for performing the target task from multiple candidate adapters based on the task type information of the target task.

[0100] Candidate adapters can be obtained by training an initial adapter based on a base model. Multiple candidate adapters are used to perform tasks of different types. Based on the base model and the target adapter, the target model is determined.

[0101] Figure 4 schematically illustrates the determination of a target model according to an embodiment of the present disclosure.

[0102] As shown in Figure 4, different types of training samples can be used to train the base model and the initial adapter separately, resulting in multiple candidate adapters that match different task types. A one-to-one mapping relationship can be established between the multiple candidate adapters and the multiple pieces of task type information.

[0103] The base model can include a pre-trained large model. This pre-trained large model can be an open-source, general-purpose model. During training, while maintaining the general capabilities of the base model (e.g., freezing the operator weight set of the base model), the weight elements in the initial adapter's Low-Rank Adaptation (LoRA) weights can be adjusted. This allows the base model and the adjusted adapter weights to perform target tasks where the base model's processing performance is insufficient.

[0104] For example, a base model and an initial adapter can be trained using animation training samples to obtain a candidate adapter for generating animations. Similarly, a base model and an initial adapter can be trained using digital human training samples to obtain a candidate adapter for generating digital humans.

[0105] As shown in Figure 4, in response to receiving a target task, a target adapter for processing the target task can be determined from multiple candidate adapters based on the mapping relationship and the task type information of the target task.
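The one-to-one mapping between task type information and candidate adapters can be sketched as a plain lookup table. The adapter identifiers and task type strings below are hypothetical placeholders.

```python
from typing import Dict

# Hypothetical mapping from task type information to candidate adapter names.
ADAPTER_BY_TASK_TYPE: Dict[str, str] = {
    "animation": "adapter_animation",
    "digital_human": "adapter_digital_human",
}


def select_adapter(task_type: str, mapping: Dict[str, str] = ADAPTER_BY_TASK_TYPE) -> str:
    """Return the target adapter registered for `task_type`."""
    try:
        return mapping[task_type]
    except KeyError:
        raise LookupError(f"no candidate adapter for task type {task_type!r}") from None


print(select_adapter("animation"))  # adapter_animation
```

Raising an error for an unknown task type makes the missing-adapter case explicit, so a caller can decide how to handle a task for which no candidate adapter exists.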

[0106] Task type information can include, but is not limited to, the type of data to be processed. It can also include information such as the type or requirements of the target task's processing result. For example, in an image generation task, the task type information can include the resolution of the image to be generated.

[0107] Candidate adapters can be determined based on the resolution of the image to be generated, and these adapters can be used to generate images at that resolution.

[0108] As shown in Figure 4, the target model is determined based on the base model and the target adapter.

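[0109] For example, the operator weight of the base model is $W_0 \in \mathbb{R}^{d \times k}$, the adaptation operator weights of the target adapter are $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, and the operator weight of the target model is $W = W_0 + BA$, where $\mathbb{R}$ denotes a real matrix of the indicated shape, and $d$, $k$, and $r$ all denote matrix dimensions.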
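As a worked illustration, assume the usual LoRA form in which the target weight equals the frozen base weight plus the product of two low-rank matrices (a d x r matrix B and an r x k matrix A). The sketch below merges such an adapter using plain Python lists; the helper names and numeric values are invented for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w0, b, a):
    """Return W = W0 + B @ A without modifying the frozen base weight W0."""
    delta = matmul(b, a)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w0, delta)]


# d = 2, k = 3, r = 1
w0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # base weight, d x k
b = [[1.0], [2.0]]                        # adapter matrix B, d x r
a = [[0.5, 0.0, 1.0]]                     # adapter matrix A, r x k
print(merge_lora(w0, b, a))  # [[1.5, 0.0, 1.0], [1.0, 1.0, 2.0]]
```

The adapter stores r(d + k) values instead of the d x k values of a full weight, which is why several candidate adapters can share one base model cheaply.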

[0110] According to embodiments of this disclosure, multiple candidate adapters are provided to configure different types of target models for different application scenarios, enabling one model to adapt to multiple tasks, thereby expanding the scope of task processing.

[0111] According to embodiments of this disclosure, different optimization strategies can be configured for candidate adapters. For example, optimization strategies can employ different floating-point data formats for processing. The optimization strategy can be adaptively adjusted based on the task type of the target task. For instance, in a portrait video generation task scenario, a time-consuming but high-quality floating-point data format can be used. In a comic image generation scenario, a floating-point data format with short inference time can be used. This improves the flexibility and versatility of task processing, saving resources while reducing latency.

[0112] The preceding text explained how to determine the target model used to process the target task. The following text will explain how to switch target adapters.

[0113] Figure 5 schematically illustrates loading the operator weight set of the target model into a preload memory according to an embodiment of the present disclosure.

[0114] As shown in Figure 5, the target adapter may include multiple adapters, such as a first adapter and a second adapter.

[0115] Switching target adapters can refer to loading the target adapter used to perform the current target task into the preload memory, or unloading the target adapter used to perform historical target tasks from the preload memory.

[0116] As shown in Figure 5, while the first target adaptation operator weight is being loaded into the preload memory at time T, the second adaptation operator weight of the second adapter and the second basic operator weight of the base model are fused.

[0117] The first target adaptation operator weight is obtained by fusing the first adaptation operator weight of the first adapter with the first basic operator weight of the base model.

[0118] In the human-computer interaction scenario of AIGC video generation, the target adapter corresponding to the target task can be adaptively loaded based on the target task triggered by the user through controls in the interactive interface.

[0119] While the first adapter of the target adapter performs the multiplication of its matrices B1 and A1, or the merge calculation that follows, write operations on B2 and A2 of the second adapter are executed asynchronously to improve processing efficiency.

[0120] According to embodiments of this disclosure, while providing multiple candidate adapters to adapt to different types of target tasks and expanding the application scope, the switching efficiency of candidate adapters is improved by multi-stream loading, minimizing dynamic switching overhead, so as to achieve microsecond-level switching or near-native inference latency.
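The overlap between computing with one adapter and writing the next can be sketched with a background worker. This is only an analogy using Python threads; on real hardware the overlap would typically use separate device streams. The adapters are reduced to scalar pairs, and every name below is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def load_adapter(name, store):
    """Stand-in for copying an adapter's B and A matrices into preload memory."""
    return store[name]


def merge(w0, adapter):
    """Fuse a scalar 'rank-1' adapter (b, a) into a scalar base weight."""
    b, a = adapter
    return w0 + b * a


def switch_adapters(w0, store, current, upcoming):
    """Merge the current adapter while the upcoming one loads in the background."""
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load_adapter, upcoming, store)  # asynchronous write
        fused_current = merge(w0, store[current])               # overlaps with the load
        next_adapter = pending.result()                         # wait for the copy
    return fused_current, merge(w0, next_adapter)


store = {"first": (2.0, 3.0), "second": (1.0, 4.0)}
print(switch_adapters(1.0, store, "first", "second"))  # (7.0, 5.0)
```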

[0121] According to embodiments of this disclosure, the above description illustrates how to address diverse target tasks in human-computer interaction scenarios by providing multiple candidate adapters. The following description will explain how to expand the application scope through third-party function calls.

[0122] Figure 6 schematically illustrates a third-party function call according to an embodiment of the present disclosure.

[0123] As shown in Figure 6, the target task can be parsed to obtain a parsing result.

[0124] In human-computer interaction scenarios, intent recognition can be performed on the data to be processed in the target task, such as images to be processed and text input by the user, to obtain an intent recognition result. Based on the intent recognition result and the data to be processed, the task is parsed; for example, the target task is broken down into multiple sub-tasks. Each sub-task is analyzed to determine the tool used to execute it. If the tool used to execute a sub-task includes a model and that model does not match the target model, the sub-task is determined to be a target sub-task, that is, the model used to process the target sub-task and the target model meet a predetermined condition.

[0125] For example, if no candidate adapter exists for executing this subtask, then this subtask is set as the target subtask.

[0126] As shown in Figure 6, if the target task is determined to include a target subtask based on the parsing result, the subtask information of the target subtask is sent to the target server.

[0127] The target server can be determined from multiple candidate servers by using the mapping relationship between subtasks and server clusters. The target server is configured with a model for executing the target subtasks.

[0128] The subtask information for the target subtask can include the image to be processed, processing requirements, etc. Any reference information used to execute the target subtask can be used as subtask information. This subtask information can be sent to the target server via an API (Application Programming Interface) so that the target server can process the subtask information and obtain the task result.

[0129] As shown in Figure 6, the task result for the target subtask sent by the target server is received. Based on the task result and the subtask information of the other subtasks, the data to be processed is determined.
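The dispatch flow of Figure 6 can be sketched as follows: subtasks that a local candidate adapter can handle stay local, and the remaining target subtasks are forwarded to a server. The `remote_call` parameter is a hypothetical callable standing in for the API request to the target server; no real service or endpoint is implied.

```python
from typing import Callable, Dict, List, Set


def run_subtasks(
    subtasks: List[Dict[str, str]],
    local_types: Set[str],
    remote_call: Callable[[Dict[str, str]], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Route each subtask locally or to the target server and collect results."""
    results = []
    for sub in subtasks:
        if sub["type"] in local_types:
            results.append({"type": sub["type"], "handled_by": "local"})
        else:
            # Target subtask: no local candidate adapter, so call the server.
            results.append(remote_call(sub))
    return results


def fake_remote_call(sub: Dict[str, str]) -> Dict[str, str]:
    """Test double that mimics a server response."""
    return {"type": sub["type"], "handled_by": "remote"}


plan = [{"type": "animation"}, {"type": "speech"}]
print(run_subtasks(plan, {"animation"}, fake_remote_call))
```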

[0130] According to embodiments of this disclosure, remote invocation can make reasonable use of existing resources, improve pipeline parallelism, and reduce the storage space occupied in hardware units.

[0131] According to optional embodiments of this disclosure, the target model alone may not always be able to fully accomplish the target task.

[0132] For example, in image generation tasks, the task requirements may include generating an image at a target resolution. If the resolution of the image generated using the target model is lower than the target resolution, then a post-processing super-resolution operation can be used to achieve the target resolution.

[0133] For example, in video generation tasks, the task requirements may include generating video with a target frame rate. If the frame rate of the video generated using the target model is lower than the target frame rate, frame interpolation can be used to achieve the target frame rate.
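How much post-processing is needed follows directly from the gap between what the target model produces and what the task requires. The sketch below computes integer upscaling and frame interpolation factors; the function names are hypothetical, and the actual super-resolution or interpolation step is outside its scope.

```python
import math


def upscale_factor(generated_width: int, target_width: int) -> int:
    """Integer super-resolution factor needed to reach the target width."""
    if generated_width >= target_width:
        return 1
    return math.ceil(target_width / generated_width)


def interpolation_factor(generated_fps: float, target_fps: float) -> int:
    """Output frames per generated frame needed to reach the target frame rate."""
    if generated_fps >= target_fps:
        return 1
    return math.ceil(target_fps / generated_fps)


print(upscale_factor(512, 2048))      # 4
print(interpolation_factor(16, 60))   # 4
```

A factor of 1 means the target model's output already meets the requirement and no post-processing is needed.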

[0134] This improves the utilization rate of existing resources while reducing production costs.

[0135] According to optional embodiments of this disclosure, a flexible resource management method can be set up, and a task queue can be established when there are multiple tasks to be processed. A high-priority queue scheduling method is adopted, for example, online tasks are scheduled first, and idle resources are allocated to offline tasks for computation, maximizing the utilization of computing resources, improving the efficiency of human-computer interaction processing, avoiding latency, and thus improving the interactive experience.
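A priority queue in which online tasks are always served before offline tasks can be sketched with a binary heap. The class name, priority constants, and task names are assumptions made for this example.

```python
import heapq
import itertools

ONLINE, OFFLINE = 0, 1  # lower value = higher priority


class TaskQueue:
    """Pops online tasks first; ties are broken by submission order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_task(self):
        """Return the next task name, or None when the queue is empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None


queue = TaskQueue()
queue.submit("batch-render", OFFLINE)
queue.submit("chat-1", ONLINE)
queue.submit("chat-2", ONLINE)
print([queue.next_task() for _ in range(4)])
# ['chat-1', 'chat-2', 'batch-render', None]
```

The monotonically increasing counter keeps tasks of equal priority in first-in, first-out order.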

[0136] According to another optional embodiment of this disclosure, a tidal computing power management method can be introduced. For example, online traffic shows obvious periodic peaks and troughs. Traffic can be dynamically predicted from a traffic profile so that scale-up and scale-down operations are prepared in advance, thereby better matching the hardware resources of the hardware unit to the number of target tasks.
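A very simple form of tidal scaling looks up the expected load for the coming hour in a 24-entry traffic profile and sizes the deployment ahead of time. The profile values, per-replica capacity, and function name below are illustrative assumptions; a production predictor would likely be more elaborate.

```python
import math
from typing import Sequence


def planned_replicas(
    hourly_profile: Sequence[float],
    hour: int,
    per_replica_capacity: float,
    min_replicas: int = 1,
) -> int:
    """Replicas to have ready for the hour that follows `hour`."""
    expected = hourly_profile[(hour + 1) % 24]
    return max(min_replicas, math.ceil(expected / per_replica_capacity))


# Made-up profile: quiet all day except an evening peak at 20:00.
profile = [10.0] * 24
profile[20] = 950.0
print(planned_replicas(profile, 19, 100.0))  # 10
print(planned_replicas(profile, 3, 100.0))   # 1
```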

[0137] Figure 7 schematically shows a block diagram of a task processing apparatus according to an embodiment of the present disclosure.

[0138] As shown in Figure 7, the task processing apparatus 700 includes a loading determination module 710, a data processing module 720, and a weight loading module 730.

[0139] The loading determination module 710 is used to determine loading method information in response to receiving a target task, based on the hardware attribute information of the hardware unit used to process the target task and the task attribute information of the target task. The loading method information indicates the method of loading the operator weight set of the target model used to process the target task into the target memory of the hardware unit.

[0140] Data processing module 720 is used to process the data to be processed corresponding to the target task using hardware units based on the first operator weights of the operator weight set, when the loading method information indicates that the operator weight set is partially loaded; wherein the first operator weights have been loaded into the target memory; and

[0141] The weight loading module 730 is used to load the second operator weight of the operator weight set into the target memory, wherein the execution order of the first operator corresponding to the first operator weight in the target model is earlier than the execution order of the second operator corresponding to the second operator weight in the target model.

[0142] According to embodiments of this disclosure, the task processing apparatus further includes: a candidate weight determination module and a second operator weight determination module.

[0143] The candidate weight determination module is used to determine the target candidate operator weights that are not currently loaded into the target memory from the candidate operator weight set of the target model. The candidate operator weight set consists of operator weights that were not pre-loaded into the target memory before the target task was received.

[0144] The second operator weight determination module is used to determine, from the target candidate operator weights, the operator weight of the operator whose execution order is after, and closest to, that of the first operator, and to use it as the second operator weight.

[0145] According to embodiments of this disclosure, the task processing apparatus further includes: a duration determination module and a candidate weight set determination module.

[0146] The duration determination module is used to determine the processing duration of each operator based on the data type information of the data to be processed and the operator type information of each operator in the target model.

[0147] The candidate weight set determination module is used to determine the candidate operator weight set from the operator weight set based on the processing time information of each operator.

[0148] According to embodiments of this disclosure, the task processing apparatus further includes an unloading module.

[0149] The unloading module is used to unload the weights of the first operator from the target memory in response to the completion of the operation corresponding to the first operator.
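The interplay of the data processing, weight loading, and unloading modules can be simulated to see how partial loading bounds memory use. In this sketch each operator runs in order, the nearest missing later weight is prefetched while it runs, and non-preloaded weights are unloaded once their operator finishes. The operator names and sizes are made up, and the prefetch is modelled as instantaneous.

```python
def run_with_offload(order, sizes, preloaded):
    """Simulate partial loading; return the peak size of resident weights."""
    resident = set(preloaded)
    peak = sum(sizes[op] for op in resident)
    for i, op in enumerate(order):
        resident.add(op)  # no-op if the weight was preloaded or prefetched
        # Prefetch the nearest later operator whose weight is not resident.
        nxt = next((o for o in order[i + 1:] if o not in resident), None)
        if nxt is not None:
            resident.add(nxt)
        peak = max(peak, sum(sizes[o] for o in resident))
        if op not in preloaded:
            resident.discard(op)  # unload once the operator has finished
    return peak


order = ["conv", "q", "k", "v", "mul"]
sizes = {"conv": 2, "q": 8, "k": 8, "v": 8, "mul": 1}
print(run_with_offload(order, sizes, {"conv", "mul"}))  # 19
print(sum(sizes.values()))                              # 27 for full loading
```

In this toy configuration at most two of the three large weights are resident at once, so the peak is 19 units instead of 27.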

[0150] According to embodiments of this disclosure, the loading determination module includes:

[0151] The identification submodule is used to identify hardware units based on hardware attribute information and task attribute information, and obtain the identification results.

[0152] The first loading determination submodule is used to determine, when the identification result indicates that the available resources of the hardware unit are greater than the resources occupied by the target task, that the loading method information indicates that the operator weight set is loaded into the target memory before the data to be processed is processed based on the operator weight set of the target model.

[0153] The second loading determination submodule is used to determine, when the identification result indicates that the available resources of the hardware unit are less than or equal to the resources occupied by the target task, that the loading method information indicates partial loading of the operator weight set.

[0154] According to embodiments of this disclosure, the task processing apparatus further includes a task attribute determination module.

[0155] The task attribute determination module is used to determine the task attribute information of the target task based on the data volume information of the data to be processed corresponding to the target task and the data volume information of the operator weight set of the target model.

[0156] According to embodiments of this disclosure, the task processing apparatus further includes an adapter determination module and a model determination module.

[0157] The adapter determination module is used to determine the target adapter for performing the target task from multiple candidate adapters based on the task type information of the target task. The candidate adapters are obtained by training the initial adapter based on the base model, and multiple candidate adapters are used to perform tasks of different task types.

[0158] The model determination module is used to determine the target model based on the base model and the target adapter.

[0159] According to embodiments of this disclosure, the second operator weights of the operator weight set are loaded from the preload memory into the target memory.

[0160] The target adapter includes a first adapter and a second adapter.

[0161] According to embodiments of this disclosure, the task processing apparatus further includes a multi-stream loading module.

[0162] The multi-stream loading module is used to perform fusion processing on the second adaptation operator weight of the second adapter and the second basic operator weight of the base model when the first target adaptation operator weight is loaded into the preload memory. The first target adaptation operator weight is obtained by fusion processing based on the first adaptation operator weight of the first adapter and the first basic operator weight of the base model.

[0163] According to embodiments of this disclosure, the task processing apparatus further includes: a parsing module, a sending module, a receiving module, and a data determination module.

[0164] The parsing module is used to parse the target task and obtain the parsing results.

[0165] The sending module is used to send the subtask information of the target subtask to the target server when the target task is determined to include target subtasks based on the parsing results, wherein the model used to process the target subtask and the target model meet predetermined conditions.

[0166] The receiving module is used to receive the task results of the target subtask sent by the target server.

[0167] The data determination module is used to determine the data to be processed based on the task results.

[0168] Figure 8 shows a schematic block diagram of an intelligent agent according to an embodiment of the present disclosure.

[0169] In embodiments of this disclosure, as shown in Figure 8, the intelligent agent 800 may include an input module 810, a processing module 820, and an output module 830.

[0170] Input module 810 is used to receive input information.

[0171] The processing module 820 is used to determine the target task based on the input information received by the input module, determine the large model based on the target task, and execute the task processing method provided according to the embodiments of this disclosure by calling the large model.

[0172] Output module 830 is used to output the output information obtained by the processing module.

[0173] According to embodiments of this disclosure, the input module 810 is responsible for receiving or sensing information such as queries, requests, instructions, signals, or data from the outside world (e.g., users or the external environment), and converting it into a format that the intelligent agent 800 can understand and process. The input module 810 is the primary link for the intelligent agent 800 to interact with the outside world, enabling the intelligent agent 800 to efficiently and accurately obtain necessary "sensory" information from the outside world and respond to this information.

[0174] In the example, input module 810 can input the target task described above.

[0175] In the example, the processing module 820 is the core support for the agent 800's ability to handle complex tasks. The processing module 820 can execute the task processing methods described above.

[0176] In the example, the performance of processing module 820 is closely related to the large model on which agent 800 is based. To fully leverage the capabilities of the large model, the internal structure of processing module 820 can be designed to be highly configurable and scalable to handle various types of tasks and requirements in real-world scenarios.

[0177] In the example, after the agent 800 acquires the target task, the processing module 820 can determine the loading method information based on the hardware attribute information of the hardware unit used to process the target task and the task attribute information of the target task. If the loading method indicates partial loading of the operator weight set, the hardware unit processes the data to be processed corresponding to the target task based on the first operator weight of the operator weight set, loads the second operator weight of the operator weight set into the target memory, and then transmits the final task processing result to the output module 830.

[0178] Understandably, while large models possess excellent language understanding, image understanding, and generation capabilities, their ability to solve tasks, like that of humans, is limited without the aid of tools. When the agent 800 is given the ability to invoke tools, it can perform tasks such as using a calculator to complete mathematical calculations, using Python to perform data analysis, and using a search engine to look up weather forecasts.

[0179] In the example, output module 830 can output the task processing results described above.

[0180] The intelligent agent 800 according to the embodiments of this disclosure can simply and effectively improve the level of intelligence, and enhance flexibility and versatility.

[0181] According to embodiments of this disclosure, this disclosure also provides an electronic device, a readable storage medium, and a computer program product.

[0182] According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method described above.

[0183] According to embodiments of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions, wherein the computer instructions are used to cause a computer to perform the method described above.

[0184] According to an embodiment of this disclosure, a computer program product includes a computer program that, when executed by a processor, implements the method described above.

[0185] Figure 9 shows a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and/or claimed herein.

[0186] As shown in Figure 9, device 900 includes a computing unit 901, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 902 or a computer program loaded from storage unit 908 into random access memory (RAM) 903. RAM 903 may also store various programs and data required for the operation of device 900. The computing unit 901, ROM 902, and RAM 903 are interconnected via bus 904. An input/output (I/O) interface 905 is also connected to bus 904.

[0187] Multiple components in device 900 are connected to input / output (I / O) interface 905, including: input unit 906, such as keyboard, mouse, etc.; output unit 907, such as various types of monitors, speakers, etc.; storage unit 908, such as disk, optical disk, etc.; and communication unit 909, such as network card, modem, wireless transceiver, etc. Communication unit 909 allows device 900 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0188] The computing unit 901 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the various methods and processes described above, such as task processing methods. For example, in some embodiments, the task processing method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 908. In some embodiments, part or all of the computer program may be loaded and / or installed on device 900 via ROM 902 and / or communication unit 909. When the computer program is loaded into RAM 903 and executed by the computing unit 901, one or more steps of the task processing method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform task processing methods by any other suitable means (e.g., by means of firmware).

[0189] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0190] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may be executed entirely on a machine, partly on a machine, as a standalone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.

[0191] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0192] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0193] The systems and techniques described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with implementations of the systems and techniques described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0194] Computer systems can include clients and servers. Clients and servers are generally remote from each other and typically interact via a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. Servers can be cloud servers, distributed system servers, or servers incorporating blockchain technology.

[0195] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed herein can be achieved; no limitation is imposed in this respect.

[0196] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.

Claims

1. A task processing method, comprising:
in response to receiving a target task, determining loading method information based on hardware attribute information of a hardware unit used to process the target task and task attribute information of the target task, wherein the loading method information indicates a manner of loading an operator weight set of a target model used to process the target task into a target memory of the hardware unit;
when the loading method information indicates that the operator weight set is partially loaded, processing, by the hardware unit, data to be processed corresponding to the target task based on a first operator weight of the operator weight set, wherein the first operator weight has been loaded into the target memory; and
loading a second operator weight of the operator weight set into the target memory, wherein a first operator corresponding to the first operator weight is earlier in the execution order of the target model than a second operator corresponding to the second operator weight.
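The partial-loading flow of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `run_partially_loaded`, `weight_store`, and `target_memory`, the `(name, fn)` operator format, and the use of a background thread to stand in for a separate load stream are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_partially_loaded(operators, weight_store, data):
    """Run operators in order, loading the next weight while the current one computes.

    operators: list of (name, fn) pairs in execution order (hypothetical format).
    weight_store: dict standing in for slower storage that holds every weight.
    """
    if not operators:
        return data

    target_memory = {}  # stands in for the hardware unit's target memory
    first_name = operators[0][0]
    target_memory[first_name] = weight_store[first_name]  # first weight is already loaded

    def load(name):
        target_memory[name] = weight_store[name]

    with ThreadPoolExecutor(max_workers=1) as loader:
        for i, (name, fn) in enumerate(operators):
            pending = None
            if i + 1 < len(operators):
                # Load the later operator's weight while the current operator computes.
                pending = loader.submit(load, operators[i + 1][0])
            data = fn(data, target_memory[name])
            if pending is not None:
                pending.result()  # the next weight must be resident before it is used
    return data
```

With two hypothetical operators, `("scale", lambda x, w: x * w)` and `("shift", lambda x, w: x + w)`, and stored weights `{"scale": 2, "shift": 3}`, an input of 5 is first scaled and then shifted while the "shift" weight is loaded in the background.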

2. The method according to claim 1, further comprising:
determining, from a candidate operator weight set of the target model, target candidate operator weights that are not currently loaded into the target memory, wherein the candidate operator weight set consists of operator weights that were not pre-loaded into the target memory before the target task was received; and
determining, from the target candidate operator weights, the operator weight of the operator whose execution order is closest to that of the first operator as the second operator weight.
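The selection rule in claim 2 can be sketched as a small function. The function name, the dictionary of execution indices, and the restriction to operators that run after the first operator (taken from the ordering stated in claim 1) are illustrative assumptions.

```python
def pick_second_weight(candidate_order, loaded, first_order):
    """Pick the not-yet-loaded candidate weight that runs soonest after the first operator.

    candidate_order: dict mapping weight name -> execution index, covering only
        weights that were not pre-loaded (hypothetical representation).
    loaded: set of weight names currently resident in target memory.
    first_order: execution index of the first operator.
    """
    pending = {
        name: idx
        for name, idx in candidate_order.items()
        if name not in loaded and idx > first_order
    }
    if not pending:
        return None  # nothing left to load
    # All pending indices exceed first_order, so the smallest index is the closest one.
    return min(pending, key=pending.get)
```

Choosing the nearest later operator keeps the load order aligned with the execution order, so a weight is requested shortly before it is needed.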

3. The method according to claim 2, further comprising:
determining processing time information of each operator of the target model based on data type information of the data to be processed and operator type information of each operator; and
determining the candidate operator weight set from the operator weight set based on the processing time information of each operator.

4. The method according to any one of claims 1 to 3, further comprising:
in response to completion of the operation corresponding to the first operator, unloading the first operator weight from the target memory.
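The effect of unloading in claim 4 can be seen in a minimal sequential sketch. The names and the `(name, fn)` operator format are hypothetical, operator names are assumed to be unique, and loading is shown synchronously for simplicity. The function also reports the peak number of resident weights to show that only the current and the next weight need to be in target memory at once.

```python
def run_with_unloading(operators, weight_store, data):
    """Run operators in order, unloading each weight once its operator finishes.

    Returns the final data and the peak number of weights resident at once.
    """
    target_memory = {}
    peak_resident = 0
    if operators:
        first_name = operators[0][0]
        target_memory[first_name] = weight_store[first_name]

    for i, (name, fn) in enumerate(operators):
        if i + 1 < len(operators):
            next_name = operators[i + 1][0]
            target_memory[next_name] = weight_store[next_name]  # load the later weight
        peak_resident = max(peak_resident, len(target_memory))
        data = fn(data, target_memory[name])
        del target_memory[name]  # unload once this operator's computation completes
    return data, peak_resident
```

In this sketch the peak never exceeds two weights regardless of how many operators the model has, which is the memory saving that unloading is meant to provide.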

5. The method according to any one of claims 1 to 4, wherein determining the loading method information based on the hardware attribute information of the hardware unit used to process the target task and the task attribute information of the target task comprises:
identifying the hardware unit based on the hardware attribute information and the task attribute information to obtain an identification result;
if the identification result indicates that the available resources of the hardware unit are greater than the resources occupied by the target task, determining that the loading method information indicates that the operator weight set is loaded into the target memory before the data to be processed is processed based on the operator weight set of the target model; and
if the identification result indicates that the available resources of the hardware unit are less than or equal to the resources occupied by the target task, determining that the loading method information indicates that the operator weight set is partially loaded.

6. The method according to any one of claims 1 to 5, further comprising:
determining the task attribute information of the target task based on data volume information of the data to be processed corresponding to the target task and data volume information of the operator weight set of the target model.
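The decision in claim 5 and the task attribute in claim 6 can be sketched together. The claims do not state how the two data volumes are combined, so summing them into a single byte count is an assumption, as are the names `LoadingMethod`, `estimate_task_bytes`, and `choose_loading_method`.

```python
from enum import Enum


class LoadingMethod(Enum):
    FULL = "full"        # whole operator weight set resident before processing starts
    PARTIAL = "partial"  # weights loaded progressively during processing


def estimate_task_bytes(input_bytes, weight_bytes):
    """Assumed task attribute: input data volume plus total weight volume."""
    return input_bytes + sum(weight_bytes.values())


def choose_loading_method(available_bytes, task_bytes):
    """Claim 5 comparison: strictly more available than occupied means full loading."""
    if available_bytes > task_bytes:
        return LoadingMethod.FULL
    return LoadingMethod.PARTIAL
```

Note that the boundary case, where the available resources equal the resources occupied by the task, falls on the partial-loading side, matching the "less than or equal to" condition in claim 5.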

7. The method according to any one of claims 1 to 6, further comprising:
determining, based on task type information of the target task, a target adapter for performing the target task from multiple candidate adapters, wherein the candidate adapters are obtained by training an initialization adapter using a base model, and the multiple candidate adapters are used to perform tasks of different task types; and
determining the target model based on the base model and the target adapter.

8. The method according to claim 7, wherein:
the second operator weight of the operator weight set is loaded from a preload memory into the target memory;
the target adapter includes a first adapter and a second adapter; and
the method further comprises: when a first target adaptation operator weight is loaded into the preload memory, fusing a second adaptation operator weight of the second adapter with a second basic operator weight of the base model, wherein the first target adaptation operator weight is obtained by fusion processing based on a first adaptation operator weight of the first adapter and a first basic operator weight of the base model.
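The overlap between fusing and preloading in claim 8 can be sketched as a two-stage pipeline. Several points are assumptions rather than statements of the claim: the fusion rule is taken to be an element-wise sum of a base weight and an adapter delta, "when" is read as "while", and the names `fuse`, `fuse_and_preload`, and `preload_memory` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fuse(base_weight, adapter_delta):
    """Assumed fusion rule: element-wise sum of base weight and adapter delta."""
    return [b + d for b, d in zip(base_weight, adapter_delta)]


def fuse_and_preload(base_weights, adapter_deltas):
    """Fuse weights in execution order, copying each one while the next is fused.

    base_weights / adapter_deltas: lists aligned by operator execution order.
    Returns the list standing in for the preload memory.
    """
    preload_memory = []
    if not base_weights:
        return preload_memory

    fused = fuse(base_weights[0], adapter_deltas[0])
    with ThreadPoolExecutor(max_workers=1) as copier:
        for i in range(len(base_weights)):
            # Copy the current fused weight into preload memory in the background.
            copy_job = copier.submit(lambda w=fused: preload_memory.append(list(w)))
            if i + 1 < len(base_weights):
                # Meanwhile, fuse the next operator's base and adapter weights.
                fused = fuse(base_weights[i + 1], adapter_deltas[i + 1])
            copy_job.result()
    return preload_memory
```

Overlapping the two stages hides the fusion time of the second weight behind the transfer time of the first, so the preload memory is filled in execution order without waiting for all fusions to finish.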

9. The method according to any one of claims 1 to 8, further comprising:
analyzing the target task to obtain an analysis result;
if it is determined, based on the analysis result, that the target task includes a target subtask, sending subtask information of the target subtask to a target server, wherein a model used to process the target subtask and the target model satisfy a predetermined condition;
receiving a task result for the target subtask sent by the target server; and
determining the data to be processed based on the task result.

10. A task processing apparatus, comprising:
a loading determination module configured to, in response to receiving a target task, determine loading method information based on hardware attribute information of a hardware unit used to process the target task and task attribute information of the target task, wherein the loading method information indicates a manner of loading an operator weight set of a target model used to process the target task into a target memory of the hardware unit;
a data processing module configured to, when the loading method information indicates that the operator weight set is partially loaded, use the hardware unit to process data to be processed corresponding to the target task based on a first operator weight of the operator weight set, wherein the first operator weight has been loaded into the target memory; and
a weight loading module configured to load a second operator weight of the operator weight set into the target memory, wherein a first operator corresponding to the first operator weight is earlier in the execution order of the target model than a second operator corresponding to the second operator weight.

11. An intelligent agent, comprising:
an input module configured to receive input information;
a processing module configured to determine a target task based on the input information received by the input module, determine a large model based on the target task, and execute the method of any one of claims 1 to 9 by calling the large model to obtain output information; and
an output module configured to output the output information obtained by the processing module.

12. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor,
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 9.

13. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 9.

14. A computer program product comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 9.