Hardware scheduling execution method, device and equipment of deep learning model and medium

By using fuzzy matching and hardware scheduling execution of kernel fusion subgraphs, the low efficiency of computational graph structure analysis in deep learning models is solved, achieving efficient inference acceleration and rapid deployment on the chip.

CN116205279B · Active · Publication Date: 2026-06-23 · SHANGHAI SUIYUAN TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANGHAI SUIYUAN TECH CO LTD
Filing Date
2023-02-20
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

In existing technologies, the computational graph structure analysis of deep learning models relies on manual analysis, which is inefficient and prone to errors, making it difficult to effectively utilize chip resources and thus hindering the acceleration of inference in deep learning models.

Method used

By performing fuzzy matching between the computation graph of the deep learning model and a pre-defined functional subgraph library, the matching computation subgraph and the fuzzy matching degree are determined. The kernel fusion subgraph is then used to perform hardware scheduling execution of the chip's multi-layer memory structure, thereby optimizing the execution efficiency of the computation graph.

Benefits of technology

It enables rapid model analysis and hardware acceleration of deep learning models, improves inference efficiency on chips, and supports generalization analysis and rapid deployment of a large number of deep learning models.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN116205279B_ABST

Abstract

Embodiments of the present application disclose a hardware scheduling execution method and device for a deep learning model, an apparatus, and a medium. The method comprises: obtaining a target deep learning model computation graph; performing fuzzy matching between the computation graph and each to-be-matched function subgraph in a preset function subgraph library; determining a computation subgraph matched with each to-be-matched function subgraph and a corresponding fuzzy matching degree in the computation graph; determining a target function subgraph matched successfully with the target computation subgraph from each to-be-matched function subgraph according to the fuzzy matching degree; determining a kernel fusion subgraph corresponding to the target computation subgraph according to the target function subgraph; and performing hardware scheduling execution of a chip multi-layer memory structure on the target computation subgraph in the deep learning model according to the kernel fusion subgraph. The method can realize fast model analysis of the deep learning model, thereby performing model hardware acceleration according to the matched function subgraph and performing inference acceleration on the deep learning model on the chip.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a hardware scheduling and execution method, apparatus, device, and medium for deep learning models.

Background Technology

[0002] With the rapid evolution and development of artificial intelligence, the types of deep learning models are increasing. Examples include deep learning models used for image classification and recognition, automatic speech recognition, and language understanding. While there are many deep models, their construction follows inherent rules. For instance, most models can be built by modifying some basic modules. For example, stacking different numbers of residual structures can create residual networks such as ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152. These basic modules are transformed into directed acyclic computation graphs composed of operators. Then, the lower-level framework optimizes and schedules the execution of the kernel functions corresponding to the operators in the computation graph, thus completing the execution of a module.

[0003] Because the basic modules of deep learning models have many variations, the corresponding computational subgraphs also have many variations, making it difficult to determine the basic modules contained in the deep model from the computational graph of the lower-level framework. Currently, the structural analysis of the computational graph of deep learning models mainly relies on manual analysis by algorithm engineers. However, the basic model structure of deep learning models differs across different domains. The analysis process requires algorithm engineers to be familiar with a large number of basic model structures in different domains, which places high demands on their background knowledge. Moreover, manual analysis is inefficient, prone to errors or omissions, and difficult to generalize.

[0004] Therefore, structural analysis of the computation graph of a deep learning model, determining the basic model corresponding to the computation graph, and then accelerating the inference of the deep learning model on a chip based on this basic model are of great significance to the development of deep learning models and the rational utilization of chip resources.

Summary of the Invention

[0005] This invention provides a hardware scheduling execution method, apparatus, device, and medium for deep learning models, enabling rapid model analysis of deep learning models and accelerating inference of deep learning models on a chip.

[0006] According to one aspect of the present invention, a hardware scheduling and execution method for a deep learning model is provided, the method comprising:

[0007] Obtain the computation graph of the target deep learning model;

[0008] The computation graph is fuzzy matched with each functional subgraph to be matched in the preset functional subgraph library; and the computation subgraph that matches each functional subgraph to be matched, and the corresponding fuzzy matching degree, are determined in the computation graph.

[0009] Based on the fuzzy matching degree, the target functional subgraphs that successfully match the target computation subgraphs in each of the functional subgraphs to be matched are determined; and the kernel fusion subgraph corresponding to the target computation subgraph is determined based on the target functional subgraphs.

[0010] The target computation subgraph in the deep learning model is executed by hardware scheduling using a multi-layer memory structure based on the kernel fusion subgraph.

[0011] According to another aspect of the present invention, a hardware scheduling and execution apparatus for a deep learning model is provided, the apparatus comprising:

[0012] The computation graph acquisition module is used to acquire the computation graph of the target deep learning model;

[0013] The fuzzy matching module is used to perform fuzzy matching between the computation graph and each functional subgraph to be matched in the preset functional subgraph library; and to determine the computation subgraph that matches each of the functional subgraphs to be matched, and the corresponding fuzzy matching degree in the computation graph.

[0014] The kernel fusion subgraph determination module is used to determine, based on the fuzzy matching degree, the target functional subgraphs in each of the functional subgraphs to be matched that successfully match the target computation subgraph; and to determine the kernel fusion subgraph corresponding to the target computation subgraph based on the target functional subgraph.

[0015] The hardware scheduling and execution module is used to perform hardware scheduling and execution of the target computation subgraph in the deep learning model based on the kernel fusion subgraph using the chip's multi-layer memory structure.

[0016] According to another aspect of the present invention, an electronic device is provided, the electronic device comprising:

[0017] At least one processor; and

[0018] A memory communicatively connected to the at least one processor; wherein,

[0019] The memory stores a computer program that can be executed by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to execute the hardware scheduling execution method of the deep learning model according to any embodiment of the present invention.

[0020] According to another aspect of the present invention, a computer-readable storage medium is provided, the computer-readable storage medium storing computer instructions for causing a processor to execute a hardware scheduling execution method for a deep learning model as described in any embodiment of the present invention.

[0021] The technical solution of this invention acquires the computation graph of a target deep learning model; performs fuzzy matching between the computation graph and each functional subgraph to be matched in a preset functional subgraph library; determines, in the computation graph, the computation subgraph that matches each functional subgraph to be matched and the corresponding fuzzy matching degree; determines, based on the fuzzy matching degree, the target functional subgraph that successfully matches the target computation subgraph among the functional subgraphs to be matched; determines the kernel fusion subgraph corresponding to the target computation subgraph based on the target functional subgraph; and performs hardware scheduling execution of the target computation subgraph in the deep learning model using the chip's multi-layer memory structure based on the kernel fusion subgraph. This solves the problems of model analysis and hardware acceleration in deep learning models. By performing fuzzy matching between the computation graph of the deep learning model and the functional subgraphs in the preset functional subgraph library, rapid model analysis of deep learning models can be achieved; and hardware acceleration of the model can be performed based on the kernel fusion subgraph corresponding to the matched functional subgraph, thereby accelerating inference of the deep learning model on the chip.

[0022] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.

Attached Figure Description

[0023] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0024] Figure 1 is a flowchart of a hardware scheduling and execution method for a deep learning model according to Embodiment 1 of the present invention;

[0025] Figure 2a is a flowchart of a hardware scheduling and execution method for a deep learning model according to Embodiment 2 of the present invention;

[0026] Figure 2b is a schematic diagram of fuzzy matching when nodes are added to a functional subgraph, according to Embodiment 2 of the present invention;

[0027] Figure 2c is a schematic diagram of fuzzy matching when a functional subgraph lacks nodes, according to Embodiment 2 of the present invention;

[0028] Figure 2d is a schematic diagram of fuzzy matching when nodes of a functional subgraph are replaced, according to Embodiment 2 of the present invention;

[0029] Figure 3 is a schematic diagram of the structure of a hardware scheduling and execution device for a deep learning model according to Embodiment 3 of the present invention;

[0030] Figure 4 is a schematic diagram of the structure of an electronic device that implements the hardware scheduling and execution method of the deep learning model in this embodiment of the invention.

Detailed Implementation

[0031] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present invention.

[0032] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0033] Embodiment 1

[0034] Figure 1 is a flowchart of a hardware scheduling and execution method for a deep learning model according to Embodiment 1 of the present invention. This embodiment is applicable to situations where a deep learning model is analyzed and hardware acceleration is performed based on the analysis results. The method can be executed by a hardware scheduling and execution device for the deep learning model, which can be implemented in hardware and/or software and can be configured in electronic devices such as computers or laptops. As shown in Figure 1, the method includes:

[0035] Step 110: Obtain the computation graph of the target deep learning model.

[0036] The computation graph can be a graphical representation of the computation process of a deep learning model. A computation graph can be a directed acyclic graph (DAG), that is, a graph with directed edges, no cycles, and designated input and output nodes.
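As an illustration of the structure described in [0036], the following is a minimal sketch of a computation graph held as a DAG. The class name `ComputationGraph`, its fields, and the operator names are hypothetical and are not taken from the patent; the acyclicity check uses Kahn's algorithm, a standard technique chosen here for brevity.

```python
from collections import defaultdict


class ComputationGraph:
    """A tiny directed graph of operator nodes (illustrative names only)."""

    def __init__(self):
        self.op_type = {}               # node name -> operator type
        self.succ = defaultdict(list)   # node name -> successor names
        self.pred = defaultdict(list)   # node name -> predecessor names

    def add_node(self, name, op_type):
        self.op_type[name] = op_type

    def add_edge(self, src, dst):
        self.succ[src].append(dst)
        self.pred[dst].append(src)

    def inputs(self):
        # Input nodes have no predecessors.
        return [n for n in self.op_type if not self.pred[n]]

    def outputs(self):
        # Output nodes have no successors.
        return [n for n in self.op_type if not self.succ[n]]

    def is_acyclic(self):
        # Kahn's algorithm: repeatedly remove nodes whose in-degree is zero.
        indeg = {n: len(self.pred[n]) for n in self.op_type}
        ready = [n for n, d in indeg.items() if d == 0]
        removed = 0
        while ready:
            node = ready.pop()
            removed += 1
            for nxt in self.succ[node]:
                indeg[nxt] -= 1
                if indeg[nxt] == 0:
                    ready.append(nxt)
        return removed == len(self.op_type)


g = ComputationGraph()
for name, op in [("input", "Input"), ("conv", "Conv"), ("bn", "BatchNorm"),
                 ("relu", "ReLU"), ("output", "Output")]:
    g.add_node(name, op)
for src, dst in [("input", "conv"), ("conv", "bn"), ("bn", "relu"),
                 ("relu", "output")]:
    g.add_edge(src, dst)
```

Storing both successor and predecessor lists makes the in-degree of each node directly available, which the later matching steps rely on.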

[0037] Step 120: Perform fuzzy matching between the computation graph and each functional subgraph to be matched in the preset functional subgraph library; and determine, in the computation graph, the computation subgraph that matches each functional subgraph to be matched and the corresponding fuzzy matching degree.

[0038] In this invention, a functional subgraph (FunctionDAG) can be a local graph in the computation graph that implements a certain function. In an optional embodiment, the functional subgraph includes: a block unit subgraph composed of at least one operator, and a layer unit subgraph composed of at least one block unit. The block unit subgraph consists of a small number of operators (Op), where an Op is a basic computational unit in a deep learning model. The layer unit subgraph is composed of block units. For example, block unit subgraphs may include: layer normalization (Layer Norm), convolution (Conv) + batch normalization (BN) + rectified linear unit (ReLU), activation functions (such as Swish), etc. Layer unit subgraphs may include: residual layer units in ResNet, multi-head self-attention layer units and feed-forward layer units in Transformer networks, etc.

[0039] In this embodiment of the invention, fuzzy matching can be performed between the deep learning model computation graph and each functional subgraph to be matched in a preset functional subgraph library. Specifically, fuzzy matching can first be performed between the block unit subgraphs in the preset functional subgraph library and the computation graph; then, treating the block unit subgraphs as a whole, fuzzy matching can be performed between the layer unit subgraphs in the preset functional subgraph library and the computation graph.
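The two-stage order in [0039] (block units first, then layer units with each matched block treated as a whole) can be sketched as follows. This is a deliberately simplified illustration: it works on a linear sequence of operator types and uses exact sequence comparison instead of fuzzy matching, and the pattern names in `BLOCK_PATTERNS` and `LAYER_PATTERNS` are hypothetical.

```python
# Hypothetical library contents; a real library would store graphs, not lists.
BLOCK_PATTERNS = {
    "conv_bn_relu": ["Conv", "BatchNorm", "ReLU"],
    "conv_bn": ["Conv", "BatchNorm"],
}
LAYER_PATTERNS = {
    "residual_layer": ["conv_bn_relu", "conv_bn", "Add", "ReLU"],
}


def collapse_blocks(ops, block_patterns):
    """Stage 1: replace each matched block unit by a single token."""
    # Try longer patterns first so "conv_bn_relu" wins over "conv_bn".
    patterns = sorted(block_patterns.items(), key=lambda kv: -len(kv[1]))
    tokens, i = [], 0
    while i < len(ops):
        for name, pattern in patterns:
            if ops[i:i + len(pattern)] == pattern:
                tokens.append(name)
                i += len(pattern)
                break
        else:
            tokens.append(ops[i])   # no block matched; keep the raw operator
            i += 1
    return tokens


def match_layers(tokens, layer_patterns):
    """Stage 2: look for layer units in the block-level token sequence."""
    found = []
    for name, pattern in layer_patterns.items():
        windows = range(len(tokens) - len(pattern) + 1)
        if any(tokens[i:i + len(pattern)] == pattern for i in windows):
            found.append(name)
    return found


ops = ["Conv", "BatchNorm", "ReLU", "Conv", "BatchNorm", "Add", "ReLU"]
tokens = collapse_blocks(ops, BLOCK_PATTERNS)
layers = match_layers(tokens, LAYER_PATTERNS)
```

Collapsing blocks first shortens the sequence that the layer-level stage has to inspect, which is the practical benefit of treating block unit subgraphs as a whole.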

[0040] In specific matching, the computation graph and each functional subgraph to be matched can be fuzzy matched according to the characteristics of the computation graph and the functional subgraphs in a certain order. For example, in an optional embodiment of the present invention, considering that the computation graph and each functional subgraph to be matched are directed, acyclic, and have input and output nodes, the input can be used as the root node, and the computation subgraphs that match each functional subgraph to be matched can be determined in the computation graph according to the breadth-first search (BFS) order and the matching situation.
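The breadth-first visiting order mentioned in [0040] can be illustrated with a short sketch. The function name `bfs_order` and the example residual-style graph are assumptions made for illustration; the traversal itself is the standard BFS.

```python
from collections import deque


def bfs_order(succ, root):
    """Return nodes in breadth-first order starting from the input (root)."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in succ.get(node, []):
            if nxt not in seen:     # a node with several parents is visited once
                seen.add(nxt)
                queue.append(nxt)
    return order


# Hypothetical residual-style subgraph: the input feeds both the main
# branch and the skip connection into "add".
residual = {
    "input": ["conv1", "add"],
    "conv1": ["relu1"],
    "relu1": ["conv2"],
    "conv2": ["add"],
    "add": ["relu2"],
    "relu2": [],
}
order = bfs_order(residual, "input")
```

Because the same traversal rule is applied to the computation graph and to each functional subgraph, nodes are compared in a consistent order on both sides.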

[0041] In this embodiment, the preset functional subgraph library can be a functional module library composed of some typical functional subgraphs. Exact matching means that the topological relationships and node labels of all nodes and edges of the computational subgraph in the computational graph match those of the functional subgraph to be matched in the preset functional subgraph library. The computational subgraph can be a part of the computational graph. However, in practice, the computational graph in a deep learning model may be obtained by deforming typical functional subgraphs. For example, a certain Op may be missing, an Op may be added, or the Op type may be replaced. When constructing the preset functional subgraph library, it is difficult to pre-add all deformations of each typical functional subgraph to the library. Therefore, in this embodiment of the invention, fuzzy matching can be performed between the computational graph and each functional subgraph to be matched in the preset functional subgraph library. Fuzzy matching can be distinguished from the strong constraints of exact matching. It can be understood that when a computational subgraph is obtained after deforming a typical functional subgraph, fuzzy matching can be used to match the corresponding functional subgraph in the preset functional subgraph library.
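To make the contrast in [0041] concrete, the sketch below checks exact matching under a given node mapping and shows that it rejects a variant in which one Op is missing. The graph encoding (a label dictionary plus an edge set) and the function name `is_exact_match` are hypothetical choices for this example.

```python
def is_exact_match(pattern, candidate, mapping):
    """Exact match: node labels and edges agree under a one-to-one mapping."""
    p_labels, p_edges = pattern
    c_labels, c_edges = candidate
    # Every pattern node must be mapped, and every candidate node must be hit.
    if set(mapping) != set(p_labels) or set(mapping.values()) != set(c_labels):
        return False
    # The mapping must not send two pattern nodes to the same candidate node.
    if len(set(mapping.values())) != len(mapping):
        return False
    labels_ok = all(p_labels[p] == c_labels[c] for p, c in mapping.items())
    mapped_edges = {(mapping[a], mapping[b]) for a, b in p_edges}
    return labels_ok and mapped_edges == set(c_edges)


# Library pattern: Conv -> BatchNorm -> ReLU.
pattern = ({"a": "Conv", "b": "BatchNorm", "c": "ReLU"},
           {("a", "b"), ("b", "c")})
# Candidate with the same structure.
same = ({"n1": "Conv", "n2": "BatchNorm", "n3": "ReLU"},
        {("n1", "n2"), ("n2", "n3")})
# Deformed candidate: the BatchNorm Op is missing.
no_bn = ({"n1": "Conv", "n3": "ReLU"}, {("n1", "n3")})

exact_ok = is_exact_match(pattern, same, {"a": "n1", "b": "n2", "c": "n3"})
exact_deformed = is_exact_match(pattern, no_bn, {"a": "n1", "c": "n3"})
```

The deformed candidate fails the exact check even though it is clearly derived from the library pattern, which is the situation fuzzy matching is meant to handle.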

[0042] When performing fuzzy matching, the computation graph can be matched node by node with each functional subgraph in a preset functional subgraph library. Specifically, it can be determined whether a node in the computation graph and the corresponding node in the functional subgraph to be matched are the same, and whether their in-degrees are the same. When the nodes are different, the higher-order neighborhood information of the node can be used to compare the nodes before and after it, and to check whether the in-degrees of those nodes are the same. Different matching values can be set for different matching cases, and then the fuzzy matching degree can be determined based on the matching situation and matching value of each node.
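A minimal sketch of how per-node matching values could be combined into a fuzzy matching degree is given below. The patent text does not state the concrete matching values or the aggregation rule, so the values 1.0, 0.5 and 0.0 and the use of an average are assumptions made purely for illustration.

```python
def node_match_value(pattern_node, graph_node):
    """Assumed scoring: full credit needs the same Op type and in-degree."""
    same_op = pattern_node["op"] == graph_node["op"]
    same_in_degree = pattern_node["in_degree"] == graph_node["in_degree"]
    if same_op and same_in_degree:
        return 1.0
    if same_op:
        return 0.5      # assumed partial credit: same Op, different in-degree
    return 0.0


def fuzzy_matching_degree(pairs):
    """Assumed aggregation: mean of the per-node matching values."""
    if not pairs:
        return 0.0
    return sum(node_match_value(p, g) for p, g in pairs) / len(pairs)


# Each pair is (node of the functional subgraph, candidate node in the graph).
pairs = [
    ({"op": "Conv", "in_degree": 1}, {"op": "Conv", "in_degree": 1}),
    ({"op": "Add", "in_degree": 2}, {"op": "Add", "in_degree": 3}),
    ({"op": "ReLU", "in_degree": 1}, {"op": "Swish", "in_degree": 1}),
]
degree = fuzzy_matching_degree(pairs)
```

With these assumed values the three pairs contribute 1.0, 0.5 and 0.0, so the resulting degree is 0.5; a different choice of values or aggregation would change the number but not the overall scheme.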

[0043] Fuzzy matching degree can characterize the degree of matching between the computation subgraph and the functional subgraph to be matched. For example, a higher fuzzy matching degree value indicates a better degree of matching between the computation subgraph and the functional subgraph to be matched.

[0044] In an optional embodiment of the present invention, fuzzy matching is performed between the computation graph and each functional subgraph to be matched in the preset functional subgraph library, including: taking the input as the root node, performing fuzzy matching between the nodes of the computation graph and each functional subgraph to be matched in breadth-first order.

[0045] Specifically, using the input as the root node, and following a breadth-first search order, along with the matching status of nodes in the computation graph with the functional subgraphs to be matched, we can determine the nodes to be matched in the functional subgraphs and the target nodes corresponding to the nodes to be matched in the computation graph. This allows us to determine whether the nodes to be matched and the target nodes can be matched.

[0046] Step 130: Based on the fuzzy matching degree, determine the target functional subgraphs that successfully match the target computation subgraph in each functional subgraph to be matched; and determine the kernel fusion subgraph corresponding to the target computation subgraph based on the target functional subgraph.

[0047] Specifically, the module structure of each functional subgraph in a pre-defined functional subgraph library can be analyzed in advance, and optimization schemes can be generated for the module structure. Based on the fuzzy matching degree, a suitable target functional subgraph can be determined for the target computation subgraph in the pre-defined functional subgraph library. Then, based on the matching results, the optimization scheme of the suitable target functional subgraph in the pre-defined functional subgraph library can be used as the optimization scheme of the target computation subgraph.

[0048] In this embodiment of the invention, the optimization scheme can be understood as follows: a deep learning model can be transformed into a directed acyclic computation graph composed of operators, and then the lower-level framework optimizes and schedules the execution of the kernel functions corresponding to the operators in the computation graph, thereby completing the execution of a module. For common deep learning modules, kernel fusion can improve the execution efficiency of the module on the chip. Kernel fusion merges multiple kernel functions into one kernel function. This has two advantages: 1) it reduces the scheduling and launch overhead of kernel functions; 2) it reduces accesses to global memory, improving data transfer efficiency and computational performance. The steps of kernel fusion can be: 1) graph optimization: analyze the computation graph of the deep model and perform hardware-independent optimizations, such as constant folding; 2) detection of fusible computation subgraphs: in a given computation graph, find combinations of graph nodes that can be fused; 3) code generation: given a fusible computation subgraph, generate kernel function code for it; 4) modification of the computation graph: replace the original computation subgraph with the operator corresponding to the fused kernel function and insert that operator into the original computation graph.
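Step 4 of the kernel fusion procedure in [0048], replacing a fused subgraph by a single operator, can be sketched as a small graph rewrite. The function name `fuse_nodes`, the node names, and the fused operator label are hypothetical; code generation for the fused kernel (step 3) is outside the scope of this example.

```python
def fuse_nodes(nodes, edges, fused_group, fused_name, fused_op):
    """Replace the nodes in fused_group by one node running a fused kernel."""
    group = set(fused_group)
    new_nodes = {n: op for n, op in nodes.items() if n not in group}
    new_nodes[fused_name] = fused_op
    new_edges = set()
    for src, dst in edges:
        if src in group and dst in group:
            continue    # internal edge: the data stays inside the fused kernel
        src = fused_name if src in group else src
        dst = fused_name if dst in group else dst
        new_edges.add((src, dst))
    return new_nodes, new_edges


nodes = {"input": "Input", "conv": "Conv", "bn": "BatchNorm",
         "relu": "ReLU", "output": "Output"}
edges = {("input", "conv"), ("conv", "bn"), ("bn", "relu"), ("relu", "output")}

fused_nodes, fused_edges = fuse_nodes(
    nodes, edges, ["conv", "bn", "relu"], "fused0", "FusedConvBNReLU")
```

After the rewrite, three kernel launches are replaced by one, and the two intermediate tensors between Conv, BatchNorm and ReLU no longer appear as edges in the graph, which corresponds to the two advantages listed in the text.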

[0049] For each functional subgraph in the preset functional subgraph library, a corresponding kernel fusion subgraph can be generated through the kernel fusion steps described above, and the corresponding kernel fusion subgraph is stored. Furthermore, when determining a matching target functional subgraph for the target computation subgraph in the target deep learning model computation graph from the preset functional subgraph library, the kernel fusion subgraph corresponding to this matching target functional subgraph can be used as the kernel fusion subgraph corresponding to the target computation subgraph. This allows for optimization of the target computation subgraph based on the kernel fusion subgraph, improving hardware execution performance.

[0050] In an optional embodiment of the present invention, determining the kernel fusion subgraph corresponding to the target computation subgraph based on the target functional subgraph includes: using the kernel fusion subgraph corresponding to the target functional subgraph as the kernel fusion subgraph corresponding to the target computation subgraph.

[0051] For example, the functional subgraph with the highest fuzzy matching degree value when matching the target computational subgraph in the preset functional subgraph library can be used as the target functional subgraph. The kernel fusion subgraph corresponding to the target functional subgraph can be used as the kernel fusion subgraph corresponding to the target computational subgraph.
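The selection rule in [0051] amounts to taking the library entry with the highest fuzzy matching degree and reusing its stored kernel fusion subgraph. A minimal sketch follows; the library keys, the kernel identifiers, and the optional `min_degree` cutoff are assumptions for illustration and are not specified in the text.

```python
def select_kernel_fusion(degrees, fusion_library, min_degree=0.0):
    """Pick the best-matching functional subgraph and its fusion subgraph.

    degrees: functional subgraph name -> fuzzy matching degree.
    fusion_library: functional subgraph name -> stored kernel fusion subgraph.
    min_degree: assumed cutoff below which no match is reported.
    """
    if not degrees:
        return None
    best = max(degrees, key=degrees.get)
    if degrees[best] < min_degree:
        return None
    return best, fusion_library[best]


# Hypothetical library entries and matching degrees.
fusion_library = {
    "conv_bn_relu": "fused_conv_bn_relu_kernel",
    "layer_norm": "fused_layer_norm_kernel",
}
degrees = {"conv_bn_relu": 0.9, "layer_norm": 0.4}

choice = select_kernel_fusion(degrees, fusion_library)
rejected = select_kernel_fusion(degrees, fusion_library, min_degree=0.95)
```

Because the kernel fusion subgraphs are generated once per library entry, this lookup is cheap at analysis time compared with generating a fused kernel for every new model.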

[0052] Step 140: Perform hardware scheduling and execution of the target computation subgraph in the deep learning model based on the kernel fusion subgraph using the chip's multi-layer memory structure.

[0053] In this embodiment of the invention, structural analysis can be performed on the computation graph of the deep learning model to find structures with algorithmic significance within the computation graph, i.e., target computation subgraphs that can be matched with functional subgraphs in a preset functional subgraph library. These structures can be optimized using high-level graph fusion of operators, taking into account chip hardware characteristics, to determine the kernel fusion subgraph corresponding to the target computation subgraph. The chip has a multi-layer memory structure, for example three layers: high bandwidth memory (HBM), L2 memory, and L1 memory. Each memory layer has a specific size. By combining the specific memory structure on the chip with the determined kernel fusion subgraph, hardware scheduling and execution of the deep learning model can be performed on the chip. For example, the input/output data layout of the kernel fusion subgraph within the chip's multi-layer memory structure can be designed to achieve hardware scheduling of the corresponding target computation subgraph, thereby enabling hardware scheduling and execution of the deep learning model on the chip. In this embodiment of the invention, the kernel fusion subgraph can be reused in many deep learning model structures, ensuring high performance and high scalability of the technical solution.
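One simple way to picture the data layout decision in [0053] is to place each input or output buffer of a kernel fusion subgraph in the smallest memory layer that can hold it. The sketch below does this under stated assumptions: the capacities in `MEMORY_LEVELS` are invented for the example and do not describe any real chip, and the placement rule is a plain first-fit, not the scheduling policy of the patent.

```python
import math

# Hypothetical capacities in bytes, ordered from closest to farthest memory.
MEMORY_LEVELS = [
    ("L1", 1 * 1024 * 1024),
    ("L2", 32 * 1024 * 1024),
    ("HBM", 16 * 1024 ** 3),
]


def tensor_bytes(shape, dtype_bytes=2):
    """Size of a dense tensor; dtype_bytes=2 corresponds to 16-bit values."""
    return math.prod(shape) * dtype_bytes


def place_buffer(num_bytes, levels=MEMORY_LEVELS):
    """First-fit placement: the closest memory layer that can hold the buffer."""
    for name, capacity in levels:
        if num_bytes <= capacity:
            return name
    raise ValueError("buffer does not fit in any memory layer")


small_input = tensor_bytes((1, 64, 56, 56))     # 401,408 bytes
large_output = tensor_bytes((8, 256, 56, 56))   # 12,845,056 bytes

placement = {
    "input": place_buffer(small_input),
    "output": place_buffer(large_output),
}
```

A practical scheduler would also have to account for buffers that share a layer at the same time and for tiling of tensors that exceed the closest layer; those aspects are omitted here.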

[0054] The technical solution of this embodiment obtains the computation graph of the target deep learning model; performs fuzzy matching between the computation graph and each functional subgraph to be matched in a preset functional subgraph library; determines the computation subgraph that matches each functional subgraph to be matched in the computation graph, and the corresponding fuzzy matching degree; determines the target functional subgraph that successfully matches the target computation subgraph in each functional subgraph to be matched based on the fuzzy matching degree; determines the kernel fusion subgraph corresponding to the target computation subgraph based on the target functional subgraph; and performs hardware scheduling execution of the target computation subgraph in the deep learning model using a multi-layer memory structure on the chip based on the kernel fusion subgraph. This solves the problems of model analysis and hardware acceleration of deep learning models. By performing fuzzy matching between the computation graph of the deep learning model and the functional subgraphs in the preset functional subgraph library, rapid model analysis of deep learning models can be achieved; and hardware acceleration of the model can be performed based on the kernel fusion subgraph corresponding to the matched functional subgraph, accelerating inference of deep learning models on the chip, which can support the generalization analysis capability and rapid inference deployment of a large number of deep learning models.

[0055] Embodiment 2

[0056] Figure 2a is a flowchart of a hardware scheduling and execution method for a deep learning model according to Embodiment 2 of the present invention. This embodiment is a further refinement of the above technical solution. The technical solution in this embodiment can be combined with the various optional solutions in one or more of the above embodiments.

[0057] Specifically, in an optional embodiment of the present invention, fuzzy matching is performed between the computation graph and each functional subgraph to be matched in a preset functional subgraph library; and the computation subgraph that matches each functional subgraph to be matched, and the corresponding fuzzy matching degree, are determined in the computation graph, including:

[0058] Based on the current matching result between the target computation subgraph and the functional subgraph to be matched, determine the node to be matched in the functional subgraph to be matched, the first higher-order neighborhood information corresponding to the node to be matched, the first parent node of the node to be matched in the functional subgraph to be matched, the second parent node in the computation graph that matches the first parent node, and the second higher-order neighborhood information corresponding to the second parent node.

[0059] Based on the node to be matched, the first higher-order neighborhood information, the first parent node, the second parent node, and the second higher-order neighborhood information, determine the next matching result between the computation graph and the functional subgraph to be matched, and determine the corresponding fuzzy matching degree.

[0060] If all nodes in the subgraph to be matched have been fuzzy matched, then the next matching result is used as the computation subgraph to match the subgraph to be matched; otherwise, the next matching result is updated to the current matching result, and the step of determining the node to be matched is returned, until all nodes in the subgraph to be matched have been fuzzy matched.

[0061] As shown in Figure 2a, the method includes:

[0062] Step 210: Obtain the computation graph of the target deep learning model.

[0063] Step 220: Based on the current matching results between the computation graph and the functional subgraph to be matched, determine the node to be matched in the functional subgraph to be matched, the first higher-order neighborhood information corresponding to the node to be matched, the first parent node of the node to be matched in the functional subgraph to be matched, the second parent node in the computation graph that matches the first parent node, and the second higher-order neighborhood information corresponding to the second parent node.

[0064] In an optional embodiment of the present invention, the functional subgraph includes: a block unit subgraph composed of at least one operator, and a layer unit subgraph composed of at least one block unit.

[0065] In this embodiment of the invention, each functional subgraph in a preset functional subgraph library can be taken in turn as the functional subgraph to be matched, and fuzzy matching can be performed against a portion of the computation graph, i.e., a computation subgraph, to determine whether a match can be made and, if so, the fuzzy matching degree. The node to be matched is the node in the functional subgraph to be matched that is currently being processed. It can be determined by taking the input as the root node and following breadth-first order, together with the current matching result. The first higher-order neighborhood information can be the multi-order neighboring nodes of the node to be matched in the functional subgraph to be matched, following the computational flow direction, together with the information of the node itself. For example, the first higher-order neighborhood information can include: the first-order neighboring nodes, the second-order neighboring nodes, the operator type of the node, the input and output information of the node, etc. The first parent node can be the predecessor node of the node to be matched in the functional subgraph to be matched, following the computational flow direction. The second parent node can be a node in the computation graph that has already been successfully matched with the first parent node. The second higher-order neighborhood information can be the multi-order neighboring nodes of the second parent node in the computation graph, following the computational flow direction, together with the information of the node itself. For example, the second higher-order neighborhood information may include: the first-order neighboring nodes and second-order neighboring nodes of the second parent node, the operator type of the node, and the input and output information of the node.
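The higher-order neighborhood information described in [0065] can be pictured as a small record that holds a node's operator type together with its first-order and second-order neighbors along the computational flow direction. The function name, the record layout, and the example graph below are illustrative assumptions; input and output information is omitted for brevity.

```python
def higher_order_neighborhood(succ, op_type, node, max_order=2):
    """Collect neighbors of `node` up to max_order steps along the data flow."""
    info = {"op": op_type[node]}
    frontier, seen = [node], {node}
    for order in range(1, max_order + 1):
        next_frontier = []
        for current in frontier:
            for nxt in succ.get(current, []):
                if nxt not in seen:
                    seen.add(nxt)
                    next_frontier.append(nxt)
        info[f"order_{order}"] = next_frontier
        frontier = next_frontier
    return info


# Hypothetical chain: conv -> bn -> relu -> add.
succ = {"conv": ["bn"], "bn": ["relu"], "relu": ["add"], "add": []}
op_type = {"conv": "Conv", "bn": "BatchNorm", "relu": "ReLU", "add": "Add"}

info = higher_order_neighborhood(succ, op_type, "conv")
```

The same function can be applied to the node to be matched in the functional subgraph (first higher-order neighborhood information) and to the second parent node in the computation graph (second higher-order neighborhood information).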

[0066] Step 230: Based on the node to be matched, the first higher-order neighborhood information, the first parent node, the second parent node, and the second higher-order neighborhood information, determine the next matching result between the computation graph and the functional subgraph to be matched, and determine the corresponding fuzzy matching degree.

[0067] In this embodiment of the invention, depending on the actual structure of the functional subgraph, it can be determined whether the node to be matched can be directly matched with a target node in the computation graph; or whether the first higher-order neighborhood information of the node to be matched can be matched with the target node; or whether the node to be matched can be matched against the second higher-order neighborhood information; and so on. Different matching situations can correspond to different fuzzy matching degrees. The fuzzy matching provided by this embodiment of the invention addresses the problem that a computation graph obtained by transforming a typical functional subgraph cannot be exactly matched with the functional subgraphs to be matched in the preset functional subgraph library.

[0068] Specifically, in an optional embodiment of the present invention, determining the next matching result between the computation graph and the functional subgraph to be matched based on the node to be matched, the first higher-order neighborhood information, the first parent node, the second parent node, and the second higher-order neighborhood information, and determining the corresponding fuzzy matching degree, includes: if there is a first target node in the first-order or second-order neighborhood of the second parent node that matches the node to be matched, determining the next matching result between the computation graph and the functional subgraph to be matched based on the first target node, determining the current matching value, and determining the corresponding fuzzy matching degree based on the current matching value; if the first target node does not exist in the first-order or second-order neighborhood of the second parent node, and the in-degree and out-degree of the node to be matched are both 1, deleting the node to be matched, updating the first-order neighbor node of the node to be matched to be the node to be matched, detecting whether there is a second target node in the first-order neighborhood of the second parent node that matches the node to be matched, and determining the next matching result between the computation graph and the functional subgraph to be matched, as well as the corresponding fuzzy matching degree; and if the second target node does not exist in the first-order neighborhood of the second parent node, detecting whether there is a third target node in the second-order neighborhood of the second parent node that matches the node to be matched, and determining the next matching result between the computation graph and the functional subgraph to be matched, as well as the corresponding fuzzy matching degree.

[0069] For example, the node to be matched is u_i, its first parent node is u_p, and its second parent node matching u_p in the computation graph is v_p. In fuzzy matching, the conditions for node matching are that the node operator type and in-degree are the same, and the predecessor node operator type is the same. Special case: When u_i has no parent node, the target node v_i in the computation graph only needs to have the same operator type as u_i; there are no requirements regarding the predecessor node or in-degree. Fuzzy matching can be discussed in detail depending on the specific case.
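
The node-matching condition in [0069] can be written as a small predicate. This is a sketch under assumptions: the `Node` record and its field names are hypothetical, and the condition that the predecessor operator type is the same is interpreted here as equality of the predecessors' operator types regardless of order.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    name: str
    op_type: str
    in_degree: int
    pred_op_types: Tuple[str, ...] = ()  # operator types of the predecessor nodes


def nodes_match(u: Node, v: Node, u_has_parent: bool) -> bool:
    """Check whether computation-graph node v can match pattern node u."""
    if u.op_type != v.op_type:
        return False
    if not u_has_parent:
        # Special case: without a parent node, only the operator type must agree.
        return True
    return (u.in_degree == v.in_degree
            and sorted(u.pred_op_types) == sorted(v.pred_op_types))


u_i = Node("u_i", "Pow", 1, ("Sub",))
same = nodes_match(u_i, Node("v_i", "Pow", 1, ("Sub",)), u_has_parent=True)
other_pred = nodes_match(u_i, Node("v_j", "Pow", 1, ("Cast",)), u_has_parent=True)
```

Here `same` is true, while `other_pred` is false because the predecessor operator differs; the fuzzy cases that follow describe how such a mismatch can still be tolerated.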

[0070] Case 1: When u_i has no parent node and no target node matching u_i can be found in the computation graph, the current matching value can be determined, and matching can proceed with the nodes after u_i. For example, the current matching value is -1.

[0071] Case 2: If a first target node in the first-order neighborhood of v_p in the computation graph matches the node u_i to be matched, then the next matching result can be determined based on the first target node, and the matching value for this instance can be determined. For example, the matching value for this instance is 0.

[0072] Case 3: If Case 2 is not satisfied, and there exists a first target node in the second-order neighborhood of v_p in the computation graph that matches the node u_i to be matched, then the next matching result can be determined based on the first target node, and the matching value for this instance can be determined. For example, the matching value for this instance is -1. This case provides fuzzy matching when the computation subgraph has one more node than the functional subgraph to be matched. For example, Figure 2b is a schematic diagram of fuzzy matching when adding nodes to a functional subgraph according to Embodiment 2 of the present invention. As shown in Figure 2b, in the functional subgraph to be matched, operator Sub is followed by operator Pow, while in the computation subgraph, the first-order neighborhood of operator Sub is operator Cast, and the second-order neighborhood is operator Pow. Through the fuzzy matching of Case 3 provided in this embodiment of the invention, the functional subgraph to be matched and the computation subgraph shown in Figure 2b can be successfully matched, enabling rapid model analysis of the deep learning model.

[0073] In this embodiment of the invention, to better optimize the deep learning model based on the fuzzy matching results, a local temporary subgraph can be constructed based on the fuzzy matching results. Then, the target functional subgraph in the deep learning model can be scheduled and executed using a multi-layer memory structure based on the local temporary subgraph and the kernel fusion subgraph. Specifically, a local temporary subgraph can be constructed by temporarily inserting, deleting, or replacing nodes on the functional subgraph to be matched. For example, in the case of Figure 2b, the operator Cast can be temporarily inserted between the operator Sub and the operator Pow of the functional subgraph to be matched to construct a local temporary subgraph.

[0074] Case 4: If Case 3 is also not satisfied, and the in-degree and out-degree of the node to be matched u_i are both 1, then u_i is deleted, u_s, a first-order neighbor of u_i, is taken as the node to be matched, and u_p, the parent node of u_i, is taken as the parent node of u_s. If there is a second target node in the first-order neighborhood of v_p in the computation subgraph that matches u_s, then the next matching result can be determined based on the second target node, and the matching value for this instance can be determined. For example, the matching value for this instance is -1. This case provides fuzzy matching when the computation subgraph has one less node than the functional subgraph to be matched. For example, Figure 2c is a schematic diagram of fuzzy matching when a functional subgraph lacks nodes, provided according to Embodiment 2 of the present invention. As shown in Figure 2c, in the functional subgraph to be matched, operator Sub is followed by operator Cast, and operator Cast is followed by operator Pow, while in the computation subgraph, the first-order neighborhood of operator Sub is operator Pow. Through the fuzzy matching of Case 4 provided in this embodiment of the invention, the functional subgraph to be matched and the computation subgraph shown in Figure 2c can be successfully matched, enabling rapid model analysis of the deep learning model. In the case of Figure 2c, the operator Cast between the operator Sub and the operator Pow in the functional subgraph to be matched can be temporarily deleted to construct a local temporary subgraph.

[0075] Case 5: If Case 4 is also not satisfied, and a third target node exists in the second-order neighborhood of v_p in the computation subgraph that matches u_s, then the next matching result can be determined based on the third target node, and the matching value for this instance can be determined. For example, the matching value for this instance is -1. This case provides fuzzy matching when a node in the computation subgraph differs from the corresponding node in the functional subgraph to be matched. For example, Figure 2d is a schematic diagram of fuzzy matching during functional subgraph node replacement according to Embodiment 2 of the present invention. As shown in Figure 2d, in the functional subgraph to be matched, operator Sub is followed by operator Pow, while in the computation subgraph, the first-order neighborhood of operator Sub is operator Mul. Through the fuzzy matching of Case 5 provided in this embodiment of the invention, the functional subgraph to be matched and the computation subgraph shown in Figure 2d can be successfully matched, enabling rapid model analysis of the deep learning model. In the case of Figure 2d, the operator Pow of the functional subgraph to be matched can be temporarily replaced with the operator Mul to construct a local temporary subgraph.
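
The cascade described in [0071] to [0075] can be sketched as a single decision function. This is a simplified illustration, not the patented implementation: nodes are identified by their operator type alone, graphs are dictionaries from node to successors, the function name `match_step` is hypothetical, and the operator `ReduceMean` following `Pow` and `Mul` in the Figure 2d example is an assumption added so that the second-order neighborhood is non-empty.

```python
def first_order(succ, n):
    return list(succ.get(n, []))


def second_order(succ, n):
    first = first_order(succ, n)
    return [m for f in first for m in succ.get(f, []) if m not in first]


def in_degree(succ, n):
    return sum(n in outs for outs in succ.values())


def match_step(pattern, graph, u_i, v_p):
    """Return (case, pattern node now matched, matching value), or None."""
    n1, n2 = first_order(graph, v_p), second_order(graph, v_p)
    if u_i in n1:                       # Case 2: direct match, value 0
        return (2, u_i, 0)
    if u_i in n2:                       # Case 3: extra node in the computation subgraph
        return (3, u_i, -1)
    if in_degree(pattern, u_i) == 1 and len(pattern.get(u_i, [])) == 1:
        u_s = pattern[u_i][0]           # delete u_i and continue with its neighbor u_s
        if u_s in n1:                   # Case 4: node missing from the computation subgraph
            return (4, u_s, -1)
        if u_s in n2:                   # Case 5: node replaced in the computation subgraph
            return (5, u_s, -1)
    return None


# Figure 2b: pattern Sub -> Pow, computation subgraph Sub -> Cast -> Pow.
fig_2b = match_step({"Sub": ["Pow"], "Pow": []},
                    {"Sub": ["Cast"], "Cast": ["Pow"], "Pow": []}, "Pow", "Sub")
# Figure 2c: pattern Sub -> Cast -> Pow, computation subgraph Sub -> Pow.
fig_2c = match_step({"Sub": ["Cast"], "Cast": ["Pow"], "Pow": []},
                    {"Sub": ["Pow"], "Pow": []}, "Cast", "Sub")
# Figure 2d: pattern Sub -> Pow -> ReduceMean, computation subgraph Sub -> Mul -> ReduceMean.
fig_2d = match_step({"Sub": ["Pow"], "Pow": ["ReduceMean"], "ReduceMean": []},
                    {"Sub": ["Mul"], "Mul": ["ReduceMean"], "ReduceMean": []}, "Pow", "Sub")
```

Under these assumptions the three figure examples resolve to Case 3, Case 4, and Case 5 respectively, each with a matching value of -1, which corresponds to temporarily inserting, deleting, or replacing a node when the local temporary subgraph is built.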

[0076] It should be noted that in the fuzzy matching process, not only can operators be matched, but also the edges between operators. The fuzzy matching process for edges and operators can be similar, and will not be elaborated here.

[0077] It should also be noted that during fuzzy matching, a node in the functional subgraph to be matched may match multiple nodes in the computation subgraph. When this happens, the next matching result can be generated for each matching node, and subsequent matching can continue. If subsequent matching is successful, multiple alternative matching results can be generated. If any subsequent matching fails, the current matching result can be discarded.

[0078] Specifically, when a node in the subgraph to be matched matches multiple nodes in the computation subgraph, fuzzy matching can be performed as follows: Multiple matching nodes in the computation subgraph are arranged in BFS order. The node to be matched in the subgraph, the currently matched node in the computation subgraph, and the node pairs already matched between the subgraph and the computation subgraph constitute the current state. The node to be matched in the subgraph, the remaining matching nodes in the computation subgraph, and the node pairs already matched between the subgraph and the computation subgraph constitute the remaining matching states. After the matching search of the node to be matched in the subgraph is completed, if no matching node is found in the computation subgraph, or if a matching node is found but the fuzzy matching degree of the current search path or state is less than a preset matching threshold, the current search path ends, and the search returns to the previous remaining matching state. After the matching search of the node to be matched in the subgraph is completed, and a corresponding matching node is found in the computation subgraph, the next node in the subgraph to be matched is selected in BFS order to continue the matching search.
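
The state-based search in [0078] resembles a backtracking search with pruning. The sketch below assumes that the fuzzy matching degree of a path is the sum of its matching values, which the text does not state explicitly; the function name `search`, the `candidates` callback, and the toy data are likewise hypothetical.

```python
def search(pattern_order, candidates, threshold):
    """Enumerate complete matches whose fuzzy matching degree stays >= threshold.

    pattern_order: pattern nodes in BFS order.
    candidates(u, mapping): list of (graph node, matching value) in BFS order.
    """
    results = []
    # Each state: (index of next pattern node, matched node pairs, degree so far).
    stack = [(0, {}, 0)]
    while stack:
        i, mapping, degree = stack.pop()
        if i == len(pattern_order):
            results.append((mapping, degree))
            continue
        u = pattern_order[i]
        # Push in reverse so the first BFS candidate becomes the current state;
        # the others wait on the stack as the remaining matching states.
        for v, value in reversed(candidates(u, mapping)):
            if degree + value < threshold:
                continue  # this search path ends; fall back to a remaining state
            stack.append((i + 1, {**mapping, u: v}, degree + value))
    return results


def toy_candidates(u, mapping):
    if u == "Sub":
        return [("sub1", 0), ("sub2", 0)]
    # u == "Pow": the option depends on which Sub node was matched earlier.
    return {"sub1": [("pow1", 0)], "sub2": [("pow2", -1)]}[mapping["Sub"]]


loose = search(["Sub", "Pow"], toy_candidates, threshold=-1)
strict = search(["Sub", "Pow"], toy_candidates, threshold=0)
```

With a threshold of -1 both alternative matching results survive; with a threshold of 0 the path through `sub2` is discarded because its degree drops to -1.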

[0079] The above process can take into account all kinds of situations in the matching, improving the comprehensiveness and reliability of the model structure analysis of deep learning models.

[0080] Step 240: If all nodes in the functional subgraph to be matched have been fuzzily matched, the next matching result is used as the computation subgraph matching the functional subgraph to be matched; otherwise, the next matching result is updated to the current matching result, and the process returns to the step of determining the node to be matched, until all nodes in the functional subgraph to be matched have been fuzzily matched.

[0081] In an optional embodiment of the present invention, determining the target functional subgraph that successfully matches the target computation subgraph in each of the functional subgraphs to be matched, based on the fuzzy matching degree, includes: if there are at least two candidate functional subgraphs that match the target computation subgraph in each of the functional subgraphs to be matched, then a unique target functional subgraph is determined based on the fuzzy matching degree between the target computation subgraph and each of the matched candidate functional subgraphs. For example, the candidate functional subgraph corresponding to the highest fuzzy matching value among the candidate functional subgraphs that match the target computation subgraph can be used as the target functional subgraph.

[0082] Step 250: Take the kernel fusion subgraph corresponding to the target functional subgraph as the kernel fusion subgraph corresponding to the target computation subgraph.
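
Selecting the target functional subgraph and looking up its kernel fusion subgraph can be sketched as below. The subgraph names, degrees, and kernel names are invented for illustration, and a dictionary is only one possible representation of the association between functional subgraphs and kernel fusion subgraphs.

```python
def select_target(degrees):
    """Pick the candidate functional subgraph with the highest fuzzy matching degree.

    degrees: dict mapping candidate functional subgraph name -> fuzzy matching degree.
    Ties are resolved in favor of the candidate listed first (an assumption).
    """
    if not degrees:
        return None
    return max(degrees, key=degrees.get)


# Hypothetical library: functional subgraph name -> kernel fusion subgraph name.
fusion_library = {
    "layer_norm": "fused_layer_norm_kernel",
    "rms_norm": "fused_rms_norm_kernel",
}

# Hypothetical degrees of two candidates matched to the same target computation subgraph.
degrees = {"layer_norm": -1, "rms_norm": -3}

target = select_target(degrees)
kernel_fusion = fusion_library[target]
```

Because matching values in the examples above are zero or negative, the highest degree corresponds to the candidate requiring the fewest temporary edits.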

[0083] Step 260: Perform hardware scheduling execution of the target computation subgraph in the deep learning model based on the kernel fusion subgraph using the chip's multi-layer memory structure.

[0084] In the technical solution of this embodiment, the computation graph of the target deep learning model is obtained; based on the current matching result between the computation graph and the functional subgraph to be matched, the node to be matched in the functional subgraph to be matched, the first higher-order neighborhood information corresponding to the node to be matched, the first parent node of the node to be matched in the functional subgraph to be matched, the second parent node in the computation graph that matches the first parent node, and the second higher-order neighborhood information corresponding to the second parent node are determined; the next matching result between the computation graph and the functional subgraph to be matched is determined based on the node to be matched, the first higher-order neighborhood information, the first parent node, the second parent node, and the second higher-order neighborhood information, and the corresponding fuzzy matching degree is determined; if all nodes in the functional subgraph to be matched have been fuzzily matched, the next matching result is used as the computation subgraph matching the functional subgraph to be matched; otherwise, the next matching result is updated to the current matching result, and the process returns to the step of determining the node to be matched, until all nodes in the functional subgraph to be matched have been fuzzily matched; the kernel fusion subgraph corresponding to the target functional subgraph is used as the kernel fusion subgraph corresponding to the target computation subgraph; and, based on the kernel fusion subgraph, hardware scheduling execution of the chip's multi-layer memory structure is performed on the target computation subgraph in the deep learning model. This solves the problems of model analysis and hardware acceleration for deep learning models. By performing fuzzy matching between the computation graph of the deep learning model and the functional subgraphs in the preset functional subgraph library, rapid model analysis of deep learning models can be achieved, so that hardware acceleration of the model is performed based on the kernel fusion subgraph corresponding to the matched functional subgraph, accelerating inference of deep learning models on the chip and supporting generalized analysis and rapid inference deployment of a large number of deep learning models.

[0085] Embodiment 3

[0086] Figure 3 is a schematic diagram of the structure of a hardware scheduling and execution device for a deep learning model according to Embodiment 3 of the present invention. As shown in Figure 3, the device includes: a computation graph acquisition module 310, a fuzzy matching module 320, a kernel fusion subgraph determination module 330, and a hardware scheduling execution module 340.

[0087] Wherein:

[0088] The computation graph acquisition module 310 is used to acquire the computation graph of the target deep learning model.

[0089] The fuzzy matching module 320 is used to perform fuzzy matching between the computation graph and each functional subgraph to be matched in the preset functional subgraph library; and to determine the computation subgraph that matches each functional subgraph to be matched in the computation graph, and the corresponding fuzzy matching degree.

[0090] The kernel fusion subgraph determination module 330 is used to determine the target functional subgraph that successfully matches the target computation subgraph in each functional subgraph to be matched based on the fuzzy matching degree; and to determine the kernel fusion subgraph corresponding to the target computation subgraph based on the target functional subgraph.

[0091] The hardware scheduling and execution module 340 is used to perform hardware scheduling and execution of the target computation subgraph in the deep learning model based on the kernel fusion subgraph and the chip's multi-layer memory structure.
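
The hand-offs between modules 310 to 340 can be pictured as a simple pipeline. The sketch below only shows the data flow; the class name `SchedulingDevice`, the tuple layout of a match, and the placeholder callables are all assumptions and do not implement real matching or scheduling.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# (computation subgraph id, functional subgraph name, fuzzy matching degree)
Match = Tuple[str, str, int]


@dataclass
class SchedulingDevice:
    acquire_graph: Callable[[], Any]                           # module 310
    fuzzy_match: Callable[[Any], List[Match]]                  # module 320
    determine_fusion: Callable[[List[Match]], Dict[str, str]]  # module 330
    schedule: Callable[[Any, Dict[str, str]], List[str]]       # module 340

    def run(self) -> List[str]:
        graph = self.acquire_graph()
        matches = self.fuzzy_match(graph)
        fusion = self.determine_fusion(matches)
        return self.schedule(graph, fusion)


# Placeholder callables, present only to show how results flow between modules.
device = SchedulingDevice(
    acquire_graph=lambda: {"sg0": ["Sub", "Cast", "Pow"]},
    fuzzy_match=lambda g: [(sg, "sub_pow_block", -1) for sg in g],
    determine_fusion=lambda ms: {sg: "fused_" + name for sg, name, _ in ms},
    schedule=lambda g, f: [f"{sg} -> {f[sg]}" for sg in g],
)
plan = device.run()
```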

[0092] Optionally, the fuzzy matching module 320 includes:

[0093] The information determination unit is used to determine the node to be matched in the functional subgraph to be matched, the first higher-order neighborhood information corresponding to the node to be matched, the first parent node of the node to be matched in the functional subgraph to be matched, the second parent node in the computation graph that matches the first parent node, and the second higher-order neighborhood information corresponding to the second parent node, based on the current matching result between the computation graph and the functional subgraph to be matched.

[0094] The matching result determination unit is used to determine the next matching result between the computation graph and the functional subgraph to be matched based on the node to be matched, the first higher-order neighborhood information, the first parent node, the second parent node, and the second higher-order neighborhood information, and to determine the corresponding fuzzy matching degree.

[0095] The loop detection unit is used to: if all nodes in the functional subgraph to be matched have been fuzzily matched, use the next matching result as the computation subgraph matching the functional subgraph to be matched; otherwise, update the next matching result to the current matching result and return to the step of determining the node to be matched, until all nodes in the functional subgraph to be matched have been fuzzily matched.

[0096] Optionally, the matching result determination unit is specifically used for:

[0097] If there is a first target node that matches the node to be matched in the first or second order neighborhood of the second parent node, then determine the next matching result between the computation graph and the functional subgraph to be matched based on the first target node, and determine the matching value for this time, and determine the corresponding fuzzy matching degree based on the matching value for this time.

[0098] If the first target node does not exist in the first or second order neighborhood of the second parent node, and the in-degree and out-degree of the node to be matched are both 1, then delete the node to be matched, update the first order neighborhood node of the node to be matched to the node to be matched, and check whether there is a second target node in the first order neighborhood of the second parent node that matches the node to be matched, determine the next matching result of the computation graph and the functional subgraph to be matched, and the corresponding fuzzy matching degree.

[0099] If the second target node does not exist in the first-order neighborhood of the second parent node, then it is checked whether there is a third target node in the second-order neighborhood of the second parent node that matches the node to be matched, and the next matching result of the computation graph and the functional subgraph to be matched, as well as the corresponding fuzzy matching degree, is determined.

[0100] Optionally, the kernel fusion subgraph determination module 330 includes:

[0101] The target functional subgraph determination module is used to determine a unique target functional subgraph based on the fuzzy matching degree between the target computational subgraph and each matched candidate functional subgraph if there are at least two candidate functional subgraphs that match the target computational subgraph.

[0102] Optionally, the kernel fusion subgraph determination module 330 includes:

[0103] The kernel fusion subgraph determination unit is used to take the kernel fusion subgraph corresponding to the target functional subgraph as the kernel fusion subgraph corresponding to the target computation subgraph.

[0104] Optionally, the fuzzy matching module 320 includes:

[0105] The fuzzy matching unit is used to perform fuzzy matching between the computation graph and the nodes in each functional subgraph to be matched, with the input as the root node and in a breadth-first order.

[0106] Optionally, the functional subgraph includes: a block cell subgraph consisting of at least one operator, and a layer cell subgraph consisting of at least one block cell.

[0107] The hardware scheduling and execution device for the deep learning model provided in this embodiment of the invention can execute the hardware scheduling and execution method for the deep learning model provided in any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

[0108] Embodiment 4

[0109] Figure 4 shows a schematic diagram of an electronic device 10 that can be used to implement embodiments of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the invention described and/or claimed herein.

[0110] As shown in Figure 4, the electronic device 10 includes at least one processor 11 and a memory, such as a read-only memory (ROM) 12 or a random access memory (RAM) 13, communicatively connected to the at least one processor 11. The memory stores computer programs executable by the at least one processor. The processor 11 can perform various appropriate actions and processes based on the computer program stored in the ROM 12 or loaded from storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, ROM 12, and RAM 13 are interconnected via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.

[0111] Multiple components in electronic device 10 are connected to I / O interface 15, including: input unit 16, such as keyboard, mouse, etc.; output unit 17, such as various types of displays, speakers, etc.; storage unit 18, such as disk, optical disk, etc.; and communication unit 19, such as network card, modem, wireless transceiver, etc. Communication unit 19 allows electronic device 10 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0112] Processor 11 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. Processor 11 performs the various methods and processes described above, such as the hardware scheduling execution method for deep learning models.

[0113] In some embodiments, the hardware scheduling execution method for the deep learning model can be implemented as a computer program tangibly contained in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program can be loaded and / or installed on the electronic device 10 via ROM 12 and / or communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the hardware scheduling execution method for the deep learning model described above can be performed. Alternatively, in other embodiments, processor 11 can be configured to execute the hardware scheduling execution method for the deep learning model by any other suitable means (e.g., by means of firmware).

[0114] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0115] Computer programs used to implement the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, such that when executed by the processor, the computer programs cause the functions / operations specified in the flowcharts and / or block diagrams to be performed. The computer programs may be executed entirely on a machine, partially on a machine, or as a standalone software package, partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0116] In the context of this invention, a computer-readable storage medium can be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination thereof. Alternatively, a computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.

[0117] To provide interaction with a user, the systems and techniques described herein can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the electronic device. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0118] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or computing systems that include middleware components (e.g., application servers), or computing systems that include frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.

[0119] A computing system can include clients and servers. Clients and servers are generally located far apart and typically interact through communication networks. The client-server relationship is created by computer programs running on the respective computers and having a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a hosting product within the cloud computing service system to address the shortcomings of traditional physical hosts and VPS services, such as high management difficulty and weak business scalability.

[0120] It should be understood that the various forms of processes shown above can be used, with steps reordered, added, or deleted. For example, the steps described in this invention can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution of this invention can be achieved, and this is not limited herein.

[0121] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.

Claims

1. A hardware scheduling and execution method for a deep learning model, characterized in that it comprises: obtaining the computation graph of the target deep learning model; performing fuzzy matching between the computation graph and each functional subgraph to be matched in a preset functional subgraph library, and determining, in the computation graph, the computation subgraph that matches each of the functional subgraphs to be matched and the corresponding fuzzy matching degree; wherein performing fuzzy matching between the computation graph and each functional subgraph to be matched in the preset functional subgraph library includes: determining whether the nodes of the computation graph and the functional subgraph to be matched are the same, and whether the in-degrees of the nodes are the same; and, when the nodes are different, determining, based on the higher-order neighborhood information of the node, the nodes before and after the current node and whether the in-degrees of the nodes are the same; determining, based on the fuzzy matching degree, the target functional subgraph that successfully matches the target computation subgraph among the functional subgraphs to be matched, and determining the kernel fusion subgraph corresponding to the target computation subgraph based on the target functional subgraph; and performing, based on the kernel fusion subgraph, hardware scheduling execution of the chip's multi-layer memory structure on the target computation subgraph in the deep learning model.

2. The method according to claim 1, characterized in that performing fuzzy matching between the computation graph and each functional subgraph to be matched in the preset functional subgraph library, and determining, in the computation graph, the computation subgraph that matches each functional subgraph to be matched and the corresponding fuzzy matching degree, comprises: determining, based on the current matching result between the computation graph and the functional subgraph to be matched, the node to be matched in the functional subgraph to be matched, the first higher-order neighborhood information corresponding to the node to be matched, the first parent node of the node to be matched in the functional subgraph to be matched, the second parent node in the computation graph that matches the first parent node, and the second higher-order neighborhood information corresponding to the second parent node; determining, based on the node to be matched, the first higher-order neighborhood information, the first parent node, the second parent node, and the second higher-order neighborhood information, the next matching result between the computation graph and the functional subgraph to be matched, and determining the corresponding fuzzy matching degree; and, if all nodes in the functional subgraph to be matched have been fuzzy matched, taking the next matching result as the computation subgraph that matches the functional subgraph to be matched; otherwise, taking the next matching result as the new current matching result and returning to the step of determining the node to be matched, until all nodes in the functional subgraph to be matched have been fuzzy matched.

3. The method according to claim 2, characterized in that determining, based on the node to be matched, the first higher-order neighborhood information, the first parent node, the second parent node, and the second higher-order neighborhood information, the next matching result between the computation graph and the functional subgraph to be matched, and determining the corresponding fuzzy matching degree, comprises: if a first target node that matches the node to be matched exists in the first-order or second-order neighborhood of the second parent node, determining the next matching result between the computation graph and the functional subgraph to be matched according to the first target node, determining the current matching value, and determining the corresponding fuzzy matching degree according to the current matching value; if the first target node does not exist in the first-order or second-order neighborhood of the second parent node, and the in-degree and out-degree of the node to be matched are both 1, deleting the node to be matched, taking the first-order neighborhood node of the deleted node as the new node to be matched, detecting whether a second target node that matches the node to be matched exists in the first-order neighborhood of the second parent node, and determining the next matching result between the computation graph and the functional subgraph to be matched and the corresponding fuzzy matching degree; and, if the second target node does not exist in the first-order neighborhood of the second parent node, detecting whether a third target node that matches the node to be matched exists in the second-order neighborhood of the second parent node, and determining the next matching result between the computation graph and the functional subgraph to be matched and the corresponding fuzzy matching degree.
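
The first branch of claim 3, searching the first-order and then the second-order neighborhood of the matched parent node for a node that matches the node to be matched, might be sketched as follows. This is only an illustration under stated assumptions: neighborhoods are taken along outgoing edges, "matches" is reduced to equality of operator type, and the names `neighborhood` and `find_target` as well as the example graph are hypothetical. The matching value and fuzzy matching degree are not computed, since the claim does not give a formula for them.

```python
def neighborhood(successors, node, order):
    """Nodes reachable from `node` in exactly `order` hops along outgoing edges."""
    frontier = {node}
    for _ in range(order):
        frontier = {nxt for cur in frontier for nxt in successors.get(cur, ())}
    return frontier


def find_target(successors, ops, parent, wanted_op, max_order=2):
    """Search the first-order, then the second-order neighborhood of `parent`.

    Returns (node, order) for the first node whose operator type equals
    `wanted_op`, or None when neither neighborhood contains one.
    """
    for order in range(1, max_order + 1):
        # sorted() only makes the result deterministic when several nodes qualify
        for candidate in sorted(neighborhood(successors, parent, order)):
            if ops[candidate] == wanted_op:
                return candidate, order
    return None


# Hypothetical computation graph: conv -> bn -> relu. A library subgraph that
# expects conv -> relu only finds relu in the second-order neighborhood of conv.
successors = {"conv": ["bn"], "bn": ["relu"], "relu": []}
ops = {"conv": "conv2d", "bn": "batch_norm", "relu": "relu"}
```

Returning the hop order alongside the node leaves room for a caller to penalize second-order hits when deriving a matching value, which is one possible design and not something the claim specifies.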

4. The method according to claim 2, characterized in that determining, based on the fuzzy matching degree, the target functional subgraph, among the functional subgraphs to be matched, that successfully matches the target computation subgraph comprises: if at least two candidate functional subgraphs among the functional subgraphs to be matched match the target computation subgraph, determining a unique target functional subgraph according to the fuzzy matching degree between the target computation subgraph and each of the matched candidate functional subgraphs.
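
Resolving several candidate functional subgraphs to a single target, as in claim 4, can be sketched in Python. The sketch assumes that the candidate with the highest fuzzy matching degree wins and that ties are broken by name; the claim only says the choice is made "according to the fuzzy matching degree", so both rules, the function name `select_target_subgraph`, and the example names are illustrative assumptions.

```python
def select_target_subgraph(candidates):
    """Pick one candidate from a {subgraph_name: fuzzy_matching_degree} mapping.

    Assumption: a higher degree means a better match; ties are broken
    alphabetically so that the result is unique and deterministic.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda name: (-candidates[name], name))


# Hypothetical candidates that all matched the same target computation subgraph.
candidates = {"conv_bn_relu": 0.95, "conv_relu": 0.80, "residual_block": 0.60}
best = select_target_subgraph(candidates)
```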

5. The method according to claim 4, characterized in that determining the kernel fusion subgraph corresponding to the target computation subgraph based on the target functional subgraph comprises: taking the kernel fusion subgraph corresponding to the target functional subgraph as the kernel fusion subgraph corresponding to the target computation subgraph.

6. The method according to claim 1, characterized in that performing fuzzy matching between the computation graph and each functional subgraph to be matched in the preset functional subgraph library comprises: taking the input as the root node, performing fuzzy matching between the nodes in the computation graph and each functional subgraph to be matched in breadth-first order.
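
The breadth-first visiting order named in claim 6 is a standard graph traversal. A minimal Python sketch, assuming the computation graph is held as a successor dictionary and that a single node named `input` serves as the root (both are illustrative assumptions, as is the function name `bfs_order`):

```python
from collections import deque


def bfs_order(successors, root):
    """Return the nodes reachable from `root` in breadth-first order."""
    order = [root]
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for nxt in successors.get(current, ()):
            if nxt not in seen:  # a node with several parents is visited once
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical two-branch graph that merges in an add node.
successors = {
    "input": ["conv_a", "conv_b"],
    "conv_a": ["add"],
    "conv_b": ["add"],
    "add": ["relu"],
}
visit_order = bfs_order(successors, "input")
```

A matcher built on this order would attempt to match nodes level by level, so every node is considered only after at least one of its parents.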

7. The method according to claim 1, characterized in that the functional subgraph comprises: a block cell subgraph composed of at least one operator, and a layer cell subgraph composed of at least one block cell.
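
The two-level composition in claim 7 (operators form a block cell, block cells form a layer cell) can be pictured with a small Python data model. The class names `BlockCell` and `LayerCell`, their fields, and the example operator names are hypothetical; the claim fixes only the "at least one" containment relationships, which the constructors check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockCell:
    """A functional subgraph made of at least one operator."""
    name: str
    operators: List[str]

    def __post_init__(self):
        if not self.operators:
            raise ValueError("a block cell needs at least one operator")


@dataclass
class LayerCell:
    """A functional subgraph made of at least one block cell."""
    name: str
    blocks: List[BlockCell]

    def __post_init__(self):
        if not self.blocks:
            raise ValueError("a layer cell needs at least one block cell")

    def operators(self) -> List[str]:
        """Flatten the layer into its operator sequence."""
        return [op for block in self.blocks for op in block.operators]


# Hypothetical library entries.
conv_block = BlockCell("conv_bn_relu", ["conv2d", "batch_norm", "relu"])
pool_block = BlockCell("pool", ["max_pool"])
stem_layer = LayerCell("stem", [conv_block, pool_block])
```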

8. A hardware scheduling and execution device for a deep learning model, characterized by comprising: a computation graph acquisition module, configured to obtain a computation graph of a target deep learning model; a fuzzy matching module, configured to perform fuzzy matching between the computation graph and each functional subgraph to be matched in a preset functional subgraph library, and to determine, in the computation graph, the computation subgraph that matches each functional subgraph to be matched and the corresponding fuzzy matching degree; wherein performing fuzzy matching between the computation graph and each functional subgraph to be matched in the preset functional subgraph library comprises: determining whether the nodes of the computation graph and of the functional subgraph to be matched are the same, and whether the in-degrees of the nodes are the same; and, when the nodes are different, determining, based on the higher-order neighborhood information of the node, the nodes before and after the current node and whether the in-degrees of each of those nodes are the same; a kernel fusion subgraph determination module, configured to determine, based on the fuzzy matching degree, a target functional subgraph, among the functional subgraphs to be matched, that successfully matches a target computation subgraph, and to determine, based on the target functional subgraph, a kernel fusion subgraph corresponding to the target computation subgraph; and a hardware scheduling and execution module, configured to perform, based on the kernel fusion subgraph, hardware scheduling and execution of the target computation subgraph in the deep learning model on a multi-layer memory structure of a chip.

9. An electronic device, characterized in that the electronic device comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the hardware scheduling and execution method for a deep learning model according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when executed, cause a processor to perform the hardware scheduling and execution method for a deep learning model according to any one of claims 1-7.