Method and apparatus for optimizing neural network model execution policy

By optimizing the execution order of computational layers and multi-level grouping in the neural network model, the problem of low memory efficiency under low hardware resources is solved, thereby improving memory utilization efficiency and maintaining model performance. The optimization process does not change the model architecture or accuracy.

CN116306797B · Active · Publication Date: 2026-06-26 · UNITED MICROELECTRONICS CENT CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: UNITED MICROELECTRONICS CENT CO LTD
Filing Date: 2021-12-02
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

In a low-hardware-resource architecture, how can we optimize the execution strategy of neural networks to improve the efficiency of limited memory usage, especially reducing the average and peak usage of runtime memory, without changing the architecture and performance of the neural network model?

Method used

By optimizing the execution order of computational layers in a neural network model, constraining the execution order of each computational layer based on data dependencies, and using a multi-level grouping approach to divide nodes into branch groups, candidate execution orders are determined through top-down traversal and bottom-up iterative optimization to shorten the time that the output of the computational layer resides in memory.

Benefits of technology

The method improves memory utilization efficiency and reduces the average and peak memory usage at neural network model runtime, without affecting the model's architecture or accuracy. The optimization can be performed offline and requires no online real-time processing.

✦ Generated by Eureka AI based on patent content.

[Figure CN116306797B_ABST]

Abstract

The present application relates to a method and apparatus for optimizing a neural network model execution strategy. A method for optimizing a neural network model execution strategy is disclosed, comprising: receiving a neural network model, the neural network model comprising a plurality of nodes, wherein each node corresponds to at least one operation layer of the neural network model; identifying data dependency relationships between the nodes; determining constraint conditions for a plurality of node execution orders based on the identified data dependency relationships between the nodes; performing at least one search algorithm to determine possible candidate node execution orders based on satisfying the constraint conditions for the execution orders; estimating storage requirements related to execution of the plurality of nodes of the neural network model based on each candidate node execution order; selecting a node execution order from the candidate node execution orders based on the estimated storage requirements related to each candidate node execution order and according to a predetermined memory usage efficiency indicator, wherein changing the node execution order can change a time at which a corresponding output of one or more nodes of the plurality of nodes of the neural network model resides in a memory during execution of the neural network model; and outputting the selected node execution order of the neural network model.

Description

Technical Field

[0001] This disclosure relates generally to the field of machine learning in artificial intelligence, and more specifically to methods and apparatus relating to execution strategies for artificial intelligence neural network models.

Background Technology

[0002] In recent years, neural network models have shown a trend of increasing computational scale and complexity. At the same time, the application market for Artificial Intelligence of Things (AIoT) and other technologies is continuing to grow, with high efficiency, low power consumption, and high cloud integration becoming development trends.

[0003] For architectures with low hardware resources, one problem is the limitation of memory capacity. Therefore, how to optimize the execution strategy of neural networks to improve the efficiency of limited memory utilization has become an urgent issue.

[0004] US20190303762A1 relates to a method for optimizing a neural network computation graph. This computation graph is used by a computing platform to perform neural network computations. The computing platform reads the data required for computation from off-chip memory. This patent application defines rules for identifying horizontally and / or vertically adjacent layers, and selects fusionable layers based on these optimization rules to reduce the frequency of data exchange between the computing platform and off-chip memory. The defined fusion rules are used to determine different fusion strategies for the neural network model, and simulations are performed to evaluate the time cost and computational efficiency of different fusion strategies, selecting the one with the best performance.

[0005] US10699186B2 relates to a method for determining the execution order of a neural network. This method first determines the amount of available memory in a given memory space for running network inference, then determines the memory requirements of each operation in the neural network, and proposes several graph search algorithms to determine the execution order of neural network operations such that memory usage does not exceed the available memory.

[0006] CN112346877B relates to a memory allocation method for effectively accelerating deep learning computation. This application proposes first calculating the required memory space size based on a multi-branch operation layer, determining the target operation order of the multi-branch operation layer, and then determining the memory allocation scheme for executing the multi-branch operation layer based on the target memory allocation scheme and the determined target operation order.

Summary of the Invention

[0007] In view of this, this application proposes to improve memory utilization efficiency by reducing the time the outputs of computational layers in a neural network model reside in memory as a whole. Based on this concept, one embodiment of this application proposes to reduce the total time the outputs of each computational layer in the neural network model reside in memory by optimizing the execution order of the computational layers in the neural network model, while satisfying the constraints of the data dependencies between the computational layers of the neural network model on the execution order of each computational layer, thereby reducing the average runtime memory usage and peak usage.

[0008] To improve the efficiency of execution order processing in search optimization operations, one embodiment of this application proposes to group nodes in a neural network model into multiple levels based on branch nodes. At least two nodes directly following a branch node are designated as the starting nodes of the branch group. Each starting node of a branch group, along with nodes whose data depends on that starting node, is assigned to the same branch group. The number of candidate execution orders is reduced by permuting and combining these branch groups, thereby improving processing efficiency. A preferred embodiment of this application proposes to traverse the sorting of each level of grouping from the top root node closest to the input in the neural network, proceeding from top to bottom. Each traversal optimizes the execution order of the nodes with the highest dependency level from the bottom of the neural network, and through multiple traversals starting from the node with the highest dependency level at the bottom of the neural network, global optimization of the execution order of nodes in the neural network model is achieved in a bottom-up manner.

[0009] The method for optimizing the execution strategy of a neural network model according to the embodiments of this application can improve the hardware performance efficiency of computer devices, especially reducing the average and peak memory usage during neural network model runtime, while not changing the neural network model architecture and having no impact on the performance of the neural network algorithm, particularly the model accuracy. Furthermore, the processing to obtain the optimized execution order can be performed offline in one go based on the trained neural network model, without requiring online real-time processing.

[0010] This application discloses a method for optimizing the execution strategy of a neural network model, comprising: receiving a neural network model, the neural network model including multiple nodes, wherein each node corresponds to at least one operational layer of the neural network model; identifying data dependencies between the nodes; determining constraints on the execution order of the multiple nodes based on the identified data dependencies between the nodes; performing at least one search algorithm to determine possible candidate node execution orders based on satisfying the constraints on the execution order; estimating the storage requirements related to the execution of the multiple nodes of the neural network model based on each candidate node execution order; selecting a node execution order from the candidate node execution orders based on the estimated storage requirements related to each candidate node execution order and according to a predetermined memory usage efficiency index, wherein changing the node execution order can change the time that the corresponding outputs of one or more nodes among the multiple nodes reside in memory during the execution of the neural network model; and outputting the selected node execution order of the neural network model.

[0011] This application, in another aspect, discloses an apparatus for optimizing the execution strategy of neural network nodes, comprising: a receiving unit configured to receive a neural network model, the neural network model including multiple nodes, wherein each node corresponds to at least one operational layer of the neural network model; an identification unit configured to identify data dependencies between the nodes; a determination unit configured to determine constraints on the execution order of the multiple nodes based on the identified data dependencies between the nodes; a search unit configured to perform at least one search algorithm to determine possible candidate node execution orders based on satisfying the constraints on the execution order; an estimation unit configured to estimate the storage requirements related to the execution of the multiple nodes of the neural network model based on each candidate node execution order; an optimization unit configured to select a node execution order from the candidate node execution orders based on the estimated storage requirements related to each candidate node execution order and according to a predetermined memory usage efficiency index, wherein changing the node execution order can change the time that the corresponding outputs of one or more nodes among the multiple nodes reside in memory during the execution of the neural network model; and an output unit configured to output the selected node execution order of the neural network model.

[0012] Another aspect of this application discloses a computer-readable storage medium having a computer program stored thereon, which, when run on a computing device, causes the computing device to execute a method for optimizing a neural network model according to the present application.

Description of the Drawings

[0013] Further details, features, and advantages of this disclosure are disclosed in the following description of exemplary embodiments in conjunction with the accompanying drawings, in which:

[0014] Figure 1 schematically illustrates examples of network descriptors in pseudocode and graphical form;

[0015] Figure 2 illustrates the data dependencies between the operational layers of a neural network model;

[0016] Figure 3 shows examples of how different execution orders affect memory usage in a neural network model;

[0017] Figure 4 shows a flowchart of obtaining an optimized node execution order for a neural network model according to an embodiment of the present disclosure;

[0018] Figure 5 shows a schematic diagram of a branch node;

[0019] Figure 6 schematically shows a flowchart of a preferred embodiment of optimizing the node execution order of a neural network model based on branch nodes;

[0020] Figures 6A-6B schematically illustrate the process of identifying branch nodes and grouping nodes in a neural network model;

[0021] Figures 6C-6I schematically illustrate the specific process of dynamic grouping;

[0022] Figure 7 illustrates the local optimization of the execution order of nodes in a neural network model;

[0023] Figures 8A and 8B schematically illustrate the determination of node dependency levels;

[0024] Figure 9 schematically shows a block diagram of an exemplary apparatus 900 for optimizing the execution strategy of neural network nodes according to an embodiment of the present disclosure;

[0025] Figures 10A and 10B show memory usage comparisons for neural network model example 1 (YOLO) and neural network model example 2 (InceptionV1);

[0026] Figure 11 schematically shows a block diagram of a computing device 1100 according to an embodiment of the present disclosure.

Detailed Description

[0027] Before detailing the embodiments of this disclosure, some related concepts will first be explained.

[0028] A Convolutional Neural Network (CNN) model consists of an input layer and multiple computational layers. These layers are interconnected to form the entire neural network. The output of each layer can serve as the input to one or more other connected layers. The output of each layer can include an output feature map. Each layer's output must reside in memory until it is no longer accessed by any other layer. The lifetime of a layer's output (Feature Map Lifetime) refers to the duration between its generation and release from memory. The lifetime of a layer's output indirectly affects the overall system's memory usage.

[0029] Convolutional Neural Network (CNN) models typically also include network descriptors. These descriptors define the neural network architecture, specify the connections between operational layers, and define the type and parameters of each operational layer. Operational layer types can include convolution, activation, pooling, and element-wise addition. Operational layer parameters can include the number of output channels, kernel size, stride, and padding. Herein, "operational layer" refers to a single operational layer in a CNN or a fused operational layer. This application applies some analytical methods from graph theory to CNNs. To facilitate the description of CNN operations using graph-based network descriptors, the graph theory term "node" is used to represent an "operational layer," with each node corresponding to at least one operational layer in the neural network model. Figure 1 shows examples of network descriptors in pseudocode form and in graphical form. In Figure 1, the numbers ①, ②, and ③ represent three operational layers. The left part of Figure 1 shows a network descriptor in pseudocode form, where code segments ①, ②, and ③ define the execution order of the three operational layers ①, ②, and ③, respectively. Each code segment describes the type and parameters of its operational layer. For example, the left part of Figure 1 defines the name of operational layer ① as "backbone_feature_5.0.0", its type as convolution, and its parameters as including the number of output channels num_output: 32, kernel size kernel_size: 3, stride: 2, etc. Data dependencies can exist between the operational layers of a neural network model. Figure 2 illustrates the data dependencies between operational layers in a neural network model. In general, if two or more operational layers have data dependencies, these layers must be executed in the order of these dependencies. Conversely, if the operational layers are data independent, the order in which they are executed will not affect the architecture of the neural network model. For example, if operational layer 1 is connected to operational layer 2, and the input of operational layer 2 comes from the output of operational layer 1, these two operational layers are data dependent, and operational layer 1 must be executed before operational layer 2, in the sequence given by the data dependency. If two or more operational layers are not connected and use mutually independent data, there are no data dependencies between these layers, and the order in which they are executed will not affect the architecture of the neural network model. For example, as shown in Figure 2, the input to operational layer 202 comes from the output of operational layer 201. Operational layers 201 and 202 therefore have a data dependency: operational layer 201 must be executed first, followed by operational layer 202. Similarly, the input to operational layer 203 comes from the output of operational layer 201, so operational layers 201 and 203 have a data dependency: operational layer 201 must be executed first, followed by operational layer 203. However, operational layers 202 and 203 are data independent. Executing either operational layer 202 or operational layer 203 first will not affect the architecture or performance of the neural network model.
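By way of illustration only, the dependency constraint described above can be sketched in Python. The function name satisfies_dependencies and the dictionary representation (each layer mapped to the set of layers whose output it consumes) are assumptions introduced for this sketch and do not appear in the application; only the Figure 2 relationships between layers 201, 202, and 203 are taken from the text.

```python
from typing import Dict, List, Set


def satisfies_dependencies(deps: Dict[int, Set[int]], order: List[int]) -> bool:
    """Return True if every layer runs after all layers whose output it consumes."""
    position = {layer: i for i, layer in enumerate(order)}
    return all(
        position[producer] < position[consumer]
        for consumer, producers in deps.items()
        for producer in producers
    )


# Figure 2 relationships from the text: layers 202 and 203 both consume the output of 201.
deps = {201: set(), 202: {201}, 203: {201}}

print(satisfies_dependencies(deps, [201, 202, 203]))  # True: 202 before 203 is allowed
print(satisfies_dependencies(deps, [201, 203, 202]))  # True: 203 before 202 is also allowed
print(satisfies_dependencies(deps, [202, 201, 203]))  # False: 202 would run before 201
```

Both orders that run layer 201 first pass the check, which reflects that layers 202 and 203 are data independent and can be swapped freely.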

[0030] Based on this, this application proposes to optimize the execution order of each operation layer of a neural network model while satisfying the constraint of the data dependency relationship between each operation layer on the execution order of each operation layer.

[0031] Figure 3 shows examples illustrating how executing the operational layers of a neural network model in different orders affects memory usage. In the neural network shown in Figure 3, operational layers 301 and 305 both follow operational layer 300. Therefore, the inputs of operational layers 301 and 305 depend on the output of operational layer 300, specifically the operational layer named fpn.merge1.2. In other words, the feature map output by operational layer 300 can only be released from memory after both operational layers 301 and 305 have completed their operations. The node names corresponding to each operational layer in the diagram, such as fpn.merge1.2 for operational layer 300 and ssh1.conv5X5_1.0 for operational layer 301, are only for reference. In actual applications, each operational layer may have different names, or these names may be modified.

[0032] The left part of Figure 3 shows the operational layers of the neural network model executed in a first order, while the right part shows them executed in a second order that differs from the first. Compared with the first order, the second order allows the output feature map of operational layer 300 to be released from memory earlier. The example in the left part of Figure 3 executes operational layers 301, 302, 303, 304, 305, 306, and 307 in the first order, i.e., according to the numbers ①-⑦. When these layers are executed in the first order, the top operational layer 300 is executed first, and its output feature map is stored in memory. Since the input of operational layer 305 depends on the output of operational layer 300, the output feature map of operational layer 300 can only be released from memory after operational layer 305 has been executed.

[0033] The following is a timing sequence of the execution of each computation layer 301-307 and the generation and release of the output feature map of computation layer 300 when the computation layers 301-307 are executed in the first order:

[0034] • Generate the output feature map of computation layer 300

[0035] • ① Execute the computation layer 301

[0036] • ② Execute the operation layer 302

[0037] • ③ Execute the operation layer 303

[0038] • ④ Execute the computation layer 304

[0039] • ⑤ Execute the computation layer 305

[0040] • Release the output feature map of computation layer 300

[0041] • ⑥ Execute the computation layer 306

[0042] • ⑦ Execute the computation layer 307

[0043] The right part of Figure 3 illustrates how the output feature map of computation layer 300 is used when the same neural network model as in the left part is executed in a second order. When executed in this second order, after the feature map of computation layer 300 is generated, computation layer 305 is executed first, followed by computation layer 301. After computation layer 301 completes its computation, no other computation layer needs the output feature map of computation layer 300, so it can be released from memory at this point. Then, computation layers 304, 302, 303, 306, and 307 are executed in that sequence.

[0044] The following is a timing sequence of the execution of each computation layer 301-307 according to the second order, as well as the generation and release of the output feature map of computation layer 300:

[0045] • Generate the output feature map of computation layer 300

[0046] • ① Execute the computation layer 305

[0047] • ② Execute the computation layer 301

[0048] • Release the output feature map of computation layer 300

[0049] • ③ Execute operation layer 304

[0050] • ④ Execute the computation layer 302

[0051] • ⑤ Execute the computation layer 303

[0052] • ⑥ Execute the computation layer 306

[0053] • ⑦ Execute the computation layer 307

[0054] As can be seen from the timing sequences above, when the operation layers 301-307 are executed in the second order, the output feature map of operation layer 300 can be released from memory before operation layers 302, 303, and 304 are executed, which is earlier than when operation layers 301-307 are executed in the first order. Executing the operation layers in the second order therefore shortens the lifetime of the output feature map of operation layer 300, i.e., the time it resides in memory, thereby improving memory utilization efficiency.
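The comparison of the two timing sequences can be reproduced with a small Python sketch. The helper release_step is a hypothetical name introduced here; it assumes that a feature map can be released immediately after its last consumer has executed. The two execution orders and the consumers of layer 300 (layers 301 and 305) are taken from the text above.

```python
from typing import Iterable, List


def release_step(order: List[int], consumers: Iterable[int]) -> int:
    """1-based step after which a feature map can be freed: the step of its last consumer."""
    return max(order.index(c) + 1 for c in consumers)


consumers_of_300 = [301, 305]  # layers 301 and 305 both read the output of layer 300
first_order = [301, 302, 303, 304, 305, 306, 307]
second_order = [305, 301, 304, 302, 303, 306, 307]

print(release_step(first_order, consumers_of_300))   # 5: freed only after step ⑤ (layer 305)
print(release_step(second_order, consumers_of_300))  # 2: freed after step ② (layer 301)
```

Under this assumption the output of layer 300 stays resident for five steps in the first order and for two steps in the second order.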

[0055] Figure 4 shows a flowchart of an embodiment of obtaining an optimized execution order of nodes in a neural network model according to this application. The optimization of the execution order of nodes in a neural network model can be performed offline by a computer based on a trained neural network model and an initial execution order.

[0056] In step S401, the trained neural network model is received. In one embodiment, the initial execution strategy of the neural network model may also be received. The initial execution strategy can be defined, for example, in the network descriptor shown in Figure 1. The initial execution strategy may include the initial execution order of multiple nodes in the neural network model, and this initial execution order may be random. Each node corresponds to at least one operational layer of the neural network model. In one embodiment, some consecutive operational layers of the network can be merged into one operational layer as a node. In step S402, data dependencies between the multiple nodes are identified. In step S403, constraints on the execution order of the multiple nodes are determined based on the identified data dependencies. As mentioned above, if the input of one node depends on the output of another node, the two nodes are data dependent and their execution order is constrained: the two nodes must be executed in the sequence given by the dependency. In step S404, at least one search algorithm is performed to determine possible candidate node execution orders that satisfy the constraints on the execution order between nodes. In step S405, the memory requirements associated with the execution of the multiple nodes are estimated for each candidate node execution order. In step S406, a node execution order is selected from the candidate node execution orders based on the memory requirements estimated for each candidate node execution order and according to a predetermined memory usage efficiency index. Memory usage efficiency metrics may include average memory usage and/or peak memory usage when running the neural network model. Those skilled in the art will understand that a node execution order satisfying different memory usage efficiency metrics can be selected based on the estimated memory requirements according to actual needs. In one embodiment, the candidate node execution order with the lowest memory requirements can be selected as the optimized node execution order. For example, the candidate node execution order with the lowest average memory usage and/or lowest peak memory usage when running the neural network model is selected as the optimized node execution order. In step S407, the updated execution strategy of the neural network model, including the optimized node execution order, is output.
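Steps S402-S406 can be sketched in Python under loudly stated assumptions. The graph, the node names, and the feature map sizes below are hypothetical, as are the function names. The memory model assumes that each output stays resident from the step that produces it until its last consumer has run. The search is a brute-force enumeration of permutations, which is only one possible search algorithm and is feasible only for very small graphs. The selection metric (lowest peak usage, ties broken by lowest average usage) is one possible choice of memory usage efficiency index.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Set, Tuple

Deps = Dict[str, Set[str]]  # node -> nodes whose output it consumes (S402)


def is_valid(order: Sequence[str], deps: Deps) -> bool:
    """S403: every node must run after all of its producers."""
    pos = {n: i for i, n in enumerate(order)}
    return all(pos[p] < pos[n] for n in order for p in deps[n])


def memory_profile(order: Sequence[str], deps: Deps, size: Dict[str, int]) -> List[int]:
    """S405: memory in use at each step (assumed model, see lead-in)."""
    pos = {n: i for i, n in enumerate(order)}
    last_use = dict(pos)  # an output with no consumer is freed after its own step
    for n in order:
        for p in deps[n]:
            last_use[p] = max(last_use[p], pos[n])
    return [
        sum(size[n] for n in order if pos[n] <= step <= last_use[n])
        for step in range(len(order))
    ]


def select_order(deps: Deps, size: Dict[str, int]) -> Tuple[List[str], Tuple[int, float]]:
    """S404 + S406: enumerate valid orders, pick lowest (peak, average) usage."""
    def cost(order: Sequence[str]) -> Tuple[int, float]:
        profile = memory_profile(order, deps, size)
        return max(profile), sum(profile) / len(profile)

    candidates = [list(p) for p in permutations(deps) if is_valid(p, deps)]
    best = min(candidates, key=cost)
    return best, cost(best)


# Hypothetical graph: r feeds a chain a1 -> a2 -> a3 and a single node b1; out joins a3 and b1.
deps = {"r": set(), "a1": {"r"}, "a2": {"a1"}, "a3": {"a2"}, "b1": {"r"}, "out": {"a3", "b1"}}
size = {"r": 8, "a1": 2, "a2": 2, "a3": 2, "b1": 1, "out": 1}

best, (peak, average) = select_order(deps, size)
print(best, peak, average)  # ['r', 'b1', 'a1', 'a2', 'a3', 'out'] 11 7.0
```

In this toy graph the selected order runs both consumers of r as early as possible, so the large output of r is released after the third step rather than being held until the end of the a-chain.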

[0057] The optimized node execution order obtained by following the above steps reduces the lifetime of one or more corresponding output feature maps of multiple nodes during execution, shortening the time the output feature maps reside in memory, thereby improving memory utilization efficiency. The method for optimizing the execution order of computational layers proposed in this invention satisfies the constraints of data dependencies between computational layers. Therefore, executing the neural network model based on the optimized node execution order does not change the architecture of the original neural network model, nor does it affect its performance. Furthermore, the optimized node execution order can be obtained offline based on the trained neural network model, without requiring real-time online modifications or retraining.

[0058] Those skilled in the art can employ various search algorithms to determine the possible execution order of candidate nodes. For example, the optimal execution order can be found by changing the execution order of nodes one by one and then evaluating the memory required for all possible execution orders. However, for large-scale, highly complex neural network models, changing the execution order of nodes one by one results in a very large number of possible node permutations, such as millions. This requires a very large amount of computation and a very long computation time to obtain the optimal node execution order.
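To give a rough sense of how quickly the number of orderings grows, the following Python sketch counts the valid execution orders for a simplified, hypothetical case: several chains of nodes that are mutually data independent, where only the order within each chain is constrained. The function name interleavings and the chain lengths are assumptions chosen for illustration and are not figures from the application.

```python
from math import comb, factorial
from typing import List


def interleavings(chain_lengths: List[int]) -> int:
    """Number of execution orders that keep each independent chain in its internal order."""
    total, placed = 1, 0
    for length in chain_lengths:
        placed += length
        total *= comb(placed, length)  # choose slots for this chain among those placed so far
    return total


print(interleavings([10, 10]))   # 184756 orders for two independent chains of 10 nodes
print(interleavings([8, 8, 8]))  # 9465511770 orders for three independent chains of 8 nodes
print(factorial(3))              # 6 orders if the three chains are permuted as whole units
```

Treating each chain as a unit and permuting only the units is the kind of reduction that motivates grouping nodes before searching.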

[0059] To reduce the number of candidate node execution orders and shorten processing time, this application proposes grouping multiple nodes in a neural network model based on branch nodes, and finding all possible candidate execution orders for each branch group through permutations and combinations. This significantly reduces the number of candidate node execution orders and lowers the computational load in obtaining the optimal node execution order. Compared to the millions of candidate node execution orders obtained through exhaustive search, the number of candidate node execution orders can be reduced to tens to hundreds, while obtaining a better node execution order.

[0060] The basic concept of sorting branch groups based on "branch nodes" introduced in this application is that there are many data dependencies between nodes within the same branch, while the branches of a branch node are generally more independent and have fewer data dependencies. Once all the nodes on a branch have completed their operations, the feature maps associated with those nodes can be released from memory. Therefore, the overall memory usage time of the output feature maps of each node can be shortened, thereby improving memory utilization efficiency.

[0061] Figure 5 shows a schematic diagram of a branch node. In this application, a node that is directly followed by at least two single-source type nodes is defined as a branch node. For example, as shown in Figure 5, node 501 is directly followed by nodes 502-1, 502-2, 502-3...502-N, which are single-source type nodes. Therefore, node 501 is a branch node. In the scenario illustrated in Figure 5, the output feature map of branch node 501 is used by the directly following nodes 502-1, 502-2, 502-3...502-N, and is released from memory after these nodes have finished running. Single-source type nodes are nodes that accept the output of only one node as their input; they can include convolution nodes, activation nodes, pooling nodes, etc. Multi-source type nodes are nodes that accept the outputs of more than one node as their input; they include element-wise addition nodes, concatenation nodes, etc.
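The branch node definition can be expressed as a short Python sketch. The function name find_branch_nodes and the successor-list representation are assumptions for illustration. A node is treated as single-source when exactly one node feeds it, which is an assumed reading of the definition above. The example graph is hypothetical, loosely modeled on Figure 5 with three following nodes and an added element-wise addition node.

```python
from typing import Dict, List


def find_branch_nodes(successors: Dict[str, List[str]]) -> List[str]:
    """Return nodes directly followed by at least two single-source nodes."""
    in_degree = {n: 0 for n in successors}
    for outs in successors.values():
        for s in outs:
            in_degree[s] = in_degree.get(s, 0) + 1
    return [
        n
        for n, outs in successors.items()
        if sum(1 for s in outs if in_degree[s] == 1) >= 2
    ]


# Hypothetical graph: 501 feeds three single-source nodes; two of them feed a multi-source "add".
successors = {
    "501": ["502-1", "502-2", "502-3"],
    "502-1": ["add"],
    "502-2": ["add"],
    "502-3": [],
    "add": [],
}

print(find_branch_nodes(successors))  # ['501']
```

Nodes 502-1 and 502-2 are not branch nodes here, because each has a single successor and that successor is a multi-source node.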

[0062] Figure 6 shows a flowchart of a preferred embodiment of optimizing the execution order of nodes in a neural network model based on branch nodes.

[0063] In step S601, a neural network model with a network descriptor is received, and an initial node execution order included in the network descriptor is obtained. In step S602, each node in the neural network model is scanned to identify all branch nodes, so that nodes directly or indirectly following a branch node can be grouped and sorted based on the identified branch nodes. In some embodiments, node grouping can be dynamic; how branch grouping is performed dynamically is described in detail below. In step S603, the two or more nodes directly following a branch node, each together with the nodes that are data dependent on it, are assigned to branch group 1.1, branch group 1.2... branch group 1.N, respectively. In step S604, all possible candidate execution orders of branch group 1.1, branch group 1.2... branch group 1.N are enumerated by permutation and combination. In step S605, the neural network model is run according to each branch group candidate execution order, and the memory requirement is estimated for each candidate execution order. In step S606, based on the memory requirement estimation results, the candidate execution order with the minimum memory requirement is determined as the preferred node execution order for the current branch node. The initial node execution order can be updated based on this preferred node execution order, thus obtaining a preferred node execution order based on the current branch node. Compared to the initial node execution order, this reduces the overall residence time of the output feature maps of the nodes in memory.
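Steps S604-S606 for a single branch node can be sketched in Python as follows. Everything specific in this sketch is hypothetical: the graph, the node names, the feature map sizes, and the function names. Memory is estimated by simulation rather than by running the model, using the assumed rule that an output stays resident from the step that produces it until its last consumer has run, and candidates are ranked by peak usage with average usage as a tie-breaker.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Set, Tuple


def memory_profile(order: Sequence[str], deps: Dict[str, Set[str]], size: Dict[str, int]) -> List[int]:
    """Estimated memory in use at each step (assumed model, see lead-in)."""
    pos = {n: i for i, n in enumerate(order)}
    last_use = dict(pos)
    for n in order:
        for p in deps[n]:
            last_use[p] = max(last_use[p], pos[n])
    return [
        sum(size[n] for n in order if pos[n] <= step <= last_use[n])
        for step in range(len(order))
    ]


def best_group_order(
    prefix: List[str],
    groups: List[List[str]],
    suffix: List[str],
    deps: Dict[str, Set[str]],
    size: Dict[str, int],
) -> Tuple[List[str], int]:
    """Permute whole branch groups (S604), estimate memory (S605), keep the minimum (S606)."""
    def cost(order: Sequence[str]) -> Tuple[int, float]:
        profile = memory_profile(order, deps, size)
        return max(profile), sum(profile) / len(profile)

    candidates = [
        prefix + [n for group in perm for n in group] + suffix
        for perm in permutations(groups)
    ]
    return min(candidates, key=cost), len(candidates)


# Hypothetical branch node "b" with three branch groups that are joined by "cat".
deps = {
    "b": set(), "p1": {"b"}, "p2": {"p1"}, "q1": {"b"},
    "s1": {"b"}, "s2": {"s1"}, "cat": {"p2", "q1", "s2"},
}
size = {"b": 6, "p1": 3, "p2": 3, "q1": 1, "s1": 2, "s2": 1, "cat": 1}
groups = [["p1", "p2"], ["q1"], ["s1", "s2"]]

best, n_candidates = best_group_order(["b"], groups, ["cat"], deps, size)
print(n_candidates)  # 6 candidate orders for 3 branch groups
print(best)          # ['b', 's1', 's2', 'q1', 'p1', 'p2', 'cat']
```

Only 3! = 6 candidates are evaluated here because each branch group is moved as a unit and the order inside a group is left unchanged.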

[0064] In step S607, it is determined whether the branch groups of the current branch node include a lower-level branch node. If so, in step S608, the nodes following the lower-level branch node are further grouped, and steps S603-S607 are executed again, until all branch nodes in the neural network model have been traversed.

[0065] In a preferred embodiment, the execution order of each branch node can be optimized by traversing from the top root node (closest to the neural network input) in a top-down manner. Each traversal optimizes the execution order of the nodes with the highest dependency level (closest to the neural network output). However, since the optimization of low-dependency branch groups is performed first, the execution order of higher-dependency nodes is not optimized during the first optimization. Therefore, it is necessary to iterate and repeat steps S603-S607 multiple times to achieve global optimization of the node execution order. This allows for bottom-up optimization of the node execution order, starting from the node with the highest dependency level, ultimately achieving global optimization of the neural network model's node execution order.

[0066] In step S610, the target number of traversal iterations can be determined based on the level of the node with the highest dependency level. In step S609, it is determined whether the target number of iterations has been reached. If it has not been reached, steps S603-S609 can be executed repeatedly until the target number of iterations is reached, so as to achieve global optimization of the node execution order.

[0067] The steps of the embodiment of Figure 6 are explained in detail below.

[0068] In step S601, a neural network model with a network descriptor is received, and an initial node execution order included in the network descriptor is obtained. The initial node execution order can be random. In step S602, each node in the neural network model is scanned to identify all branch nodes, so that some nodes are grouped and sorted based on the identified branch nodes. Branch nodes can be identified based on a predetermined identification method, such as pattern matching. In one embodiment, if a node is directly followed by at least two single-source type nodes, the node is identified as a branch node, wherein the output feature map of the branch node is invoked by the at least two directly followed nodes, and the output feature map of the branch node is released from memory after the at least two nodes have finished running.
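As a minimal sketch of the branch node identification of step S602, the following Python fragment scans a successor table and marks a node as a branch node when at least two of its direct successors are single-source. The table `SUCCESSORS` is a hypothetical 13-node graph: the edges 6->10 and 7->13 and the absence of consumers for node 8 are assumptions made for illustration. Treating "single-source" as "has exactly one producer" is also an assumption; the disclosure ties it to node types.

```python
from collections import defaultdict

# Hypothetical graph: SUCCESSORS[n] lists the nodes that consume the output of n.
# Edges 6->10, 7->13 and the empty consumer list of node 8 are assumptions.
SUCCESSORS = {
    1: [2, 3], 2: [4, 5], 3: [6, 7, 8], 4: [9], 5: [9],
    6: [10], 7: [13], 8: [], 9: [], 10: [11, 12],
    11: [13], 12: [13], 13: [],
}


def count_inputs(successors):
    """Return how many producers feed each node."""
    n_inputs = defaultdict(int)
    for consumers in successors.values():
        for consumer in consumers:
            n_inputs[consumer] += 1
    return n_inputs


def find_branch_nodes(successors):
    """A node is a branch node if at least two direct successors are single-source."""
    n_inputs = count_inputs(successors)
    found = []
    for node, consumers in successors.items():
        # Assumption: "single-source" is approximated by "exactly one producer".
        single_source = [c for c in consumers if n_inputs[c] == 1]
        if len(single_source) >= 2:
            found.append(node)
    return sorted(found)


branch_nodes = find_branch_nodes(SUCCESSORS)
print(branch_nodes)
```

Under these assumptions the scan reports nodes 1, 2, 3, and 10. Nodes 4 and 5 are not reported because their only successor, node 9, has two producers.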

[0069] In step S603, at least two nodes that directly follow the branch node and nodes that are data dependent on the at least two nodes are assigned to branch group 1.1, branch group 1.2, ..., branch group 1.N of the first-level branch.

[0070] Figure 6A and Figure 6B illustrate the process of identifying branch nodes in a neural network model and grouping nodes accordingly. As shown in Figure 6A, node 1 is directly followed by two nodes, 2 and 3, which are of the same single-source type. Therefore, node 1 is identified as a branch node. Similarly, nodes 2, 3, and 10 are each directly followed by two or more nodes of the same single-source type, and are therefore identified as branch nodes. Figure 6B shows, based on Figure 6A, how the nodes following branch node 1 are grouped starting from nodes 2 and 3. As shown in Figure 6A, nodes 4, 5, and 9 directly or indirectly depend on node 2; therefore nodes 2, 4, 5, and 9 are assigned to branch group 1.1. Similarly, nodes 6, 7, 8, 10, 11, 12, and 13 directly or indirectly depend on node 3; therefore nodes 3, 6, 7, 8, 10, 11, 12, and 13 are assigned to branch group 1.2.
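The grouping of step S603 can be sketched as a reachability search: each direct successor of a branch node starts one branch group, and every node that directly or indirectly consumes its output joins that group. The sketch below is illustrative only. The table `SUCCESSORS` is a hypothetical graph in which the edges 6->10 and 7->13 and the empty consumer list of node 8 are assumptions, and the function names are not taken from the disclosure.

```python
# Hypothetical graph: SUCCESSORS[n] lists the nodes that consume the output of n.
# Edges 6->10, 7->13 and the empty consumer list of node 8 are assumptions.
SUCCESSORS = {
    1: [2, 3], 2: [4, 5], 3: [6, 7, 8], 4: [9], 5: [9],
    6: [10], 7: [13], 8: [], 9: [], 10: [11, 12],
    11: [13], 12: [13], 13: [],
}


def descendants(successors, start):
    """Return start plus every node that directly or indirectly depends on it."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(successors[node])
    return seen


def branch_groups(successors, branch_node):
    """Build one group per direct successor of the given branch node."""
    return [sorted(descendants(successors, head)) for head in successors[branch_node]]


groups = branch_groups(SUCCESSORS, 1)
for index, group in enumerate(groups, start=1):
    print(f"branch group 1.{index}: {group}")
```

For branch node 1 this yields the two groups named in the text, {2, 4, 5, 9} and {3, 6, 7, 8, 10, 11, 12, 13}. A node that depends on more than one group head would appear in several groups in this simple sketch and would need the dynamic grouping described later.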

[0071] In step S604, all possible branch group execution orders are enumerated. For example, branch groups 1.1 and 1.2 in Figure 6B have two possible execution orders: order 01: branch group 1.1, branch group 1.2; and order 02: branch group 1.2, branch group 1.1. It can be understood that if a branch node has 3 branch groups, there are 6 possible execution orders; with 4 branch groups, there are 24. For N branch groups, there are N! = N*(N-1)*(N-2)*…*3*2*1 possible execution orders.
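The enumeration of step S604 corresponds to listing all permutations of the branch groups. A short Python sketch using the standard library follows; the helper name `candidate_orders` and the group labels passed to it are illustrative assumptions.

```python
from itertools import permutations
from math import factorial


def candidate_orders(group_names):
    """Return every possible execution order of the given branch groups."""
    return list(permutations(group_names))


two_groups = candidate_orders(["1.1", "1.2"])
for index, order in enumerate(two_groups, start=1):
    print(f"order {index:02d}: " + ", ".join(f"branch group {g}" for g in order))

# The number of candidates grows as N!, so exhaustive enumeration is only
# practical when each branch node has a small number of branch groups.
for n in (2, 3, 4):
    names = [f"1.{i}" for i in range(1, n + 1)]
    print(n, "groups ->", len(candidate_orders(names)), "orders")
```

Because the count grows factorially, this exhaustive approach fits the per-branch-node scope described above, where N is typically small.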

[0072] In step S605, the neural network model is run in each of the possible execution orders obtained in step S604 to estimate memory requirements. In the example of Figure 6B, the neural network model is run in sequence 01 and in sequence 02, and the memory requirement is estimated for each. In this example, sequence 01, i.e., branch group 1.1 followed by branch group 1.2, has the lower memory requirement.
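One possible way to estimate the memory requirement of a candidate order without running the real model is to simulate output residency: the output of a node is allocated when the node runs and is released once its last consumer has run. The sketch below is a simplified illustration under stated assumptions. The four-node graph `SUCC`, the output sizes in `SIZE` (arbitrary units), and the immediate release of outputs that have no consumer are all hypothetical choices, and weights and workspace memory are ignored.

```python
def estimate_memory(order, successors, out_size):
    """Simulate one execution order; return (peak, average) resident output memory."""
    # Invert the successor table so each node knows whose outputs it reads.
    producers = {node: [] for node in successors}
    for node, consumers in successors.items():
        for consumer in consumers:
            producers[consumer].append(node)
    pending = {node: set(consumers) for node, consumers in successors.items()}

    resident = 0
    usage = []
    for node in order:
        resident += out_size[node]      # the output of this node is allocated
        usage.append(resident)          # inputs and output coexist while it runs
        for producer in producers[node]:
            pending[producer].discard(node)
            if not pending[producer]:   # last consumer finished: release the input
                resident -= out_size[producer]
        if not successors[node]:        # assumption: unconsumed outputs are released at once
            resident -= out_size[node]
    return max(usage), sum(usage) / len(usage)


# Hypothetical sub-graph and output sizes (arbitrary units).
SUCC = {2: [4, 5], 4: [9], 5: [9], 9: []}
SIZE = {2: 4, 4: 8, 5: 2, 9: 1}

result_a = estimate_memory([2, 4, 5, 9], SUCC, SIZE)
result_b = estimate_memory([2, 5, 4, 9], SUCC, SIZE)
print("order 2,4,5,9 ->", result_a)
print("order 2,5,4,9 ->", result_b)
```

With these assumed sizes both orders reach the same peak of 14 units, but running node 5 before node 4 lowers the average from 10.25 to 8.75 units, because the large output of node 4 stays resident for fewer steps.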

[0073] In step S606, based on the estimation results of step S605, the initial node execution order is updated node by node according to the execution order with the smaller memory requirement.

[0074] In step S607, if the next-level branch node is reached while updating a node in a branch group, the update is stopped. At least two nodes that directly follow the next-level branch node, as well as nodes that are data dependent on these at least two nodes, are assigned to branch groups 2.1, 2.2, ..., 2.N of the second-level branch.

[0075] For example, when the update of branch group 1.1 in Figure 6B reaches branch node 2, the update stops. At this point, the updated sorting result is nodes 1, 2. As shown in Figure 6A, branch node 2 is directly followed by nodes 4 and 5. Node 4, which directly follows node 2, and the nodes that are data dependent on node 4 are assigned to branch group 2.1, and node 5 and the nodes that are data dependent on node 5 are assigned to branch group 2.2. Then, steps S605 and S606 are repeated for branch groups 2.1 and 2.2 until all nodes in all branch groups of branch node 2 are updated.

[0076] As shown in Figure 6A, node 9 is data dependent on two nodes, 4 and 5. In this case, node 9 needs to be dynamically grouped. That is, in each execution order, node 9 is assigned to whichever of branch group 2.1 and branch group 2.2 is executed last. In other words, node 9 can only be executed after nodes 4 and 5 have both been executed.

[0077] For example, as shown in Figure 6C, there are two possible execution orders for branch group 2.1 and branch group 2.2:

[0078] Sequence 01: Branch group 2.1, Branch group 2.2;

[0079] Sequence 02: Branch group 2.2, Branch group 2.1.

[0080] When executed in order 01, branch group 2.2 is executed last, so node 4 is assigned to branch group 2.1, and nodes 5 and 9 are assigned to branch group 2.2. When executed in order 02, branch group 2.1 is executed last, so nodes 4 and 9 are assigned to branch group 2.1, and node 5 is assigned to branch group 2.2. In this example, node 9, whose data depends on both nodes 4 and 5, is dynamically assigned to the last executed branch group in different branch group execution orders.
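The dynamic grouping rule for a node with producers in several branch groups can be sketched as follows: for each candidate order, the shared node is appended to whichever of its source groups runs last. The function and variable names (`assign_dynamic`, `STATIC`, `SHARED`) are illustrative assumptions, not terms from the disclosure.

```python
def assign_dynamic(order, static_groups, shared_deps):
    """Place each shared node in the last-executed group it depends on.

    order:         tuple of branch group names in execution order
    static_groups: group name -> nodes that belong to exactly one group
    shared_deps:   shared node -> set of group names it depends on
    """
    rank = {name: index for index, name in enumerate(order)}
    groups = {name: list(nodes) for name, nodes in static_groups.items()}
    for node, deps in shared_deps.items():
        last_group = max(deps, key=lambda name: rank[name])
        groups[last_group].append(node)
    return groups


STATIC = {"2.1": [4], "2.2": [5]}
SHARED = {9: {"2.1", "2.2"}}  # node 9 reads the outputs of nodes 4 and 5

sequence_01 = assign_dynamic(("2.1", "2.2"), STATIC, SHARED)
sequence_02 = assign_dynamic(("2.2", "2.1"), STATIC, SHARED)
print("sequence 01:", sequence_01)
print("sequence 02:", sequence_02)
```

Appending the shared node to the last-executed group guarantees that all of its producers have already run, which is the constraint stated for node 9.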

[0081] Next, the neural network model is run in sequence 01 and sequence 02 to estimate the memory requirements respectively. The estimation result is that sequence 02 has a smaller memory requirement. Therefore, the nodes in branch group 2.1 and branch group 2.2 are reordered in sequence 02 and the updated nodes are sorted as: nodes 1, 2, 5, 4, 9.

[0082] After the node order in branch group 1.1 has been updated, the update moves on to branch group 1.2 and reaches the next-level branch node 3. The updated sorting result is now: nodes 1, 2, 5, 4, 9, 3. As shown in Figure 6D and Figure 6E, branch node 3 is directly followed by nodes 6, 7, and 8. Therefore, node 6, which directly follows node 3, and the nodes that are data dependent on node 6 are assigned to branch group 2.3; node 7 and the nodes that are data dependent on node 7 are assigned to branch group 2.4; and node 8 and the nodes that are data dependent on node 8 are assigned to branch group 2.5. For these three branch groups, there are the following 6 possible execution orders:

[0083] Sequence 01: Branch group 2.3, Branch group 2.4, Branch group 2.5;

[0084] Sequence 02: Branch group 2.3, Branch group 2.5, Branch group 2.4;

[0085] Sequence 03: Branch group 2.4, Branch group 2.3, Branch group 2.5;

[0086] Sequence 04: Branch group 2.4, Branch group 2.5, Branch group 2.3;

[0087] Sequence 05: Branch group 2.5, Branch group 2.3, Branch group 2.4;

[0088] Sequence 06: Branch group 2.5, Branch group 2.4, Branch group 2.3.

[0089] Since the data of node 13 depends on nodes 6 and 7, node 13 needs to be dynamically grouped. That is, node 13 is assigned to whichever of branch group 2.3 and branch group 2.4 is executed last. Figure 6F lists, for each of sequences 01-06, the branch group to which node 13 is dynamically assigned.

[0090] Next, the neural network model is run in each of sequences 01 to 06 to estimate memory requirements. The estimation result shows that sequence 05 has the smallest memory requirement. Therefore, the nodes in branch groups 2.3 to 2.5 are reordered according to sequence 05 and the node order is updated.

[0091] The node update in branch group 2.3 proceeds until the next-level branch node 10 is reached. The updated sorting result is: nodes 1, 2, 5, 4, 9, 3, 8, 6, 7, 10. As shown in Figure 6G and Figure 6H, nodes 11 and 12 directly follow node 10. Therefore, node 11, which directly follows node 10, and the nodes that are data dependent on node 11 are assigned to branch group 3.1, and node 12 and the nodes that are data dependent on node 12 are assigned to branch group 3.2. Since node 13 is data dependent on both nodes 11 and 12, node 13 needs to be dynamically grouped. As shown in Figure 6H, there are two possible execution orders for branch group 3.1 and branch group 3.2:

[0092] Sequence 01: Branch group 3.1, Branch group 3.2;

[0093] Sequence 02: Branch group 3.2, Branch group 3.1.

[0094] When executed in sequence 01, branch group 3.2 is executed last, so node 11 is assigned to branch group 3.1, and nodes 12 and 13 are assigned to branch group 3.2. When executed in sequence 02, branch group 3.1 is executed last, so nodes 11 and 13 are assigned to branch group 3.1, and node 12 is assigned to branch group 3.2.

[0095] Next, the neural network model is run in order 01 and order 02 to estimate the storage requirements respectively. The estimation result is that order 02 has a smaller storage requirement. Therefore, the nodes in branch group 3.1 and branch group 3.2 are reordered in order 02 to obtain the updated node sorting result: nodes 1,2,5,4,9,3,8,6,7,10,12,11,13.

[0096] When all nodes 1-13 in the neural network model have been sorted and updated according to steps S602-S609 above, the first optimized node sorting of the neural network model is obtained, as shown in Figure 7.

[0097] After the first optimization of node sorting, the node execution order has been reordered to achieve local optimization. For example, as shown in Figure 7, starting from branch node 2, the execution order 2, 5, 4, 9 is the optimal execution order based on analysis. Similarly, starting from branch node 10, the execution order 10, 12, 11 is also the optimal execution order based on analysis.

[0098] Based on the first optimized node execution order, 1,{2,5,4,9},3,8,6,7,{10,12,11},13, steps S601-S610 shown in Figure 6 can be executed on all nodes 1-13 of the neural network model for a second iteration. As mentioned earlier, the optimized execution order of each branch group can be obtained by traversing each level of branch nodes from the top root node in a top-down manner. Each traversal optimizes the execution order of the nodes with the highest dependency level at the bottom. However, since the optimization sorting of low-dependency branch groups is performed first, the execution order of higher-dependency nodes is not optimized during the first optimization sorting. Therefore, it is necessary to execute steps S603-S607 repeatedly to achieve global optimization of the node execution order. In this way, the execution order of nodes is optimized one dependency level at a time from the bottom up, starting from the nodes with the highest dependency level at the bottom, ultimately achieving global optimization of the node execution order of the neural network model.

[0099] To achieve global optimization of the execution order of all nodes, it is necessary to determine the target number of iterations used in step S609 of Figure 6. The target number of iterations depends on the maximum dependency level of the nodes in the neural network model. If the maximum dependency level of the nodes in the neural network model is M, then the target number of iterations is also M. For branch nodes, the dependency level of the current branch node is the highest dependency level among its parent branch nodes + 1; for non-branch nodes, the dependency level of the current non-branch node is the lowest dependency level among its parent branch nodes. Figure 8A shows an example in which the 13 nodes of the neural network model of Figure 6 belong to three dependency levels, among which:

[0100] Dependency level (1) includes node 1;

[0101] Dependency level (2) includes nodes 2, 3, 4, 5, 6, 7, 8, 9, and 13;

[0102] Dependency level (3) includes nodes 10, 11, and 12.

[0103] Node 13 in Figure 8A is a non-branch node. Non-branch node 13 has a first parent branch node 10 and a second parent branch node 3. Since the dependency level of the first parent branch node 10 is (3) and the dependency level of the second parent branch node 3 is (2), the dependency level of non-branch node 13 is determined to be (2), which is the lower of the dependency levels of the first parent branch node 10 and the second parent branch node 3 of non-branch node 13.

[0104] Figure 8B shows an exemplary neural network model including nodes 1-15. In this neural network, two nodes are added immediately after node 13, so node 13 becomes a branch node. Branch node 13 has a first parent branch node 10 and a second parent branch node 3. Since the dependency level of the first parent branch node 10 is (3) and the dependency level of the second parent branch node 3 is (2), the dependency level of branch node 13 in this example is determined to be (4), that is, the highest dependency level (3) among the first parent branch node 10 and the second parent branch node 3 of branch node 13, plus 1.
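The dependency level rule can be sketched in Python as follows. The sketch assumes that the "parent branch nodes" of a node are its nearest branch-node ancestors along each input path, and that a node with no parent branch node is at level 1; both are interpretations rather than statements from the disclosure. The predecessor table `PRED_8A` is hypothetical: its edges 6->10 and 7->13 are assumptions chosen so that node 13 has parent branch nodes 10 and 3, as stated in the text.

```python
def dependency_levels(predecessors, branch_nodes):
    """Return node -> dependency level.

    Branch node:     highest level among its parent branch nodes, plus 1.
    Non-branch node: lowest level among its parent branch nodes.
    """
    parent_branches = {}  # node -> nearest branch-node ancestors (assumed meaning)
    level = {}

    def visit(node):
        if node in level:
            return
        parents = set()
        for producer in predecessors.get(node, []):
            visit(producer)
            if producer in branch_nodes:
                parents.add(producer)
            else:
                parents |= parent_branches[producer]
        parent_branches[node] = parents
        parent_levels = [level[b] for b in parents]
        if node in branch_nodes:
            level[node] = max(parent_levels, default=0) + 1
        else:
            # Assumption: a non-branch node without parent branch nodes is level 1.
            level[node] = min(parent_levels, default=1)

    nodes = set(predecessors) | {p for ps in predecessors.values() for p in ps}
    for node in sorted(nodes):
        visit(node)
    return level


# Hypothetical predecessor tables: PRED[n] lists the producers that node n reads.
PRED_8A = {2: [1], 3: [1], 4: [2], 5: [2], 6: [3], 7: [3], 8: [3],
           9: [4, 5], 10: [6], 11: [10], 12: [10], 13: [7, 11, 12]}
PRED_8B = {**PRED_8A, 14: [13], 15: [13]}  # two nodes added after node 13

levels_a = dependency_levels(PRED_8A, {1, 2, 3, 10})
levels_b = dependency_levels(PRED_8B, {1, 2, 3, 10, 13})
print("target iterations M =", max(levels_a.values()))
print("level of node 13:", levels_a[13], "->", levels_b[13])
```

Under these assumptions the first graph has a maximum level of 3, so three iterations would be targeted, and node 13 moves from level 2 to level 4 once it becomes a branch node.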

[0105] After each iteration, the optimal execution order of the nodes with the highest dependency level can be obtained. For example, for the neural network model shown in Figure 8, after the first iteration, the execution order of nodes 10, 11, and 12 with the highest dependency level of 3 is optimized. After the second iteration, the execution order of nodes 2, 3, 4, 5, 6, 7, 8, 9, and 13 with dependency level of 2 is optimized. After the third iteration, the execution order of node 1 is optimized, thus achieving global execution order optimization.

[0106] Figure 9 schematically illustrates an apparatus 900 for optimizing the execution strategy of neural network nodes according to an embodiment of the present disclosure, in which various methods described herein can be implemented. As shown in Figure 9, the apparatus for optimizing the execution strategy of neural network nodes includes a receiving unit 901 configured to receive a neural network model, the neural network model including multiple nodes, wherein each node corresponds to at least one computational layer of the neural network model; an identification unit 902 configured to identify data dependencies between the nodes; a determination unit 903 configured to determine constraints on the execution order of the multiple nodes based on the identified data dependencies between the nodes; a search unit 904 configured to perform at least one search algorithm to determine possible candidate node execution orders based on satisfying the execution order constraints; an estimation unit 905 configured to estimate the storage requirements related to the execution of the multiple nodes of the neural network model based on each candidate node execution order; an optimization unit 906 configured to select a node execution order from the candidate node execution orders based on the estimated storage requirements related to each candidate node execution order and according to a predetermined memory usage efficiency index; and an output unit 907 configured to output the selected node execution order of the neural network model. The apparatus 900 for optimizing the execution strategy of neural network nodes may also optionally include an update unit 908 configured to update the node execution order of each node based on the selected node execution order.

[0107] The various unit modules described above with respect to Figure 9 can be implemented in hardware or in hardware in combination with software and / or firmware. For example, these modules can be implemented as computer program code / instructions configured to execute in one or more processors and stored in a computer-readable storage medium. Alternatively, these modules can be implemented as hardware logic / circuitry. For example, in some embodiments, one or more of the receiving unit 901, identification unit 902, determination unit 903, search unit 904, estimation unit 905, optimization unit 906, output unit 907, and update unit 908 can be implemented together in a system-on-a-chip (SoC). The SoC may include an integrated circuit chip (which includes a processor (e.g., a central processing unit (CPU), microcontroller, microprocessor, digital signal processor (DSP), etc.), memory, one or more communication interfaces, and / or one or more components of other circuitry) and may optionally execute the received program code and / or include embedded firmware to perform functions. The techniques described herein are platform-independent, meaning that these techniques can be implemented on a variety of computing platforms with various processors.

[0108] In particular, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as a computer program. For example, embodiments of this disclosure provide a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing at least one step in the method embodiments of this disclosure. Figure 10A illustrates an analysis of memory usage based on neural network models Example 1 (Yolo) and Example 2 (InceptionV1). Both Example 1 (Yolo) and Example 2 (InceptionV1) underwent multiple iterative traversals as described in the preferred embodiments above to obtain an optimized node execution order. The two graphs in Figure 10 compare the memory usage of running the neural network models using the initial node execution order with the memory usage after reordering using the optimized execution order. For Example 1 (Yolo), the average memory usage was reduced by 7% compared to the initial execution order. For Example 2 (InceptionV1), the average memory usage was reduced by 26% compared to the initial execution order. It can be understood that the degree of optimization of the execution order depends on the number of branches in the neural network model structure. More branches mean more room for optimization.

[0109] Figure 10B shows the memory usage recorded while running the neural network model. The dark shading represents memory usage using the initial execution order. The light shading represents memory usage after reordering using the optimized execution order. It can be seen that the reordered neural network model reduces both average and peak memory usage. It can be understood that reducing peak memory usage allows a smaller memory to be used, lowering the system's BOM cost. Reduced average memory usage enables the system to run multiple network applications simultaneously, achieving higher frame rates and higher throughput.

[0110] Figure 11 schematically shows a block diagram of a computing device 1100 according to an embodiment of the present disclosure. The computing device 1100 represents the apparatus 900 of Figure 9 for optimizing the execution strategy of neural network nodes, which includes a receiving unit 901, an identification unit 902, a determination unit 903, a search unit 904, an estimation unit 905, an optimization unit 906, an output unit 907, and an update unit 908.

[0111] The computing device 1100 can be of various types, such as a server computer, a device associated with a client (e.g., a client device), a system-on-a-chip, and / or any other suitable computing device or computing system.

[0112] The computing device 1100 may include at least one processor 1102, at least two memories 1104, a communication interface 1106, a display device 1108, other input / output (I / O) devices 1110, and one or more mass storage devices 1112 that are capable of communicating with each other, such as by connecting to each other via a system bus 1114 or other suitable means.

[0113] Processor 1102 may be a single processing unit or at least two processing units, and all processing units may include a single or at least two computing units or at least two cores. Processor 1102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and / or any device that manipulates signals based on operating instructions. Among other capabilities, processor 1102 may be configured to acquire and execute computer-readable instructions stored in memory 1104, mass storage device 1112, or other computer-readable media, such as program code of operating system 1116, program code of application program 1118, program code of other program 1120, etc., to implement the method of optimizing the execution strategy of neural network models provided in the embodiments of this disclosure.

[0114] Memory 1104 and mass storage device 1112 are examples of computer storage media for storing instructions that are executed by processor 1102 to perform the various functions described above. For example, memory 1104 may generally include both volatile and non-volatile memory (e.g., RAM, ROM, etc.). Furthermore, mass storage device 1112 may generally include hard disk drives, solid-state drives, removable media (including external and removable drives), memory cards, flash memory, floppy disks, optical disks (e.g., CDs, DVDs), storage arrays, network storage, storage area networks, etc. Both memory 1104 and mass storage device 1112 may be collectively referred to herein as memory or computer storage media, and may be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that can be executed by processor 1102 as a specific machine configured to perform the operations and functions described in the examples herein.

[0115] At least two program modules may be stored on mass storage device 1112. These programs include operating system 1116, one or more application programs 1118, other programs 1120, and program data 1122, and they may be loaded into memory 1104 for execution. Examples of such application programs or program modules may include, for example, computer program logic (e.g., computer program code or instructions) for implementing the various processing units of this disclosure.

[0116] Although modules 1116, 1118, 1120, and 1122, or portions thereof, are illustrated in Figure 11 as being stored in memory 1104 of computing device 1100, they may be implemented using any form of computer-readable medium accessible by computing device 1100. As used herein, “computer-readable medium” may include one or more types of computer-readable media, such as computer storage media and / or communication media.

[0117] Computer storage media includes volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information, such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, DVD, or other optical storage devices, magnetic cartridges, magnetic tapes, disk storage devices or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by computing devices.

[0118] In contrast, communication media can embody computer-readable instructions, data structures, program modules, or other data within modulated data signals such as carrier waves or other transmission mechanisms. Computer storage media as defined herein do not include communication media.

[0119] The computing device 1100 may also include one or more communication interfaces 1106 for exchanging data with other devices, such as via a network, direct connection, etc. The communication interface 1106 can facilitate communication across various network and protocol types, including wired networks (e.g., LAN, cable, etc.) and wireless networks (e.g., WLAN, cellular, satellite, etc.), the Internet, etc. The communication interface 1106 can also provide communication with external storage devices (not shown), such as storage arrays, network storage, storage area networks, etc.

[0120] In some examples, a display device 1108, such as a monitor, may be included for displaying information and images. Other I / O devices 1110 may be devices that receive various inputs from the user and provide various outputs to the user, and may include touch input devices, gesture input devices, cameras, keyboards, remote controls, mice, printers, audio input / output devices, and so on.

[0121] Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the systems, functional units, and modules described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.

[0122] By studying the accompanying drawings, the disclosure, and the appended claims, those skilled in the art can understand and implement variations of the disclosed embodiments in practicing the claimed subject matter. In the claims, the words "A and / or B" refer to A, B, or A and B; the word "comprising" does not exclude other elements or steps; and the indefinite articles "a" or "an" do not exclude a plurality; the words "first," "second," "third," and "fourth" are used merely to distinguish elements or steps and do not indicate the order of the elements or steps. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be beneficial.

[0123] The following are several examples of embodiments disclosed in this application.

[0124] Example 1. A method for optimizing the execution strategy of a neural network model, comprising:

[0125] receiving a neural network model, the neural network model including multiple nodes, wherein each node corresponds to at least one operational layer of the neural network model;

[0126] identifying the data dependencies between the nodes;

[0127] determining constraints on the execution order of the multiple nodes based on the identified data dependencies between the nodes;

[0128] performing at least one search algorithm to determine possible candidate node execution orders that satisfy the constraints on the execution order;

[0129] estimating, based on each candidate node execution order, the storage requirements associated with the execution of the multiple nodes of the neural network model;

[0130] selecting a node execution order from the candidate node execution orders based on the estimated storage requirements associated with each candidate node execution order and according to a predetermined memory usage efficiency index, wherein changing the node execution order can change the time that the corresponding outputs of one or more of the multiple nodes reside in memory during the execution of the neural network model; and

[0131] outputting the selected node execution order of the neural network model.

[0132] Example 2. The method according to Example 1, wherein if the input of one node comes from the output of another node, then the one node is data dependent on the other node, and the constraint on the execution order of the two nodes is determined as follows: the other node is computed first and then the one node is computed; and each node generates a corresponding output after it runs, and the generated output resides in memory until all nodes that call the output have finished running, after which it is released from memory.
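By way of illustration only, the execution order constraint of Example 2 can be checked with a few lines of Python: an order is admissible when every node appears after all of the nodes whose outputs it reads. The function name `satisfies_dependencies` and the small predecessor table are hypothetical, and the sketch assumes that every producer listed in the table also appears in the order being checked.

```python
def satisfies_dependencies(order, predecessors):
    """Return True if every node runs after all of its producers."""
    position = {node: index for index, node in enumerate(order)}
    return all(
        position[producer] < position[node]
        for node in order
        for producer in predecessors.get(node, [])
    )


# Hypothetical fragment: nodes 4 and 5 read node 2, node 9 reads nodes 4 and 5.
PREDECESSORS = {4: [2], 5: [2], 9: [4, 5]}

print(satisfies_dependencies([2, 5, 4, 9], PREDECESSORS))  # admissible order
print(satisfies_dependencies([2, 4, 9, 5], PREDECESSORS))  # node 9 runs before node 5
```

A check of this kind could be used to filter candidate node execution orders before their storage requirements are estimated.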

[0133] Example 3. The method according to Example 1, wherein the memory usage efficiency index includes average memory usage and / or peak memory usage, and the step of selecting a node execution order from the candidate node execution order according to a predetermined memory usage efficiency index includes selecting the candidate node execution order with the lowest storage requirement as the optimized node execution order, including selecting the candidate node execution order with the lowest average memory usage and / or lowest peak memory usage when running the neural network model as the optimized node execution order.

[0134] Example 4. According to the method described in Example 1, the step of identifying data dependencies among the plurality of nodes includes:

[0135] Step A: Scan multiple nodes in the neural network model based on a predetermined identification method to identify at least one branch node, wherein if a node is directly followed by at least two single-source type nodes, the node is identified as a branch node, wherein the output of the branch node is invoked by the at least two directly followed nodes, and the output of the branch node is released from memory after the at least two nodes have finished running.

[0136] Step B: Identify the at least two nodes that directly follow the identified branch node as the starting nodes of at least two branch groups, and assign each branch group starting node, along with the nodes that are data dependent on that branch group starting node, to the same branch group, so that the at least two branch group starting nodes and their respective data-dependent nodes are assigned to branch group 1, branch group 2, ..., branch group N, respectively.

[0137] Example 5. According to the method described in Example 4, the steps of determining the possible candidate node execution order and selecting the node execution order include:

[0138] Step C: Determine the possible candidate execution order of all branch groups from branch group 1 to branch group N, and estimate the storage requirements related to the execution of the neural network model based on each candidate execution order; and

[0139] Step D: Based on the estimated storage requirements associated with each candidate execution order of branch groups, and according to a predetermined memory usage efficiency metric, select one branch group execution order from the candidate execution orders as the preferred execution order for the branch group based on the identified branch node.

[0140] Specifically, the node execution order is updated based on the preferred execution order of the branch group.

[0141] Example 6. The method described in Example 4, wherein the single-source type running node accepts only the output from one node as its input.

[0142] Example 7. The method according to Example 6, wherein the single-source type running node includes convolution node, activation node and pooling node types.

[0143] Example 8. The method according to Example 5, wherein multiple nodes in the neural network model are scanned to identify all branch nodes, and steps B to D are performed for the first branch node scanned from the top root node closest to the input in the neural network model, the method further comprising:

[0144] Step E: If the next branch node is reached when updating the execution order of nodes in branch group 1 to branch group N, the update is stopped, and at least two nodes that directly follow the next branch node are identified as at least two branch group start nodes. The at least two branch group start nodes and the nodes that are data dependent on the at least two branch group start nodes are respectively designated as branch group 1, branch group 2, ..., branch group N based on the next branch node.

[0145] Repeat steps C-E until all identified branch nodes have been processed; and

[0146] Step F: Output the optimized node execution order.

[0147] Example 9. The method according to Example 8, wherein step B further includes: if a node is data dependent on two or more branch group start nodes, then in each branch group candidate execution order, dynamically assigning the node to the branch group containing the last-executed branch group start node on which the node is data dependent in that branch group candidate execution order.

[0148] Example 10. The method according to Example 9 further includes iteratively repeating step BF based on the optimized node execution order output in step F.

[0149] Example 11. The method according to Example 10, wherein the number of times steps B to F are repeated is determined based on the maximum dependency level M of the nodes in the neural network model, wherein the dependency level of each node is determined by the following steps:

[0150] For a branch node, the dependency level of the current branch node is the highest dependency level among its parent branch nodes plus 1; and

[0151] For non-branch nodes, the dependency level of the current non-branch node is the lowest dependency level among its parent branch nodes.
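The two rules above can be written as a short recursive sketch. Several things here are assumptions rather than statements from the text: the caller supplies each node's parent branch nodes directly in the hypothetical mapping `parent_branches`, a branch node without parent branch nodes is given level 1, and a non-branch node without parent branch nodes is given level 0.

```python
def dependency_levels(parent_branches, branch_nodes):
    """Return {node: dependency level} using the two rules above."""
    levels = {}

    def level(node):
        if node not in levels:
            parents = [level(p) for p in parent_branches.get(node, ())]
            if node in branch_nodes:
                # Branch node: highest parent level plus 1 (assumed 1 at the root).
                levels[node] = max(parents, default=0) + 1
            else:
                # Non-branch node: lowest parent level (assumed 0 without parents).
                levels[node] = min(parents, default=0)
        return levels[node]

    for node in parent_branches:
        level(node)
    return levels

# Hypothetical model: b2 is a branch node nested under branch node b1.
parent_branches = {"b1": [], "b2": ["b1"], "x": ["b1"], "y": ["b1", "b2"]}
levels = dependency_levels(parent_branches, branch_nodes={"b1", "b2"})
M = max(levels.values())
print(levels, M)
```

With these inputs the maximum dependency level M is 2, which under Example 11 would determine how many times the optimization is repeated.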

[0152] Example 12. The method according to Example 11, wherein the optimal execution order of the nodes at the highest dependency level (level M) is obtained by executing, for the first time, the update of the optimized node execution order in step E; the optimal execution order of the nodes at dependency level M-1 is obtained by executing that update a second time; and so on, until the optimal execution order of the nodes at dependency level 1 is obtained by executing that update for the Mth time.

[0153] Example 13. An apparatus (900) for optimizing the execution strategy of neural network nodes, comprising:

[0154] The receiving unit (901) is configured to receive a neural network model, the neural network model including multiple nodes, wherein each node corresponds to at least one operational layer of the neural network model;

[0155] The identification unit (902) is configured to identify the data dependencies between the nodes;

[0156] The determining unit (903) is configured to determine constraints on the execution order of multiple nodes based on the identified data dependencies between the nodes;

[0157] The search unit (904) is configured to perform at least one search algorithm to determine possible candidate node execution orders based on the execution-order constraints;

[0158] The estimation unit (905) is configured to estimate the storage requirements associated with the execution of multiple nodes of the neural network model based on each candidate node execution order;

[0159] The optimization unit (906) is configured to select a node execution order from the candidate node execution orders based on estimated storage requirements associated with each candidate node execution order and according to a predetermined memory usage efficiency index, wherein changing the node execution order can change the time that the corresponding outputs of one or more nodes among a plurality of nodes reside in memory during the execution of the neural network model; and

[0160] The output unit (907) is configured to output the selected node execution order of the neural network model.
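As an illustration of what the search unit (904) produces, the sketch below enumerates every node execution order that respects the data-dependency constraints. It is a brute-force stand-in, not the search algorithm of the application, and `candidate_orders` and the four-node graph are hypothetical. The number of orders grows very quickly with model size, which is one reason to restrict the search to branch groups.

```python
def candidate_orders(inputs):
    """Yield every order in which each node runs after all of its inputs.

    inputs maps each node to the list of nodes whose outputs it reads.
    """
    nodes = list(inputs)

    def extend(order, done):
        if len(order) == len(nodes):
            yield list(order)
            return
        for n in nodes:
            # A node is ready once all of its producers have run.
            if n not in done and all(src in done for src in inputs[n]):
                order.append(n)
                done.add(n)
                yield from extend(order, done)
                order.pop()
                done.remove(n)

    yield from extend([], set())

# Hypothetical graph: "in" feeds "a" and "b", which both feed "out".
graph = {"in": [], "a": ["in"], "b": ["in"], "out": ["a", "b"]}
orders = list(candidate_orders(graph))
print(orders)
```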

[0161] Example 14. The apparatus according to Example 13, wherein if the input of one node comes from the output of another node, the one node is data-dependent on the other node, and the constraint on the execution order of the two nodes is that the other node is computed first and the one node is computed afterwards; and each node generates a corresponding output after it runs, and the generated output resides in memory until all nodes that use the output have finished running, after which it is released from memory.

[0162] Example 15. The apparatus according to Example 13, wherein the memory usage efficiency metrics include average memory usage and / or peak memory usage, and the optimization unit is further configured to select the candidate node execution order with the lowest memory requirements as the optimized node execution order, including selecting the candidate node execution order with the lowest average memory usage and / or lowest peak memory usage when running the neural network model as the optimized node execution order.
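The residency rule of Example 14 and the two metrics of Example 15 can be combined in a small estimator. This is a sketch under assumptions: `memory_profile`, the graph, and the sizes are hypothetical, and "average memory usage" is taken as the unweighted mean of resident memory over node execution steps, which the text does not specify.

```python
def memory_profile(order, inputs, size):
    """Return (peak, average) resident memory for one node execution order."""
    readers_left = {n: 0 for n in order}
    for node in order:
        for src in inputs.get(node, ()):
            readers_left[src] += 1
    live, usage = 0, []
    for node in order:
        live += size[node]          # the node's output is allocated
        usage.append(live)          # memory resident while this node runs
        for src in inputs.get(node, ()):
            readers_left[src] -= 1
            if readers_left[src] == 0:
                live -= size[src]   # all readers finished: release the output
    return max(usage), sum(usage) / len(usage)

# Hypothetical graph: "in" feeds "a" and "b", which both feed "out".
inputs = {"a": ["in"], "b": ["in"], "out": ["a", "b"]}
size = {"in": 2, "a": 5, "b": 1, "out": 1}
first = memory_profile(["in", "a", "b", "out"], inputs, size)
second = memory_profile(["in", "b", "a", "out"], inputs, size)
print(first, second)
```

Here both orders reach the same peak of 8 units, but running the small node "b" first lowers the average from 6.0 to 5.0, which shows why the two metrics can rank candidate orders differently.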

[0163] Example 16. The apparatus according to Example 13, wherein the identification unit (902) is further configured to perform:

[0164] Step A: Scan multiple nodes in the neural network model based on a predetermined identification method to identify at least one branch node, wherein if a node is directly followed by at least two single-source type nodes, the node is identified as a branch node, wherein the output of the branch node is invoked by the at least two directly followed nodes, and the output of the branch node is released from memory after the at least two nodes have finished running.

[0165] Step B: Identify at least two nodes that directly follow the identified branch node as the start nodes of at least two branch groups, and assign each branch group start node, together with the nodes that are data-dependent on that branch group start node, to the same branch group. The at least two branch group start nodes, together with the nodes that are data-dependent on each of them, are designated as branch group 1, branch group 2, ..., branch group N.
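Steps A and B can be sketched as below. The helper names and the toy graph are hypothetical. As a simplifying assumption, a node reachable from more than one branch group start node is set aside as shared rather than placed in a group, and each group lists membership only, not an execution order.

```python
def find_branch_nodes(successors, inputs):
    """Step A: a branch node has at least two single-source direct successors."""
    return [n for n, succ in successors.items()
            if sum(len(inputs[s]) == 1 for s in succ) >= 2]

def reachable(start, successors):
    """The start node plus every node that is data-dependent on it."""
    seen, stack = [], [start]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.append(n)
            stack.extend(successors.get(n, ()))
    return seen

def branch_groups(branch, successors):
    """Step B: one group per direct successor of the branch node."""
    reach = [reachable(s, successors) for s in successors[branch]]
    shared = {n for i, r in enumerate(reach) for n in r
              if any(n in other for j, other in enumerate(reach) if j != i)}
    groups = [[n for n in r if n not in shared] for r in reach]
    return groups, shared

# Hypothetical graph: "b" feeds two conv/pool chains that join at "add".
successors = {"b": ["c1", "c2"], "c1": ["p1"], "c2": ["p2"],
              "p1": ["add"], "p2": ["add"], "add": []}
inputs = {"b": [], "c1": ["b"], "c2": ["b"], "p1": ["c1"],
          "p2": ["c2"], "add": ["p1", "p2"]}
branches = find_branch_nodes(successors, inputs)
groups, shared = branch_groups("b", successors)
print(branches, groups, shared)
```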

[0166] Example 17. The apparatus according to Example 16, wherein the search unit (904) is further configured to perform:

[0167] Step C: Determine the possible candidate execution order of all branch groups from branch group 1 to branch group N, and estimate the storage requirements related to the execution of the neural network model based on each candidate execution order; and

[0168] The optimization unit (906) is further configured to perform:

[0169] Step D: Based on the estimated storage requirements associated with each candidate execution order of branch groups, and according to a predetermined memory usage efficiency metric, select one branch group execution order from the candidate execution orders as the preferred execution order for the branch group based on the identified branch node.

[0170] The apparatus further includes an update unit (908) configured to update the node execution order of each node based on the preferred execution order of the branch groups.

[0171] Example 18. The apparatus according to Example 16, wherein the single-source type running node accepts only the output from one node as its input.

[0172] Example 19. The apparatus according to Example 18, wherein the single-source type running nodes include convolution nodes, activation nodes, and pooling nodes.

[0173] Example 20. The apparatus according to Example 17, wherein the identification unit (902) is further configured to scan a plurality of nodes in the neural network model to identify all branch nodes, performing steps B to D for the first branch node scanned from the top root node near the input in the neural network model, and the search unit (904) is further configured to perform:

[0174] Step E: If the next branch node is reached when updating the execution order of nodes in branch group 1 to branch group N, the update is stopped, and at least two nodes that directly follow the next branch node are identified as at least two branch group start nodes. The at least two branch group start nodes and the nodes that are data dependent on the at least two branch group start nodes are respectively designated as branch group 1, branch group 2, ..., branch group N based on the next branch node.

[0175] The apparatus is also configured to repeatedly execute steps C to E until all identified branch nodes have been processed; and

[0176] The output unit (907) is further configured to perform: step F, outputting the optimized node execution order.

[0177] Example 21. The apparatus according to Example 20, wherein the identification unit (902) is further configured to, if a node is data-dependent on two or more branch group start nodes, dynamically assign the node, in each candidate branch group execution order, to the branch group containing whichever of those branch group start nodes is executed last in that candidate execution order.

[0178] Example 22. The apparatus according to Example 21, wherein the apparatus is further configured to iteratively repeat steps B to F based on the optimized node execution order output in step F.

[0179] Example 23. The apparatus according to Example 21, wherein the number of times steps B to F are repeated is determined based on the maximum dependency level M of the nodes in the neural network model, wherein the dependency level of each node is determined by the following steps:

[0180] For a branch node, the dependency level of the current branch node is the highest dependency level among its parent branch nodes plus 1; and

[0181] For non-branch nodes, the dependency level of the current non-branch node is the lowest dependency level among its parent branch nodes.

[0182] Example 24. The apparatus according to Example 23, wherein the optimal execution order of the nodes at the highest dependency level M is obtained by executing, for the first time, the update of the optimized node execution order in step E; the optimal execution order of the nodes at dependency level M-1 is obtained by executing that update a second time; and so on, until the optimal execution order of the nodes at dependency level 1 is obtained by executing that update for the Mth time.

[0183] Example 25. A computer-readable storage medium having a computer program stored thereon that, when run on a computing device, causes the computing device to perform the method according to any one of Examples 1 to 12.

Claims

1. A method for optimizing the execution strategy of a neural network model, comprising:
receiving a neural network model, the neural network model including multiple nodes, wherein each node corresponds to at least one operational layer of the neural network model;
identifying the data dependencies between the nodes;
determining constraints on the execution order of the multiple nodes based on the identified data dependencies between the nodes;
performing at least one search algorithm to determine possible candidate node execution orders based on the execution-order constraints;
estimating, based on each candidate node execution order, the storage requirements associated with the execution of the multiple nodes of the neural network model;
selecting a node execution order from the candidate node execution orders based on the estimated storage requirements associated with each candidate node execution order and according to a predetermined memory usage efficiency index, wherein changing the node execution order can alter the lifetime of the corresponding output feature maps of one or more of the multiple nodes during the execution of the neural network model, shortening the time the output resides in memory; and
outputting the selected node execution order of the neural network model;
wherein identifying the data dependencies between the multiple nodes includes:
Step A: scanning the multiple nodes in the neural network model based on a predetermined identification method to identify at least one branch node, wherein a node is identified as a branch node if it is directly followed by at least two single-source type nodes, the output of the branch node is invoked by the at least two directly following nodes, and the output of the branch node is released from memory after the at least two nodes have finished running; and
grouping the multiple nodes of the neural network based on the branch nodes in the neural network model, and finding all possible candidate execution orders of the branch groups by permuting and combining the branch groups.

2. The method according to claim 1, wherein if the input of one node comes from the output of another node, the one node is data-dependent on the other node, and the constraint on the execution order of the two nodes is that the other node is computed first and the one node is computed afterwards; and each node generates a corresponding output after it runs, and the generated output resides in memory until all nodes that use the output have finished running, after which it is released from memory.

3. The method according to claim 1, wherein the memory usage efficiency metrics include average memory usage and / or peak memory usage, and the step of selecting a node execution order from the candidate node execution orders according to the predetermined memory usage efficiency metrics includes selecting the candidate node execution order with the lowest storage requirements as the optimized node execution order, including selecting the candidate node execution order with the lowest average memory usage and / or lowest peak memory usage when running the neural network model as the optimized node execution order.

4. The method according to claim 1, wherein, after step A, identifying the data dependencies between the multiple nodes further includes: Step B: identifying at least two nodes that directly follow the identified branch node as the start nodes of at least two branch groups, and assigning each branch group start node, together with the nodes that are data-dependent on that branch group start node, to the same branch group, wherein the at least two branch group start nodes, together with the nodes that are data-dependent on each of them, are designated as branch group 1, branch group 2, ..., branch group N.

5. The method according to claim 4, wherein the steps of determining the possible candidate node execution orders and selecting the node execution order include: Step C: determining the possible candidate execution orders of all branch groups from branch group 1 to branch group N, and estimating the storage requirements related to the execution of the neural network model based on each candidate execution order; and Step D: based on the estimated storage requirements associated with each candidate execution order of the branch groups, and according to a predetermined memory usage efficiency metric, selecting one branch group execution order from the candidate execution orders as the preferred execution order for the branch groups based on the identified branch node, wherein the node execution order is updated based on the preferred execution order of the branch groups.

6. The method according to claim 4, wherein the single-source type running node accepts only the output of one node as its input.

7. The method according to claim 6, wherein the single-source type running nodes include convolution nodes, activation nodes, and pooling nodes.

8. The method according to claim 5, wherein the method further includes scanning the multiple nodes in the neural network model to identify all branch nodes, and performing steps B to D on the first branch node scanned from the top root node closest to the input in the neural network model; Step E: if the next branch node is reached when updating the execution order of nodes in branch group 1 to branch group N, stopping the update, and identifying at least two nodes that directly follow the next branch node as at least two branch group start nodes, wherein the at least two branch group start nodes and the nodes that are data-dependent on the at least two branch group start nodes are respectively designated as branch group 1, branch group 2, ..., branch group N based on the next branch node; repeating steps C to E until all identified branch nodes have been processed; and Step F: outputting the optimized node execution order.

9. An apparatus (900) for optimizing the execution strategy of neural network nodes, comprising:
a receiving unit (901) configured to receive a neural network model, the neural network model including multiple nodes, wherein each node corresponds to at least one operational layer of the neural network model;
an identification unit (902) configured to identify the data dependencies between the nodes;
a determining unit (903) configured to determine constraints on the execution order of the multiple nodes based on the identified data dependencies between the nodes;
a search unit (904) configured to perform at least one search algorithm to determine possible candidate node execution orders based on the execution-order constraints;
an estimation unit (905) configured to estimate, based on each candidate node execution order, the storage requirements associated with the execution of the multiple nodes of the neural network model;
an optimization unit (906) configured to select a node execution order from the candidate node execution orders based on the estimated storage requirements associated with each candidate node execution order and according to a predetermined memory usage efficiency index, wherein changing the node execution order can alter the lifetime of the corresponding output feature maps of one or more of the multiple nodes during the execution of the neural network model, shortening the time the output resides in memory; and
an output unit (907) configured to output the selected node execution order of the neural network model;
wherein identifying the data dependencies between the multiple nodes includes:
Step A: scanning the multiple nodes in the neural network model based on a predetermined identification method to identify at least one branch node, wherein a node is identified as a branch node if it is directly followed by at least two single-source type nodes, the output of the branch node is invoked by the at least two directly following nodes, and the output of the branch node is released from memory after the at least two nodes have finished running; and
grouping the multiple nodes of the neural network based on the branch nodes in the neural network model, and finding all possible candidate execution orders of the branch groups by permuting and combining the branch groups.

10. A computer-readable storage medium having a computer program stored thereon that, when run on a computing device, causes the computing device to perform the method according to any one of claims 1 to 8.