Neural network graph partitioning system and method

The multi-stage neural network graph partitioning system solves the problem of efficient partitioning of neural network graphs on resource-constrained hardware, optimizes inference time and memory bandwidth, improves hardware utilization, and is applicable to a variety of accelerators.

CN115829013B · Active · Publication Date: 2026-06-26 · BLACK SESAME TECH (CHONGQING) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: BLACK SESAME TECH (CHONGQING) CO LTD
Filing Date: 2022-11-22
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

When running neural networks on resource-constrained hardware systems, existing technologies suffer from limitations in computing power and memory, making it difficult to efficiently partition neural network graphs. This results in low hardware utilization, insufficient optimization of memory bandwidth and inference time, and existing methods are either highly complex or dependent on real hardware models.

Method used

A multi-stage neural network graph partitioning system is adopted. By creating whitelists and blacklists, nodes are generated and boundaries are partitioned based on cost functions to optimize subgraph generation. It supports partitioning of heterogeneous accelerators and provides a solution with the best hardware utilization and low time complexity.

Benefits of technology

It enables efficient partitioning of neural network graphs on resource-constrained hardware, optimizes inference time, memory bandwidth and power consumption, improves hardware utilization, reduces time complexity, and is applicable to various accelerators such as CPU, GPU, and ASIC.

✦ Generated by Eureka AI based on patent content.

[Figure CN115829013B_ABST]

Abstract

A neural network graph partitioning system and method are disclosed. The graph partitioning system is used to partition a neural network graph into a series of subgraphs and allows multiple subgraphs to be executed in available hardware subsystems. The system estimates the computation time and memory bandwidth of the partitioned subgraphs based on a cost function. The graph partitioning system uses a hardware cycle estimation model that runs quickly and parameterizes memory latency. The graph partitioning system supports heterogeneous partitioning for different types of accelerators, such as central processing units, graphics processing units, and application-specific integrated circuits. A method of partitioning a neural network graph into a series of subgraphs is also disclosed.

Description

Technical Field

[0001] This invention relates to the field of neural networks. More specifically, this application relates to a system and method for partitioning a graph into a series of subgraphs for running neural networks on resource-constrained hardware systems.

Background Technology

[0002] The increasing use of neural networks in ubiquitous microcomputing environments may lead to resource constraints in terms of platform computing power and memory. Typically, in cases involving large nodes (e.g., very deep convolutional networks), AI chips may not be able to load the entire graph onto the chip. One possible solution for running neural networks on resource-constrained hardware systems is to partition the original neural network graph into multiple subgraphs, allowing the subgraphs to be executed by available subsystems. Graph partitioning can therefore provide a possible solution for resource-constrained AI applications.

[0003] The motivation for designing graph partitioning algorithms is based on several factors, as described below. The first factor is the requirement that different operators run on different hardware backends (such as arrays, digital signal processors (DSPs), etc.). The second factor is the need to divide the large graph into multiple smaller subgraphs so that these subgraphs can meet hardware constraints. The final factor is the need to design and determine the execution order of these subgraphs on resource-constrained hardware systems.

[0004] As neural networks grow in scale, optimizing neural network inference on resource-constrained embedded devices becomes a challenging problem. Due to hardware limitations in computing power and memory, it is necessary to finely divide the neural network model graph into small subgraphs and break down computations into small chunks that can be executed efficiently on hardware. This optimizes the entire processing in terms of inference time, memory bandwidth, and memory footprint, thereby also optimizing power consumption.

[0005] Graph partitioning optimization addresses several factors, including: First, changes in model topology affect the hardware computation order. Second, hardware limitations in operational support and internal buffer constraints on model parameters. Third, the availability of graph partitioning algorithms impacts non-ideal external Double Data Rate (DDR) memory access. Fourth, the need to generalize graph partitioning optimization methods to different models rather than a single model. Finally, the interaction between graph partitioning optimization and parameter values. The ultimate consequence of these factors is a very large search and optimization space that is difficult to describe using parameter-based graph partitioning methods.

[0006] Several alternative methods can be used to perform graph partitioning and its optimization, as described below. The first known method for graph partitioning is a manual approach. This method is based on a configuration with graph boundary information. This approach requires experience and understanding of hardware limitations, manually determining how to partition the graph into subgraphs, and determining the stripe size based on the tensor shape. The limitation of this approach is that it is not scalable and is verbose.

[0007] The second known method for graph partitioning is based on parametric formulas. In this method, the graph is partitioned into subgraphs based on a global parametric formula. The limitation of this method is that the formula is applied globally and is suboptimal. Another limitation is that the formulas depend on order and can conflict with each other. The third method for graph partitioning is the limit partitioning method. This method is based on a configuration with information about the graph's nodes. In this method, the graph is partitioned into subgraphs, each containing only one node. In this method, the number of nodes in the original graph equals the number of subgraphs. The limitation of this method is its low hardware utilization.

[0008] The fourth method for graph partitioning is dynamic programming partitioning. This method relies on the availability of a subgraph checker, a function that applies the rules determining whether a subgraph is valid. The start and end points of each subgraph are obtained using a dynamic programming algorithm. The limitation of this method is its high time complexity due to the use of dynamic programming. The final method for graph partitioning is exhaustive search. In this method, graph partitioning is performed by trying all parameter values associated with the graph partitioning. The limitation of this method is that it requires real hardware or a cycle-accurate model, and the search space is too large.
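
To give a sense of scale for the exhaustive approach, the following sketch counts candidate partitions under a simplifying assumption that is not stated in the patent: the graph is a linear chain of N nodes and every subgraph is a contiguous run of that chain. Each of the N-1 gaps between neighbouring nodes is either cut or not, so there are 2^(N-1) candidates before any per-subgraph parameters are searched. The function names are hypothetical.

```python
from itertools import product
from typing import Iterator, List, Sequence


def count_contiguous_partitions(n_nodes: int) -> int:
    """Closed form: each of the n-1 gaps between neighbours is cut or not."""
    if n_nodes < 1:
        raise ValueError("need at least one node")
    return 2 ** (n_nodes - 1)


def enumerate_partitions(nodes: Sequence[str]) -> Iterator[List[List[str]]]:
    """Brute-force enumeration; only feasible for very small chains."""
    if not nodes:
        raise ValueError("need at least one node")
    for cuts in product((False, True), repeat=len(nodes) - 1):
        parts: List[List[str]] = []
        current = [nodes[0]]
        for node, cut_before in zip(nodes[1:], cuts):
            if cut_before:
                parts.append(current)   # close the running subgraph
                current = [node]
            else:
                current.append(node)
        parts.append(current)
        yield parts


small = list(enumerate_partitions(["a", "b", "c", "d"]))
print(len(small), count_contiguous_partitions(30))
```

Under this assumption a 30-node chain already has 2^29 (about 5.4 × 10^8) candidate partitions, each of which would need to be timed on hardware or on a cycle-accurate model.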

[0009] Numerous existing technologies describe graph partitioning methods in detail. US Patent 8,364,615, attributed to Microsoft Corporation, discloses a local graph partitioning method using evolutionary set processing. In this patent, the transformation algorithm can expand or shrink the analyzed set of nodes based on node features at the boundaries of the analyzed set. Therefore, when the analyzed set of nodes becomes large, this method achieves significant processing efficiency by transforming the set or determining conductance using node features at the boundaries, rather than analyzing all nodes in the set. Although the transformation algorithm described herein is based on node-partitioned graphs, its processing speed and efficiency are not high on resource-constrained hardware.

[0010] Another U.S. publication, 20190266191, attributed to Huawei Technologies Co., Ltd., discloses a graph partitioning method. In this publication, graph boundaries are randomly assigned to different devices. While this method achieves relatively high graph partitioning efficiency, a drawback is that it requires real hardware or a cycle-accurate model to run the subgraph.

[0011] Therefore, a multi-stage graph partitioning system is needed to divide a graph into a series of subgraphs, allowing available subsystems (e.g., resource-constrained hardware) to execute multiple subgraphs. A graph partitioning method is required to address resource-constrained artificial intelligence applications.

[0012] It is evident that numerous graph partitioning methods and systems have been developed in the related art, applicable to various purposes. Furthermore, while these inventions may be suitable for their specific purposes, they may not be suitable for the purposes of this invention as described above. Therefore, there is a need for an efficient multi-stage graph partitioning system that supports heterogeneous partitioning of different types of accelerators (such as central processing units (CPUs), processors, chips, etc.) and provides users with optimal hardware utilization.

Summary of the Invention

[0013] According to the present invention, by providing a multi-stage neural network graph partitioning system for running neural networks on resource-constrained hardware systems, the drawbacks and limitations of the prior art are largely avoided. The neural network graph partitioning system includes a list creation unit configured to create whitelists and blacklists. The list creation unit includes details regarding the partitioning boundaries of the neural network graph, the whitelist, and the blacklist.

[0014] The neural network graph partitioning system further includes a node generation unit configured to perform hard partitioning of the graph boundaries to generate multiple nodes in a whitelist and multiple nodes in a blacklist. The nodes in the blacklist represent partition boundaries. The neural network graph partitioning system also includes an optimization unit configured to apply optimized grouping to the multiple nodes in the whitelist. The optimization unit applies optimization to known good partition points.

[0015] The neural network graph partitioning system further includes a partitioning generation unit configured to partition the neural network graph based on a defined cost function and hardware memory constraints. The partitioning generation unit generates partition boundaries based on the cost function between two nodes in the neural network graph.

[0016] Furthermore, the partitioning generation unit allocates the lowest-cost path between two nodes among the multiple nodes. The partitioning generation unit generates multiple partitioning paths ordered according to the cost function among the multiple nodes. The neural network graph partitioning system also includes a subgraph generator configured to generate multiple subgraphs based on the partitioning paths, wherein the partitioning paths are ordered based on the cost function or cost function values.

[0017] The neural network graph partitioning system also includes a subgraph optimizer configured to optimize a subgraph by searching for hyperparameter computation node constraints within the subgraph. When selecting scheduling parameters, the subgraph optimizer assigns at least one workflow to multiple subgraphs.

[0018] The main objective of this invention is to provide a system for dividing a neural network graph into a series of subgraphs that support heterogeneous partitioning of different types of accelerators, such as CPUs, graphics processing units (GPUs), application-specific integrated circuits (ASICs), accelerator integrated circuits, digital signal processors, microprocessor chips, neural network chips, or artificial intelligence chips.

[0019] Another objective of this invention is to provide a multi-stage system for partitioning neural network graphs based on a cost function, whereby the cost function is defined as the computation time and memory bandwidth required to estimate the partitioning of the subgraph.

[0020] Another object of the present invention is to provide a system for partitioning neural network graphs that provides users with optimal hardware utilization and defines a cycle estimation model for hardware that can run quickly and parameterize memory latency.

[0021] Another object of the present invention is to provide a system for partitioning neural network graphs that automatically meets hardware constraints, finds the best possible partitioning pattern, and reports the best partitioning pattern to the user.

[0022] Another objective of this invention is to provide a system for partitioning neural network graphs based on a cost function, which has lower time complexity compared to complex algorithms such as dynamic programming.

[0023] Another object of the present invention is to provide a system for partitioning graphs, which provides a possible solution for resource-constrained artificial intelligence applications.

[0024] In a preferred embodiment of the present invention, the neural network graph partitioning system breaks down computation into smaller graphs or subgraphs that can be efficiently executed on hardware, thereby optimizing the entire processing in terms of inference time, memory bandwidth and memory usage, and consequently optimizing power consumption.

[0025] In one embodiment of the invention, the system for partitioning neural network graphs provides optimized graph partitioning support for neural network chips, artificial intelligence chips, and different types of accelerators (such as CPUs, GPUs, ASICs, integrated circuits, digital signal processors, and microprocessor chips).

[0026] In another embodiment of the invention, the system for partitioning a neural network graph automatically satisfies hardware limitations. The system for partitioning a neural network graph into a series of subgraphs that can run on resource-constrained hardware systems divides the original neural network graph into multiple subgraphs, enabling these subgraphs to be run on multiple available subsystems.

[0027] In another embodiment of the invention, a method for dividing a neural network graph into a series of subgraphs is described. The method includes the step of obtaining the neural network graph from a neural network chip or an artificial intelligence chip. The method also includes the step of listing whitelists and blacklists, as well as the division boundaries of the neural network graph.

[0028] The method further includes the step of applying hard partitioning of the graph boundaries to generate multiple nodes in the blacklist and whitelist. Further, the method includes the step of optimizing the grouping of multiple nodes in the whitelist and blacklist to obtain known good split points. Further, the method includes the step of generating split boundaries within the neural network graph based on a cost function between two nodes for partitioning.

[0029] The method further includes the step of calculating the lowest-cost path from a start node to any end node in the subgraph. Further, the method includes the step of assigning the lowest-cost path between two nodes in a plurality of nodes. Further, the method includes generating multiple partition paths ordered according to a cost function between the plurality of nodes. Further, the method includes the step of generating multiple subgraphs from the partition paths ordered according to a cost function between the plurality of nodes. The method also includes assigning a single workflow to the corresponding subgraph when selecting scheduling parameters.

[0030] Embodiments of the present invention may employ any or all of the exemplary aspects described above. Those skilled in the art will further understand the above-described features and advantages, as well as other important aspects of the present invention, upon reading the following detailed description taken in conjunction with the accompanying drawings, which exemplify features according to embodiments of the present invention. The content of this invention is not intended to limit the scope of the invention; the invention is defined only by the appended claims. To achieve the above and related objectives, the present invention may be implemented in the form shown in the accompanying drawings; however, please note that the drawings are merely illustrative and modifications may be made to the specific structures shown and described within the scope of the appended claims.

[0031] Although the invention has been described above with reference to various exemplary embodiments and implementations, it should be understood that the various features, aspects, and functions described in one or more individual embodiments are not limited to their applicability to the specific embodiments described, but can be applied individually or in various combinations to one or more embodiments of the invention, whether or not those embodiments are described and whether or not these features are presented as part of the described embodiments. Therefore, the breadth and scope of the invention should not be limited by any of the exemplary embodiments described above.

[0032] The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to,” or other similar phrases in some instances should not be read to mean that the narrower case is intended or required where such broadening phrases are absent.

Attached Figure Description

[0033] The objects and features of the present invention will become clearer from the following description and appended claims in conjunction with the accompanying drawings. It is to be understood that these drawings depict only exemplary embodiments of the invention and should not be construed as limiting its scope. The invention will now be described and explained with additional specific features and details using the accompanying drawings, wherein:

[0034] Figure 1 shows a neural network graph partitioning system according to the present invention for partitioning a graph into a series of subgraphs.

[0035] Figure 2 shows a partitioning generation unit of the graph partitioning system according to the present invention.

[0036] Figure 3 shows a method for partitioning neural network graphs according to the present invention.

Detailed Implementation

[0037] This invention provides a multi-stage, cost function-based parameter search system for neural network graph partitioning with loose design constraints. The neural network graph partitioning system is used to run neural networks on resource-constrained hardware systems. The system partitions the original neural network graph into multiple subgraphs, enabling these subgraphs to be run by multiple available subsystems. The system for partitioning neural network graphs provides optimized graph partitioning support for neural network chips, artificial intelligence chips, and various types of accelerators (such as CPUs, GPUs, ASICs, integrated circuits, digital signal processors, and microprocessor chips).

[0038] Figure 1 illustrates a neural network graph partitioning system for dividing a graph into a series of subgraphs according to the present invention. A multi-stage system 100 is used to partition a neural network graph into a series of subgraphs. System 100 includes a list creation unit 200 for creating a whitelist and a blacklist. The list creation unit 200 holds the partition-boundary details, the whitelist, and the blacklist for the neural network graph to be partitioned.

[0039] The graph partitioning system 100 for partitioning neural network graphs also includes a node generation unit 300, wherein the node generation unit is configured to generate multiple nodes in a blacklist and multiple nodes in a whitelist by utilizing hard partitioning of graph boundaries. The nodes in the blacklist represent partition boundaries.
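
The hard partitioning step can be pictured with a minimal sketch. It assumes, beyond what the patent states, that the graph is available as a topologically ordered list of `(name, op_type)` pairs, that the blacklist is a set of operator types, and that each blacklisted node is isolated in a region of its own. The function `hard_split` and the operator names are hypothetical.

```python
from typing import List, Set, Tuple

Node = Tuple[str, str]  # (node name, operator type); hypothetical representation


def hard_split(nodes: List[Node], blacklist: Set[str]) -> List[List[Node]]:
    """Split a topologically ordered node list at every blacklisted operator.

    Assumption: a blacklisted node forms its own region, so the regions
    before and after it can never be merged across it.
    """
    regions: List[List[Node]] = []
    current: List[Node] = []
    for node in nodes:
        _, op_type = node
        if op_type in blacklist:
            if current:
                regions.append(current)   # close the region before the boundary
                current = []
            regions.append([node])        # the boundary node stands alone
        else:
            current.append(node)
    if current:
        regions.append(current)
    return regions


graph = [("n0", "conv"), ("n1", "relu"), ("n2", "custom_op"),
         ("n3", "conv"), ("n4", "add")]
regions = hard_split(graph, blacklist={"custom_op"})
print([[name for name, _ in region] for region in regions])
```

Only the regions between blacklisted nodes need to be searched by the later, cost-based stages, which is one way a hard split could shrink the search space.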

[0040] The graph partitioning system 100 for partitioning neural network graphs also includes an optimization unit 400 configured to apply optimization grouping to multiple nodes in a whitelist. The optimization unit applies optimization to obtain known good partition points. According to the rules of the whitelist, nodes in the whitelist cannot be considered as boundaries of the graph partition.
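
One way to read the whitelist rule is that no cut may be placed directly in front of a whitelisted node, so the node stays attached to its producer. The sketch below follows that reading, which is an assumption rather than the patent's stated definition; `group_whitelisted` and the operator names are hypothetical.

```python
from typing import List, Set, Tuple

Node = Tuple[str, str]  # (node name, operator type); hypothetical representation


def group_whitelisted(nodes: List[Node], whitelist: Set[str]) -> List[List[str]]:
    """Merge each whitelisted node into the group of its predecessor.

    Assumption: "cannot be a boundary" means "no cut directly in front of
    this node", e.g. an activation stays with the layer that feeds it.
    """
    groups: List[List[str]] = []
    for name, op_type in nodes:
        if groups and op_type in whitelist:
            groups[-1].append(name)   # glue to the previous group
        else:
            groups.append([name])     # a possible cut point precedes this node
    return groups


region = [("conv0", "conv"), ("relu0", "relu"), ("conv1", "conv"),
          ("bn1", "batchnorm"), ("relu1", "relu")]
groups = group_whitelisted(region, whitelist={"relu", "batchnorm"})
print(groups)
```

After grouping, the cost-based search only has to consider cuts between groups, here one candidate cut instead of four.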

[0041] The graph partitioning system 100 for partitioning neural network graphs also includes a partitioning generation unit 500 configured to partition the neural network graph based on a defined cost function and hardware memory constraints. The partitioning generation unit 500 generates partition boundaries between two nodes in the neural network graph based on the cost function. The cost function is defined as the estimated computation time and memory bandwidth for partitioning the subgraph.
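
The patent does not give a formula for the cost function, so the following is only an illustrative sketch of a cycle estimate that combines computation time with memory traffic and a parameterized memory latency. Every field name, the additive form of the estimate, and the assumption that intermediate tensors stay on chip are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class HwParams:
    macs_per_cycle: int       # hypothetical compute throughput
    bytes_per_cycle: float    # hypothetical external-memory bandwidth
    mem_latency_cycles: int   # parameterized latency per memory transfer


@dataclass(frozen=True)
class NodeInfo:
    macs: int          # multiply-accumulate operations in the node
    in_bytes: int      # size of the node's input tensor
    out_bytes: int     # size of the node's output tensor
    weight_bytes: int  # size of the node's weights


def segment_cost(seg: Sequence[NodeInfo], hw: HwParams) -> float:
    """Estimated cycles for running `seg` as one subgraph.

    Assumption: intermediate tensors stay on chip, so only the first
    node's input, the last node's output and all weights cross the
    external memory interface.
    """
    if not seg:
        raise ValueError("segment must contain at least one node")
    compute = sum(n.macs for n in seg) / hw.macs_per_cycle
    traffic = seg[0].in_bytes + seg[-1].out_bytes + sum(n.weight_bytes for n in seg)
    transfers = 2 + len(seg)  # one input, one output, one weight load per node
    memory = traffic / hw.bytes_per_cycle + transfers * hw.mem_latency_cycles
    return compute + memory


hw = HwParams(macs_per_cycle=100, bytes_per_cycle=10.0, mem_latency_cycles=50)
a = NodeInfo(macs=1000, in_bytes=100, out_bytes=200, weight_bytes=50)
b = NodeInfo(macs=2000, in_bytes=200, out_bytes=100, weight_bytes=50)
fused = segment_cost([a, b], hw)
split = segment_cost([a], hw) + segment_cost([b], hw)
print(fused, split)
```

In this toy model, fusing the two nodes is cheaper than running them separately because the intermediate tensor never travels to external memory. Changing `mem_latency_cycles` shifts that trade-off without touching the rest of the model.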

[0042] By using this cost function, the minimum cost between the starting node and any ending node in the subgraph can be easily calculated. The minimum cost for any node can be obtained using an efficient sorting algorithm. The idea behind this algorithm is to calculate the minimum cost path from the starting point to any ending point and exclude longer paths during updates. Furthermore, the partitioning generation unit 500 assigns the minimum cost path between two nodes among the multiple nodes. The partitioning generation unit 500 generates multiple partition paths sorted according to the cost function between the multiple nodes.
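
The idea of keeping only the cheapest path to each end point can be sketched as a textbook shortest-path relaxation over cut points. This is one plausible reading, not the patent's confirmed procedure. The sketch assumes a linear chain of `n` nodes, a hypothetical segment cost `cost(i, j)` for nodes `i` to `j-1`, and a `max_len` limit that stands in for the hardware memory constraint.

```python
import math
from typing import Callable, List, Tuple


def min_cost_partition(
    n: int,
    cost: Callable[[int, int], float],
    max_len: int,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Cheapest way to cover nodes 0..n-1 with contiguous segments [i, j)."""
    if n < 1 or max_len < 1:
        raise ValueError("n and max_len must be positive")
    best = [math.inf] * (n + 1)   # best[j]: minimum cost to cover nodes [0, j)
    prev = [-1] * (n + 1)         # back-pointer to the previous cut point
    best[0] = 0.0
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            candidate = best[i] + cost(i, j)
            if candidate < best[j]:   # keep the cheaper path, drop the longer one
                best[j] = candidate
                prev[j] = i
    segments: List[Tuple[int, int]] = []
    j = n
    while j > 0:                      # walk the back-pointers from the end
        segments.append((prev[j], j))
        j = prev[j]
    return best[n], segments[::-1]


# Toy cost: fixed launch overhead of 10 plus a term that grows with segment length.
total, segments = min_cost_partition(6, lambda i, j: 10 + (j - i) ** 2, max_len=4)
print(total, segments)
```

With this toy cost the cheapest cover of six nodes is two segments of three nodes each. Single-node segments pay the fixed overhead too often, and the longest allowed segments pay too much for length.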

[0043] In one embodiment of the present invention, the partitioning generation unit 500 computes the cost between every pair of nodes generated by the node generation unit 300, based on a graph cost function partitioning algorithm. First, it loops from the first node (node 0) over all the remaining N-1 nodes. The current cost between each of these nodes and node 0 is stored in a database.

[0044] The partitioning generation unit 500 again iterates from the second node using the graph cost function partitioning algorithm, and iterates through all the remaining N-2 nodes. Furthermore, it identifies the current cost for each node. For each node, the partitioning generation unit 500 compares the current cost with the stored costs and obtains the partitioning path with the lowest cost obtained from the sorting.

[0045] The partitioning generation unit 500 iterates through the graph cost function partitioning algorithm to the last node of the graph. Furthermore, the partitioning generation unit 500 obtains a list of partitioning paths sorted by cost. Based on these cost-sorted partitioning paths, the algorithm partitions the graph into several subgraphs that can adapt to hardware limitations.
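
The looping scheme described above can be sketched as a pairwise cost table followed by a sort. This is a hedged illustration: the dictionary stands in for the "database", and `cost(start, end)` is a hypothetical function, since the patent does not specify either.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # (start node index, end node index), start < end


def pairwise_costs(n: int, cost: Callable[[int, int], float]) -> Dict[Span, float]:
    """Cost of every candidate span between two nodes.

    Pass 0 pairs node 0 with the remaining n-1 nodes, pass 1 pairs
    node 1 with the remaining n-2 nodes, and so on to the last node.
    """
    table: Dict[Span, float] = {}   # stands in for the patent's "database"
    for start in range(n - 1):
        for end in range(start + 1, n):
            table[(start, end)] = cost(start, end)
    return table


def spans_sorted_by_cost(table: Dict[Span, float]) -> List[Tuple[Span, float]]:
    """Candidate spans ordered from cheapest to most expensive."""
    return sorted(table.items(), key=lambda item: item[1])


table = pairwise_costs(4, lambda s, e: (e - s) ** 2 + s)
ranked = spans_sorted_by_cost(table)
print(ranked)
```

For N nodes the two loops evaluate the cost function N(N-1)/2 times, six times in this four-node example.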

[0046] The system 100 for partitioning neural network graphs also includes a subgraph generator 600 configured to generate a series of subgraphs based on partitioning paths, wherein the partitioning paths are generated by a partitioning generation unit 500 based on a graph cost function partitioning algorithm.

[0047] The system 100 for partitioning a neural network graph also includes a subgraph optimizer 700 configured to optimize a subgraph by searching for hyperparameter-based node constraints. A split is considered valid when it does not violate any existing subgraph checker rule. If the lowest-cost path does not break any subgraph checker rules and is therefore a valid split, a series of checks based on that split is performed until the end of the graph.
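
A simple way to combine cost-sorted partitioning paths with a validity check is to walk the candidates in ascending cost order and keep the first one whose segments all pass. The sketch below assumes that interpretation; `first_valid_path`, the candidate list, and the toy validity rule are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Path = List[Tuple[int, int]]  # a partitioning path as a list of segments [start, end)


def first_valid_path(
    paths_by_cost: Iterable[Tuple[float, Path]],
    is_valid_segment: Callable[[int, int], bool],
) -> Optional[Tuple[float, Path]]:
    """Return the cheapest path whose segments all pass the checker.

    `paths_by_cost` is assumed to be sorted in ascending order of cost.
    """
    for cost, path in paths_by_cost:
        if all(is_valid_segment(start, end) for start, end in path):
            return cost, path
    return None   # no candidate satisfies the checker


candidates = [
    (38.0, [(0, 3), (3, 6)]),
    (40.0, [(0, 4), (4, 6)]),
    (42.0, [(0, 2), (2, 4), (4, 6)]),
]
# Toy rule standing in for the subgraph checker: at most two nodes per segment.
chosen = first_valid_path(candidates, lambda start, end: end - start <= 2)
print(chosen)
```

Here the two cheapest candidates are rejected because they contain segments longer than two nodes, so the third candidate is selected.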

[0048] In one embodiment of the invention, the generated subgraph may also be referred to as a segment. The segment checker may be rule-based. Rules are registered and grouped by category. The segment checker checks for overflows in byte tensor move masks and weights, and for overflows in byte tensor memory access inputs and outputs. The segment checker also checks for overflows in the overlap buffer size, and for the data buffers of node input values and node output values.

[0049] The segment checker performs checks to ensure that the number of input and output tensors is less than a predetermined limit. The segment checker also checks memory access rules. Segment or subgraph outputs are not used by segment nodes. Additionally, the segment checker checks the shape of the input tensors. The segment checker also checks the stripe size of the segment's output tensor, returning an error if it fails.
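
A rule-based checker with rules registered and grouped by category could be organized as in the sketch below. The `Segment` fields, the rule names, and all numeric limits are made-up placeholders, not values from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    num_inputs: int
    num_outputs: int
    weight_bytes: int
    overlap_bytes: int


Rule = Callable[[Segment], bool]   # returns True when the rule passes
_RULES: Dict[str, List[Tuple[str, Rule]]] = defaultdict(list)


def rule(category: str, name: str) -> Callable[[Rule], Rule]:
    """Decorator that registers a rule under a category."""
    def register(fn: Rule) -> Rule:
        _RULES[category].append((name, fn))
        return fn
    return register


# All limits below are placeholders chosen for illustration only.
@rule("io", "max_io_tensors")
def _io_count(seg: Segment) -> bool:
    return seg.num_inputs <= 4 and seg.num_outputs <= 4


@rule("buffer", "weight_buffer")
def _weights(seg: Segment) -> bool:
    return seg.weight_bytes <= 256 * 1024


@rule("buffer", "overlap_buffer")
def _overlap(seg: Segment) -> bool:
    return seg.overlap_bytes <= 32 * 1024


def check_segment(seg: Segment) -> List[str]:
    """Return the violated rules as 'category/name'; an empty list means valid."""
    return [f"{category}/{name}"
            for category, rules in _RULES.items()
            for name, fn in rules
            if not fn(seg)]


ok = check_segment(Segment(2, 1, 1000, 100))
bad = check_segment(Segment(5, 1, 300 * 1024, 100))
print(ok, bad)
```

Returning the names of the failed rules, rather than a single boolean, makes it easier to report why a candidate split was rejected.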

[0050] The subgraph optimizer 700 also assigns at least one workflow to multiple subgraphs when selecting scheduling parameters. For example, scheduling parameters include the stripe size assigned to a subgraph. The stripe size workflow can include setting the stripe size of the subgraph, expressed as the stripe size of the direct memory access input tensor.

[0051] In a directed acyclic graph (DAG), a subgraph has direct memory access (DMA) input tensors and DMA output tensors defined at the beginning. Workflows can determine hardware attributes based on input tensor stripe size, output tensor stripe size, input node stripe size, output node stripe size, byte tensor shift mask and weight usage, overlap buffer size, data buffers for node input values, and data buffers for node output values. Workflows can allocate segment input tensor stripes based on the segment's first input tensor. Normal stripe sizes can be set for both input and output node stripe sizes.
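
As an illustration of a stripe-size workflow, the sketch below picks the largest stripe height whose input and output buffers fit in on-chip memory. The buffer model (input rows plus a halo of overlap rows, plus output rows) and every parameter name are assumptions made for this example.

```python
import math


def pick_stripe_rows(
    height: int,
    in_row_bytes: int,
    out_row_bytes: int,
    overlap_rows: int,
    buffer_bytes: int,
) -> int:
    """Largest stripe height in rows that fits on chip, or 0 if none fits.

    Assumption: a stripe needs `rows + overlap_rows` input rows (halo for
    the sliding window) and `rows` output rows in the same buffer.
    """
    for rows in range(height, 0, -1):
        needed = (rows + overlap_rows) * in_row_bytes + rows * out_row_bytes
        if needed <= buffer_bytes:
            return rows
    return 0


# Hypothetical layer: 224 rows, 672-byte input rows, 3584-byte output rows.
rows = pick_stripe_rows(224, in_row_bytes=672, out_row_bytes=3584,
                        overlap_rows=2, buffer_bytes=64 * 1024)
num_stripes = math.ceil(224 / rows)
print(rows, num_stripes)
```

A return value of 0 signals that even a single-row stripe overflows the buffer, in which case the segment would have to be rejected or split further.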

[0052] Figure 2 illustrates a partitioning generation unit according to the present invention. The partitioning generation unit 500 is configured to partition a neural network graph based on a defined cost function and hardware memory limitations. The partitioning generation unit 500 generates partition boundaries within the neural network graph based on the cost function between two nodes in the neural network graph.

[0053] The partitioning generation unit 500 computes the cost between every pair of nodes generated by the node generation unit 300, based on a graph cost function partitioning algorithm. First, it loops from the first node (node 0) over all the remaining N-1 nodes. The current cost between each of these nodes and node 0 is stored in a database.

[0054] The partitioning generation unit 500 again iterates from the second node using the graph cost function partitioning algorithm, and iterates through all the remaining N-2 nodes. Furthermore, it identifies the current cost for each node. For each node, the partitioning generation unit 500 compares the current cost with the stored costs and obtains the partitioning path with the lowest cost obtained from the sorting.

[0055] The partitioning generation unit 500 iterates through the graph cost function partitioning algorithm to the last node of the graph. Furthermore, the partitioning generation unit 500 obtains a list of partitioning paths sorted by cost. Based on these cost-sorted partitioning paths, the algorithm partitions the graph into several subgraphs that can adapt to hardware limitations.

[0056] The cost function is defined as the estimated computation time and memory bandwidth for partitioning a subgraph. Using this cost function, the minimum cost between the starting node and any ending node in the subgraph can be easily calculated. By using an efficient sorting algorithm, the minimum cost of any node can be obtained. The graph cost function partitioning algorithm calculates the minimum cost path from the starting node to any ending node and excludes longer paths during updates. Further, the partitioning generation unit 500 assigns the minimum cost path between two nodes among the multiple nodes. The partitioning generation unit 500 generates multiple partitioning paths sorted according to the cost function between the multiple nodes.

[0057] Figure 3 illustrates a method for partitioning a neural network graph according to the present invention. Method 300 includes: step 302, listing a whitelist of the neural network graph and a blacklist of partition boundaries. Method 300 further includes: step 304, hard-splitting the graph boundaries to generate multiple nodes in the blacklist and multiple nodes in the whitelist. The multiple nodes in the blacklist represent partition boundaries.

[0058] The method 300 further includes: step 306, grouping multiple nodes in the whitelist. Specifically, optimizing the grouping of multiple nodes in the whitelist and blacklist to obtain known good partitioning points. The method 300 further includes: step 308, partitioning multiple nodes in the neural network graph based on multiple cost functions. That is, partitioning is performed by generating partition boundaries within the neural network graph based on the cost function between two nodes.

[0059] The method 300 further includes: step 310, generating multiple partition paths among multiple nodes, ordered based on multiple cost functions. Specifically, multiple partition paths are generated according to the cost functions among multiple nodes, and multiple subgraphs are generated based on the partition paths ordered by the cost functions among multiple nodes. The method 300 also includes: step 312, assigning at least one workflow to its respective subgraph when selecting scheduling parameters.

[0060] For example, scheduling parameters assigned to a subgraph include the stripe size. The stripe size workflow involves setting the stripe size of the subgraph, represented as the direct memory access input tensor stripe size. In a directed acyclic graph, the subgraph has direct memory access input tensors and direct memory access output tensors defined at the beginning.

[0061] Workflows can determine hardware attributes based on input tensor stripe size, output tensor stripe size, input node stripe size, output node stripe size, byte tensor shift mask and weight usage, overlap buffer size, and data buffers for node input and output values. Workflows can allocate segment input tensor stripes based on the segment's first input tensor. Normal stripe sizes can be set for both input and output node stripe sizes.

[0062] It should be understood that the application embodiments disclosed herein are illustrative of the principles of the embodiments of this application. Other modifications that may be adopted are also within the scope of this application. Therefore, alternative configurations of the embodiments of this application may be used as examples and not limitations, in accordance with the guidance herein. Therefore, the embodiments of this application are not limited to the embodiments shown in the figures and described herein.

[0063] Various embodiments of the invention have been described above in a detailed description. While these descriptions directly depict the above embodiments, it should be understood that modifications and / or variations to the specific embodiments shown and described herein can be conceived by those skilled in the art. Any such modifications or variations falling within the scope of this specification should also be included therein. Unless specifically stated otherwise, the words and phrases used by the inventors in the specification and claims have the common and customary meanings of those skilled in the art.

[0064] Descriptions of various embodiments of the invention known to the applicant at the time of filing the application have been presented for illustration and description. This specification is not intended to be exhaustive, nor to limit the invention to the specific forms disclosed. The described embodiments are used to explain the principles of the invention and its practical application, and to enable those skilled in the art to utilize the invention in various embodiments. Therefore, the invention is not limited to the specific embodiments disclosed for carrying out the invention.

[0065] While specific embodiments of the invention have been shown and described, it will be apparent to those skilled in the art that changes and modifications may be made based on the teachings herein without departing from the invention and its broader aspects, and therefore the appended claims are intended to include all such changes and modifications within the true spirit and scope of the invention.

Claims

1. A system for dividing a neural network graph into multiple subgraphs using a cost function-based parameter search, characterized in that the system is used to run neural networks on resource-constrained hardware systems, and the system includes: a list creation unit used to create a whitelist and a blacklist, wherein the list creation unit includes multiple boundaries of the whitelist and multiple boundaries of the blacklist with respect to the neural network graph; a node generation unit used to generate multiple nodes in the blacklist and multiple nodes in the whitelist using hard partitioning, wherein the multiple nodes in the blacklist represent the partition boundary; an optimization unit used to provide grouping results corresponding to the multiple nodes in the whitelist; a partitioning generation unit used to partition among multiple nodes in the neural network graph based on a cost function, and to generate multiple partitioning paths sorted based on the cost function value, wherein the cost function is the estimated computation time and memory bandwidth of the partitioned subgraph; a subgraph generator used to generate multiple subgraphs based on the multiple partitioning paths; and a subgraph optimizer for optimizing the plurality of subgraphs by searching hyperparameter computation node constraints, wherein the subgraph optimizer assigns at least one workflow to the plurality of subgraphs to divide the neural network graph into the plurality of subgraphs; wherein partitioning the neural network graph based on a cost function and generating multiple partition paths sorted by cost function values includes: based on the cost function, starting from the first node, looping through all the remaining N-1 nodes and storing the current cost function value between each node and node 0 in a database; then starting from the second node, looping through all the remaining N-2 nodes, identifying the current cost function value corresponding to each node, comparing the current cost function value with the stored cost function value, and obtaining the partition path with the lowest sorted cost function value; and looping to the last node of the graph to obtain the partition path sorted by cost function value.

2. The system according to claim 1, characterized in that the at least one workflow determines hardware attributes based on at least one of the following: input tensor stripe size, output tensor stripe size, input node stripe size, output node stripe size, byte tensor shift mask and weight utilization, overlap buffer size, data buffer for node input values, and data buffer for node output values.

3. The system according to claim 2, characterized in that the system further includes a subgraph inspection unit configured to inspect the byte tensor shift mask, the overflow of the weights, the byte tensor direct memory access input and output overflow, the overflow of the overlap buffer size, the data buffer of the node input values, and the memory access rules of the multiple subgraphs.

4. The system according to claim 2, characterized in that the system further includes a subgraph inspection unit configured to inspect the size of the output tensor stripes and the shape of the input tensors of the plurality of subgraphs.

5. The system according to claim 3 or 4, characterized in that the subgraph inspection unit operates according to multiple preset rules, and the preset rules are categorized.

6. The system according to claim 1, characterized in that the system supports accelerators and chips, including digital signal processor chips, neural network chips, and artificial intelligence chips.

7. A method for dividing a neural network graph into multiple subgraphs using parameter search based on a cost function, characterized in that the method is used to run a neural network on a resource-constrained hardware system, and the method includes: listing a whitelist and a blacklist with dividing boundaries for the neural network graph; performing hard segmentation of the graph boundaries to generate multiple nodes in the blacklist and multiple nodes in the whitelist; grouping the multiple nodes in the whitelist; partitioning among multiple nodes in the neural network graph based on the cost function; generating multiple partitioning paths among the nodes, sorted by cost function values; generating multiple subgraphs based on the multiple partitioning paths; and, when selecting scheduling parameters, assigning at least one workflow to the plurality of subgraphs in order to divide the neural network graph into the plurality of subgraphs; wherein partitioning among the multiple nodes based on the cost function and generating the multiple partitioning paths sorted by cost function values includes: based on the cost function, starting from the first node, looping through all the remaining N-1 nodes and saving the current cost function value between each node and node 0 in a database; then, starting from the second node, looping through all the remaining N-2 nodes, identifying the current cost function value corresponding to each node, and comparing the current cost function value with the saved cost function value to obtain the partition path with the lowest sorted cost function value; and finally, looping to the last node of the graph to obtain the partition path sorted by cost function value.

8. A computer-usable storage medium comprising computer program logic for enabling at least one processor in a computer system to partition a neural network graph into multiple subgraphs, the computer program logic comprising: listing a whitelist and a blacklist with dividing boundaries for the neural network graph; performing hard segmentation of the graph boundaries to generate multiple nodes in the blacklist and multiple nodes in the whitelist; grouping the multiple nodes in the whitelist; partitioning among multiple nodes in the neural network graph based on the cost function; generating multiple partitioning paths among the plurality of nodes, sorted by cost function values; generating multiple subgraphs based on the multiple partitioning paths; and, when selecting scheduling parameters, assigning at least one workflow to the plurality of subgraphs in order to divide the neural network graph into the plurality of subgraphs; wherein partitioning among the multiple nodes based on the cost function and generating the multiple partitioning paths sorted by cost function values includes: based on the cost function, starting from the first node, looping through all the remaining N-1 nodes and saving the current cost function value between each node and node 0 in a database; then, starting from the second node, looping through all the remaining N-2 nodes, identifying the current cost function value corresponding to each node, comparing the current cost function value with the saved cost function value, and obtaining the partition path with the lowest sorted cost function value; and looping to the last node of the graph to obtain the partition path sorted by cost function value.
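
For illustration only, and not as part of the claims, the node loop recited in claims 1, 7 and 8 can be read as a dynamic program over the nodes in execution order. The sketch below follows that reading under several assumptions that are not in the specification: the nodes form a linear chain indexed 0 to N-1, a partition path is a list of contiguous segments, the "database" is an in-memory list, and only the single lowest-cost path is kept rather than a sorted set of paths. The names `partition_chain` and `make_toy_cost`, and the toy cost model itself, are hypothetical stand-ins for the hardware cycle estimation model.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # inclusive (first_node, last_node)


def partition_chain(n: int,
                    cost: Callable[[int, int], float]) -> Tuple[float, List[Segment]]:
    """Split nodes 0..n-1 into contiguous segments with minimal total cost."""
    if n <= 0:
        return 0.0, []

    # First pass: starting at node 0, store the cost of the single
    # segment [0, j] for every node j (the stored cost values).
    best = [cost(0, j) for j in range(n)]
    last_start = [0] * n  # start of the final segment of the best path to j

    # Later passes: start a new segment at node i, extend it to every
    # remaining node j, and compare against the stored value for j.
    for i in range(1, n):
        for j in range(i, n):
            candidate = best[i - 1] + cost(i, j)
            if candidate < best[j]:
                best[j] = candidate
                last_start[j] = i

    # Walk back from the last node to recover the segments.
    segments: List[Segment] = []
    j = n - 1
    while j >= 0:
        i = last_start[j]
        segments.append((i, j))
        j = i - 1
    segments.reverse()
    return best[n - 1], segments


def make_toy_cost(cycles: Sequence[int], buf_bytes: Sequence[int],
                  capacity: int, overhead: int) -> Callable[[int, int], float]:
    """Toy cost (assumption): fixed per-subgraph overhead plus compute
    cycles; infinite if the segment's buffers exceed on-chip capacity."""
    def cost(i: int, j: int) -> float:
        if sum(buf_bytes[i:j + 1]) > capacity:
            return float("inf")
        return overhead + sum(cycles[i:j + 1])
    return cost


toy = make_toy_cost(cycles=[4, 2, 6, 3], buf_bytes=[3, 3, 3, 3],
                    capacity=6, overhead=10)
total, path = partition_chain(4, toy)
# total == 35 and path == [(0, 1), (2, 3)]
```

The two nested loops evaluate the cost function O(N^2) times. In this sketch a blacklist boundary could be modeled by having the cost function return infinity for any segment that crosses it.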