Joint optimization method and device for graph replacement and parallelization of deep neural network inference

By applying graph substitution and parallelization to the computation graph of a deep neural network, a target joint optimization graph is determined. This addresses the long inference time and low hardware utilization of existing technologies and achieves better hardware utilization and faster inference.

CN118228762B · Active · Publication Date: 2026-06-12 · TSINGHUA UNIVERSITY +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: TSINGHUA UNIVERSITY
Filing Date: 2024-03-22
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Even after optimization, the inference time of existing deep neural networks is still relatively long, and existing methods fail to fully utilize hardware performance, resulting in low hardware utilization.

Method used

Multiple computational subgraphs are determined from the initial computational graph. Graph substitution and parallelization are then performed on each subgraph to determine its target joint optimization graph, and the optimization result of the initial graph is determined from these target graphs. The optimization result indicates that the inference time meets the preset requirements.
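The per-subgraph step can be illustrated with a minimal Python sketch. Everything in it is hypothetical, since the summary does not specify the substitution rules, the parallelization search, or the cost model. The sketch assumes a subgraph is a list of operator names, a single substitution rule that fuses `conv` followed by `relu`, a toy cost table (`OP_COST_MS`), and parallelization modeled as an even split across devices with a fixed per-operator synchronization cost (`SYNC_COST_MS`).

```python
from itertools import product

# Hypothetical per-operator costs in milliseconds (illustrative numbers only).
OP_COST_MS = {"conv": 4.0, "relu": 1.0, "conv_relu": 4.5, "matmul": 3.0, "add": 0.5}
SYNC_COST_MS = 0.25  # assumed per-operator sync overhead when using more than one device


def substitute(ops):
    """Return candidate graphs: the original, plus one with conv+relu fused (assumed rule)."""
    fused, i = [], 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i] == "conv" and ops[i + 1] == "relu":
            fused.append("conv_relu")
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return [list(ops)] if fused == list(ops) else [list(ops), fused]


def estimate_ms(ops, devices):
    """Toy cost model: compute divides evenly across devices; each op pays a sync cost."""
    compute = sum(OP_COST_MS[op] for op in ops) / devices
    sync = SYNC_COST_MS * len(ops) if devices > 1 else 0.0
    return compute + sync


def optimize_subgraph(ops, device_options=(1, 2, 4), budget_ms=None):
    """Pick the (substituted graph, device count) pair with the lowest estimated time."""
    candidates = (
        (graph, devices, estimate_ms(graph, devices))
        for graph, devices in product(substitute(ops), device_options)
    )
    graph, devices, ms = min(candidates, key=lambda c: c[2])
    # The preset requirement is modeled here as a simple time budget (an assumption).
    meets_budget = budget_ms is None or ms <= budget_ms
    return {"graph": graph, "devices": devices, "ms": ms, "meets_budget": meets_budget}
```

In this toy model the fused graph wins partly because it has fewer operators to synchronize, which is one way substitution and parallelization can interact. Calling `optimize_subgraph(["conv", "relu", "matmul"], budget_ms=3.0)` selects the fused graph on four devices under the assumed costs.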

Benefits of technology

By making fuller use of hardware performance, the method improves hardware utilization and significantly reduces deep neural network inference time.

Figure CN118228762B_ABST

Abstract

The application relates to a joint optimization method and device for graph replacement and parallelization in deep neural network inference. The method comprises: determining a plurality of computation subgraphs based on an initial computation graph; performing graph replacement processing and parallelization processing on each computation subgraph to determine a target joint optimization graph corresponding to each computation subgraph; and determining an optimization result of the initial computation graph according to the target joint optimization graphs. The application applies graph replacement first and then performs automatic parallelization, which effectively eliminates redundant flows in the parallelization scheme, thereby maximizing hardware performance and reducing DNN inference time as far as possible.
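The three claimed steps (partition, per-subgraph joint optimization, assembly) can be sketched end to end in Python. This is only an illustration under stated assumptions, not the patented procedure: the abstract does not say how subgraphs are chosen or how results are combined. Here `partition` cuts a topologically ordered operator list into contiguous chunks, `replace_then_parallelize` is a placeholder optimizer that applies a replacement before a toy two-device estimate, and `optimize` assumes subgraphs run one after another so their times add.

```python
from typing import Callable, List, Tuple

Op = str
Subgraph = List[Op]


def partition(ops: List[Op], max_size: int) -> List[Subgraph]:
    """Split a topologically ordered operator list into contiguous subgraphs (assumed strategy)."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    return [ops[i:i + max_size] for i in range(0, len(ops), max_size)]


def replace_then_parallelize(sub: Subgraph) -> Tuple[Subgraph, float]:
    """Placeholder joint optimizer: replacement first, then a toy parallel cost estimate."""
    # Step 1 (graph replacement): drop "identity" operators, an assumed no-op rewrite.
    replaced = [op for op in sub if op != "identity"]
    # Step 2 (parallelization): assume 2 devices and 1.0 time unit per remaining operator.
    return replaced, len(replaced) / 2


def optimize(
    ops: List[Op],
    max_size: int,
    per_subgraph: Callable[[Subgraph], Tuple[Subgraph, float]] = replace_then_parallelize,
) -> Tuple[List[Op], float]:
    """Optimize each subgraph independently, then assemble the overall result."""
    results = [per_subgraph(sub) for sub in partition(ops, max_size)]
    graph = [op for sub, _ in results for op in sub]
    total_time = sum(t for _, t in results)  # assumes subgraphs execute sequentially
    return graph, total_time
```

Because the replacement runs before the parallel estimate, operators removed in step 1 never enter the parallelization scheme at all, which mirrors the ordering the abstract describes.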