A model training method and apparatus

By constructing a directed acyclic graph to optimize where sub-models and sub-data are deployed across clusters, the method reduces the time spent on data interaction in distributed training, thereby improving the efficiency and speed of model training.

CN122310101A · Pending · Publication Date: 2026-06-30 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2024-12-31
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In distributed model training, the amount of data interaction between different domains affects the model training speed. How to optimize the allocation of sub-models and sub-data to reduce the data interaction time has become an urgent problem to be solved.

Method used

A control node constructs a directed acyclic graph and, based on data transmission bandwidth and the dependencies between sub-models, determines the deployment locations of sub-models and sub-data across multiple clusters, optimizing data transmission paths to reduce the amount of data exchanged.
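The DAG step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a toy job in which two sub-data shards feed an embedding sub-model that branches into two blocks joined at a head. The vertex names, the per-edge data volumes (GB per iteration), and the helpers `build_dag` and `cross_cluster_volume` are hypothetical and not taken from the patent. The sketch uses the standard-library `graphlib` module (Python 3.9+) to confirm the graph is acyclic and obtain a dependency-respecting order, then scores two candidate placements by the volume of data that would cross cluster boundaries.

```python
from graphlib import TopologicalSorter

# Hypothetical training job: sub-data shards feed sub-models; each edge
# carries the volume (GB) passed from producer to consumer per iteration.
EDGES = {
    ("data_0", "embed"): 4.0,
    ("data_1", "embed"): 4.0,
    ("embed", "block_a"): 2.0,
    ("embed", "block_b"): 2.0,
    ("block_a", "head"): 1.0,
    ("block_b", "head"): 1.0,
}


def build_dag(edges):
    """Map each vertex to the list of vertices it depends on (predecessors)."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, [])
        graph.setdefault(dst, []).append(src)
    return graph


def topological_order(graph):
    """Return a dependency-respecting order; raises graphlib.CycleError
    if the graph is not acyclic."""
    return list(TopologicalSorter(graph).static_order())


def cross_cluster_volume(edges, placement):
    """Sum the data volume on edges whose endpoints are in different clusters."""
    return sum(vol for (src, dst), vol in edges.items()
               if placement[src] != placement[dst])


graph = build_dag(EDGES)
order = topological_order(graph)

# Two hypothetical placements of vertices onto clusters "c0" and "c1".
split_by_branch = {"data_0": "c0", "data_1": "c0", "embed": "c0",
                   "block_a": "c0", "block_b": "c1", "head": "c1"}
split_at_input = {"data_0": "c0", "data_1": "c1", "embed": "c1",
                  "block_a": "c1", "block_b": "c1", "head": "c1"}

print(order)
print(cross_cluster_volume(EDGES, split_by_branch))  # → 3.0
print(cross_cluster_volume(EDGES, split_at_input))   # → 4.0
```

Under these made-up numbers, splitting at the branch point moves less data between clusters than splitting at the input shards, which is the kind of comparison that knowing the sub-model dependencies makes possible.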

Benefits of technology

It improves the efficiency of model training, shortens data transmission time and computation latency, and optimizes the model training process.

✦ Generated by Eureka AI based on patent content.

Figure CN122310101A_ABST

Abstract

A model training method and apparatus are disclosed, relating to the field of artificial intelligence technology. A control node constructs a directed acyclic graph (DAG) from the sub-models and sub-data used during training and the dependencies between the sub-models. Based on the DAG, the control node determines the deployment locations of the sub-data and sub-models across multiple clusters. Because the DAG captures the dependencies between sub-models, deploying according to it reduces the amount of data exchanged between computing nodes or clusters. Furthermore, the control node takes into account the data transmission bandwidth between clusters and between computing nodes within a cluster, so that computing nodes or clusters with heavy data interaction are allocated larger data transmission bandwidth. This shortens the time required for data transmission and improves model training efficiency.
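The bandwidth-aware part of the abstract can be sketched as a greedy heuristic. This is a swapped-in technique chosen for illustration, not the placement algorithm claimed in the patent, which the abstract does not specify. The sketch assumes two clusters of two computing nodes each, invented bandwidth figures (50 GB/s between nodes of one cluster, 5 GB/s between clusters, no transfer cost within a node), a hypothetical per-node limit on the number of DAG vertices, and hypothetical helper names (`greedy_place`, `transfer_time`, `total_transfer_time`). Each vertex is placed, in topological order, on the node that adds the least transfer time (volume divided by bandwidth) from its already-placed predecessors.

```python
from graphlib import TopologicalSorter

# Hypothetical topology: (cluster, computing node) pairs.
NODES = [("c0", "n0"), ("c0", "n1"), ("c1", "n0"), ("c1", "n1")]
INTRA_CLUSTER_BW = 50.0  # GB/s between nodes of one cluster (assumed)
INTER_CLUSTER_BW = 5.0   # GB/s between clusters (assumed)

# Hypothetical DAG edges: (producer, consumer) -> GB transferred per iteration.
EDGES = {
    ("data_0", "embed"): 4.0,
    ("data_1", "embed"): 4.0,
    ("embed", "block_a"): 2.0,
    ("embed", "block_b"): 2.0,
    ("block_a", "head"): 1.0,
    ("block_b", "head"): 1.0,
}


def bandwidth(a, b):
    """Link bandwidth between two (cluster, node) locations."""
    if a == b:
        return float("inf")  # same node: no network transfer
    if a[0] == b[0]:
        return INTRA_CLUSTER_BW
    return INTER_CLUSTER_BW


def transfer_time(volume_gb, a, b):
    """Seconds to move volume_gb from a to b; 0.0 when a == b."""
    return volume_gb / bandwidth(a, b)


def greedy_place(edges, capacity):
    """Place vertices in topological order on the node that adds the least
    transfer time from already-placed predecessors, with at most
    `capacity` vertices per node."""
    preds = {}
    for (src, dst), vol in edges.items():
        preds.setdefault(src, {})
        preds.setdefault(dst, {})[src] = vol
    graph = {v: list(p) for v, p in preds.items()}
    placement, load = {}, {n: 0 for n in NODES}
    for v in TopologicalSorter(graph).static_order():
        # Only nodes with spare capacity are eligible.
        candidates = [n for n in NODES if load[n] < capacity]
        best = min(candidates, key=lambda n: sum(
            transfer_time(vol, placement[p], n) for p, vol in preds[v].items()))
        placement[v] = best
        load[best] += 1
    return placement


def total_transfer_time(edges, placement):
    """Sum of per-edge transfer times for a complete placement."""
    return sum(transfer_time(vol, placement[s], placement[d])
               for (s, d), vol in edges.items())


placement = greedy_place(EDGES, capacity=2)
for vertex, location in placement.items():
    print(vertex, location)
print(round(total_transfer_time(EDGES, placement), 3))  # → 0.76
```

Because predecessors are always placed first, each greedy choice sees the actual locations of its inputs, and the capacity limit forces heavily interacting vertices onto the fastest links still available. `min` raises `ValueError` if the total capacity is smaller than the number of vertices.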