Expert-parallel processing method and system, electronic device, and storage medium

Expert replication distributes the experts of a large language model across multiple computing cards, which addresses unbalanced expert load and improves both the processing efficiency of the computing cards and overall inference performance.

WO2026066864A9 · PCT designated stage · Publication Date: 2026-06-25 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2025-08-22
Publication Date: 2026-06-25

Smart Images

  • Figure CN2025116473_25062026_PF_FP_ABST

Abstract

The present application discloses an expert-parallel processing method and system, an electronic device, and a storage medium, which address load imbalance across the computing cards that execute experts and thereby improve the processing efficiency of those cards. In the expert-parallel processing system, W experts are deployed from a host node to N computing cards by means of expert replication, wherein each of the N computing cards comprises at least two experts having different expert weights. In an embodiment of the present application, on the basis of the text processing units respectively activated by the W experts, M text processing units are assigned to the at least two experts on each of the N computing cards, so as to obtain the text processing units respectively activated by those experts on each card. For any given expert weight, the total number of text processing units activated across the different computing cards by the experts having that weight equals the number of text processing units that routing partitioning assigned to that expert weight among the W experts.
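
The assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed method: the abstract does not say how text processing units are divided among replicas, so the greedy least-loaded rule below is an assumption, as are the function name `assign_tokens`, the data layout, and the reading of a "text processing unit" as a token ID. The sketch only demonstrates the stated invariant: for each expert weight, the units handled by its replicas across all cards add up to exactly the units routed to that expert.

```python
from collections import defaultdict


def assign_tokens(routed, placement):
    """Split routed tokens across replicated experts (hypothetical helper).

    routed:    dict expert_id -> list of token ids chosen by routing
    placement: one tuple of expert ids per computing card
    Returns (assignment, load): assignment[card][expert_id] is the list of
    token ids handled by that replica; load[card] is the card's token count.
    """
    hosts = defaultdict(list)  # expert_id -> cards holding a replica
    for card, experts in enumerate(placement):
        # The abstract requires at least two experts with different
        # weights on every card.
        if len(set(experts)) < 2:
            raise ValueError(f"card {card} needs at least two distinct experts")
        for expert in experts:
            hosts[expert].append(card)

    load = [0] * len(placement)
    assignment = [defaultdict(list) for _ in placement]

    # Assumption: place the most heavily routed experts first, and send each
    # token to the currently least-loaded card that holds a replica.
    for expert in sorted(routed, key=lambda e: len(routed[e]), reverse=True):
        if expert not in hosts:
            raise ValueError(f"expert {expert} is not deployed on any card")
        for token in routed[expert]:
            card = min(hosts[expert], key=lambda c: load[c])
            assignment[card][expert].append(token)
            load[card] += 1
    return assignment, load


# Illustrative numbers only: W = 4 experts, N = 4 cards, M = 12 tokens.
# Expert 0 is "hot" and is replicated on every card.
routed = {0: list(range(8)), 1: [8, 9], 2: [10], 3: [11]}
placement = [(0, 1), (0, 2), (0, 3), (0, 1)]
assignment, load = assign_tokens(routed, placement)
print(load)
```

Without replication, the card holding expert 0 in this example would process 8 of the 12 tokens. With a replica on every card, the hot expert's tokens are spread out while each expert's total stays unchanged.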