Expert-parallel processing method and system, electronic device, and storage medium
Expert replication in the large language model distributes experts across multiple computing cards, which resolves unbalanced expert load and improves both the processing efficiency of the computing cards and overall inference performance.
WO2026066864A9 · PCT designated stage · Publication Date: 2026-06-25 · HUAWEI TECH CO LTD
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2025-08-22
- Publication Date
- 2026-06-25
Abstract
The present application discloses an expert-parallel processing method and system, an electronic device, and a storage medium that address load imbalance across the computing cards that execute experts, thereby improving the processing efficiency of those cards. In the expert-parallel processing system, W experts are deployed from a host node to N computing cards by means of expert replication, and each of the N computing cards holds at least two experts with different expert weights. In an embodiment of the present application, based on the text processing units activated by each of the W experts, M text processing units are assigned to the at least two experts on each of the N computing cards, yielding the text processing units activated by each of those experts. For any given expert weight, the total number of text processing units activated across the different computing cards by the experts sharing that weight equals the number of text processing units that routing partitioning assigned to that expert weight among the W experts.
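The conservation property stated in the abstract can be illustrated with a minimal Python sketch. It is not the claimed method: the function name `split_tokens`, the greedy least-loaded policy, the treatment of text processing units as integer token ids, and the example placement are all assumptions made for illustration. The sketch only shows that tokens routed to one expert weight can be divided among that weight's replicas on different cards while the per-weight total stays equal to the routed count.

```python
from collections import defaultdict

def split_tokens(routed, placement):
    """Hypothetical sketch: divide routed tokens among expert replicas.

    routed:    dict weight_id -> list of token ids chosen by routing
    placement: dict card_id -> list of weight_ids hosted on that card
    Returns (assignment, load), where assignment maps (card_id, weight_id)
    to the token ids that replica processes and load maps card_id to its
    total token count.
    """
    # Index which cards hold a replica of each expert weight.
    replicas = defaultdict(list)
    for card, weights in placement.items():
        for w in weights:
            replicas[w].append(card)

    assignment = {(c, w): [] for c, ws in placement.items() for w in ws}
    load = {c: 0 for c in placement}

    # Assumed policy: place weights with the fewest replicas first, since
    # their tokens have the least freedom, then let replicated weights
    # fill in whichever card is currently least loaded.
    for w in sorted(routed, key=lambda w: len(replicas[w])):
        for token in routed[w]:
            card = min(replicas[w], key=lambda c: (load[c], c))
            assignment[(card, w)].append(token)
            load[card] += 1
    return assignment, load

# Assumed example: N = 2 cards, each holding two experts with different
# weights; expert "A" is replicated on both cards. M = 10 tokens.
placement = {0: ["A", "B"], 1: ["A", "C"]}
routed = {"A": [0, 1, 2, 3, 4, 5], "B": [6, 7], "C": [8, 9]}

assignment, load = split_tokens(routed, placement)

# Tokens handled per expert weight, summed over all cards.
per_weight = defaultdict(int)
for (card, w), tokens in assignment.items():
    per_weight[w] += len(tokens)
```

In this example the hot expert "A" receives six of the ten tokens. Without replication the card hosting "A" would carry most of the load; with a replica on each card, both cards end up processing five tokens, and the count for "A" summed over the two cards is still six.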