Expert-parallel processing method and system, electronic device, and storage medium

By employing expert replication in the large language model, experts are distributed across multiple computing cards, thus resolving the issue of unbalanced expert load and improving the processing efficiency of the computing cards and overall inference performance.

WO2026066864A9 · PCT designated stage · Publication Date: 2026-06-25 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2025-08-22
Publication Date: 2026-06-25

Application Information

Patent Timeline

22 Aug 2025: Application
25 Jun 2026: Publication (WO2026066864A9)

IPC: G06F9/50

CPC: G06F9/50

AI Tagging

Application Domain

Resource allocation

Technology Topics

Processing element · Replication (computing)

Smart Images

Figure CN2025116473_25062026_PF_FP_ABST

Patent Text Reader

Abstract

The present application discloses an expert-parallel processing method and system, an electronic device, and a storage medium, which solve the problem of load imbalance on computing cards that carry expert execution, thereby improving the processing efficiency of the computing cards. In the expert-parallel processing system, W experts are deployed from a host node to N computing cards by means of expert replication, wherein each of the N computing cards comprises at least two experts having different expert weights. In an embodiment of the present application, on the basis of text processing units respectively activated by the W experts, M text processing units are assigned to at least two experts on each of the N computing cards, so as to obtain text processing units respectively activated by the at least two experts on each of the N computing cards. The total number of text processing units activated by the at least two experts and corresponding to a same expert weight across different computing cards is equal to the number of text processing units corresponding to the same expert weight among the W experts obtained by routing partitioning.
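
The conservation property in the last sentence of the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the text processing units are divided among replicas, so the greedy "least-loaded card" rule below is an assumption, as are the names `split_tokens`, `routed`, and `placement` and the reading of a "text processing unit" as a token. The sketch only shows that, whatever the split, the counts for one expert weight summed over all cards equal the count produced by routing.

```python
from collections import defaultdict


def split_tokens(routed, placement):
    """Divide each expert's routed tokens among the cards hosting a replica.

    routed:    {expert_id: token count from routing}   (hypothetical input)
    placement: {card_id: [expert_ids hosted on it]}    (hypothetical input)
    Returns {card_id: {expert_id: tokens assigned to that replica}}.
    """
    hosts = defaultdict(list)  # expert_id -> cards holding a replica of it
    for card, experts in placement.items():
        unique = list(dict.fromkeys(experts))  # drop duplicates, keep order
        if len(unique) < 2:
            raise ValueError(f"card {card} must host at least two distinct experts")
        for expert in unique:
            hosts[expert].append(card)

    load = {card: 0 for card in placement}
    assignment = {card: dict.fromkeys(experts, 0) for card, experts in placement.items()}

    # Assumed heuristic: place experts with the fewest replicas first, since
    # their load cannot be moved, then let replicated experts fill the gaps.
    for expert in sorted(routed, key=lambda e: len(hosts[e])):
        if not hosts[expert]:
            raise ValueError(f"expert {expert} is not deployed on any card")
        for _ in range(routed[expert]):
            card = min(hosts[expert], key=load.get)  # least-loaded replica
            assignment[card][expert] += 1
            load[card] += 1
    return assignment


# W = 4 experts, M = 12 tokens, N = 2 cards; the hot expert E0 is replicated.
routed = {"E0": 8, "E1": 2, "E2": 1, "E3": 1}
placement = {0: ["E0", "E1"], 1: ["E0", "E2", "E3"]}
assignment = split_tokens(routed, placement)

# Per-expert totals across cards match the routing result.
totals = defaultdict(int)
for per_card in assignment.values():
    for expert, count in per_card.items():
        totals[expert] += count
card_loads = {card: sum(per_card.values()) for card, per_card in assignment.items()}
```

Without replication, a card hosting E0 alone would process 8 of the 12 tokens. With E0 replicated on both cards, this sketch gives each card 6 tokens while `totals` still equals `routed`.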