A large model association fault positioning method and system for heterogeneous training scenarios

By combining periodic polling and packet capture with a structural causal model, the method accurately locates faults in heterogeneous clusters, providing low-latency, high-performance fault monitoring and diagnosis and improving the system's usability and reliability.
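
The polling half of this summary can be illustrated with a minimal sketch. It assumes a hypothetical `fetch_tasks` callable that returns the platform's current tasks as a mapping from task ID to metadata; the class name, the callable, and the metadata fields are illustrative and do not come from the patent text.

```python
from typing import Callable, Dict, Set


class TaskPoller:
    """Detects training tasks that appeared since the previous poll."""

    def __init__(self, fetch_tasks: Callable[[], Dict[str, dict]]) -> None:
        # fetch_tasks is a hypothetical stand-in for a cluster management API call.
        self._fetch = fetch_tasks
        self._seen: Set[str] = set()

    def poll_once(self) -> Dict[str, dict]:
        """Return metadata for tasks not seen in any earlier poll."""
        current = self._fetch()
        new_tasks = {
            task_id: meta
            for task_id, meta in current.items()
            if task_id not in self._seen
        }
        self._seen.update(new_tasks)
        return new_tasks


# Two simulated platform snapshots; the second contains one additional task.
snapshots = iter([
    {"job-1": {"accelerator": "gpu"}},
    {"job-1": {"accelerator": "gpu"}, "job-2": {"accelerator": "npu"}},
])
poller = TaskPoller(lambda: next(snapshots))
first = poller.poll_once()   # only job-1 is new
second = poller.poll_once()  # only job-2 is new
```

In a running monitor, `poll_once` would be called on a fixed interval, and each newly reported task would trigger packet capture for that task's connections.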

CN122220133A · Pending · Publication Date: 2026-06-16 · INESA (GRP) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: INESA (GRP) CO LTD
Filing Date: 2026-03-11
Publication Date: 2026-06-16

Smart Images

  • Figure CN122220133A_ABST

Abstract

The application relates to a large model connection fault positioning method and system for heterogeneous training scenarios. The method comprises the following steps: periodically polling a cluster management platform to detect new training tasks and their metadata while, in parallel, capturing the tasks' connection data packets and extracting key connection parameters; associating the two to obtain a raw data stream, then processing that stream in real time and extracting features to generate a structured feature sequence; comparing the structured feature sequence with an expected norm to identify and locate fault nodes; classifying and differentially analyzing the fault nodes according to their propagation characteristics to obtain an analysis strategy decision for each connection fault; and constructing a fault chain from a specific node using a hierarchical aggregation strategy, then applying a structural causal model to the fault chain, according to the analysis strategy decisions, to obtain the root cause of each connection fault. Compared with the prior art, the application makes fault positioning non-invasive, improves its accuracy, and reduces operational overhead.
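
Two of the steps above, comparing features with an expected norm and tracing a fault chain back to its root, can be sketched as follows. This is a simplified illustration, not the patented method: the threshold check stands in for the norm comparison, and a plain walk over a dependency graph stands in for the structural causal model. The node names, latency values, expected value, tolerance, and `parents` graph are all hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, List, Set


def find_fault_nodes(
    features: Dict[str, List[float]], expected: float, tolerance: float
) -> Set[str]:
    """Flag nodes whose mean feature value deviates from the expected norm."""
    return {
        node
        for node, samples in features.items()
        if abs(mean(samples) - expected) > tolerance
    }


def root_causes(
    fault_nodes: Set[str], parents: Dict[str, Iterable[str]]
) -> Set[str]:
    """Keep only faulty nodes that have no faulty upstream parent."""
    return {
        node
        for node in fault_nodes
        if not any(parent in fault_nodes for parent in parents.get(node, ()))
    }


# Hypothetical connection-setup latencies in milliseconds, per node.
features = {
    "switch-1": [9.0, 11.0, 10.0],
    "gpu-node-a": [48.0, 52.0, 50.0],
    "npu-node-b": [1.0, 1.2, 0.8],
}
# Hypothetical dependency graph: each node maps to the nodes it depends on.
parents = {"gpu-node-a": ["switch-1"], "npu-node-b": ["switch-2"]}

faulty = find_fault_nodes(features, expected=1.0, tolerance=0.5)
roots = root_causes(faulty, parents)
```

Here `gpu-node-a` is flagged as faulty but is not reported as a root, because its upstream dependency `switch-1` is also faulty. A full structural causal model would estimate the strength of each causal link rather than rely on graph reachability alone.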