The invention provides a method and system for determining faults of a heterogeneous system based on machine learning. The method comprises the following steps: performing a safety analysis of historical system faults and major events, preliminarily building case library data and a fault tree model, organizing and analyzing index data and labeling data, and training a separate data model for each application scenario; calculating and analyzing the operational health of the system from the collected current index data and the data model, and triggering fault diagnosis and an alarm when abnormal index data is captured; and automatically diagnosing the fault cause from a relation graph established by machine learning together with the collected abnormal stack annotation data, determining a fault repair scheme according to the fault cause, and triggering fault repair. The method and system reduce the dependence of operation and maintenance personnel on specialized business knowledge, use machine learning to discover faults and diagnose their causes quickly and intelligently, complete self-repair automatically, and greatly improve the operational safety and stability of the distributed heterogeneous system.
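The detect, diagnose, and repair flow summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: a per-index z-score baseline stands in for the trained data model, and the relation graph, stack-annotation hints, and repair schemes are small hard-coded dictionaries. All names (`HealthModel`, `RELATION_GRAPH`, `STACK_HINTS`, `REPAIR_SCHEMES`, `diagnose`, `run_pipeline`) and the cause labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean, pstdev

# Hypothetical relation graph: abnormal index -> {candidate fault cause: weight}.
# In the described method this graph is learned; here it is hard-coded.
RELATION_GRAPH = {
    "cpu_usage": {"thread_leak": 0.6, "traffic_spike": 0.4},
    "latency_ms": {"traffic_spike": 0.5, "db_lock": 0.5},
}
# Hypothetical stack-annotation hints: fault cause -> substrings seen in stack data.
STACK_HINTS = {
    "db_lock": ["LockWaitTimeout"],
    "thread_leak": ["unable to create new native thread"],
}
# Hypothetical fault repair schemes keyed by fault cause.
REPAIR_SCHEMES = {
    "thread_leak": "restart_service",
    "traffic_spike": "scale_out",
    "db_lock": "kill_blocking_session",
}


@dataclass
class HealthModel:
    """Per-scenario baseline (assumption): historical mean and std of each index."""
    baseline: dict = field(default_factory=dict)

    def fit(self, history):
        # history: index name -> list of historical samples
        for metric, samples in history.items():
            self.baseline[metric] = (mean(samples), pstdev(samples))
        return self

    def abnormal_metrics(self, current, threshold=3.0):
        # Flag indices deviating more than `threshold` std devs from the mean;
        # a constant historical index is flagged on any deviation.
        flagged = []
        for metric, value in current.items():
            if metric not in self.baseline:
                continue  # no trained baseline for this index
            mu, sigma = self.baseline[metric]
            deviation = abs(value - mu)
            if (sigma == 0 and deviation > 0) or (sigma > 0 and deviation / sigma > threshold):
                flagged.append(metric)
        return sorted(flagged)


def diagnose(abnormal, stack_annotations):
    # Score each candidate cause by relation-graph edges from abnormal indices,
    # plus a fixed bonus when one of its hints appears in the stack data.
    scores = defaultdict(float)
    for metric in abnormal:
        for cause, weight in RELATION_GRAPH.get(metric, {}).items():
            scores[cause] += weight
    for cause, hints in STACK_HINTS.items():
        if any(h in line for h in hints for line in stack_annotations):
            scores[cause] += 1.0
    return max(scores, key=scores.get) if scores else None


def run_pipeline(model, current, stack_annotations):
    # Health check -> alarm -> cause diagnosis -> repair-scheme lookup.
    abnormal = model.abnormal_metrics(current)
    if not abnormal:
        return {"alarm": False, "abnormal": [], "cause": None, "repair": None}
    cause = diagnose(abnormal, stack_annotations)
    # A real system would trigger the returned repair scheme here.
    return {"alarm": True, "abnormal": abnormal, "cause": cause,
            "repair": REPAIR_SCHEMES.get(cause)}
```

For example, after fitting on a short history of `cpu_usage` and `latency_ms`, a latency spike accompanied by a stack line containing `LockWaitTimeout` scores `db_lock` highest (1.5 against 0.5 for `traffic_spike`), so the sketch returns `kill_blocking_session` as the repair scheme. The additive scoring is only a stand-in for whatever inference the learned relation graph actually performs.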