An apparatus and method are provided for efficiently determining the source of problems in a complex system based on observable events. By splitting the
problem identification process into two separate activities of (1) generating efficient codes for
problem identification and (2) decoding the problems at runtime, the efficiency of the
problem identification process is significantly increased. Various embodiments of the invention contemplate creating a
causality matrix which relates observable symptoms to likely problems in the system, reducing the causality matrix into a minimal codebook by eliminating redundant or unnecessary information, monitoring the observable symptoms, and decoding problems by comparing the observable symptoms against the minimal codebook using various best-fit approaches. The minimal
codebook also identifies those observable symptoms whose monitoring yields the greatest benefit compared with the others. By defining a distance measure between symptoms and codes in the codebook, the invention can tolerate a loss of symptoms or spurious symptoms without failure. Changing the radius of the codebook allows the ambiguity of problem identification to be adjusted easily. The invention also allows probabilistic and temporal correlations to be monitored. Due to the degree of data reduction prior to runtime, extremely large and complex systems involving many observable events can be efficiently monitored with much smaller computing resources than would otherwise be possible.
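
The two activities described above, codebook generation and runtime decoding, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the text: the problem and symptom names, the use of Hamming distance as the distance measure, the greedy reduction strategy (which is not guaranteed to find the smallest possible codebook), and the function names. The radius is read here as the minimum distance required between any two problem codes.

```python
from itertools import combinations

# Hypothetical causality matrix: each problem maps to the symptoms it causes.
CAUSALITY = {
    "link_down":   {"s1", "s2", "s4"},
    "router_fail": {"s1", "s3", "s5"},
    "disk_full":   {"s2", "s3", "s6"},
}

def hamming(a, b, symptoms):
    """Count the monitored symptoms on which symptom sets a and b disagree."""
    return sum((s in a) != (s in b) for s in symptoms)

def min_pairwise_distance(matrix, symptoms):
    """Smallest distance between any two problem codes over `symptoms`."""
    return min(hamming(a, b, symptoms)
               for a, b in combinations(matrix.values(), 2))

def reduce_codebook(matrix, radius):
    """Greedily drop symptoms while every pair of codes stays >= radius apart.

    Returns the sorted list of symptoms that still need to be monitored.
    """
    symptoms = sorted(set().union(*matrix.values()))
    if min_pairwise_distance(matrix, symptoms) < radius:
        raise ValueError("causality matrix cannot support this radius")
    for s in list(symptoms):
        trial = [t for t in symptoms if t != s]
        if trial and min_pairwise_distance(matrix, trial) >= radius:
            symptoms = trial  # s is redundant at this radius
    return symptoms

def decode(matrix, symptoms, observed):
    """Best-fit decoding: return the closest problem(s) and their distance."""
    distances = {p: hamming(code, observed, symptoms)
                 for p, code in matrix.items()}
    best = min(distances.values())
    return sorted(p for p, d in distances.items() if d == best), best

# Generation step (done once, before runtime).
monitored = reduce_codebook(CAUSALITY, radius=3)
print(monitored)  # ['s2', 's3', 's4', 's5', 's6']

# Runtime step: symptom s2 was lost, yet link_down is still the best fit.
print(decode(CAUSALITY, monitored, {"s4"}))  # (['link_down'], 1)
```

In this sketch a smaller radius yields a smaller codebook but more ambiguity: with a radius of 2 only three symptoms remain, and a single spurious symptom can leave two problems equally close to the observation.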