The invention belongs to the field of computer software fault tolerance and relates to a system and a method for detecting and removing faults in running software. The
system consists mainly of a monitored program and a monitoring server. The monitored program comprises a function intercepting component and a fault processing component. The monitoring server comprises a rule file, a rule conversion component and a fault reasoning component. The monitoring server reads the rule file, passes it through the rule conversion component to an inference automaton, and then waits for events sent by the monitored program. The function intercepting component and the fault processing component are integrated into the monitored program by source code instrumentation. While the monitored program runs, the function intercepting component sends events to the monitoring server, the fault reasoning component reasons over each event to reach a conclusion, and a handling method is returned; the monitored program then executes the handling method through the fault processing component. The
system and method are suitable for fault tolerance of C/C++ programs whose source code is available, and allow errors to be analyzed and repaired conveniently while the software is running.
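
The intercept, report, reason and handle loop described above can be illustrated with a minimal C++ sketch. It is not the patented implementation: every name in it (Event, Rule, Action, FaultReasoner, monitored_alloc) is hypothetical, the rule format is invented for illustration, and the monitoring side runs in the same process as the instrumented code rather than as a separate server, so that the example stays self-contained.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event reported by an intercepted call.
struct Event {
    std::string function;  // name of the intercepted function
    long result;           // observed outcome (0 means failure in this sketch)
};

// Hypothetical handling methods that the reasoning side can return.
enum class Action { Continue, UseFallback, Abort };

// Hypothetical rule: if `function` reported `bad_result`, apply `action`.
struct Rule {
    std::string function;
    long bad_result;
    Action action;
};

// Stand-in for the fault reasoning component: the first matching rule wins,
// and an event that matches no rule is treated as harmless.
class FaultReasoner {
public:
    explicit FaultReasoner(std::vector<Rule> rules) : rules_(std::move(rules)) {}

    Action infer(const Event& e) const {
        for (const Rule& r : rules_) {
            if (r.function == e.function && r.bad_result == e.result) {
                return r.action;
            }
        }
        return Action::Continue;
    }

private:
    std::vector<Rule> rules_;
};

// Stand-in for the intercepting and fault processing code as it might look
// after being woven into the source around an allocation call.
// `alloc` is the original function being monitored (assumed signature).
void* monitored_alloc(std::size_t n, void* (*alloc)(std::size_t),
                      const FaultReasoner& server) {
    void* p = alloc(n);                         // original call
    Event e{"alloc", p == nullptr ? 0L : 1L};   // intercepted outcome
    switch (server.infer(e)) {                  // report event, get handling method
        case Action::UseFallback: {
            static unsigned char reserve[256];  // small emergency buffer
            return n <= sizeof(reserve) ? reserve : nullptr;
        }
        case Action::Abort:
            std::abort();
        case Action::Continue:
            break;
    }
    return p;                                   // no fault handling needed
}
```

With a rule list containing Rule{"alloc", 0, Action::UseFallback}, a call to monitored_alloc whose underlying allocator returns a null pointer is answered with the reserve buffer when the request fits into it; with an empty rule list the null pointer is passed through unchanged. In a real deployment the call to infer would be replaced by a message to the monitoring server and the rules would be loaded from the rule file.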