A method of correcting an error in an ECC-protected mechanism of a computer system, such as a cache or system bus, by applying data with a number of bits N to an error
correction code (ECC) matrix to yield an error detection syndrome, wherein the ECC matrix has a plurality of rows and columns with a given column corresponding to a respective one of the
data bits, and selected bits are set in the ECC matrix along each column and each row such that encoding for the ECC matrix allows N-bit error correction and (N−1)-bit error detection. In the illustrative embodiment, the ECC matrix has an odd number of bits set in each row thereof. In the case of an ECC-protected mechanism such as a memory device, these properties facilitate the use of an inversion bit for correcting hard faults in the stored data. When an error is detected and corrected, the corrected data is inverted and then rewritten to the cache array. The corresponding inversion bit for this entry is accordingly set to indicate that the data as currently stored is inverted. Thereafter, the data is re-read from the array, and if the error was due to a hard fault (stuck bit), the data will appear correct (after applying the polarity indicated by the inversion bit), since the inversion will have changed the value written to the defective bit to match the stuck value. The inversion bit may be part of the data itself. In this case, one of the columns in the ECC matrix corresponds to the inversion bit, and each bit in that column of the matrix is set. In the case of an ECC-protected mechanism such as a
system bus, once a stuck-bit condition is detected, the sending device can elect to send data such that the polarity of the data for that bit is always flipped to match the logic level of the stuck value on the wire. This approach allows for full single-bit correction and double-bit detection even in the presence of a stuck bit.
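
The inversion-bit recovery described above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the embodiment: the 16-bit word size (10 payload bits, 1 inversion bit, 5 check bits), the particular matrix (every weight-3 column plus an all-ones column for the inversion bit, in the style of an odd-weight-column single-error-correct, double-error-detect code), and the hypothetical names `StuckCell`, `store`, and `load`. With this choice each row has seven bits set across the data columns, so inverting every stored bit yields another valid codeword whose inversion bit is set.

```python
from itertools import combinations

R = 5  # number of check bits (assumed size, for illustration only)
# One matrix column per data bit: all ten weight-3 patterns for the payload,
# then an all-ones column for the inversion bit (the last data bit).
COLS = [sum(1 << i for i in c) for c in combinations(range(R), 3)]
COLS.append((1 << R) - 1)
K = len(COLS)                      # 11 data bits: 10 payload + 1 inversion
DATA_MASK = (1 << K) - 1
PAYLOAD_MASK = (1 << (K - 1)) - 1
WORD_MASK = (1 << (K + R)) - 1

def check_bits(data):
    """XOR together the matrix columns selected by the set data bits."""
    s = 0
    for j in range(K):
        if (data >> j) & 1:
            s ^= COLS[j]
    return s

def encode(data):
    """Codeword layout: data in the low K bits, check bits above them."""
    return data | (check_bits(data) << K)

def decode(word):
    """Return (data, status); corrects one flipped bit, flags two."""
    data = word & DATA_MASK
    syndrome = check_bits(data) ^ (word >> K)
    if syndrome == 0:
        return data, "ok"
    if bin(syndrome).count("1") % 2 == 0:
        return None, "uncorrectable"      # even weight: double-bit error
    if syndrome in COLS:
        return data ^ (1 << COLS.index(syndrome)), "corrected"
    return data, "corrected"              # weight 1: a check bit flipped

class StuckCell:
    """Hypothetical storage entry with one bit stuck at a fixed value."""
    def __init__(self, pos, val):
        self.pos, self.val, self.raw = pos, val, 0
    def write(self, word):
        mask = 1 << self.pos
        self.raw = (word & ~mask) | (self.val << self.pos)
    def read(self):
        return self.raw

def store(cell, payload, invert=False):
    word = encode(payload)                # inversion bit is 0 here
    if invert:
        word ^= WORD_MASK                 # flips payload, inversion and check bits
    cell.write(word)

def load(cell):
    data, status = decode(cell.read())
    if data is None:
        return None, status
    if (data >> (K - 1)) & 1:             # inversion bit set: undo the inversion
        data ^= DATA_MASK
    return data & PAYLOAD_MASK, status

if __name__ == "__main__":
    cell = StuckCell(pos=2, val=0)        # bit 2 is stuck at 0
    payload = 0b1011001110                # bit 2 of the payload is 1
    store(cell, payload)
    print(load(cell))                     # the stuck bit shows up as a corrected error
    store(cell, payload, invert=True)     # rewrite with inverted polarity
    print(load(cell))                     # the stuck bit now matches the stored value
```

In this sketch the first read needs a correction, while the read after the inverted rewrite decodes cleanly, which leaves the code's single-bit correction available for a later, unrelated bit flip in the same entry.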