A matrix-by-vector multiplication processing system (1) comprises a compression engine (2) for receiving and dynamically compressing a stream of elements of a matrix, in which the matrix elements are clustered and are in numerical floating point format, and a memory (SDRAM, 3) for storing the compressed matrix. The system also comprises a decompression engine (4) for dynamically decompressing elements retrieved from the memory (3), and a processor (10) for dynamically receiving decompressed elements from the decompression engine (4). The processor comprises a vector cache (13, 19) and multiplication logic (12, 21) for dynamically multiplying elements of the vector cache with the matrix elements. There is a cache (13) for vector elements to be multiplied by matrix elements to one side of a
diagonal, and a separate cache or register (19) for vector elements to be multiplied by matrix elements to the other side of the
diagonal. A control mechanism (16, 17, 18) multiplies a single matrix element by a corresponding element in one vector cache and separately by a corresponding element in the other vector cache. The compression engine (2) and the decompression engine (4) are circuits within a single integrated circuit, and the compression engine (2) performs matrix element address compression by generating a relative address for a plurality of clustered elements.
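
As a software illustration only, the two ideas above (relative addressing of clustered elements, and reusing a single stored matrix element for both sides of the diagonal) might be sketched as follows. The sketch assumes a symmetric matrix stored as its lower triangle, and assumes that a relative address is the offset from the previous column index in the same row; neither the storage layout nor the encoding is specified by the text, and the names `encode_relative`, `decode_relative` and `symmetric_matvec` are hypothetical.

```python
from typing import List, Tuple

def encode_relative(cols: List[int]) -> List[int]:
    """Assumed encoding: each column address is stored as an offset from the
    previous one in the same row (the first offset is taken from 0)."""
    rel, prev = [], 0
    for c in cols:
        rel.append(c - prev)
        prev = c
    return rel

def decode_relative(rel: List[int]) -> List[int]:
    """Recover absolute column addresses by accumulating the offsets."""
    cols, addr = [], 0
    for r in rel:
        addr += r
        cols.append(addr)
    return cols

# One row of the assumed layout: (relative column addresses, values),
# covering the lower triangle of that row up to and including the diagonal.
Row = Tuple[List[int], List[float]]

def symmetric_matvec(rows: List[Row], x: List[float]) -> List[float]:
    """Multiply a symmetric matrix by x, reading each stored element once."""
    y = [0.0] * len(x)
    for i, (rel, vals) in enumerate(rows):
        for j, a in zip(decode_relative(rel), vals):
            y[i] += a * x[j]      # contribution of element (i, j)
            if j != i:
                y[j] += a * x[i]  # mirrored element (j, i), same stored value
    return y

# Lower triangle of [[2, 1, 0], [1, 3, 4], [0, 4, 5]]
rows = [
    (encode_relative([0]), [2.0]),
    (encode_relative([0, 1]), [1.0, 3.0]),
    (encode_relative([1, 2]), [4.0, 5.0]),
]
result = symmetric_matvec(rows, [1.0, 2.0, 3.0])
print(result)  # [4.0, 19.0, 23.0]
```

When elements are clustered, the offsets stay small, so they can be held in fewer bits than absolute addresses. The two updates inside the inner loop loosely mirror the two multiplications of one matrix element by elements of the two vector caches, although the hardware would perform them on a stream rather than on Python lists.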