A DMA engine, includes, in part, a DMA controller, an associative memory buffer, a request FIFO accepting data transfer requests from a programmable engine, such as a CPU, and a response FIFO that returns the completion status of the transfer requests to the CPU. Each request includes, in part, a target external memory address from which data is to be loaded or to which data is to be stored; a block size, specifying the amount of data to be transferred; and context information. The associative buffer holds data fetched from the external memory; and provides the data to the CPUs for processing. Loading into and storing from the associative buffer is done under the control of the DMA controller. When a request to fetch data from the external memory is processed, the DMA controller allocates a block within the associative buffer and loads the data into the allocated block.