A system and method enable de-duplication in a storage
system architecture comprising one or more volumes distributed across a plurality of nodes interconnected as a cluster. De-duplication is enabled through the use of file offset indexing in combination with
data content redirection. File offset indexing is illustratively embodied as a Locate by offset function, while
data content redirection is embodied as a novel Locate by content function. In response to input of, inter alia, a data container (file) offset, the Locate by offset function returns a data container (file) index that is used to determine a storage
server that is responsible for a particular region of the file. The Locate by content function is then invoked to determine the storage
server that actually stores the requested data on disk. Notably, the Locate by content function ensures that data is stored on a volume of a storage
server based on the content of that data rather than based on its offset within a file. This aspect of the invention ensures that all blocks having identical
data content are served by the same storage server, so that the server may implement de-duplication to conserve storage space on disk and improve the efficiency of its in-memory cache.
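
The two-stage lookup described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the described system: the names `BLOCK_SIZE`, `locate_by_offset`, `locate_by_content`, and `Cluster` are hypothetical, a SHA-256 digest stands in for whatever content fingerprint is actually used, servers are chosen by simple modulo arithmetic, and the server responsible for a file region is assumed to record the fingerprint of the block at that region so that a read can be redirected by content.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block/region size in bytes


def locate_by_offset(file_id: int, offset: int, num_servers: int) -> int:
    """Return the index of the server responsible for the file region at offset."""
    region = offset // BLOCK_SIZE
    return (file_id + region) % num_servers


def locate_by_content(fingerprint: bytes, num_servers: int) -> int:
    """Return the index of the server that stores data with this fingerprint."""
    return int.from_bytes(fingerprint[:8], "big") % num_servers


class Cluster:
    """Toy in-memory model of the cluster (illustrative only)."""

    def __init__(self, num_servers: int):
        self.n = num_servers
        # Per server: (file_id, region) -> fingerprint of the block at that region.
        self.index = [dict() for _ in range(num_servers)]
        # Per server: fingerprint -> block data, stored once per distinct content.
        self.blocks = [dict() for _ in range(num_servers)]

    def write(self, file_id: int, offset: int, data: bytes) -> None:
        fp = hashlib.sha256(data).digest()
        owner = locate_by_offset(file_id, offset, self.n)
        self.index[owner][(file_id, offset // BLOCK_SIZE)] = fp
        holder = locate_by_content(fp, self.n)
        # Identical content always maps to the same holder, so it is kept once.
        self.blocks[holder].setdefault(fp, data)

    def read(self, file_id: int, offset: int) -> bytes:
        owner = locate_by_offset(file_id, offset, self.n)
        fp = self.index[owner][(file_id, offset // BLOCK_SIZE)]
        holder = locate_by_content(fp, self.n)
        return self.blocks[holder][fp]
```

In this sketch, writing the same block to two different files, or to two different offsets, produces two index entries but only one stored copy, because the holder is chosen from the content alone.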