When a message having at least one attachment is obtained for indexing, it is indexed as N+1 separate documents, where N is the number of attached documents. If the message is part of a message thread, then information regarding the last message in the thread is retrieved, and search index attachment meta data for the last message is extracted. A
unique identifier is computed for the newly obtained attachments, and used to search for matches in the attachments for the last message in the thread. If there is a match, then the newly obtained attachment is not indexed, but the
unique identifier of the previously indexed matching attachment is added to a body index document for the new message. A
unique identifier associated with the new message body is also added to a
list of parent identifiers associated with the attachment. If a search is subsequently issued that matches the contents of the attachment, all documents whose parent identifiers are listed in the attachment document meta data will be returned as matches. If an attachment is obtained for a message is not part of a previous message thread, or if a newly obtained attachment is not a match with any previously obtained attachment within the message thread to which it belongs, then the attachment is indexed into the search index, and its unique identifier is included in the index document for the newly obtained message body.