The invention provides a large-scale document similarity detection method. The method comprises the steps of: S1, calculating the similarity of the other information of the documents in a document set; S2, associating each document's content with a signature S and an f-dimensional vector V; S3, performing word segmentation on the document content; S4, comprehensively calculating a weight for each feature word x; S5, mapping the feature word x to a signature h by using a hash function, traversing all bits of h, and adjusting V accordingly; S6, traversing V, adjusting the signature S, and finally generating the signature value of S corresponding to the document content; S7, dividing the signature value corresponding to the document content into n blocks, mapping the blocks to buckets by using the hash function, and judging whether double hashing is to be performed; S8, taking the documents in the same bucket as candidate pairs and calculating their similarity; and S9, judging whether the documents are similar documents. The method offers high detection accuracy and high execution efficiency, and can be widely used in large-scale internet data mining.
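Steps S2 to S8 resemble the well-known SimHash scheme with block-wise bucketing, and can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it fills in details the abstract leaves open with assumptions: f = 64, n = 4, MD5 as the hash function, whitespace splitting in place of word segmentation, term frequency as the weight, the rule of adding the weight to V for a 1 bit and subtracting it for a 0 bit, and Hamming distance as the similarity of a candidate pair. All function names are hypothetical. Step S1, the double-hash decision in S7, and the threshold in S9 are not specified in the abstract and are omitted.

```python
import hashlib
from collections import Counter, defaultdict
from itertools import combinations

F = 64        # assumed signature width f
N_BLOCKS = 4  # assumed number of blocks n


def hash_word(word, f=F):
    """Map a feature word x to an f-bit integer h (MD5 is an assumed choice)."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << f) - 1)


def signature(text, f=F):
    """Build the signature S of one document from its content."""
    # Stand-ins for S3/S4: whitespace segmentation, term frequency as weight.
    weights = Counter(text.lower().split())
    v = [0] * f  # S2: f-dimensional vector V
    for word, w in weights.items():
        h = hash_word(word, f)
        for i in range(f):  # S5: traverse all bits of h
            # Assumed rule: add the weight for a 1 bit, subtract it for a 0 bit.
            v[i] += w if (h >> i) & 1 else -w
    s = 0  # S2: signature S
    for i in range(f):  # S6: traverse V and set the bits of S
        if v[i] > 0:
            s |= 1 << i
    return s


def blocks(sig, f=F, n=N_BLOCKS):
    """S7: split the signature into n (block index, block value) pairs."""
    width = f // n
    mask = (1 << width) - 1
    return [(k, (sig >> (k * width)) & mask) for k in range(n)]


def candidate_pairs(signatures):
    """S7/S8: documents sharing any bucket become candidate pairs."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for key in blocks(sig):
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")
```

Under these assumptions, a caller would compute `signature` for every document, pass the resulting mapping of document id to signature into `candidate_pairs`, and compare `hamming` for each pair against a chosen threshold to make the S9 decision. Only documents that share a bucket are compared, so the pairwise comparison cost falls well below that of comparing every document with every other.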