The invention discloses a system and a method for deduplicating Chinese web page text. The deduplication system comprises an index server and a search server, wherein the index server comprises a web page text preprocessing module, a combined characteristic sentence extraction module and a digital signature calculation module, and the search server comprises a web page text capture module and a hash query module. The deduplication method comprises the following steps: normalizing the web page text; extracting a combined characteristic sentence from the text; calculating a digital signature of the combined characteristic sentence; and comparing the digital signature with the digital signatures already stored in a hash table to judge whether the web page is a duplicate. With the deduplication system and method, a search engine can quickly and accurately identify and remove the large number of Chinese web pages with duplicated content on the Internet. When the search engine captures a new web page, it calculates the digital signature of that page and compares it with the digital signatures of the web pages it has already stored; if the page is judged to be a duplicate, it is not stored, which avoids wasting storage space and also improves the search accuracy of the search engine.
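
The four method steps (normalize, extract a combined characteristic sentence, compute a digital signature, look it up in a hash table) can be illustrated with a minimal Python sketch. The abstract does not specify the normalization rules, the sentence selection rule, or the signature function, so everything concrete here is an assumption made for illustration: NFKC folding plus whitespace removal for normalization, "join the `k` longest sentences in document order" as the extraction rule, MD5 as the signature, and an in-memory `set` standing in for the hash table. The names `normalize`, `extract_feature_sentence`, `signature` and `DedupIndex` are hypothetical.

```python
import hashlib
import re
import unicodedata

# Assumption: sentences end at common Chinese or Western terminators.
SENTENCE_END = re.compile(r"[。！？!?；;]+")


def normalize(text):
    """Assumed normalization: NFKC-fold full-width forms, then drop all whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", "", text)


def extract_feature_sentence(text, k=3):
    """Hypothetical extraction rule: join the k longest sentences in document order."""
    sentences = [s for s in SENTENCE_END.split(text) if s]
    # Rank sentence indices by length (longest first); earlier position breaks ties.
    ranked = sorted(range(len(sentences)), key=lambda i: (-len(sentences[i]), i))
    chosen = sorted(ranked[:k])  # restore document order
    return "".join(sentences[i] for i in chosen)


def signature(feature):
    """Assumed signature function: MD5 hex digest of the UTF-8 bytes."""
    return hashlib.md5(feature.encode("utf-8")).hexdigest()


class DedupIndex:
    """In-memory stand-in for the hash table of stored signatures."""

    def __init__(self):
        self._seen = set()

    def add_if_new(self, page_text):
        """Store the page's signature and return True if new; return False if duplicate."""
        sig = signature(extract_feature_sentence(normalize(page_text)))
        if sig in self._seen:
            return False
        self._seen.add(sig)
        return True


index = DedupIndex()
first = "今天天气很好。 我们去公园散步！你觉得怎么样？"
copy = "今天天气很好。我们去公园散步!\n你觉得怎么样?"  # same text, different spacing and punctuation width
other = "这是完全不同的一篇文章。内容没有任何重复。"
results = [index.add_if_new(p) for p in (first, copy, other)]
```

In this sketch a page counts as a duplicate only when its selected sentences are identical after normalization, so two copies that differ in spacing or in full-width versus half-width punctuation collide on the same signature, while a page whose selected sentences have been edited does not. A production system would presumably persist the signatures on the index server rather than hold them in a process-local `set`.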