A big-data-based document desensitization system is used to desensitize documents in Word, Excel, PPT, TXT, PDF and XML formats. The system consists mainly of the following modules: a system management module, a data source management module, a sensitive data discovery module, a desensitization task management module, a desensitization configuration management module, a desensitization verification module, a multi-level management module and a security audit module. Using natural language processing and semantic analysis, the invention solves the problem of identifying sensitive data in documents with high identification accuracy. The invention provides a method for both static and dynamic desensitization of unstructured data such as documents, which ensures that documents can be shared and exchanged securely in a big data environment. By parsing a document, identifying the sensitive data it contains and desensitizing that data, the invention preserves the document's original format and effectively overcomes the difficulty of document desensitization.
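The discovery-then-desensitize flow described above can be illustrated for plain text with a minimal Python sketch. It is not the invention's method: regular expressions stand in for the natural language processing and semantic analysis, and the rule names (`email`, `phone`, `id_card`), the masking policy and the functions `discover`, `mask` and `desensitize` are all hypothetical. Each masked value keeps its original length, which is one simple way to leave the surrounding text layout intact.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set: regexes are a stand-in for the NLP/semantic
# analysis described in the abstract, not the patented technique.
RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", re.ASCII),
    "phone": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),      # 11-digit mobile number
    "id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),   # 18-character ID number
}


@dataclass
class Finding:
    kind: str    # which rule matched
    start: int   # offset of the first matched character
    end: int     # offset one past the last matched character
    text: str    # the matched sensitive value


def discover(text: str) -> list[Finding]:
    """Sensitive-data discovery: collect every rule match, ordered by position."""
    findings = [
        Finding(kind, m.start(), m.end(), m.group())
        for kind, pattern in RULES.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)


def mask(value: str, keep_head: int = 3, keep_tail: int = 2) -> str:
    """Hypothetical masking policy: star out the middle, keep the length."""
    if len(value) <= keep_head + keep_tail:
        return "*" * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + "*" * hidden + value[-keep_tail:]


def desensitize(text: str) -> str:
    """Mask each finding in place; equal lengths keep all offsets valid."""
    chars = list(text)
    for f in discover(text):
        chars[f.start:f.end] = mask(f.text)
    return "".join(chars)


if __name__ == "__main__":
    print(desensitize("Contact: alice@example.com, phone 13812345678."))
```

In this sketch, writing the masked output to a copy would resemble static desensitization, while calling `desensitize` at read time and leaving the source untouched would resemble dynamic desensitization. Word, Excel, PPT or PDF documents would additionally need a parser that extracts and rewrites text runs without disturbing formatting, which is omitted here.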