The invention relates to a sensitive information desensitization method and system for data sharing. The method uses statistics, natural language processing and machine learning to protect sensitive data throughout the whole process from data publishing to data application and use. Sensitive information such as named entities and addresses is identified automatically on the basis of an established sensitive information keyword library; a Sigmoid function is used to calculate the correlation degree of sensitive attributes; the desensitization strategy is carried out by combining a sensitive attribute generation rule library with a named entity desensitization rule and a core desensitization algorithm; deep desensitization calculation is performed separately for numerical sensitive attributes and categorical sensitive attributes, and the desensitization degree of the whole data set is obtained; and controlled output of data is achieved by hashing the download link address. The safety of sensitive information in the data is thereby ensured, together with a sensitive information processing strategy that meets analysis and mining requirements to the maximum degree, and the method and system have the advantages of a good desensitization effect and high reliability.
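
The steps named above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract gives no formulas, keyword lists or rule contents, so the keyword set, the shifted and scaled form of the Sigmoid input, the bucket width, the masking rule, the use of SHA-256 and all function names are assumptions chosen only for illustration.

```python
import hashlib
import math

# Hypothetical keyword library; the abstract does not list its contents.
SENSITIVE_KEYWORDS = ("name", "address", "phone")


def find_sensitive_columns(columns):
    """Return the column names that contain any keyword from the library."""
    return [c for c in columns if any(k in c.lower() for k in SENSITIVE_KEYWORDS)]


def correlation_degree(raw_score, midpoint=0.0, steepness=1.0):
    """Map a raw association score into (0, 1) with a Sigmoid function.

    How the raw score is computed is not stated in the abstract; here it is
    simply an input, and midpoint/steepness are assumed tuning parameters.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (raw_score - midpoint)))


def generalize_number(value, width):
    """Numerical attribute: replace an integer by the bucket that contains it."""
    low = (value // width) * width
    return f"[{low}, {low + width})"


def mask_name(value):
    """Named entity: keep the first character and mask the rest (assumed rule)."""
    return value[:1] + "*" * (len(value) - 1)


def hashed_link(dataset_id, secret):
    """Derive an unguessable download token by hashing the id with a secret."""
    return hashlib.sha256(f"{dataset_id}:{secret}".encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(find_sensitive_columns(["user_name", "age", "home_address"]))
    print(correlation_degree(0.0))
    print(generalize_number(37, 10))
    print(mask_name("Alice"))
    print(hashed_link("dataset-1", "demo-secret"))
```

In this sketch, columns flagged by the keyword library would be masked or generalized before publication, attributes whose correlation degree exceeds a chosen threshold would be treated as sensitive as well, and the hashed token would stand in for the raw download address so that output stays under control.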