Text desensitization method and device, electronic equipment and computer readable storage medium

A text-processing technology, applied in computer security devices, computing, electrical digital data processing, etc., that addresses problems such as missed sensitive data, reduced accuracy of sensitive data identification, and reduced system usability and ease of use.

Inactive; Publication Date: 2020-11-27
ZICT TECH CO LTD
2 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

Most existing sensitive data identification methods are based on rule discovery or manual definition. Rule-based discovery can effectively identify sensitive data that conforms to the rule definitions, but it misses a large amount of irregular sensitive data, which reduces the accuracy of sensitive data identification. Manual definition, on the other hand, increases the burden on users when the amount of data is relatively large and reduces the usability and ease of use of the system.



Examples


Embodiment 1

[0043] As shown in Figure 1, according to an embodiment of the first aspect of the present invention, a text desensitization method is proposed. The method includes:

[0044] Step 102, obtaining the text to be processed and the Hidden Markov Model;

[0045] Step 104, performing word segmentation on the text to be processed according to the word segmentation library to obtain vocabulary information;

[0046] Step 106, determining the context information corresponding to the vocabulary information according to the vocabulary information and the Hidden Markov Model;

[0047] Step 108, determining whether the context information satisfies the preset context information; if so, go to step 110; if not, go to step 112;

[0048] Step 110, desensitizing the vocabulary information;

[0049] Step 112, performing no desensitization.

[0050] In this embodiment, the text to be processed is segmented in combination with the word segmentation library to obtain vocabulary in...
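The flow of steps 102 to 112 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the text has already been segmented into tokens (step 104), that the Hidden Markov Model's hidden states are context labels, and that the context is recovered with Viterbi decoding. The state names, all probabilities, the `UNSEEN` smoothing constant and the `***` mask are hypothetical values chosen for the example.

```python
import math

# Hypothetical toy HMM: hidden states are context labels, observations are tokens.
STATES = ("PRIVATE", "OTHER")
START = {"PRIVATE": 0.2, "OTHER": 0.8}
TRANS = {
    "PRIVATE": {"PRIVATE": 0.3, "OTHER": 0.7},
    "OTHER": {"PRIVATE": 0.4, "OTHER": 0.6},
}
EMIT = {
    "PRIVATE": {"Alice": 0.5, "13800000000": 0.5},
    "OTHER": {"name": 0.3, "is": 0.3, "phone": 0.3,
              "Alice": 0.05, "13800000000": 0.05},
}
UNSEEN = 1e-6  # assumed floor probability for tokens a state never emitted


def viterbi(tokens):
    """Return the most likely context label for each token (step 106)."""
    if not tokens:
        return []

    def emit(state, token):
        return math.log(EMIT[state].get(token, UNSEEN))

    score = {s: math.log(START[s]) + emit(s, tokens[0]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for token in tokens[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + emit(s, token)
            pointers[s] = best
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def desensitize(tokens, preset_contexts=frozenset({"PRIVATE"}), mask="***"):
    """Mask tokens whose decoded context is a preset context (steps 108-112)."""
    labels = viterbi(tokens)
    return [mask if label in preset_contexts else token
            for token, label in zip(tokens, labels)]


print(desensitize(["name", "is", "Alice"]))
```

With these toy parameters, only the token decoded into the hypothetical `PRIVATE` context is replaced; every other token passes through unchanged, which corresponds to the "no desensitization" branch of step 112.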

Embodiment 2

[0052] As shown in Figure 2, according to an embodiment of the present invention, a text desensitization method is proposed. The method includes:

[0053] Step 202, obtaining the text to be processed and the Hidden Markov Model;

[0054] Step 204, performing word segmentation on the text to be processed according to the word segmentation library to obtain vocabulary information;

[0055] Step 206, determining the context information corresponding to the vocabulary information according to the vocabulary information and the Hidden Markov Model;

[0056] Step 208, determining whether the context information satisfies the preset context information; if so, go to step 210; if not, go to step 212;

[0057] Step 210, determining whether the vocabulary text in the vocabulary information conforms to the privacy vocabulary; if so, go to step 214; if not, go to step 212;

[0058] Step 212, performing no desensitization;

[0059] Step 214, mark the vocabulary text as sensitive data...
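The two-stage check in steps 208 to 214 can be illustrated with a short sketch. It assumes the context labels have already been produced by the Hidden Markov Model step and that the privacy vocabulary is a plain set of strings. Both assumptions, and the names `mark_sensitive`, `preset_contexts` and `privacy_vocabulary`, are hypothetical and not taken from the patent text.

```python
def mark_sensitive(tokens, contexts, preset_contexts, privacy_vocabulary):
    """Pair each token with a flag that is True only if both checks pass."""
    marked = []
    for token, context in zip(tokens, contexts):
        context_ok = context in preset_contexts   # step 208: context check
        is_private = token in privacy_vocabulary  # step 210: vocabulary check
        # Step 214 marks the token as sensitive; otherwise step 212 applies.
        marked.append((token, context_ok and is_private))
    return marked


tokens = ["call", "Alice", "today"]
contexts = ["OTHER", "PRIVATE", "PRIVATE"]  # hypothetical decoded contexts
print(mark_sensitive(tokens, contexts, {"PRIVATE"}, {"Alice"}))
```

In this example "today" sits in a matching context but is not in the privacy vocabulary, so it is not marked; the second check is what filters such tokens out.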

Embodiment 3

[0065] As shown in Figure 3, according to an embodiment of the present invention, a text desensitization method is proposed. The method includes:

[0066] Step 302, obtaining the target text;

[0067] Step 304, using the maximum matching algorithm to perform word segmentation processing on the target text to obtain the second target vocabulary;

[0068] Step 306, counting the frequency of occurrence of the second target vocabulary in the target text;

[0069] Step 308, updating the thesaurus according to the second target vocabulary whose frequency of occurrence is greater than or equal to the preset frequency;

[0070] Step 310, performing word segmentation on the target text according to the word segmentation library to obtain the first target vocabulary, together with the vocabulary position and semantics corresponding to the first target vocabulary;

[0071] Step 312, according to the first target vocabulary, vocabulary position, semantics and context pattern library, ...
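Steps 304 to 308 can be sketched as below. The patent names only "the maximum matching algorithm"; this example assumes the forward variant, and it assumes that matching runs against a separate candidate lexicon whose frequent hits are then added to the library. The names `forward_max_match`, `update_library`, `candidate_lexicon`, `max_len` and `min_freq` are hypothetical.

```python
from collections import Counter


def forward_max_match(text, lexicon, max_len=4):
    """Greedily take the longest lexicon entry at each position (step 304)."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing longer matches.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def update_library(text, candidate_lexicon, library, min_freq=2, max_len=4):
    """Add tokens that occur at least min_freq times (steps 306 and 308)."""
    counts = Counter(forward_max_match(text, candidate_lexicon, max_len))
    frequent = {word for word, n in counts.items() if n >= min_freq}
    return library | frequent


print(forward_max_match("abcdabxab", {"ab", "abcd"}))
print(update_library("abcdabxab", {"ab", "abcd"}, set()))
```

Greedy longest-first matching keeps "abcd" as one token instead of splitting it into "ab" plus leftovers, and only tokens that reach the frequency threshold are added to the returned library.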


Abstract

The invention provides a text desensitization method and device, electronic equipment and a computer-readable storage medium. The text desensitization method comprises: obtaining a to-be-processed text and a hidden Markov model; performing word segmentation on the to-be-processed text according to a word segmentation library to obtain vocabulary information; determining context information corresponding to the vocabulary information according to the vocabulary information and the hidden Markov model; and desensitizing the vocabulary information if the context information satisfies preset context information. The proposed method recognizes the context of unstructured text through the hidden Markov model, which enables further screening of private words. This improves the recognition precision of private words, meets the desensitization requirements of different users, and effectively improves the efficiency of recognizing and processing private words. Private data no longer has to be located by rule-based (regular-expression) matching, users are not forced to write any data rules, the user's workload is reduced, and errors caused by manual annotation are avoided.

Description

Technical field

[0001] The present invention relates to the technical field of electronic equipment, in particular to a text desensitization method, a text desensitization device, an electronic device and a computer-readable storage medium.

Background art

[0002] In the prior art, to ensure that data can be used safely, desensitization is generally used to replace private data. Most existing desensitization methods target structured data, such as databases, and rely on rules for identification, for example by specifying the field names of the database tables to be desensitized.

[0003] As data privacy protection becomes increasingly important across industries, the desensitization methods used by industry users have the following problems: most data processing methods currently focus on structured data, and most semi-structured data is handled with regular-expression pattern matching to find the key data for desensitization. Most of the existing sensitive data identification methods...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/30, G06F40/279, G06F40/216, G06F16/335, G06F21/62
CPC: G06F40/289, G06F40/30, G06F40/279, G06F40/216, G06F16/335, G06F21/6254
Inventor: 代庆国, 罗英群, 吕令广
Owner: ZICT TECH CO LTD