Keyword semantic classification method and system for sensitive data leakage detection

A sensitive data leakage detection technology, applied in the field of keyword semantic classification methods and systems. It addresses the problems of untargeted matching, high computing resource usage, and long processing time, and achieves targeted search and recovery, accurate classification and grading, and improved work efficiency.

Pending Publication Date: 2020-09-22
SHANGHAI GUAN AN INFORMATION TECH

AI Technical Summary

Problems solved by technology

But there are still some important sensitive words missing
[0007] To sum up, the method of using keywords to match sensitive data in the pr

Example Embodiment

[0043] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments, without creative work, shall fall within the protection scope of the present invention.

[0044] A keyword semantic classification method for sensitive data leakage detection, as shown in Figure 1, comprises the following specific steps:

[0045] Step 1: Input the sensitive keyword library;

[0046] Step 2: Word vectorization. Use natural language processing technology to vectorize the keyword library;

[0047] Step 3: Data dimensionality reduction, dimensionality ...
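
Steps 2 and 3 can be illustrated with a minimal Python sketch. The source does not say which vectorization model or which dimensionality reduction technique is used, so everything here is an assumption: character-bigram hashing stands in for a trained word embedding (it captures spelling overlap, not true semantics), a Gaussian random projection stands in for the reduction step, and the names vectorize, random_projection, and the sample keywords are hypothetical.

```python
import hashlib
import math
import random


def char_ngrams(word, n=2):
    """Return the character n-grams of a keyword (the whole word if shorter than n)."""
    if len(word) < n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def vectorize(word, dim=64):
    """Assumed stand-in for Step 2: hash bigrams into a count vector, then L2-normalize."""
    vec = [0.0] * dim
    for gram in char_ngrams(word):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "big") % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def random_projection(vectors, out_dim=8, seed=0):
    """Assumed stand-in for Step 3: project onto out_dim random Gaussian directions."""
    rng = random.Random(seed)
    in_dim = len(vectors[0])
    matrix = [
        [rng.gauss(0.0, 1.0) / math.sqrt(out_dim) for _ in range(in_dim)]
        for _ in range(out_dim)
    ]
    return [
        [sum(row[j] * vec[j] for j in range(in_dim)) for row in matrix]
        for vec in vectors
    ]


# Hypothetical sensitive keyword library (Step 1).
keywords = ["password", "passport", "salary", "bank account"]
high_dim = [vectorize(k) for k in keywords]   # Step 2: 64-dimensional vectors
low_dim = random_projection(high_dim)         # Step 3: 8-dimensional vectors
```

In a real deployment the hashing function would presumably be replaced by whatever embedding model the system uses; the fixed seed only keeps the projection reproducible between runs.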

Abstract

The invention discloses a keyword semantic classification method and system for sensitive data leakage detection. The method comprises the following steps: (1) inputting a sensitive keyword library; (2) vectorizing the keyword library using natural language processing technology; (3) reducing the dimensionality of the vector data corresponding to each keyword; (4) performing cluster analysis on the dimension-reduced vector data; (5) identifying the category of each cluster of keyword vectors in combination with business knowledge; (6) optimizing the keyword grouping of each category according to the identified category labels; and (7) outputting the categories of the sensitive keyword library. Words are represented as vectors so that they can be classified; the classification is then refined by similarity calculation against the words of a specific category. A large keyword library is thereby classified and refined, which improves the user's working efficiency and the accuracy of data matching in a specified field.
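
Steps 4 and 6 of the abstract can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract names neither the clustering algorithm nor the similarity measure, so k-means and cosine similarity are used here as plausible choices, and the function names, the 2-D toy vectors, and the sample keywords are all hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=50, seed=0):
    """Assumed Step 4: plain k-means, stopping once assignments no longer change."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean(members)
    return labels


def refine_by_similarity(vectors, labels):
    """Assumed Step 6: move each keyword to the category whose mean vector is most similar."""
    groups = {}
    for v, l in zip(vectors, labels):
        groups.setdefault(l, []).append(v)
    centers = {l: mean(members) for l, members in groups.items()}
    return [max(centers, key=lambda l: cosine(v, centers[l])) for v in vectors]


# Hypothetical dimension-reduced keyword vectors (output of Step 3).
keywords = ["password", "passcode", "credential", "salary", "payroll", "bonus"]
vectors = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0], [0.1, 1.0], [0.2, 0.9], [0.0, 1.1]]

labels = kmeans(vectors, k=2)
refined = refine_by_similarity(vectors, labels)

# Step 7: output the keyword library grouped by category id.
categories = {}
for word, label in zip(keywords, refined):
    categories.setdefault(label, []).append(word)
```

Step 5, naming each cluster in business terms, is a manual judgment in the abstract; in this sketch it would amount to a hand-written mapping from the integer cluster ids in categories to category names.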

Description

Technical field

[0001] The invention relates to the technical field of computer data security, in particular to a keyword semantic classification method and system for sensitive data leakage detection.

Background art

[0002] Sensitive data generally refers to highly confidential information and data belonging to enterprises, organizations, or individuals.

[0003] With the rapid development of the Internet in recent years, information security has become particularly important, and the risk of sensitive data leakage faced by enterprises keeps increasing. The most widely used countermeasure is sensitive-word matching, which discovers sensitive information on the Internet; the sensitive-word lexicon it relies on grows larger and more complex as the sample data gradually increases.

[0004] There are many types of sensitive data, and the extracted sensitive keywords are wide-ranging and varied. How to effectivel...

Application Information

IPC(8): G06F16/35, G06F16/9532, G06F40/30, G06K9/62
CPC: G06F16/35, G06F40/30, G06F16/9532, G06F18/23213
Inventor: 陶景龙, 梁淑云, 刘胜, 马影, 王启凡, 魏国富, 殷钱安, 余贤喆, 周晓勇
Owner SHANGHAI GUAN AN INFORMATION TECH