Method for extracting sensitive information from unstructured data

A technology for extracting sensitive information from unstructured data, applied in the field of information security, which addresses problems such as the difficulty of collecting contextual information.

Status: Inactive; Publication Date: 2021-07-02
SICHUAN UNIV
4 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

Context-based semantic analysis is usually built on machine learning and uses contextual features of the data to extract sensitive information. This approach does not need to detect sensitive information directly, but the context information it relies on is difficult to collect.



Embodiment

[0038] This embodiment provides a method for extracting sensitive data from unstructured data.

[0039] In this embodiment, according to the national standard GB/T 35273-2017, "Information Security Technology Personal Information Security Specification", the types of sensitive information specifically include personal basic information, personal identity information, network identity information, personal health and physiological information, personal education and work information, personal property information, personal communication information, contact information, personal Internet access records, personal frequently used device information, and personal location information.
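The eleven categories listed in [0039] could serve as the tag inventory for a sequence labeling model. The sketch below is only an illustration: the `SensitiveType` enum, its member names, and the BIO tagging scheme are assumptions made here, not details confirmed by the patent text.

```python
from enum import Enum


class SensitiveType(Enum):
    """Sensitive information categories listed in [0039].

    The short member names are hypothetical; only the values come from the text.
    """
    BASIC = "personal basic information"
    IDENTITY = "personal identity information"
    NET_IDENTITY = "network identity information"
    HEALTH = "personal health and physiological information"
    EDU_WORK = "personal education and work information"
    PROPERTY = "personal property information"
    COMMUNICATION = "personal communication information"
    CONTACT = "contact information"
    ACCESS_RECORD = "personal Internet access records"
    DEVICE = "personal frequently used device information"
    LOCATION = "personal location information"


def bio_labels():
    """Build a BIO label set (an assumed scheme): O plus B-/I- per category."""
    labels = ["O"]
    for t in SensitiveType:
        labels.append(f"B-{t.name}")
        labels.append(f"I-{t.name}")
    return labels


if __name__ == "__main__":
    print(len(bio_labels()), bio_labels()[:3])
```

Under this assumed scheme, each token of a text sequence receives one label, so a span of tokens tagged `B-LOCATION`, `I-LOCATION` would mark one piece of personal location information.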

[0040] Exemplarily, the basic personal information includes personal name, date of birth, gender, ethnicity, nationality, family relationship, address, personal phone number, and email address.

[0041] Exemplarily, the personal identity information includes ID card, military officer card, passport,...


Abstract

The invention relates to the technical field of information security and provides a method for extracting sensitive information from unstructured data. The sensitive information covers the personal sensitive information types defined in GB/T 35273-2017, "Information Security Technology Personal Information Security Specification". The method comprises the following steps: parsing various unstructured documents with a parsing tool to obtain their text content; preprocessing the unstructured text, specifically special information replacement, text cleaning and text segmentation, to obtain a text sequence; and using a deep-learning sequence labeling model (BERT-BiLSTM-Attention) to automatically label the sensitive information in the text sequence. By combining semantic analysis based on both text content and context, the method can extract sensitive information more comprehensively and accurately.
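The preprocessing stage named in the abstract (special information replacement, text cleaning, text segmentation) might look roughly like the following. This is a minimal sketch under stated assumptions: the abstract does not say what counts as "special information", so replacing URLs with a `<URL>` placeholder is a guess, and the function names, the sentence-splitting rule, and the `max_len` default are all hypothetical. The document parsing step and the BERT-BiLSTM-Attention labeling step are not shown.

```python
import re

# Assumption: "special information" is illustrated here by URLs only.
URL_RE = re.compile(r"https?://\S+")
# Control characters other than tab, newline and carriage return.
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
# A run of non-terminator characters followed by optional sentence terminators.
SENT_RE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def replace_special(text):
    """Replace URLs with a placeholder token (assumed form of the step)."""
    return URL_RE.sub("<URL>", text)


def clean(text):
    """Drop control characters and collapse runs of whitespace."""
    text = CONTROL_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def segment(text, max_len=128):
    """Split into sentences, then cut any sentence longer than max_len."""
    pieces = []
    for sent in SENT_RE.findall(text):
        sent = sent.strip()
        for i in range(0, len(sent), max_len):
            pieces.append(sent[i:i + max_len])
    return pieces


def preprocess(text, max_len=128):
    """Run the three preprocessing steps in the order given in the abstract."""
    return segment(clean(replace_special(text)), max_len)


if __name__ == "__main__":
    print(preprocess("Visit  https://example.com now.\x00 Call me!"))
```

Capping segment length is a design choice made for the sketch: BERT-style encoders accept a bounded input length, so long sentences have to be cut somewhere before labeling.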

Description

Technical field

[0001] The invention relates to a method for extracting sensitive information from unstructured data, belonging to the technical field of information security.

Background art

[0002] With the popularization of the Internet and people's growing dependence on it, a large amount of sensitive information involving personal privacy is stored and disseminated online, and large-scale leaks of sensitive information occur again and again. Once such information is leaked, illegally provided or misused, it may lead to significant contractual or legal liability, seriously damage personal image and reputation, and endanger personal and property safety. However, most data containing sensitive information is unstructured data with an irregular or incomplete structure, such as text, image, audio, video and other file formats and types. Therefore, to protect sensitive information, the first thing to do is to find a sensitive information ext...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/289, G06F40/30, G06K9/62, G06N3/04, G06F16/33
CPC: G06F40/30, G06F40/211, G06F40/289, G06F16/3344, G06N3/044, G06F18/214
Inventor: 黄诚, 郭勇延, 刘嘉勇
Owner: SICHUAN UNIV