Spark-based anonymization system and method for government affairs big data

A technology for government affairs big data, applied in the field of data management. It addresses three problems of existing methods: reduced accuracy of generalization, failure to consider associations across the global data, and the possibility of inferring global data after only part of the data has been desensitized.

Status: Inactive; Publication Date: 2021-03-02
浪潮卓数大数据产业发展有限公司
AI Technical Summary

Problems solved by technology

An existing approach searches a generalization lattice for a generalization path with low information loss. It draws a sample by systematic sampling, a form of equal-probability sampling, uses the sample in place of the full data set to find the target generalization path on the lattice, and finally generalizes the data set along that path. This approach still has deficiencies. Equal-probability sampling retains a degree of randomness, which reduces the accuracy of the generalization. In addition, local generalization does not account for scenarios in which data is associated globally, so desensitizing only some of the data can still allow the global data to be inferred.
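
The sampling issue described above can be illustrated with a minimal sketch of systematic sampling in plain Python. The function name `systematic_sample`, its parameters, and the `ages` data are hypothetical and are not taken from the patent. The sketch only shows that the sample, and therefore any generalization path chosen from it, depends on the random starting offset.

```python
import random

def systematic_sample(records, n, start=None, rng=random):
    """Pick n records at a fixed interval after a (random) starting offset."""
    if n <= 0 or n > len(records):
        raise ValueError("n must be between 1 and len(records)")
    step = len(records) / n
    if start is None:
        # Random offset inside the first interval; this is the source of randomness.
        start = rng.random() * step
    last = len(records) - 1
    return [records[min(int(start + i * step), last)] for i in range(n)]

# Hypothetical data: 50 ages, from 20 to 69.
ages = list(range(20, 70))
sample_a = systematic_sample(ages, 5, start=0)  # fixed offset for reproducibility
sample_b = systematic_sample(ages, 5, start=9)  # a different offset
```

With the same data and sample size, the two offsets give disjoint samples, which is the kind of chance variation the text identifies as a weakness.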

Embodiment Construction

[0024] Hadoop is an open-source distributed computing platform of the Apache Foundation. Its core consists of the distributed file system HDFS and the MapReduce programming model, and it gives users a distributed infrastructure that hides the details of the underlying system. The Hadoop platform includes the following two core components: 1) HDFS: a distributed file system for storing massive amounts of data. It is scalable, fault-tolerant, and high-performance, uses asynchronous replication and a write-once, read-many access model, and is mainly responsible for storage; 2) MapReduce: a parallel processing framework that decomposes and schedules tasks. It consists of a map (mapping) phase and a reduce (reduction) phase and is responsible for computation over the data stored in HDFS. Hadoop has the following characteristics: 1) High scalability: it can reliably store and process gigabytes of data, with capacity that is unlimited in theory; 2) Low cost: following Google's approach, it can distribute and process data through a server group compos...
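
As a rough illustration of the map and reduce phases mentioned above, the following sketch imitates them in plain Python on a single machine. It does not use Hadoop or Spark, and the `department` field and all function names are hypothetical. A real job would distribute the same three steps across a cluster.

```python
from collections import defaultdict
from functools import reduce

def map_phase(record):
    # Map: emit one (key, value) pair per input record.
    yield (record["department"], 1)

def shuffle(pairs):
    # Shuffle: group all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine the grouped values into one result per key.
    return key, reduce(lambda a, b: a + b, values)

def run_job(records):
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

# Hypothetical input: count records per department.
counts = run_job([
    {"department": "tax"},
    {"department": "health"},
    {"department": "tax"},
])
```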

Abstract

The invention discloses a Spark-based anonymization system and method for government affairs big data, and relates to the technical field of data management. Government affairs data is analyzed with the Spark computing engine on a Hadoop platform. The data is grouped for anonymization according to data type and according to the principle that different records within an anonymized data set cannot be recognized or distinguished from one another. A corresponding processing mode is then selected for anonymization according to the type of the data in each anonymized data set. The processed data forms a data set, and that data set is released. Taking into account today's large data volumes and complex data structures, the method applies grouped anonymization to data of various types, so that the inferability of the data can be controlled as a whole and global data cannot be inferred from the desensitization of only some of the data.
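
The idea of selecting a processing mode by data type can be sketched as a small dispatch table in plain Python. This is only an illustration under stated assumptions: the two handlers (range generalization for integers, masking for strings), the field names, and the rule of suppressing unsupported types are hypothetical and are not the processing modes defined by the patent.

```python
def generalize_number(value, width=10):
    # Replace an exact integer with the range that contains it, e.g. 34 -> "30-39".
    low = (value // width) * width
    return f"{low}-{low + width - 1}"

def mask_text(value, keep=1):
    # Keep the first `keep` characters and mask the rest, e.g. "Alice" -> "A****".
    return value[:keep] + "*" * (len(value) - keep)

# Assumed mapping from data type to anonymization handler.
HANDLERS = {int: generalize_number, str: mask_text}

def anonymize_record(record, sensitive_fields):
    out = {}
    for field, value in record.items():
        if field in sensitive_fields:
            handler = HANDLERS.get(type(value))
            # Unsupported types are suppressed entirely in this sketch.
            out[field] = handler(value) if handler else None
        else:
            out[field] = value
    return out

# Hypothetical record with two sensitive fields.
result = anonymize_record(
    {"name": "Alice", "age": 34, "district": "North"},
    {"name", "age"},
)
```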

Description

Technical field

[0001] The invention discloses a system and a method, and relates to the technical field of data management, in particular to a Spark-based anonymization system and method for government affairs big data.

Background art

[0002] Among existing data anonymization methods, K-anonymity is a commonly used technique for protecting information privacy, but applying it inevitably causes information loss in the published data. To address this, the SPOLG algorithm, a local generalization algorithm based on sampling paths, was proposed. The algorithm searches a generalization lattice for a generalization path with low information loss. To reduce the time spent searching for a path, it introduces equal-probability sampling and uses the systematic sampling method within it, so that the samples replace the data set when searching the generalization lattice for the target generalization path, and fi...
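
For readers unfamiliar with K-anonymity, a minimal check can be written in a few lines of plain Python. A table is k-anonymous when every combination of quasi-identifier values appears in at least k records. The function name, the field names, and the sample rows are hypothetical, and the sketch does not implement SPOLG or the method of the invention.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical rows that have already been generalized.
rows = [
    {"age": "30-39", "zip": "2500**"},
    {"age": "30-39", "zip": "2500**"},
    {"age": "40-49", "zip": "2501**"},
]
whole_table_ok = is_k_anonymous(rows, ["age", "zip"], 2)
first_group_ok = is_k_anonymous(rows[:2], ["age", "zip"], 2)
```

The third row forms a group of size one, so the whole table fails the check for k = 2 until that row is generalized further or suppressed, which is where the information loss mentioned above comes from.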

Application Information

IPC(8): G06F21/62, G06F16/182, G06Q50/26
CPC: G06F21/6254, G06Q50/26, G06F16/182
Inventors: 杨勤, 王勇庆, 叶秋萍
Owner: 浪潮卓数大数据产业发展有限公司