Secondary blocking method in privacy-preserving record linkage

A secondary blocking method for privacy-preserving record linkage (PPRL), applied in the field of data integration and data privacy. It addresses the scarcity of research on multi-party PPRL, the extra matching computation caused by non-matching candidate record groups, and the loss of truly matching record groups during blocking. Its effect is to improve precision while keeping a high recall rate and improving fault tolerance.

Inactive Publication Date: 2019-02-05
NORTHEASTERN UNIV
Cites: 0; Cited by: 6

AI Technical Summary

Problems solved by technology

[0004] Existing PPRL technology has two deficiencies:
1) Most existing PPRL techniques are designed for record linkage between two data parties, and there is little research on PPRL among multiple data parties. A multi-party PPRL method must ensure that an increase in the number of data parties does not seriously degrade recall and precision.
2) None of the existing blocking methods lets PPRL achieve high recall and high precision at the same time, for two reasons. On the one hand, blocking generates too many candidate record groups that are not true matches, which incurs additional computation cost; on the other hand, some truly matching record groups are lost after blocking, so no matching computation is performed on them.
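
The two failure modes in point 2) can be made concrete with a small sketch. It assumes a block is represented as a mapping from party to record ids and that a candidate record group takes one record from each party in the same block; the function names, record ids and blocks below are hypothetical and are not taken from the patent.

```python
from itertools import product

def candidate_groups(blocks, parties):
    """Enumerate candidate record groups: one record per party within each block."""
    groups = set()
    for block in blocks:
        per_party = [block.get(p, []) for p in parties]
        # product() yields nothing if any party has no record in this block
        groups.update(product(*per_party))
    return groups

def recall_precision(blocks, parties, true_matches):
    """Recall: share of true groups that survive blocking.
    Precision: share of candidate groups that are true matches."""
    cands = candidate_groups(blocks, parties)
    hits = cands & set(true_matches)
    recall = len(hits) / len(true_matches) if true_matches else 0.0
    precision = len(hits) / len(cands) if cands else 0.0
    return recall, precision

# Hypothetical toy data: two true ternary groups across three parties.
parties = ["P1", "P2", "P3"]
true_matches = {("a1", "b1", "c1"), ("a2", "b2", "c2")}
blocks = [
    # a2 lands in the wrong block: one extra non-matching candidate group
    {"P1": ["a1", "a2"], "P2": ["b1"], "P3": ["c1"]},
    # the true group (a2, b2, c2) is split across blocks and is lost
    {"P1": [], "P2": ["b2"], "P3": ["c2"]},
]
recall, precision = recall_precision(blocks, parties, true_matches)
print(recall, precision)
```

In this toy case one misplaced record lowers both measures at once: it creates a non-matching candidate group in the first block and removes a true group from the second.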



Examples


Embodiment

[0048] P1, P2 and P3 are, respectively, a pharmacy drug-purchase data set, a resident gene data set and a hospital outpatient information data set. The purpose of PPRL is to identify the ternary record groups that represent the same user across the three data sets. The resident name attribute shared by the three data sets is used as the blocking attribute. In this embodiment, the secondary blocking method proposed by the present invention is applied to the tens of thousands of records in P1, P2 and P3, so that similar records from the three parties are placed in the same final block.
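
As a minimal sketch of how a name attribute might be turned into a Bloom filter before blocking, the following assumes character bigrams, a filter length `m`, `k` hash positions per bigram and a double-hashing scheme built on SHA-1 and SHA-256. These parameters, the function names and the sample names are illustrative assumptions; the excerpt does not state the encoding parameters actually used.

```python
import hashlib

def bigrams(value):
    """Split a padded, lower-cased string into its set of character bigrams."""
    s = f"_{value.lower()}_"
    return {s[i:i + 2] for i in range(len(s) - 1)}

def bloom_encode(value, m=256, k=2):
    """Encode a string as an m-bit Bloom filter (list of 0/1).
    m, k and the double-hashing scheme are assumptions for illustration."""
    bits = [0] * m
    for gram in bigrams(value):
        h1 = int(hashlib.sha1(gram.encode("utf-8")).hexdigest(), 16)
        h2 = int(hashlib.sha256(gram.encode("utf-8")).hexdigest(), 16)
        for i in range(k):
            bits[(h1 + i * h2) % m] = 1
    return bits

def dice(a, b):
    """Dice coefficient of two equal-length bit lists."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0

# Hypothetical name values; similar spellings share bigrams and therefore bits.
bf_smith = bloom_encode("smith")
bf_smyth = bloom_encode("smyth")
print(dice(bf_smith, bf_smith), dice(bf_smith, bf_smyth))
```

Each party can compute such filters locally, so that only bit vectors rather than names are compared; spelling variants of the same name keep a non-zero similarity because they share bigrams.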

[0049] The example analyzes records in P1, P2 and P3 that refer to 4 specific users: 3 records from each of the 3 parties, 9 records in total, to demonstrate the effect of the blocking method proposed by the present invention. Table 1 lists these 9 records and their name attribute values. Record r_ij is the j-th of the 3 records of P_i. Records r_11, r_21, r_31 represent user 1; records r_13, r ...



Abstract

The invention discloses a secondary blocking method for privacy-preserving record linkage (PPRL) and belongs to the field of data integration and data privacy. Each data source first encodes its records with Bloom filters. Two steps are then carried out: (1) a secondary blocking method that combines LSH with suffixes, in which a block dispersion degree is introduced to adjust the secondary blocking; (2) multi-block merging based on a sliding window, which improves the fault tolerance of the linkage. The PPRL blocking method of the invention retains the high recall and fast partitioning of large data sets that characterize the LSH method, while effectively improving precision.
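
Step (1) can be illustrated with a rough sketch. It assumes bit-sampling (Hamming) LSH for the first pass and, for blocks larger than a size threshold, a second pass that regroups records by the last few bits of their Bloom filter. The size threshold is only a stand-in for the block dispersion degree, whose definition is not given in this excerpt, and the suffix rule, function names and 8-bit filters are hypothetical rather than the patented procedure.

```python
from collections import defaultdict

def lsh_key(bits, positions):
    """Bit-sampling LSH: the key is the filter's bits at the sampled positions."""
    return "".join(str(bits[p]) for p in positions)

def primary_blocks(records, positions):
    """First pass: group record ids by their LSH key."""
    blocks = defaultdict(list)
    for rid, bits in records.items():
        blocks[lsh_key(bits, positions)].append(rid)
    return dict(blocks)

def secondary_split(block, records, suffix_len, max_size):
    """Second pass (assumed rule): split an oversized block by filter suffix."""
    if len(block) <= max_size:
        return [list(block)]
    sub = defaultdict(list)
    for rid in block:
        suffix = "".join(str(b) for b in records[rid][-suffix_len:])
        sub[suffix].append(rid)
    return list(sub.values())

# Hypothetical 8-bit Bloom filters, written by hand for readability.
records = {
    "x1": [1, 0, 1, 1, 0, 0, 1, 0],
    "x2": [1, 0, 1, 1, 0, 0, 1, 0],
    "x3": [1, 0, 1, 0, 0, 0, 1, 0],
    "x4": [1, 0, 0, 1, 0, 1, 0, 1],
    "x5": [1, 0, 0, 1, 0, 1, 0, 1],
}
first_pass = primary_blocks(records, positions=[0, 1])
final_blocks = [
    sub
    for block in first_pass.values()
    for sub in secondary_split(block, records, suffix_len=2, max_size=3)
]
print(first_pass)
print(final_blocks)
```

Here all five records collide on the sampled bits and form one oversized block, which the second pass separates into two smaller blocks. This mirrors the stated goal of keeping the recall of LSH while reducing non-matching candidates.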

Description

Technical field

[0001] The invention belongs to the field of data integration and data privacy, and mainly relates to a secondary blocking method applied in privacy-preserving record linkage.

Background technique

[0002] With the advent of the big data era, there is an increasing need to mine valuable information by analyzing datasets containing millions of records, and analyzing large datasets usually requires integrating data from multiple sources. At the same time, many organizations cannot share their datasets with other organizations because of laws and regulations. To this end, Privacy-Preserving Record Linkage (PPRL) has been proposed: the identification of records that represent the same entity in different datasets without revealing the privacy of the entity. In a PPRL project, if a record of a data party is judged to match the records of other data parties, the data party agrees to inform other participants or additional researchers of some attribu...


Application Information

IPC(8): G06F21/62
CPC: G06F21/6227, G06F21/6245
Inventor: 申德荣, 彤丹妮, 聂铁铮, 寇月, 于戈
Owner: NORTHEASTERN UNIV