A complete data desensitization method and system

By generating a global identifier that combines a unique identifier with a domain identifier, and by using AES encryption and MD5 hashing, the problem of insufficient network identifier privacy in data transmission is solved, enabling secure transmission and exchange of sensitive data and improving data security and integrity.

CN115765974B · Active · Publication Date: 2026-06-26 · SHANGHAI FOURIER INTELLIGENCE CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SHANGHAI FOURIER INTELLIGENCE CO LTD
Filing Date: 2022-11-17
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

In current data transmission processes, the privacy of network identifiers lacks effective protection, leading to the leakage of personal identifiers and the exposure of sensitive information, which affects data security.

Method used

A global identifier is generated by combining a unique identifier and a domain identifier. The association with the sensitive data is established in the database of the private domain server. Data mapping and access control are performed between the public domain server and the private domain server. AES encryption and MD5 hashing are used to protect the data.

Benefits of technology

It enables the secure transmission of sensitive data, prevents identity leaks and exposure of sensitive information, and supports data exchange between public and private domains, thereby improving data security and integrity.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a complete data desensitization method and system, which solve the technical problem that the privacy of network identifiers lacks effective protection in existing data transmission processes. The method includes: obtaining a data request initiated by a user and generating a unique identifier; sending the generated unique identifier to the target server that stores the data; generating, by the target server, a domain identifier for the sensitive information in the data, merging the unique identifier and the domain identifier into a global identifier, establishing an association between the global identifier and the sensitive information, and storing it in the database of the target server; and returning the data with the sensitive information mapped by the global identifier. The application achieves reliable protection of sensitive private data by transforming sensitive information according to desensitization rules.

Description

Technical Field

[0001] This invention relates to the field of data security transmission technology, and in particular to a complete data desensitization method and system.

Background Technology

[0002] In existing internet technologies, end-to-end data transmission, regardless of whether domain name or address resolution is performed, always includes precise identification information for both the sender and receiver of the message's valid payload. Due to the definitions of network transmission protocols and network topology, the expression of this precise identification information exhibits two main characteristics: one is a hierarchical expression, such as using a format similar to a generic domain name; the other is sniffability, where a protocol-transparent message structure allows interception and the acquisition of sender and receiver identifiers, leading to identity leakage.

[0003] What counts as sensitive information depends on the specific business scenario and security dimensions. Taking an individual as an example, sensitive fields for a user include, but are not limited to: name, ID number, mobile phone number, email address, and address. In a medical system, for a patient, this may also include medical records. When a data user confirms through any means that a record in a data table belongs to a specific person, it is called a personal identifier leak. Personal identifier leaks are the most serious because once one occurs, the data user can obtain the sensitive information of that specific individual. When a data user learns new attribute information about a person from the data table they access, it is called an attribute leak. Personal identifier leaks will almost certainly lead to attribute leaks. When a data user can determine from the data they access that a particular person's data exists in a data table, it is called a membership leak. Personal identifier leaks and attribute leaks mean that membership leaks may also have occurred.

Summary of the Invention

[0004] In view of the above problems, embodiments of the present invention provide a complete data desensitization method and system to solve the technical problem that the privacy of network identifiers is not effectively protected during existing data transmission processes.

[0005] The complete data anonymization method of this invention includes:

[0006] De-identify and generate a unique identifier;

[0007] Send the generated unique identifier to the private domain server;

[0008] A domain identifier is generated for sensitive data through a private domain server. The unique identifier and the domain identifier are merged into a global identifier and an association is established with the sensitive data. The global identifier is then stored in the database of the private domain server.

[0009] In one embodiment of the present invention, the method further includes: when a public domain server accesses the database of the private domain server, sensitive data is mapped using a global identifier.

[0010] In one embodiment of the present invention, the method further includes: when the public domain server accesses the database of the private domain server through a global identifier, the sensitive data corresponding to the global identifier is obtained after confirmation by the private domain server.

[0011] In one embodiment of the present invention, the method further includes: performing data anonymization on a public domain server and generating a unique identifier.

[0012] In one embodiment of the present invention, the method further includes: setting a preset public domain server in the private domain network where the private domain server is located, and performing data anonymization and generating a unique identifier on the preset public domain server.

[0013] In one embodiment of the present invention, the method further includes: when the private domain server accesses the database, obtaining sensitive data or mapping sensitive data with a global identifier.

[0014] In one embodiment of the present invention, the method further includes: when the private domain server accesses the database through a global identifier, obtaining sensitive data corresponding to the global identifier.

[0015] In one embodiment of the present invention, the data desensitization is achieved by replacing sensitive data with a unique identifier through data tokenization or AES.

[0016] In one embodiment of the present invention, a domain identifier is generated for sensitive data via a private domain server using AES and/or MD5.

[0017] A complete data anonymization system, comprising:

[0018] The public domain data masking module is used to mask data and generate unique identifiers;

[0019] The data transmission module is used to send the generated unique identifier to the private domain server;

[0020] The private domain desensitization module is used to generate domain identifiers for sensitive data through a private domain server, merge the unique identifier and the domain identifier into a global identifier, establish an association relationship with the sensitive data, and store it in the database of the private domain server.

[0021] This invention provides a complete data anonymization method and system that quickly achieves data anonymization by replacing sensitive data with unique identifiers without compromising its security. It also regenerates domain identifiers on a private domain server, making sensitive data more secure while ensuring rapid anonymization. Furthermore, the data is stored in the database after merging the unique identifier and the domain identifier, making the data storage process more rigorous, as neither a single unique identifier nor a domain identifier can complete the data mapping.

Attached Figure Description

[0022] The features of the invention described above are explained in more detail with reference to the embodiments shown in the accompanying drawings, wherein like reference numerals denote like elements, and wherein Figures 1-3 show an embodiment of the present invention.

[0023] Figure 1 is a flowchart of a complete data anonymization method according to an embodiment of the present invention.

[0024] Figure 2 is a flowchart of a complete data anonymization method according to an embodiment of the present invention.

[0025] Figure 3 is a flowchart of a complete data anonymization method according to an embodiment of the present invention.

[0026] Figure 4 is a flowchart of a complete data anonymization method according to an embodiment of the present invention.

[0027] Figure 5 is an architectural schematic of a complete data desensitization system according to an embodiment of the present invention.

[0028] Figure 6 is a schematic representation of the usage state of a complete data desensitization system according to an embodiment of the present invention.

[0029] Figure 7 is a schematic representation of the usage state of a complete data desensitization system according to an embodiment of the present invention.

[0030] Figure 8 is a schematic representation of the usage state of a complete data desensitization system according to an embodiment of the present invention.

[0031] Figure 9 is a schematic diagram of a complete data desensitization system according to an embodiment of the present invention.

Detailed Implementation

[0032] To make the objectives, technical solutions, and advantages of this invention clearer and more understandable, the invention will be further described below in conjunction with the accompanying drawings and specific embodiments. Obviously, the described embodiments are merely some embodiments of this invention, and not all embodiments. Based on the embodiments of this invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this invention.

[0033] Example 1:

[0034] Scenario: The patient has entered their ID card information on a hospital's online platform (e.g., a WeChat mini-program), as shown in Figure 1.

[0035] A secure data anonymization method, comprising:

[0036] Step 1. The private domain server requests the public domain server to generate a unique identifier (public key a4b3);

[0037] Implementation Method 1: Use data tokenization to replace sensitive data with a unique identifier. This identifier retains all necessary data information without compromising security. It creates completely random characters in the same format, quickly achieving data desensitization.
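The tokenization idea in Implementation Method 1 can be sketched as follows. This is a minimal illustration only, assuming that "same format" means digits are replaced by random digits, letters by random letters of the same case, and separators are kept; the names `tokenize` and `TokenVault`, the in-memory mapping, and the sample ID value are hypothetical and do not come from the patent.

```python
import secrets
import string


def tokenize(value: str) -> str:
    """Return a random token that keeps the format of `value`.

    ASCII digits map to random digits, ASCII letters to random letters of
    the same case, and every other character (separators, spaces) is kept.
    """
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(secrets.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(secrets.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(secrets.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


class TokenVault:
    """Hypothetical in-memory vault mapping tokens back to original values."""

    def __init__(self) -> None:
        self._by_token = {}  # token -> original value

    def issue(self, value: str) -> str:
        for _ in range(10):  # retry a few times on the rare collision
            token = tokenize(value)
            if token not in self._by_token:
                self._by_token[token] = value
                return token
        raise RuntimeError("could not find an unused token")

    def resolve(self, token: str) -> str:
        return self._by_token[token]


vault = TokenVault()
token = vault.issue("310101-1990-0101-123X")  # made-up sample value
```

Because the token is random, it can only be mapped back to the original value through the vault, which is why the mapping would have to stay on a trusted server.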

[0038] Implementation Method 2: Data tokenization can also be replaced with AES (Advanced Encryption Standard) encryption. The AES symmetric encryption algorithm uses the same key for encryption and decryption, which is fast and suitable for encrypting large amounts of data.

[0039] Example: Plaintext P is the original, unencrypted data. Key K is the secret used to encrypt the plaintext; in symmetric encryption algorithms, the encryption and decryption keys are the same. The key is negotiated between the sender and receiver, but it cannot be transmitted directly over the network, otherwise it may be leaked. Usually, the key is encrypted using an asymmetric encryption algorithm and then transmitted to the other party over the network, or the key is agreed face-to-face. The key must never be leaked; otherwise, attackers can decrypt the ciphertext and steal confidential data.

[0040] AES Encryption: Let the AES encryption function be E, then C = E(K, P), where P is the plaintext, K is the key, and C is the ciphertext. In other words, by taking the plaintext P and the key K as input parameters to the encryption function, the encryption function E will output the ciphertext C.
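For reference, the encryption relation above and its decryption counterpart can be written together. The symbol D for the decryption function is an assumption introduced here for illustration; the text only names E, K, P, and C.

```latex
C = E(K, P), \qquad P = D(K, C), \qquad D\bigl(K, E(K, P)\bigr) = P
```

The last identity expresses the symmetric property: the same key K that produced C is the one that recovers P.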

[0041] Step 2: Send the generated unique identifier to the private domain server;

[0042] Since a private domain server generally sits on an intranet or local area network, which is not accessible to everyone, it is more secure to return the unique identifier generated in step 1 to the private domain server and store it there.

[0043] Step 3: Generate a domain identifier (c2d1) for sensitive data through the private domain server, merge the unique identifier and the domain identifier into a global identifier (private key a4b3c2d1) and establish an association with the sensitive data, and store it in the database of the private domain server.

[0044] Those skilled in the art will understand that the combination of unique identifiers and domain identifiers is not limited to a simple sequential superposition, and other rules can also be used instead.
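The merging rules mentioned above can be illustrated with a short sketch. The sequential rule reproduces the example values in the text (a4b3 and c2d1 giving a4b3c2d1); the interleaving rule and all function names are hypothetical examples of "other rules", not something the patent specifies.

```python
from typing import Tuple


def merge_sequential(unique_id: str, domain_id: str) -> str:
    """Simple sequential superposition: unique identifier first, then domain identifier."""
    return unique_id + domain_id


def split_sequential(global_id: str, unique_len: int) -> Tuple[str, str]:
    """Inverse of merge_sequential when the unique identifier length is known."""
    return global_id[:unique_len], global_id[unique_len:]


def merge_interleaved(unique_id: str, domain_id: str) -> str:
    """A hypothetical alternative rule: alternate characters from both identifiers."""
    if len(unique_id) != len(domain_id):
        raise ValueError("interleaving assumes identifiers of equal length")
    return "".join(u + d for u, d in zip(unique_id, domain_id))


global_id = merge_sequential("a4b3", "c2d1")
```

Whatever rule is chosen, both servers need to agree on it so that the unique identifier part can still be located inside the global identifier.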

[0045] Implementation method: Encryption is performed once on the private domain server using AES and MD5 (MD5 Message-Digest Algorithm).

[0046] Encryption is performed using AES. The sender encrypts the plaintext data X with key K using AES to obtain ciphertext Y. The ciphertext is then transmitted over the network. After receiving the ciphertext Y, the receiver decrypts it using key K using AES to obtain the plaintext X. In this way, even if the ciphertext Y is intercepted during transmission over the network, it is difficult to decipher its true meaning without the key.

[0047] The AES cipher uses a 128-bit block size and supports key sizes of 128, 192, and 256 bits. The algorithm is analyzed here using a 128-bit key. The processing is similar for 192-bit and 256-bit keys, except that for every 64-bit increase in key length, the number of rounds increases by 2, resulting in 10 rounds for a 128-bit key, 12 rounds for a 192-bit key, and 14 rounds for a 256-bit key.
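The relationship between AES key length and round count can be checked with a few lines of arithmetic. This sketch only computes the round count for the three standard key sizes (10, 12, and 14 rounds for 128-, 192-, and 256-bit keys); it does not implement AES, and the function name `aes_rounds` is illustrative.

```python
def aes_rounds(key_bits: int) -> int:
    """Number of AES rounds for a standard key size (128, 192, or 256 bits)."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key size must be 128, 192, or 256 bits")
    # 10 rounds at 128 bits, plus 2 rounds for every additional 64 bits of key.
    return 10 + 2 * ((key_bits - 128) // 64)


rounds = {bits: aes_rounds(bits) for bits in (128, 192, 256)}
```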

[0048] The AES encryption algorithm is reversible and used to protect sensitive data. Symmetric encryption algorithms use the same key for both encryption and decryption, resulting in high speed, making them suitable for encrypting large amounts of data. For small amounts of confidential data, asymmetric encryption algorithms are used. In practice, the approach is to use an asymmetric encryption algorithm to manage the key for the symmetric algorithm, and then use the symmetric encryption algorithm to encrypt the data. This integrates the advantages of both types of encryption algorithms, achieving both high encryption speed and secure and convenient key management.

[0049] Computing the MD5 hash of a string, file, or archive yields a fixed-length 128-bit digest, conventionally written as a 32-character hexadecimal string. This digest is essentially unique. MD5 transforms a byte string of arbitrary length into a hexadecimal string of fixed length. The purpose is to "compress" large amounts of information in a secure manner before signing with a private key using digital signature software.

[0050] When a user registers or adds a new account, the password is encrypted using MD5 and stored in the database. This prevents malicious manipulation by those who can access the database.

[0051] MD5 is irreversible, so there is no decryption method. It is mainly used for identity verification, card numbers, or passwords to prevent information from being modified.
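The fixed-length, one-way behaviour described above can be observed with Python's standard `hashlib` module. This is a small illustration, not the patent's implementation; the helper name `md5_hex` is an assumption.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


short_digest = md5_hex(b"abc")
long_digest = md5_hex(b"abc" * 10_000)

# Inputs of any length produce a 128-bit digest, i.e. 32 hexadecimal characters.
digest_bits = hashlib.md5(b"abc").digest_size * 8
```

There is no function that maps a digest back to its input, which is the sense in which the text calls MD5 irreversible.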

[0052] Those skilled in the art will understand that although the above embodiments use a combination of AES and MD5 encryption and decryption methods, the purpose of this invention should also be achieved when only one of them is used.

[0053] The characters obtained by encrypting with the AES and MD5 methods described above become the domain identifier, which is combined with the unique identifier received in step 2 to form the global identifier. A new index column is added to the database of the private domain server to store the global identifier.
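A minimal sketch of the storage step follows, using SQLite from the Python standard library. Several choices here are assumptions made for illustration: only MD5 is used to derive the domain identifier (the text allows AES, MD5, or both), the digest is truncated to 4 characters to mirror the length of the example value c2d1, and the table, column, and function names are hypothetical. A 4-character identifier would collide easily in practice.

```python
import hashlib
import sqlite3


def domain_identifier(sensitive: str, length: int = 4) -> str:
    """Derive a short domain identifier from the MD5 digest (truncation is an assumption)."""
    return hashlib.md5(sensitive.encode("utf-8")).hexdigest()[:length]


def store(conn: sqlite3.Connection, unique_id: str, sensitive: str) -> str:
    """Merge the identifiers and store the global identifier next to the data."""
    global_id = unique_id + domain_identifier(sensitive)
    conn.execute(
        "INSERT INTO patient (id_number, global_id) VALUES (?, ?)",
        (sensitive, global_id),
    )
    return global_id


def lookup(conn: sqlite3.Connection, global_id: str):
    """Return the sensitive value for a global identifier, or None if absent."""
    row = conn.execute(
        "SELECT id_number FROM patient WHERE global_id = ?", (global_id,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id_number TEXT, global_id TEXT)")
# The new index column for the global identifier.
conn.execute("CREATE UNIQUE INDEX idx_patient_global_id ON patient (global_id)")

gid = store(conn, "a4b3", "310101199001011234")  # made-up sample ID number
```

Looking up by the unique identifier alone returns nothing in this sketch, which matches the point that neither part of the global identifier can complete the mapping by itself.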

[0054] Those skilled in the art will understand that the public domain server generates the unique identifier and passes it to the private domain server, which merges it with the domain identifier it generates to obtain the global identifier and stores that in its database. As a result, the public domain server can only compare data by comparing unique identifiers and cannot directly obtain the real data.

[0055] Public domain servers generally refer to public network environments such as cloud servers and the internet. In a public domain environment, anyone can access it. Previously, the public and private domains used the same key, which posed a risk of being compromised by packet sniffing. Now, public-key encryption and private-key decryption are used, employing two different keys. The public and private domain keys are different, and the private key is stored on the server side, making it generally inaccessible to hackers, thus significantly enhancing security.

[0056] Example 2:

[0057] Scenario: The patient has entered their ID card information on a hospital's online platform (e.g., a WeChat mini-program), as shown in Figure 2.

[0058] A secure data anonymization method, comprising:

[0059] Step 1. The private domain server requests the public domain server to generate a unique identifier (public key a4b3);

[0060] The implementation method can be referred to the aforementioned embodiments.

[0061] Step 2: Send the generated unique identifier to the private domain server;

[0062] Step 3: Generate a domain identifier (c2d1) for sensitive data through the private domain server, merge the unique identifier and the domain identifier into a global identifier (private key a4b3c2d1) and establish an association with the sensitive data, and store it in the database of the private domain server.

[0063] Step 4: When sending sensitive information publicly, only the global identifier is sent, and the actual data is stored in the database of the private domain server;

[0064] Step 5: When the private domain server accesses the database through the global identifier, it retrieves the sensitive data corresponding to the global identifier.

[0065] Step 6: When the public domain server accesses the database of the private domain server, sensitive data is mapped using a global identifier (private key a4b3c2d1).

[0066] Step 7: When the public domain server accesses the private domain server's database through the global identifier, it obtains the sensitive data corresponding to the global identifier after confirmation by the private domain server.

[0067] This embodiment takes into account the situation where public domain servers need actual data. Nowadays, many hospitals no longer complete the entire diagnosis and treatment process on the hospital intranet. For example, online appointment registration can be completed through apps, WeChat mini-programs, etc. Patients can also view examination and treatment results through apps or mini-programs. Therefore, steps 6 and 7 are introduced later.

[0068] When the public domain needs to obtain plaintext information (i.e., sensitive data replaced by a global identifier), the public domain server requests the data from the private domain server through the global identifier. Since in some cases the original data also needs to be obtained through the public domain server, an additional step of the public domain requesting from the private domain is added to better handle situations where the real data cannot be viewed at all in the public domain.

[0069] Upon receiving a request, if the private domain server (X) agrees, it will proactively send information to the public domain server (Y). The addition of the request check is to control that not all requests sent by Y are allowed. Since the real data exists on X, the real data will only be returned to Y after X agrees. This improves the security of sensitive data while also ensuring that the public domain is not completely unable to see the complete information.
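The request check described above can be sketched as a small gate on the private domain server. The text does not say how X decides whether to agree, so the allow-list of requesters used here is purely an assumption, as are the class name, method name, and sample values.

```python
from typing import Dict, Iterable, Optional


class PrivateDomainServer:
    """Server X: holds the real data and decides whether to release it."""

    def __init__(self, records: Dict[str, str], approved: Iterable[str]) -> None:
        self._records = dict(records)   # global identifier -> sensitive data
        self._approved = set(approved)  # hypothetical allow-list of requesters

    def handle_request(self, requester: str, global_id: str) -> Optional[str]:
        """Return the sensitive data only when X agrees to the request."""
        if requester not in self._approved:
            return None  # request refused: nothing is disclosed to Y
        return self._records.get(global_id)


x = PrivateDomainServer(
    {"a4b3c2d1": "ID 310101199001011234"},  # made-up sample record
    approved={"hospital-app"},
)
granted = x.handle_request("hospital-app", "a4b3c2d1")
refused = x.handle_request("unknown-client", "a4b3c2d1")
```

Keeping the decision inside X means the real data never leaves the private domain unless X itself chooses to send it.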

[0070] As shown in Figure 2, the database stores and returns a global identifier; this is the transfer via the global identifier mentioned in steps 4 and 6:

[0071] Example 1: After registering and making an appointment through a WeChat mini-program, Patient A undergoes a physical examination at the hospital. A few days later, they want to view the examination report through the WeChat mini-program. Without logging in, entering the patient's ID card number to search will return one piece of data and show whether there is a record, but will not display the specific result. This is because the global identifier (a4b3c2d1) can be matched with the identifier (a4b3) on the public domain server. At this time, it can be determined that the patient's ID card number is entered correctly, so information such as "report generated" or "report not generated" is returned.
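The matching in Example 1 can be sketched as a lookup that reveals only a status. This assumes the sequential merge rule, so that the unique identifier is a fixed-length prefix of the global identifier, and it assumes the public domain server already holds the unique identifier for the entered ID number; how that identifier is obtained, and the names `report_status` and `report_index`, are not specified in the text.

```python
from typing import Dict

UNIQUE_ID_LENGTH = 4  # assumed fixed length, matching the example "a4b3"


def report_status(unique_id: str, report_index: Dict[str, bool]) -> str:
    """Match a unique identifier against global identifiers and return a status only.

    `report_index` maps each global identifier to whether a report exists.
    """
    if len(unique_id) != UNIQUE_ID_LENGTH:
        return "no record"
    for global_id, generated in report_index.items():
        if global_id[:UNIQUE_ID_LENGTH] == unique_id:
            return "report generated" if generated else "report not generated"
    return "no record"


index = {"a4b3c2d1": True, "9f10e7aa": False}  # made-up sample data
status = report_status("a4b3", index)
```

The function never touches the report content itself, so a match confirms that a record exists without exposing the result.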

[0072] Example 2: Patient A finds on the hospital's WeChat official account that their medical examination report has been generated and wants to view the detailed report content. At this point, they need to log in by entering their username and password (sending a request to the private domain). If the username and password verification succeeds (request approved), the full report content can be viewed. Conversely, if the username or password is incorrect, the detailed content cannot be viewed, as shown in Figure 3.

[0073] Those skilled in the art will understand that in the above steps, steps 4 and 5 can be interchanged. The original step 4 mainly concerns publicly sending sensitive information, and both it and step 6 describe situations in the public domain environment. After the exchange, if the organization does not involve the public domain, it only needs to perform steps 1-4 and does not need to consider steps 5-7.

[0074] As shown in Figure 4, in a hospital with only an intranet, a server can be pre-configured within the intranet to complete the entire data anonymization and mapping process. This process can be completed without any involvement of the public domain environment. This pre-configured server acts as a public domain server, executing public domain server operations.

[0075] Note: If some of the hospital's services are subsequently connected to the public domain network environment (e.g., physical examination, registration, appointment), the preset public domain server can be directly connected to the public domain network. Information entered through the public domain network or data that needs to be returned can be achieved through steps 5, 6, and 7.

[0076] Those skilled in the art will understand that if the process of generating domain identifiers by the private domain server in step 3 of the above embodiments is omitted, the public domain server will directly obtain the unique identifier, and then only the unique identifier will be sent when sensitive information is publicly sent, which cannot meet the scenario of viewing sensitive information in the public and private domains.

[0077] If step 4 is omitted, and the data is kept in the public domain when sending sensitive information, the sender and receiver identifiers of the message can be obtained after the message is intercepted, resulting in identity leakage.

[0078] If step 5 is omitted, the data remains in the private domain. When sensitive information is sent, only the identifier is transmitted. Even if the message is intercepted, it will not lead to identity leakage. Furthermore, the data is stored in the private domain server, and only the private domain server can obtain the original data, which can achieve secure data transmission. However, the public domain server cannot obtain the original real data, and it cannot better meet the needs of more data exchange scenarios.

[0079] If steps 6 and 7 are omitted, the public domain server will be unable to view the complete original data under any circumstances, and will not be able to meet the needs of more practical scenarios. For example, if the public domain service needs complete patient information during use, it can obtain it after confirmation through the private domain server; otherwise, it will not be able to obtain complete data information.

[0080] A complete data anonymization system according to an embodiment of the present invention includes:

[0081] The memory is used to store the program code corresponding to the complete data desensitization method processing procedure of the above embodiments;

[0082] The processor is used to run the program code corresponding to the complete data desensitization method processing procedure described in the above embodiments.

[0083] The processor can be a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), an MCU (Microcontroller Unit) system board, a SoC (System on a Chip) system board, or a PLC (Programmable Logic Controller) minimum system including I/O.

[0084] The program shown in Figure 5 is a server-side de-identification program 12. This system is mainly implemented through the following processing steps:

[0085] 1. The public domain server 11 initiates a request 101 to the server-side de-identification program 12 and generates a unique identifier 102;

[0086] 2. The server-side de-identification program then sends the generated unique identifier 102 to the private domain server 13;

[0087] 3. Private domain server 13 generates a domain identifier 103 for each piece of data and stores the unique identifier and the domain identifier in database 14;

[0088] 4. Database 14 finally returns the combined global identifier 104 to the public domain server;
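The four processing steps above can be sketched end to end. This is a simplified illustration under several assumptions: the unique identifier is a random 4-character hexadecimal string, the domain identifier is a truncated MD5 digest, identifiers are merged sequentially, and the database is an in-memory dictionary. Class and function names are hypothetical; the reference numerals in the comments point back to the steps in the text.

```python
import hashlib
import secrets


class PublicDomainServer:
    """Stands in for public domain server 11."""

    def generate_unique_id(self) -> str:
        # Step 1: generate unique identifier 102 (4 hex characters, like "a4b3").
        return secrets.token_hex(2)


class PrivateDomainServer:
    """Stands in for private domain server 13 with database 14."""

    def __init__(self) -> None:
        self.database = {}  # global identifier -> sensitive data

    def register(self, unique_id: str, sensitive: str) -> str:
        # Step 3: generate domain identifier 103 and store both identifiers.
        domain_id = hashlib.md5(sensitive.encode("utf-8")).hexdigest()[:4]
        global_id = unique_id + domain_id
        self.database[global_id] = sensitive
        # Step 4: return the combined global identifier 104.
        return global_id


def deidentify(public: PublicDomainServer, private: PrivateDomainServer, sensitive: str) -> str:
    """Run steps 1 to 4 and return only the global identifier."""
    unique_id = public.generate_unique_id()       # step 1
    return private.register(unique_id, sensitive)  # steps 2 to 4


public_server = PublicDomainServer()
private_server = PrivateDomainServer()
sample = "310101199001011234"  # made-up sample ID number
global_id = deidentify(public_server, private_server, sample)
```

Only `global_id` crosses back to the public side in this sketch; the sensitive value stays in the private server's database.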

[0089] Referring to Figure 6, in the case of a local area network and a private domain server, either ciphertext 202 or plaintext 201 can be displayed.

[0090] Figure 7 illustrates that in scenarios involving the internet, cloud, and public domain servers, plaintext 301 cannot be displayed; only ciphertext 302 can be shown. However, if a public domain server wants to display the complete plaintext information, it can send a request to the private domain server. If the private domain server agrees, the complete plaintext information can be displayed (see Figure 8).

[0091] A complete data anonymization system according to an embodiment of the present invention is shown in Figure 9. In Figure 9, this embodiment includes:

[0092] The public domain data masking module 1010 is used to mask data and generate unique identifiers.

[0093] Data transmission module 1020 is used to send the generated unique identifier to the private domain server;

[0094] The private domain desensitization module 1030 is used to generate domain identifiers for sensitive data through the private domain server, merge the unique identifier and the domain identifier into a global identifier and establish an association relationship with the sensitive data, and store it in the database of the private domain server.

[0095] The identifier mapping module 1040 is used to map sensitive data to a global identifier when sending sensitive data publicly. Only the global identifier is sent, and the actual data remains in the database of the private domain server.

[0096] The above description is merely a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A method for complete data anonymization, characterized in that it comprises: anonymizing data and generating a unique identifier, wherein the unique identifier is generated by a public domain server; sending the generated unique identifier to a private domain server; generating a domain identifier for sensitive data through the private domain server, merging the unique identifier and the domain identifier into a global identifier, establishing an association with the sensitive data, and storing the global identifier in the database of the private domain server; when sending sensitive information publicly, sending only the global identifier, while the actual data is stored in the database of the private domain server; requesting, by the public domain server, data from the private domain server through the global identifier; when the public domain server accesses the database of the private domain server, mapping sensitive data using the global identifier; when the public domain server accesses the database of the private domain server through the global identifier, obtaining the sensitive data corresponding to the global identifier after confirmation by the private domain server; wherein the data desensitization replaces sensitive data with the unique identifier using data tokenization or AES.

2. The complete data anonymization method according to claim 1, characterized in that it further comprises: performing data anonymization and generating the unique identifier on the public domain server.

3. The complete data desensitization method according to claim 1, characterized in that it further comprises: setting up a preset public domain server in the private domain network where the private domain server is located, and performing data anonymization and generating the unique identifier on the preset public domain server.

4. The complete data desensitization method according to claim 1, characterized in that it further comprises: when the private domain server accesses the database, obtaining sensitive data or mapping sensitive data using the global identifier.

5. The complete data desensitization method according to claim 1, characterized in that it further comprises: when the private domain server accesses the database through the global identifier, obtaining the sensitive data corresponding to the global identifier.

6. The complete data desensitization method according to any one of claims 1 to 5, characterized in that the domain identifier is generated for sensitive data via the private domain server using AES and/or MD5.

7. A complete data desensitization system for implementing the complete data desensitization method according to any one of claims 1 to 6, characterized in that it comprises: a public domain data anonymization module, used to anonymize data and generate unique identifiers; a data transmission module, used to send the generated unique identifier to the private domain server; and a private domain desensitization module, used to generate domain identifiers for sensitive data through the private domain server, merge the unique identifier and the domain identifier into a global identifier, establish an association relationship with the sensitive data, and store it in the database of the private domain server.