Log desensitization method, device, apparatus and medium

By combining the MD5 algorithm with a preset zero-collision hash algorithm, the problems of single algorithm and low efficiency in log de-identification are solved, achieving zero-collision de-identification results and ensuring data security and availability.

CN116186758B · Active · Publication Date: 2026-06-23 · CHINA TELECOM CORP LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA TELECOM CORP LTD
Filing Date: 2022-12-26
Publication Date: 2026-06-23

Application Information

Patent Timeline

26 Dec 2022: Application
23 Jun 2026: Publication (CN116186758B)

IPC: G06F21/62; H04L9/32; H04L41/069

AI Tagging

Application Domain: User identity/authority verification; Digital data protection

Technology Topics: Algorithm; MD5

AI Technical Summary

Technical Problem

Existing log desensitization methods rely on a single algorithm and cannot meet zero-collision requirements. In particular, they suffer from high computational complexity and low efficiency when applied to Deep Packet Inspection (DPI) system logs.

Method used

Sensitive information fields are processed using the MD5 algorithm combined with a preset zero-collision hash algorithm. The desensitized result is generated by mixing the MD5 value and the hash value, and the hash algorithm is optimized when necessary to avoid collisions. The historical correspondence table is used for periodic checks and updates.

Benefits of technology

It achieves zero-collision requirements for log anonymization results, ensuring data security and uniqueness, and improving the efficiency of anonymization processing and the usability of data correlation analysis.

✦ Generated by Eureka AI based on patent content.

Abstract

The present disclosure provides a log desensitization method, device, equipment and medium, relating to the technical field of network security. The method comprises: extracting a sensitive information field in the log data to be desensitized; processing the sensitive information field based on a message digest algorithm MD5 to obtain an MD5 value of the sensitive information field; performing hash calculation on the sensitive information field based on a preset zero collision hash algorithm to obtain a hash value of the sensitive information field; and performing desensitization processing on the MD5 value and the hash value to obtain a desensitization result of the log data to be desensitized. The present disclosure adopts the MD5 algorithm combined with the zero collision hash algorithm, effectively solves the log desensitization processing of the sensitive information field in the log data to be desensitized, also meets the requirement of zero collision of the desensitization result, maintains the data security, and ensures the uniqueness and correlation of the data after desensitization.

Description

Technical Field

[0001] This disclosure relates to the field of network security technology, and in particular to a log desensitization method, log desensitization device, electronic device, and computer-readable storage medium.

Background Technology

[0002] Data anonymization is an important data security protection method. Data anonymization refers to processing certain private data, such as data transformation, to achieve reliable protection of private data.

[0003] In related technologies, common data masking methods primarily employ Message-Digest Algorithm (MD5) or other low-collision, high-performance hash algorithms. However, existing log masking methods suffer from limitations such as reliance on a single algorithm, the requirement of highly complex computations for some algorithms, and significant performance overhead and low efficiency with large datasets. This is particularly problematic in applications involving Deep Packet Inspection (DPI) system logs, where sensitive information is typically user-related, unique, and interconnected, making it difficult for existing data masking methods to achieve zero-collision requirements.

[0004] It should be noted that the information disclosed in the background section above is only used to enhance the understanding of the background of this disclosure, and therefore may include information that does not constitute prior art known to those skilled in the art.

Summary of the Invention

[0005] This disclosure provides a log desensitization method, apparatus, device, and medium, which at least to some extent overcomes the technical problems of data desensitization methods in related technologies having a single algorithm and being unable to achieve the requirement of zero collision.

[0006] Other features and advantages of this disclosure will become apparent from the following detailed description, or may be learned in part from practice of this disclosure.

[0007] According to one aspect of this disclosure, a log de-identification method is provided, comprising: extracting sensitive information fields from log data to be de-identified; processing the sensitive information fields based on the message digest algorithm MD5 to obtain the MD5 value of the sensitive information fields; performing hash calculation on the sensitive information fields based on a preset zero-collision hash algorithm to obtain the hash value of the sensitive information fields; and performing de-identification processing on the MD5 value and the hash value to obtain the de-identification result of the log data to be de-identified.

[0008] In one embodiment of this disclosure, after performing desensitization processing on the MD5 value and the hash value to obtain the desensitization result of the log data to be desensitized, the method further includes: storing the sensitive information field, the MD5 value of the sensitive information field, and the desensitization result of the log data to be desensitized in a historical correspondence table.

[0009] In one embodiment of this disclosure, the method further includes: periodically traversing the historical correspondence table to check whether at least two different sensitive information fields share the same MD5 value; if so, determining whether the hash values to be verified, obtained by hashing those sensitive information fields with the preset zero-collision hash algorithm, are also the same; and if they are, reselecting a zero-collision hash algorithm based on the preset zero-collision hash algorithm so that the hash values of the at least two sensitive information fields with the same MD5 value are different.

[0010] In one embodiment of this disclosure, the step of reselecting a zero-collision hash algorithm based on a preset zero-collision hash algorithm includes: selecting a hash algorithm to be optimized and an initial modulus factor based on a preset hash algorithm priority, wherein the priority of the hash algorithm to be optimized is higher than the priority of the preset zero-collision hash algorithm; performing hash calculation on at least two sensitive information fields with the same MD5 value based on the hash algorithm to be optimized and the initial modulus factor to obtain a hash value to be verified for at least two sensitive information fields with the same MD5 value; if the hash values to be verified for the at least two sensitive information fields with the same MD5 value are different, then the hash algorithm to be optimized is selected as the zero-collision hash algorithm for reselection.

[0011] In one embodiment of this disclosure, the method further includes: if at least two sensitive information fields with the same MD5 value have the same hash value to be verified, then calling the modulus factor selection function corresponding to the hash algorithm to be optimized to calculate the modulus factor; if the modulus factor of the hash algorithm to be optimized does not meet the preset condition, then performing hash calculation on the at least two sensitive information fields with the same MD5 value based on the hash algorithm to be optimized and the modulus factor, until the updated hash values to be verified of the at least two sensitive information fields with the same MD5 value are different.

[0012] In one embodiment of this disclosure, the method further includes: if the modulus factor of the hash algorithm to be optimized meets the preset condition, then a new hash algorithm is selected for calculation, wherein the priority of the new hash algorithm is higher than the priority of the hash algorithm to be optimized.

[0013] In one embodiment of this disclosure, the method further includes: if no sensitive information fields with the same MD5 value are found over multiple consecutive periods, selecting a new zero-collision hash algorithm based on the preset zero-collision hash algorithm, wherein the priority of the new zero-collision hash algorithm is lower than that of the preset zero-collision hash algorithm, and the new zero-collision hash algorithm produces different hash values for some sensitive information fields in the historical correspondence table; and updating the preset zero-collision hash algorithm with the new zero-collision hash algorithm.

[0014] In one embodiment of this disclosure, the preset zero-collision hash algorithm includes at least one of the following: SDBM hash algorithm, RS hash algorithm, JS hash algorithm, and BKDR hash algorithm; the preset hash algorithm priorities from low to high are SDBM hash algorithm, RS hash algorithm, JS hash algorithm, and BKDR hash algorithm.
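The reselection loop described in paragraphs [0010] to [0014] can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how the modulus factor is applied, what the modulus factor selection function computes, or what the preset condition is. Here the modulus factor is assumed to reduce the raw hash (`hash % modulus`), `next_modulus` is a hypothetical selection function that steps through a hypothetical candidate list `MODULI`, and the preset condition is assumed to be "candidates exhausted". Only SDBM and BKDR are included, in their commonly circulated forms.

```python
from typing import Callable, List, Optional, Sequence, Tuple

HashFn = Callable[[str], int]
MASK32 = 0xFFFFFFFF

# Hypothetical candidate modulus factors, smallest first (not from the patent).
MODULI = [251, 65521, 4294967291]


def sdbm(text: str) -> int:
    h = 0
    for ch in text:
        h = (ord(ch) + (h << 6) + (h << 16) - h) & MASK32
    return h


def bkdr(text: str) -> int:
    h = 0
    for ch in text:
        h = (h * 131 + ord(ch)) & MASK32
    return h


# Priority from low to high; RS and JS are omitted here only for brevity.
PRIORITY: List[Tuple[str, HashFn]] = [("SDBM", sdbm), ("BKDR", bkdr)]


def next_modulus(current: int) -> Optional[int]:
    """Hypothetical selection function: next larger candidate, or None when
    the candidates are exhausted (standing in for the preset condition)."""
    larger = [m for m in MODULI if m > current]
    return larger[0] if larger else None


def all_distinct(hash_fn: HashFn, modulus: int, fields: Sequence[str]) -> bool:
    values = [hash_fn(f) % modulus for f in fields]
    return len(set(values)) == len(values)


def reselect(priority: List[Tuple[str, HashFn]], current_name: str,
             fields: Sequence[str]) -> Tuple[str, int]:
    """Return (algorithm name, modulus factor) separating the colliding fields."""
    names = [name for name, _ in priority]
    start = names.index(current_name) + 1  # only higher-priority algorithms
    for name, hash_fn in priority[start:]:
        modulus: Optional[int] = MODULI[0]  # initial modulus factor
        while modulus is not None:
            if all_distinct(hash_fn, modulus, fields):
                return name, modulus
            modulus = next_modulus(modulus)
        # Modulus candidates exhausted: fall through to the next algorithm.
    raise LookupError("no higher-priority algorithm separates the fields")
```

Calling `reselect(PRIORITY, "SDBM", fields)` tries BKDR with each modulus factor in turn and raises `LookupError` if nothing of higher priority separates the fields.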

[0015] In one embodiment of this disclosure, the step of de-identifying the MD5 value and the hash value to obtain the de-identified log data includes: mixing the MD5 value and the hash value through a concatenation method or an alternating misalignment method to obtain the de-identified log data.

[0016] According to another aspect of this disclosure, a log de-identification device is also provided, comprising: an information extraction module for extracting sensitive information fields from log data to be de-identified; a first calculation module for processing the sensitive information fields based on the Message Digest Algorithm (MD5) to obtain the MD5 value of the sensitive information fields; a second calculation module for performing hash calculation on the sensitive information fields based on a preset zero-collision hash algorithm to obtain the hash value of the sensitive information fields; and a de-identification processing module for performing de-identification processing on the MD5 value and the hash value to obtain the de-identification result of the log data to be de-identified.

[0017] According to another aspect of this disclosure, an electronic device is also provided, including a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to perform the above-described log de-identification method by executing the executable instructions.

[0018] According to another aspect of this disclosure, a computer-readable storage medium is also provided, on which a computer program is stored, which, when executed by a processor, implements the above-described log desensitization method.

[0019] According to another aspect of this disclosure, a computer program product or computer program is also provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the log desensitization method described above.

[0020] In the embodiments of this disclosure, sensitive information fields are extracted from the log data to be anonymized; the sensitive information fields are processed based on the MD5 algorithm to obtain the MD5 value of the sensitive information fields; the sensitive information fields are hashed based on a preset zero-collision hash algorithm to obtain the hash value of the sensitive information fields; the MD5 value and the hash value are anonymized to obtain the anonymized result of the log data to be anonymized. The embodiments of this disclosure use the MD5 algorithm combined with a simple and effective zero-collision hash algorithm, which effectively solves the log anonymization processing of sensitive information fields in the log data to be anonymized, and also achieves the requirement of zero collision in the anonymization result. This maintains data security, ensures the uniqueness and relevance of the anonymized data, ensures the availability of subsequent data association analysis applications, and improves the efficiency of the anonymization processing.

[0021] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.

Description of the Drawings

[0022] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure. It is obvious that the drawings described below are merely some embodiments of this disclosure, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort.

[0023] Figure 1 illustrates an exemplary system architecture for a log desensitization method provided in an embodiment of this disclosure.

[0024] Figure 2 illustrates a flowchart of a log desensitization method provided in an embodiment of the present disclosure.

[0025] Figure 3 shows a flowchart of processing sensitive information fields using MD5 according to an embodiment of the present disclosure.

[0026] Figure 4 illustrates the desensitization process performed using the concatenation method or the alternating misalignment method provided in the embodiments of this disclosure.

[0027] Figure 5 illustrates another log desensitization method provided in an embodiment of the present disclosure.

[0028] Figure 6 illustrates a flowchart of yet another log desensitization method provided in an embodiment of this disclosure.

[0029] Figure 7 illustrates a flowchart of yet another log desensitization method provided in an embodiment of this disclosure.

[0030] Figure 8 illustrates a flowchart of yet another log desensitization method provided in an embodiment of this disclosure.

[0031] Figure 9 shows a flowchart of a specific example of a log desensitization method provided in an embodiment of this disclosure.

[0032] Figure 10 shows a flowchart of a specific example of an automated zero-collision hash algorithm provided in an embodiment of this disclosure.

[0033] Figure 11 illustrates the structure of a log desensitization device according to an embodiment of the present disclosure.

[0034] Figure 12 illustrates a structural block diagram of an electronic device provided in an embodiment of the present disclosure.

Detailed Implementation

[0035] Exemplary embodiments will now be described more fully with reference to the accompanying drawings. However, these exemplary embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, they are provided so that this disclosure will be more comprehensive and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

[0036] Furthermore, the accompanying drawings are merely illustrative of this disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and therefore repeated descriptions of them will be omitted. Some block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and / or processor devices and / or microcontroller devices.

[0037] The specific implementation methods of the embodiments of this disclosure will now be described in detail with reference to the accompanying drawings.

[0038] Figure 1 shows a schematic diagram of an exemplary system architecture to which the log desensitization method of embodiments of this disclosure can be applied.

[0039] As shown in Figure 1, the system architecture 100 may include a terminal device 110, a network 120, and a server 130.

[0040] Network 120 is a medium used to provide a communication link between terminal device 110 and server 130. Network 120 can be a wired network or a wireless network.

[0041] Optionally, the aforementioned wireless access network or wired network uses standard communication technologies and / or protocols. Network 120 is typically the Internet, but can also be any network, including but not limited to Local Area Network (LAN), Metropolitan Area Network (MAN), Wide Area Network (WAN), mobile, wired or wireless network, private network or virtual private network, and any combination thereof. In some embodiments, technologies and / or formats including Hyper Text Markup Language (HTML), Extensible Markup Language (XML), etc., are used to represent data exchanged over the network. Furthermore, conventional encryption technologies such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec) can be used to encrypt all or some links. In other embodiments, custom and / or dedicated data communication technologies can be used to replace or supplement the aforementioned data communication technologies.

[0042] Understandably, a wireless communication system is a network that provides wireless communication functionality. Communication systems can employ different communication technologies, such as Code Division Multiple Access (CDMA), Wideband CDMA, Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency Division Multiple Access (OFDMA), Single Carrier Frequency Division Multiple Access (SC-FDMA), and Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). Based on factors such as capacity, speed, and latency, networks can be categorized as 2G (second-generation) networks, 3G networks, 4G networks, or future evolution networks, such as 5G networks. 5G networks are often simply referred to as networks or systems.

[0043] Terminal equipment 110, also known as terminal, user terminal, user equipment (UE), etc., can be various electronic devices, including but not limited to smartphones, tablets, laptops, desktop computers, etc.

[0044] Optionally, the client applications installed on different terminal devices 110 may be the same, or clients of the same type of application based on different operating systems. Depending on the terminal platform, the specific form of the application client may also differ; for example, the application client may be a mobile client, a PC client, etc.

[0045] Server 130 can be a server that provides various services, such as a back-end management server that supports the device operated by the user using terminal device 110. The back-end management server can analyze and process received requests and other data, and feed the processing results back to terminal device 110.

[0046] Optionally, server 130 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.

[0047] Those skilled in the art will understand that the numbers of terminal devices 110, networks 120, and servers 130 shown in Figure 1 are merely illustrative; any number of terminal devices 110, networks 120, and servers 130 can be included according to actual needs. This disclosure does not limit these numbers.

[0048] Based on this, the log desensitization method provided in this embodiment extracts sensitive information fields from the log data to be desensitized; processes the sensitive information fields based on the MD5 algorithm to obtain the MD5 value of the sensitive information fields; performs hash calculation on the sensitive information fields based on a preset zero-collision hash algorithm to obtain the hash value of the sensitive information fields; and performs desensitization processing on the MD5 value and the hash value to obtain the desensitization result of each log data to be desensitized. This embodiment of the present disclosure uses the MD5 algorithm combined with a simple and effective zero-collision hash algorithm, which effectively solves the log desensitization processing of sensitive information fields in the log data to be desensitized, and also achieves the requirement of zero collision in the desensitization result. It maintains data security, ensures the uniqueness and relevance of the desensitized data, ensures the availability of subsequent data association analysis applications, and improves the efficiency of desensitization processing.

[0049] The following detailed description of this exemplary implementation method is provided in conjunction with the accompanying drawings and embodiments.

[0050] Under the above system architecture, this disclosure provides a log de-identification method, which can be executed by any electronic device with computing capabilities. In some embodiments, the log de-identification method provided in this disclosure can be executed by the aforementioned server; in other embodiments, the log de-identification method provided in this disclosure can be implemented by the server and the terminal device through interaction.

[0051] Figure 2 shows a flowchart of a log de-identification method according to an embodiment of this disclosure. As shown in Figure 2, the log desensitization method provided in this embodiment includes the following steps:

[0052] S202. Extract sensitive information fields from the log data to be de-identified.

[0053] The log data to be de-identified in S202 above is the source data used for outputting logs after DPI processing. For example, the log data to be de-identified can be a Java object, whose data structure includes an object header, instance data, and object padding. The instance data is the actual valid information stored in the object, containing attribute data. Therefore, specific sensitive information fields can be obtained from the instance data of each object in the log data to be de-identified.

[0054] By pre-setting sensitive information keywords, sensitive information fields can be extracted from the log data to be de-identified.

[0055] In one embodiment, the log data to be de-identified may include one or more sensitive information fields; if there are multiple sensitive information fields, the de-identification process can be performed on each of the multiple sensitive information fields to obtain the de-identification result of the log data to be de-identified.

[0056] Sensitive information fields may include one or more of the following: the user's mobile phone number, ID card number, bank card number, etc.
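Keyword-based extraction (S202) can be sketched as follows, assuming each log record has already been parsed into key/value pairs. The keyword set `SENSITIVE_KEYWORDS` and the function name are hypothetical; the patent does not list concrete key names or a record format.

```python
from typing import Dict

# Hypothetical preset keywords; the patent does not list concrete key names.
SENSITIVE_KEYWORDS = {"phone", "id_card", "bank_card"}


def extract_sensitive_fields(record: Dict[str, str]) -> Dict[str, str]:
    """Return the entries of a log record whose key matches a preset keyword."""
    return {
        key: value
        for key, value in record.items()
        if key.lower() in SENSITIVE_KEYWORDS
    }


record = {"ts": "2022-12-26T10:00:00", "phone": "13800000000", "url": "example.com"}
print(extract_sensitive_fields(record))
```

A record with several matching keys yields several fields, each of which would then be processed separately as described in paragraph [0055].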

[0057] S204. The sensitive information field is processed based on the Message Digest Algorithm (MD5) to obtain the MD5 value of the sensitive information field.

[0058] The MD5 algorithm in S204 above is a cryptographic hash function that generates a 128-bit hash value, which can be used to check that transmitted information is completely consistent.

[0059] In the MD5 algorithm, the input sensitive information field is first padded so that the number of bits of the padded data modulo 512 equals 448. After this padding is completed, a 64-bit value representing the original length of the data is appended to the end. Once the padding and the length value have been added, the length of the resulting data is an integer multiple of 512 bits.

[0060] Using the obtained data as input to the MD5 algorithm, the output consists of four 32-bit blocks. Concatenating these four 32-bit blocks generates a 128-bit hash value, which is the MD5 value of the sensitive information field. The MD5 algorithm flowchart is shown in Figure 3.
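The padding arithmetic and the 128-bit output can be checked with a short sketch that uses Python's standard `hashlib`. The helper names are illustrative, and encoding the field as UTF-8 is an assumption; the patent does not specify a character encoding.

```python
import hashlib


def md5_padded_bit_length(message: bytes) -> int:
    """Total length in bits after MD5 padding (always a multiple of 512)."""
    bits = len(message) * 8
    # A single 1 bit, then 0 bits, are appended until length % 512 == 448.
    pad_bits = (448 - (bits + 1)) % 512 + 1
    # A 64-bit encoding of the original length is appended last.
    return bits + pad_bits + 64


def md5_hex(field: str) -> str:
    """MD5 value of a field as 32 hexadecimal digits (128 bits)."""
    return hashlib.md5(field.encode("utf-8")).hexdigest()


print(md5_padded_bit_length(b"13800000000"))  # 512
print(len(md5_hex("13800000000")) * 4)        # 128
```

A 56-byte input already has 448 bits, but because at least one padding bit is always added, it is padded out to two 512-bit blocks.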

[0061] It should be noted that when there are at least two sensitive information fields in the log data to be de-identified, after the sensitive information fields are processed by the MD5 algorithm, the MD5 values of at least two sensitive information fields may be the same, that is, the MD5 algorithm cannot prevent collisions.

[0062] S206. Perform hash calculation on the sensitive information field based on the preset zero-collision hash algorithm to obtain the hash value of the sensitive information field.

[0063] It should be noted that the preset zero-collision hash algorithm includes at least one of the following: SDBM hash algorithm, RS hash algorithm, JS hash algorithm, and BKDR hash algorithm. In addition, the preset zero-collision hash algorithm can also be another hash algorithm capable of processing strings.

[0064] The hash value of the aforementioned sensitive information field is a fixed-length string obtained by applying the preset zero-collision hash algorithm to the input sensitive information field. For example, the hashCode() method in Java is a hash algorithm whose output is a fixed four-byte (32-bit) integer; written in hexadecimal, where each digit represents 4 binary digits, it takes the form of eight digits, such as 7a9d88e8.
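For reference, the four named string hashes are sketched below in their commonly circulated 32-bit forms. The constants (63689, 378551, 1315423911, and the BKDR seed 131) come from those widely used versions, not from the patent, which does not give parameters. None of these functions is collision-free over arbitrary inputs, so "zero-collision" is read here as the property the method enforces over the checked records. The functions operate on Unicode code points, which matches byte-wise implementations only for ASCII input.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def sdbm_hash(text: str) -> int:
    h = 0
    for ch in text:
        h = (ord(ch) + (h << 6) + (h << 16) - h) & MASK32
    return h


def rs_hash(text: str) -> int:
    a, b, h = 63689, 378551, 0
    for ch in text:
        h = (h * a + ord(ch)) & MASK32
        a = (a * b) & MASK32
    return h


def js_hash(text: str) -> int:
    h = 1315423911
    for ch in text:
        h ^= ((h << 5) + ord(ch) + (h >> 2)) & MASK32
    return h


def bkdr_hash(text: str, seed: int = 131) -> int:
    h = 0
    for ch in text:
        h = (h * seed + ord(ch)) & MASK32
    return h


def to_hex32(value: int) -> str:
    """Render a 32-bit hash as eight hexadecimal digits."""
    return format(value, "08x")


print(to_hex32(bkdr_hash("ab")))  # 00003205
```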

[0065] S208. Perform de-identification processing on the MD5 value and hash value to obtain the de-identification result of the log data to be de-identified.

[0066] The desensitization process in this embodiment may include, but is not limited to, mixing the MD5 value and the hash value through a concatenation method, or mixing them through an alternating misalignment method.

[0067] For example, as shown in Figure 4, the concatenation method joins the MD5 value and the hash value end to end, so that the last digit of the MD5 value is followed by the first digit of the hash value; the resulting string is the de-identified result of the log data to be de-identified.

[0068] As another example, as shown in Figure 4, the alternating misalignment method refers to dividing the MD5 value into multiple MD5 subgroups, dividing the hash value into multiple hash subgroups, and inserting hash subgroups between adjacent MD5 subgroups. The resulting string is the de-identified result of the log data to be de-identified. For example, the MD5 value is divided into 4 subgroups, each 32 bits long, and the hash value is divided into 4 subgroups, each 2 bits long. The first three hash subgroups are inserted between the MD5 subgroups, and the fourth hash subgroup is appended after the fourth MD5 subgroup. It should be noted that the number of MD5 subgroups and the number of hash subgroups can be the same or different, depending on the specific situation.
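The two mixing methods can be sketched as below, working on hexadecimal strings. The grouping is an assumption: the 32-digit MD5 value is split into four groups of 8 hex digits (32 bits each), and the 8-digit hash value into four groups of 2 hex digits, which is one reading of the subgroup sizes given in the example. Function names are illustrative.

```python
from itertools import zip_longest
from typing import List


def split_even(text: str, groups: int) -> List[str]:
    """Split text into `groups` equal-length pieces."""
    if groups <= 0 or len(text) % groups != 0:
        raise ValueError("text length must be a multiple of the group count")
    size = len(text) // groups
    return [text[i:i + size] for i in range(0, len(text), size)]


def mix_concat(md5_hex: str, hash_hex: str) -> str:
    """Concatenation method: MD5 value followed directly by the hash value."""
    return md5_hex + hash_hex


def mix_alternating(md5_hex: str, hash_hex: str,
                    md5_groups: int = 4, hash_groups: int = 4) -> str:
    """Alternating method: each MD5 subgroup is followed by a hash subgroup."""
    md5_parts = split_even(md5_hex, md5_groups)
    hash_parts = split_even(hash_hex, hash_groups)
    pairs = zip_longest(md5_parts, hash_parts, fillvalue="")
    return "".join(m + h for m, h in pairs)


md5_value = "0" * 8 + "1" * 8 + "2" * 8 + "3" * 8
print(mix_alternating(md5_value, "aabbccdd"))
# 00000000aa11111111bb22222222cc33333333dd
```

Both methods yield a 40-digit string for these inputs; `zip_longest` lets the group counts differ, in which case the leftover subgroups are appended in order.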

[0069] In the embodiments of this disclosure, sensitive information fields are extracted from the log data to be de-identified; the sensitive information fields are processed based on the MD5 algorithm to obtain the MD5 value of the sensitive information fields; the sensitive information fields are hashed based on a preset zero-collision hash algorithm to obtain the hash value of the sensitive information fields; and the MD5 value and hash value are de-identified to obtain the de-identified result of the log data to be de-identified. This embodiment of the disclosure uses the MD5 algorithm combined with a simple and effective hash algorithm, which effectively solves the log de-identification processing of sensitive information fields in the log data to be de-identified, and also achieves the requirement of zero collision in the de-identification result. This maintains data security, ensures the uniqueness and relevance of the de-identified data, guarantees the availability of subsequent data association analysis applications, and improves the efficiency of the de-identification process.

[0070] Figure 5 illustrates another log desensitization method provided in an embodiment of this disclosure. Based on the embodiment of Figure 2, S209 is added after S208 to define the scheme for storing relevant information. As shown in Figure 5, in one embodiment, the log anonymization method provided in this disclosure includes steps S202 to S209. Specifically, the method includes:

[0071] S209. Store the sensitive information fields, the MD5 values of the sensitive information fields, and the desensitization results of the log data to be desensitized in the historical correspondence table.

[0072] It should be noted that the implementation methods of S202 to S208 are the same as the specific implementation methods of S202 to S208 in the previous embodiments, and will not be repeated here.

[0073] The aforementioned historical correspondence table is used to record the correspondence between sensitive information fields and their MD5 values, as well as the desensitization results of the log data to be desensitized.

[0074] When multiple sensitive information fields are stored in the historical correspondence table, after the MD5 algorithm processes the sensitive information fields, at least two of the sensitive information fields may have the same MD5 value, meaning that the MD5 algorithm cannot prevent collisions.

[0075] By traversing the historical correspondence table, the sensitive information fields are compared against their corresponding MD5 values to filter out records that have the same MD5 value but different sensitive information fields. If such records exist, a collision has occurred. This disclosure stores the sensitive information fields and their MD5 values in the historical correspondence table for subsequent optimization and reconstruction of the zero-collision hash algorithm.
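A minimal in-memory sketch of the historical correspondence table (S209) and the traversal that finds MD5 collisions follows. The class and method names are hypothetical, and a real deployment would presumably use persistent storage, which the patent does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class HistoryTable:
    """Maps each sensitive field to (MD5 value, desensitization result)."""
    rows: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def store(self, sensitive: str, md5_value: str, result: str) -> None:
        self.rows[sensitive] = (md5_value, result)

    def md5_collisions(self) -> List[Set[str]]:
        """Groups of different sensitive fields that share one MD5 value."""
        by_md5: Dict[str, Set[str]] = defaultdict(set)
        for sensitive, (md5_value, _result) in self.rows.items():
            by_md5[md5_value].add(sensitive)
        return [fields for fields in by_md5.values() if len(fields) > 1]


table = HistoryTable()
# Placeholder MD5 values are used here so that a collision can be shown.
table.store("field-a", "md5-1", "result-a")
table.store("field-b", "md5-1", "result-b")
table.store("field-c", "md5-2", "result-c")
print(table.md5_collisions())
```

Because rows are keyed by the sensitive field itself, storing the same field twice cannot produce a false collision.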

[0076] Figure 6 illustrates a flowchart of yet another log desensitization method provided in an embodiment of this disclosure. Building on the previous embodiment, S602 to S604 are added after S209 to achieve zero collisions through periodic optimization and reconstruction of the hash algorithm. As shown in Figure 6, the log desensitization method provided in this embodiment includes steps S202-S209 and S602-S604. Specifically, the method includes:

[0077] S602. Periodically traverse the historical correspondence table to check whether at least two different sensitive information fields share the same MD5 value.

[0078] S604. If so, determine whether the hash values to be verified, obtained by hashing the at least two sensitive information fields with the same MD5 value based on the preset zero-collision hash algorithm, are the same. If they are the same, reselect a zero-collision hash algorithm from the preset algorithm library so that the reselected algorithm produces different hash values for the at least two sensitive information fields with the same MD5 value, and update the preset zero-collision hash algorithm with the reselected zero-collision hash algorithm.

[0079] It should be noted that the implementation methods of S202 to S209 are the same as the specific implementation methods of S202 to S209 in the previous embodiments, and will not be repeated here.

[0080] The MD5 algorithm is a lossy-compression, collision-resistant method, not a zero-collision method: the probability that MD5 values collide is very low, but not zero. This disclosure periodically traverses the historical correspondence table and compares the sensitive information fields with their corresponding MD5 values to filter out records that have the same MD5 value but different sensitive information fields. If no such records exist in the historical correspondence table, the process ends. If at least two sensitive information fields in the table have the same MD5 value but differ from each other, these fields collide under MD5, and a suitable zero-collision hash algorithm is therefore selected to achieve zero collisions.

[0081] The period for traversing the historical correspondence table is pre-configured in the log desensitization device. It can be determined according to the actual situation. For example, the period can be configured as 1 day, 2 days, 7 days, etc., without specific limitations.

[0082] In addition, the preset zero-collision hash algorithm can be optimized and updated by counting the number of sensitive information fields with the same MD5 value in the historical correspondence table. When the number is greater than or equal to a preset value, the above sensitive information fields can be filtered.

[0083] When there are at least two records with the same MD5 value but different sensitive information fields, it is necessary to distinguish the sensitive information fields with the same MD5 value by using hash values. Therefore, it is first determined whether the hash values to be verified obtained by hashing the sensitive information fields with the same MD5 value using a preset zero-collision hash algorithm are the same. If they are different, it means that the sensitive information fields with the same MD5 value can be distinguished by de-identifying the MD5 value and the hash value. If the hash values to be verified of the sensitive information fields with the same MD5 value are the same, it means that there is a collision and they cannot be distinguished by hash values. In this case, a zero-collision hash algorithm is reselected from the algorithm library. The algorithm library is pre-configured with multiple zero-collision hash algorithms to make the hash values of the sensitive information fields with the same MD5 value different.
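The check in this paragraph can be sketched as follows: for each group of fields that share an MD5 value, apply the current hash and see whether the hash values still coincide. The names `groups_still_colliding` and `length_hash` are hypothetical, and `length_hash` is a deliberately weak stand-in rather than one of the algorithms in the library.

```python
def groups_still_colliding(collisions, hash_fn):
    """Return the MD5 values whose fields the current hash cannot tell apart.

    `collisions` maps an MD5 value to the list of distinct fields sharing it.
    """
    unresolved = []
    for md5_value, fields in collisions.items():
        hashes = [hash_fn(field) for field in fields]
        # Fewer distinct hashes than fields means at least two fields collide.
        if len(set(hashes)) < len(hashes):
            unresolved.append(md5_value)
    return unresolved


def length_hash(field):
    # Weak illustrative hash: fields of equal length always collide.
    return len(field)


collisions = {"md5-A": ["abc", "abd"], "md5-B": ["x", "yz"]}
unresolved = groups_still_colliding(collisions, length_hash)
```

An empty result means the MD5 value plus the hash value already distinguishes every field, so no reselection is needed in that cycle.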

[0084] Figure 7 illustrates a flowchart of yet another log desensitization method provided in an embodiment of this disclosure. Building on the embodiment of Figure 6, S604 is further refined into S6042 to S6046 to further define how a zero-collision hash algorithm is reselected based on the preset zero-collision hash algorithm. As shown in Figure 7, in one embodiment, the log desensitization method provided in this disclosure includes:

[0085] S6042. Based on the preset hash algorithm priority, select the hash algorithm to be optimized and the corresponding initial modulus factor, wherein the priority of the hash algorithm to be optimized is higher than the priority of the preset zero-collision hash algorithm.

[0086] S6044. Based on the hash algorithm to be optimized and the corresponding initial modulus factor, perform hash calculation on at least two sensitive information fields with the same MD5 value to obtain the hash value to be verified for at least two sensitive information fields with the same MD5 value.

[0087] S6046. If at least two sensitive information fields with the same MD5 value have different hash values to be verified, then the hash algorithm to be optimized shall be used as the zero-collision hash algorithm to be reselected.

[0088] In one embodiment, the preset hash algorithms, from lowest to highest priority, are SDBM hash algorithm, RS hash algorithm, JS hash algorithm, and BKDR hash algorithm. These preset hash algorithms and their priorities can be pre-configured in an algorithm library for later use. The diversity of the algorithm library is increased by designing a relatively universal string hash function and by continuously adding more hash algorithms. The variables for each hash algorithm include a modulus factor and a modulus factor selection function, used to perform hash calculations on the input sensitive information field and output the hash value of the sensitive information field.
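For orientation, the four named string hashes can be sketched in their commonly circulated 32-bit forms, listed from lowest to highest priority as stated above. This is an illustrative sketch only: the constants are those of the widely used public versions, the name `ALGORITHM_LIBRARY` is hypothetical, and the modulus factor is omitted here because this disclosure does not specify how it enters each algorithm.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, as in the usual C versions


def sdbm_hash(s):
    h = 0
    for ch in s:
        h = (ord(ch) + (h << 6) + (h << 16) - h) & MASK32
    return h


def rs_hash(s):
    a, b, h = 63689, 378551, 0
    for ch in s:
        h = (h * a + ord(ch)) & MASK32
        a = (a * b) & MASK32
    return h


def js_hash(s):
    h = 1315423911
    for ch in s:
        h ^= ((h << 5) + ord(ch) + (h >> 2)) & MASK32
    return h


def bkdr_hash(s, seed=131):
    h = 0
    for ch in s:
        h = (h * seed + ord(ch)) & MASK32
    return h


# Ordered from lowest to highest priority, following the text above.
ALGORITHM_LIBRARY = [
    ("SDBM", sdbm_hash),
    ("RS", rs_hash),
    ("JS", js_hash),
    ("BKDR", bkdr_hash),
]
```

Keeping the library as an ordered list makes "select an algorithm of higher priority" a matter of moving to a later index.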

[0089] For example, the priority of each preset hash algorithm in the algorithm library is determined by analyzing sample data. For instance, given 100,000 mobile phone numbers as samples of a sensitive information field, hash calculations are performed on the samples with each algorithm, and the algorithm priority is determined by ranking how well the calculation results are dispersed.
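One way to read this ranking is sketched below, assuming that dispersion is measured by the number of distinct hash values each algorithm produces over the sample. That metric is an assumption, as are the name `rank_by_dispersion` and the toy algorithms, which are not those of the algorithm library.

```python
def rank_by_dispersion(samples, algorithms):
    """Order algorithm names from lowest to highest priority.

    Priority is taken to grow with the number of distinct hash values
    produced over `samples` (assumed metric).
    """
    distinct_counts = {
        name: len({fn(sample) for sample in samples})
        for name, fn in algorithms.items()
    }
    return sorted(distinct_counts, key=distinct_counts.get)


samples = ["a", "bb", "cc", "ddd"]
algorithms = {
    "constant": lambda s: 0,                                # 1 distinct value
    "length": len,                                          # 3 distinct values
    "first_char_and_length": lambda s: (ord(s[0]), len(s)),  # 4 distinct values
}
ranking = rank_by_dispersion(samples, algorithms)
```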

[0090] The priority of the hash algorithm to be optimized is higher than that of the preset zero-collision hash algorithm. This includes the hash algorithm to be optimized having a priority that is at least one level higher than that of the preset zero-collision hash algorithm. For example, if the preset zero-collision hash algorithm is the SDBM hash algorithm, the RS hash algorithm, JS hash algorithm, or BKDR hash algorithm can be used as the hash algorithm to be optimized.

[0091] The hash algorithm to be optimized is also regarded as having a higher priority than the preset zero-collision hash algorithm when the two algorithms are of the same type and the modulus factor of the hash algorithm to be optimized is derived from the modulus factor of the preset zero-collision hash algorithm.

[0092] The modulus factor is usually chosen as a prime number, such as 31, to reduce hash collisions. The modulus factor P can be a fixed sequence of prime numbers. The initial modulus factor can be determined based on the size of the sensitive information field, or a prime number from the above fixed sequence of prime numbers can be used as the initial modulus factor.

[0093] The modulus factor selection function is used to select a prime number from a fixed sequence of prime numbers; the modulus factor selection functions corresponding to the hash algorithms mentioned above can be the same, or a custom calculation method can be used.

[0094] In this embodiment, when there are at least two sensitive information fields with the same MD5 value and the same hash value, a hash algorithm to be optimized and an initial modulus factor with a higher priority than the preset zero-collision hash algorithm are selected. Based on the hash algorithm to be optimized and the initial modulus factor, hash calculation is performed on the at least two sensitive information fields with the same MD5 value to obtain the hash value to be verified of the above sensitive information fields. Then, when the hash values to be verified of the above sensitive information fields are different, the hash algorithm to be optimized and the initial modulus factor are used as the zero-collision hash algorithm to be reselected, so as to achieve zero collision by combining the MD5 algorithm with a simple hash algorithm. The zero-collision hash algorithm and modulus factor are found through automated processing, and the zero-collision hash algorithm and modulus factor in the desensitization process are replaced and updated, and the optimization process ends. Since the low collision rate of MD5 values results in a small input set, the diversity of zero-collision hash algorithms and modulus factors can be combined to fully guarantee the selection of zero-collision hash functions and modulus factors, and the applicability is greatly improved.

[0095] Figure 8 illustrates a flowchart of yet another log desensitization method provided in an embodiment of this disclosure. Building on the embodiment of Figure 7, the above-mentioned S604 further includes S6047 to S6048 to further define how a zero-collision hash algorithm is reselected based on the preset zero-collision hash algorithm. As shown in Figure 8, in one embodiment, the log desensitization method provided in this disclosure includes:

[0096] S6047. If there are at least two sensitive information fields with the same MD5 value and the same hash value to be verified, then call the modulus factor selection function corresponding to the hash algorithm to be optimized to calculate the modulus factor.

[0097] S6048. If the modulus factor of the hash algorithm to be optimized does not meet the preset conditions, then based on the hash algorithm to be optimized and the modulus factor, perform hash calculation on at least two sensitive information fields with the same MD5 value, and re-determine whether the hash values of at least two sensitive information fields with the same MD5 value are the same, until the modulus factor of the hash algorithm to be optimized meets the preset conditions.

[0098] Optionally, the log desensitization method provided in this disclosure further includes: S6049. If the modulus factor of the hash algorithm to be optimized meets the preset conditions, then a new hash algorithm is selected for calculation, wherein the priority of the newly selected hash algorithm is higher than the priority of the hash algorithm to be optimized.

[0099] When the hash values to be verified of at least two sensitive information fields are the same, both the MD5 value and the hash value of these fields are identical, so the same desensitization process cannot distinguish between them, which indicates a collision. The modulus factor selection function corresponding to the hash algorithm to be optimized is called to calculate a new modulus factor, and it is then determined whether the modulus factor meets the preset condition. If it does not, hash calculation is performed on the at least two sensitive information fields with the same MD5 value based on the hash algorithm to be optimized and the recalculated modulus factor, and it is determined whether the resulting hash values are the same. If they are different, the process ends, and the hash algorithm to be optimized and the modulus factor are used as the preset zero-collision hash algorithm. If they are the same, the steps of calling the modulus factor selection function corresponding to the hash algorithm to be optimized and calculating the modulus factor are repeated until the hash values of the at least two sensitive information fields with the same MD5 value are different.

[0100] In one embodiment, the modulus factor can be selected from a fixed sequence of prime numbers. When the output value of the modulus factor selection function is a prime number, it indicates that this prime number has not yet been used to optimize the hash algorithm. When the output value of the modulus factor selection function is 0, it indicates that all the values in the fixed sequence of prime numbers have been used. Whether the modulus factor calculated by the modulus factor selection function is 0 can therefore serve as the criterion for judging whether the preset condition is met.
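A modulus factor selection function with this behavior can be sketched as follows. The prime values in `PRIME_SEQUENCE` and the name `select_modulus_factor` are illustrative assumptions; this disclosure only requires a fixed sequence of prime numbers and the value 0 as the signal that the sequence is exhausted.

```python
PRIME_SEQUENCE = (31, 37, 41, 43)  # illustrative values only


def select_modulus_factor(current, primes=PRIME_SEQUENCE):
    """Return the prime that follows `current`, or 0 once the sequence is used up."""
    # index() raises ValueError if `current` is not in the sequence.
    position = primes.index(current)
    if position + 1 < len(primes):
        return primes[position + 1]
    return 0  # sentinel: every prime in the sequence has been tried
```

For example, `select_modulus_factor(31)` yields 37, while `select_modulus_factor(43)` yields 0, which the caller treats as the preset condition being met.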

[0101] It should be noted that, in addition to the above criterion, other markers, such as letters, can be inserted into the prime number sequence to serve as the criterion. The markers can be non-prime numbers, special symbols, etc., and this disclosure does not impose specific limitations.

[0102] To facilitate a deeper understanding of the technical solution disclosed herein, a specific example is described below with reference to Figure 9.

[0103] The entire method consists of two processes. The first process is the basic desensitization process, comprising S901 to S905, as follows:

[0104] S901: Access the DPI system logs, i.e., obtain the log data to be desensitized;

[0105] S902: According to the DPI log specification, obtain the sensitive information field F;

[0106] S903: Calculate the MD5 value M of the sensitive information field F based on the MD5 algorithm;

[0107] S904: The sensitive information field is hashed using a preset zero-collision hash algorithm to obtain the hash value H of the sensitive field F;

[0108] S905: The desensitization result G of sensitive information field F is formed by mixing MD5 value M and hash value H through concatenation, alternating misalignment and other methods.

[0109] S906: Store the sensitive information field F, the MD5 value M of the sensitive information field F, and the desensitization result G in the historical correspondence table for subsequent optimization and reconstruction of the zero-collision hash algorithm.
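Steps S903 to S905 can be sketched as follows. This is a minimal illustration under stated assumptions: a BKDR-style hash with the modulus factor P used as its multiplier stands in for the preset zero-collision hash algorithm, "alternating misalignment" is interpreted as character-by-character interleaving, and the names `bkdr_hash` and `desensitize` are hypothetical.

```python
import hashlib


def bkdr_hash(field, p=31):
    # BKDR-style polynomial hash; using P as the multiplier is an assumption.
    h = 0
    for ch in field:
        h = (h * p + ord(ch)) & 0xFFFFFFFF
    return h


def desensitize(field, p=31, mode="concat"):
    m = hashlib.md5(field.encode("utf-8")).hexdigest()  # S903: 32 hex characters
    h = format(bkdr_hash(field, p), "08x")               # S904: 8 hex characters
    if mode == "concat":                                  # S905: concatenation
        return m + h
    if mode == "interleave":                              # S905: alternate M and H
        # zip stops after the 8 characters of h; the rest of m is appended.
        mixed = "".join(a + b for a, b in zip(m, h))
        return mixed + m[len(h):]
    raise ValueError("unknown mixing mode: " + mode)


g_concat = desensitize("13800000000")
g_interleaved = desensitize("13800000000", mode="interleave")
```

Both modes produce a 40-character result from which neither M nor H is dropped, so two fields are confused only if they collide under MD5 and under the hash at the same time.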

[0110] The second process periodically analyzes the data in the historical correspondence table and optimizes and reconstructs the zero-collision hash algorithm. Since MD5 is a lossy-compression, collision-resistant method but not a zero-collision one, a zero-collision hash is applied to the fields whose MD5 values collide so that the combined result is collision-free. Sensitive information fields in the DPI system logs can generally be treated as strings, which allows common string hash functions (such as SDBMHash, RSHash, JSHash, BKDRHash) to be designed and used. More algorithms can be added later to increase the diversity of the algorithm library. The process automatically searches for a zero-collision hash function and modulus factor, replaces and updates the zero-collision hash function and modulus factor used in the desensitization process, and then ends. Because the low collision rate of MD5 results in a small input set, the variety of preset hash functions and modulus factors ensures that a zero-collision hash function and modulus factor can be selected. The steps are as follows:

[0111] S907: In the historical correspondence table, records with the same MD5 result value but different sensitive field values are filtered based on the MD5 value. Since the MD5 is already low collision, the collision probability is very low, so there should be very few records that collide.

[0112] S908: Determine whether any such records exist. If they do, execute S909; otherwise, end the current cycle's task and wait for the next cycle.

[0113] S909: Automatically obtain a zero-collision hash algorithm and modulus factor;

[0114] S910: Update or replace the default zero-collision hash algorithm with a suitable zero-collision hash algorithm and modulus factor.

[0115] As shown in Figure 10, the detailed process of automatically obtaining the zero-collision hash algorithm and modulus factor in S909 is as follows:

[0116] S1001. Obtain the sensitive field information F that caused the collision of the MD5 value M;

[0117] S1002. Select the hash algorithm G(F, P), the initial modulus factor P, and the modulus factor selection function F(P) according to the priority of the algorithm library.

[0118] S1003. Call G(F, P) to calculate the hash value H;

[0119] S1004. Determine if the hash values H are the same. If they are the same, proceed to S1005; otherwise, proceed to S1007.

[0120] S1005. Call F(P) to calculate the new modulus factor P;

[0121] S1006. Determine if the new modulus factor P is 0; if it is 0, execute S1002; if it is not 0, execute S1003.

[0122] S1007. Select the hash algorithm G(F, P) and the modulus factor P as the reselected zero-collision hash algorithm.
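The loop formed by S1001 to S1007 can be sketched as follows. This is an illustrative sketch under stated assumptions: the library is a list already ordered in the sequence in which candidates are tried, each entry carries (name, G, initial P, selection function), and all names, the prime values, and the weak `char_sum_hash` used to force a fallback are hypothetical.

```python
MASK32 = 0xFFFFFFFF
PRIMES = (31, 37, 41, 43)  # illustrative fixed prime sequence


def next_factor(p):
    # Selection function F(P): next prime, or 0 when the sequence is exhausted.
    i = PRIMES.index(p)
    return PRIMES[i + 1] if i + 1 < len(PRIMES) else 0


def char_sum_hash(field, p):
    # Order-insensitive on purpose, so anagrams collide for every P.
    return sum(ord(ch) for ch in field) % p


def bkdr_hash(field, p):
    h = 0
    for ch in field:
        h = (h * p + ord(ch)) & MASK32
    return h


def all_distinct(groups, g, p):
    # Only fields that share an MD5 value need distinct hash values.
    for fields in groups:
        values = [g(field, p) for field in fields]
        if len(set(values)) != len(values):
            return False
    return True


def reselect(groups, library):
    for name, g, p, select in library:      # S1002: next algorithm by priority
        while p != 0:                        # S1006: 0 means factors exhausted
            if all_distinct(groups, g, p):   # S1003 and S1004
                return name, p               # S1007
            p = select(p)                    # S1005
    return None  # library exhausted; this case is not described in the source


LIBRARY = [
    ("CHAR_SUM", char_sum_hash, 31, next_factor),
    ("BKDR", bkdr_hash, 31, next_factor),
]
groups = [["ab", "ba"], ["abc", "acb", "bca"]]  # S1001: fields sharing an MD5 value
chosen = reselect(groups, LIBRARY)
```

In this run the first candidate fails for every prime in the sequence, the selection function returns 0, and the search moves on to the next algorithm, which separates all fields with its initial factor.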

[0123] The above process will be explained in detail below with specific examples:

[0124] Suppose there are two groups of sensitive information fields, each sharing one MD5 value. The first group contains three sensitive information fields, denoted F11, F12, and F13, whose common MD5 value is denoted M1. The second group contains two sensitive information fields, denoted F21 and F22, whose common MD5 value is denoted M2. They are represented by the following formulas:

[0125] MD5(F11)=MD5(F12)=MD5(F13)=M1

[0126] MD5(F21) = MD5(F22) = M2.

[0127] Select the hash algorithm G1(F, P) to be optimized and the initial modulus factor P1 from the algorithm library. Calculate the hash values of the first group of sensitive information fields and the second group of sensitive information fields, respectively, and denote them as H11, H12, H13, H21, and H22, as follows:

[0128] G1(F11, P1) = H11

[0129] G1(F12, P1) = H12

[0130] G1(F13, P1) = H13

[0131] G1(F21, P1) = H21

[0132] G1(F22, P1) = H22

[0133] If H11 = H13, and the other hash values are not equal, F(P1) needs to be called to select a new modulus factor P2 = F(P1), and the above calculation is performed again to obtain the hash values of the first group of sensitive information fields and the hash values of the second group of sensitive information fields, which are denoted as H11', H12', H13', H21', and H22' respectively, and are represented as follows:

[0134] G1(F11, P2) = H11'

[0135] G1(F12, P2) = H12'

[0136] G1(F13, P2) = H13'

[0137] G1(F21, P2) = H21'

[0138] G1(F22, P2) = H22'

[0139] If, after this recalculation, H21' = H22' or any other hash values within a group are equal, then continue to obtain a new modulus factor from the modulus factor selection function until the hash values of the sensitive information fields within each group are all different.

[0140] Furthermore, when the new modulus factor obtained by the modulus factor selection function is 0, i.e., F(Px) = 0, a new algorithm is obtained from the algorithm library and the above process continues until no collision occurs. It should be noted that the priority of the new hash algorithm is higher than the priority of the previous hash algorithm to be optimized. The priority of the new hash algorithm can be at least one level higher than the priority of the previous hash algorithm to be optimized, or the modulus factor of the new hash algorithm is calculated based on the modulus factor of the previous hash algorithm to be optimized using the corresponding modulus factor selection function.

[0141] In actual implementation, there may be situations where multiple sensitive information fields have the same MD5 value. The process is similar to the one described above, and will not be repeated here.

[0142] Based on the same inventive concept, this disclosure also provides a log desensitization device, as described in the following embodiments. Since the principle by which the device embodiment solves the problem is similar to that of the method embodiment described above, the implementation of the device embodiment can refer to the implementation of the method embodiment described above, and details already described will not be repeated here.

[0143] Figure 11 shows a schematic diagram of a log desensitization device according to an embodiment of this disclosure. As shown in Figure 11, in one embodiment, the log desensitization device provided in this disclosure includes an information extraction module 1101, a first calculation module 1102, a second calculation module 1103, and a desensitization processing module 1104.

[0144] Among them, the information extraction module 1101 is used to extract sensitive information fields from the log data to be de-identified;

[0145] The first calculation module 1102 is used to process the sensitive information field based on the message digest algorithm MD5 to obtain the MD5 value of the sensitive information field;

[0146] The second calculation module 1103 is used to perform hash calculation on the sensitive information field based on a preset zero-collision hash algorithm to obtain the hash value of the sensitive information field;

[0147] The desensitization processing module 1104 is used to perform desensitization processing on the MD5 value and the hash value to obtain the desensitization result of the log data to be desensitized.

[0148] In one embodiment, the apparatus further includes a data storage module (not shown in the figures) for storing the sensitive information field, the MD5 value of the sensitive information field, and the desensitization result of the log data to be desensitized in a historical correspondence table.

[0149] In one embodiment, the device further includes a hash algorithm optimization module (not shown in the figures), which periodically traverses the historical correspondence table to check whether there are at least two records that have the same MD5 value but different sensitive information fields. If such records exist, the module determines whether the hash values to be verified obtained by hashing the at least two sensitive information fields with the same MD5 value using the preset zero-collision hash algorithm are the same. If they are the same, a new zero-collision hash algorithm is selected from a preset algorithm library so that the hash values of the newly selected zero-collision hash algorithm for the at least two sensitive information fields with the same MD5 value are different. The preset zero-collision hash algorithm is then updated with the newly selected zero-collision hash algorithm.

[0150] In one embodiment, the hash algorithm optimization module is specifically used to select a hash algorithm to be optimized and an initial modulus factor based on a preset hash algorithm priority, wherein the priority of the hash algorithm to be optimized is higher than the priority of the preset zero-collision hash algorithm; based on the hash algorithm to be optimized and the initial modulus factor, perform hash calculation on at least two sensitive information fields with the same MD5 value to obtain a hash value to be verified for at least two sensitive information fields with the same MD5 value; if the hash values to be verified for at least two sensitive information fields with the same MD5 value are different, then the hash algorithm to be optimized is used as the zero-collision hash algorithm to be reselected.

[0151] In one embodiment, the hash algorithm optimization module is specifically used to: if at least two sensitive information fields with the same MD5 value have the same hash value to be verified, call the modulus factor selection function corresponding to the hash algorithm to be optimized to calculate the modulus factor; if the modulus factor of the hash algorithm to be optimized does not meet the preset condition, then based on the hash algorithm to be optimized and the modulus factor, perform hash calculation on the at least two sensitive information fields with the same MD5 value, and re-determine whether the hash values of the at least two sensitive information fields with the same MD5 value are the same, until the modulus factor of the hash algorithm to be optimized meets the preset condition.

[0152] In one embodiment, the hash algorithm optimization module is specifically used to reselect a hash algorithm for calculation if the modulus factor of the hash algorithm to be optimized meets a preset condition, wherein the priority of the reselected hash algorithm is higher than the priority of the hash algorithm to be optimized.

[0153] It should be noted that the preset zero-collision hash algorithm includes at least one of the following: SDBM hash algorithm, RS hash algorithm, JS hash algorithm, and BKDR hash algorithm; the preset hash algorithm priorities from low to high are SDBM hash algorithm, RS hash algorithm, JS hash algorithm, and BKDR hash algorithm.

[0154] In one embodiment, the desensitization processing module 1104 is used to perform mixed processing on the MD5 value and the hash value through a concatenation method or an alternating misalignment method to obtain the desensitization result of the log data to be desensitized.

[0155] Those skilled in the art will understand that various aspects of this disclosure can be implemented as a system, method, or program product. Therefore, various aspects of this disclosure can be specifically implemented in the following forms: a completely hardware implementation, a completely software implementation (including firmware, microcode, etc.), or a combination of hardware and software aspects, collectively referred to herein as a "circuit," "module," or "system."

[0156] An electronic device 1200 according to such an embodiment of the present disclosure is described below with reference to Figure 12. The electronic device 1200 shown in Figure 12 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments disclosed herein.

[0157] As shown in Figure 12, the electronic device 1200 takes the form of a general-purpose computing device. The components of the electronic device 1200 may include, but are not limited to: at least one processing unit 1210, at least one storage unit 1220, and a bus 1230 connecting different system components (including storage unit 1220 and processing unit 1210).

[0158] The storage unit stores program code that can be executed by the processing unit 1210, causing the processing unit 1210 to perform the steps described in the "Exemplary Methods" section of this specification according to various exemplary embodiments of this disclosure. For example, the processing unit 1210 can perform the following steps of the above method embodiments: extracting sensitive information fields from the log data to be de-identified; processing the sensitive information fields based on the MD5 message digest algorithm to obtain the MD5 value of the sensitive information fields; performing hash calculation on the sensitive information fields based on a preset zero-collision hash algorithm to obtain the hash value of the sensitive information fields; and de-identifying the MD5 value and the hash value to obtain the de-identified result of the log data to be de-identified.

[0159] Storage unit 1220 may include a readable medium in the form of a volatile storage unit, such as random access memory (RAM) 12201 and / or cache memory 12202, and may further include a read-only memory (ROM) 12203.

[0160] Storage unit 1220 may also include a program / utility 12204 having a set (at least one) of program modules 12205, such program modules 12205 including but not limited to: operating system, one or more application programs, other program modules and program data, each or some combination of these examples may include an implementation of a network environment.

[0161] Bus 1230 can represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.

[0162] Electronic device 1200 can also communicate with one or more external devices 1240 (e.g., keyboard, pointing device, Bluetooth device, etc.), and with one or more devices that enable a user to interact with the electronic device 1200, and/or with any device that enables the electronic device 1200 to communicate with one or more other computing devices (e.g., router, modem, etc.). This communication can be performed via input/output (I/O) interface 1250. Furthermore, electronic device 1200 can also communicate with one or more networks (e.g., local area network (LAN), wide area network (WAN), and/or public networks, such as the Internet) via network adapter 1260. As shown in Figure 12, network adapter 1260 communicates with other modules of electronic device 1200 via bus 1230. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with electronic device 1200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

[0163] From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein can be implemented by software or by combining software with necessary hardware. Therefore, the technical solutions according to the embodiments of this disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.) or on a network, including several instructions to cause a computing device (such as a personal computer, server, terminal device, or network device, etc.) to execute the methods according to the embodiments of this disclosure.

[0164] Specifically, according to embodiments of this disclosure, the process described above with reference to the flowchart can be implemented as a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the log desensitization method described above.

[0165] In exemplary embodiments of this disclosure, a computer-readable storage medium is also provided, which may be a readable signal medium or a readable storage medium. The computer-readable storage medium stores a program product capable of implementing the methods described above. In some possible implementations, various aspects of this disclosure may also be implemented as a program product including program code, which, when run on a terminal device, causes the terminal device to perform the steps according to various exemplary embodiments of this disclosure described in the "Exemplary Methods" section of this specification.

[0166] More specific examples of computer-readable storage media in this disclosure may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0167] In this disclosure, a computer-readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium, capable of transmitting, propagating, or transmitting a program for use by or in connection with an instruction execution system, apparatus, or device.

[0168] Optionally, the program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical fiber, RF, etc., or any suitable combination thereof.

[0169] In practical implementation, program code for performing the operations of this disclosure can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as C or similar languages. The program code can execute entirely on the user's computing device, partially on the user's device, as a standalone software package, partially on the user's computing device and partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).

[0170] It should be noted that although several modules or units for the device used to perform actions have been mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of this disclosure, the features and functions of two or more modules or units described above can be embodied in one module or unit. Conversely, the features and functions of one module or unit described above can be further divided and embodied by multiple modules or units.

[0171] Furthermore, although the steps of the method in this disclosure are described in a specific order in the accompanying drawings, this does not require or imply that the steps must be performed in that specific order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or a step may be broken down into multiple steps.

[0172] From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein can be implemented by software or by combining software with necessary hardware. Therefore, the technical solutions according to the embodiments of this disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.) or on a network, including several instructions to cause a computing device (such as a personal computer, server, mobile terminal, or network device, etc.) to execute the methods according to the embodiments of this disclosure.

[0173] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the appended claims.

Claims

1. A log desensitization method, characterized in that the method comprises the following steps: extracting a sensitive information field from to-be-desensitized log data; processing the sensitive information field based on the message digest algorithm MD5 to obtain an MD5 value of the sensitive information field; performing hash calculation on the sensitive information field based on a preset zero collision hash algorithm to obtain a hash value of the sensitive information field; performing desensitization processing on the MD5 value and the hash value to obtain a desensitization result of the to-be-desensitized log data; storing the sensitive information field, the MD5 value of the sensitive information field and the desensitization result of the to-be-desensitized log data in a historical correspondence table; periodically traversing the historical correspondence table to check whether there are at least two different sensitive information fields having the same MD5 value; if so, determining whether the to-be-verified hash values obtained by performing hash calculation on the at least two sensitive information fields with the same MD5 value based on the preset zero collision hash algorithm are the same; if the to-be-verified hash values are the same, reselecting a zero collision hash algorithm from a preset algorithm library, such that the hash values calculated by the reselected zero collision hash algorithm for the at least two sensitive information fields with the same MD5 value are different; and updating the preset zero collision hash algorithm with the reselected zero collision hash algorithm.
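By way of non-limiting illustration, the flow of claim 1 can be sketched in Python as follows. The sketch makes several assumptions that the claim does not fix: SDBM stands in for the preset zero collision hash algorithm, the "desensitization processing" is taken to be simple concatenation of the MD5 hex digest and the 32-bit hash value, and the historical correspondence table is an in-memory dictionary. The names `desensitize`, `colliding_groups` and `needs_reselection` are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Historical correspondence table: field -> (MD5 value, desensitization result).
History = Dict[str, Tuple[str, str]]


def md5_hex(field: str) -> str:
    """MD5 value of a sensitive information field, as 32 hex characters."""
    return hashlib.md5(field.encode("utf-8")).hexdigest()


def sdbm_hash(field: str) -> int:
    """32-bit SDBM hash, assumed here as the preset hash algorithm."""
    h = 0
    for byte in field.encode("utf-8"):
        h = (byte + (h << 6) + (h << 16) - h) & 0xFFFFFFFF
    return h


def desensitize(field: str, history: History,
                digest: Callable[[str], str] = md5_hex,
                hash_fn: Callable[[str], int] = sdbm_hash) -> str:
    """Combine the MD5 value and the hash value, then record the result."""
    md5_value = digest(field)
    # Assumed mixing rule: MD5 hex digest followed by the hash as 8 hex digits.
    token = f"{md5_value}{hash_fn(field):08x}"
    history[field] = (md5_value, token)
    return token


def colliding_groups(history: History) -> List[List[str]]:
    """Groups of different fields that share the same MD5 value."""
    groups = defaultdict(list)
    for field, (md5_value, _) in history.items():
        groups[md5_value].append(field)
    return [fields for fields in groups.values() if len(fields) > 1]


def needs_reselection(history: History,
                      hash_fn: Callable[[str], int] = sdbm_hash) -> bool:
    """True if any MD5-colliding group also collides under the preset hash."""
    for fields in colliding_groups(history):
        if len({hash_fn(f) for f in fields}) < len(fields):
            return True
    return False
```

In this sketch the periodic check would call `needs_reselection` on a schedule; the `digest` parameter exists only so that an MD5 collision can be simulated when exercising the check.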

2. The method of claim 1, wherein the reselecting of a zero collision hash algorithm from the preset algorithm library comprises the following steps: selecting a to-be-optimized hash algorithm and an initial modulus factor based on a hash algorithm priority configured in the preset algorithm library, wherein the priority of the to-be-optimized hash algorithm is higher than that of the preset zero collision hash algorithm; performing hash calculation on the at least two sensitive information fields with the same MD5 value based on the to-be-optimized hash algorithm and the initial modulus factor to obtain to-be-verified hash values of the at least two sensitive information fields with the same MD5 value; and if the to-be-verified hash values of the at least two sensitive information fields with the same MD5 value are different, taking the to-be-optimized hash algorithm as the reselected zero collision hash algorithm.

3. The method of claim 2, wherein the method further comprises the following steps: if the to-be-verified hash values of the at least two sensitive information fields with the same MD5 value are the same, calling a modulus factor selection function corresponding to the to-be-optimized hash algorithm to calculate a modulus factor; and if the modulus factor of the to-be-optimized hash algorithm does not satisfy a preset condition, performing hash calculation on the at least two sensitive information fields with the same MD5 value based on the to-be-optimized hash algorithm and the modulus factor, and determining again whether the hash values of the at least two sensitive information fields with the same MD5 value are the same, until the modulus factor of the to-be-optimized hash algorithm satisfies the preset condition.

4. The method of claim 3, wherein the method further comprises the following step: if the modulus factor of the to-be-optimized hash algorithm satisfies the preset condition, reselecting a hash algorithm for calculation, wherein the priority of the reselected hash algorithm is higher than that of the to-be-optimized hash algorithm.
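By way of non-limiting illustration, the reselection loop of claims 2 to 4 can be sketched as follows. The claims do not define the modulus factor selection function or the preset condition, so both are hypothetical here: `next_modulus` simply steps to the next odd number, and the condition is taken to be "the modulus factor exceeds `MODULUS_LIMIT`". The two toy hash functions and the `Candidate` type are likewise assumptions made only for the example.

```python
from typing import Callable, NamedTuple, Optional, Sequence, Tuple

MODULUS_LIMIT = 1 << 16  # hypothetical bound used by the preset condition


class Candidate(NamedTuple):
    name: str
    hash_fn: Callable[[str, int], int]  # (field, modulus factor) -> hash value
    initial_modulus: int


def additive_mod(field: str, modulus: int) -> int:
    """Toy hash: byte sum modulo the modulus factor (order-insensitive)."""
    return sum(field.encode("utf-8")) % modulus


def bkdr_mod(field: str, modulus: int) -> int:
    """BKDR-style hash with seed 131, reduced by the modulus factor."""
    h = 0
    for byte in field.encode("utf-8"):
        h = (h * 131 + byte) % modulus
    return h


def next_modulus(modulus: int) -> int:
    """Hypothetical modulus factor selection function."""
    return modulus + 2


def all_distinct(fields: Sequence[str], cand: Candidate, modulus: int) -> bool:
    values = [cand.hash_fn(f, modulus) for f in fields]
    return len(set(values)) == len(values)


def reselect(fields: Sequence[str], library: Sequence[Candidate],
             current_index: int) -> Optional[Tuple[str, int]]:
    """Try higher-priority candidates; the library is sorted low to high."""
    for cand in library[current_index + 1:]:
        modulus = cand.initial_modulus
        while True:
            if all_distinct(fields, cand, modulus):
                return cand.name, modulus      # claim 2: collision resolved
            modulus = next_modulus(modulus)    # claim 3: pick a new factor
            if modulus > MODULUS_LIMIT:        # claim 4: condition met,
                break                          # move to the next algorithm
    return None  # no algorithm in the library separates the fields
```

For instance, with `library = [Candidate("preset", additive_mod, 13), Candidate("additive", additive_mod, 13), Candidate("bkdr", bkdr_mod, 13)]`, calling `reselect(["ab", "ba"], library, 0)` exhausts the additive candidate, which can never separate two anagrams, and then settles on the BKDR-style candidate once a modulus factor that separates the two fields is reached.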

5. The method of claim 2, wherein the preset zero collision hash algorithm comprises at least one of the following: an SDBM hash algorithm, an RS hash algorithm, a JS hash algorithm and a BKDR hash algorithm; and the preset hash algorithm priority, from low to high, is: the SDBM hash algorithm, the RS hash algorithm, the JS hash algorithm and the BKDR hash algorithm.
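By way of non-limiting illustration, the four string hash algorithms named in claim 5 can be sketched in their commonly published 32-bit forms. The claim does not specify constants or word size, so the seeds and the 32-bit mask below are assumptions taken from those common formulations, and `PRIORITY_LOW_TO_HIGH` is a hypothetical name for the ordering the claim describes.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit word size


def sdbm_hash(data: bytes) -> int:
    h = 0
    for byte in data:
        h = (byte + (h << 6) + (h << 16) - h) & MASK32
    return h


def rs_hash(data: bytes) -> int:
    a, b = 63689, 378551
    h = 0
    for byte in data:
        h = (h * a + byte) & MASK32
        a = (a * b) & MASK32
    return h


def js_hash(data: bytes) -> int:
    h = 1315423911
    for byte in data:
        h ^= ((h << 5) + byte + (h >> 2)) & MASK32
    return h


def bkdr_hash(data: bytes, seed: int = 131) -> int:
    h = 0
    for byte in data:
        h = (h * seed + byte) & MASK32
    return h


# Priority order stated in claim 5, lowest first.
PRIORITY_LOW_TO_HIGH = [
    ("SDBM", sdbm_hash),
    ("RS", rs_hash),
    ("JS", js_hash),
    ("BKDR", bkdr_hash),
]
```

A reselection routine could walk `PRIORITY_LOW_TO_HIGH` starting just above the currently configured algorithm, which matches the requirement that each reselected algorithm has a higher priority than the one it replaces.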

6. A log desensitization apparatus, characterized in that the apparatus comprises: an information extraction module configured to extract a sensitive information field from to-be-desensitized log data; a first calculation module configured to process the sensitive information field based on message digest algorithm 5 (MD5) to obtain an MD5 value of the sensitive information field; a second calculation module configured to perform hash calculation on the sensitive information field based on a preset zero collision hash algorithm to obtain a hash value of the sensitive information field; a desensitization processing module configured to perform desensitization processing on the MD5 value and the hash value to obtain a desensitization result of the to-be-desensitized log data; a data storage module configured to store the sensitive information field, the MD5 value of the sensitive information field, and the desensitization result of the to-be-desensitized log data in a historical correspondence table; and a hash algorithm optimization module configured to periodically traverse the historical correspondence table to check whether there are at least two different sensitive information fields having the same MD5 value; if so, to determine whether the to-be-verified hash values obtained by performing hash calculation on the at least two sensitive information fields with the same MD5 value based on the preset zero collision hash algorithm are the same; if the to-be-verified hash values are the same, to reselect a zero collision hash algorithm from a preset algorithm library such that the hash values of the at least two sensitive information fields with the same MD5 value are different; and to update the preset zero collision hash algorithm with the reselected zero collision hash algorithm.

7. An electronic device, comprising a processor and a computer program, wherein the computer program is executed by the processor to implement the log desensitization method of any one of claims 1-5.

8. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program is executed by a processor to implement the log desensitization method of any one of claims 1-5.