Packet desensitization method and device based on FPGA, equipment and medium
By implementing message desensitization through FPGA hardware and using a Bloom filter module to process messages in parallel, the problem of time-consuming software desensitization is solved, and efficient message desensitization is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- DAWNING INFORMATION IND (BEIJING) CO LTD
- Filing Date
- 2022-10-25
- Publication Date
- 2026-06-26
AI Technical Summary
In existing technologies, message desensitization processing implemented through software has high computational complexity and is time-consuming, which cannot meet the timeliness requirements of message transmission.
The system employs an FPGA-based hardware implementation. Bloom filter modules are planned from the comparison byte lengths and sensitive byte hash values of the message filtering rule set. Messages are input to the Bloom filter modules in parallel to identify the locations of sensitive words, which are then replaced. The FPGA processes the messages in parallel and outputs desensitized messages.
This reduces the processing time during message desensitization, meets the timeliness requirements of message desensitization, and improves processing efficiency.
Figure CN115632866B_ABST
Description
Technical Field
[0001] This invention relates to the field of message desensitization, and in particular to a message desensitization method, apparatus, device and medium based on FPGA (Field Programmable Gate Array).
Background Technology
[0002] In today's internet age, almost everyone obtains various information through the internet, and some important data is also transmitted through the network. Although messages carrying various important data on the network are now encrypted, some sensitive information can still be exposed. This makes it easy for criminals to capture, analyze, or decrypt the messages using this exposed sensitive information, thereby obtaining various important information.
[0003] Existing technologies mainly use software to desensitize messages, which is computationally complex and time-consuming, and cannot meet the timeliness requirements of message transmission.
Summary of the Invention
[0004] This invention provides a message desensitization method, apparatus, device, and medium based on FPGA, to provide a hardware-based message desensitization implementation.
[0005] In a first aspect, embodiments of the present invention provide a message desensitization method based on FPGA implementation, the method comprising:
[0006] Obtain at least one comparison byte length that matches the message filtering rule set, and the sensitive byte hash value corresponding to each comparison byte length;
[0007] In the FPGA, plan a Bloom filter module corresponding to each comparison byte length, and set the standard comparison hash address of each Bloom filter according to the corresponding sensitive byte hash value;
[0008] Through the FPGA, sequentially shift and truncate the received message according to each comparison byte length, input the truncated messages into the Bloom filter modules in parallel, and send the candidate positions identified by each Bloom filter module to the rule module;
[0009] Through the rule module, perform a precise search at each candidate position in the received message based on the message filtering rule set, and feed back the target sensitive locations found to the FPGA;
[0010] Through the FPGA, read the sensitive fields from the received message based on each target sensitive location, replace each sensitive field with a preset desensitized field, and then output the target desensitized message.
[0011] Optionally, obtaining at least one comparison byte length that matches the message filtering rule set, and the sensitive byte hash value corresponding to each comparison byte length, includes:
[0012] Obtain the filtering fields corresponding to each message filtering rule, and determine the comparison byte length of each filtering field;
[0013] Based on the filtering fields corresponding to each comparison byte length, generate the sensitive byte hash value corresponding to each comparison byte length.
[0014] The advantage of this setting is that classifying the filtering fields according to the comparison byte length and calculating the hash value of sensitive bytes can minimize the time spent calculating the hash value of the filtering fields during the message desensitization process and improve work efficiency.
[0015] Optionally, generating the sensitive byte hash value corresponding to each comparison byte length, based on the filtering fields corresponding to each comparison byte length, includes:
[0016] Obtain at least one target filtering field corresponding to the target comparison byte length currently being processed;
[0017] Calculate the hash value corresponding to each target filtering field;
[0018] Obtain the hash values of all target filtering fields, perform a union operation, and use the result of the union operation as the sensitive byte hash value corresponding to the target comparison byte length.
[0019] The advantage of this setting is that taking the union of the hash values for each comparison byte length covers multiple different filtering fields at once, which simplifies calculation and improves efficiency in the subsequent filtering process.
[0020] Optionally, sequentially shifting and truncating the received message according to each comparison byte length through the FPGA, and then inputting the truncated messages in parallel to each Bloom filter module, includes:
[0021] The following operations are performed in parallel using the FPGA for each Bloom filter module:
[0022] Obtain the current comparison byte length that matches the current Bloom filter module, and determine the position of the first character of the received message as the first shift position;
[0023] Starting from the shift position, obtain the truncated message of the current comparison byte length, and input the truncated message into the current Bloom filter module;
[0024] After incrementing the shift position by 1, return to the operation of obtaining the truncated message of the current comparison byte length starting from the shift position, until the last character of the received message has been truncated.
[0025] The advantage of this setting is that by shifting and filtering the received message according to different comparison byte lengths, it can ensure that no character in the message is missed during the filtering process, thus guaranteeing the accuracy of the filtering of the received message.
[0026] Optionally, the FPGA reads sensitive fields from the received message based on the sensitive locations of each target, replaces each sensitive field with a preset desensitized field, and then outputs a target desensitized message, including:
[0027] The FPGA is used to read the sensitive fields corresponding to each target sensitive location in parallel from the received message;
[0028] After replacing each sensitive field in the received message with a preset desensitized field in parallel using an FPGA, the target desensitized message is output.
[0029] The advantage of this setup is that by replacing sensitive fields in parallel through the FPGA and directly outputting the target desensitized message, the desensitized message can be obtained efficiently, meeting the timeliness requirement of message desensitization.
[0030] Optionally, the sensitive fields corresponding to each target sensitive location are read in parallel from the received message using the FPGA, including:
[0031] The comparison byte length corresponding to each target sensitive location is obtained in parallel using the FPGA;
[0032] Based on the comparison byte length corresponding to each target sensitive position, the sensitive fields corresponding to each target sensitive position are read in parallel.
[0033] The advantage of this setup is that by obtaining the comparison byte length and the sensitive field corresponding to each target sensitive position through parallel operations, the processing time during the message desensitization process is reduced, and the timeliness is improved.
[0034] Optionally, feeding back the target sensitive locations found to the FPGA via the rule module includes:
[0035] The rule module feeds back to the FPGA each target sensitive location found and the sensitivity type corresponding to each target sensitive location.
[0036] Replacing each sensitive field in the received message with a preset desensitized field in parallel using the FPGA includes:
[0037] The FPGA obtains the desensitized field corresponding to each sensitive field based on the sensitivity type corresponding to each target sensitive location.
[0038] The sensitive fields are replaced in parallel using an FPGA with their corresponding desensitized fields.
[0039] The advantage of this setup is that the rule module finds the target sensitive location and sensitive type and provides feedback, and the FPGA replaces the sensitive fields according to the sensitive type, thus realizing a customized desensitization process for received messages.
[0040] In a second aspect, embodiments of the present invention also provide a message desensitization device based on FPGA, the device comprising:
[0041] The sensitive word hash value determination module is used to obtain at least one comparison byte length that matches the message filtering rule set, and the sensitive byte hash value corresponding to each comparison byte length.
[0042] The Bloom filter setting module is used to plan, in the FPGA, the Bloom filter modules corresponding to each comparison byte length, and to set the standard comparison hash address of each Bloom filter according to the corresponding sensitive byte hash value.
[0043] The message transmission module is used to sequentially shift and truncate the received message according to each comparison byte length through the FPGA, input the truncated messages in parallel to each Bloom filter module, and send the candidate positions identified by each Bloom filter module to the rule module.
[0044] The target sensitive location lookup module is used to perform, through the rule module, a precise search at each candidate position in the received message according to the message filtering rule set, and to feed back the target sensitive locations found to the FPGA.
[0045] The target desensitized message output module is used to read, through the FPGA, the sensitive fields from the received message according to each target sensitive location, replace each sensitive field with the preset desensitized field, and then output the target desensitized message.
[0046] In a third aspect, embodiments of the present invention also provide an electronic device, the electronic device comprising: at least one processor; and
[0047] A memory communicatively connected to the at least one processor; wherein,
[0048] The memory stores a computer program that can be executed by the at least one processor, which enables the at least one processor to perform the FPGA-based message desensitization method according to any embodiment of the present invention.
[0049] In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium storing computer instructions, which, when executed, cause a processor to implement the FPGA-based message desensitization method described in any embodiment of the present invention.
[0050] The technical solution of this invention provides a hardware-based message desensitization method. This method involves obtaining the hash value of the sensitive byte for each comparison byte length matched by the message filtering rule set, setting a Bloom filter based on the hash value, inputting the processed message into the Bloom filter module to determine the sensitive word location, accurately searching for the target sensitive location using the rule module, feeding back the found target sensitive location to the FPGA, and desensitizing the message using the FPGA. This method ultimately outputs the desensitized message. This approach minimizes the processing time during message desensitization and meets the timeliness requirements of message desensitization.
[0051] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.
Attached Figure Description
[0052] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0053] Figure 1 is a flowchart of a message desensitization method based on FPGA provided according to Embodiment 1 of the present invention;
[0054] Figure 2 is a flowchart of a message desensitization method based on FPGA according to Embodiment 2 of the present invention;
[0055] Figure 3 is a flowchart of a message desensitization method based on FPGA provided according to Embodiment 3 of the present invention;
[0056] Figure 4 is a schematic diagram of a message desensitization device based on FPGA according to Embodiment 4 of the present invention;
[0057] Figure 5 is a schematic diagram of the structure of an electronic device that implements the FPGA-based message desensitization method according to an embodiment of the present invention.
Detailed Implementation
[0058] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0059] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0060] Example 1
[0061] Figure 1 is a flowchart of a message desensitization method based on FPGA provided in Embodiment 1 of the present invention. This embodiment is applicable to desensitizing messages using an FPGA. The method can be executed by an FPGA-based message desensitization device, which can be implemented in hardware and/or software. This device can be configured in a terminal or server with data processing capabilities and works in conjunction with the FPGA and the rule module to achieve hardware-based message desensitization. As shown in Figure 1, the method includes:
[0062] S110. Obtain at least one comparison byte length that matches the message filtering rule set, and the sensitive byte hash value corresponding to each comparison byte length.
[0063] The message filtering rule set includes multiple message filtering rules, each containing one or more sensitive words (also called filtering fields) that need to be filtered. Accordingly, the byte length occupied by the filtering field in each message filtering rule can be calculated.
[0064] The sensitive words may include: words containing user personal information, commercially sensitive information, and special sensitive words set according to user needs.
[0065] In a specific example, if message filtering rule 1 contains a filter field A consisting of 2 characters, then the length of filter field A is determined to be 1 byte. If message filtering rule 2 contains a filter field B consisting of 4 characters, then the length of filter field B is determined to be 2 bytes, and so on.
[0066] Accordingly, by collecting the byte lengths occupied by the filter fields in all message filtering rules, the full set of possible byte lengths can be determined. For example, although filter fields XX and YY have different contents, they both contain 2 characters and therefore correspond to a byte length of 2 bytes. All of these possible byte lengths can be used as comparison byte lengths. That is, the comparison byte lengths are the set of all byte lengths occupied by the sensitive words in the message filtering rule set.
[0067] After the comparison byte lengths are obtained, the comparison byte length to which each filter field in the message filtering rule set belongs can be determined accordingly. Then, based on the one or more filter fields corresponding to each comparison byte length, the sensitive byte hash value corresponding to each comparison byte length can be calculated.
[0068] The sensitive byte hash value of a comparison byte length can be understood as the hash feature carried by the field content of all filter fields corresponding to that comparison byte length.
[0069] A hash operation is a method of creating a small numerical "fingerprint" from any type of data. It compresses a message or data into a digest, reducing the data size and fixing the data format. The hash function scrambles and mixes the data to create a fingerprint called a hash value, which is typically represented as a short, seemingly random string of letters and numbers.
[0070] In an optional implementation of this embodiment, obtaining at least one comparison byte length that matches the message filtering rule set, and the sensitive byte hash value corresponding to each comparison byte length, may include:
[0071] Obtain the filter fields corresponding to each message filtering rule and determine the comparison byte length of each filter field; generate the sensitive byte hash value corresponding to each comparison byte length based on the filter fields corresponding to each comparison byte length.
[0072] The message filtering rules are included in the message filtering rule set, and different message filtering rules correspond to different filtering fields. Those skilled in the art will know that different filtering fields may occupy the same byte length. In this case, the same comparison byte length corresponds to multiple filtering fields that differ in content but have the same number of characters.
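The grouping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `group_by_byte_length` and the sample filter fields are hypothetical, and each filter field is assumed to be available as a byte string whose length is its comparison byte length.

```python
from collections import defaultdict


def group_by_byte_length(filter_fields):
    """Map each comparison byte length to the filter fields of that length."""
    groups = defaultdict(list)
    for field in filter_fields:
        groups[len(field)].append(field)
    return dict(groups)


# Hypothetical filter fields; a real rule set would supply its own.
rules = [b"id", b"pw", b"ssn", b"card"]
groups = group_by_byte_length(rules)

# The distinct byte lengths become the comparison byte lengths.
comparison_byte_lengths = sorted(groups)
```

Here two different fields of 2 bytes share one comparison byte length, so a single Bloom filter module would later cover both.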
[0073] Furthermore, based on the filtering fields corresponding to each comparison byte length, a sensitive byte hash value corresponding to each comparison byte length is generated, which may include:
[0074] Obtain at least one target filter field corresponding to the current target comparison byte length; calculate the hash value corresponding to each target filter field; obtain the hash values of all target filter fields, perform a union operation, and use the union operation result as the sensitive byte hash value corresponding to the target comparison byte length.
[0075] This optional implementation describes how the sensitive byte hash value is calculated, taking one comparison byte length (the target comparison byte length currently being processed) as an example.
[0076] An existing hash algorithm can be used to calculate a hash value of a preset length (e.g., 1Kbit or 1Mbit) for each target filtering field. In a specific application scenario of this embodiment, the message filtering rule set is configured with one filtering field with a length of one byte (the comparison byte length), denoted as filtering field a; two filtering fields with a length of two bytes, denoted as filtering field b and filtering field c; and n filtering fields with a length of three bytes, where n is an integer greater than 2.
[0077] Correspondingly, when the target comparison byte length is one byte, the hash value of the sensitive byte of the target comparison byte length is the hash value of filter field a; when the target comparison byte length is two bytes, the hash value of the sensitive byte of the target comparison byte length is the union of the hash values of filter field b and filter field c.
[0078] That is, after calculating the first hash value of filter field b and the second hash value of filter field c, the first hash value and the second hash value are bitwise ORed to obtain the union result.
[0079] Similarly, when the target comparison byte length is three bytes, the sensitive byte hash value of the target comparison byte length is the union obtained by performing a bitwise OR operation on the hash values of the n filter fields.
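A minimal Python sketch of this union step is given below. It is not the patented hash: the choice of SHA-256, the constant `NUM_HASHES`, and the names `field_hash` and `sensitive_byte_hash` are assumptions made for illustration. Only the 1Kbit preset length and the bitwise OR come from the description.

```python
import hashlib

VECTOR_BITS = 1024  # preset hash length of 1Kbit, as in the example above
NUM_HASHES = 3      # assumption: each field sets up to 3 bits


def field_hash(field: bytes) -> int:
    """Return a VECTOR_BITS-wide integer with a few bits set for one field."""
    value = 0
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(bytes([seed]) + field).digest()
        bit = int.from_bytes(digest[:4], "big") % VECTOR_BITS
        value |= 1 << bit
    return value


def sensitive_byte_hash(fields) -> int:
    """Union (bitwise OR) of the hash values of all fields of one length."""
    result = 0
    for field in fields:
        result |= field_hash(field)
    return result


# Two hypothetical 2-byte filter fields, playing the roles of b and c.
hash_b = field_hash(b"ab")
hash_c = field_hash(b"cd")
union_2_bytes = sensitive_byte_hash([b"ab", b"cd"])
```

With a single field, as for filter field a, the union reduces to that field's own hash value.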
[0080] S120. In the FPGA, plan the Bloom filter modules corresponding to each comparison byte length, and set the standard comparison hash address of each Bloom filter according to the corresponding sensitive byte hash value.
[0081] The FPGA, a type of semi-custom circuit within the category of application-specific integrated circuits (ASICs), is a programmable logic array that effectively addresses the issue of limited gate counts in traditional devices. The basic structure of an FPGA includes programmable input / output units, configurable logic blocks and a digital clock management module, embedded block RAM and routing resources, embedded dedicated hard cores, and low-level embedded functional units. Due to its abundant routing resources, reprogrammability, high integration density, and relatively low investment, FPGAs have been widely used in digital circuit design.
[0082] The Bloom filter comprises a binary vector and a series of random mapping functions; furthermore, the Bloom filter can be used to retrieve whether an element is in a set. That is, the Bloom filter is used to retrieve whether the hash value of the input content matches a pre-written standard comparison hash address within the Bloom filter.
[0083] The matching method can be either consistency (exact) matching or biased (partial) matching. Generating a hash value for a filter field can be understood as setting one or a few bits of a fixed-length all-zero sequence to 1. Therefore, when the standard comparison hash address of a Bloom filter is calculated as the union of the hash values of multiple sensitive fields, whether an input hash value matches the standard comparison hash address can be determined by counting the matching 1 bits. For example, if the number of matching 1 bits is greater than or equal to 2, the two are considered to match.
[0084] In this embodiment, the logic circuits of the Bloom filter modules corresponding to each comparison byte length can be constructed first, and the configuration file corresponding to the logic circuits can be generated. Then, when the FPGA to be used is initialized, the configuration file can be executed to plan the Bloom filter modules corresponding to each comparison byte length in the FPGA.
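The two matching modes can be illustrated with small 8-bit values. This sketch assumes that consistency matching means every 1 bit of the input hash is also set in the standard comparison hash address, and that biased matching means at least a threshold number of 1 bits coincide. The function names and sample values are hypothetical.

```python
def matching_ones(input_hash: int, standard_hash: int) -> int:
    """Count the 1 bits that the input hash shares with the standard hash."""
    return bin(input_hash & standard_hash).count("1")


def exact_match(input_hash: int, standard_hash: int) -> bool:
    """Assumed consistency matching: all 1 bits of the input are present."""
    return input_hash & standard_hash == input_hash


def threshold_match(input_hash: int, standard_hash: int, min_ones: int = 2) -> bool:
    """Assumed biased matching: at least min_ones coinciding 1 bits."""
    return matching_ones(input_hash, standard_hash) >= min_ones


standard = 0b10110100    # union of several field hashes (illustrative)
fully_inside = 0b00110000  # both 1 bits are set in standard
partly_inside = 0b00110001  # two of three 1 bits are set in standard
mostly_outside = 0b01010000  # only one 1 bit is set in standard
```

Under the threshold of 2 used in the example above, `partly_inside` would still be reported as a match even though it fails the exact comparison, which is why a later precise search is needed.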
[0085] By using sensitive byte hash values, the standard comparison hash address of each Bloom filter can be set, and the filter fields that each Bloom filter can recognize can be set.
[0086] It is important to emphasize that in this embodiment, a creative approach is proposed to select Bloom filter modules with different comparison byte lengths for identification of filter fields with different character lengths appearing in the message filtering rule set. This can minimize the possibility of false matches that may occur with the Bloom filter, thereby effectively reducing the amount of computation required for accurate matching in subsequent software.
[0087] Meanwhile, since too many Bloom filters would increase the hardware resource consumption of the FPGA, before planning the Bloom filter modules corresponding to each comparison byte length in the FPGA, the hardware characteristic parameters of the selected FPGA can be analyzed to determine whether the comparison byte lengths need to be merged, for example, merging the 2-byte length into the 4-byte length.
[0088] S130. Through the FPGA, the received message is sequentially shifted and truncated according to each comparison byte length, the truncated messages are input into each Bloom filter module in parallel, and the candidate positions identified by each Bloom filter module are sent to the rule module.
[0089] The received messages are those that require de-identification. Therefore, it is necessary to first check whether the received messages contain pre-defined filter fields through the Bloom filter modules set in the FPGA.
[0090] Since each Bloom filter module corresponds to a set comparison byte length, such as 2 bytes, 3 bytes, or 4 bytes, the received message can be truncated using each comparison byte length as the truncation unit.
[0091] That is, for the 2-byte Bloom filter module A, each time a 2-byte segment of message content is extracted from the received message and input into the Bloom filter module A, the filter fields that match the 2 bytes are identified; for the 3-byte Bloom filter module B, each time a 3-byte segment of message content is extracted from the received message and input into the Bloom filter module B, the filter fields that match the 3 bytes are identified.
[0092] Meanwhile, when identifying filtering fields for Bloom filter modules with the same comparison byte length (e.g., 2 bytes), it is necessary to sequentially input any two adjacent bytes of the message content into the Bloom filter module for comprehensive identification. Therefore, a shift position can be set for each Bloom filter module in the received message. After updating the shift position of the received message each time, the truncated message of the comparison byte length is obtained from the shift position and input into the corresponding Bloom filter module.
[0093] Correspondingly, the process of sequentially shifting and truncating the received message according to each comparison byte length through the FPGA, and then inputting the truncated messages in parallel to each Bloom filter module, is actually a combined serial-parallel process. Truncated messages with different comparison byte lengths are input into different Bloom filter modules in parallel, while the multiple truncated messages corresponding to the same comparison byte length are input into the same Bloom filter module serially.
[0094] The rule module can be a TCAM (ternary content addressable memory) module; furthermore, the TCAM can be used to read relevant fields from the message content, create search keywords, and return the longest matching result, etc.
[0095] The candidate position identified by a Bloom filter module can be: the offset of a truncated message within the received message, recorded when the Bloom filter confirms that the truncated message matches its standard comparison hash address.
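The shift-and-truncate scan of S130 can be modeled in software as below. This is a sequential sketch of what the FPGA would do in parallel across Bloom filter modules. The hash construction (SHA-256, three bits per field), the exact-match comparison, and all names are assumptions for illustration only.

```python
import hashlib

VECTOR_BITS = 1024  # assumed preset hash length
NUM_HASHES = 3      # assumed number of bits set per field


def field_hash(chunk: bytes) -> int:
    """Hash a byte string to a VECTOR_BITS-wide integer with a few bits set."""
    value = 0
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(bytes([seed]) + chunk).digest()
        value |= 1 << (int.from_bytes(digest[:4], "big") % VECTOR_BITS)
    return value


def build_filter(fields) -> int:
    """Standard comparison hash address: OR of the hashes of all fields."""
    standard = 0
    for field in fields:
        standard |= field_hash(field)
    return standard


def candidate_positions(message: bytes, filters: dict) -> list:
    """Return (offset, length) pairs that pass the Bloom filter of that length."""
    candidates = []
    # On the FPGA each length runs in its own module in parallel;
    # within one module the shifted windows are fed in serially.
    for length, standard in filters.items():
        for shift in range(len(message) - length + 1):
            window_hash = field_hash(message[shift:shift + length])
            if window_hash & standard == window_hash:
                candidates.append((shift, length))
    return candidates


message = b"xxpwyyssn"  # hypothetical received message
filters = {2: build_filter([b"pw", b"id"]), 3: build_filter([b"ssn"])}
candidates = candidate_positions(message, filters)
```

Every true occurrence is guaranteed to appear in `candidates`, while some reported offsets may be false positives that only a precise search can rule out.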
[0096] S140. The rule module performs precise searches at each candidate location of the received message according to the message filtering rule set, and feeds back the found target sensitive location to the FPGA.
[0097] The precise search accurately matches the bytes at each candidate position against each message filtering rule in the message filtering rule set. Each message filtering rule is pre-stored in the rule module and is used to check whether the truncated message extracted at a candidate position corresponds exactly to a filter field in the message filtering rule set.
[0098] The target sensitive location is a location at which the rule module has confirmed that a filter field (or sensitive word) is indeed present. In this embodiment, the rule module checks each candidate position against the message filtering rule set. If the truncated message corresponding to a candidate position A is precisely matched by the rule module against a filter field in the message filtering rule set, then candidate position A is designated as a target sensitive location and fed back to the FPGA. If the precise matching for the truncated message corresponding to a candidate position B finds no filter field in the message filtering rule set, then candidate position B is determined to be a false identification by the Bloom filter and can be directly discarded.
[0099] S150. The FPGA reads sensitive fields from the received message according to each target sensitive location, replaces each sensitive field with a preset desensitized field, and then outputs the target desensitized message.
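The precise search of S140 can be sketched as follows. A Python set stands in for the rule module here. The description mentions a TCAM, which this sketch does not model. The function name and the sample data are hypothetical.

```python
def confirm_candidates(message: bytes, candidates, rule_set) -> list:
    """Keep only the candidates whose bytes exactly equal a filter field."""
    targets = []
    for shift, length in candidates:
        if message[shift:shift + length] in rule_set:
            targets.append((shift, length))  # target sensitive location
        # Otherwise the candidate was a Bloom filter false positive; drop it.
    return targets


message = b"xxpwyyssn"                      # hypothetical received message
rule_set = {b"pw", b"ssn"}                  # hypothetical filter fields
candidates = [(2, 2), (4, 2), (6, 3)]       # (4, 2) plays a false positive
targets = confirm_candidates(message, candidates, rule_set)
```

In this example the candidate at offset 4 covers the bytes `yy`, which match no filter field, so it is discarded.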
[0100] The preset desensitized field can be a pre-defined character or meaningless text field, etc.
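[0101] In an optional implementation of this embodiment, reading sensitive fields from the received message according to each target sensitive location through the FPGA, replacing each sensitive field with the preset desensitized field, and then outputting the target desensitized message may include: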
[0102] The FPGA reads the sensitive fields corresponding to each target sensitive location in the received message in parallel; after replacing each sensitive field in the received message with the preset desensitized field in parallel, the target desensitized message is output.
[0103] In this embodiment, in order to fully leverage the speed advantage of FPGA parallel computing, after the rule module feeds back all target sensitive locations, the FPGA can read the sensitive fields corresponding to each target sensitive location in parallel at once, and replace each sensitive field with a preset desensitized field in parallel at once, so as to meet the requirements of message desensitization timeliness.
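The replacement step can be illustrated with a short Python sketch. It assumes the preset desensitized field is a single mask byte repeated to the length of the sensitive field, so that the offsets of other target sensitive locations stay valid. The patent does not fix this choice, and the function name is hypothetical. The loop here is sequential, whereas the FPGA would perform the replacements in parallel.

```python
def desensitize(message: bytes, targets, mask: bytes = b"*") -> bytes:
    """Overwrite each (offset, length) target with a same-length mask."""
    out = bytearray(message)
    for shift, length in targets:
        out[shift:shift + length] = mask * length
    return bytes(out)


message = b"xxpwyyssn"          # hypothetical received message
targets = [(2, 2), (6, 3)]      # target sensitive locations from the rule module
masked = desensitize(message, targets)
```

Because each replacement touches a disjoint, fixed-size region, the replacements are independent of one another, which is what makes a parallel hardware implementation straightforward.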
[0104] The technical solution of this invention provides a hardware-based message desensitization method. This method involves obtaining the filter fields corresponding to the message filtering rules and the comparison byte length of each field, calculating the hash value of each target filter field, performing union processing to generate the corresponding sensitive byte hash value, setting a Bloom filter based on the sensitive byte hash value, inputting the processed message into the Bloom filter module to determine the sensitive word location, and using a rule module to accurately search for the target sensitive location. The found target sensitive location is then fed back to the FPGA, where the message is desensitized, and finally, the desensitized message is output. This method minimizes the processing time during message desensitization and meets the timeliness requirements of message desensitization.
[0105] Example 2
[0106] Figure 2 is a flowchart of a message desensitization method based on FPGA provided in Embodiment 2 of the present invention. This embodiment refines the above embodiment. In this embodiment, the operation of sequentially shifting and truncating the received message according to each comparison byte length through the FPGA, inputting the truncated messages in parallel to each Bloom filter module, and sending the candidate positions identified by each Bloom filter module to the rule module is specified as follows. For each Bloom filter module, the following operations are performed in parallel through the FPGA: obtain the current comparison byte length that matches the current Bloom filter module, and determine the position of the first character of the received message as the first shift position; starting from the shift position, obtain the truncated message of the current comparison byte length, and input the truncated message into the current Bloom filter module; after incrementing the shift position by 1, return to the operation of obtaining the truncated message of the current comparison byte length starting from the shift position, until the last character of the received message has been truncated.
[0107] Correspondingly, as shown in Figure 2, the method includes:
[0108] S210. Obtain at least one comparison byte length that matches the message filtering rule set, and the sensitive byte hash value corresponding to each comparison byte length.
[0109] S220. In the FPGA, plan the Bloom filter modules corresponding to each comparison byte length respectively, and set the standard comparison hash addresses of each Bloom filter according to the sensitive byte hash values respectively.
[0110] In the FPGA, for each Bloom filter module, execute S230 - S260 in parallel:
[0111] That is, if there are 3 Bloom filter modules, the FPGA needs to execute the above S230 - S260 in parallel 3 times with a parallelism of 3.
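As a rough software analogy only (an FPGA instantiates the modules as concurrent hardware rather than threads), the one-task-per-module structure described above could be modelled as follows. The function `scan_with_module` is a hypothetical stand-in for S230 - S260 and, in this sketch, merely counts the segments that a module of a given comparison byte length would see.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_with_module(message: bytes, length: int) -> tuple[int, int]:
    """Hypothetical stand-in for S230-S260: count the segments one module sees."""
    segments = max(len(message) - length + 1, 0)
    return length, segments


def scan_all(message: bytes, lengths: list[int]) -> dict[int, int]:
    """Run one worker per Bloom filter module (one per comparison byte length)."""
    # The degree of parallelism equals the number of modules, e.g. 3 for 3 lengths.
    with ThreadPoolExecutor(max_workers=len(lengths)) as pool:
        results = pool.map(lambda n: scan_with_module(message, n), lengths)
        return dict(results)
```

With three comparison byte lengths, `scan_all(message, [2, 4, 6])` starts three workers, matching the parallelism of 3 in the example above.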
[0112] S230. Obtain the current comparison byte length that matches the current Bloom filter module, and determine the position of the first character of the received packet as the first shift position.
[0113] As described above, each Bloom filter module corresponds to a comparison byte length. Furthermore, when the current Bloom filter module is uniquely determined among all Bloom filter modules, the current comparison byte length (e.g., 2 bytes) that matches the current Bloom filter module can be uniquely determined.
[0114] In this embodiment, before using the current Bloom filter module to perform Bloom filtering on the intercepted packet of every 2 bytes in the received packet in a shifted manner, it is first necessary to determine the initial value of the shift position, and based on this initial value, perform shifted interception successively.
[0115] Specifically, the position where the first character of the received packet is located can be set as the first displacement position, that is, the starting position of the displacement is the first character of the packet.
[0116] S240. Starting from the shift position, obtain the intercepted packet of the current comparison byte length, input the intercepted packet into the current Bloom filter module, and send the shift position identified by the current Bloom filter module to the rule module.
[0117] For example, suppose the packet begins with the Chinese text rendered in English as "A method for desensitizing packets based on FPGA implementation". The starting shift position is then the position of its first character, '一'. If the current comparison byte length is four bytes, the first intercepted packet is "一种".
[0118] In this embodiment, if the current Bloom filter module outputs a hit message for the intercepted packet obtained for the shift position, the FPGA sends the above shift position to the rule module for accurate search.
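The hit test in S240 can be illustrated with a minimal software sketch. It assumes the filter bits are already set and stored in a Python integer; the filter size `M_BITS`, the hash count `K_HASHES`, and the use of salted SHA-256 digests are illustrative assumptions, not details given in this description.

```python
import hashlib

M_BITS = 1024  # assumed number of bits in the filter
K_HASHES = 3   # assumed number of hash functions


def bit_positions(data: bytes) -> list[int]:
    """Derive K_HASHES bit addresses from salted SHA-256 digests (an assumption)."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + data).digest()[:4], "big") % M_BITS
        for i in range(K_HASHES)
    ]


def candidate_positions(message: bytes, length: int, bits: int) -> list[int]:
    """Return every shift position whose segment hits the filter.

    A hit only marks a candidate: Bloom filters can report false positives,
    which is why the positions still go to the rule module for an exact search.
    """
    hits = []
    for shift in range(len(message) - length + 1):
        segment = message[shift:shift + length]
        if all((bits >> pos) & 1 for pos in bit_positions(segment)):
            hits.append(shift)
    return hits
```

Because a Bloom filter has no false negatives, every position that really holds a configured field appears in the result; any extra positions are filtered out later by the exact search.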
[0119] S250. Detect whether the last character of the received message has been captured in the intercepted message. If not, execute S260; otherwise, execute S270.
[0120] In this embodiment, it is necessary to sequentially extract segments from the received message that match the current comparison byte length in a loop and perform Bloom filtering until the comparison of all content in the received message is completed.
[0121] S260. After updating the shift position by 1, return to execute S240.
[0122] For example, suppose the message content is the Chinese text rendered as "a message desensitization method based on FPGA" and the current comparison byte length is four bytes. The first intercepted message at the start of shifting is then the opening segment of that text (rendered "a" in translation). After the shift position is incremented by 1, the process returns and intercepts a segment of the current comparison byte length from the new shift position (rendered "seed" by literal translation). The third interception yields "based on", and so on, until the intercepted message is "method".
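The shift-and-intercept loop of S230 - S260 can be sketched as a small generator. This is an illustration under the assumption that the message is available as a byte string and that positions are counted in bytes; the name `shifted_segments` is hypothetical.

```python
from typing import Iterator


def shifted_segments(message: bytes, length: int) -> Iterator[tuple[int, bytes]]:
    """Yield (shift position, intercepted segment) pairs.

    The shift position starts at the first byte (S230) and advances by 1
    (S260) until the segment contains the last byte of the message (S250).
    """
    shift = 0
    while shift + length <= len(message):
        yield shift, message[shift:shift + length]
        shift += 1
```

For a 6-byte message and a comparison byte length of 4, the generator yields three segments, starting at positions 0, 1 and 2.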
[0123] S270: The rule module performs precise searches at each candidate location of the received message according to the message filtering rule set, and feeds back the found target sensitive locations and the sensitivity types corresponding to each target sensitive location to the FPGA.
[0124] As mentioned earlier, the rules module performs a precise search for each received candidate location according to the message filtering rule set, and feeds back the found target sensitive location to the FPGA.
[0125] Specifically, the filtering fields included in the message filtering rule set can be categorized into different types of filtering field libraries, with each library corresponding to a different sensitivity type. Furthermore, after accurately searching each candidate location, the sensitivity type corresponding to each target sensitive location can be determined based on the filtering field library to which the filtering field found at each target sensitive location belongs.
[0126] The sensitivity types can include sensitive personal information, sensitive industry-specific terms, or sensitive events.
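The exact search of S270 can be sketched as a lookup of each candidate segment in per-type field libraries. The library contents and type names below are invented placeholders, and candidates are assumed to arrive as (position, comparison byte length) pairs.

```python
# Hypothetical filtering field libraries, one per sensitivity type.
FIELD_LIBRARIES: dict[str, set[bytes]] = {
    "personal_information": {b"alice", b"13800"},
    "industry_term": {b"alloy"},
}


def exact_search(
    message: bytes, candidates: list[tuple[int, int]]
) -> list[tuple[int, int, str]]:
    """Confirm (position, length) candidates against the field libraries.

    Returns (position, length, sensitivity type) for real matches and
    silently drops Bloom filter false positives.
    """
    confirmed = []
    for position, length in candidates:
        segment = message[position:position + length]
        for sensitivity_type, fields in FIELD_LIBRARIES.items():
            if segment in fields:
                confirmed.append((position, length, sensitivity_type))
                break
    return confirmed
```

The sensitivity type falls out of the lookup itself: it is the name of the library in which the segment was found.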
[0127] S280. The length of the comparison byte corresponding to each target sensitive position is obtained in parallel through the FPGA, and the sensitive field corresponding to each target sensitive position is read in parallel according to the length of the comparison byte corresponding to each target sensitive position.
[0128] For example, if the FPGA receives a target sensitive position that the rule module returned for a candidate position sent by the 4-byte Bloom filter module, it can read a 4-byte segment of the received message, starting from the target sensitive position, as the sensitive field.
[0129] S290. Using the FPGA, obtain the desensitized field corresponding to each sensitive field based on the sensitivity type corresponding to each target sensitive location.
[0130] The sensitivity type is a classification preset by the system or manually, and covers the categories into which a sensitive field can fall, such as name, gender, or address. The desensitized field corresponding to each sensitive field is likewise preset by the system or manually. Different sensitive fields can use the same desensitized field or different ones; this embodiment imposes no restriction on this.
[0131] S2100. Using the FPGA, replace each sensitive field with its corresponding desensitized field in parallel, and output the target desensitized message.
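Steps S280 - S2100 can be sketched in software as follows. The mapping `DESENSITIZED_FIELDS` and its contents are hypothetical, targets are assumed not to overlap, and the sketch walks them in position order, whereas the FPGA would perform the replacements in parallel.

```python
# Hypothetical preset mapping from sensitivity type to desensitized field.
DESENSITIZED_FIELDS: dict[str, bytes] = {"name": b"***", "address": b"###"}


def desensitize(message: bytes, targets: list[tuple[int, int, str]]) -> bytes:
    """Replace each (position, length, sensitivity type) target.

    The sensitive field is the `length` bytes starting at `position` (S280);
    its replacement is chosen by sensitivity type (S290) and spliced in (S2100).
    """
    out = bytearray()
    cursor = 0
    for position, length, sensitivity_type in sorted(targets):
        out += message[cursor:position]              # copy untouched bytes
        out += DESENSITIZED_FIELDS[sensitivity_type]  # insert the replacement
        cursor = position + length                    # skip the sensitive field
    out += message[cursor:]
    return bytes(out)
```

For instance, `desensitize(b"name=bob;addr=rome", [(5, 3, "name"), (14, 4, "address")])` returns `b"name=***;addr=###"`.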
[0132] The technical solution of this invention provides a hardware-based message desensitization method by obtaining the hash value of sensitive bytes for at least one comparison byte length matched by the message filtering rule set and the length of each comparison byte, setting a Bloom filter based on the hash value of the sensitive bytes, shifting and truncating the message starting from the shift position, inputting the truncated message into the set Bloom filter module to determine the position of sensitive words, and accurately searching for the target sensitive position through the rule module. The found target sensitive position and sensitive field type are fed back to the FPGA, and each sensitive field is replaced in parallel with a preset desensitized field, outputting a desensitized message. This method minimizes the processing time in the message desensitization process and meets the timeliness requirements of message desensitization.
[0133] Example 3
[0134] Figure 3 is a flowchart of an FPGA-based message desensitization method provided in Embodiment 3 of the present invention. As shown in Figure 3, the method includes:
[0135] S310. Obtain the filter fields corresponding to each message filtering rule, and determine the comparison byte length of each filter field.
[0136] S320. Obtain at least one target filter field corresponding to the length of the target comparison byte currently being processed, and calculate the hash value corresponding to each target filter field.
[0137] S330. Obtain the hash values of all target filtering fields, perform a union operation, and use the result of the union operation as the hash value of the sensitive byte corresponding to the target comparison byte length.
[0138] S340. In the FPGA, plan Bloom filter modules corresponding to the length of each comparison byte, and set the standard comparison hash address of each Bloom filter according to the hash value of the sensitive byte.
[0139] S350-S380 are executed in parallel via FPGA for each Bloom filter module:
[0140] S350. Obtain the length of the current comparison byte that matches the current Bloom filter module, and determine the position of the first character of the received message as the first shift position.
[0141] S360. Starting from the shift position, obtain the intercepted message of the current comparison byte length and input the intercepted message into the current Bloom filter module.
[0142] S370. Detect whether the last character of the received message has been captured in the intercepted message. If not, execute S380; otherwise, execute S390.
[0143] S380: After updating the shift position by 1, return to execute S360.
[0144] S390: The rule module performs precise searches at each candidate location of the received message according to the message filtering rule set, and feeds back the found target sensitive locations and the sensitivity types corresponding to each target sensitive location to the FPGA.
[0145] S3100: The FPGA obtains the length of the comparison byte corresponding to each target sensitive position in parallel, and reads the sensitive field corresponding to each target sensitive position in parallel according to the length of the comparison byte corresponding to each target sensitive position.
[0146] S3110: Using the FPGA, obtain the desensitized field corresponding to each sensitive field based on the sensitivity type corresponding to each target sensitive location.
[0147] S3120: The FPGA replaces each sensitive field with its corresponding desensitized field in parallel, and outputs the target desensitized message.
[0148] The technical solution of this invention provides a hardware-based message desensitization method by obtaining the filtering fields corresponding to the message filtering rules and the comparison byte length of each field, calculating the hash value of each target filtering field, and performing union processing to generate the corresponding sensitive byte hash value. A Bloom filter is set based on the sensitive byte hash value, and the message is shifted and truncated starting from the shift position. The truncated message is input to the set Bloom filter module to determine the sensitive word position and accurately search for the target sensitive position through the rule module. The found target sensitive position and sensitive field type are fed back to the FPGA, and each sensitive field is replaced in parallel with a preset desensitization field, outputting the desensitized message. This method minimizes the processing time during message desensitization and meets the timeliness requirements of message desensitization.
[0149] Example 4
[0150] Figure 4 shows an FPGA-based message desensitization device provided in Embodiment 4 of the present invention. As shown in Figure 4, the device includes:
[0151] The sensitive word hash value determination module 410 is used to obtain at least one comparison byte length that matches the message filtering rule set, and the sensitive byte hash value corresponding to each comparison byte length;
[0152] The Bloom filter setting module 420 is used in the FPGA to plan the Bloom filter modules corresponding to the length of each comparison byte, and to set the standard comparison hash address of each Bloom filter according to the hash value of the sensitive byte.
[0153] The message transmission module 430 is used to sequentially shift and truncate the received message according to the length of each comparison byte through the FPGA, and then input it in parallel to each Bloom filter module, and send the candidate positions identified by each Bloom filter module to the rule module.
[0154] The target sensitive location lookup module 440 is used to perform precise lookup in each candidate location of the received message according to the message filtering rule set through the rule module, and to feed back the found target sensitive location to the FPGA.
[0155] The target desensitization message output module 450 is used to read sensitive fields in the received message according to the sensitive positions of each target through the FPGA, replace each sensitive field with the preset desensitization field, and then output the target desensitization message.
[0156] The technical solution of this invention obtains at least one comparison byte length matched by the message filtering rule set and the sensitive byte hash value for each comparison byte length, sets a Bloom filter based on the sensitive byte hash value, inputs the processed message into the Bloom filter module to determine the location of sensitive words, and uses the rule module to accurately search for the target sensitive location. The found target sensitive location is fed back to the FPGA, and the FPGA desensitizes the message, finally outputting the target desensitized message. This solves the problem of poor accuracy in existing message desensitization technologies and achieves the beneficial effect of improving the accuracy of message desensitization.
[0157] Based on the above embodiments, the sensitive word hash value determination module 410 is further configured to:
[0158] Obtain the filtering fields corresponding to each message filtering rule, and determine the length of the comparison bytes for each filtering field;
[0159] Based on the filtering fields corresponding to the length of each compared byte, generate a sensitive byte hash value corresponding to the length of each compared byte;
[0160] Obtain at least one target filtering field corresponding to the byte length of the target being processed;
[0161] Calculate the hash value corresponding to each target filter field;
[0162] Obtain the hash values of all target filtering fields, perform a union operation, and use the result of the union operation as the hash value of the sensitive byte corresponding to the target comparison byte length.
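The grouping and union steps above can be illustrated with a short sketch. It assumes that each field's hash value is a one-hot bit mask and that "union" means a bitwise OR of those masks; the bit width `M_BITS` and the SHA-256-based `field_hash` are illustrative choices, not taken from this description.

```python
import hashlib
from collections import defaultdict

M_BITS = 256  # assumed width of the hash bit vector


def field_hash(field: bytes) -> int:
    """Map a filtering field to a one-hot bit mask (a single assumed hash function)."""
    digest = hashlib.sha256(field).digest()
    return 1 << (int.from_bytes(digest[:4], "big") % M_BITS)


def sensitive_byte_hashes(fields: list[bytes]) -> dict[int, int]:
    """Group fields by comparison byte length and OR their hash values together."""
    by_length: dict[int, int] = defaultdict(int)
    for field in fields:
        by_length[len(field)] |= field_hash(field)
    return dict(by_length)
```

Each key of the returned dictionary is a comparison byte length, and each value is the sensitive byte hash value that would configure the Bloom filter module of that length.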
[0163] Based on the above embodiments, the message transmission module 430 further includes:
[0164] The shift position determination unit is used to obtain the current comparison byte length that matches the current Bloom filter module, and to determine the position of the first character of the received message as the first shift position;
[0165] The message interception unit is used to obtain the intercepted message with the current comparison byte length starting from the shift position, and input the intercepted message into the current Bloom filter module;
[0166] The shift position update unit is used to update the shift position by incrementing it by 1, and then return to execute the operation of obtaining the length of the currently compared bytes starting from the shift position, until the last character of the received message is captured.
[0167] Based on the above embodiments, the target desensitization message output module 450 includes:
[0168] The precise search unit is used to: perform precise searches in each candidate position of the received message based on the message filtering rule set by the rule module;
[0169] The feedback unit is used to feed back the found sensitive locations of each target and the sensitivity type corresponding to each sensitive location to the FPGA through the rule module;
[0170] The byte length acquisition unit is used to: acquire the comparison byte length corresponding to each target sensitive position in parallel via the FPGA;
[0171] The sensitive field reading unit is used to read the sensitive fields corresponding to each target sensitive position in parallel, according to the comparison byte length corresponding to each target sensitive position.
[0172] The sensitive field replacement unit is used to replace each sensitive field with a corresponding desensitized field in parallel via the FPGA.
[0173] The FPGA-based message desensitization device provided in this embodiment of the invention can execute the FPGA-based message desensitization method provided in any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
[0174] Example 5
[0175] Figure 5 shows a schematic diagram of an electronic device 410 that can be used to implement embodiments of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the invention described and / or claimed herein.
[0176] As shown in Figure 5, the electronic device 410 includes at least one processor 420 and a memory, such as a read-only memory (ROM) 430 or a random access memory (RAM) 440, communicatively connected to the at least one processor 420. The memory stores computer programs executable by the at least one processor. The processor 420 can perform various appropriate actions and processes based on the computer program stored in the ROM 430 or loaded into the RAM 440 from storage unit 490. The RAM 440 may also store various programs and data required for the operation of the electronic device 410. The processor 420, ROM 430, and RAM 440 are interconnected via a bus 450. An input / output (I / O) interface 460 is also connected to the bus 450.
[0177] Multiple components in electronic device 410 are connected to I / O interface 460, including: input unit 470, such as keyboard, mouse, etc.; output unit 480, such as various types of monitors, speakers, etc.; storage unit 490, such as disk, optical disk, etc.; and communication unit 4100, such as network card, modem, wireless transceiver, etc. Communication unit 4100 allows electronic device 410 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.
[0178] Processor 420 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of processor 420 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. Processor 420 performs the various methods and processes described above, such as message desensitization methods implemented on an FPGA.
[0179] The method includes:
[0180] Obtain at least one matching byte length that matches the message filtering rule set, and the sensitive byte hash value corresponding to each matching byte length;
[0181] In the FPGA, a Bloom filter module corresponding to the length of each comparison byte is planned, and the standard comparison hash address of each Bloom filter is set according to the hash value of the sensitive byte.
[0182] The received message is sequentially shifted and truncated according to the length of each comparison byte by the FPGA, and then input into each Bloom filter module in parallel. The candidate positions identified by each Bloom filter module are sent to the rule module.
[0183] The rules module performs precise searches at each candidate location in the received message based on the message filtering rule set, and feeds back the found target sensitive locations to the FPGA.
[0184] The FPGA reads sensitive fields from the received message based on the sensitive locations of each target, replaces each sensitive field with a preset desensitized field, and then outputs the target desensitized message.
[0185] In some embodiments, the message desensitization method based on a Field-Programmable Gate Array (FPGA) can be implemented as a computer program tangibly contained in a computer-readable storage medium, such as storage unit 490. In some embodiments, part or all of the computer program can be loaded into and / or installed on the electronic device 410 via ROM 430 and / or communication unit 4100. When the computer program is loaded into RAM 440 and executed by processor 420, one or more steps of the FPGA-based message desensitization method described above can be performed. Alternatively, in other embodiments, processor 420 can be configured to execute the FPGA-based message desensitization method by any other suitable means (e.g., by means of firmware).
[0186] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.
[0187] Computer programs used to implement the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, such that when executed by the processor, the computer programs cause the functions / operations specified in the flowcharts and / or block diagrams to be performed. The computer programs may be executed entirely on a machine, partially on a machine, or as a standalone software package, partially on a machine and partially on a remote machine, or entirely on a remote machine or server.
[0188] In the context of this invention, a computer-readable storage medium can be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination thereof. Alternatively, a computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
[0189] To provide interaction with a user, the systems and techniques described herein can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the electronic device. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).
[0190] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or computing systems that include middleware components (e.g., application servers), or computing systems that include frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
[0191] A computing system can include clients and servers. Clients and servers are generally located far apart and typically interact through communication networks. The client-server relationship is created by computer programs running on the respective computers and having a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a hosting product within the cloud computing service system to address the shortcomings of traditional physical hosts and VPS services, such as high management difficulty and weak business scalability.
[0192] It should be understood that the various forms of processes shown above can be used, with steps reordered, added, or deleted. For example, the steps described in this invention can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution of this invention can be achieved, and this is not limited herein.
[0193] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.
Claims
1. A message desensitization method based on a Field-Programmable Gate Array (FPGA), characterized in that it comprises: obtaining at least one comparison byte length that matches the message filtering rule set, and the sensitive byte hash value corresponding to each comparison byte length, wherein the message filtering rule set includes multiple message filtering rules, and each message filtering rule contains one or more filtering fields; planning, in the FPGA, a Bloom filter module corresponding to each comparison byte length, and setting the standard comparison hash address of each Bloom filter according to the sensitive byte hash value; sequentially shifting and truncating the received message according to each comparison byte length by the FPGA, inputting the truncated messages into each Bloom filter module in parallel, and sending the candidate positions identified by each Bloom filter module to the rule module; performing, by the rule module, a precise search at each candidate position in the received message based on the message filtering rule set, and feeding back the found target sensitive positions to the FPGA; and reading, by the FPGA, sensitive fields from the received message based on each target sensitive position, replacing each sensitive field with a preset desensitized field, and then outputting the target desensitized message.
2. The method according to claim 1, characterized in that, Obtain at least one matching byte length that matches the message filtering rule set, and the sensitive byte hash value corresponding to each matching byte length, including: Obtain the filtering fields corresponding to each message filtering rule, and determine the length of the comparison bytes for each filtering field; Based on the filtering fields corresponding to the length of each compared byte, generate a sensitive byte hash value corresponding to the length of each compared byte.
3. The method according to claim 2, characterized in that, Based on the filtering fields corresponding to each comparison byte length, a sensitive byte hash value corresponding to each comparison byte length is generated, including: Obtain at least one target filtering field corresponding to the byte length of the target being processed; Calculate the hash value corresponding to each target filter field; Obtain the hash values of all target filtering fields, perform a union operation, and use the result of the union operation as the hash value of the sensitive byte corresponding to the target comparison byte length.
4. The method according to claim 1, characterized in that, The received message is sequentially shifted and truncated according to the length of each comparison byte using the FPGA, and then input in parallel to each Bloom filter module, including: The following operations are performed in parallel using the FPGA for each Bloom filter module: Get the current comparison byte length that matches the current Bloom filter module, and determine the position of the first character of the received message as the first shift position; Starting from the shift position, obtain the truncated message of the current comparison byte length, and input the truncated message into the current Bloom filter module; After updating the shift position by incrementing by 1, return to execute the operation of extracting the message from the shift position to obtain the current length of the compared bytes, until the last character of the received message is extracted.
5. The method according to any one of claims 1-4, characterized in that reading, by the FPGA, sensitive fields from the received message based on each target sensitive position, replacing each sensitive field with a preset desensitized field, and then outputting the target desensitized message includes: reading, by the FPGA, the sensitive fields corresponding to each target sensitive position in parallel from the received message; and, after replacing each sensitive field in the received message with a preset desensitized field in parallel by the FPGA, outputting the target desensitized message.
6. The method according to claim 5, characterized in that, The FPGA reads the sensitive fields corresponding to each target sensitive location in parallel from the received message, including: The length of the comparison bytes corresponding to each target sensitive location is obtained in parallel using the FPGA; Based on the comparison byte length corresponding to each target sensitive position, the sensitive fields corresponding to each target sensitive position are read in parallel.
7. The method according to claim 5, characterized in that, The rules module feeds back the identified sensitive locations to the FPGA, including: The rules module feeds back the found sensitive locations of each target and the sensitivity type corresponding to each sensitive location to the FPGA. The FPGA is used to replace each sensitive field in the received message with a preset de-identified field in parallel, including: The FPGA obtains the desensitized field corresponding to each sensitive field based on the sensitivity type corresponding to each sensitive location of the target. The sensitive fields are replaced in parallel using an FPGA with their corresponding desensitized fields.
8. A message desensitization device based on a Field-Programmable Gate Array (FPGA), characterized in that it comprises: a sensitive word hash value determination module, used to obtain at least one comparison byte length that matches the message filtering rule set, and the sensitive byte hash value corresponding to each comparison byte length, wherein the message filtering rule set includes multiple message filtering rules, and each message filtering rule contains one or more filtering fields; a Bloom filter setting module, used to plan, in the FPGA, the Bloom filter module corresponding to each comparison byte length, and to set the standard comparison hash address of each Bloom filter according to the sensitive byte hash value; a message transmission module, used to sequentially shift and truncate the received message according to each comparison byte length through the FPGA, input the truncated messages in parallel to each Bloom filter module, and send the candidate positions identified by each Bloom filter module to the rule module; a target sensitive location lookup module, used to perform, through the rule module, a precise search at each candidate position in the received message according to the message filtering rule set, and to feed back the found target sensitive positions to the FPGA; and a target desensitization message output module, used to read, through the FPGA, sensitive fields from the received message according to each target sensitive position, replace each sensitive field with the preset desensitized field, and then output the target desensitized message.
9. An electronic device, characterized in that, The electronic device includes: At least one processor; and A memory communicatively connected to the at least one processor; wherein, The memory stores a computer program that can be executed by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the message desensitization method based on FPGA as described in any one of claims 1-7.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores computer instructions that are used to cause a processor to execute the FPGA-based message desensitization method according to any one of claims 1-7.