Network attack detection method, device and equipment based on FPGA
By compiling regular expressions into a three-stage pipeline and deploying them on different memories of an FPGA, efficient network attack detection is achieved, solving the problem of low detection efficiency in existing technologies and meeting the requirements for high performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NSFOCUS TECH
- Filing Date
- 2026-04-15
- Publication Date
- 2026-06-19
AI Technical Summary
Existing network attack detection methods cannot meet the requirements for high performance and have low detection efficiency, especially under large-scale network traffic, where it is difficult to complete the matching of a large number of rules at the millisecond level.
Regular expressions are compiled into a preset hash set, a first nondeterministic finite automaton set, and a second nondeterministic finite automaton set, and a three-stage pipeline is constructed for matching. The preset hash set is used to quickly filter out invalid pattern strings. The first nondeterministic finite automaton set is deployed in on-chip memory as a fast path to process high-frequency pattern strings, and the second nondeterministic finite automaton set is deployed in external memory as a slow path to process complex pattern strings.
By combining fast and slow path matching and step-by-step filtering, the processing overhead and resource consumption of pattern string and regular expression matching on the FPGA are significantly reduced, the efficiency of network attack detection is improved, and the high-performance requirements are met.
Figure CN122247733A_ABST
Description
Technical Field
[0001] This application relates to the field of network security technology, and in particular to FPGA-based network attack detection methods, devices, and equipment.
Background Technology
[0002] Network attack detection relies heavily on rule-based detection engines to identify known attack patterns and anomalous behaviors, with feature string matching being one of the core methods. As network traffic volume and complexity continue to grow, especially in scenarios with tens or hundreds of gigabytes of traffic, network attack detection needs to complete the matching of a large number of rules at the millisecond or even microsecond level. This places extremely high demands on the real-time processing capabilities of rule-based detection engines.
[0003] In related technologies, regular expressions are widely used to define flexible and complex network attack characteristics in the description of network attack rules. Although regular expressions have powerful pattern expression capabilities, their performance overhead often becomes a system bottleneck when performing large-scale, high-frequency matching. For example, in network attack detection scenarios, each network data stream may need to be matched with hundreds or thousands of regular expression rules in turn. During the matching process, the performance of the system detection engine will be a huge bottleneck, resulting in low detection efficiency.
[0004] Therefore, network attack detection methods in related technologies cannot meet the requirements for high performance and have low detection efficiency.
Summary of the Invention
[0005] The purpose of this application is to provide a network attack detection method, apparatus, and device based on FPGA, which solves the problem that network attack detection methods in related technologies cannot meet high performance requirements and have low detection efficiency.
[0006] This application provides a network attack detection method based on FPGA, applied to a field-programmable gate array (FPGA), the method comprising:
receiving a network request and extracting the pattern string to be matched from the network request;
matching the pattern string to be matched with a preset hash set, where the preset hash set includes the hash values of multiple decision substrings, which are used to characterize the key features of the regular expressions;
if it is determined that the pattern string to be matched fails to match the preset hash set, matching the pattern string to be matched with the first nondeterministic finite automaton set to obtain the first matching result, where the first nondeterministic finite automaton set is compiled from the first regular expression rule set, and the regular expressions in the first regular expression rule set do not contain decision substrings;
if it is determined that the pattern string to be matched successfully matches the preset hash set, matching the pattern string to be matched with the second nondeterministic finite automaton set to obtain the second matching result, where the second nondeterministic finite automaton set is compiled from the second regular expression rule set, and the first regular expression rule set is a subset of the second regular expression rule set;
if the first matching result indicates that the pattern string to be matched is successfully matched with the first nondeterministic finite automaton set, matching the pattern string to be matched with the second nondeterministic finite automaton set to obtain a third matching result; and
determining, based on the first matching result, the second matching result, or the third matching result, the network attack detection result corresponding to the network request.
[0007] In this embodiment, regular expressions are compiled into three sets: a preset hash set, a first nondeterministic finite automaton set, and a second nondeterministic finite automaton set. A three-stage pipeline of "decision substring pre-filtering - first nondeterministic finite automaton set matching - second nondeterministic finite automaton set matching" is constructed for regular expression matching. The decision substring pre-filtering stage uses the preset hash set to quickly filter out most invalid pattern strings that do not contain decision substrings. The first nondeterministic finite automaton set is deployed in on-chip memory as a fast path to handle frequently occurring or simple pattern strings. The second nondeterministic finite automaton set is deployed in external memory as a slow path to handle complex pattern strings. Through collaborative fast-path and slow-path matching and step-by-step filtering, this embodiment enables most network requests to be matched and filtered via the fast path, significantly reducing the processing overhead and resource consumption of pattern string and regular expression matching on the FPGA, meeting the high-performance requirements of the rule detection engine, and improving the detection efficiency of network attacks.
[0008] Other features and advantages of this application will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the application. The objectives and other advantages of this application may be realized and obtained by means of the structures particularly pointed out in the written description, claims, and drawings.
Attached Figure Description
[0009] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the embodiments of this application will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0010] Figure 1 is a schematic diagram illustrating an application scenario according to an embodiment of this application;
Figure 2 is a schematic diagram of the overall process of an FPGA-based network attack detection method according to an embodiment of this application;
Figure 3 is a schematic diagram illustrating the classification and compilation process of regular expressions according to an embodiment of this application;
Figure 4 is a schematic diagram of the process of obtaining a preset hash set using the Bloom filtering algorithm according to an embodiment of this application;
Figure 5 is a flowchart illustrating step 202 according to an embodiment of this application;
Figure 6 is a schematic diagram illustrating the matching of a pattern string to be matched with a preset hash set according to an embodiment of this application;
Figure 7 is a schematic diagram of a process for matching a pattern string to be matched with a first nondeterministic finite automaton set according to an embodiment of this application;
Figure 8 is a schematic diagram of the overall matching process of a nondeterministic finite automaton according to an embodiment of this application;
Figure 9 is a flowchart illustrating step 702 according to an embodiment of this application;
Figure 10 is a schematic diagram illustrating the process of generating a finite span successor state set according to an embodiment of this application;
Figure 11 is a flowchart illustrating step 703 according to an embodiment of this application;
Figure 12 is a schematic diagram of the state transition diagram of a nondeterministic finite automaton according to an embodiment of this application;
Figure 13 is a schematic diagram of an FPGA-based network attack detection device according to an embodiment of this application;
Figure 14 is a schematic diagram of an electronic device according to an embodiment of this application.
Detailed Implementation
[0011] To enable those skilled in the art to better understand the technical solutions of this application, the technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings.
[0012] It should be noted that the terms "first," "second," etc., used in the specification and claims of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
[0013] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings.
[0014] The following explanations of some terms used in the embodiments of this application are provided to facilitate understanding by those skilled in the art.
[0015] (1) FPGA: Field Programmable Gate Array.
[0016] (2) DFA: Deterministic Finite Automaton.
[0017] (3) NFA: Nondeterministic Finite Automaton.
[0018] (4) PCRE (Perl Compatible Regular Expressions) is a regular expression library written in C. It implements most of the regular expression features of Perl, so it adapts well to software that expects Perl syntax. PCRE is widely used in open-source software such as the Apache HTTP Server and the PHP scripting language. It implements two regular expression engines: NFA (Nondeterministic Finite Automaton) and DFA (Deterministic Finite Automaton). In practice, PCRE's NFA mode is the most common choice because of its flexibility and matching power. However, if performance issues are encountered, especially with regular expressions that may cause a lot of backtracking, the DFA mode can be considered.
[0019] (5) Hyperscan is a high-performance regular expression library developed by Intel, specifically designed to run on the x86 platform and supporting Perl Compatible Regular Expression (PCRE) syntax. It can match multiple sets of regular expressions simultaneously and supports streaming operations. As open-source software, Hyperscan is released under the BSD license and provides a flexible C API and multiple operating modes to adapt to different network scenarios.
[0020] Hyperscan's workflow is mainly divided into two parts: compile time and runtime. During compile time, Hyperscan's compiler accepts regular expressions as input and, based on the characteristics of the Intel architecture platform, user-defined patterns, and pattern characteristics, generates a corresponding pattern database through a complex graph analysis and optimization process. This database can be serialized and stored in memory for later use.
[0021] At runtime, Hyperscan uses a runtime library developed in C. The application must pre-allocate a "scratch" space that holds temporary information during scanning. The compiled database is then passed to Hyperscan's scanning API, which triggers the internal matching engines (such as NFA or DFA) to perform matching. Hyperscan utilizes SIMD instructions provided by Intel processors to accelerate these engines and passes the matching results to the user application via user-provided callback functions.
[0022] Network attack detection relies heavily on rule-based detection engines to identify known attack patterns and anomalous behaviors, with feature string matching being one of the core methods. As network traffic volume and complexity continue to grow, especially in scenarios with tens or hundreds of gigabytes of traffic, network attack detection needs to complete the matching of a large number of rules at the millisecond or even microsecond level. This places extremely high demands on the real-time processing capabilities of rule-based detection engines.
[0023] In related technologies, regular expressions are widely used to define flexible and complex network attack characteristics in the description of network attack rules. Although regular expressions have powerful pattern expression capabilities, their performance overhead often becomes a system bottleneck when performing large-scale, high-frequency matching. For example, in network attack detection scenarios, each network data stream may need to be matched with hundreds or thousands of regular expression rules in turn. During the matching process, the performance of the system detection engine will be a huge bottleneck, resulting in low detection efficiency.
[0024] Therefore, network attack detection methods in related technologies cannot meet the requirements for high performance and have low detection efficiency.
[0025] In view of this, embodiments of this application provide a network attack detection method, apparatus, and device based on FPGA, which solves the problem that network attack detection methods in related technologies cannot meet high performance requirements and have low detection efficiency.
[0026] First, regular expressions are compiled into three sets: a preset hash set, a first nondeterministic finite automaton set, and a second nondeterministic finite automaton set. A three-stage pipeline of "decision substring pre-filtering - first nondeterministic finite automaton set matching - second nondeterministic finite automaton set matching" is constructed for matching regular expressions. Then, the preset hash set is used to quickly filter out most invalid pattern strings that do not contain decision substrings. The first nondeterministic finite automaton set is deployed in on-chip memory as a fast path to handle frequently occurring or simple pattern strings. The second nondeterministic finite automaton set is deployed in external memory as a slow path to handle complex pattern strings.
[0027] This application embodiment achieves the goal of fast and slow path collaborative matching and step-by-step filtering, enabling most network requests to be matched and filtered via fast paths. This significantly reduces the processing overhead and resource consumption of pattern string and regular expression matching on the FPGA, meets the high-performance requirements of the rule detection engine, and improves the detection efficiency of network attacks.
[0028] After introducing the design concept of the embodiments of this application, the following is a brief introduction to the application scenarios to which the technical solutions of the embodiments of this application can be applied. It should be noted that the application scenarios described below are only for illustrating the embodiments of this application and are not intended to limit the scope. In specific implementation, the technical solutions provided by the embodiments of this application can be flexibly applied according to actual needs.
[0029] Figure 1 is a schematic diagram of an application scenario according to an embodiment of this application.
[0030] As shown in Figure 1, this application environment may include, for example, a storage system 10, a server 20, and an electronic device 30. The electronic device 30 can be any suitable electronic device used for network access, including but not limited to computers, laptops, or other types of devices. The storage system 10 is capable of storing accessed data resources, such as traffic packets, network requests, etc.
[0031] It should be noted that the electronic devices 30 shown in Figure 1 can also communicate with each other via network 40 (e.g., between 30_1 and 30_2 or 30_N). Network 40 can be a network for information transmission in a broad sense, and may include one or more communication networks, such as wireless communication networks, the Internet, private area networks, local area networks, metropolitan area networks, wide area networks, or cellular data networks.
[0032] The description in this application details only a single server or electronic device. However, those skilled in the art should understand that the illustrated single server 20, electronic device 30, and storage system 10 or 11 are intended to illustrate that the technical solutions of this application involve the operation of electronic devices, servers, and storage systems. The detailed description of a single electronic device, server, and storage system is at least for ease of explanation and does not imply any limitation on the number, type, or location of the electronic devices and servers. It should be noted that adding additional modules to or removing individual modules from the illustrated environment does not change the underlying concept of the exemplary embodiments of this application. Furthermore, although Figure 1 shows, for ease of explanation, a bidirectional arrow from the storage system to the server, those skilled in the art will understand that the sending and receiving of the aforementioned data can also be achieved through the network 40.
[0033] The storage system 10 can be, for example, a cache system, hard disk storage, memory storage, etc. Of course, the methods provided in this application embodiment are not limited to the application scenario shown in Figure 1 and can also be used in other possible application scenarios; the embodiments of this application do not impose limitations. The functions that each device in the application scenario shown in Figure 1 can achieve will be described in subsequent method embodiments and will not be elaborated on here.
[0034] Server 20 can be a single server, a server cluster consisting of several servers, or a cloud computing center. Server 20 can be a standalone physical server, a server cluster consisting of multiple physical servers, or a distributed system. It can also provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, and domain name services. To further illustrate the technical solutions provided in the embodiments of this application, a detailed description is provided below in conjunction with the accompanying drawings and specific implementation methods. Although the embodiments of this application provide method operation steps as shown in the following embodiments or drawings, the method may include more or fewer operation steps based on conventional or non-inventive methods. In steps where there is no logically necessary causal relationship, the execution order of these steps is not limited to the execution order provided in the embodiments of this application.
[0035] Referring to Figure 2, which is a schematic diagram of the overall process of the FPGA-based network attack detection method provided in this application embodiment, the method includes the following steps:
In step 201, a network request is received and the pattern string to be matched is extracted from the network request.
[0036] In step 202, the pattern string to be matched is matched with a preset hash set.
[0037] In step 203, if it is determined that the pattern string to be matched fails to match the preset hash set, the pattern string to be matched is matched with the first nondeterministic finite automaton set to obtain the first matching result.
[0038] In step 204, if it is determined that the pattern string to be matched successfully matches the preset hash set, the pattern string to be matched is matched with the second nondeterministic finite automaton set to obtain the second matching result.
[0039] In step 205, if the first matching result indicates that the pattern string to be matched is successfully matched with the first set of nondeterministic finite automata, then the pattern string to be matched is matched with the second set of nondeterministic finite automata to obtain the third matching result.
[0040] In step 206, the network attack detection result corresponding to the network request is determined based on the first matching result, the second matching result, or the third matching result.
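The control flow of steps 202 to 206 can be sketched in software as follows. This is only an illustrative Python sketch, not the FPGA implementation: the function name `detect` and the three callables are hypothetical stand-ins for the preset hash set lookup, the first nondeterministic finite automaton set, and the second nondeterministic finite automaton set. It also assumes that a failed first matching result means "no attack detected", which the text implies but does not state outright.

```python
from typing import Callable

Matcher = Callable[[str], bool]

def detect(pattern: str, hash_match: Matcher,
           fast_match: Matcher, slow_match: Matcher) -> bool:
    """Return True when the pattern string is reported as an attack.

    hash_match, fast_match and slow_match are hypothetical stand-ins for
    the preset hash set, the first NFA set and the second NFA set.
    """
    if hash_match(pattern):
        # Step 204: prefilter hit, match against the second NFA set.
        return slow_match(pattern)      # second matching result
    # Step 203: prefilter miss, match against the first NFA set.
    if not fast_match(pattern):         # first matching result
        # Assumption: a fast-path miss ends detection with "no attack".
        return False
    # Step 205: a fast-path hit is confirmed against the second NFA set.
    return slow_match(pattern)          # third matching result
```

With this shape, the slow path is reached only after a prefilter hit or a fast-path hit, which is where the reduction in external-memory accesses described above would come from.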
[0041] Before matching the pattern string, this embodiment divides all existing regular expressions into three sets: a preset hash set, a first regular expression rule set, and a second regular expression rule set.
[0042] The preset hash set contains a decision substring in the regular expression. This decision substring is used to characterize the key features of the regular expression. The preset hash set includes the hash values of multiple decision substrings.
[0043] For example, for the regular expression Digest\x20[^\x0d\x0a] nc=\w{20}, Hyperscan is used to extract Digest as the decision substring. If the decision substring does not match, the full regular expression cannot match either.
[0044] The regular expressions in the first regular expression rule set do not contain decision substrings. Compiling the first regular expression rule set yields the first nondeterministic finite automaton set.
[0045] The second regular expression rule set contains all regular expressions. The first regular expression rule set is a subset of the second regular expression rule set. Compiling the second regular expression rule set yields the second nondeterministic finite automaton set.
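The classification described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Rule` type, the `classify` function, and the sample rules are hypothetical, and the decision substring is assumed to have been extracted already (the text uses Hyperscan for that step, which is not reproduced here).

```python
from typing import List, NamedTuple, Optional, Set, Tuple

class Rule(NamedTuple):
    rule_id: int
    regex: str
    # None when no decision substring could be extracted for the rule.
    decision_substring: Optional[str]

def classify(rules: List[Rule]) -> Tuple[Set[str], List[Rule], List[Rule]]:
    """Split rules into the three sets used before compilation."""
    # Decision substrings whose hashes will populate the preset hash set.
    decision_substrings = {r.decision_substring for r in rules
                           if r.decision_substring is not None}
    # First rule set: only rules without a decision substring.
    first_rule_set = [r for r in rules if r.decision_substring is None]
    # Second rule set: every rule, so the first set is a subset of it.
    second_rule_set = list(rules)
    return decision_substrings, first_rule_set, second_rule_set

# Hypothetical sample rules, for illustration only.
RULES = [
    Rule(1, r"select\s+\w+\s+from", "select"),
    Rule(2, r"\d{3}-\d{4}", None),
    Rule(3, r"union\s+all", "union"),
]
```

Calling `classify(RULES)` here would place rule 2 in the first rule set and all three rules in the second rule set.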
[0046] It should be added that the classification and compilation process of regular expressions is shown in Figure 3. The preset hash set and the first nondeterministic finite automaton set are deployed in the on-chip memory of the FPGA, and the second nondeterministic finite automaton set is deployed in the external memory of the FPGA, such as DDR. The first nondeterministic finite automaton set includes one nondeterministic finite automaton. The second nondeterministic finite automaton set includes multiple nondeterministic finite automata, each of which corresponds one-to-one with a regular expression in the second regular expression rule set.
[0047] In one possible implementation, the embodiments of this application employ a Bloom filter algorithm to obtain the preset hash set. As shown in Figure 4, the process includes the following steps:
In step 401, an m-bit initial bit set is created, and the value of each position in the initial bit set is set to the initial value.
[0048] In step 402, for each decision substring, k preset hash functions are used to hash the decision substring, and the initial bit set is updated according to the obtained k hash values to obtain the hash result corresponding to the decision substring.
[0049] In step 403, the hash results corresponding to multiple decision substrings are summarized to obtain a preset hash set.
[0050] For example, create an m-bit initial bit set BitSet with all bits initialized to 0, select k different hash functions, and hash each decision substring in the decision substring set k times. For each hash value, set the corresponding position in BitSet to 1. The resulting bit set is the preset hash set.
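Steps 401 to 403 can be sketched in Python as follows. The text does not say which k hash functions are used, so deriving them from SHA-256 with an index prefix is purely an assumption made for illustration. The names `bit_positions` and `build_preset_hash_set` are likewise hypothetical, and the bit set is modeled as a Python integer.

```python
import hashlib
from typing import Iterable, List

def bit_positions(text: str, k: int, m: int) -> List[int]:
    """Return the k bit positions for text in an m-bit set.

    Assumption: the k hash functions are simulated by prefixing the
    input with the function index before hashing with SHA-256.
    """
    positions = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{text}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions

def build_preset_hash_set(decision_substrings: Iterable[str],
                          k: int, m: int) -> int:
    """Steps 401-403: start from all zeros and set k bits per substring."""
    bitset = 0                                  # step 401: initial value 0
    for substring in decision_substrings:
        for pos in bit_positions(substring, k, m):
            bitset |= 1 << pos                  # step 402: update the set
    return bitset                               # step 403: merged result
```

For example, `build_preset_hash_set(["Digest", "union"], k=3, m=64)` returns a 64-bit integer in which at most six bits are set.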
[0051] In one possible implementation, step 202 matches the pattern string to be matched against the preset hash set. As shown in Figure 5, it can be implemented as follows:
In step 501, k preset hash functions are used to hash the pattern string to be matched, resulting in k hash values to be matched.
[0052] In step 502, if it is determined that the values at the k positions corresponding to the k hash values to be matched in the preset hash set are not the initial values, then it is determined that the pattern string to be matched matches the preset hash set successfully.
[0053] In step 503, if it is determined that the value of any one of the k positions corresponding to the k hash values to be matched in the preset hash set is the initial value, then it is determined that the pattern string to be matched fails to match the preset hash set.
[0054] For example, the matching of the pattern string to be matched against the preset hash set is shown in Figure 6. The decision substring set is {x, y, z}, and the pattern string to be matched is w. Each decision substring in the decision substring set is hashed k times, resulting in a preset hash set of 010111000001010010. The pattern string to be matched w is hashed k times, and the k positions corresponding to its hash values form the bit pattern 000010000000010100. The third bit from the right in the preset hash set is 0, whereas the corresponding bit among the k positions for w is 1. Therefore, it is determined that the pattern string to be matched fails to match the preset hash set, that is, the pattern string to be matched does not contain the decision substring.
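The check in steps 502 and 503 reduces to one bitwise test, which can be verified against the bit strings of this example. The Python sketch below is illustrative only: the function name `matches_preset_hash_set` is hypothetical, and the bit strings are read as binary numbers with the rightmost character as bit 0.

```python
def matches_preset_hash_set(preset: int, candidate_bits: int) -> bool:
    """Succeed only if every bit set for the candidate is set in the preset."""
    return preset & candidate_bits == candidate_bits

# Bit patterns taken from the example in the text.
preset = int("010111000001010010", 2)
w_bits = int("000010000000010100", 2)

# Bits required by w that are still at the initial value 0 in the preset.
missing = w_bits & ~preset
```

Here `missing` has only bit 2 set, so `matches_preset_hash_set(preset, w_bits)` is false and w is routed to the first nondeterministic finite automaton set.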
[0055] In one possible implementation, in step 203, the pattern string to be matched is matched with the first nondeterministic finite automaton set. As shown in Figure 7, the following steps are performed for the nondeterministic finite automaton in the first nondeterministic finite automaton set:
In step 701, each character of the pattern string to be matched is traversed, the character is input into the nondeterministic finite automaton, and state transitions are performed along multiple state transition paths in the state transition diagram to determine the reachable state set corresponding to the character.
[0056] In step 702, based on the initial active state set, state transitions with a preset span are performed along multiple state transition paths to obtain a finite span successor state set; the initial active state set is a set of initial states of a nondeterministic finite automaton before state transitions along multiple state transition paths.
[0057] In step 703, based on the initial active state set, additional span state transitions are performed along multiple state transition paths to obtain an additional span successor state set.
[0058] In step 704, the finite span successor state set and the extra span successor state set are bitwise ORed to obtain the successor state set.
[0059] In step 705, the reachable state set corresponding to the character is bitwise ANDed with the successor state set to obtain the updated active state set.
[0060] In step 706, the updated active state set is bitwise ANDed with the preset accept mask to obtain the accept state set.
[0061] In step 707, if it is determined that the accept state set contains a hit state, then it is determined that the character successfully matches the regular expression corresponding to the nondeterministic finite automaton; the hit state is determined during the process of compiling the regular expression into a nondeterministic finite automaton.
[0062] In step 708, if it is determined that the accept state set does not contain a hit state, the updated active state set is used as the initial active state set, and the process returns to the step of traversing each character of the pattern string to be matched, until a character successfully matches the regular expression corresponding to the nondeterministic finite automaton or the traversal of the pattern string to be matched is completed.
[0063] In this embodiment, the matching between the pattern string to be matched and the nondeterministic finite automaton in the first nondeterministic finite automaton set, or each nondeterministic finite automaton in the second nondeterministic finite automaton set, employs the bitmap-based LiMEX matching algorithm. This algorithm involves several important data structures, as follows:
A. Active state set: Each bit represents one state; 0 means the state is not active and 1 means the state is active. An active state is one whose node has been matched, so the successor nodes of that node need to be examined. A state set has 512 bits, meaning that up to 512 states can be active simultaneously. For example, ......1001001 means that states 0, 3, and 6 are active.
[0064] B. Shift mask (spanning transition mask): Construct three shift masks, shift0, shift1, and shift2, to represent the subsequent states with spans of 0, 1, and 2, respectively.
[0065] C. Limit successor state set (finite span successor state set): Applying the shift masks to the active state set yields the limit successor state set, as in the following example.
First, prepare the masks. Based on the NFA, the transition masks for spans 0 to 2 are compiled:
shift0: the shift mask with a span of 0;
shift1: the shift mask with a span of 1;
shift2: the shift mask with a span of 2.
Then, perform a bitwise AND between the current active state set and each of the three shift masks to obtain the active states that have successors with spans of 0 to 2:
tmp0 = current active state & shift0;
tmp1 = current active state & shift1;
tmp2 = current active state & shift2.
Finally, shift tmp1 left by one bit, shift tmp2 left by two bits, and OR both with tmp0 to obtain the limit successor state set.
[0066] D. Exception mask (extra span mask): This mask indicates which states have exception state sets. For example, ... 01010 indicates that states 1 and 3 have exception state sets. The mask is 512 bits long.
[0067] E. Exception state set (additional span state set or abnormal state set): An exception state set holds the successor states that the shift masks cannot express. For example, if the shift masks cover positive offsets 0, 1, and 2, then transitions with a negative offset or a positive offset greater than 2 are recorded in exception state sets. Each state flagged in the exception mask has its own exception state set. For example, with the mask ... 01010, states 1 and 3 each have a 512-bit exception state set.
[0068] F. Exception successor state set (extra span successor state set or abnormal successor state set): ANDing the active state set with the exception mask yields the states whose exception state sets need to take effect. These exception state sets are ORed together to obtain the exception successor state set.
[0069] G. Successor state set: The successor state set is obtained by ORing the limit successor state set with the exception successor state set.
[0070] H. Updated active state set: The successor state set is ANDed with the reachable state set of the character to obtain the new active state set.
[0071] I. The reachable state set of a character: Each character has a 512-bit set of reachable states, with each bit representing whether the state is reachable: 1 indicates reachable and 0 indicates unreachable.
[0072] J. ACCEPT state set (hit state).
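The bitmap encoding used by structures A to J can be illustrated with a minimal Python sketch. Python integers stand in for the 512-bit registers, and the helper names `states_to_bitmap` and `bitmap_to_states` are illustrative assumptions that do not appear in this application.

```python
STATE_BITS = 512  # width of every state set and mask in the described embodiment


def states_to_bitmap(states):
    """Pack state numbers into an integer bitmap (bit i set = state i present)."""
    bitmap = 0
    for s in states:
        if not 0 <= s < STATE_BITS:
            raise ValueError(f"state {s} does not fit in {STATE_BITS} bits")
        bitmap |= 1 << s
    return bitmap


def bitmap_to_states(bitmap):
    """Unpack an integer bitmap into a sorted list of state numbers."""
    return [i for i in range(STATE_BITS) if (bitmap >> i) & 1]


# Activation state set with states 0, 3, and 6 active (the ...1001001 example).
active = states_to_bitmap([0, 3, 6])

# Exception mask flagging states 1 and 3 (the ...01010 example).
exception_mask = states_to_bitmap([1, 3])
```

On an FPGA these values would presumably be fixed-width registers rather than arbitrary-precision integers; the sketch only shows how state numbers map to bit positions.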
[0073] The overall matching process of a nondeterministic finite automaton is shown in Figure 8 and includes the following steps:
1. Starting from the activation state set, obtain the finite span successor state set and the extra span successor state set, respectively;
2. Perform a bitwise OR operation on the finite span successor state set and the extra span successor state set to obtain the successor state set;
3. Perform a bitwise AND operation between the successor state set and the reachable state set of the input character to obtain a new activation state set;
4. Perform a bitwise AND operation between the new activation state set and the accept mask. If the result is non-zero (i.e., it contains a hit state), the input character has successfully matched the nondeterministic finite automaton;
5. If the result is zero, return to step 1 with the new activation state set as the activation state set, and repeat until the end of the string is reached or a match succeeds.
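The five steps above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the FPGA circuit of this application: Python integers stand in for the 512-bit bitmaps, and the function name `limex_match`, its parameter names, and the toy automaton `AB_NFA` (an unanchored search for "ab") are all hypothetical.

```python
def limex_match(text, init_active, shift_masks, exception_mask,
                exception_sets, reach, accept_mask, width=512):
    """Return True as soon as some prefix of `text` drives the NFA into a hit state.

    shift_masks[i] marks states with a successor at span i; exception_sets maps
    a state number to the bitmap of its extra span successors; reach maps a
    character to the bitmap of states reachable on that character.
    """
    full = (1 << width) - 1
    active = init_active
    for ch in text:
        # Step 1a: finite span successors (AND with each mask, shift, OR).
        limit_succ = 0
        for span, mask in enumerate(shift_masks):
            limit_succ |= (active & mask) << span
        # Step 1b: extra span successors of the active states flagged in the mask.
        extra_succ = 0
        flagged = active & exception_mask
        for state, succ_set in exception_sets.items():
            if (flagged >> state) & 1:
                extra_succ |= succ_set
        # Step 2: merge both successor sets.
        succ = (limit_succ | extra_succ) & full
        # Step 3: keep only states reachable on the input character.
        active = succ & reach.get(ch, 0)
        # Step 4: a non-zero intersection with the accept mask is a hit.
        if active & accept_mask:
            return True
        # Step 5: otherwise continue with the new activation state set.
    return False


# Toy automaton: state 0 loops on every character, 0 -> 1 on 'a', 1 -> 2 on 'b'.
AB_NFA = dict(
    init_active=0b001,
    shift_masks=[0b001, 0b011, 0b000],  # state 0 has a span-0 successor; 0 and 1 have span 1
    exception_mask=0,
    exception_sets={},
    reach={"a": 0b011, "b": 0b101, "x": 0b001},
    accept_mask=0b100,
)
```

Calling `limex_match("xab", **AB_NFA)` walks the three characters and reports a hit on 'b', while `limex_match("xba", **AB_NFA)` reaches the end of the string without a hit.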
[0074] It should be noted that in the first set of nondeterministic finite automata, each successfully matched node has a corresponding offset, which serves as the index for the second set of nondeterministic finite automata. For example, the first set of nondeterministic finite automata is compiled from regular expressions 1, 2, 3, 4, and 5. The matched node numbers are 1 and 2. Node number 1 corresponds to regular expressions 1, 2, and 3, and node number 2 corresponds to regular expressions 4 and 5. If the node that successfully matches the pattern string is 1, then it is determined that the pattern string matches regular expressions 1, 2, and 3. Thus, based on the successfully matched node numbers, it is known that three nondeterministic finite automata corresponding to regular expressions 1, 2, and 3 need to be obtained from the second set of nondeterministic finite automata for precise matching.
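The node-to-rule indexing in this example can be sketched as a lookup table. The table contents mirror the example above; the names `NODE_TO_RULES` and `rules_for_hits` are illustrative assumptions, and how the offsets are actually laid out in memory is not specified here.

```python
# Hypothetical index table: fast-path hit node number -> regular expression numbers
# whose automata must be fetched from the second set for precise matching.
NODE_TO_RULES = {1: [1, 2, 3], 2: [4, 5]}


def rules_for_hits(hit_nodes, node_to_rules=NODE_TO_RULES):
    """Collect, without duplicates, the rule numbers selected by the hit nodes."""
    selected = []
    for node in hit_nodes:
        for rule in node_to_rules.get(node, []):
            if rule not in selected:
                selected.append(rule)
    return selected
```

With this table, a hit on node 1 selects rules 1, 2, and 3, so only those three automata need to be run on the slow path.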
[0075] In one possible implementation, step 702, in which state transitions with a preset span are performed along multiple state transition paths based on the initial active state set to obtain a finite span successor state set, can be implemented as shown in Figure 9: In step 901, based on the initial active state set, multiple preset span state transitions are performed along multiple state transition paths to obtain corresponding multiple span results.
[0076] In step 902, the corresponding left shift operation is performed on the multiple span results to obtain multiple shift results.
[0077] In step 903, the multiple shift results are bitwise ORed to obtain a finite span successor state set.
[0078] The process of generating a finite span successor state set is shown in Figure 10 and includes the following steps:
1. The activation state set is used as input;
2. Perform a bitwise AND (&) operation between the activation state set and each of the three preset span masks Shift0, Shift1, and Shift2;
3. Shift the three AND results to the left by 0, 1, and 2 bits, respectively, to simulate a state transition;
4. Finally, combine all the shift results by a bitwise OR (|) operation to obtain the finite span successor state set.
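The AND, shift, and OR sequence above can be written as a short Python sketch. The function name `limit_successors` and the `width` parameter are illustrative assumptions; the truncation to `width` bits models a fixed-width register, which is an assumption about the hardware rather than something stated in this application.

```python
def limit_successors(active, shift_masks, width=512):
    """Compute the finite span successor state set.

    shift_masks[i] is the mask for span i, so the AND result for span i is
    shifted left by i bits before all results are ORed together.
    """
    result = 0
    for span, mask in enumerate(shift_masks):
        result |= (active & mask) << span
    # Discard bits shifted past the register width.
    return result & ((1 << width) - 1)
```

For instance, with only state 0 active and all three masks containing state 0, the result contains states 0, 1, and 2.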
[0079] In one possible implementation, step 703, in which additional span state transitions are performed along multiple state transition paths based on the initial active state set to obtain an additional span successor state set, can be implemented as shown in Figure 11: In step 1101, based on the initial active state set, multiple additional span state transitions are performed along multiple state transition paths to obtain multiple additional state sets.
[0080] In step 1102, multiple additional state sets are bitwise ORed to obtain additional span successor state sets.
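Steps 1101 and 1102 can be sketched as follows, assuming the additional state sets are stored per state and selected through the exception mask described earlier. The function name `exception_successors` and the dictionary layout of `exception_sets` are illustrative assumptions.

```python
def exception_successors(active, exception_mask, exception_sets):
    """OR together the additional state sets of every active, flagged state.

    exception_sets maps a state number to the bitmap of its extra span
    successors; every state flagged in exception_mask is assumed to have an entry.
    """
    pending = active & exception_mask  # active states that own an additional state set
    result = 0
    state = 0
    while pending:
        if pending & 1:
            result |= exception_sets[state]
        pending >>= 1
        state += 1
    return result
```

If no active state is flagged in the exception mask, the loop body never runs and the additional span successor state set is empty.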
[0081] For example, the state transition diagram of a nondeterministic finite automaton is shown in Figure 12. The initial activation state set is {0}; double circles indicate a hit, and the hit states are states 4 and 7, represented as ...10010000. Starting from the initial activation state set {0}, state transitions along multiple state transition paths yield the finite span successor state set {0,1,2,3}, represented as ...01111. Since the state transition diagram does not define extra span state transitions, the extra span successor state set is empty. For the first character 'a' of the input pattern string 'abc', state transitions along multiple state transition paths in the state transition diagram yield the reachable state set {1,4} corresponding to character 'a', represented as ...10010. A bitwise OR operation on the finite span successor state set {0,1,2,3} and the extra span successor state set (the empty set) gives the successor state set {0,1,2,3}. A bitwise AND operation on the reachable state set {1,4} corresponding to character 'a' and the successor state set {0,1,2,3} gives the updated activation state set {1}, represented as ...00010. Since {1} contains neither hit state 4 nor hit state 7, the acceptance state set is empty, and it is determined that character 'a' fails to match the regular expression corresponding to the nondeterministic finite automaton. Matching then continues with the next character 'b' of the pattern string to be matched, and so on, until a character successfully matches the regular expression corresponding to the nondeterministic finite automaton, or the pattern string 'abc' has been fully traversed.
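The arithmetic of this example for character 'a' can be checked with a few lines of Python. The state sets are taken directly from the text; the helper names `bitmap` and `members` are illustrative assumptions, and Python integers stand in for the 512-bit bitmaps.

```python
def bitmap(states):
    """Pack state numbers into an integer bitmap (bit i set = state i present)."""
    value = 0
    for s in states:
        value |= 1 << s
    return value


def members(value):
    """Unpack an integer bitmap into a sorted list of state numbers."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


limit_succ = bitmap([0, 1, 2, 3])  # finite span successor state set given in the text
extra_succ = bitmap([])            # no extra span transitions are defined
reach_a = bitmap([1, 4])           # reachable state set for character 'a'
accept_mask = bitmap([4, 7])       # hit states

succ = limit_succ | extra_succ     # successor state set
new_active = succ & reach_a        # updated activation state set
accepted = new_active & accept_mask  # non-zero would mean a hit
```

Here `members(new_active)` gives `[1]` and `accepted` is zero, which agrees with the conclusion that 'a' alone does not produce a hit.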
[0082] When FPGA resources are insufficient, the precise matching against the second nondeterministic finite automaton set can be handed over to the CPU. Since most of the traffic has already been filtered out, the CPU only needs to process a small portion of the traffic, which greatly reduces the performance pressure on the CPU.
[0083] It should be noted that the various state sets and masks mentioned above are all bitmaps, in which each bit corresponds to one state and its value (0 or 1) indicates the status of that state; the operations in the above steps are all bitwise "AND" and "OR" operations on these bitmaps; moreover, the above masks can be reconfigured according to the actual situation; the embodiments of this application can also provide multiple circuits that process in parallel to improve detection performance.
[0084] In another possible implementation, if the number of states of the automaton meets the condition, the embodiments of this application can also compile the first regular expression rule set into a deterministic finite automaton, which speeds up the matching of the pattern string.
[0085] In summary, this application's embodiments compile regular expressions into three sets: a preset hash set, a first nondeterministic finite automaton set, and a second nondeterministic finite automaton set. A three-stage pipeline of "deterministic substring pre-filtering—first nondeterministic finite automaton set matching—second nondeterministic finite automaton set matching" is constructed for regular expression matching. The deterministic substring pre-filtering stage uses the preset hash set to quickly filter out most invalid pattern strings that do not contain deterministic substrings. The first nondeterministic finite automaton set is deployed on-chip memory as a fast path to handle frequently occurring or simple pattern strings. The second nondeterministic finite automaton set is deployed on external memory as a slow path to handle complex pattern strings. This application's embodiments, through fast and slow path collaborative matching and step-by-step filtering, ensure that most network requests are matched and filtered via the fast path, significantly reducing the processing overhead and resource consumption of pattern string and regular expression matching on the FPGA, meeting the high-performance requirements of the rule detection engine, and improving the detection efficiency of network attacks.
[0086] Based on the same inventive concept, embodiments of this application also provide an FPGA-based network attack detection device, applied to a Field Programmable Gate Array (FPGA). As shown in Figure 13, the device 1300 includes:
The pattern string acquisition unit 1301, configured to receive a network request and extract the pattern string to be matched from the network request;
The key feature matching unit 1302, configured to match the pattern string to be matched with a preset hash set; the preset hash set includes the hash values of multiple deterministic substrings, which are used to characterize the key features of the regular expression;
The fast path matching unit 1303, configured to match the pattern string to be matched with a first nondeterministic finite automaton set to obtain a first matching result if it is determined that the pattern string to be matched fails to match the preset hash set; the first nondeterministic finite automaton set is compiled from a first regular expression rule set, and the regular expressions in the first regular expression rule set do not contain deterministic substrings;
The slow path matching unit 1304, configured to match the pattern string to be matched with a second nondeterministic finite automaton set to obtain a second matching result if it is determined that the pattern string to be matched successfully matches the preset hash set; the second nondeterministic finite automaton set is compiled from a second regular expression rule set, and the first regular expression rule set is a subset of the second regular expression rule set;
The slow path matching unit 1304 is further configured to match the pattern string to be matched with the second nondeterministic finite automaton set to obtain a third matching result if the first matching result indicates that the pattern string to be matched is successfully matched with the first nondeterministic finite automaton set;
The detection result determination unit 1305, configured to determine the network attack detection result corresponding to the network request based on the first matching result, the second matching result, or the third matching result.
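The routing between units 1302 to 1305 can be sketched as a small dispatch function. The three callables `hash_hit`, `fast_match`, and `slow_match` are hypothetical stand-ins for the key feature matching, fast path matching, and slow path matching units; the function name `detect` and the returned labels are illustrative assumptions.

```python
def detect(pattern, hash_hit, fast_match, slow_match):
    """Route one pattern string through the three stages.

    Returns (which_result, matched), where which_result names the matching
    result ("first", "second", or "third") that decides the detection outcome.
    """
    if hash_hit(pattern):
        # Preset hash set matched: go straight to the slow path.
        return ("second", slow_match(pattern))
    # Preset hash set did not match: try the fast path.
    if not fast_match(pattern):
        return ("first", False)
    # Fast path matched: confirm on the slow path.
    return ("third", slow_match(pattern))
```

In this sketch the slow path runs only when either the hash stage or the fast path reports a match, which reflects the step-by-step filtering described in this application.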
[0087] Optionally, the preset hash set and the first nondeterministic finite automaton set are deployed on the on-chip memory of the FPGA, and the second nondeterministic finite automaton set is deployed on the external memory of the FPGA; the first nondeterministic finite automaton set includes one nondeterministic finite automaton; the second nondeterministic finite automaton set includes multiple nondeterministic finite automata.
[0088] Optionally, the preset hash set is obtained using the Bloom filter algorithm, including the following steps: Create an m-bit initial bit set and set the value of each bit in the initial bit set to the initial value; For each deterministic substring, hash the deterministic substring with k preset hash functions, and update the initial bit set according to the k hash values obtained, to obtain the hash result corresponding to the deterministic substring; Aggregate the hash results corresponding to the multiple deterministic substrings to obtain the preset hash set.
[0089] Optionally, matching the pattern string to be matched with the preset hash set includes: Hash the pattern string to be matched with the k preset hash functions to obtain k hash values to be matched; If it is determined that the values at all k positions in the preset hash set corresponding to the k hash values to be matched are not the initial value, it is determined that the pattern string to be matched successfully matches the preset hash set; If it is determined that the value at any one of the k positions in the preset hash set corresponding to the k hash values to be matched is the initial value, it is determined that the pattern string to be matched fails to match the preset hash set.
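The construction and lookup of the preset hash set can be sketched in Python as follows. This is an illustration under stated assumptions: the class name `BloomFilter`, the use of salted SHA-256 as the k hash functions, an initial value of 0, and the sample substrings are all choices made for the example and are not specified by this application.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m = m      # number of bits in the bit set
        self.k = k      # number of hash functions (k < 256 in this sketch)
        self.bits = 0   # every bit starts at the initial value 0

    def _positions(self, item):
        """Yield the k bit positions for a bytes item (salted SHA-256, an assumption)."""
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        """Set the k positions for one substring; repeated adds accumulate by OR."""
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        """True only if none of the k positions still holds the initial value."""
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


preset = BloomFilter(m=1024, k=3)
for substring in (b"select", b"union"):
    preset.add(substring)
```

A Bloom filter can report false positives but never false negatives, so a successful match at this stage still needs confirmation by the automaton stages, while a failed match is definitive for the stored substrings.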
[0090] Optionally, matching the pattern string to be matched with the first nondeterministic finite automaton set includes: For the nondeterministic finite automaton in the first nondeterministic finite automaton set, perform the following steps:
Traverse each character of the pattern string to be matched, input the character into the nondeterministic finite automaton, perform state transitions along multiple state transition paths in the state transition graph, and determine the reachable state set corresponding to the character;
Based on the initial active state set, perform state transitions with a preset span along the multiple state transition paths to obtain a finite span successor state set; the initial active state set is the set of initial states of the nondeterministic finite automaton before the state transitions along the multiple state transition paths;
Based on the initial active state set, perform additional span state transitions along the multiple state transition paths to obtain an additional span successor state set;
Perform a bitwise OR operation on the finite span successor state set and the additional span successor state set to obtain the successor state set;
Perform a bitwise AND operation between the reachable state set corresponding to the character and the successor state set to obtain the updated active state set;
Perform an AND operation between the updated active state set and the preset acceptance mask to obtain the acceptance state set;
If it is determined that the acceptance state set contains a hit state, it is determined that the character successfully matches the regular expression corresponding to the nondeterministic finite automaton; the hit state is determined during the process of compiling the regular expression into the nondeterministic finite automaton;
If it is determined that the acceptance state set does not contain a hit state, the updated active state set is used as the initial active state set, and the process returns to the step of traversing each character of the pattern string to be matched, until a character successfully matches the regular expression corresponding to the nondeterministic finite automaton, or the traversal of the pattern string to be matched is completed.
[0091] Optionally, the step of performing a preset span state transition along the multiple state transition paths based on the initial active state set to obtain a finite span successor state set includes: Based on the initial set of activated states, multiple state transitions with preset spans are performed along the multiple state transition paths to obtain multiple corresponding span results; Perform the corresponding left shift operation on the multiple span results to obtain multiple shift results; Perform a bitwise OR operation on the multiple shift results to obtain a finite span successor state set.
[0092] Optionally, the step of performing additional span state transitions along the multiple state transition paths based on the initial active state set to obtain an additional span successor state set includes: Based on the initial set of activated states, multiple additional span state transitions are performed along the multiple state transition paths to obtain multiple additional state sets; Perform a bitwise OR operation on the multiple additional state sets to obtain the additional span successor state set.
[0093] For details on the implementation of each operation and its beneficial effects in the FPGA-based network attack detection device, please refer to the description in the previous method section; it will not be repeated here.
[0094] Having introduced the FPGA-based network attack detection method and apparatus according to exemplary embodiments of this application, we will now introduce an electronic device according to another exemplary embodiment of this application.
[0095] Those skilled in the art will understand that various aspects of this application can be implemented as a system, method, or program product. Therefore, various aspects of this application can be specifically implemented in the following forms: a completely hardware implementation, a completely software implementation (including firmware, microcode, etc.), or a combination of hardware and software implementations, collectively referred to herein as a "circuit," "module," or "system."
[0096] In some possible implementations, the electronic device according to this application may include at least one processor and at least one memory. The memory stores program code that, when executed by the processor, causes the processor to perform the steps in the FPGA-based network attack detection method according to various exemplary embodiments of this application described above.
[0097] An electronic device 130 according to this embodiment of the present application is described below with reference to Figure 14. The electronic device 130 shown in Figure 14 is merely an example and should not impose any limitations on the functionality and scope of use of the embodiments of this application.
[0098] As shown in Figure 14, the electronic device 130 takes the form of a general-purpose electronic device. The components of the electronic device 130 may include, but are not limited to: at least one processor 131, at least one memory 132, and a bus 133 connecting different system components (including memory 132 and processor 131).
[0099] Bus 133 represents one or more of several bus structures, including a memory bus or memory controller, peripheral bus, processor, or local bus using any of the various bus structures.
[0100] The memory 132 may include a readable medium in the form of volatile memory, such as random access memory (RAM) 1321 and / or cache memory 1322, and may further include read-only memory (ROM) 1323.
[0101] The memory 132 may also include a program / utility 1325 having a set (at least one) of program modules 1324, including but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of these examples may include an implementation of a network environment.
[0102] Electronic device 130 can also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), and with one or more devices that enable a user to interact with electronic device 130, and / or with any device that enables electronic device 130 to communicate with one or more other electronic devices (e.g., router, modem, etc.). This communication can be performed via input / output (I / O) interface 135. Furthermore, electronic device 130 can also communicate with one or more networks (e.g., local area network (LAN), wide area network (WAN), and / or public networks, such as the Internet) via network adapter 136. As shown, network adapter 136 communicates with other modules used in electronic device 130 via bus 133. It should be understood that, although not shown in the figures, other hardware and / or software modules can be used in conjunction with electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
[0103] In some possible implementations, various aspects of the FPGA-based network attack detection method provided in this application can also be implemented in the form of a program product, which includes program code. When the program product is run on a computer device, the program code is used to cause the computer device to perform the steps in the FPGA-based network attack detection method according to the various exemplary embodiments of this application described above.
[0104] The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example—but not limited to—an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
[0105] The program product for the FPGA-based network attack detection method according to the embodiments of this application can be a portable compact disc read-only memory (CD-ROM) and include program code, and can run on an electronic device. However, the program product of this application is not limited thereto. In this document, the readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
[0106] A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. This propagated data signal may take many forms, including—but not limited to—electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium, capable of sending, propagating, or transmitting a program for use by or in conjunction with an instruction execution system, apparatus, or device.
[0107] The program code contained on the readable medium may be transmitted using any suitable medium, including—but not limited to—wireless, wired, fiber optic, RF, etc., or any suitable combination thereof.
[0108] Program code for performing the operations of this application can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as C or similar languages. The program code can execute entirely on the user's electronic device, partially on the user's device, as a standalone software package, partially on the user's electronic device and partially on a remote electronic device, or entirely on a remote electronic device or server. In cases involving remote electronic devices, the remote electronic device can be connected to the user's electronic device via any type of network—including a local area network (LAN) or a wide area network (WAN)—or can be connected to an external electronic device (e.g., via the Internet using an Internet service provider).
[0109] It should be noted that although several units or sub-units of the device have been mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of this application, the features and functions of two or more units described above can be embodied in one unit. Conversely, the features and functions of one unit described above can be further divided and embodied by multiple units.
[0110] Furthermore, although the operations of the method of this application are described in a specific order in the accompanying drawings, this does not require or imply that these operations must be performed in that specific order, or that all the operations shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and / or one step may be broken down into multiple steps.
[0111] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0112] This application is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and / or block of the flowchart illustrations and / or block diagrams, and combinations of flows and / or blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing device, produce a device for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0113] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0114] These computer program instructions can also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0115] Although preferred embodiments of this application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of this application.
[0116] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.
Claims
1. A network attack detection method based on FPGA, characterized in that the method is applied to a field-programmable gate array (FPGA) and includes:
Receiving a network request and extracting the pattern string to be matched from the network request;
Matching the pattern string to be matched with a preset hash set; the preset hash set includes the hash values of multiple deterministic substrings, which are used to characterize the key features of the regular expression;
If it is determined that the pattern string to be matched fails to match the preset hash set, matching the pattern string to be matched with a first nondeterministic finite automaton set to obtain a first matching result; the first nondeterministic finite automaton set is compiled from a first regular expression rule set, and the regular expressions in the first regular expression rule set do not contain deterministic substrings;
If it is determined that the pattern string to be matched successfully matches the preset hash set, matching the pattern string to be matched with a second nondeterministic finite automaton set to obtain a second matching result; the second nondeterministic finite automaton set is compiled from a second regular expression rule set, and the first regular expression rule set is a subset of the second regular expression rule set;
If the first matching result indicates that the pattern string to be matched is successfully matched with the first nondeterministic finite automaton set, matching the pattern string to be matched with the second nondeterministic finite automaton set to obtain a third matching result;
Determining, based on the first matching result, the second matching result, or the third matching result, the network attack detection result corresponding to the network request.
2. The method according to claim 1, characterized in that, The preset hash set and the first nondeterministic finite automaton set are deployed in the on-chip memory of the FPGA, and the second nondeterministic finite automaton set is deployed in the external memory of the FPGA. The first set of nondeterministic finite automata includes one nondeterministic finite automaton; the second set of nondeterministic finite automata includes multiple nondeterministic finite automata.
3. The method according to claim 1, characterized in that the preset hash set is obtained using the Bloom filter algorithm, including the following steps: Create an m-bit initial bit set and set the value of each bit in the initial bit set to the initial value; For each deterministic substring, hash the deterministic substring with k preset hash functions, and update the initial bit set according to the k hash values obtained, to obtain the hash result corresponding to the deterministic substring; Aggregate the hash results corresponding to the multiple deterministic substrings to obtain the preset hash set.
4. The method according to claim 3, characterized in that the step of matching the pattern string to be matched with the preset hash set includes: Hash the pattern string to be matched with the k preset hash functions to obtain k hash values to be matched; If it is determined that the values at all k positions in the preset hash set corresponding to the k hash values to be matched are not the initial value, it is determined that the pattern string to be matched successfully matches the preset hash set; If it is determined that the value at any one of the k positions in the preset hash set corresponding to the k hash values to be matched is the initial value, it is determined that the pattern string to be matched fails to match the preset hash set.
5. The method according to claim 1, characterized in that the step of matching the pattern string to be matched with the first nondeterministic finite automaton set includes: For the nondeterministic finite automaton in the first nondeterministic finite automaton set, perform the following steps:
Traverse each character of the pattern string to be matched, input the character into the nondeterministic finite automaton, perform state transitions along multiple state transition paths in the state transition graph, and determine the reachable state set corresponding to the character;
Based on the initial active state set, perform state transitions with a preset span along the multiple state transition paths to obtain a finite span successor state set; the initial active state set is the set of initial states of the nondeterministic finite automaton before the state transitions along the multiple state transition paths;
Based on the initial active state set, perform additional span state transitions along the multiple state transition paths to obtain an additional span successor state set;
Perform a bitwise OR operation on the finite span successor state set and the additional span successor state set to obtain the successor state set;
Perform a bitwise AND operation between the reachable state set corresponding to the character and the successor state set to obtain the updated active state set;
Perform an AND operation between the updated active state set and the preset acceptance mask to obtain the acceptance state set;
If it is determined that the acceptance state set contains a hit state, it is determined that the character successfully matches the regular expression corresponding to the nondeterministic finite automaton; the hit state is determined during the process of compiling the regular expression into the nondeterministic finite automaton;
If it is determined that the acceptance state set does not contain a hit state, the updated active state set is used as the initial active state set, and the process returns to the step of traversing each character of the pattern string to be matched, until a character successfully matches the regular expression corresponding to the nondeterministic finite automaton, or the traversal of the pattern string to be matched is completed.
6. The method according to claim 5, characterized in that the step of performing state transitions with a preset span along the multiple state transition paths based on the initial active state set to obtain a finite span successor state set includes: based on the initial active state set, performing multiple state transitions with preset spans along the multiple state transition paths to obtain multiple corresponding span results; performing the corresponding left shift operation on each of the multiple span results to obtain multiple shift results; performing a bitwise OR operation on the multiple shift results to obtain the finite span successor state set.
7. The method according to claim 5, characterized in that the step of performing additional span state transitions along the multiple state transition paths based on the initial active state set to obtain an additional span successor state set includes: based on the initial active state set, performing multiple additional span state transitions along the multiple state transition paths to obtain multiple additional state sets; performing a bitwise OR operation on the multiple additional state sets to obtain the additional span successor state set.
8. A network attack detection device based on FPGA, characterized in that the device, applied to a field-programmable gate array (FPGA), includes: a pattern string acquisition unit, configured to receive a network request and extract the pattern string to be matched from the network request; a key feature matching unit, configured to match the pattern string to be matched with a preset hash set, where the preset hash set includes the hash values of multiple decision substrings, and the decision substrings are used to characterize the key features of the regular expressions; a fast path matching unit, configured to, if it is determined that the pattern string to be matched fails to match the preset hash set, match the pattern string to be matched with a first nondeterministic finite automaton set to obtain a first matching result, where the first nondeterministic finite automaton set is compiled from a first regular expression rule set, and the regular expressions in the first regular expression rule set do not contain decision substrings; a slow path matching unit, configured to, if it is determined that the pattern string to be matched successfully matches the preset hash set, match the pattern string to be matched with a second nondeterministic finite automaton set to obtain a second matching result, where the second nondeterministic finite automaton set is compiled from a second regular expression rule set, and the first regular expression rule set is a subset of the second regular expression rule set; the slow path matching unit is further configured to, if the first matching result indicates that the pattern string to be matched successfully matches the first nondeterministic finite automaton set, match the pattern string to be matched with the second nondeterministic finite automaton set to obtain a third matching result;
The detection result determination unit is configured to determine the network attack detection result corresponding to the network request based on the first matching result, the second matching result, or the third matching result.
9. An electronic device, characterized in that it includes: a processor and a memory; the memory is used to store instructions executable by the processor; the processor is configured to execute the instructions to implement the FPGA-based network attack detection method as described in any one of claims 1-7.
10. A computer-readable storage medium, characterized in that, when instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is able to perform the FPGA-based network attack detection method as described in any one of claims 1-7.
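
As an illustration only, the hash-set check of claim 4 and the per-character update of claims 5 to 7 can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the claimed FPGA implementation: the function names, the slot count, the use of salted SHA-256 digests as the k preset hash functions, the initial value 0, and the hand-built automaton for the example expression `(ab)+c*d` are all hypothetical and do not come from the application. State sets are modelled as integers in which bit i stands for state i, so the OR, AND, and left shift operations of the claims map directly onto integer operators.

```python
import hashlib

INITIAL = 0  # assumed initial value of every slot in the preset hash set

def positions(s: bytes, k: int, m: int) -> list:
    # Assumption: k salted SHA-256 digests stand in for the k preset hash functions.
    return [int.from_bytes(hashlib.sha256(bytes([i]) + s).digest()[:8], "big") % m
            for i in range(k)]

def build_hash_set(substrings, k=3, m=1024):
    # Mark the k slots of every decision substring as non-initial.
    slots = [INITIAL] * m
    for sub in substrings:
        for p in positions(sub, k, m):
            slots[p] = 1
    return slots

def hash_set_match(slots, s: bytes, k=3) -> bool:
    # Claim 4: success only if none of the k slots still holds the initial value.
    return all(slots[p] != INITIAL for p in positions(s, k, len(slots)))

# Hypothetical automaton for (ab)+c*d, anchored at the start of the input.
# Bit i represents state i: 0 = start, 1 = 'a', 2 = 'b', 3 = 'c', 4 = 'd'.
NFA = {
    "initial": 0b00001,
    # Reachable state set per character: states that may be entered on it.
    "reach": {"a": 0b00010, "b": 0b00100, "c": 0b01000, "d": 0b10000},
    # Preset spans: span -> mask of states that have a transition of that span.
    "spans": {0: 0b01000, 1: 0b01111, 2: 0b00100},
    # Additional span transitions (source mask, target mask): 'b' back to 'a'.
    "extra": [(0b00100, 0b00010)],
    "accept": 0b10000,  # preset acceptance mask
}

def nfa_match(nfa, text: str) -> bool:
    active = nfa["initial"]                      # initial active state set
    for ch in text:
        reach = nfa["reach"].get(ch, 0)          # reachable state set for ch
        finite = 0
        for span, mask in nfa["spans"].items():  # claim 6: mask, shift, OR
            finite |= (active & mask) << span
        extra = 0
        for src, dst in nfa["extra"]:            # claim 7: OR the additional sets
            if active & src:
                extra |= dst
        active = (finite | extra) & reach        # updated active state set
        if active & nfa["accept"]:               # acceptance state set has a hit
            return True
    return False                                 # traversal completed, no hit
```

In this sketch, `hash_set_match` behaves like a Bloom-filter membership test, so a successful match can be a false positive while a failed match is definitive. `nfa_match` returns as soon as a hit state appears, which mirrors the early exit described in claim 5.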