Network security server control method and system integrating firewall function
By optimizing the industrial internet firewall's rule matching process with hash mapping and decision tree algorithms, the invention addresses the matching latency caused by large rule sets and achieves fast, efficient processing within microseconds.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: SHENZHEN MAXTOPIC TECH CO LTD
- Filing Date: 2026-04-03
- Publication Date: 2026-06-19
AI Technical Summary
Existing industrial internet firewalls suffer from excessively high matching latency when dealing with a large number of rules, impacting the response speed of industrial control systems.
A hash mapping method quickly locates a preliminary matching rule subset. A decision tree algorithm then traverses this subset hierarchically, and cluster analysis eliminates low-value rules. An optimized index version is generated and used for hash mapping and decision tree matching, which yields precise control instructions.
The number of matching calculations has been reduced from tens of thousands to hundreds, and the processing latency has been compressed from milliseconds to microseconds, improving the efficiency of rule retrieval and real-time response capabilities in high-concurrency scenarios.
[Figure CN122247716A_ABST: abstract drawing]
Description
Technical Field
[0001] This invention relates to the field of industrial internet firewall technology, and in particular to a network security server control method and system with integrated firewall functionality.
Background Technology
[0002] The field of industrial internet firewall technology primarily focuses on boundary protection for critical infrastructure such as industrial control systems, intelligent manufacturing production lines, and energy management networks. Network traffic in these scenarios is characterized by a wide variety of protocols, high real-time requirements, and fixed communication relationships between devices. Firewall devices need to accurately filter massive amounts of industrial control protocol messages without introducing significant latency.
[0003] Existing industrial internet firewalls perform access control by sequential five-tuple matching. The firewall extracts five key features from each network data packet: source IP address, destination IP address, source port, destination port, and protocol type. It then compares these features against the entries of a predefined rule table one by one until a matching rule is found, and allows or blocks the packet according to that rule's action field. This method is fast when there are few rules. However, as the rule count grows to tens of thousands and multi-dimensional conditions such as application protocol identification and user authentication are introduced, the number of rules that sequential matching must traverse increases dramatically. The average number of comparisons per packet reaches tens of thousands, raising the processing latency for a single data packet from microseconds to milliseconds, which significantly slows the response of industrial control systems.
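The prior-art lookup described above is a linear first-match scan. The sketch below is an illustration only, assuming a hypothetical `Rule` record in which `None` acts as a wildcard; it counts how many rules are examined per packet, which is the quantity that grows to tens of thousands in large rule tables.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Packet five-tuple: (source IP, destination IP, source port, destination port, protocol)
FiveTuple = Tuple[str, str, int, int, str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical five-tuple rule; a None field matches any value."""
    rule_id: str
    src_ip: Optional[str]
    dst_ip: Optional[str]
    src_port: Optional[int]
    dst_port: Optional[int]
    protocol: Optional[str]
    action: str  # "allow" or "block"


def sequential_match(rules: Sequence[Rule], pkt: FiveTuple) -> Tuple[Optional[Rule], int]:
    """Scan rules in table order; return the first match and how many rules were examined."""
    examined = 0
    for rule in rules:
        examined += 1
        fields = (rule.src_ip, rule.dst_ip, rule.src_port, rule.dst_port, rule.protocol)
        if all(want is None or want == got for want, got in zip(fields, pkt)):
            return rule, examined
    return None, examined  # no rule matched; a default policy would apply
```

With 10,000 rules and the only match at the end of the table, every rule is examined for that packet, so the per-packet cost is linear in the table size.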
[0004] In summary, existing technologies suffer from excessively high matching latency when the number of rules to be matched is large.
Summary of the Invention
[0005] This invention provides a network security server control method and system with integrated firewall functionality to solve the problem of excessively high matching latency when the number of rules to be matched is large.
[0006] In a first aspect, to solve the above technical problem, the present invention provides a network security server control method integrating firewall functionality, comprising: acquiring network data packets; performing feature extraction processing on the network data packets to obtain a network feature set; performing rule filtering processing based on the network feature set to obtain a preliminary matching rule subset; performing applicability analysis on the preliminary matching rule subset to obtain a rule applicability matrix, and refining the rule applicability matrix to obtain a refined rule set; performing hash mapping construction processing on the refined rule set to obtain a hash table, and performing matching scoring processing on the hash table to obtain a matching degree score for each rule; performing permission identifier conversion processing based on the matching degree scores to obtain permission identifiers, and performing data verification processing on the permission identifiers to obtain access permission signals; generating associated events based on the access permission signals and the network data packets, and calculating the dependency strength of the associated events to obtain the rule dependency strength; selecting an optimization candidate set based on the rule dependency strength, evaluating the adjustment requirements of the optimization candidate set, and, once the adjustment requirements are determined, reconstructing the index of the optimization candidate set to obtain an optimized index version; and generating a hash mapping based on the optimized index version to obtain a key-value pair sequence, and generating decision instructions from the key-value pair sequence to obtain precise control instructions.
[0007] In a second aspect, the present invention provides a network security server control system with integrated firewall functionality, comprising: a data acquisition module, configured to acquire network data packets; a feature processing module, configured to perform feature extraction processing on the network data packets to obtain a network feature set; a rule filtering module, configured to perform rule filtering processing based on the network feature set to obtain a preliminary matching rule subset; a refinement module, configured to perform applicability analysis on the preliminary matching rule subset to obtain a rule applicability matrix, and to refine the rule applicability matrix to obtain a refined rule group; a matching score module, configured to perform hash mapping construction processing on the refined rule group to obtain a hash table, and to perform matching scoring processing on the hash table to obtain a matching degree score for each rule; an access permission module, configured to perform permission identifier conversion processing based on the matching degree scores to obtain permission identifiers, and to perform data verification processing on the permission identifiers to obtain access permission signals; a dependency strength module, configured to generate associated events based on the access permission signals and the network data packets, and to calculate the dependency strength of the associated events to obtain the rule dependency strength; an indexing module, configured to select an optimization candidate set based on the rule dependency strength, evaluate the adjustment requirements of the optimization candidate set, and, once the adjustment requirements are determined, reconstruct the index of the optimization candidate set to obtain an optimized index version; and a control instruction module, configured to generate a hash mapping based on the optimized index version to obtain a key-value pair sequence, and to generate decision instructions from the key-value pair sequence to obtain precise control instructions.
[0008] Compared with the prior art, the present invention has the following beneficial effects: (1) After the network feature set is obtained, a hash mapping method projects it onto a pre-established rule index structure to quickly locate the preliminary matching rule subset. This narrows the matching range from all rules to a local subset and avoids the inefficient full traversal required by traditional sequential matching, reducing the number of match computations per data packet from tens of thousands to hundreds and compressing processing latency from milliseconds to microseconds.
[0009] (2) When the number of rules in the preliminary matching rule subset exceeds a threshold, the invention traverses the subset hierarchically using a decision tree algorithm, constructs a rule applicability matrix from each rule's hit frequency and resource consumption weight, and applies cluster analysis to eliminate low-value rules, yielding a refined rule set. This further reduces the number of rules to be compared to a few dozen, significantly lowering the computational load of subsequent parallel comparisons and preventing high-cost rules from degrading processing performance.
[0010] (3) This invention generates associated events based on access permission signals and traffic context, calculates the dependency strength between rules, selects high-dependency rule pairs for inclusion in the optimization candidate set, and evaluates the matching degree between the current index structure and rule dependency characteristics through simulated queries. When the repeated access ratio exceeds the threshold, index reconstruction is performed. This process aggregates rules with strong dependencies into the same branch in the index tree, enabling path reuse in subsequent queries, reducing the number of index traversals by more than 50%, and improving the rule retrieval efficiency in high-concurrency scenarios.
[0011] (4) This invention calibrates the parameters of the hash function based on the optimized index version to generate a calibrated hash mapping set, performs hash calculation on the data packet payload to obtain a key-value pair sequence, extracts the decision path through decision tree matching, and finally generates precise control instructions from a preset instruction template library. This process converts the comparison results between packet features and the rule set directly into an instruction format executable by the underlying forwarding engine. Once an instruction is output, the rule matching process does not need to be repeated, so each data packet obtains a definite processing result within microseconds, ensuring real-time response in high-throughput scenarios.
Attached Figure Description
[0012] Figure 1 is a schematic flowchart of the network security server control method with integrated firewall function provided in the first embodiment of the present invention; Figure 2 is a schematic structural diagram of the network security server control system with integrated firewall function provided in the second embodiment of the present invention.
Detailed Implementation
[0013] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0014] Referring to Figure 1, the first embodiment of the present invention provides a network security server control method with integrated firewall functionality, comprising the following steps: S11: acquire network data packets; S12: perform feature extraction processing on the network data packets to obtain a network feature set; S13: perform rule filtering processing based on the network feature set to obtain a preliminary matching rule subset; S14: perform applicability analysis on the preliminary matching rule subset to obtain a rule applicability matrix, and refine the rule applicability matrix to obtain a refined rule set; S15: perform hash mapping construction processing on the refined rule set to obtain a hash table, and perform matching scoring processing on the hash table to obtain the matching degree score of each rule; S16: perform permission identifier conversion processing based on the matching degree score to obtain a permission identifier, and perform data verification processing on the permission identifier to obtain an access permission signal; S17: generate an associated event based on the access permission signal and the network data packet, and calculate the dependency strength of the associated event to obtain the rule dependency strength; S18: select an optimization candidate set based on the rule dependency strength, evaluate its adjustment requirements, and, once the adjustment requirements are determined, reconstruct the index of the optimization candidate set to obtain an optimized index version; S19: perform hash mapping based on the optimized index version to generate a key-value pair sequence, and generate decision instructions from the key-value pair sequence to obtain precise control instructions.
[0015] In step S11, network data packets are acquired.
[0016] It is worth noting that incoming network data packets are captured by a data capture component deployed at the network entry point. This component is connected to the mirror port of the core switch and receives all network data frames passing through the link in real time via the network interface card's promiscuous mode. When an external terminal or internal host initiates a communication request, the resulting Ethernet data frames are captured by this component and copied to the server processing unit. The captured data frames are then preliminarily parsed, reading the Ethernet header, IP header, and transport layer header information, and cached and organized according to the data packet time sequence to obtain a continuous raw network communication data stream. This raw data stream is then grouped and encapsulated according to the packet structure to form a set of network data packets that can be used for subsequent parsing and feature extraction. For example, when an internal host 192.168.1.100 accesses an external web server, its TCP communication data packets are captured in real time by the data capture component and input as a network data packet into the subsequent processing flow.
[0017] In step S12, feature extraction processing is performed based on the network data packets to obtain a network feature set, including: The network data packets are parsed to obtain the first intermediate data; The communication records from the same source in the first intermediate data are merged to obtain the second intermediate data. The second intermediate data is then merged and organized to obtain the network feature set.
[0018] It is worth noting that the network data packets collected at the network entry point are first written into the buffer in order of arrival time. Then, the frame header, network layer fields, and transport layer fields of each packet are read sequentially to break down the raw binary content into recognizable data items, thus obtaining the first intermediate data. The first intermediate data includes the source address, destination address, source port, destination port, transport protocol type, packet length, and arrival timestamp. The source address and destination address are read from fixed bytes in the IP header, the source port and destination port are read from fixed bytes in the TCP or UDP header, the packet length is directly obtained from the IP total length field, and the arrival timestamp is the system time record when the server received the packet. For example, if a network data packet arrives and is recorded as received by the server at 10:15:23 on March 16, 2026, after the system performs header parsing on the packet, it reads from the IP header that the source address is 192.168.1.100 and the destination address is 10.0.0.5, and from the TCP header that the source port is 51524 and the destination port is 443, identifies the transport protocol type as TCP, and determines the packet length to be 512 bytes based on the IP total length field, thus forming the first intermediate data corresponding to the packet.
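The header fields listed above sit at fixed offsets, so they can be read directly with Python's `struct` module. The following is a minimal sketch, assuming untagged Ethernet II frames carrying IPv4 (no VLAN tags, no IPv6); the function name `parse_frame` and the dictionary keys are illustrative, not taken from the patent.

```python
import ipaddress
import struct
from typing import Dict, Optional, Union

ETHERTYPE_IPV4 = 0x0800
PROTOCOL_NAMES = {6: "TCP", 17: "UDP"}  # IANA protocol numbers


def parse_frame(frame: bytes, arrival_ts: float) -> Dict[str, Union[str, int, float, None]]:
    """Hypothetical helper: extract first-intermediate-data fields from an Ethernet II / IPv4 frame."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")   # assumption: IPv4 only, no VLAN tag
    ip = 14                                    # Ethernet II header is 14 bytes
    ihl = (frame[ip] & 0x0F) * 4               # IP header length in bytes
    (total_length,) = struct.unpack_from("!H", frame, ip + 2)
    proto_num = frame[ip + 9]
    src = ipaddress.IPv4Address(frame[ip + 12: ip + 16])
    dst = ipaddress.IPv4Address(frame[ip + 16: ip + 20])
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    if proto_num in PROTOCOL_NAMES:
        # TCP and UDP headers both begin with 2-byte source and destination ports.
        src_port, dst_port = struct.unpack_from("!HH", frame, ip + ihl)
    return {
        "src_ip": str(src),
        "dst_ip": str(dst),
        "src_port": src_port,
        "dst_port": dst_port,
        "protocol": PROTOCOL_NAMES.get(proto_num, str(proto_num)),
        "length": total_length,                # from the IP total length field
        "ts": arrival_ts,                      # server receive time
    }
```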
[0019] Subsequently, communication records generated from the same source address within a set time window in the first intermediate data are aggregated, and the distribution of target ports, protocol types, and message lengths are extracted to obtain the second intermediate data. The set time window here is 10 seconds. Specifically, all messages corresponding to the same source address within 10 consecutive seconds are placed into the same statistical unit, and then the frequency of different target ports, the number of messages of various protocols, and the proportion of messages in different length ranges are counted. The message length range can be divided into three categories: less than 128 bytes, 128 bytes to 1024 bytes, and greater than 1024 bytes. After obtaining the second intermediate data, feature merging is performed, and a network feature set is generated according to a pre-defined field order. For example, a host with a source address of 192.168.1.100 generates 20 communication records within 10 consecutive seconds. Of these, 12 messages are sent to port 80, 6 to port 443, and 2 to port 8888. By protocol type, there are 18 TCP messages and 2 UDP messages. By message length range, there are 5 messages less than 128 bytes, 11 messages between 128 and 1024 bytes, and 4 messages greater than 1024 bytes, representing 25%, 55%, and 20% respectively. Based on this, the system obtains second intermediate data and further merges it according to the fields of source address, destination port distribution, protocol type distribution, and message length distribution to generate a network feature set corresponding to the source address. Finally, the network feature set includes source address features, destination port distribution features, protocol type distribution features, and message length distribution features.
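A minimal sketch of the 10-second per-source aggregation, assuming fixed (tumbling) windows aligned to multiples of 10 seconds and records shaped like the output of a header parser (`src_ip`, `dst_port`, `protocol`, `length`, `ts`); these key names and the function `summarize_windows` are hypothetical. The length buckets follow the three ranges given above, with 128 and 1024 bytes counted in the middle range.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 10.0
LENGTH_BUCKETS = ("<128", "128-1024", ">1024")


def length_bucket(length: int) -> str:
    """Map a message length to one of the three ranges (128 and 1024 fall in the middle)."""
    if length < 128:
        return "<128"
    if length <= 1024:
        return "128-1024"
    return ">1024"


def summarize_windows(records: Iterable[dict]) -> Dict[Tuple[str, int], dict]:
    """Group records by (source address, window index) and compute the distributions."""
    groups: Dict[Tuple[str, int], List[dict]] = defaultdict(list)
    for rec in records:
        window = int(rec["ts"] // WINDOW_SECONDS)   # assumption: tumbling windows
        groups[(rec["src_ip"], window)].append(rec)
    features = {}
    for (src_ip, window), recs in groups.items():
        total = len(recs)
        buckets = Counter(length_bucket(r["length"]) for r in recs)
        features[(src_ip, window)] = {
            "src_ip": src_ip,
            "dst_ports": Counter(r["dst_port"] for r in recs),
            "protocols": Counter(r["protocol"] for r in recs),
            "length_share": {b: buckets[b] / total for b in LENGTH_BUCKETS},
        }
    return features
```

Feeding it the 20 records of the example (12 to port 80, 6 to 443, 2 to 8888; 18 TCP, 2 UDP; 5/11/4 by length) reproduces the 25%, 55%, and 20% shares.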
[0020] In step S13, rule filtering is performed based on the network feature set to obtain a preliminary matching rule subset, including: The network feature set is organized according to a preset field order to form feature records, and index calculation is performed based on the feature records to obtain the first candidate rule set; The first candidate rule set is compared with the network feature set for consistency, and candidate rules that do not meet the consistency requirement are deleted to obtain the second candidate rule set. The second set of candidate rules is compared with the network feature set, and the candidate rules that meet the conditions are retained to obtain a preliminary matching subset of rules.
[0021] It is worth noting that the various features in the network feature set are organized according to a preset field order to form feature records that can directly participate in index calculation. These feature records include at least source address features, destination port distribution features, protocol type distribution features, and message length distribution features. Specifically, the source address features are first converted into corresponding numerical address information; the destination port distribution features are arranged according to port number size and then written into the port sequence; the protocol type distribution features are obtained by counting the number of messages corresponding to each protocol type within a set time window, and then dividing the number of messages for each protocol type by the total number of messages to obtain the corresponding percentage; the message length distribution features are obtained by counting the number of messages within each length interval, and then dividing the number of messages in each length interval by the total number of messages to obtain the corresponding percentage. After the above organization, the source address values, destination port sequences, the count values or percentage values corresponding to the protocol types, and the count values or percentage values corresponding to the message lengths are sequentially written into a fixed-length feature record, and this fixed-length feature record is input into the hash function corresponding to the rule index table for index calculation. The index calculation process involves concatenating the fields in the fixed-length feature record in a predetermined order, performing a hash operation, outputting one or more index positions, and then retrieving the corresponding stored set of rule numbers from the pre-established rule index structure based on these index positions to obtain the first candidate rule set. 
For example, a network feature set might correspond to source address 192.168.1.100; a destination port distribution of port 80 (12 occurrences), port 443 (6), and port 8888 (2); a protocol type distribution of 18 TCP packets and 2 UDP packets; and a packet length distribution of 5 packets under 128 bytes, 11 packets between 128 and 1024 bytes, and 4 packets over 1024 bytes. The system first converts the source address into a numerical address value, then rewrites the protocol type distribution as a TCP count of 18 and a UDP count of 2, with percentages of 0.9 and 0.1; it likewise rewrites the packet length distribution as counts of 5, 11, and 4, with percentages of 0.25, 0.55, and 0.2. These values and the destination port sequence are written into a fixed-length feature record in the preset order, and the record is hashed with the rule index table's hash function, yielding index positions 15 and 28. Finally, rule numbers R12, R18, R26, and R31, stored at index positions 15 and 28 of the rule index structure, are read out as the first candidate rule set.
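The fixed-length feature record and the index lookup can be sketched as follows. The patent does not specify the record layout or the hash function, so this sketch assumes a hypothetical layout with eight 16-bit port slots, packs fields with `struct` in the preset order, and uses `zlib.crc32` and `zlib.adler32` as stand-in hash functions to produce up to two index positions in a hypothetical 256-entry table.

```python
import ipaddress
import struct
import zlib
from typing import Dict, List, Sequence

MAX_PORTS = 8       # hypothetical number of destination-port slots
TABLE_SIZE = 256    # hypothetical rule index table size
# Preset order: source address, port sequence, protocol counts, protocol
# percentages, length counts, length percentages ("!" = network order, no padding).
RECORD_FMT = f"!I{MAX_PORTS}H2H2f3H3f"


def build_feature_record(src_ip: str, dst_ports: Sequence[int], tcp: int, udp: int,
                         length_counts: Sequence[int]) -> bytes:
    """Pack the network feature set into a fixed-length binary record."""
    ports = sorted(set(dst_ports))[:MAX_PORTS]   # ordered by port number
    ports += [0] * (MAX_PORTS - len(ports))      # pad unused slots with 0
    proto_total = tcp + udp
    length_total = sum(length_counts)
    return struct.pack(
        RECORD_FMT,
        int(ipaddress.IPv4Address(src_ip)),
        *ports,
        tcp, udp, tcp / proto_total, udp / proto_total,
        *length_counts, *(c / length_total for c in length_counts),
    )


def index_positions(record: bytes) -> List[int]:
    """Stand-in hashes (assumption): each yields one position in the table."""
    return sorted({zlib.crc32(record) % TABLE_SIZE, zlib.adler32(record) % TABLE_SIZE})


def first_candidate_rules(record: bytes, index_table: Dict[int, List[str]]) -> List[str]:
    """Union of the rule numbers stored at every index position of the record."""
    found = set()
    for pos in index_positions(record):
        found.update(index_table.get(pos, []))
    return sorted(found)
```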
[0022] The preset field order is a fixed arrangement of the source address, destination port distribution, protocol type distribution, and message length distribution features; for example, the fields might be written in the order: source address, destination port sequence, protocol type counts, protocol type percentages, message length counts, and message length percentages. The rule index table is a pre-built table of rule numbers that records which rule numbers correspond to which feature combinations. It is built from historical access control rules: the source address, destination port, protocol type, and message length conditions used for matching in each rule are written into a rule record in a fixed order, each field is assigned a fixed numeric value, and a position number is then calculated. Specifically, the source address condition is converted into an address value, the destination port condition into a port value, the protocol type condition into a protocol number, and the message length condition into a length range number; these are combined by a fixed weighted-sum-and-modulo formula to obtain an integer position number. Rule numbers with the same position number are stored at the same location, forming the rule index table. For example, the position number may be calculated as (address value + 3 × port value + 5 × protocol number + 7 × length range number) mod 256, which yields a position number between 0 and 255.
If a rule corresponds to the TCP protocol, the destination port is 443, and the message length range is 128 bytes to 1024 bytes, and the position number is 15 after calculation in the above manner, then the rule number is stored in position 15; when a subsequent network feature record also yields a position number of 15 after calculation using the same formula, the system can directly read the corresponding candidate rule number from position 15.
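The position-number formula above can be sketched directly. This assumes each historical rule is reduced to a single IPv4 source address value, uses the IANA protocol numbers (TCP = 6, UDP = 17), and numbers the three length ranges 0, 1, and 2; the length-range numbering and the function names are hypothetical. Under these assumptions a rule with source 192.168.1.100, destination port 443, TCP, and the 128 to 1024 byte range lands at position 186 rather than the illustrative 15 used in the text.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List

PROTOCOL_NUMBERS = {"TCP": 6, "UDP": 17}                       # IANA protocol numbers
LENGTH_RANGE_NUMBERS = {"<128": 0, "128-1024": 1, ">1024": 2}  # hypothetical numbering
TABLE_SIZE = 256


def position_number(src_ip: str, dst_port: int, protocol: str, length_range: str) -> int:
    """(address + 3*port + 5*protocol + 7*length_range) mod 256."""
    address_value = int(ipaddress.IPv4Address(src_ip))
    total = (address_value + 3 * dst_port + 5 * PROTOCOL_NUMBERS[protocol]
             + 7 * LENGTH_RANGE_NUMBERS[length_range])
    return total % TABLE_SIZE


def build_rule_index(rules: List[dict]) -> Dict[int, List[str]]:
    """Store each rule number at its computed position; collisions share a slot."""
    index: Dict[int, List[str]] = defaultdict(list)
    for rule in rules:
        pos = position_number(rule["src_ip"], rule["dst_port"],
                              rule["protocol"], rule["length_range"])
        index[pos].append(rule["id"])
    return dict(index)
```

At lookup time, a feature record converted with the same encodings and formula yields the same position, so `build_rule_index(rules).get(position, [])` returns the stored candidate rule numbers.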
[0023] First, the matching conditions corresponding to each candidate rule in the first candidate rule set are read and matched against each feature in the current network feature set. The matching conditions are the source address restrictions, destination port restrictions, protocol type restrictions, and packet length restrictions specified in each candidate rule. The system first performs protocol consistency filtering, comparing the protocol type distribution in the network feature set with the protocol type conditions in each candidate rule, deleting candidate rules with inconsistent protocol types, and obtaining the second candidate rule set. For example, the current first candidate rule set contains rules R12, R18, and R26, where the protocol type condition of rules R12 and R18 is TCP, and the protocol type condition of rule R26 is UDP, while the protocol type distribution in the current network feature set shows that TCP packets constitute the majority. After comparing each rule, the system retains rules R12 and R18 and deletes rule R26 with inconsistent protocol types, thus obtaining the second candidate rule set.
[0024] After the second candidate rule set is obtained, a condition check is performed on each candidate rule to determine the preliminary matching rule subset. Specifically, the destination port, message length, and source address restrictions of each candidate rule in the second candidate rule set are read and compared item by item with the destination port distribution, message length distribution, and source address features in the current network feature set. The destination port check determines whether the main destination ports in the current communication records fall within the port range defined by the candidate rule; the message length check determines whether the current message length distribution satisfies the rule's length range requirement; and the source address check determines whether the current source address lies within the address range allowed by the rule. Candidate rules that satisfy all conditions are retained, and any candidate rule that fails one or more conditions is deleted; the remaining rules form the preliminary matching rule subset. For example, suppose the second candidate rule set contains rules R12 and R18, the destination ports in the current network feature set are 80, 443, and 8888, most packets fall in the 128 to 1024 byte range, and the source address is 192.168.1.100. Rule R12 allows destination ports 80 through 9000, matches packets of 128 to 1024 bytes, and allows communication from this source address range, while rule R18 allows only ports 80 and 443. Because port 8888 falls outside R18's allowed ports, the system retains R12 and deletes R18, thereby obtaining the preliminary matching rule subset.
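The two filtering stages of [0023] and [0024] can be sketched as follows, under several assumptions: the dominant protocol decides protocol consistency, every observed destination port must lie in the rule's allowed set (which matches the R18 example), the length check uses the bucket holding the largest share of packets, and the source range is a CIDR block. The `CandidateRule` type and its fields are hypothetical.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence


@dataclass(frozen=True)
class CandidateRule:
    """Hypothetical candidate-rule record."""
    rule_id: str
    protocol: str                 # "TCP" or "UDP"
    allowed_ports: FrozenSet[int]
    length_bucket: str            # "<128", "128-1024" or ">1024"
    source_net: str               # CIDR block, e.g. "192.168.1.0/24"


def protocol_filter(rules: Sequence[CandidateRule], protocol_counts: Counter) -> List[CandidateRule]:
    """Stage 1: keep rules whose protocol equals the dominant observed protocol."""
    dominant = protocol_counts.most_common(1)[0][0]
    return [r for r in rules if r.protocol == dominant]


def condition_check(rules: Sequence[CandidateRule], dst_ports: Sequence[int],
                    length_share: Dict[str, float], src_ip: str) -> List[CandidateRule]:
    """Stage 2: keep only rules that pass the port, length and source checks."""
    main_bucket = max(length_share, key=length_share.get)
    addr = ipaddress.ip_address(src_ip)
    kept = []
    for r in rules:
        ports_ok = set(dst_ports) <= r.allowed_ports    # assumption: all ports must be allowed
        length_ok = main_bucket == r.length_bucket
        source_ok = addr in ipaddress.ip_network(r.source_net)
        if ports_ok and length_ok and source_ok:       # failing any check removes the rule
            kept.append(r)
    return kept
```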
[0025] The final preliminary matching rule subset includes several candidate rule records corresponding to the current network feature set. Each candidate rule record includes at least rule identifier information and rule matching condition information. The rule matching condition information includes source address restriction information, destination port restriction information, protocol type restriction information, and message length restriction information.
[0026] In step S14, applicability analysis is performed on the preliminary matching rule subset to obtain a rule applicability matrix, and the rule applicability matrix is then refined to obtain a refined rule set, including: if the number of rules in the preliminary matching rule subset exceeds a preset rule threshold, arranging the rules hierarchically according to the sequential dependencies among their judgment conditions to obtain a rule hierarchy sequence; based on the rule hierarchy sequence, extracting applicability indicators for each rule in the preliminary matching rule subset to obtain the rule applicability matrix, where the applicability indicators include hit frequency and resource consumption weight, the hit frequency being the ratio of the number of times the rule is triggered within a preset statistical time window to the total number of triggers of all candidate rules, and the resource consumption weight being the sum of the normalized average processing time, number of comparison fields, and number of bytes read during execution of the rule; if any rule in the rule applicability matrix has a resource consumption weight greater than a preset resource threshold, grouping the rule applicability matrix based on the rule hierarchy sequence to obtain rule clusters; and, based on the hit frequency, the resource consumption weight, and the concentration of hierarchy sequence positions of the rules in each rule cluster, determining the value range of each retention condition, generating a filtering boundary, and filtering the rule cluster against that boundary to obtain a refined rule group.
[0027] The system first determines whether the number of rules in the preliminary matching rule subset exceeds the preset rule threshold, which is set to 5. When there are more than 5 rules, conditions that are evaluated first are placed at the top level, and conditions that depend on the results of a previous level are placed at lower levels. For example, the message payload length is evaluated first, then the byte value at a fixed offset, and then the destination port or protocol identifier, which yields the rule hierarchy sequence.
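Placing conditions by evaluation dependency amounts to a topological layering. The sketch below is one possible reading, assuming the dependencies between conditions are known as a mapping from each condition to the conditions it depends on; it uses `graphlib.TopologicalSorter` from the Python 3.9+ standard library, and the condition names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, Sequence, Set

RULE_THRESHOLD = 5  # hierarchical arrangement applies only above this rule count


def needs_hierarchy(rule_ids: Sequence[str]) -> bool:
    return len(rule_ids) > RULE_THRESHOLD


def hierarchy_levels(depends_on: Dict[str, Set[str]]) -> Dict[str, int]:
    """Level 1 = no prerequisites; otherwise one level below the deepest prerequisite."""
    levels: Dict[str, int] = {}
    # static_order() yields each condition only after all conditions it depends on.
    for cond in TopologicalSorter(depends_on).static_order():
        prereqs = depends_on.get(cond, set())
        levels[cond] = 1 + max((levels[p] for p in prereqs), default=0)
    return levels
```

For the example chain (payload length, then fixed-offset byte, then destination port or protocol identifier), the payload-length check lands at level 1 and the port and protocol checks share level 3.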
[0028] After the rule hierarchy sequence is obtained, applicability indicators, namely hit frequency and resource consumption weight, are extracted for each rule. The hit frequency is the number of times the rule is triggered within the statistical time window divided by the total number of times all candidate rules are triggered. The resource consumption weight is the sum of the rule's normalized average processing time, number of comparison fields, and number of bytes read during execution. For example, within a statistical time window covering the most recent 60 seconds, rule 101 was triggered 34 times, rule 102 26 times, and rule 103 20 times, for a total of 80 triggers across all candidate rules, so the hit frequency of rule 101 is 34 / 80 = 0.425. For the resource consumption weight, suppose the maximum values across all candidate rules are an average processing time of 10 microseconds, 10 comparison fields, and 200 bytes read. If rule 101 has an average processing time of 8 microseconds, 7 comparison fields, and reads 10 bytes of payload, its normalized values are 0.8, 0.7, and 0.05, which sum to a resource consumption weight of 1.55. To keep the result between 0 and 1, this sum can be divided by the sum of the three full-scale values (3), giving a final normalized resource consumption weight of 0.5167.
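The two indicators reduce to simple arithmetic. A minimal sketch, assuming max-normalization against the largest value among the candidate rules as in the example above; the function names are illustrative, and `resource_weight` returns the variant scaled into [0, 1] (the sum divided by 3).

```python
from typing import Dict, Tuple


def hit_frequencies(trigger_counts: Dict[str, int]) -> Dict[str, float]:
    """Triggers of each rule in the window divided by triggers of all candidate rules."""
    total = sum(trigger_counts.values())
    return {rule: count / total for rule, count in trigger_counts.items()}


def resource_weight(measured: Tuple[float, float, float],
                    full_scale: Tuple[float, float, float]) -> float:
    """(time, fields, bytes) each divided by its maximum, summed, then divided by 3."""
    normalized = [m / f for m, f in zip(measured, full_scale)]
    return sum(normalized) / len(normalized)
```

For rule 101, `hit_frequencies` gives 0.425 and `resource_weight((8, 7, 10), (10, 10, 200))` gives approximately 0.5167.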
[0029] The rule hierarchy sequence position, hit frequency, and resource consumption weight are then written into the rule applicability matrix in a fixed order, one row per rule. If any rule in the matrix has a resource consumption weight greater than the preset resource threshold of 0.4, all rules are grouped by hierarchy sequence position, hit frequency, and resource consumption weight to obtain rule clusters. Specifically, for each pair of rules in the matrix, the absolute differences in hierarchy sequence position, hit frequency, and resource consumption weight are calculated and compared with preset proximity thresholds: 1 for hierarchy sequence position, 0.15 for hit frequency, and 0.10 for resource consumption weight. If none of the three differences exceeds its threshold, the two rules are considered close and assigned to the same rule cluster; if any difference exceeds its threshold, they are considered significantly different and assigned to different rule clusters. For example, if rules 101 and 102 differ by 1 in hierarchy sequence position, 0.08 in hit frequency, and 0.06 in resource consumption weight, none of the three differences exceeds its threshold, so rules 101 and 102 are placed in the same rule cluster.
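The text defines when two rules are close but not how pairwise closeness becomes clusters. The sketch below assumes single-linkage grouping with a union-find structure, so rules joined by a chain of close pairs share a cluster; a stricter reading, in which every pair in a cluster must be close, would need a different grouping. The `RuleRow` type is hypothetical.

```python
from typing import Dict, List, NamedTuple

POS_THRESHOLD, HIT_THRESHOLD, WEIGHT_THRESHOLD = 1, 0.15, 0.10
RESOURCE_THRESHOLD = 0.4


class RuleRow(NamedTuple):
    """Hypothetical row of the rule applicability matrix."""
    rule_id: int
    position: int      # rule hierarchy sequence position
    hit_freq: float
    weight: float      # resource consumption weight


def needs_clustering(rows: List[RuleRow]) -> bool:
    return any(r.weight > RESOURCE_THRESHOLD for r in rows)


def close(a: RuleRow, b: RuleRow) -> bool:
    """True when no absolute difference exceeds its proximity threshold."""
    return (abs(a.position - b.position) <= POS_THRESHOLD
            and abs(a.hit_freq - b.hit_freq) <= HIT_THRESHOLD
            and abs(a.weight - b.weight) <= WEIGHT_THRESHOLD)


def cluster_rules(rows: List[RuleRow]) -> List[List[int]]:
    """Single-linkage clustering via union-find (assumed grouping strategy)."""
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if close(rows[i], rows[j]):
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i, row in enumerate(rows):
        groups.setdefault(find(i), []).append(row.rule_id)
    return list(groups.values())
```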
[0030] For each rule cluster, the hit frequency and resource consumption weight of all its rules are read, and the values of the rules in the same cluster are averaged to obtain the cluster means. At the same time, the distribution of the rules' positions in the rule hierarchy sequence is analyzed, and the interval in which positions occur most frequently is taken as the hierarchy concentration interval of that rule cluster. The cluster means characterize the overall applicability and resource consumption level of the cluster, and the hierarchy concentration interval characterizes the traversal positions where the cluster's rules are mainly located. After the cluster means and hierarchy concentration interval are obtained, filtering boundaries are generated from preset retention thresholds: a hit frequency of at least 0.70, a resource consumption weight of at most 0.30, and a rule hierarchy sequence position within the top 60% of hierarchy positions. Each rule in each cluster is then checked one by one. Rules with a hit frequency of at least 0.70, a resource consumption weight of at most 0.30, and a rule hierarchy sequence position within the cluster's hierarchy concentration interval or the top 60% range are retained; the remaining rules are deleted. All retained rules are merged to obtain the refined rule group.
For example, if a rule cluster includes rule 101, rule 102, and rule 104, where rule 101 has a hit frequency of 0.85, a resource consumption weight of 0.20, and a rule hierarchy sequence position of 2; rule 102 has a hit frequency of 0.78, a resource consumption weight of 0.25, and a rule hierarchy sequence position of 3; and rule 104 has a hit frequency of 0.42, a resource consumption weight of 0.51, and a rule hierarchy sequence position of 5, then the cluster mean of the hit frequency of this rule cluster is 0.683, the cluster mean of the resource consumption weight is 0.32, and the hierarchy concentration interval is located between positions 2 and 3. After checking each rule against the above filtering boundaries, rules 101 and 102 meet the retention conditions, while rule 104 is deleted because its hit frequency is lower than 0.70 and its resource consumption weight is greater than 0.30. Finally, rules 101 and 102 constitute the refined rule group corresponding to this rule cluster.
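The retention check in [0030] reduces to three comparisons per rule. In this minimal sketch the hierarchy concentration interval is passed in rather than derived, and the total number of hierarchy positions (10) needed to evaluate "top 60%" is a hypothetical value not given in the text; the function names are illustrative.

```python
from statistics import mean


def cluster_means(cluster):
    """cluster maps rule_id -> (hit_frequency, resource_weight, position)."""
    hits, weights, _ = zip(*cluster.values())
    return mean(hits), mean(weights)


def refine_cluster(cluster, concentration, total_positions,
                   min_hit=0.70, max_weight=0.30, top_share=0.60):
    """Keep rules meeting both indicator bounds whose position lies in the
    hierarchy concentration interval or in the top 60% of positions."""
    low, high = concentration
    top_limit = top_share * total_positions
    return sorted(
        rule for rule, (hit, weight, pos) in cluster.items()
        if hit >= min_hit and weight <= max_weight
        and (low <= pos <= high or pos <= top_limit)
    )


cluster = {101: (0.85, 0.20, 2), 102: (0.78, 0.25, 3), 104: (0.42, 0.51, 5)}
hit_mean, weight_mean = cluster_means(cluster)
print(round(hit_mean, 3), round(weight_mean, 2))  # 0.683 0.32
print(refine_cluster(cluster, concentration=(2, 3), total_positions=10))  # [101, 102]
```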
[0031] The preset resource threshold, rule hierarchy sequence position difference threshold, hit frequency difference threshold, hit frequency retention threshold, and resource consumption weight retention threshold mentioned above are all determined using a normal distribution-based approach. In practice, historical operating data for each indicator is first collected during continuous system operation to form a sample sequence for each threshold. The mean and standard deviation of each sample sequence are then calculated, and the mean plus twice the standard deviation is taken as the preset threshold for that indicator. For example, for the hit frequency difference threshold, the hit frequency differences between rules during historical operation are collected; if this sample sequence has a mean of 0.05 and a standard deviation of 0.05, the threshold is 0.05 + 2 × 0.05 = 0.15.
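The mean-plus-two-standard-deviations rule can be sketched with the standard `statistics` module. Using the population standard deviation (`pstdev`) rather than the sample form is an assumption, since the text does not say which is meant; the history values are hypothetical, picked so that both mean and standard deviation equal 0.05.

```python
from statistics import mean, pstdev


def two_sigma_threshold(samples):
    """Preset threshold = mean + 2 * standard deviation of historical samples.
    Population standard deviation (pstdev) is an assumption; the text does
    not say whether the sample or population form is used."""
    return mean(samples) + 2 * pstdev(samples)


# Hypothetical history whose mean and standard deviation are both 0.05.
history = [0.0, 0.10, 0.0, 0.10]
print(round(two_sigma_threshold(history), 2))  # 0.15
```

The same helper applies unchanged to the trigger, strong dependency, and repetition thresholds described later in this section.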
[0032] The 60-second statistical time window mentioned above is selected as follows. Several candidate window lengths are first preset, such as 30, 45, 60, 90, and 120 seconds. The number of rule triggers, hit frequency, and resource consumption weight are then computed for each candidate window over the same historical operating data. Next, for each candidate window, the change amplitude between adjacent statistical periods is calculated and averaged to obtain the average change rate. Finally, the window length with the lowest average change rate is selected as the statistical time window.
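A minimal sketch of the window selection in [0032]. The text does not define "change rate" precisely, so this sketch assumes the relative change |x[t+1] - x[t]| / |x[t]| averaged over adjacent periods and then over the monitored statistics; the function names and the per-period values are hypothetical.

```python
from statistics import mean


def average_change_rate(series):
    """Mean relative change between adjacent statistical periods.
    Assumption: rate = |x[t+1] - x[t]| / |x[t]|, skipping zero bases."""
    rates = [abs(b - a) / abs(a) for a, b in zip(series, series[1:]) if a != 0]
    return mean(rates) if rates else float("inf")


def select_window(stats_by_window):
    """stats_by_window maps a window length to a list of per-period series
    (e.g. trigger counts, hit frequencies, resource weights); the window
    whose series change least on average is selected."""
    return min(stats_by_window,
               key=lambda w: mean(average_change_rate(s) for s in stats_by_window[w]))


# Hypothetical hit-frequency series per candidate window length (seconds).
stats = {
    30: [[0.40, 0.55, 0.35, 0.50]],
    60: [[0.42, 0.43, 0.42, 0.43]],
    120: [[0.40, 0.46, 0.41, 0.45]],
}
print(select_window(stats))  # 60
```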
[0033] In step S15, a hash mapping construction process is performed based on the refined rule group to obtain a hash table, and a matching scoring process is performed on the hash table to obtain the matching degree score of each rule, including: Read the field order of the refined rule group, merge and organize the rules with the same field order to obtain a rule hierarchical structure, calculate the identifier value of each layer field in the hierarchical structure, and write the mapping relationship between the identifier value and the rule identifier of the rule hierarchical structure into a hash table; Based on the hash table, the number of hit fragments for each rule in the refined rule group is counted, and the number of hit fragments is divided by the total number of fragments for each rule in the refined rule group to obtain the initial similarity calculation value; The initial similarity values are linearly normalized to obtain the matching score for each rule.
[0034] It is worth noting that the matching content and field order of each rule in the refined rule group are read, and rules with the same field order are merged. A hierarchical rule structure for lookup is then built according to the field order. Next, an identifier value is calculated for the fields at each layer of the hierarchical rule structure, and the mapping between each identifier value and the corresponding rule identifier is written into a hash table, yielding a fragment mapping hash table. For example, suppose the refined rule group includes rule 201 and rule 202, where rule 201 matches, in order, the protocol type, target port, and payload characteristics, and rule 202 matches, in order, the protocol type, target port, and message length. The two rules are written into the same hierarchical structure according to their field order. Identifier values are then calculated for the protocol type fragment, the target port fragment, and the subsequent feature fragments, and each identifier value is written into the hash table together with the rule identifiers of rule 201 and rule 202, forming the fragment mapping hash table.
For example, consider a rule fragment whose fields are protocol number 6, target port 443, and length marker 2. The system first writes these fields as the numerical strings 6, 443, and 2 in sequence, then computes the weighted sum 6 + 443 × 3 + 2 × 5 = 1345. Finally, it takes the remainder of 1345 divided by 256 to obtain the identifier value 65, and writes 65 together with the corresponding rule identifier into the hash table.
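The identifier calculation and the fragment mapping hash table can be sketched as follows. The weights 1, 3, 5 and table size 256 reproduce the worked example; how the weights extend to longer field sequences, keying entries by (layer, identifier), and rule 201's third field value (9) are all assumptions, and the function names are hypothetical.

```python
from collections import defaultdict

# Field weights 1, 3, 5 and table size 256 reproduce the worked example in [0034];
# how the weights extend to longer field sequences is not stated (assumption).
FIELD_WEIGHTS = (1, 3, 5)
TABLE_SIZE = 256


def identifier_value(fields, weights=FIELD_WEIGHTS, size=TABLE_SIZE):
    """Weighted sum of numeric field values, reduced modulo the table size."""
    return sum(f * w for f, w in zip(fields, weights)) % size


def build_fragment_table(rules):
    """rules maps rule_id -> numeric field values in field order. Each layer
    (field-order prefix) gets its own identifier, so rules sharing a prefix
    share the entries for those layers. Keying by (layer, identifier) is an
    assumption made here to keep layers apart."""
    table = defaultdict(set)
    for rule_id, fields in rules.items():
        for depth in range(1, len(fields) + 1):
            table[(depth, identifier_value(fields[:depth]))].add(rule_id)
    return table


print(identifier_value((6, 443, 2)))  # 65
# Hypothetical numeric stand-in (9) for rule 201's payload characteristic.
table = build_fragment_table({201: (6, 443, 9), 202: (6, 443, 2)})
```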
[0035] After obtaining the hash table, the rule fragments corresponding to each rule are located according to the identifier values recorded in the hash table, and the fragment data to be compared are evenly distributed to multiple processing threads. Each processing thread reads the identifier value corresponding to the assigned fragment, and then searches the fragment mapping hash table to see if there is a rule identifier corresponding to that identifier value. If it exists, the fragment is recorded as a hit for the corresponding rule, and the hit count for that rule is incremented by one. If it does not exist, it is not counted as a hit result. After all processing threads have completed the search, the hit counts for each rule are summarized, and the number of hit fragments for a certain rule is divided by the total number of fragments pre-recorded for that rule to obtain the initial similarity calculation value for that rule. Then, the initial similarity calculation value is compared with a preset trigger threshold. When the initial similarity calculation value is greater than or equal to the preset trigger threshold, weight allocation processing is performed. The specific process of weight allocation processing is as follows: first, the field content and field order corresponding to all rule fragments in each rule are read, and then the fields are classified according to their judgment role in rule matching. Field fragments that directly determine whether a rule is valid are identified as key fragments; field fragments used to supplement the matching range, narrow down the filtering conditions, or assist in verification are identified as ordinary fragments. Specifically, fragments corresponding to protocol type, target port, fixed position control byte, or feature identifier fields are key fragments, while fragments corresponding to message length, position supplement, or non-core content fields are ordinary fragments. 
After classification, key fragments are assigned larger base weight values and ordinary fragments smaller ones. The base weight values of all fragments within the same rule are normalized so that they sum to 1. In the hit statistics phase, only the weights of the hit fragments are extracted and accumulated to obtain the hit weight sum, which is then divided by the sum of the weights of all fragments under that rule to obtain the weighted similarity value. This approach uses the hit ratio for preliminary screening while also reflecting the actual impact of key field hits on the rule matching result. For example, if rule 201 has a total of 10 fragments and 6 of them hit after parallel processing, the initial similarity calculation value is 6 / 10 = 0.6. With a preset trigger threshold of 0.5, weight allocation proceeds for this rule. Suppose the first and second fragments correspond to the protocol type and target port fields and are therefore key fragments, while the third to tenth fragments correspond to the length field and payload supplement field and are therefore ordinary fragments. If the first fragment is assigned a weight of 0.2, the second 0.15, and the remaining four hit fragments 0.15, 0.15, 0.08, and 0.07, and the weights of all fragments of this rule sum to 1, then the hit weight sum is 0.2 + 0.15 + 0.15 + 0.15 + 0.08 + 0.07 = 0.80, and the final weighted similarity value of this rule is 0.80.
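The parallel hit counting and the two similarity values in [0035] can be sketched with the standard library's `ThreadPoolExecutor`. Splitting identifiers round-robin across threads and giving each thread a private `Counter` are design assumptions that avoid shared-state locking; the function names are hypothetical. The last four weights (0.05 each) are hypothetical fill so that all ten weights of rule 201 sum to 1.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_hits(identifiers, table, workers=4):
    """Spread the packet's fragment identifiers over worker threads; each
    thread counts hits in a private Counter, merged after all finish."""
    chunks = [identifiers[i::workers] for i in range(workers)]

    def scan(chunk):
        hits = Counter()
        for ident in chunk:
            for rule_id in table.get(ident, ()):
                hits[rule_id] += 1
        return hits

    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(scan, chunks):
            total.update(partial)
    return total


def initial_similarity(hit_fragments, total_fragments):
    return hit_fragments / total_fragments


def weighted_similarity(weights, hit_indexes):
    """Sum of the weights of hit fragments over the rule's total weight."""
    return sum(weights[i] for i in hit_indexes) / sum(weights)


# Rule 201: 10 fragments, the first 6 hit; last four weights are hypothetical fill.
weights = [0.2, 0.15, 0.15, 0.15, 0.08, 0.07, 0.05, 0.05, 0.05, 0.05]
if initial_similarity(6, 10) >= 0.5:  # preset trigger threshold
    print(round(weighted_similarity(weights, range(6)), 2))  # 0.8
```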
[0036] The preset trigger threshold is derived from the initial similarity values collected for each rule during continuous operation, which form a historical sample sequence. The mean and standard deviation of this sequence are calculated, and the mean plus twice the standard deviation is used as the preset trigger threshold. For example, if the mean of the historical initial similarity samples is 0.42 and the standard deviation is 0.04, the preset trigger threshold is 0.42 + 2 × 0.04 = 0.50.
[0037] Fragment weights are allocated based on the correlation between the fragment category and the final matching result of the rule. Specifically, all fragments within the same rule are first divided into several categories according to their decision function. Then, based on historical samples, the correlation coefficient between each category of fragments and the final matching result of the rule is calculated. Next, the absolute value of the correlation coefficient for each category of fragments is taken and divided by the sum of the absolute values of the correlation coefficients for all categories to obtain the category weight percentage for each category. Then, based on the number of fragments contained in each category, the category weight percentage corresponding to that category is evenly distributed to each fragment within that category to obtain the initial weight for each fragment. Finally, the initial weights of all fragments are normalized so that the sum of all fragment weights is 1, thus obtaining the final weight for each fragment.
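A minimal sketch of the correlation-based weight allocation in [0037]. The text does not say which per-sample quantity is correlated with the final match result, so this sketch assumes the hit ratio of each category's fragments in each historical sample; Pearson correlation is written out by hand, and all names and sample values are hypothetical. If every category's correlation is zero, the division by `total` fails, which a real implementation would need to handle.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def fragment_weights(category_samples, outcomes, fragment_counts):
    """Return the per-fragment weight for each category.

    category_samples: category -> per-sample hit ratio of that category's fragments
    outcomes:         per-sample final match result of the rule (1 or 0)
    fragment_counts:  category -> number of fragments in that category
    """
    strength = {c: abs(pearson(xs, outcomes)) for c, xs in category_samples.items()}
    total = sum(strength.values())
    # Category share spread evenly over the fragments of that category.
    per_fragment = {c: strength[c] / total / fragment_counts[c] for c in strength}
    # Final normalization so all fragment weights of the rule sum to 1.
    weight_sum = sum(per_fragment[c] * fragment_counts[c] for c in per_fragment)
    return {c: w / weight_sum for c, w in per_fragment.items()}


# Hypothetical history: key-fragment hits track the outcome closely.
outcomes = [1, 0, 1, 1, 0, 0]
samples = {"key": [1.0, 0.0, 0.9, 0.8, 0.1, 0.2],
           "ordinary": [0.5, 0.4, 0.6, 0.3, 0.5, 0.4]}
counts = {"key": 2, "ordinary": 8}
w = fragment_weights(samples, outcomes, counts)
```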
[0038] The weighted similarity values are then converted into a matching score using linear normalization. Specifically, the weighted similarity values for each rule are statistically analyzed from historical data to determine the lower and upper limits of the weighted similarity values. The lower limit is then subtracted from the weighted similarity value of the current rule, and the result is divided by the difference between the upper and lower limits to obtain the normalized matching score. A score less than 0 is assigned the value 0, and a score greater than 1 is assigned the value 1, thus uniformly controlling the matching score within the range of 0 to 1.
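The clamped linear normalization in [0038] is a one-line min-max scaling. In this sketch the historical lower and upper limits are passed in as parameters (the values in the test are hypothetical), and the function name is illustrative.

```python
def matching_score(weighted_similarity, lower, upper):
    """Min-max normalization with historical lower/upper limits, clamped to [0, 1]."""
    score = (weighted_similarity - lower) / (upper - lower)
    return min(1.0, max(0.0, score))
```

For example, with historical limits 0.5 and 1.0, a weighted similarity of 0.75 maps to 0.5, while values outside the historical range are clamped to 0 or 1.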
[0039] In step S16, based on the matching score, permission identifier conversion processing is performed to obtain a permission identifier, and data verification processing is performed on the permission identifier to obtain an access permission signal, including: selecting the refined rule group with the highest matching score as the rule group to be processed and parsing its access indication information; if the access indication information is to allow access, extracting the control instructions of the rule group to be processed and converting them into permission identifiers using a pre-configured action mapping table; concatenating the permission identifier with the network data packet to obtain a complete data structure; and computing a check value over the complete data structure to obtain a validation field, which is appended to the complete data structure to obtain the access permission signal.
[0040] It is worth noting that the refined rule groups are ranked by matching score from highest to lowest, and the highest-ranked group is selected as the rule group to be processed. The access indication information in that group is then parsed; it is either "Allow Access" or "Deny Access". If it is "Allow Access", the system continues with the subsequent processing of that rule; if it is "Deny Access", the system skips that rule and checks the next rule in the ranking.
[0041] After the rule passes the access indication check, the system further extracts the control instructions associated with that rule. These control instructions specify concrete network traffic control operations, such as opening port 443 or allowing specific protocols to pass. The system then translates these control instructions into standardized permission identifiers by consulting a pre-configured action mapping table.
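The selection and translation steps in [0040] and [0041] can be sketched as a ranked walk over rule groups. The dictionary keys (`score`, `access`, `instruction`), the function name, and the "close port 23" entry are hypothetical; only the "open port 443" to "ALLOW_443_TCP" mapping comes from the text.

```python
# Hypothetical action mapping table: only the "open port 443" -> "ALLOW_443_TCP"
# entry comes from the text; the dictionary layout is an assumption.
ACTION_MAP = {"open port 443": "ALLOW_443_TCP"}


def select_permission(rule_groups, action_map=ACTION_MAP):
    """Walk refined rule groups from highest to lowest matching score, skip
    "Deny Access" groups, and map the first allowed group's control
    instruction to its permission identifier (None if nothing is allowed)."""
    for group in sorted(rule_groups, key=lambda g: g["score"], reverse=True):
        if group["access"] == "Allow Access":
            return action_map[group["instruction"]]
    return None


groups = [
    {"score": 0.91, "access": "Deny Access", "instruction": "close port 23"},
    {"score": 0.84, "access": "Allow Access", "instruction": "open port 443"},
]
print(select_permission(groups))  # ALLOW_443_TCP
```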
[0042] After the permission identifier is obtained, signal encapsulation is performed. First, the system concatenates the permission identifier with metadata from the network data packet, such as the source address, destination address, destination port, and protocol type, to form a complete data structure. Next, the system computes a checksum over the concatenated data. The permission identifier is converted into a byte array; for example, the ASCII representation of "ALLOW_443_TCP" is [65,76,76,79,87,95,52,52,51,95,84,67,80]. These bytes are concatenated with the source address 192.168.1.100 (bytes 192,168,1,100), the destination address 10.0.0.5 (bytes 10,0,0,5), the destination port 443 (bytes 1,187), and the protocol type TCP (byte 6) to form the complete byte array. All bytes are then summed one by one: the identifier contributes 959, the source address 461, the destination address 15, the destination port 188, and the protocol type 6, for a total of 1629. Because this does not exceed the 16-bit maximum of 65535, no overflow folding is needed. The sum is subtracted from 65535 to obtain the checksum 63906 (0xF9A2 in hexadecimal). Finally, this checksum (bytes 249,162) is appended to the end of the byte array as the check field, forming the complete access permission signal.
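The encapsulation and checksum in [0042] can be reproduced with the standard `ipaddress` module. The function name is hypothetical, and the end-around-carry folding applied when the sum exceeds 16 bits is an assumption, since the text only notes that no folding was needed in its example.

```python
import ipaddress


def access_permission_signal(permission_id, src, dst, dst_port, protocol):
    """Concatenate identifier and packet metadata, then append
    checksum = 65535 - (byte sum) as a 2-byte big-endian check field."""
    data = bytearray(permission_id.encode("ascii"))
    data += ipaddress.IPv4Address(src).packed
    data += ipaddress.IPv4Address(dst).packed
    data += dst_port.to_bytes(2, "big")
    data.append(protocol)
    total = sum(data)
    while total > 0xFFFF:  # assumed end-around-carry folding on overflow
        total = (total & 0xFFFF) + (total >> 16)
    checksum = 0xFFFF - total
    return bytes(data) + checksum.to_bytes(2, "big"), total, checksum


signal, total, checksum = access_permission_signal(
    "ALLOW_443_TCP", "192.168.1.100", "10.0.0.5", 443, 6)
print(total, checksum, hex(checksum), list(signal[-2:]))
# 1629 63906 0xf9a2 [249, 162]
```

Note that this is a plain byte-wise sum, not the 16-bit-word Internet checksum of RFC 1071.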
[0043] A pre-configured action mapping table is a data table that establishes a correspondence between control commands and standardized permission identifiers. Based on common network traffic control requirements, such as port opening and protocol permission granting, the system designs control commands and maps them to permission identifiers in a unified format; for example, "open port 443" maps to "ALLOW_443_TCP". The action mapping table is manually configured or generated using automated tools based on historical network traffic management experience and device support, assigning each control command to a unique permission identifier.
[0044] In step S17, based on the access permission signal and the network data packet, an associated event is generated, and the dependency strength of the associated event is calculated to obtain the rule dependency strength, including: extracting the permission identifier from the access permission signal and merging it with the network data packet to obtain the associated event; counting the number of times each pair of rules co-occurs in the associated events and dividing this co-occurrence count by the trigger count of each of the two rules to obtain the co-occurrence frequency; and multiplying the co-occurrence frequency by a preset scaling factor to obtain the rule dependency strength.
[0045] It is worth noting that the permission identifier is extracted from the access permission signal and combined with network data packet data to construct an associated event. Specifically, the permission identifier, such as "ALLOW_443_TCP", is extracted from the access permission signal, and relevant metadata, such as the source address 192.168.1.100 and the destination port 443, is extracted from the network data packet. The system then merges these data into a single associated event and records the rule ID of that event.
[0046] Subsequently, the system counts co-occurrences of each pair of rules in the associated events. A co-occurrence means that rule A and rule B are both triggered in the same network traffic session. The co-occurrence frequency of rule A and rule B equals the number of times they are jointly triggered in the same traffic session divided by the number of times rule A is triggered and then by the number of times rule B is triggered, and the rule dependency strength is this co-occurrence frequency multiplied by the preset scaling factor. For example, if rule A and rule B have been jointly triggered 50 times over a past period, while rule A has been triggered 100 times in total and rule B 80 times, the co-occurrence frequency is 50 / 100 / 80 = 0.00625, and multiplying by the preset scaling factor of 100 gives a rule dependency strength of 0.625.
[0047] To determine the value of the preset scaling factor, historical running data is analyzed, particularly the co-occurrence frequency among the rules. The preset scaling factor is calculated based on the maximum co-occurrence frequency. For example, if the maximum co-occurrence frequency in the historical data is 0.01, then dividing 1 by the maximum co-occurrence frequency of 0.01 yields a preset scaling factor of 100. The preset scaling factor ensures that the rule dependency strength is within the range of 0 to 1.
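The co-occurrence frequency, scaling factor, and dependency strength from [0046] and [0047] chain together as below. The function names are illustrative, and the historical frequency list is hypothetical apart from its maximum of 0.01, which matches the example.

```python
def cooccurrence_frequency(joint, count_a, count_b):
    """Joint triggers divided by rule A's and then rule B's trigger counts."""
    return joint / count_a / count_b


def scaling_factor(historical_frequencies):
    """1 / (largest co-occurrence frequency seen in historical data)."""
    return 1 / max(historical_frequencies)


def dependency_strength(joint, count_a, count_b, scale):
    return cooccurrence_frequency(joint, count_a, count_b) * scale


scale = scaling_factor([0.01, 0.004, 0.00625])  # hypothetical history, max 0.01
print(round(dependency_strength(50, 100, 80, scale), 3))  # 0.625
```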
[0048] In step S18, based on the rule dependency strength, an optimization candidate set is selected, and the adjustment requirements of the optimization candidate set are evaluated to obtain the adjustment requirements. After determining the adjustment requirements, the optimization candidate set is restructured to obtain an optimized index version, including: Rule pairs whose rule dependency strength is greater than a preset strong dependency threshold are included in the optimization candidate set; According to the storage layout of the preset index structure, the number of nodes accessed during the query process of each rule in the optimization candidate set and the number of nodes repeatedly accessed between different rules are simulated layer by layer. The repeated access ratio is calculated. When the repeated access ratio exceeds the preset repeated threshold, it is determined that there is an adjustment requirement. Once the adjustment requirement is determined, a set of rules is selected from the optimization candidate set, the common ancestor node of the set of rules is determined in the storage layout of the preset index structure, an intermediate node is created under the common ancestor node, and the child nodes of the set of rules are migrated to the intermediate node to obtain the optimized index version.
[0049] It is worth noting that all dependency strength values are compared one by one with the preset strong dependency threshold, and rule pairs that exceed it are placed in the optimization candidate set. For example, suppose the dependency strength of rule C on rule D is 0.88, that of rule E on rule F is 0.79, and that of rule G on rule H is 0.53. Compared with the preset strong dependency threshold of 0.70, the first two values exceed the threshold and the third does not, so {rule C, rule D, strength value 0.88} and {rule E, rule F, strength value 0.79} are included in the optimization candidate set.
[0050] The system takes the rule pairs in the optimization candidate set as input and, following the storage layout of the preset index structure, simulates locating each rule layer by layer. Taking rules C and D as an example: because the key-value order of the preset index structure divides first by source address segment and then by target port segment, the system enters the source address segment 192.168.1.0 branch corresponding to rule C from the root node and then enters the target port 80 child node to locate rule C, visiting 3 nodes. The system then starts from the root node again, enters the same source address segment 192.168.1.0 branch, and then enters the target port 3306 child node to locate rule D, visiting another 3 nodes. The two queries visit a total of 6 nodes, and the source address segment branch is visited twice. The system records the paths of the two queries and compares their node overlap. It finds that rules C and D overlap at the source address branch level but sit under different target port child nodes, so two independent traversals are required. Based on the recorded results, the current index structure requires access to a total of 6 nodes when processing this pair of rules, of which 3 nodes are accessed repeatedly, giving a repetition rate of 50%. Because this rate exceeds the system's preset repetition threshold of 40%, the current index structure is determined not to fully exploit the relationships between the rules and to require adjustment.
[0051] Once an adjustment requirement is determined, a rule pair is selected from the optimization candidate set; rules C and D are used here as the example. The system first locates the storage paths of the two rules in the current index structure, recording each node layer by layer from the root node down, and finds the common ancestor node of the two paths, that is, the deepest node at which the two paths still overlap in the index tree. The system then creates a new intermediate node as a child of this common ancestor and moves the child nodes of C and D, which were originally attached directly to the common ancestor, under the new intermediate node. It then updates the entries in the hash table that originally pointed to C and D so that their node pointers reference the new intermediate node. Finally, the system atomically writes the modified index structure to persistent storage, overwriting the old version, which yields the optimized index version.
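The restructuring in [0051] amounts to finding the deepest common node of two root-to-node paths and inserting a new node above the two diverging branches. This minimal sketch uses a hypothetical `IndexNode` class and `regroup` function, assumes both nodes are in the same tree and neither is an ancestor of the other, and omits the hash table pointer update and the atomic persistence step.

```python
class IndexNode:
    """Node of a simplified index tree (hypothetical structure)."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = {}

    def add_child(self, label):
        child = IndexNode(label, self)
        self.children[label] = child
        return child


def path_from_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


def regroup(node_a, node_b, label="intermediate"):
    """Create an intermediate node under the deepest common ancestor of
    node_a and node_b and move the two branches leading to them beneath it.
    Assumes neither node is an ancestor of the other."""
    path_a, path_b = path_from_root(node_a), path_from_root(node_b)
    depth = 0
    while (depth < min(len(path_a), len(path_b))
           and path_a[depth] is path_b[depth]):
        depth += 1
    ancestor = path_a[depth - 1]
    middle = ancestor.add_child(label)
    for branch in (path_a[depth], path_b[depth]):
        del ancestor.children[branch.label]
        branch.parent = middle
        middle.children[branch.label] = branch
    return middle


root = IndexNode("root")
subnet = root.add_child("192.168.1.0")
port_80 = subnet.add_child("port 80")      # leads to rule C
port_3306 = subnet.add_child("port 3306")  # leads to rule D
middle = regroup(port_80, port_3306)
print(middle.parent.label, sorted(middle.children))
# 192.168.1.0 ['port 3306', 'port 80']
```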
[0052] The preset strong dependency threshold is derived from statistics of the dependency strength values of all rule pairs over a past period. Every day at midnight, the system extracts the calculation results for all rule pairs from the rule association set and aggregates these dependency strength values into a dataset. After 30 consecutive days of accumulation, the system analyzes this dataset and finds that the dependency strength values follow a normal distribution, with a mean of 0.52 and a standard deviation of 0.09. In a normal distribution, about 95% of values lie within two standard deviations of the mean, so the mean plus twice the standard deviation exceeds roughly 97.7% of the data. Following this principle, the system adds twice the standard deviation (0.18) to the mean of 0.52, giving a threshold of 0.70.
[0053] The storage layout of the preset index structure is determined by two configuration parameters: key-value order and branch factor. The key-value order specifies the order in which multi-dimensional features are hashed during index construction; for example, the source address range is hashed first, followed by the target port range. The branch factor determines the maximum number of child nodes each node in the hash tree can have. The system administrator sets these two parameters according to the actual characteristics of the network environment when the index structure is first created. Once set, they are saved in the configuration file, and the index structure organizes the rules layer by layer into node locations according to them.
[0054] The preset repetition threshold is derived from the repeated-access ratios of all rule pairs in simulated queries over a past period. Every day, the system extracts all rule pairs from the rule association set, executes a simulated query for each pair on the current index structure, records the total number of accessed nodes and the number of repeatedly accessed nodes for each query, calculates the repeated-access ratio, and aggregates these ratios into a dataset. After 30 consecutive days of accumulation, the system analyzes this dataset and finds that the repeated-access ratios follow a normal distribution, with a mean of 0.28 and a standard deviation of 0.06. In a normal distribution, about 95% of values lie within two standard deviations of the mean, so the mean plus twice the standard deviation exceeds roughly 97.7% of the data. Following this principle, the system adds twice the standard deviation (0.12) to the mean of 0.28, giving a threshold of 0.40.
[0055] In step S19, based on the optimized index version, a hash mapping is generated to obtain a key-value pair sequence, and a decision instruction is generated from the key-value pair sequence to obtain precise control instructions, including: Based on the preset hash function and the optimized index version, the network data packets are calibrated to obtain a hash mapping set; The payload content is extracted from the network data packets. Based on the hash mapping set, the payload content is hashed to obtain a hash value and a target port value. The hash value is used as the key and the target port value as the value, and the two are combined to obtain a key-value pair sequence. Based on a preset decision tree, node matching is performed on the key-value pair sequence to determine the successfully matched nodes. Decision paths are then traced back from the successfully matched nodes to obtain a path dataset. Based on the network data packets and the rule identifiers in the path dataset, the data is filled into a preset instruction template library to generate precise control instructions.
[0056] It is worth noting that the message length, protocol type, source IP address, and destination port of network data packets are used as traffic characteristics. The default hash function is based on a linear congruential generator: the new seed value equals the current seed value multiplied by a fixed multiplier plus an increment, taken modulo a large modulus. The system reads the hash function seed value stored in the optimized index version, which is initially set to 10, and extracts the message length of 1500 bytes and the protocol type TCP from subsequent data packets. It first converts the protocol type TCP to the fixed value 6, divides the message length 1500 by 100 to get 15, and adds the two values to get 21, which replaces the original increment. It then performs the hash function calculation: the current seed value 10 is multiplied by the multiplier 1664525, the new increment 21 is added, and the remainder modulo 4294967296 (2^32) is taken, giving the new seed value 16645271.
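One seed update from [0056] as a minimal Python sketch. The function name is hypothetical, the protocol table holds only the TCP code given in the text, and integer division of the message length by 100 is an assumption (it is exact for 1500).

```python
MULTIPLIER = 1664525
MODULUS = 2 ** 32             # 4294967296
PROTOCOL_CODES = {"TCP": 6}   # only the TCP code is given in the text


def next_seed(seed, message_length, protocol):
    """One linear congruential step whose increment is derived from the
    packet: protocol code + message length / 100 (integer division assumed)."""
    increment = PROTOCOL_CODES[protocol] + message_length // 100
    return (seed * MULTIPLIER + increment) % MODULUS


print(next_seed(10, 1500, "TCP"))  # 16645271
```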
[0057] The process of generating the complete hash map set from the new seed value is as follows. All rules in the current rule index structure are traversed; each rule contains features in multiple dimensions, including source address range, target port, and application protocol. The system uses the new seed value as the initial input to the hash function. The rule's source address range is converted to integer form (for example, 192.168.1.0 becomes 3232235776), XORed with the current hash value, and the result is multiplied to obtain an intermediate hash value. The rule's target port value (80) is then XORed into the intermediate hash value and the result multiplied again to obtain the final hash result. This result is taken modulo the total number of hash buckets, which is set to 1024 when the index structure is created, to determine the hash bucket in which the rule is placed. The system repeats this calculation for each rule, assigning each to its hash bucket to form the complete hash map set.
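A minimal sketch of the XOR-and-multiply bucket assignment in [0057]. The text does not name the multiplier, so the 32-bit FNV prime (0x01000193) is borrowed here purely as a plausible odd constant; truncating intermediate values to 32 bits is also an assumption, and the function names and rule entries are hypothetical.

```python
import ipaddress

MASK32 = 0xFFFFFFFF
# Hypothetical mixing multiplier (the 32-bit FNV prime); the text does not name one.
MIX_MULTIPLIER = 0x01000193
BUCKET_COUNT = 1024  # set when the index structure is created, per [0057]


def rule_bucket(seed, source_network, target_port, multiplier=MIX_MULTIPLIER):
    """XOR each feature into the running hash, multiply, and keep 32 bits."""
    h = seed
    h = ((h ^ int(ipaddress.IPv4Address(source_network))) * multiplier) & MASK32
    h = ((h ^ target_port) * multiplier) & MASK32
    return h % BUCKET_COUNT


def build_hash_map(seed, rules):
    """rules maps rule_id -> (source network address, target port)."""
    buckets = {}
    for rule_id, (network, port) in rules.items():
        buckets.setdefault(rule_bucket(seed, network, port), []).append(rule_id)
    return buckets


print(int(ipaddress.IPv4Address("192.168.1.0")))  # 3232235776
hash_map = build_hash_map(16645271, {"R1": ("192.168.1.0", 80), "R2": ("10.0.0.0", 443)})
```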
[0058] Based on the hash mapping set, a preset hash algorithm is executed on the payload content of subsequent data packets to generate a key-value pair sequence. The system extracts the HTTP request header from the data packet as the payload content, divides the payload content into multiple data blocks of fixed length, and calculates the hash value for each data block sequentially. The hash calculation uses a cyclic redundancy check (CRC) algorithm, updating the check value for each input byte, and finally outputting a 32-bit hash result. The system uses this hash result as the key and the corresponding target port value as the value to form a key-value pair, for example, the key is "a3f2c1b4" and the value is "80". Multiple such key-value pairs are collected together to form a key-value pair sequence. The hash algorithm calculation process consists of three steps. First, the input payload content is divided into multiple data blocks of fixed length, each data block being 16 bytes long. Then, a 32-bit hash value of 0 is initialized. Each data block is read sequentially, the current hash value is left-shifted by 5 bits and XORed with the value of the current data block, and then the result is added to the current hash value to obtain the updated hash value.
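The three-step shift-XOR-add procedure at the end of [0058] can be sketched as follows. The paragraph also names a CRC; if that variant is intended, `zlib.crc32` from the Python standard library could replace the inner update. Reading each 16-byte block as a big-endian integer and zero-padding the final block are assumptions, and the function names are hypothetical.

```python
BLOCK_SIZE = 16
MASK32 = 0xFFFFFFFF


def payload_hash(payload: bytes) -> int:
    """Three-step hash from [0058]: split into 16-byte blocks, start from 0,
    then for each block h = ((h << 5) XOR block) + h, kept to 32 bits.
    Assumptions: blocks are read as big-endian integers and the last block
    is zero-padded to 16 bytes."""
    h = 0
    for start in range(0, len(payload), BLOCK_SIZE):
        block = payload[start:start + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\x00")
        value = int.from_bytes(block, "big")
        h = (((h << 5) ^ value) + h) & MASK32
    return h


def key_value_pair(payload: bytes, target_port: int):
    """Key: the hash as 8 hex digits; value: the target port as text."""
    return f"{payload_hash(payload):08x}", str(target_port)


print(key_value_pair((b"\x00" * 15 + b"\x01") * 2, 80))  # ('00000022', '80')
```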
[0059] Based on the matching degree between the key-value pair sequence and the preset decision tree nodes, the system extracts decision path information to obtain a path dataset. The system compares each key-value pair in the sequence with the conditions stored on the decision tree nodes, which include the source IP address range, the target port range, and the protocol type. It checks whether the values in each key-value pair fall within the ranges defined by the node conditions, counts the successfully matched nodes, and divides this count by the total number of nodes to obtain the matching degree, for example 0.85. When the matching degree exceeds the preset matching threshold of 0.8, the system starts from the successfully matched node, backtracks up the decision tree to the root node, and collects the condition recorded on each node along the way. For example, the first-level condition is that the protocol type is TCP, the second-level condition is that the source IP address lies in the range 192.168.0.0 to 192.168.255.255, and the third-level condition is that the target port is 80. The system packages these conditions together with the corresponding rule identifier R3 into the path dataset.
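The matching-degree test and upward backtracking in [0059] might look like the following sketch. It assumes the key-value pairs have already been resolved into a feature dictionary that node predicates can test, and that backtracking starts from the first matched leaf; the Node type and extract_path are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    label: str                         # condition text recorded on the node
    test: Callable[[dict], bool]       # hypothetical predicate over packet features
    parent: Optional["Node"] = None
    rule_id: Optional[str] = None      # set only on leaf nodes


def extract_path(nodes, features, threshold=0.8):
    """Return (matching degree, path dataset or None)."""
    matched = [n for n in nodes if n.test(features)]
    degree = len(matched) / len(nodes)          # matched nodes / total nodes
    if degree <= threshold:                     # the degree must exceed the threshold
        return degree, None
    leaves = [n for n in matched if n.rule_id is not None]
    if not leaves:
        return degree, None
    conditions, node = [], leaves[0]            # assumption: first matched leaf
    while node is not None:                     # backtrack up to the root
        conditions.append(node.label)
        node = node.parent
    conditions.reverse()                        # list the first-level condition first
    return degree, {"rule_id": leaves[0].rule_id, "conditions": conditions}


if __name__ == "__main__":
    root = Node("protocol == TCP", lambda f: f["protocol"] == "TCP")
    src = Node("src in 192.168.0.0-192.168.255.255",
               lambda f: ipaddress.ip_address(f["src_ip"]) in ipaddress.ip_network("192.168.0.0/16"),
               parent=root)
    port = Node("target port == 80", lambda f: f["dst_port"] == 80, parent=src, rule_id="R3")
    print(extract_path([root, src, port], {"protocol": "TCP", "src_ip": "192.168.1.10", "dst_port": 80}))
```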
[0060] Next, an access-allowed template is selected from the preset instruction template library and used to convert the path dataset into a precise control instruction. Each template in the library is a fixed-format data structure containing an opcode field, a rule identifier field, a forwarding target field, and a checksum field. The system writes the rule identifier R3 from the path dataset into the rule identifier field, sets the opcode to 0x01 to indicate that access is allowed, writes the next-hop gateway address 192.168.1.254 extracted from the data packet into the forwarding target field, and computes a CRC16 checksum over the first three fields and writes it into the checksum field. The completed instruction is then sent to the underlying forwarding engine as the precise control instruction for the current data packet.
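A possible byte encoding of the instruction described in [0060] is sketched below. The field widths, the mapping from "R3" to the integer 3, and the choice of CRC-16/CCITT-FALSE (the text says only "CRC16") are assumptions made for illustration.

```python
import binascii
import ipaddress
import struct

OPCODE_ALLOW = 0x01  # from the text: 0x01 means access is allowed


def crc16(data: bytes) -> int:
    """Assumption: CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF)."""
    return binascii.crc_hqx(data, 0xFFFF)


def build_instruction(opcode: int, rule_id: str, next_hop: str) -> bytes:
    """Hypothetical layout: opcode (1 B) | rule number (2 B) | IPv4 next hop (4 B) | CRC16 (2 B)."""
    rule_number = int(rule_id.lstrip("R"))  # assumption: identifiers look like "R3"
    body = struct.pack("!BH4s", opcode, rule_number, ipaddress.IPv4Address(next_hop).packed)
    return body + struct.pack("!H", crc16(body))  # checksum covers the first three fields


if __name__ == "__main__":
    print(build_instruction(OPCODE_ALLOW, "R3", "192.168.1.254").hex())
```

Computing the checksum only over the already-packed first three fields lets the receiving engine verify an instruction by recomputing the CRC over its first seven bytes.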
[0061] Once a precise control instruction has been output, the underlying forwarding engine can allow or block the data packet directly according to the instruction's opcode and forwarding target fields, without running the rule matching process again. Because the instruction also carries a rule identifier, subsequent traffic with similar characteristics can be quickly associated with the already-confirmed decision path, avoiding redundant calculations. By continuously outputting such instructions, the system keeps processing latency stable in high-concurrency scenarios, so that each data packet receives a definite processing result within milliseconds.
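The reuse of confirmed decisions in [0061] can be illustrated with a small cache placed in front of the full matching process. The text does not define "similar characteristics"; this sketch assumes the same protocol, the same /16 source prefix, and the same target port, treats any opcode other than 0x01 as "block", and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

OPCODE_ALLOW = 0x01  # from the text; every other opcode is treated as "block" here


@dataclass(frozen=True)
class ControlInstruction:
    opcode: int
    rule_id: str
    next_hop: str


FlowKey = Tuple[str, str, int]


def flow_key(protocol: str, src_ip: str, dst_port: int) -> FlowKey:
    """Assumption: 'similar characteristics' = same protocol, /16 source prefix, and port."""
    prefix = ".".join(src_ip.split(".")[:2])
    return (protocol, prefix, dst_port)


def dispatch(instr: ControlInstruction) -> str:
    """Forwarding-engine action taken directly from the opcode and forwarding target."""
    return f"forward to {instr.next_hop}" if instr.opcode == OPCODE_ALLOW else "drop"


def handle_packet(cache: Dict[FlowKey, ControlInstruction], protocol: str, src_ip: str,
                  dst_port: int, full_match: Callable[..., ControlInstruction]) -> str:
    key = flow_key(protocol, src_ip, dst_port)
    instr = cache.get(key)
    if instr is None:                      # slow path: run the full rule matching once
        instr = full_match(protocol, src_ip, dst_port)
        cache[key] = instr                 # later similar packets reuse this decision
    return dispatch(instr)


if __name__ == "__main__":
    cache: Dict[FlowKey, ControlInstruction] = {}
    match = lambda p, s, d: ControlInstruction(OPCODE_ALLOW, "R3", "192.168.1.254")
    print(handle_packet(cache, "TCP", "192.168.1.10", 80, match))
    print(handle_packet(cache, "TCP", "192.168.7.20", 80, match))  # served from the cache
```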
[0062] The preset matching threshold is derived from the matching degree values produced when key-value pair sequences were matched against the decision tree nodes over a past period. Each day, the system extracts the results of every matching operation from the processing logs, namely the number of successfully matched nodes and the total number of nodes, computes the matching degree values, and aggregates them into a dataset. After 30 consecutive days of accumulation, statistical analysis of this dataset shows that the matching degree values follow a normal distribution with a mean of 0.65 and a standard deviation of 0.075. In a normal distribution, about 95% of values lie within two standard deviations of the mean, so only about 2.3% of values exceed the mean plus two standard deviations. Following this principle, the system adds twice the standard deviation (0.15) to the mean (0.65), giving a threshold of 0.80.
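The threshold derivation in [0062] reduces to a mean-plus-two-standard-deviations calculation, sketched below. The text does not say whether the population or sample standard deviation is used; this sketch assumes the population form, and the function names are hypothetical.

```python
import statistics


def matching_degrees(log_records):
    """Each record is (matched_nodes, total_nodes) taken from the processing log."""
    return [matched / total for matched, total in log_records]


def matching_threshold(degrees, k=2.0):
    """Threshold = mean + k * standard deviation (k = 2 in the text)."""
    mu = statistics.fmean(degrees)
    sigma = statistics.pstdev(degrees)  # assumption: population standard deviation
    return mu + k * sigma


def share_above(k=2.0):
    """Fraction of a normal distribution lying above mean + k * sigma."""
    return 1.0 - statistics.NormalDist().cdf(k)


if __name__ == "__main__":
    degrees = matching_degrees([(23, 40), (29, 40)])
    print(round(matching_threshold(degrees), 4))
    print(round(share_above(), 4))
```

With log records (23, 40) and (29, 40), the degrees are 0.575 and 0.725; their mean is 0.65 and their population standard deviation is 0.075, so the threshold comes out at 0.80, the value in the text.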
[0063] The preset decision tree is constructed during system initialization from all rule conditions in the rule set. The system traverses each rule, extracts condition fields such as the source address range, target port range, and protocol type, and organizes these conditions into a tree structure according to their logical relationships. The root node represents the broadest condition, the leaf nodes hold specific rule identifiers, and the intermediate nodes represent progressively more refined combinations of conditions.
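The construction in [0063] can be sketched as a nested mapping. The level order (protocol, then source range, then port range) is an assumption chosen to match the example path in [0059]; the outermost mapping plays the role of the root, and the function names are hypothetical.

```python
from collections import defaultdict


def build_decision_tree(rules):
    """rules: iterable of (rule_id, protocol, source_range, port_range).

    Assumed levels: protocol -> source range -> port range -> list of rule identifiers.
    """
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for rule_id, protocol, source_range, port_range in rules:
        tree[protocol][source_range][port_range].append(rule_id)
    return _to_plain(tree)


def _to_plain(node):
    """Convert nested defaultdicts to plain dicts so later lookups cannot create nodes."""
    if isinstance(node, dict):
        return {key: _to_plain(child) for key, child in node.items()}
    return node


if __name__ == "__main__":
    tree = build_decision_tree([
        ("R3", "TCP", "192.168.0.0/16", (80, 80)),
        ("R4", "TCP", "192.168.0.0/16", (443, 443)),
        ("R7", "UDP", "10.0.0.0/8", (53, 53)),
    ])
    print(tree)
```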
[0064] The preset instruction template library is a set of fixed-format data structures that the administrator predefines during system deployment according to the interface specification of the underlying forwarding engine. Each template contains an opcode field, a rule identifier field, a forwarding target field, and a checksum (verification) field. The opcode field defines two handling types: allow access and block access. The forwarding target field specifies the next-hop address of the data packet. The administrator saves these templates in a configuration file, from which the system loads them at runtime.
[0065] In summary, this invention discloses a network security server control method integrating firewall functionality, comprising: acquiring network data packets and extracting a feature set; filtering a preliminary matching rule subset based on the feature set; performing applicability analysis on the rule subset to obtain a rule applicability matrix and refining it to obtain a refined rule group; constructing a hash table and scoring the refined rule group to obtain a matching degree score for each rule; converting the matching degree scores into permission identifiers and verifying them to obtain access permission signals; generating associated events from the access permission signals and calculating rule dependency strength; selecting an optimization candidate set, evaluating its adjustment requirements, and reconstructing the index once a requirement is determined to obtain an optimized index version; and generating a key-value pair sequence from the optimized index version and making decisions on it to obtain precise control instructions. By producing precise control instructions in this way, the invention addresses the excessively high matching latency that arises when the number of rules to be matched is large.
[0066] Referring to Figure 2, the second embodiment of the present invention provides a network security server control system with integrated firewall functionality, comprising: a data acquisition module, configured to acquire network data packets; a feature processing module, configured to perform feature extraction on the network data packets to obtain a network feature set; a rule filtering module, configured to perform rule filtering based on the network feature set to obtain a preliminary matching rule subset; a refinement module, configured to perform applicability analysis on the preliminary matching rule subset to obtain a rule applicability matrix, and to refine the rule applicability matrix to obtain a refined rule group; a matching score module, configured to construct a hash table from the refined rule group and to perform matching scoring on the hash table to obtain the matching degree score of each rule; an access permission module, configured to convert the matching degree score into a permission identifier and to verify the permission identifier to obtain an access permission signal; a dependency strength module, configured to generate associated events based on the access permission signal and the network data packets, and to calculate the dependency strength of the associated events to obtain the rule dependency strength; an indexing module, configured to select an optimization candidate set based on the rule dependency strength, evaluate the adjustment requirements of the optimization candidate set, and, once an adjustment requirement is determined, reconstruct the index of the optimization candidate set to obtain an optimized index version; and a control instruction module, configured to generate a hash mapping based on the optimized index version to obtain a key-value pair sequence, and to generate decision instructions from the key-value pair sequence to obtain precise control instructions.
[0067] It should be noted that the network security server control system with integrated firewall functionality provided in this embodiment is used to execute all the process steps of the network security server control method with integrated firewall functionality in the above embodiment. Since the working principles and beneficial effects of the two correspond one-to-one, they are not repeated here.
[0068] It should be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Furthermore, in the accompanying drawings of the device embodiments provided by this invention, the connection relationships between modules indicate that they have communication connections, which can be specifically implemented as one or more communication buses or signal lines. Those skilled in the art can understand and implement this without any creative effort.
[0069] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above descriptions are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, improvement, or the like made by those skilled in the art within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims
1. A network security server control method integrating firewall functionality, characterized by comprising: acquiring network data packets; performing feature extraction on the network data packets to obtain a network feature set; performing rule filtering based on the network feature set to obtain a preliminary matching rule subset; performing applicability analysis on the preliminary matching rule subset to obtain a rule applicability matrix, and refining the rule applicability matrix to obtain a refined rule group; constructing a hash mapping based on the refined rule group to obtain a hash table, and performing matching scoring on the hash table to obtain the matching degree score of each rule; performing permission identifier conversion based on the matching degree scores to obtain permission identifiers, and performing data verification on the permission identifiers to obtain access permission signals; generating associated events based on the access permission signals and the network data packets, and calculating the dependency strength of the associated events to obtain the rule dependency strength; selecting an optimization candidate set based on the rule dependency strength, and evaluating the adjustment requirements of the optimization candidate set; after an adjustment requirement is determined, reconstructing the index of the optimization candidate set to obtain an optimized index version; and generating a hash mapping based on the optimized index version to obtain a key-value pair sequence, and generating a decision instruction from the key-value pair sequence to obtain a precise control instruction.
2. The network security server control method with integrated firewall functionality according to claim 1, characterized in that performing feature extraction on the network data packets to obtain a network feature set comprises: parsing the network data packets to obtain first intermediate data; merging the communication records from the same source in the first intermediate data to obtain second intermediate data; and consolidating and organizing the second intermediate data to obtain the network feature set.
3. The network security server control method with integrated firewall functionality according to claim 1, characterized in that performing rule filtering based on the network feature set to obtain a preliminary matching rule subset comprises: organizing the network feature set into feature records according to a preset field order, and performing index calculation on the feature records to obtain a first candidate rule set; checking the first candidate rule set against the network feature set for consistency and deleting the candidate rules that fail the consistency requirement to obtain a second candidate rule set; and comparing the second candidate rule set with the network feature set and retaining the candidate rules that meet the conditions to obtain the preliminary matching rule subset.
4. The network security server control method with integrated firewall functionality according to claim 1, characterized in that performing applicability analysis on the preliminary matching rule subset to obtain a rule applicability matrix, and refining the rule applicability matrix to obtain a refined rule group, comprises: if the number of rules in the preliminary matching rule subset exceeds a preset rule threshold, arranging the rule subset hierarchically according to the sequential dependency of its judgment conditions to obtain a rule hierarchy sequence; extracting, based on the rule hierarchy sequence, applicability indicators for each rule in the preliminary matching rule subset to obtain the rule applicability matrix, wherein the applicability indicators include a hit frequency and a resource consumption weight, the hit frequency being the ratio of the number of times the corresponding rule is triggered within a preset statistical time window to the total number of times all candidate rules are triggered, and the resource consumption weight being the sum of the normalized average processing time, number of comparison fields, and number of bytes read during execution of the corresponding rule; if the rule applicability matrix contains rules whose resource consumption weight exceeds a preset resource threshold, grouping the rule applicability matrix based on the rule hierarchy sequence to obtain rule clusters; and determining the value range of each rule retention condition based on the hit frequency, the resource consumption weight, and the concentration of the rules' positions in the rule hierarchy sequence within each rule cluster, generating a filtering boundary, and filtering the rule clusters against the filtering boundary to obtain the refined rule group.
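The applicability indicators of claim 4 can be sketched in Python as follows. The min-max normalization, the grouping of rule clusters by hierarchy level, and all names are assumptions; the filtering boundary is not specified in enough detail to sketch and is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RuleStats:
    rule_id: str
    level: int            # position in the rule hierarchy sequence
    triggers: int         # triggers within the statistical time window
    avg_time: float       # average processing time
    compare_fields: int   # number of comparison fields
    bytes_read: int       # bytes read during execution


def min_max(values):
    """Assumption: 'normalized' means min-max scaling to [0, 1]."""
    lo, hi = min(values), max(values)
    return [0.0] * len(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]


def applicability_matrix(stats):
    """rule_id -> (hit frequency, resource consumption weight)."""
    total = sum(s.triggers for s in stats)
    t = min_max([s.avg_time for s in stats])
    f = min_max([s.compare_fields for s in stats])
    b = min_max([s.bytes_read for s in stats])
    return {s.rule_id: (s.triggers / total if total else 0.0, t[i] + f[i] + b[i])
            for i, s in enumerate(stats)}


def rule_clusters(stats, matrix, resource_threshold):
    """Group rules by hierarchy level, but only if some weight exceeds the threshold."""
    if not any(weight > resource_threshold for _, weight in matrix.values()):
        return {}
    clusters = defaultdict(list)
    for s in stats:
        clusters[s.level].append(s.rule_id)
    return dict(clusters)


if __name__ == "__main__":
    stats = [RuleStats("R1", 1, 60, 2.0, 3, 40), RuleStats("R2", 2, 30, 4.0, 5, 120),
             RuleStats("R3", 2, 10, 1.0, 2, 20)]
    matrix = applicability_matrix(stats)
    print(matrix)
    print(rule_clusters(stats, matrix, resource_threshold=1.5))
```

Each weight lies between 0 and 3 because it sums three values scaled to [0, 1].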
5. The network security server control method with integrated firewall functionality according to claim 1, characterized in that constructing a hash mapping based on the refined rule group to obtain a hash table, and performing matching scoring on the hash table to obtain the matching degree score of each rule, comprises: reading the field order of the refined rule group, merging rules with the same field order to obtain a rule hierarchical structure, calculating the identifier value of each layer's field in the hierarchical structure, and writing the mapping between the identifier values and the rule identifiers of the rule hierarchical structure into a hash table; counting, based on the hash table, the number of hit fragments for each rule in the refined rule group, and dividing it by the total number of fragments of that rule to obtain an initial similarity value; and linearly normalizing the initial similarity values to obtain the matching degree score of each rule.
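The hash-table scoring of claim 5 can be sketched as follows. It assumes a single fixed field order, CRC-32 of "layer=value" as the identifier value, and min-max scaling as the linear normalization; all names are hypothetical.

```python
import zlib
from collections import defaultdict

FIELD_ORDER = ("protocol", "src_range", "dst_port")  # hypothetical layer order


def fragment_id(layer, value) -> int:
    """Assumption: identifier value = CRC-32 of 'layer=value'."""
    return zlib.crc32(f"{layer}={value}".encode())


def build_hash_table(rules):
    """rules: rule_id -> {layer: value}. Returns (identifier -> rule ids, fragments per rule)."""
    table, totals = defaultdict(set), {}
    for rule_id, fields in rules.items():
        layers = [layer for layer in FIELD_ORDER if layer in fields]
        totals[rule_id] = len(layers)
        for layer in layers:
            table[fragment_id(layer, fields[layer])].add(rule_id)
    return dict(table), totals


def match_scores(table, totals, packet_fields):
    """Hit fragments / total fragments per rule, then min-max normalized."""
    hits = dict.fromkeys(totals, 0)
    for layer in FIELD_ORDER:
        if layer in packet_fields:
            for rule_id in table.get(fragment_id(layer, packet_fields[layer]), ()):
                hits[rule_id] += 1
    raw = {r: (hits[r] / totals[r] if totals[r] else 0.0) for r in totals}
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {r: (1.0 if hi > 0 else 0.0) for r in raw}
    return {r: (v - lo) / (hi - lo) for r, v in raw.items()}


if __name__ == "__main__":
    rules = {
        "R1": {"protocol": "TCP", "src_range": "192.168.0.0/16", "dst_port": 80},
        "R2": {"protocol": "TCP", "src_range": "10.0.0.0/8", "dst_port": 443},
        "R3": {"protocol": "UDP", "src_range": "0.0.0.0/0", "dst_port": 53},
    }
    table, totals = build_hash_table(rules)
    print(match_scores(table, totals, {"protocol": "TCP", "src_range": "192.168.0.0/16", "dst_port": 80}))
```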
6. The network security server control method with integrated firewall functionality according to claim 1, characterized in that performing permission identifier conversion based on the matching degree score to obtain a permission identifier, and performing data verification on the permission identifier to obtain an access permission signal, comprises: selecting the refined rule group with the highest matching degree score as the rule group to be processed and parsing its access indication information; if the access indication information indicates that access is allowed, extracting the control instruction of the rule group to be processed and converting it into a permission identifier using a preconfigured action mapping table; concatenating the permission identifier with the network data packet to obtain a complete data structure; and performing verification on the complete data structure to obtain a verification field, and appending the verification field to the complete data structure to obtain the access permission signal.
7. The network security server control method with integrated firewall functionality according to claim 1, characterized in that generating associated events based on the access permission signal and the network data packets, and calculating the dependency strength of the associated events to obtain the rule dependency strength, comprises: extracting the permission identifier from the access permission signal and merging it with the network data packet to obtain the associated event; counting the number of times each pair of rules co-occurs in the associated events, and dividing this co-occurrence count by the total number of times the rules of the associated events are triggered to obtain a co-occurrence frequency; and multiplying the co-occurrence frequency by a preset scaling factor to obtain the rule dependency strength.
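The dependency strength of claim 7 can be sketched as follows. The claim does not pin down the denominator; this sketch assumes it is the sum of the two rules' trigger counts, represents each associated event simply as the list of rule identifiers it triggered, and uses a hypothetical scaling factor of 100.

```python
from collections import Counter
from itertools import combinations


def dependency_strength(events, scale=100.0):
    """events: iterable of rule-id lists, one per associated event.

    Returns {(rule_a, rule_b): strength} for every pair that co-occurs at least once.
    """
    triggers, co_occurrences = Counter(), Counter()
    for rules in events:
        unique = sorted(set(rules))                 # count each rule once per event
        triggers.update(unique)
        co_occurrences.update(combinations(unique, 2))
    # Assumption: denominator = trigger count of rule a + trigger count of rule b.
    return {(a, b): scale * n / (triggers[a] + triggers[b])
            for (a, b), n in co_occurrences.items()}


if __name__ == "__main__":
    events = [["R1", "R2"], ["R1", "R2"], ["R1"], ["R3"]]
    print(dependency_strength(events))
```

Here R1 fires three times and R2 twice, and they co-occur twice, so the sketch assigns the pair a strength of 100 × 2 / 5 = 40.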
8. The network security server control method with integrated firewall functionality according to claim 1, characterized in that selecting an optimization candidate set based on the rule dependency strength, evaluating the adjustment requirements of the optimization candidate set, and, after an adjustment requirement is determined, restructuring the optimization candidate set to obtain an optimized index version, comprises: including in the optimization candidate set the rule pairs whose rule dependency strength exceeds a preset strong-dependency threshold; simulating, layer by layer according to the storage layout of the preset index structure, the number of nodes accessed when querying each rule in the optimization candidate set and the number of nodes accessed repeatedly by different rules, and calculating the repeated access ratio; determining that an adjustment requirement exists when the repeated access ratio exceeds a preset repetition threshold; and, once the adjustment requirement is determined, selecting a set of rules from the optimization candidate set, locating their common ancestor node in the storage layout of the preset index structure, creating an intermediate node under the common ancestor node, and migrating the child nodes of that set of rules to the intermediate node to obtain the optimized index version.
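The repeated-access check and restructuring of claim 8 can be sketched on an index stored as a child-to-parent map. Counting repeated accesses as total node visits minus distinct nodes visited, naming the new node "mid", and treating the selected rules as distinct leaves are assumptions, as are the function names.

```python
def path_to_root(parent, node):
    """Nodes visited when querying `node`, root first (parent maps child -> parent)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path[::-1]


def repeated_access_ratio(parent, rules):
    """Assumption: repeated accesses = total node visits minus distinct nodes visited."""
    paths = [path_to_root(parent, r) for r in rules]
    total = sum(len(p) for p in paths)
    distinct = len(set().union(*paths))
    return (total - distinct) / total


def common_ancestor(parent, rules):
    """Deepest node shared by every rule's query path."""
    lca = None
    for level in zip(*(path_to_root(parent, r) for r in rules)):
        if any(n != level[0] for n in level):
            break
        lca = level[0]
    return lca


def restructure(parent, rules, threshold, new_node="mid"):
    """If the ratio exceeds the threshold, insert an intermediate node under the common
    ancestor and migrate each rule's branch beneath it (rules assumed distinct leaves)."""
    if repeated_access_ratio(parent, rules) <= threshold:
        return parent
    lca = common_ancestor(parent, rules)
    new_parent = dict(parent)
    new_parent[new_node] = lca
    for r in rules:
        path = path_to_root(parent, r)
        new_parent[path[path.index(lca) + 1]] = new_node
    return new_parent


if __name__ == "__main__":
    index = {"n1": "root", "n2": "n1", "R1": "n2", "R2": "n2", "n3": "root", "R3": "n3"}
    print(repeated_access_ratio(index, ["R1", "R2"]))
    print(restructure(index, ["R1", "R2"], threshold=0.3))
```

In the example index, querying R1 and R2 visits eight nodes but only five distinct ones, so three of the eight visits are repeats.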
9. The network security server control method with integrated firewall functionality according to claim 1, characterized in that generating a hash mapping based on the optimized index version to obtain a key-value pair sequence, and generating decision instructions from the key-value pair sequence to obtain precise control instructions, comprises: calibrating the network data packets based on a preset hash function and the optimized index version to obtain a hash mapping set; extracting the payload content from the network data packets, hashing the payload content based on the hash mapping set to obtain a hash value and a target port value, and combining the hash value as the key with the target port value as the value to obtain the key-value pair sequence; performing node matching on the key-value pair sequence against a preset decision tree to determine the successfully matched nodes, and backtracking from the successfully matched nodes to extract decision paths and obtain a path dataset; and filling data from the network data packets and the rule identifiers in the path dataset into a preset instruction template library to generate the precise control instructions.
10. A network security server control system with integrated firewall functionality, characterized by comprising: a data acquisition module, configured to acquire network data packets; a feature processing module, configured to perform feature extraction on the network data packets to obtain a network feature set; a rule filtering module, configured to perform rule filtering based on the network feature set to obtain a preliminary matching rule subset; a refinement module, configured to perform applicability analysis on the preliminary matching rule subset to obtain a rule applicability matrix, and to refine the rule applicability matrix to obtain a refined rule group; a matching score module, configured to construct a hash table from the refined rule group and to perform matching scoring on the hash table to obtain the matching degree score of each rule; an access permission module, configured to convert the matching degree score into a permission identifier and to verify the permission identifier to obtain an access permission signal; a dependency strength module, configured to generate associated events based on the access permission signal and the network data packets, and to calculate the dependency strength of the associated events to obtain the rule dependency strength; an indexing module, configured to select an optimization candidate set based on the rule dependency strength, evaluate the adjustment requirements of the optimization candidate set, and, once an adjustment requirement is determined, reconstruct the index of the optimization candidate set to obtain an optimized index version; and a control instruction module, configured to generate a hash mapping based on the optimized index version to obtain a key-value pair sequence, and to generate decision instructions from the key-value pair sequence to obtain precise control instructions.