Gateway port on-off control method, device and medium

CN121664549B · Active · Publication Date: 2026-06-19 · LINGBO TECH (BEIJING) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: LINGBO TECH (BEIJING) CO LTD
Filing Date: 2025-12-19
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, the control methods for gateway ports cannot balance business flexibility and security. Static policies can easily lead to overly lenient or overly strict security policies, which cannot adapt to dynamic network threats.

Method used

A pre-trained long short-term memory network model is used for time series analysis. Combined with a deep reinforcement learning agent, port exposure rules are dynamically generated based on historical access logs and threat intelligence data. The on/off control commands are determined according to real-time traffic feature vectors to achieve dynamic and precise port management.

Benefits of technology

It improves the time accuracy of port exposure, reduces the security risks of statically exposed ports, can proactively respond to sudden threats, and enhances the overall security protection level of the network.

✦ Generated by Eureka AI based on patent content.

Abstract

This application provides a method, device, and medium for controlling the accessibility of a gateway port, including: acquiring historical access log data and threat intelligence data of a target port in the gateway; performing time-series analysis based on the historical access log data and threat intelligence data using a pre-trained long short-term memory network model to obtain a predicted secure access period for the target port within a preset future time period; generating preset rules for temporary port exposure of the target port based on the predicted secure access period; determining the traffic feature vector of real-time traffic data packets of the target port; using a preset deep reinforcement learning agent, determining accessibility control instructions for the target port based on the traffic feature vector and the preset rules for temporary port exposure; and executing the accessibility control instructions to modify the access control list status of the target port by the gateway firewall. This application can achieve dynamic and fine-grained gateway port accessibility control, thereby balancing business flexibility and security.

Description

Technical Field

[0001] This application relates to the field of gateway port technology, specifically to a method, device, and medium for controlling the on / off state of a gateway port.

Background Technology

[0002] As the essential gateway for interaction between internal and external networks, the openness of gateway ports directly affects the security of core systems. Therefore, minimizing the visibility of ports to the public network while ensuring normal access for compliant services—that is, achieving "stealth" protection for ports—has become a key requirement in network perimeter defense systems.

[0003] In related technologies, gateway port access is mainly controlled using static access control lists (ACLs) or fixed-period policy management mechanisms. However, both control methods are static, which can easily lead to port security policies that are either too lenient or too strict, failing to balance business flexibility and security.

Summary of the Invention

[0004] The embodiments of this application provide a method, device, and medium for controlling the on / off state of a gateway port, aiming to achieve dynamic and fine-grained control of the on / off state of the gateway port, thereby balancing business flexibility and security.

[0005] In a first aspect, embodiments of this application provide a method for controlling the on / off state of a gateway port, the method being applied to a gateway and comprising:

[0006] Obtain historical access log data and threat intelligence data of the target port in the gateway;

[0007] Based on the historical access log data and the threat intelligence data, a pre-trained long short-term memory network model is used to perform time series analysis to obtain the predicted secure access period of the target port within a preset future time period;

[0008] Based on the predicted secure access period, generate a temporary port exposure preset rule for the target port;

[0009] Determine the traffic feature vector of the real-time traffic data packets of the target port;

[0010] Using a preset deep reinforcement learning agent, based on the traffic feature vector and the temporary port exposure preset rule, determine the on / off control command for the target port;

[0011] Execute the on / off control command to modify the access control list status of the gateway firewall for the target port.

[0012] In the above embodiments, by fusing the forward-looking security time-period prediction of a Long Short-Term Memory (LSTM) network with the real-time adaptive decision-making of a deep reinforcement learning agent, dynamic, precise, and automated management of gateway port access control is achieved. This solution not only improves the time accuracy of port exposure and reduces the security risks associated with static openness, but also proactively responds to sudden threats based on real-time traffic characteristics, thereby enhancing the overall security level of the network while ensuring necessary business connectivity.

[0013] In one embodiment, the step of performing time-series analysis using a pre-trained Long Short-Term Memory (LSTM) network model based on the historical access log data and the threat intelligence data to obtain the predicted secure access period for the target port within a preset future time period includes:

[0014] Based on the historical access log data, a historical access frequency sequence is determined with a preset time granularity.

[0015] Based on the malicious IP information recorded in the threat intelligence data, determine the risk coefficient value for each preset time granularity in the historical access frequency sequence;

[0016] Based on the risk coefficient value, the historical access frequency sequence is corrected to obtain the secure access frequency sequence;

[0017] Based on the secure access frequency sequence, a pre-trained long short-term memory network model is used for time series analysis to obtain the predicted secure access period of the target port within a preset time period in the future.

[0018] In the above embodiments, by combining historical access frequency with real-time threat intelligence, the original data is corrected for risk perception, and then a deep learning model is used for prediction. This not only improves the accuracy of port security status prediction, but also realizes the transformation from passive response to proactive prediction and defense, making the opening of network resources more intelligent and secure.

[0019] In one embodiment, the step of performing time-series analysis using a pre-trained long short-term memory network model based on the secure access frequency sequence to obtain the predicted secure access period for the target port within a preset future time period includes:

[0020] Determine the distribution of TCP control flag bits in the historical access log data within each preset time granularity;

[0021] Based on the TCP control flag distribution data, a sequence of historical TCP flag entropy values is generated with the preset time granularity as the unit.

[0022] Based on the historical TCP flag entropy sequence and the secure access frequency sequence, a pre-trained long short-term memory network model is used for time series analysis to obtain the predicted secure access period of the target port within a preset time period in the future.

[0023] In the above embodiments, by introducing the TCP flag bit entropy value sequence as a new dimension of time series analysis, the protocol layer behavior characteristics of network traffic are quantified into information entropy and fused with the risk-corrected access frequency sequence for analysis, thereby more accurately predicting truly low-risk safe access periods and improving the accuracy and security of dynamic port control strategies.

[0024] In one embodiment, generating a sequence of historical TCP flag entropy values with the preset time granularity based on the TCP control flag distribution data includes:

[0025] Determine the set of flags for the TCP protocol, wherein the set of flags includes at least one of the following: synchronization flag (SYN), acknowledgment flag (ACK), finish flag (FIN), and reset flag (RST);

[0026] For each preset time granularity, determine the probability of occurrence of each flag bit in the flag bit set within the current preset time granularity;

[0027] Based on the occurrence probability, determine the TCP flag entropy value at the preset time granularity;

[0028] Based on the TCP flag entropy values of multiple preset time granularities, a sequence of historical TCP flag entropy values is generated, with each preset time granularity as the unit.

[0029] In the above embodiments, by specifically defining the set of TCP flag bits and accurately calculating the information entropy of each time granularity based on their occurrence probability, a historical time series reflecting the complexity of TCP traffic behavior is finally constructed. This provides the prediction model with key inputs that reveal the differences in micro-behavioral patterns between normal business and potential attacks, thereby enhancing the depth and accuracy of security period prediction.

[0030] In one embodiment, the step of performing time-series analysis using a pre-trained Long Short-Term Memory (LSTM) network model based on the historical TCP flag entropy sequence and the secure access frequency sequence to obtain the predicted secure access period for the target port within a preset future time period includes:

[0031] The historical TCP flag entropy value sequence is used as the first dimension feature, and the secure access frequency sequence is used as the second dimension feature.

[0032] A multi-dimensional time-series feature matrix is generated based on the first-dimensional feature and the second-dimensional feature.

[0033] Based on the multidimensional temporal feature matrix, a pre-trained long short-term memory network model is used to perform temporal analysis to obtain the predicted secure access period of the target port within a preset time period in the future.

[0034] In the above embodiments, by constructing a multi-dimensional time series feature matrix by combining the entropy sequence reflecting the complexity of traffic behavior with the frequency sequence reflecting the intensity of access, and by using an advanced long short-term memory network model for joint time series analysis, the prediction model can more comprehensively understand access behavior patterns, thereby making more reliable and precise predictions for future safe periods, and providing a more solid decision-making basis for dynamic port control.

[0035] In one embodiment, generating a preset rule for temporary port exposure of the target port based on the predicted secure access period includes:

[0036] From the historical access log data, identify the set of source IP addresses whose historical reputation score is greater than a preset reputation threshold during the predicted secure access period;

[0037] Based on the set of source IP addresses, the start and end times of the predicted secure access period, the preset rules for exposing temporary ports are generated.

[0038] In the above embodiments, by combining the predicted security period with the historical reputation assessment of the source IP address to generate port exposure rules, dual access control based on time and identity is achieved. This enhances flexibility by leveraging dynamic openness while strengthening the security of port access through fine-grained authorization.

[0039] In one embodiment, the step of using a preset deep reinforcement learning agent to determine the on / off control command for the target port based on the traffic feature vector and the preset rules for exposing the temporary port includes:

[0040] Based on the traffic feature vector, the temporary port exposure preset rules, and the on / off status of the target port at the current moment, a reinforcement learning state vector is generated;

[0041] Define an action space for reinforcement learning, which includes multiple candidate on / off operations for the target port;

[0042] The state vector is input into the policy network of the deep reinforcement learning agent to obtain the action probability distribution for multiple candidate on / off operations in the action space;

[0043] Based on the action probability distribution, the target on / off operation is determined from a plurality of candidate on / off operations;

[0044] Based on the target on / off operation, the on / off control command is generated.

[0045] In the above embodiments, by constructing a state vector that integrates real-time traffic, preset rules, and port status, and using a policy network of a deep reinforcement learning agent for decision-making, not only is the real-time performance and accuracy of the response improved, but the ability to explore and adapt to unknown threats is also enhanced through probabilistic decision-making.
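As an illustrative, non-limiting sketch, the state-vector construction and probabilistic action selection described in this embodiment might look as follows. The three candidate operations in `ACTIONS`, the single linear layer standing in for the policy network, and the example weights are all assumptions made for illustration; the application does not specify the network architecture or the exact action set.

```python
import math

# Hypothetical action space: candidate on / off operations for the target port.
ACTIONS = ["open", "close", "keep"]

def build_state(traffic_features, in_rule_window, port_is_open):
    """Concatenate traffic features, a rule-window flag and the current port state."""
    return list(traffic_features) + [float(in_rule_window), float(port_is_open)]

def softmax(logits):
    """Numerically stable softmax producing an action probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy(state, weights, biases):
    """One linear layer plus softmax, standing in for a trained policy network."""
    logits = [
        sum(w * s for w, s in zip(row, state)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)

def select_action(probs, rng=None):
    """Pick the most probable action, or sample one if a random.Random is given."""
    if rng is None:
        best = max(range(len(probs)), key=lambda i: probs[i])
        return ACTIONS[best]
    return rng.choices(ACTIONS, weights=probs, k=1)[0]
```

Passing a `random.Random` instance to `select_action` samples from the distribution instead of taking the most probable action, which corresponds to the exploratory, probabilistic decision-making mentioned above.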

[0046] In one embodiment, determining the traffic feature vector of the real-time traffic data packets of the target port includes:

[0047] Obtain the header information of the real-time traffic data packet;

[0048] Determine the flow statistics characteristics within a preset sliding window prior to the current time.

[0049] Based on the header information and the traffic statistics features, the traffic feature vector is generated.

[0050] In the above embodiments, by combining the detailed header information of real-time data packets with macro-level traffic statistics within a preset sliding window, a multi-level, multi-granularity traffic feature vector is constructed, providing rich, accurate, and timely environmental state inputs for subsequent intelligent decision-making, thereby improving the accuracy and context awareness of port on / off control decisions.
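A minimal sketch of the sliding-window statistics mentioned in this embodiment is given here for illustration only. The class name `SlidingWindowStats`, the two statistics computed (packet rate and mean packet size), and the use of timestamps in seconds are assumptions; the application does not enumerate which statistics are used.

```python
from collections import deque

class SlidingWindowStats:
    """Keeps packets seen in the last `window_seconds` and summarises them."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._packets = deque()  # (timestamp, size_bytes), oldest first

    def add(self, timestamp, size_bytes):
        self._packets.append((timestamp, size_bytes))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop packets that fell out of the window (now - window, now].
        while self._packets and self._packets[0][0] <= now - self.window_seconds:
            self._packets.popleft()

    def features(self, now):
        self._evict(now)
        count = len(self._packets)
        total = sum(size for _, size in self._packets)
        return {
            "packet_rate": count / self.window_seconds,
            "mean_size": total / count if count else 0.0,
        }
```

These window-level values could then be combined with per-packet header fields to form the traffic feature vector.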

[0051] In a second aspect, embodiments of this application provide an electronic device including a processor and a memory, wherein the memory stores a computer program configured to be executed by the processor to implement the gateway port on / off control method described in any embodiment of the first aspect.

[0052] In a third aspect, embodiments of this application provide a computer-readable storage medium storing a computer program configured to be executed by a processor to implement the gateway port on / off control method described in any embodiment of the first aspect.

[0053] The beneficial effects of the embodiments of this application are as follows:

[0054] In the embodiments of this application, based on the historical access log data and threat intelligence data of the target port in the gateway, a pre-trained long short-term memory network model is used to perform time series analysis to obtain the predicted safe access period of the target port in the future preset time period. Then, a preset rule for temporary port exposure of the target port is generated. Combined with the traffic feature vector of the real-time traffic data packet of the target port, a preset deep reinforcement learning agent is used to determine the on / off control command for the target port. This can realize dynamic and fine-grained gateway port on / off control, thereby taking into account both business flexibility and security.

Attached Figure Description

[0055] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0056] Figure 1 is a schematic flowchart of an embodiment of the gateway port on / off control method provided in this application.

Detailed Implementation

[0057] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application. In addition, in the description of this application, "multiple" means two or more, unless otherwise explicitly specified.

[0058] In a first aspect, embodiments of this application provide a method for controlling the on / off state of a gateway port, wherein the executing entity is a gateway port on / off control system (hereinafter referred to as the "system"). Specifically, referring to Figure 1, the gateway port on / off control method is applied to the gateway and includes the following steps:

[0059] S101. Obtain historical access log data and threat intelligence data for the target port in the gateway.

[0060] In the embodiments of this application, the gateway includes, but is not limited to, firewalls, routers, switches, or servers with traffic forwarding capabilities. The target port refers to the network port on the gateway that requires dynamic on / off control, such as ports 80 and 443 of the Transmission Control Protocol (TCP), or specific application ports of the User Datagram Protocol (UDP). Historical access log data refers to the raw data set recorded by the gateway regarding access behavior on the target port over past time periods. Threat intelligence data refers to the external or internal data set regarding known network threats, malicious Internet Protocol (IP) addresses, attack signatures, or vulnerability information.

[0061] In some embodiments of this application, obtaining historical access log data includes extracting it from the gateway's local storage system, a centralized log management server, or a security information and event management (SIEM) system. The extraction process can be filtered and aggregated according to time range (e.g., the past 30 days), port number, and specific log types (e.g., connection establishment, connection termination, traffic rejection) to ensure the relevance and completeness of the data.

[0062] In some embodiments of this application, obtaining threat intelligence data includes periodically pulling the latest lists of malicious IP addresses, domain name blacklists, or indicators of compromise (IoC) from commercial threat intelligence subscription services, open-source threat intelligence sharing platforms, or internal threat detection systems. The obtained data is matched with the log time range of the target port to ensure the timeliness of the intelligence.

[0063] S102. Based on historical access log data and threat intelligence data, a pre-trained Long Short-Term Memory (LSTM) network model is used to perform time series analysis to obtain the predicted secure access period of the target port within a preset future time period.

[0064] In the embodiments of this application, the pre-trained Long Short-Term Memory (LSTM) network model is a Recurrent Neural Network (RNN) pre-trained using historical time-series data. Time-series analysis refers to analyzing data sequences arranged chronologically to discover their inherent patterns. The predicted secure access period refers to the continuous time intervals, within a specified future time period, in which the model infers that the target port is less likely to be maliciously accessed and is suitable for external access. The pre-trained LSTM network model combines the time-series prediction capabilities of deep learning with risk perception in the field of cybersecurity, enabling the extraction of complex and dynamic secure access patterns from historical data and achieving early prediction of future risks.

[0065] S103. Based on the predicted secure access period, generate a temporary port exposure preset rule for the target port.

[0066] In the embodiments of this application, the temporary port exposure preset rule is a set of instructions or policy templates used to guide how to manage access permissions to a target port during a specific time period. Generating the temporary port exposure preset rule is to transform the predicted secure access period into an executable policy form.

[0067] In some embodiments of this application, generating a preset rule for temporary port exposure specifically involves creating a policy entry that explicitly specifies the allowed access actions for the target port during the predicted secure access period, such as "allowing all IP addresses to access this port" or "allowing access from a specific source IP address range." This policy entry is also associated with precise start and end times, which are directly derived from the predicted secure access period.
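For illustration only, a policy entry of the kind described above could be represented as a small record that ties the target port to a time window and a set of permitted source ranges. The class name `TemporaryExposureRule`, its fields, and the use of CIDR strings are assumptions made for this sketch and are not taken from the application.

```python
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_address, ip_network

@dataclass(frozen=True)
class TemporaryExposureRule:
    port: int
    start: datetime          # start of the predicted secure access period
    end: datetime            # end of the predicted secure access period
    allowed_sources: tuple   # CIDR strings; ("0.0.0.0/0",) allows any IPv4 source

    def permits(self, source_ip, when):
        """True if `when` is inside the window and the source is in an allowed range."""
        if not (self.start <= when < self.end):
            return False
        addr = ip_address(source_ip)
        return any(addr in ip_network(cidr) for cidr in self.allowed_sources)
```

A rule built this way expresses both conditions named in the text: the time window derived from the predicted period and the permitted source address range.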

[0068] S104. Determine the traffic feature vector of the real-time traffic data packets of the target port.

[0069] In the embodiments of this application, real-time traffic data packets refer to network data packets flowing through the target port in the current or most recent short period of time. A traffic feature vector is a vector formed by numerically combining the multi-dimensional feature information of a data packet, used to characterize the behavioral pattern of that data packet or a group of data packets.

[0070] In some embodiments of this application, determining the traffic feature vector includes sampling and parsing real-time data packets arriving at the target port. Extracted features include, but are not limited to: packet size, packet arrival time interval, transport layer protocol type, payload length, TCP flag combinations (such as the synchronization flag (SYN), acknowledgment flag (ACK), finish flag (FIN), and reset flag (RST)), the geographic entropy value of the source IP address, and the request rate anomaly compared to a historical baseline. These feature values are normalized and arranged in a fixed order to form a fixed-length numerical vector, i.e., the traffic feature vector.
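The normalization and fixed ordering described above can be sketched as follows. The feature subset in `FEATURE_ORDER`, the value ranges in `FEATURE_RANGES`, and the choice of min-max scaling are assumptions for illustration; the application does not state which normalization is used.

```python
# Hypothetical fixed feature order; flag fields are already 0 or 1.
FEATURE_ORDER = (
    "packet_size", "inter_arrival", "payload_length", "syn", "ack", "fin", "rst",
)

# Assumed value ranges used for min-max scaling of the continuous features.
FEATURE_RANGES = {
    "packet_size": (0, 1500),
    "inter_arrival": (0.0, 1.0),
    "payload_length": (0, 1460),
}

def min_max(value, low, high):
    """Scale into [0, 1], clamping values that fall outside the assumed range."""
    if high == low:
        return 0.0
    return min(1.0, max(0.0, (value - low) / (high - low)))

def to_feature_vector(raw):
    """Map a dict of raw packet features to a fixed-length, fixed-order vector."""
    vector = []
    for name in FEATURE_ORDER:
        value = float(raw.get(name, 0.0))
        if name in FEATURE_RANGES:
            value = min_max(value, *FEATURE_RANGES[name])
        vector.append(value)
    return vector
```

Missing features default to zero here so that the vector length never changes, which is what a downstream model with a fixed input size requires.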

[0071] S105. Using a preset deep reinforcement learning (DRL) agent, based on the traffic feature vector and the temporary port exposure preset rule, determine the on / off control command for the target port.

[0072] In the embodiments of this application, the preset deep reinforcement learning agent is a trained agent model capable of making autonomous decisions based on environmental state input. The on / off control command is a specific operation command that instructs the gateway to perform an "open" (allow traffic to pass) or "close" (block traffic) operation on the target port.

[0073] S106. Execute the on / off control command to modify the access control list (ACL) status of the gateway firewall for the target port.

[0074] In the embodiments of this application, the gateway firewall is a network security system deployed on the gateway. An access control list is an ordered set of rule entries in the firewall used to control the entry and exit of network traffic. Modifying its state refers to adding, deleting, or modifying specific rule entries targeting a target port, thereby actually changing the on / off state of the target port.
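As a simplified illustration of modifying the access control list state, the sketch below models the ACL as an ordered in-memory list in which the first matching entry decides and unmatched ports are denied. The class `PortAcl` and its methods are hypothetical; a real gateway firewall would be driven through its own management interface, which the application does not specify.

```python
class PortAcl:
    """Ordered rule list: the first matching entry decides, default is deny."""

    def __init__(self):
        self.rules = []  # each entry is (action, port)

    def apply(self, command, port):
        """Execute an 'open' or 'close' command by replacing the port's entries."""
        # Delete existing entries for the port, then add the new one at the top.
        self.rules = [rule for rule in self.rules if rule[1] != port]
        action = "allow" if command == "open" else "deny"
        self.rules.insert(0, (action, port))

    def decision(self, port):
        for action, rule_port in self.rules:
            if rule_port == port:
                return action
        return "deny"
```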

[0075] As can be seen, the embodiments of this application achieve dynamic, precise, and automated management of gateway port access control by integrating the forward-looking security time prediction of long short-term memory networks with the real-time adaptive decision-making of deep reinforcement learning agents. This solution not only improves the time accuracy of port exposure and reduces the security risks caused by static opening, but also proactively responds to sudden threats based on real-time traffic characteristics, thereby enhancing the overall security protection level of the network while ensuring necessary business connectivity.

[0076] In some embodiments of this application, based on historical access log data and threat intelligence data, a pre-trained long short-term memory network model is used to perform time series analysis to obtain the predicted secure access period for the target port within a preset future time period, including the following steps:

[0077] S201. Based on historical access log data, determine the historical access frequency sequence with a preset time granularity.

[0078] In the embodiments of this application, the preset time granularity is a pre-defined unit of time length used to divide a continuous time axis into multiple equal-length time periods. Through the preset time granularity, historical access log data is divided into a series of time periods, and the number of accesses within each time period is counted, thereby forming a historical access frequency sequence.

[0079] In some embodiments of this application, the preset time granularity can be 1 minute, 5 minutes, 15 minutes, 1 hour, or 1 day. For example, when the preset time granularity is set to 1 hour, each data point in the historical access frequency sequence represents the total number of accesses to the target port in each hour in history.
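A minimal sketch of building the historical access frequency sequence is shown here. It assumes, for illustration, that each log entry has been reduced to an integer timestamp in seconds and that the granularity is also given in seconds; the function name is hypothetical.

```python
from collections import Counter

def access_frequency_sequence(timestamps, start, end, granularity):
    """Count accesses per bucket of `granularity` seconds within [start, end)."""
    n_buckets = (end - start + granularity - 1) // granularity
    counts = Counter(
        (t - start) // granularity for t in timestamps if start <= t < end
    )
    # Buckets with no accesses are reported as 0 so the sequence stays contiguous.
    return [counts.get(i, 0) for i in range(n_buckets)]
```

With a granularity of 3600 seconds, each element of the returned list is the number of accesses in one hour, matching the 1-hour example above.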

[0080] S202. Based on the malicious IP information recorded in the threat intelligence data, determine the risk coefficient value for each preset time granularity in the historical access frequency sequence.

[0081] In embodiments of this application, the threat intelligence data includes information on one or more IP addresses that have been marked as malicious. The risk coefficient value is a numerical value used to quantify the potential risk of access behavior within each preset time granularity.

[0082] In some embodiments of this application, determining the risk coefficient value for each preset time granularity includes: identifying all source IP addresses accessing the target port within that preset time granularity; comparing these source IP addresses with malicious IP information in threat intelligence data; and calculating the risk coefficient value based on the number of malicious IP addresses discovered within that time period and their threat levels. For example, the risk coefficient value can be set to: 0 (no malicious IP), 0.5 (one low-threat-level malicious IP discovered), or 1.0 (one high-threat-level malicious IP or multiple malicious IPs discovered). A more complex calculation method could be to use the ratio of the number of malicious IPs to the total number of accessing IPs during that time period as the risk coefficient value.
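The two example schemes above (the tiered values 0, 0.5 and 1.0, and the ratio of malicious to total accessing IPs) can be sketched as follows. The representation of threat intelligence as a dictionary mapping each malicious IP to a `"low"` or `"high"` threat level is an assumption made for this illustration.

```python
def tiered_risk(source_ips, threat_levels):
    """Return 0.0, 0.5 or 1.0 following the tiered example in the text."""
    hits = [threat_levels[ip] for ip in set(source_ips) if ip in threat_levels]
    if not hits:
        return 0.0                      # no malicious IP in this period
    if len(hits) == 1 and hits[0] == "low":
        return 0.5                      # exactly one low-threat malicious IP
    return 1.0                          # one high-threat IP, or several malicious IPs

def ratio_risk(source_ips, threat_levels):
    """Share of distinct accessing IPs that appear in the threat intelligence."""
    unique = set(source_ips)
    if not unique:
        return 0.0
    return sum(ip in threat_levels for ip in unique) / len(unique)
```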

[0083] S203. Based on the risk coefficient value, correct the historical access frequency sequence to obtain the safe access frequency sequence.

[0084] In the embodiments of this application, the correction involves adjusting each frequency value in the historical access frequency sequence according to the risk coefficient value of its corresponding time period, thereby generating a new sequence, namely the safe access frequency sequence. This sequence reflects a relatively "safe" access frequency pattern after eliminating or reducing the impact of high-risk access.

[0085] In some embodiments of this application, the correction operation is a multiplicative correction. For example, secure access frequency = original historical access frequency × (1 - risk coefficient value). When the risk coefficient value is 0, the frequency value remains unchanged; when the risk coefficient value is 1, the frequency value is corrected to 0, meaning that access during that period is completely considered insecure and excluded.

[0086] In some embodiments of this application, the correction operation is conditional filtering. For example, a threshold (such as 0.7) is set for the risk coefficient value. When the risk coefficient value of a certain period exceeds the threshold, the historical access frequency of that period is directly set to zero or replaced with an extremely low background noise value; periods that do not exceed the threshold retain their original values. This is equivalent to a risk-based threshold filtering.
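Both correction variants, the multiplicative correction and the threshold-based conditional filtering, can be written in a few lines. The function names and the `floor` parameter (the value substituted for a filtered period) are illustrative assumptions; the 0.7 default mirrors the example threshold in the text.

```python
def multiplicative_correction(frequencies, risks):
    """secure frequency = original frequency x (1 - risk coefficient)."""
    return [f * (1.0 - r) for f, r in zip(frequencies, risks)]

def threshold_correction(frequencies, risks, threshold=0.7, floor=0.0):
    """Replace the frequency with `floor` when the risk exceeds the threshold."""
    return [floor if r > threshold else f for f, r in zip(frequencies, risks)]
```

For example, with frequencies `[100, 80, 40]` and risk coefficients `[0.0, 0.5, 1.0]`, the multiplicative variant yields `[100.0, 40.0, 0.0]`, while the threshold variant keeps the first two values and zeroes only the third.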

[0087] S204. Based on the secure access frequency sequence, a pre-trained long short-term memory network model is used to perform time series analysis to obtain the predicted secure access period of the target port within a preset future time period.

[0088] In some embodiments of this application, time-series analysis using a pre-trained Long Short-Term Memory (LSTM) network model includes: feeding the secure access frequency sequence into the pre-trained LSTM network model. Based on learned sequence patterns, the model outputs predicted secure access frequencies for each preset time granularity within a future period. Then, based on the predicted secure access frequency values and a secure access frequency threshold, a predicted secure access period is determined. For example, periods with predicted frequencies higher than the threshold are identified as secure access periods.
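The final thresholding step can be sketched independently of the model itself. The sketch below assumes the predicted frequencies are already available as a list with one value per time slot, and it merges consecutive above-threshold slots into continuous intervals; the function name and the representation of periods as `(start, end)` offsets are assumptions for illustration.

```python
def secure_periods(predicted, threshold, granularity, start=0):
    """Merge consecutive slots whose predicted frequency exceeds the threshold."""
    periods = []
    open_at = None  # index of the first slot of the interval being built
    for i, value in enumerate(predicted):
        if value > threshold and open_at is None:
            open_at = i
        elif value <= threshold and open_at is not None:
            periods.append((start + open_at * granularity, start + i * granularity))
            open_at = None
    if open_at is not None:
        # The sequence ended while an interval was still open.
        periods.append(
            (start + open_at * granularity, start + len(predicted) * granularity)
        )
    return periods
```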

[0089] As can be seen, the embodiments of this application combine historical access frequency with real-time threat intelligence to perform risk perception correction on the original data, and then use deep learning models for prediction. This not only improves the accuracy of port security status prediction, but also realizes the transformation from passive response to proactive prediction and defense, making the opening of network resources more intelligent and secure.

[0090] In some embodiments of this application, based on the secure access frequency sequence, a pre-trained long short-term memory network model is used for time series analysis to obtain the predicted secure access period of the target port within a preset future time period, including:

[0091] S301. Determine the distribution of TCP control flag bits in historical access log data within each preset time granularity.

[0092] In the embodiments of this application, TCP control flags are specific bits in the TCP header used to control connection status and message type, mainly including flags such as Synchronize (SYN), Acknowledge (ACK), Finish (FIN), Reset (RST), Push (PSH), and Urgent (URG). TCP control flag distribution data refers to the frequency or proportion statistics of different TCP control flag combinations appearing in all TCP packets accessing the target port within a given preset time granularity.

[0093] In some embodiments of this application, determining the TCP control flag distribution data includes: filtering log entries from historical access log data that match the TCP protocol type and the target port. For each preset time granularity, the flag field of all relevant TCP packets within that time period is parsed, and the frequency of each flag combination is counted. For example, the number of packets with only the SYN flag set, the number of packets with both the SYN and ACK flags set, and the number of packets with only the RST flag set are counted. Finally, a distribution vector or dictionary reflecting the frequency of TCP flag occurrences is generated for each preset time granularity.
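As an illustration of the counting step, the sketch below decodes the flags byte of each TCP packet into a combination label and tallies the labels for one time slot. The bit values are the standard TCP header control bits; the label format (flag names joined with `+` in alphabetical order) and the function names are assumptions.

```python
from collections import Counter

# Standard TCP header control bits (low six bits of the flags byte).
FLAG_BITS = {
    "FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20,
}

def flag_combination(flags_byte):
    """Return a label such as 'SYN' or 'ACK+SYN' for the flags set in the byte."""
    names = [name for name, bit in FLAG_BITS.items() if flags_byte & bit]
    return "+".join(sorted(names)) or "NONE"

def flag_distribution(flag_bytes):
    """Count how often each flag combination occurs within one time slot."""
    return Counter(flag_combination(b) for b in flag_bytes)
```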

[0094] S302. Based on the TCP control flag distribution data, generate a sequence of historical TCP flag entropy values with a preset time granularity.

[0095] In the embodiments of this application, entropy is an indicator used in information theory to measure the uncertainty of random variables. The historical TCP flag entropy value sequence is a time series, where each data point corresponds to a preset time granularity. Its value is the information entropy calculated based on the distribution data of TCP control flags within that time period, used to quantify the disorder or diversity of TCP flag combination patterns within that time period.

[0096] In some embodiments of this application, the step of generating a historical TCP flag bit entropy value sequence includes: for each preset time granularity, treating the occurrence frequency of various TCP flag bit combinations as a probability distribution. The information entropy of this probability distribution is calculated. Specifically, assuming there are N different flag bit combinations within the time granularity, and the probability of the i-th combination appearing is p_i, then the information entropy H = -Σ(p_i × log2(p_i)), where the summation ranges from 1 to N. The calculated entropy value H constitutes the value of the historical TCP flag bit entropy value sequence at that time point.
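As a minimal sketch of the formula H = -Σ(p_i × log2(p_i)), the function below computes the entropy of one granularity's flag-combination counts. The function name `shannon_entropy` and the dictionary input format are assumptions for illustration.

```python
import math

def shannon_entropy(counts):
    """Entropy in bits of a dict mapping flag combination -> occurrence count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no packets in this granularity: define entropy as 0
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total          # frequency treated as probability p_i
            h -= p * math.log2(p)
    return h

# Probabilities 0.5, 0.25, 0.25 give H = 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits.
h = shannon_entropy({"SYN": 2, "ACK+SYN": 1, "RST": 1})
```

A granularity dominated by a single combination yields an entropy near 0, while evenly mixed combinations push the value toward log2(N).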

[0097] In some embodiments of this application, to enhance the indicative power of entropy values for anomalous behavior, the TCP control flag distribution data is weighted before calculating the entropy value. For example, higher weights are given to the RST-only and SYN-only flag combinations, as they tend to occur more frequently in anomalous behavior. The weighted distribution is then used to calculate the weighted entropy, making the change in entropy value more pronounced as scanning or attack activity increases.
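The text does not specify how the weighting is applied. One possible reading, shown here purely as an assumption, is to multiply each combination's count by a weight and renormalize before computing the entropy. The name `weighted_entropy` and the weight values are hypothetical.

```python
import math

def weighted_entropy(counts, weights, default_weight=1.0):
    """Entropy of a count distribution after per-combination reweighting.

    Assumption: weighting means scaling counts, then renormalizing.
    """
    scaled = {k: n * weights.get(k, default_weight) for k, n in counts.items()}
    total = sum(scaled.values())
    if total == 0:
        return 0.0
    h = 0.0
    for v in scaled.values():
        if v > 0:
            p = v / total
            h -= p * math.log2(p)
    return h

counts = {"ACK": 9, "RST": 1}
plain = weighted_entropy(counts, weights={})             # unweighted baseline
boosted = weighted_entropy(counts, weights={"RST": 9.0})  # RST-only emphasized
```

In this example a small share of RST-only packets raises the weighted entropy to 1 bit, compared with roughly 0.47 bits unweighted, so the same traffic produces a larger signal.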

[0098] S303. Based on the historical TCP flag entropy value sequence and the secure access frequency sequence, a pre-trained long short-term memory network model is used to perform time series analysis to obtain the predicted secure access period of the target port within a preset future time period.

[0099] In the embodiments of this application, the pre-trained Long Short-Term Memory (LSTM) network model is constructed as a multivariate time series prediction model. Its input consists of two parallel, time-aligned sequences: a historical TCP flag entropy value sequence and a secure access frequency sequence. The output is a prediction of the security status of the target port within a preset future time period.

[0100] In some embodiments of this application, the pre-trained Long Short-Term Memory (LSTM) network model employs an attention mechanism. When processing the input dual sequences, the model can dynamically assign different attention weights to the entropy sequence points and frequency sequence points at different historical moments, thereby more accurately capturing the historical patterns most critical to predicting future safe periods. For example, the model may pay more attention to historical periods where the entropy value suddenly increases but the frequency does not change significantly, which may correspond to a covert probing activity.

[0101] As can be seen, this application's embodiments introduce the TCP flag bit entropy value sequence as a new dimension of time series analysis, quantify the protocol layer behavior characteristics of network traffic into information entropy, and fuse it with the risk-corrected access frequency sequence for analysis, thereby more accurately predicting truly low-risk secure access periods and improving the accuracy and security of dynamic port control strategies.

[0102] In some embodiments of this application, a sequence of historical TCP flag entropy values with a preset time granularity is generated based on TCP control flag distribution data, including:

[0103] S401. Determine the set of flags for the TCP protocol. The set of flags includes at least one of the following: synchronization flag, acknowledgment flag, end flag, and reset flag.

[0104] In the embodiments of this application, the set of flags in the TCP protocol refers to one or more specific bits selected from the control bit field of the TCP header for entropy calculation. The synchronization flag (SYN), acknowledgment flag (ACK), finish flag (FIN), and reset flag (RST) are the core flags in the TCP protocol used to control the connection state.

[0105] S402. For each preset time granularity, determine the probability of each flag bit in the flag bit set appearing in the current preset time granularity.

[0106] In the embodiments of this application, the probability of occurrence refers to the frequency or proportion of a specific TCP flag bit being set (i.e., having a value of 1) in all TCP packets accessing the target port within the preset time granularity. This step requires calculating the probability of activation for each flag bit independently based on the set of flag bits.

[0107] In some embodiments of this application, determining the probability of a flag bit (such as SYN) occurring includes: counting the total number of all TCP packets (N_total) within the current preset time granularity; counting the number of packets (N_syn) in which the SYN flag bit is set to 1; and calculating the probability P_syn = N_syn / N_total. This process is repeated for each flag bit in the flag bit set (such as ACK, FIN, RST) to obtain a set of independent probability values.

[0108] S403. Based on the probability of occurrence, determine the TCP flag entropy value with a preset time granularity.

[0109] In the embodiments of this application, the TCP flag entropy value is a scalar value that is calculated based on the probability of occurrence and is used to quantify the uncertainty or information content of the TCP flag activity pattern within the preset time granularity.

[0110] In some embodiments of this application, the step of determining the TCP flag entropy value can be achieved by calculating the entropy value for the probability distribution of each flag bit separately and then summing or averaging them. For example, the entropy value H_i for each flag bit i is calculated as H_i = -[p_i × log2(p_i) + (1-p_i) × log2(1-p_i)], where p_i is the probability of that flag bit being set. Then, the TCP flag entropy value at the preset time granularity is defined as the arithmetic mean of all H_i values.
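Steps S402 and S403 can be sketched as below, assuming each packet is represented by the set of flag names that are set in it. The names `binary_entropy` and `tcp_flag_entropy` are illustrative, and the convention that a probability of 0 or 1 contributes zero entropy is the usual limit 0 × log2(0) = 0.

```python
import math

FLAGS = ("SYN", "ACK", "FIN", "RST")  # flag bit set from S401

def binary_entropy(p):
    """H_i = -[p*log2(p) + (1-p)*log2(1-p)], with H_i = 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def tcp_flag_entropy(packets, flags=FLAGS):
    """Arithmetic mean of per-flag binary entropies for one time granularity."""
    n_total = len(packets)
    if n_total == 0:
        return 0.0
    entropies = []
    for flag in flags:
        n_set = sum(1 for pkt in packets if flag in pkt)
        entropies.append(binary_entropy(n_set / n_total))  # P_flag = N_flag / N_total
    return sum(entropies) / len(entropies)

packets = [{"SYN"}, {"SYN", "ACK"}, {"ACK"}, {"ACK"}]
value = tcp_flag_entropy(packets)  # P_syn = 0.5, P_ack = 0.75, P_fin = P_rst = 0
```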

[0111] S404. Generate a sequence of historical TCP flag entropy values in units of preset time granularity based on multiple preset time granularity TCP flag entropy values.

[0112] In the embodiments of this application, generating a historical TCP flag entropy value sequence means arranging the TCP flag entropy values calculated at each preset time granularity in chronological order to form an ordered numerical sequence.

[0113] In some embodiments of this application, the step of generating a sequence of historical TCP flag entropy values specifically includes: following a preset time granularity (e.g., every 5 minutes), starting from the earliest time point, sequentially taking the entropy value corresponding to each time granularity as a data point in the sequence. This ultimately forms a sequence whose length equals the number of preset time granularities contained in the analyzed historical time period.

[0114] As can be seen, this application embodiment defines a specific set of TCP flag bits and accurately calculates the information entropy of each time granularity based on their occurrence probability, ultimately constructing a historical time series that reflects the complexity of TCP traffic behavior. This provides the prediction model with key inputs that reveal the differences in micro-behavioral patterns between normal business and potential attacks, thereby enhancing the depth and accuracy of security period prediction.

[0115] In some embodiments of this application, based on historical TCP flag entropy value sequences and secure access frequency sequences, a pre-trained long short-term memory network model is used for time series analysis to obtain the predicted secure access period for the target port within a preset future time period, including:

[0116] S501. Take the historical TCP flag entropy value sequence as the first dimension feature and the secure access frequency sequence as the second dimension feature.

[0117] In the embodiments of this application, the first dimension feature and the second dimension feature are two sets of time-series data used to characterize different aspects of port access behavior. The historical TCP flag entropy value sequence serves as the first dimension feature, with each data point reflecting the complexity or pattern diversity of TCP traffic behavior within a corresponding preset time granularity. The secure access frequency sequence serves as the second dimension feature, with each data point reflecting the number of relatively secure access requests after risk correction within a corresponding preset time granularity.

[0118] In some embodiments of this application, a preprocessing step of time alignment is performed on the first-dimensional feature and the second-dimensional feature before this step. Time alignment means ensuring that two sequences have the same preset time granularity, the same start time point, and the same number of data points. If the original sequence lengths are inconsistent, they are aligned using interpolation or truncation methods. Subsequently, the two aligned sequences are labeled as the first-dimensional feature and the second-dimensional feature, respectively.

[0119] S502. Generate a multi-dimensional time series feature matrix based on the first-dimensional features and the second-dimensional features.

[0120] In the embodiments of this application, the multidimensional time-series feature matrix is a two-dimensional data structure. The rows of this matrix typically represent consecutive time steps (i.e., a preset time granularity arranged in chronological order), and its columns represent different feature dimensions. Each matrix element represents a value at a specific time step and a specific feature dimension.
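A minimal sketch of the matrix construction follows, assuming both sequences share the same start time and granularity and that any length mismatch is resolved by truncating to the shorter one (one of the alignment options the text mentions). The function name `build_feature_matrix` is hypothetical.

```python
def build_feature_matrix(entropy_seq, frequency_seq):
    """Rows are time steps; column 0 is entropy, column 1 is secure access frequency."""
    n = min(len(entropy_seq), len(frequency_seq))  # truncate to the common length
    return [[float(entropy_seq[t]), float(frequency_seq[t])] for t in range(n)]

matrix = build_feature_matrix([0.1, 0.2, 0.3], [5, 6])
```

The resulting list of rows has shape (time steps, 2), which is the layout a sequence model typically expects after batching.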

[0121] S503. Based on the multi-dimensional temporal feature matrix, a pre-trained long short-term memory network model is used to perform temporal analysis to obtain the predicted secure access period of the target port within a preset time period in the future.

[0122] In embodiments of this application, a pre-trained Long Short-Term Memory (LSTM) network model is configured to receive a multi-dimensional temporal feature matrix as input and output a prediction of future time series. This prediction is used to identify secure access periods.

[0123] In some embodiments of this application, temporal analysis includes inputting the generated multidimensional temporal feature matrix into a pre-trained long short-term memory network model. The model processes each row of the matrix (i.e., the multidimensional feature vector at each time point) sequentially by time step, learning dependencies across time steps through its internal gating mechanism. The model's output layer is configured for either regression or classification tasks. For regression tasks, the model directly outputs a sequence of security risk scores for each time step within a preset future time period; consecutive time periods with scores below a predetermined threshold are identified as predicted secure access periods. For classification tasks, the model outputs the probability that each future time step belongs to the "secure" or "insecure" category, and consecutive time periods whose "secure" probability is above a threshold are identified as predicted secure access periods.
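For the regression case, turning a risk-score sequence into secure access periods can be sketched as a run-length scan. The function name `secure_periods` and the optional `min_len` parameter (a minimum run length) are assumptions not stated in the text.

```python
def secure_periods(risk_scores, threshold, min_len=1):
    """Return (start, end) index pairs, end exclusive, of runs with score < threshold."""
    periods = []
    start = None
    for i, score in enumerate(risk_scores):
        if score < threshold:
            if start is None:
                start = i                     # a new low-risk run begins
        elif start is not None:
            if i - start >= min_len:
                periods.append((start, i))    # run ended at the previous step
            start = None
    if start is not None and len(risk_scores) - start >= min_len:
        periods.append((start, len(risk_scores)))  # run reaches the horizon end
    return periods

windows = secure_periods([0.9, 0.2, 0.1, 0.8, 0.3], threshold=0.5)
```

Multiplying each index by the preset time granularity converts the pairs into wall-clock start and end times.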

[0124] As can be seen, the embodiments of this application construct a multi-dimensional time series feature matrix by combining the entropy sequence reflecting the complexity of traffic behavior with the frequency sequence reflecting the intensity of access, and use an advanced long short-term memory network model for joint time series analysis. This enables the prediction model to more comprehensively understand access behavior patterns, thereby making more reliable and precise predictions for future safe periods and providing a more solid decision-making basis for dynamic port control.

[0125] In some embodiments of this application, a preset rule for temporary port exposure of the target port is generated based on the predicted secure access period, including:

[0126] S601. From the historical access log data, identify the set of source IP addresses whose historical reputation score is greater than the preset reputation score threshold during the predicted secure access period.

[0127] In the embodiments of this application, historical reputation is a score or rating for a specific source IP address, used to quantify the reliability or good behavior of that IP address when accessing the target port historically. The preset reputation threshold is a pre-defined numerical limit used to distinguish between trusted IP addresses and untrusted or suspicious IP addresses. The source IP address set is a collection of IP addresses that meet the reputation criteria.

[0128] In some embodiments of this application, determining the source IP address set includes the following sub-steps: First, filtering out all log entries whose timestamps fall within the predicted secure access period from historical access log data. Second, calculating the historical reputation score of each appearing source IP address based on the filtered logs. This calculation can be done by statistically analyzing the IP address's access behavior to the target port across all historical time periods (not just the predicted period), combining factors such as whether it appears on a threat intelligence blacklist, the regularity of its access behavior (e.g., whether it follows regular business hours), and whether it has triggered security alerts, to obtain a reputation score using a weighted scoring model. Finally, comparing this reputation score with a preset reputation threshold, and including all source IP addresses with scores greater than the threshold in the set.
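The weighted scoring model is not specified in detail. The sketch below assumes three inputs (blacklist membership, a regularity value in [0, 1], and an alert count) and hypothetical weights of 0.5, 0.3, and 0.2; both the functional form and the numbers are illustrative only.

```python
def reputation_score(blacklisted, regularity, alert_count, weights=(0.5, 0.3, 0.2)):
    """Weighted score in [0, 1]; higher means more trustworthy (assumed form)."""
    w_black, w_reg, w_alert = weights
    return (
        w_black * (0.0 if blacklisted else 1.0)  # penalty for blacklist hits
        + w_reg * regularity                     # reward for regular access patterns
        + w_alert / (1.0 + alert_count)          # decays as alerts accumulate
    )

def trusted_sources(profiles, threshold):
    """Return the set of source IPs whose score exceeds the threshold."""
    return {ip for ip, p in profiles.items() if reputation_score(**p) > threshold}

profiles = {
    "192.0.2.10": dict(blacklisted=False, regularity=0.9, alert_count=0),
    "198.51.100.7": dict(blacklisted=True, regularity=0.2, alert_count=3),
}
trusted = trusted_sources(profiles, threshold=0.6)
```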

[0129] In some embodiments of this application, determining the set of source IP addresses whose historical reputation score is greater than a preset reputation score threshold during the predicted secure access period from historical access log data includes the following specific steps:

[0130] Step 1: Construct a heterogeneous graph of access sessions within the predicted time period.

[0131] Extract all historical access logs that occurred within the same time window as the currently predicted safe access period (e.g., 9 AM to 11 AM on each weekday for the past four weeks). Based on these logs, construct an undirected heterogeneous graph G=(V, E, R).

[0132] Nodes (V): Contains two types of nodes. The first type is the source IP address node. The second type is the target resource node, which specifically refers to the target port, but to enrich the graph structure, it can be expanded to include internal server labels associated with the target port (such as the business application or department).

[0133] Edges (E) and Relationships (R): Edges represent historical access relationships. The main relationship types are: 1) IP-Port Access Edge: Connects a source IP node to a target port node it has visited. Edge attributes may include the number of visits and the total number of bytes. 2) IP-IP Similarity Edge: If two source IP addresses exhibit highly similar behavior patterns over multiple historical periods (e.g., highly overlapping sets of accessed ports, similar access time distribution), an edge is established between them. The edge weight represents the similarity. 3) Port-Port Association Edge: If two ports are frequently accessed by the same group of IP addresses (business association), a connection is established.

[0134] Step 2: Node feature initialization.

[0135] Initialize a feature vector for each source IP node. This vector includes not only its long-term historical statistical characteristics (such as total historical access count and the number of times it has been marked as malicious), but more importantly, its historical behavioral snapshot characteristics for the current prediction period, such as access frequency, TCP flag entropy value, and burstiness of access in the historical records for that period. Characteristics of the target port / resource node can include its business importance level and regular access volume.

[0136] Step 3: Reputation propagation and aggregation based on graph neural networks.

[0137] A multi-layer graph attention network is used to process this heterogeneous graph. In each layer, nodes receive information from their neighboring nodes through the edges they are connected to.

[0138] Message generation: For each edge, generate a message from neighboring nodes to the current node based on its relation type and weight.

[0139] Attention aggregation: The current node assigns different attention weights to messages from different neighbors. The weights depend on the characteristics of the neighboring nodes, the characteristics of the current node, and the attributes and relationship types of the edges between them. For example, an IP node might pay more attention to the reputation information of other IP nodes whose behavior is highly similar to its own.

[0140] Node update: Each node aggregates messages from all its neighbors and updates its hidden state by combining them with the features of its previous layer. After several layers of such propagation, the final hidden state vector of each source IP node contains multi-layered information about its own behavior, its local network neighbors (IPs with similar behavior, the ports it accesses), and the global graph structure.

[0141] Step 4: Dynamic reputation decoding and set generation.

[0142] The final hidden state vector of each source IP node is input into a fully connected decoder network, which outputs a scalar score, representing the dynamic reputation of that IP address within the current predicted secure access context. This score is calibrated not only based on the IP itself but also on graph structure information such as "whether other IPs with similar behavior are trustworthy" and "whether the resources it accesses are high-risk." Finally, the dynamic reputation of all IPs is compared with a preset reputation threshold to select a set of source IP addresses that meet the criteria.

[0143] As can be seen, the embodiments of this application elevate reputation assessment from an isolated model to a context-aware system based on a dynamic relationship graph, which can greatly enhance the security of the rule generation process when facing complex and coordinated attacks.

[0144] S602. Based on the source IP address set and the start and end times of the predicted secure access period, generate a preset rule for temporary port exposure.

[0145] In the embodiments of this application, generating a temporary port exposure preset rule involves combining the selected trusted source IP addresses and the predicted security time window into a specific, time-limited firewall or access control policy rule.

[0146] In some embodiments of this application, generating a preset rule for temporary port exposure specifically involves creating an Access Control List (ACL) rule. The action field of this rule is set to "Allow". The target field is set to the port number and protocol of the target port (e.g., TCP/80). The source address field is set to a set of source IP addresses, which can be expressed as a list of IP addresses or a network segment represented in CIDR (Classless Inter-Domain Routing) format. The time condition field is set to the start and end times, indicating that this rule is only effective within that time period.
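The rule fields described above map naturally onto a small record type. The following is a sketch under stated assumptions: the class `TemporaryExposureRule`, the helper `make_rule`, and the `permits` check are hypothetical names, and a real gateway would translate such a record into its own firewall syntax.

```python
from dataclasses import dataclass
from datetime import datetime
import ipaddress

@dataclass(frozen=True)
class TemporaryExposureRule:
    action: str       # "allow"
    protocol: str     # e.g. "TCP"
    port: int         # target port number
    sources: tuple    # allowed networks (single IPs become /32 or /128)
    start: datetime   # rule becomes effective
    end: datetime     # rule expires (exclusive)

    def permits(self, src_ip, when):
        """True if the rule allows src_ip at time `when`."""
        addr = ipaddress.ip_address(src_ip)
        in_window = self.start <= when < self.end
        return self.action == "allow" and in_window and any(
            addr in net for net in self.sources
        )

def make_rule(port, source_cidrs, start, end, protocol="TCP"):
    nets = tuple(ipaddress.ip_network(c) for c in source_cidrs)
    return TemporaryExposureRule("allow", protocol, port, nets, start, end)

rule = make_rule(80, ["192.0.2.0/24"], datetime(2026, 1, 5, 9), datetime(2026, 1, 5, 11))
```

Keeping the time condition inside the rule means an expired rule stops matching even if it has not yet been removed from the list.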

[0147] As can be seen, the embodiments of this application generate port exposure rules by combining the predicted security period with the historical reputation assessment of the source IP address, thereby achieving dual access control based on time and identity. This enhances flexibility by utilizing dynamic openness while strengthening the security of port access through fine-grained authorization.

[0148] In some embodiments of this application, a pre-defined deep reinforcement learning agent is used to determine on / off control commands for a target port based on traffic feature vectors and pre-defined rules for temporary port exposure, including:

[0149] S701. Based on the traffic feature vector, the preset rules for exposing temporary ports, and the on / off status of the target port at the current moment, a reinforcement learning state vector is generated.

[0150] In the embodiments of this application, the state vector of reinforcement learning is a numerical vector used to comprehensively describe the environmental state in which the deep reinforcement learning agent makes decisions. It integrates real-time traffic characteristics, high-level security policy constraints, and the current operating state of the port itself.

[0151] In some embodiments of this application, generating the state vector includes: First, normalizing the traffic feature vector to fix its numerical range. Second, encoding the preset rules for exposing temporary ports into numerical form. For example, it can be encoded into a three-dimensional vector, representing "whether the current time is within the time window allowed by the rule" (1 if yes, 0 otherwise), "the number of source IP addresses allowed by the rule" (after normalization), and "the time difference between the current time and the end of the rule's allowed window" (after normalization). Then, encoding the on / off status of the target port at the current moment into a scalar (e.g., 1 for open, 0 for closed). Finally, concatenating the normalized traffic feature vector, the rule encoding vector, and the port state scalar along the vector dimension to form a comprehensive state vector.
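The concatenation described above can be sketched as follows, assuming the traffic feature vector has already been normalized. The function name `encode_state` and the normalization constants `max_sources` and `max_horizon_s` are hypothetical.

```python
def encode_state(traffic_features, in_window, n_allowed_sources,
                 seconds_to_window_end, port_open,
                 max_sources=256, max_horizon_s=3600.0):
    """Concatenate traffic features, a 3-element rule encoding, and the port state."""
    def clip01(x):
        return max(0.0, min(1.0, x))

    rule_vec = [
        1.0 if in_window else 0.0,                      # inside the allowed window?
        clip01(n_allowed_sources / max_sources),        # normalized source count
        clip01(seconds_to_window_end / max_horizon_s),  # normalized time remaining
    ]
    port_state = [1.0 if port_open else 0.0]            # 1 = open, 0 = closed
    return list(traffic_features) + rule_vec + port_state

state = encode_state([0.2, 0.7], in_window=True, n_allowed_sources=64,
                     seconds_to_window_end=1800, port_open=False)
```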

[0152] S702. Define the action space for reinforcement learning, which includes multiple candidate on / off operations for the target port.

[0153] In the embodiments of this application, the action space is the set of all possible actions that a deep reinforcement learning agent can choose to execute in a given state. Candidate on / off operations are each specific, executable port management action in the action space.

[0154] In some embodiments of this application, the action space is defined as a discrete set of actions, including: {maintain current state, switch to open, switch to closed}. Here, "maintain current state" means not changing the existing on / off state of the port; "switch to open" means opening the port if it is currently closed; and "switch to closed" means closing the port if it is currently open.

[0155] S703. Input the state vector into the policy network of the deep reinforcement learning agent to obtain the action probability distribution for multiple candidate on / off operations in the action space.

[0156] In the embodiments of this application, the policy network of the deep reinforcement learning agent is an artificial neural network whose function is to map the input state vector to a probability distribution on the action space. The action probability distribution is a probability vector, where each element corresponds to a candidate on / off operation, and its value represents the probability of selecting that operation in a given state.

[0157] In some embodiments of this application, the policy network is a multilayer perceptron (MLP). The state vector is taken as input, processed through several fully connected layers and nonlinear activation functions, and finally passed through a softmax output layer to produce a probability vector with the same size as the action space. The sum of all elements of this vector is 1.
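A forward pass through such a policy network can be sketched in plain Python. The layer sizes, the ReLU activation, and the random initialization below are assumptions for illustration; a trained agent would load learned weights instead.

```python
import math
import random

def softmax(logits):
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def policy_forward(state, params):
    hidden = [max(0.0, h) for h in dense(state, params["w1"], params["b1"])]  # ReLU
    return softmax(dense(hidden, params["w2"], params["b2"]))

def init_params(state_dim, hidden_dim, n_actions, seed=0):
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"w1": mat(hidden_dim, state_dim), "b1": [0.0] * hidden_dim,
            "w2": mat(n_actions, hidden_dim), "b2": [0.0] * n_actions}

params = init_params(state_dim=6, hidden_dim=8, n_actions=3)
probs = policy_forward([0.2, 0.7, 1.0, 0.25, 0.5, 0.0], params)
```

The output has one probability per candidate operation, and the softmax guarantees the entries are positive and sum to 1.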

[0158] S704. Based on the action probability distribution, determine the target on / off operation from multiple candidate on / off operations.

[0159] In the embodiments of this application, the target on / off operation is the specific operation to be executed that is ultimately selected according to a specific strategy based on the action probability distribution.

[0160] In some embodiments of this application, a greedy strategy is used to determine the target on / off operation, that is, the candidate on / off operation with the highest probability value in the action probability distribution is directly selected as the target on / off operation. This method is simple and efficient, always selecting the action that the model considers optimal in the current state.

[0161] S705. Generate on / off control commands based on the target on / off operation.

[0162] In the embodiments of this application, the on / off control command is a specific operation command that can be executed by the gateway firewall, and its content is directly derived from the target on / off operation.

[0163] In some embodiments of this application, the generation of on / off control commands is a direct mapping process. If the target on / off operation is "switch to open", a command with the content "open target port" is generated. If the target on / off operation is "switch to closed", a command with the content "close target port" is generated. If the target on / off operation is "maintain current state", a "no operation" command or an empty command is generated.
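Steps S704 and S705 can be sketched together: a greedy pick over the action probabilities, followed by a direct mapping to a command. The action labels and the command strings are hypothetical; the `currently_open` guard reflects the definitions of "switch to open" and "switch to closed" given for the action space.

```python
ACTIONS = ("maintain", "open", "close")  # order must match the policy output

def greedy_action(probs):
    """Pick the candidate operation with the highest probability."""
    best = max(range(len(ACTIONS)), key=lambda i: probs[i])
    return ACTIONS[best]

def to_command(action, port, currently_open):
    """Map the target operation to a gateway command string."""
    if action == "open" and not currently_open:
        return f"open {port}"
    if action == "close" and currently_open:
        return f"close {port}"
    return "noop"  # maintain current state, or the port is already in that state

command = to_command(greedy_action([0.1, 0.7, 0.2]), port=443, currently_open=False)
```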

[0164] As can be seen, the embodiments of this application construct a state vector that integrates real-time traffic, preset rules and port status, and make decisions using a policy network of a deep reinforcement learning agent. This not only improves the real-time performance and accuracy of the response, but also enhances the ability to explore and adapt to unknown threats through probabilistic decision-making.

[0165] In some embodiments of this application, determining the traffic feature vector of real-time traffic data packets at the target port includes:

[0166] S801. Obtain the header information of real-time traffic data packets.

[0167] In the embodiments of this application, the header information of real-time traffic data packets refers to the field content contained in the protocol header of the network data packets during the transmission process. These fields define information such as the source, destination, protocol type, service type, and control status of the data packets.

[0168] In some embodiments of this application, obtaining header information specifically includes: performing mirroring capture or deep packet inspection (DPI) on real-time data packets flowing through the target port. This involves parsing the Ethernet frame header, IP header, and transport layer protocol (such as TCP or UDP) header of the data packets. Key fields extracted include, but are not limited to: source IP address, destination IP address, source port number, destination port number, protocol type, IP Time To Live (TTL), TCP flags, and UDP length.

[0169] S802. Determine the flow statistics characteristics within the preset sliding window before the current time.

[0170] In the embodiments of this application, the preset sliding window is a continuous time period of fixed length that extends in the historical direction with the current time as the endpoint. The traffic statistics feature is a numerical indicator that reflects the overall macroscopic behavior of traffic, calculated by aggregating all data packets flowing through the target port within the preset sliding window.

[0171] In some embodiments of this application, determining traffic statistics features includes: setting the length of a preset sliding window, such as the last 10 seconds or the last 1000 data packets. The following features are statistically analyzed within this window: total number of data packets, total number of bytes, average data packet size, data packet size variance, average data packet arrival interval, variance of data packet arrival interval, number of distinct source IP addresses (or source IP entropy), number of distinct destination port numbers (only for gateway scenarios with multiple destination ports), ratio of TCP SYN packets to ACK packets, number of TCP RST packets, etc.
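A time-based sliding window over packet records can be sketched with a deque. The class name `SlidingWindow`, the record layout, and the choice of population variance are assumptions, and the sketch expects packets to arrive in non-decreasing timestamp order.

```python
from collections import deque
from statistics import mean, pvariance

class SlidingWindow:
    def __init__(self, window_s=10.0):
        self.window_s = window_s
        self.packets = deque()  # (timestamp, size_bytes, src_ip, flags)

    def add(self, ts, size, src_ip, flags=()):
        self.packets.append((ts, size, src_ip, frozenset(flags)))
        # Evict packets older than the window; the newest packet always stays.
        while self.packets[0][0] < ts - self.window_s:
            self.packets.popleft()

    def features(self):
        sizes = [p[1] for p in self.packets]
        times = [p[0] for p in self.packets]
        gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival intervals
        syn = sum(1 for p in self.packets if "SYN" in p[3])
        ack = sum(1 for p in self.packets if "ACK" in p[3])
        return {
            "packet_count": len(sizes),
            "byte_count": sum(sizes),
            "mean_size": mean(sizes) if sizes else 0.0,
            "size_variance": pvariance(sizes) if sizes else 0.0,
            "mean_gap": mean(gaps) if gaps else 0.0,
            "gap_variance": pvariance(gaps) if gaps else 0.0,
            "distinct_sources": len({p[2] for p in self.packets}),
            "syn_to_ack_ratio": syn / ack if ack else float(syn),
            "rst_count": sum(1 for p in self.packets if "RST" in p[3]),
        }

w = SlidingWindow(window_s=10.0)
w.add(0, 100, "192.0.2.1", {"SYN"})
w.add(5, 200, "192.0.2.2", {"SYN"})
w.add(12, 300, "192.0.2.2", {"ACK"})  # evicts the packet at ts=0
stats = w.features()
```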

[0172] In some embodiments of this application, the traffic statistics features also include anomaly indices calculated based on historical baselines. First, historical traffic characteristic baselines (mean and standard deviation) for the target port are established at different time periods (e.g., hourly). Then, the deviation of each statistical characteristic value (e.g., packet rate) within the current preset sliding window from its corresponding historical baseline value is calculated, for example, using a Z-score as an anomaly index. These dynamic anomaly indices themselves constitute an important set of traffic statistics features.
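The Z-score anomaly index can be sketched as below. The function name `anomaly_index`, the per-hour baseline dictionary, and the sample values are hypothetical; returning 0 for a zero-variance baseline is a convention chosen here to avoid division by zero.

```python
from statistics import mean, pstdev

def anomaly_index(current, baseline_samples):
    """Z-score of the current value against historical samples for the same period."""
    mu = mean(baseline_samples)
    sigma = pstdev(baseline_samples)
    if sigma == 0:
        return 0.0  # flat baseline: no meaningful deviation scale
    return (current - mu) / sigma

# Hypothetical hourly baselines of packet rate, keyed by hour of day.
hourly_baseline = {9: [98.0, 102.0], 3: [4.0, 6.0]}
z_morning = anomaly_index(110.0, hourly_baseline[9])  # mean 100, std 2
```

The same packet rate can be normal at 09:00 and highly anomalous at 03:00, which is why the baseline is keyed by time period.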

[0173] S803. Generate a traffic feature vector based on header information and traffic statistics features.

[0174] In the embodiments of this application, generating a traffic feature vector involves combining and encoding information extracted from the header of a single data packet with statistical features calculated from aggregated traffic to ultimately form a unified, fixed-length numerical vector.

[0175] In some embodiments of this application, the step of generating a traffic feature vector includes the following sub-steps: First, the discrete information extracted from the header of the real-time data packet (such as protocol type and TCP flag combination) is converted into a numerical vector through one-hot encoding or embedding. Second, the calculated traffic statistical features (all continuous values) are normalized. Finally, the encoded header information vector and the normalized traffic statistical feature vector are concatenated in terms of dimension to form a comprehensive traffic feature vector.
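The three sub-steps (one-hot encoding, normalization, concatenation) can be sketched as follows. The vocabularies `PROTOCOLS` and `FLAG_COMBOS`, the min-max bounds, and the function names are assumptions; min-max scaling is one of several normalization choices the text leaves open.

```python
PROTOCOLS = ("TCP", "UDP", "ICMP")
FLAG_COMBOS = ("SYN", "SYN+ACK", "ACK", "FIN", "RST", "OTHER")

def one_hot(value, vocabulary):
    return [1.0 if value == item else 0.0 for item in vocabulary]

def min_max(value, low, high):
    """Scale value into [0, 1], clipping values outside the bounds."""
    if high <= low:
        return 0.0
    return min(1.0, max(0.0, (value - low) / (high - low)))

def traffic_feature_vector(protocol, flag_combo, stats, bounds):
    if flag_combo not in FLAG_COMBOS:
        flag_combo = "OTHER"              # bucket for unseen flag combinations
    header_vec = one_hot(protocol, PROTOCOLS) + one_hot(flag_combo, FLAG_COMBOS)
    # Iterate in sorted key order so the vector layout is stable across calls.
    stats_vec = [min_max(stats[name], *bounds[name]) for name in sorted(bounds)]
    return header_vec + stats_vec

vec = traffic_feature_vector(
    "TCP", "SYN",
    stats={"packet_count": 50, "mean_size": 750},
    bounds={"packet_count": (0, 100), "mean_size": (0, 1500)},
)
```

Fixing the vocabularies and the key order in advance is what keeps the vector at a fixed length, as the downstream network requires.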

[0176] As can be seen, the embodiments of this application construct a multi-level, multi-granularity traffic feature vector by combining the detailed information in the header of real-time data packets with the macro-level traffic statistics within a preset sliding window. This provides rich, accurate, and timely environmental state inputs for subsequent intelligent decision-making, thereby improving the accuracy and context awareness of port on / off control decisions.

[0177] In some embodiments of this application, the step of generating a traffic feature vector based on header information and traffic statistical features includes the following sub-steps:

[0178] S901. Construct a local interaction topology diagram with the target port as the core.

[0179] In the embodiments of this application, the target port is mapped as the central node of the topology graph, and all source IP addresses extracted from the header information that have interacted with the target port within a preset sliding window are mapped as neighbor nodes. The communication relationship between the target port and the source IP addresses is mapped as the edges of the graph, and the traffic statistics characteristics for that specific source IP (such as the average packet size and access frequency of that IP) are assigned as the feature attributes of that edge.

[0180] S902. Process the above topological graph using a pre-trained Graph Attention Network (GAT).

[0181] In the embodiments of this application, the graph attention network includes multiple graph convolutional layers and attention mechanism layers. Through the message passing mechanism, the model calculates attention coefficients based on the feature attributes of the edges and aggregates the feature information of all neighboring nodes into the central node (target port) as an attention-weighted sum.

[0182] S903. Extract the node embedding vector corresponding to the center node from the output layer of the graph attention network and use it as the final traffic feature vector.

[0183] In the embodiments of this application, the traffic feature vector not only integrates the attributes of a single data packet, but also implicitly encodes the group distribution topology characteristics of all current access sources through a graph structure (e.g., whether it is a single-point high-frequency attack or a multi-point coordinated botnet attack).
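Steps S901 to S903 can be illustrated with a heavily simplified, single-head aggregation over the star-shaped topology. This is not a full graph attention network: there are no learned projection matrices or stacked layers, and the attention vector `attn_vec` stands in for trained parameters. All names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_center(neighbor_feats, edge_feats, attn_vec, slope=0.2):
    """Attention-weighted sum of neighbor features onto the target-port node.

    neighbor_feats: one feature list per source IP node
    edge_feats:     one feature list per (source IP -> port) edge
    attn_vec:       stand-in for learned attention parameters
    """
    raw = [sum(a * e for a, e in zip(attn_vec, ef)) for ef in edge_feats]
    raw = [s if s > 0 else slope * s for s in raw]    # LeakyReLU
    alphas = softmax(raw)                             # attention coefficients
    dim = len(neighbor_feats[0])
    embedding = [sum(al * nf[d] for al, nf in zip(alphas, neighbor_feats))
                 for d in range(dim)]
    return embedding, alphas

emb, alphas = aggregate_center(
    neighbor_feats=[[1.0, 0.0], [0.0, 1.0]],
    edge_feats=[[1.0], [1.0]],      # identical edges, so equal attention
    attn_vec=[0.5],
)
```

With identical edge attributes the center embedding is the plain mean of the neighbors; a source IP with more salient edge attributes would receive a larger coefficient and dominate the embedding.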

[0184] As can be seen, the embodiments of this application construct a network interaction topology graph and introduce a graph attention network to transform traditional flat traffic statistics into graph structured feature extraction. The generated traffic feature vector deeply aggregates the collective behavior characteristics of attackers, thereby improving the identification accuracy and generalization ability against collaborative attacks such as Distributed Denial of Service (DDoS).

[0185] Secondly, embodiments of this application provide an electronic device that integrates any of the gateway port on / off control systems provided in the embodiments of this application. The electronic device includes a processor and a memory, the memory storing a computer program configured to be executed by the processor to implement the gateway port on / off control method as described in any of the above embodiments.

[0186] Thirdly, embodiments of this application provide a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, or the like. The computer-readable storage medium stores a computer program configured to be executed by a processor to implement the gateway port on / off control method as described in any of the above embodiments.

[0187] The embodiments of this application have been described in detail above. Specific examples have been used to illustrate the principles and implementations of this application, and the description of the above embodiments is intended only to help in understanding the method and core ideas of this application. Those skilled in the art may, based on the ideas of this application, make changes to the specific implementations and the scope of application. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. A method for controlling the on / off state of a gateway port, characterized in that the gateway port on / off control method is applied to a gateway and includes: obtaining historical access log data and threat intelligence data of a target port in the gateway; performing time-series analysis using a pre-trained long short-term memory (LSTM) network model, based on the historical access log data and the threat intelligence data, to obtain a predicted secure access period of the target port within a preset future time period; generating a preset rule for temporary port exposure of the target port based on the predicted secure access period; determining a traffic feature vector of real-time traffic data packets of the target port; determining, using a preset deep reinforcement learning agent, an on / off control command for the target port based on the traffic feature vector and the preset rule for temporary port exposure; and executing the on / off control command to modify the access control list status of the gateway firewall for the target port; wherein the step of performing time-series analysis using the pre-trained LSTM network model, based on the historical access log data and the threat intelligence data, to obtain the predicted secure access period of the target port within the preset future time period includes: determining a historical access frequency sequence at a preset time granularity based on the historical access log data; determining a risk coefficient value for each preset time granularity in the historical access frequency sequence based on malicious IP information recorded in the threat intelligence data; correcting the historical access frequency sequence based on the risk coefficient values to obtain a secure access frequency sequence; and performing time-series analysis using the pre-trained LSTM network model based on the secure access frequency sequence to obtain the predicted secure access period of the target port within the preset future time period.
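
As a non-limiting illustration of the correction step recited in claim 1, the following sketch buckets access log entries by time granularity, derives a risk coefficient per bucket, and corrects the frequency sequence. The claim does not fix the formulas. Taking the risk coefficient as the fraction of accesses from malicious IPs, and the correction as frequency × (1 − risk), are assumptions of this sketch, as is the function name `secure_frequency_sequence`.

```python
def secure_frequency_sequence(log_entries, malicious_ips, start_ts,
                              granularity_s, n_buckets):
    """Return (frequency, risk, secure_frequency), one value per time bucket.

    log_entries:   iterable of (timestamp_seconds, src_ip) pairs.
    malicious_ips: set of IPs listed in the threat intelligence data.
    """
    total = [0] * n_buckets
    bad = [0] * n_buckets
    for ts, src_ip in log_entries:
        idx = int((ts - start_ts) // granularity_s)
        if 0 <= idx < n_buckets:
            total[idx] += 1
            if src_ip in malicious_ips:
                bad[idx] += 1

    # Assumed risk coefficient: share of accesses from known-malicious IPs.
    risk = [b / t if t else 0.0 for b, t in zip(bad, total)]
    # Assumed correction: discount each bucket's frequency by its risk.
    secure = [t * (1.0 - r) for t, r in zip(total, risk)]
    return total, risk, secure


logs = [(0, "192.0.2.1"), (10, "198.51.100.7"), (70, "192.0.2.1"),
        (80, "192.0.2.9"), (130, "198.51.100.7")]
freq, risk, secure = secure_frequency_sequence(
    logs, malicious_ips={"198.51.100.7"}, start_ts=0,
    granularity_s=60, n_buckets=3)
```

Here the third bucket contains only malicious traffic, so its corrected frequency drops to zero and a downstream predictor would not treat it as legitimate demand.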

2. The gateway port on / off control method as described in claim 1, characterized in that the step of performing time-series analysis using the pre-trained long short-term memory network model based on the secure access frequency sequence to obtain the predicted secure access period of the target port within the preset future time period includes: determining TCP control flag distribution data in the historical access log data within each preset time granularity; generating, based on the TCP control flag distribution data, a historical TCP flag entropy value sequence with the preset time granularity as the unit; and performing time-series analysis using the pre-trained long short-term memory network model, based on the historical TCP flag entropy value sequence and the secure access frequency sequence, to obtain the predicted secure access period of the target port within the preset future time period.

3. The gateway port on / off control method as described in claim 2, characterized in that the step of generating the historical TCP flag entropy value sequence with the preset time granularity as the unit based on the TCP control flag distribution data includes: determining a flag set for the TCP protocol, wherein the flag set includes at least one of the following: a synchronization (SYN) flag, an acknowledgment (ACK) flag, a finish (FIN) flag, and a reset (RST) flag; for each preset time granularity, determining the occurrence probability of each flag in the flag set within the current preset time granularity; determining, based on the occurrence probabilities, the TCP flag entropy value at that preset time granularity; and generating, based on the TCP flag entropy values of multiple preset time granularities, the historical TCP flag entropy value sequence with the preset time granularity as the unit.
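
As a non-limiting illustration of claim 3, the following sketch computes a flag entropy value per time granularity. It assumes the entropy is the Shannon entropy in bits, H = −Σ p·log2(p), taken over the flags observed in the bucket. The claim states only that the entropy is determined from the occurrence probabilities, so this formula and the names `flag_entropy` and `entropy_sequence` are assumptions.

```python
import math
from collections import Counter

FLAG_SET = ("SYN", "ACK", "FIN", "RST")


def flag_entropy(observed_flags):
    """Shannon entropy (bits) of the TCP flags seen in one time bucket."""
    counts = Counter(f for f in observed_flags if f in FLAG_SET)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # empty bucket: defined here as zero entropy
    entropy = 0.0
    for c in counts.values():
        p = c / total  # occurrence probability of this flag in the bucket
        entropy -= p * math.log2(p)
    return entropy


def entropy_sequence(buckets):
    """One entropy value per preset time granularity, in time order."""
    return [flag_entropy(b) for b in buckets]


history = entropy_sequence([
    ["SYN", "ACK", "FIN", "RST"],   # evenly mixed flags
    ["SYN", "SYN", "SYN", "SYN"],   # SYN only, as in a SYN flood
    ["SYN", "SYN", "ACK", "ACK"],
])
```

A bucket dominated by a single flag yields an entropy near zero, which is the kind of signal that can distinguish a SYN flood from ordinary mixed traffic.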

4. The gateway port on / off control method as described in claim 2, characterized in that the step of performing time-series analysis using the pre-trained long short-term memory network model, based on the historical TCP flag entropy value sequence and the secure access frequency sequence, to obtain the predicted secure access period of the target port within the preset future time period includes: using the historical TCP flag entropy value sequence as a first-dimension feature and the secure access frequency sequence as a second-dimension feature; generating a multi-dimensional time-series feature matrix based on the first-dimension feature and the second-dimension feature; and performing time-series analysis using the pre-trained long short-term memory network model based on the multi-dimensional time-series feature matrix to obtain the predicted secure access period of the target port within the preset future time period.
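
As a non-limiting illustration of claim 4, the following sketch stacks the two sequences into a matrix with one row per time granularity and two feature columns, then cuts it into fixed-length windows of the kind a sequence model usually consumes. The windowing step and the names `build_feature_matrix` and `sliding_windows` are assumptions of this sketch and are not recited in the claim.

```python
def build_feature_matrix(entropy_seq, secure_freq_seq):
    """Rows are time steps; column 0 is flag entropy, column 1 is secure frequency."""
    if len(entropy_seq) != len(secure_freq_seq):
        raise ValueError("both sequences must cover the same time steps")
    return [[h, f] for h, f in zip(entropy_seq, secure_freq_seq)]


def sliding_windows(matrix, window_len):
    """Overlapping windows of window_len consecutive rows (hypothetical model input)."""
    return [matrix[i:i + window_len]
            for i in range(len(matrix) - window_len + 1)]


matrix = build_feature_matrix([1.0, 2.0, 0.5], [10.0, 20.0, 5.0])
windows = sliding_windows(matrix, window_len=2)
```

Each window has shape (window length, 2), which matches the (time steps, features) layout that LSTM implementations typically expect for a single sample.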

5. The gateway port on / off control method as described in claim 1, characterized in that the step of generating the preset rule for temporary port exposure of the target port based on the predicted secure access period includes: determining, from the historical access log data, the set of source IP addresses whose historical reputation score during the predicted secure access period is greater than a preset reputation threshold; and generating the preset rule for temporary port exposure based on the set of source IP addresses and the start and end times of the predicted secure access period.
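
As a non-limiting illustration of claim 5, the following sketch turns a predicted secure access period and per-IP reputation scores into a rule object. The `ExposureRule` structure, the helper `rule_permits`, and the way reputation scores are supplied (a precomputed mapping) are hypothetical, since the claim does not define a rule format or a scoring method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureRule:
    port: int
    allowed_ips: tuple   # source IPs admitted while the rule is active
    start_ts: float      # start of the predicted secure access period
    end_ts: float        # end of the predicted secure access period


def build_exposure_rule(port, reputation_by_ip, threshold, period):
    """Keep only source IPs whose reputation score exceeds the threshold."""
    start_ts, end_ts = period
    allowed = tuple(sorted(
        ip for ip, score in reputation_by_ip.items() if score > threshold))
    return ExposureRule(port, allowed, start_ts, end_ts)


def rule_permits(rule, src_ip, ts):
    """True if src_ip may reach the port at time ts under this rule."""
    return src_ip in rule.allowed_ips and rule.start_ts <= ts <= rule.end_ts


rule = build_exposure_rule(
    port=22,
    reputation_by_ip={"192.0.2.1": 0.9, "192.0.2.2": 0.4, "192.0.2.3": 0.7},
    threshold=0.6,
    period=(1000.0, 2000.0),
)
```

The rule combines an allow-list with a time window, so a trusted address is still refused once the predicted period has ended.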

6. The gateway port on / off control method as described in claim 1, characterized in that the step of determining, using the preset deep reinforcement learning agent, the on / off control command for the target port based on the traffic feature vector and the preset rule for temporary port exposure includes: generating a reinforcement learning state vector based on the traffic feature vector, the preset rule for temporary port exposure, and the on / off status of the target port at the current moment; defining an action space for reinforcement learning, the action space including multiple candidate on / off operations for the target port; inputting the state vector into the policy network of the deep reinforcement learning agent to obtain an action probability distribution over the multiple candidate on / off operations in the action space; determining a target on / off operation from the multiple candidate on / off operations based on the action probability distribution; and generating the on / off control command based on the target on / off operation.
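
As a non-limiting illustration of claim 6, the following sketch builds a state vector, passes it through a stand-in policy network, and selects an operation. Everything concrete here is an assumption: the three-action space, the single linear layer with hand-set weights in place of a trained policy network, the encoding of the rule as a single active/inactive bit, and greedy selection of the most probable action (sampling from the distribution would be an equally valid reading of the claim).

```python
import math

ACTIONS = ("open", "close", "keep")  # hypothetical candidate on/off operations


def build_state(traffic_vec, rule_active, port_open):
    """Concatenate traffic features, rule status, and current port status."""
    return list(traffic_vec) + [1.0 if rule_active else 0.0,
                                1.0 if port_open else 0.0]


def policy_distribution(state, weights, biases):
    """Stand-in policy network: one linear layer followed by softmax."""
    logits = [sum(w * s for w, s in zip(row, state)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(probs):
    """Greedy choice: the candidate operation with the highest probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return ACTIONS[best]


state = build_state([0.2], rule_active=True, port_open=False)
# Hand-set weights: an active exposure rule pushes toward "open".
weights = [[0.0, 2.0, 0.0],    # open
           [0.0, -2.0, 0.0],   # close
           [0.0, 0.0, 0.0]]    # keep
probs = policy_distribution(state, weights, biases=[0.0, 0.0, 0.0])
command = select_action(probs)
```

The selected operation would then be translated into the on / off control command that updates the firewall access control list.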

7. The gateway port on / off control method as described in claim 1, characterized in that the step of determining the traffic feature vector of the real-time traffic data packets of the target port includes: obtaining header information of the real-time traffic data packet; determining traffic statistics features within a preset sliding window prior to the current time; and generating the traffic feature vector based on the header information and the traffic statistics features.
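
As a non-limiting illustration of claim 7, the following sketch concatenates a few header fields of the current packet with statistics computed over the preceding sliding window. The selected header fields (`src_port`, `length`, `syn`), the selected statistics, and the function name `traffic_feature_vector` are assumptions, since the claim does not enumerate the features.

```python
def traffic_feature_vector(header, recent_packets, now, window_s):
    """Combine current-packet header fields with sliding-window statistics.

    header:         dict with hypothetical keys 'src_port', 'length', 'syn'.
    recent_packets: iterable of dicts with keys 'ts', 'src_ip', 'size'.
    The window covers [now - window_s, now), i.e. strictly before `now`.
    """
    in_window = [p for p in recent_packets if now - window_s <= p["ts"] < now]
    count = len(in_window)
    avg_size = sum(p["size"] for p in in_window) / count if count else 0.0
    unique_sources = len({p["src_ip"] for p in in_window})
    return [
        float(header["src_port"]),
        float(header["length"]),
        1.0 if header["syn"] else 0.0,
        count / window_s,        # packet rate within the window
        avg_size,                # mean packet size within the window
        float(unique_sources),   # number of distinct source IPs
    ]


vector = traffic_feature_vector(
    header={"src_port": 51234, "length": 60, "syn": True},
    recent_packets=[
        {"ts": 95.0, "src_ip": "192.0.2.1", "size": 100},
        {"ts": 98.0, "src_ip": "192.0.2.2", "size": 300},
        {"ts": 10.0, "src_ip": "192.0.2.3", "size": 999},  # outside the window
    ],
    now=100.0,
    window_s=10.0,
)
```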

8. An electronic device, characterized in that the electronic device includes a processor and a memory, the memory storing a computer program configured to be executed by the processor to implement the gateway port on / off control method according to any one of claims 1 to 7.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program configured to be executed by a processor to implement the gateway port on / off control method according to any one of claims 1 to 7.