A method, apparatus, equipment and storage medium for producing threat intelligence

By extracting features from raw threat information and using neural network models to produce threat intelligence, the problem of inaccurate threat intelligence production in existing technologies has been solved, achieving efficient and accurate threat intelligence generation and application.

CN116886440B · Active · Publication Date: 2026-06-30 · HANGZHOU DBAPPSECURITY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HANGZHOU DBAPPSECURITY CO LTD
Filing Date: 2023-08-28
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Current threat intelligence production methods rely on processed data, and their accuracy is affected by false alarms from security device policies. Furthermore, analysts face heavy workloads and lack the ability to produce threat intelligence from raw data.

Method used

The method acquires raw threat information, aggregates it and extracts features, uses a trained neural network model to determine the threat type, and combines alarm logs with a policy mechanism to produce threat intelligence. The model adopts a Transformer architecture with an XGBoost classifier and an added Batch Normalization layer.

Benefits of technology

It improves the accuracy of threat intelligence production, enabling direct application to security devices, reducing processing difficulty, and achieving a complete process from raw data to application.

Abstract

This invention discloses a method, apparatus, device, and storage medium for producing threat intelligence, applied in the field of network security. The method includes: acquiring raw threat information corresponding to IPs without threat intelligence within a preset time period; aggregating the raw threat information based on the IPs to obtain aggregated raw threat information; extracting features from the aggregated raw threat information to obtain feature vectors; inputting the feature vectors into a trained neural network model; if the IPs exhibit threatening behavior, determining the threat type of the IPs based on the trained neural network model, and producing corresponding threat intelligence based on the threat type. Compared to existing technologies that produce intelligence based on processed data analysis, this method directly utilizes machine learning technology to analyze raw threat information, ensuring the accuracy of threat intelligence production. Furthermore, the threat intelligence produced by this method can be directly applied to security devices, facilitating subsequent security monitoring.

Description

Technical Field

[0001] This invention relates to the field of cybersecurity, and in particular to a method, apparatus, device, and storage medium for producing threat intelligence.

Background Technology

[0002] In the field of cybersecurity, threat intelligence refers to information about potential attacks an organization may face and how to detect and prevent these attacks. In recent years, threat intelligence technology has developed rapidly, becoming an indispensable part of enterprise security construction. Enterprises strengthen their security capabilities by purchasing or subscribing to threat intelligence services, producing their own threat intelligence, or integrating threat intelligence products.

[0003] Currently, threat intelligence production methods typically include: (1) generating threat intelligence by analyzing malicious samples and network behavior; (2) generating threat intelligence by capturing malicious traffic through security devices; and (3) extracting threat intelligence from a large number of alert logs. All of these methods share a fundamental limitation: they build on existing threat intelligence, that is, on data that has already been manually processed rather than on raw data. Furthermore, the accuracy of threat intelligence production depends heavily on the accuracy of security device policies. If the security device's alert logs contain a large number of false alarms, the accuracy of threat intelligence production is greatly reduced, and the large volume of alert logs also increases the workload of analysts.

Summary of the Invention

[0004] In view of this, the purpose of the present invention is to provide a method, apparatus, equipment and storage medium for producing threat intelligence, which solves the problem of inaccurate threat intelligence production in the prior art.

[0005] To address the aforementioned technical problems, this invention provides a threat intelligence production method, comprising:

[0006] Obtain raw threat information within a preset time period, wherein the raw threat information is the raw threat information corresponding to IPs for which there is no threat intelligence;

[0007] The original threat information is aggregated based on the IP address to obtain the aggregated original threat information;

[0008] Feature extraction is performed on the aggregated original threat information to obtain a feature vector;

[0009] The feature vector is input into a trained neural network model. If the IP exhibits threatening behavior, the threat type of the IP is determined based on the trained neural network model, and threat intelligence corresponding to the IP is generated based on the threat type.

[0010] Optionally, the step of obtaining raw threat information within a preset time period, wherein the raw threat information is the raw threat information corresponding to IPs without threat intelligence, includes:

[0011] Obtain the traffic logs collected by the traffic probe within the preset time period;

[0012] The traffic logs are filtered based on whether the IP has threat intelligence to obtain the traffic logs of the IP, and the traffic logs are used as the original threat information.

[0013] Optionally, after filtering the traffic logs based on whether the IP has threat intelligence to obtain the traffic logs for that IP, the method further includes:

[0014] The traffic logs are standardized to obtain standardized traffic logs, and the standardized traffic logs are stored in a standardized database.

[0015] Accordingly, the use of the traffic logs as the original threat information includes:

[0016] The standardized traffic logs are then used as the original threat information.

[0017] Optionally, the step of extracting features from the aggregated original threat information to obtain a feature vector includes:

[0018] The feature vectors of numeric data in the aggregated original threat information are extracted using standard normalization, maximum-minimum normalization, calculation of extreme values, and calculation of statistical values;

[0019] The enumerated data in the aggregated original threat information is transformed using a one-hot encoding method in order to extract the feature vector of the enumerated data;

[0020] The map-type key-value pairs in the aggregated original threat information are split into individual fields to extract the feature vectors of the map-type data.

[0021] Optionally, the trained neural network model adopts the Transformer architecture, uses XGBoost as the classifier, and adds a Batch Normalization layer after the input layer.

[0022] Optionally, after determining the threat type of the IP based on the trained neural network model if the IP exhibits threatening behavior, the method further includes:

[0023] Obtain the alarm logs corresponding to the IP within the preset time period;

[0024] Obtain the preset strategy mechanism;

[0025] The alarm logs and threat types are correlated and analyzed, and the final threat type of the IP is determined according to the preset policy mechanism;

[0026] Accordingly, generating threat intelligence corresponding to the IP based on the threat type includes:

[0027] Based on the final threat type, generate threat intelligence corresponding to the IP.

[0028] Optionally, after generating the threat intelligence corresponding to the IP based on the final threat type, the method further includes:

[0029] The IP address is stored in a standardized format.

[0030] The threat intelligence is packaged and uploaded to the cloud to work with security devices to automatically generate the threat intelligence.

[0031] The present invention also provides a threat intelligence production apparatus, comprising:

[0032] The original threat information acquisition module is used to acquire original threat information within a preset time period, wherein the original threat information is the original threat information corresponding to IPs that have no threat intelligence.

[0033] The aggregation module is used to aggregate the original threat information based on the IP address to obtain aggregated original threat information;

[0034] The feature extraction module is used to extract features from the aggregated raw threat information to obtain a feature vector;

[0035] The threat intelligence production module is used to input the feature vector into a trained neural network model. If the IP has threatening behavior, the module determines the threat type of the IP based on the trained neural network model and produces threat intelligence corresponding to the IP based on the threat type.

[0036] The present invention also provides a threat intelligence production device, comprising:

[0037] Memory, used to store computer programs;

[0038] A processor is used to implement the steps of the threat intelligence production method described above when executing the computer program.

[0039] The present invention also provides a storage medium storing a computer program, which, when executed by a processor, implements the steps of the threat intelligence production method described above.

[0040] As can be seen, this invention acquires raw threat information within a preset time period, specifically the raw threat information corresponding to IPs without threat intelligence; aggregates the raw threat information based on the IPs to obtain aggregated raw threat information; extracts features from the aggregated raw threat information to obtain feature vectors; inputs the feature vectors into a trained neural network model; if the IP exhibits threatening behavior, the trained neural network model determines the threat type of the IP, and generates corresponding threat intelligence based on the threat type. Compared to existing technologies that generate intelligence based on processed data analysis, this method directly utilizes machine learning technology to analyze raw threat information, ensuring the accuracy of threat intelligence production. Furthermore, the threat intelligence generated by this application can be directly applied to security devices, which is beneficial for subsequent security monitoring.

[0041] In addition, the present invention also provides a threat intelligence production apparatus, equipment and storage medium, which also have the above-mentioned beneficial effects.

Description of the Drawings

[0042] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0043] Figure 1 is a flowchart of a threat intelligence production method provided in an embodiment of the present invention;

[0044] Figure 2 is a schematic diagram of a threat intelligence production apparatus provided in an embodiment of the present invention;

[0045] Figure 3 is a schematic diagram of a threat intelligence production device provided in an embodiment of the present invention.

Detailed Implementation

[0046] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0047] Threat intelligence, also known as cyber threat intelligence (CTI), is data that details cybersecurity threats against an organization. Threat intelligence helps security teams proactively take effective, data-driven measures to neutralize cyberattacks before they occur. It also helps organizations detect and respond to ongoing attacks more effectively.

[0048] Security analysts create threat intelligence by collecting raw security threat information and security-related information from multiple sources, then correlating and analyzing this data to discover trends, patterns, and relationships, gaining deeper insights into actual or potential threats.

[0049] Threat data is typically produced through multiple channels, and security teams can collect any raw threat data that helps them build more comprehensive enterprise security capabilities. For example, a security team investigating new ransomware needs to collect the relevant malware samples, information about the ransomware groups that may be associated with these samples, and the alert logs and traffic logs generated by various security devices. 1) Security teams can typically subscribe to multiple open-source or commercial threat intelligence data sources. Different sources may have different focuses, which helps build comprehensive intelligence capabilities; 2) Information can be obtained through information-sharing communities. In some professional forums and social platforms, peers often share first-hand information; 3) Enterprises typically deploy many security devices, which generate a large number of alert logs and access logs every day. This data records the threats and cyberattacks faced by the enterprise and can help it produce proprietary intelligence to improve its own protection capabilities. None of the above methods can produce threat intelligence from raw threat information; they only classify and process existing threat intelligence. Subsequent processing based on this data is therefore less reliable. Furthermore, the results of extracting keywords from existing threat intelligence cannot be directly applied to security devices.

[0050] This invention proposes a threat intelligence production method that can solve the above-mentioned problems. Please refer to Figure 1, a flowchart of a threat intelligence production method provided in an embodiment of the present invention. The method may include:

[0051] S101: Obtain raw threat information within a preset time period. The raw threat information is the raw threat information corresponding to IPs for which there is no threat intelligence.

[0052] This embodiment does not limit the preset time period. For example, it can collect raw threat information for the day; or it can collect raw threat information for the week. This embodiment does not limit the raw threat information, as long as it is unprocessed raw information. For example, raw threat information can be traffic logs collected using traffic probes; or it can be the raw code of a malicious sample.

[0053] It should be further noted that acquiring raw threat information within a preset time period, where the raw threat information corresponds to IPs for which no threat intelligence is available, may include the following steps:

[0054] Step 21: Obtain the traffic logs collected by the traffic probe within the preset time period;

[0055] Step 22: Filter the traffic logs based on whether the IP has threat intelligence to obtain the IP's traffic logs, and use the traffic logs as raw threat information.

[0056] Because processing raw code can be complex, this embodiment selects the traffic logs collected by traffic probes as the raw material for generating threat intelligence. Given the massive amount of raw threat information, a daily cycle is used. Furthermore, the traffic logs can be filtered according to whether threat intelligence already exists for each IP (Internet Protocol, the protocol for interconnecting networks) address, yielding the traffic logs of the IPs that have no threat intelligence. These traffic logs are then used as the raw threat information, which reduces the amount of data processed and improves efficiency. Traffic logs include data such as source IP, access target, and access path. As the raw threat information for threat intelligence generation, they record the attacker's access and attack behaviors.
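
As a minimal, non-limiting sketch of Steps 21 and 22, the filtering can be pictured as a set-membership test on the source IP. The field names (`src_ip`, `target`, `path`), the function name `filter_unknown_ips`, and the sample addresses are assumptions chosen for illustration; the embodiment does not specify a log schema.

```python
from typing import Iterable

def filter_unknown_ips(traffic_logs: Iterable[dict], known_intel_ips: set) -> list:
    """Keep only the log records whose source IP has no existing threat intelligence."""
    return [log for log in traffic_logs if log["src_ip"] not in known_intel_ips]

# Hypothetical one-day batch of traffic logs from a traffic probe.
logs = [
    {"src_ip": "203.0.113.5", "target": "10.0.0.8", "path": "/login"},
    {"src_ip": "198.51.100.7", "target": "10.0.0.8", "path": "/admin"},
    {"src_ip": "203.0.113.5", "target": "10.0.0.9", "path": "/index"},
]
# IPs for which threat intelligence already exists (assumed lookup set).
known = {"198.51.100.7"}

raw_threat_info = filter_unknown_ips(logs, known)
```

Only the records of IPs without existing intelligence remain, so later steps process less data.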

[0057] It should be further explained that after filtering traffic logs based on whether the IP has threat intelligence, the following steps can also be included:

[0058] Step 31: Standardize the traffic logs to obtain standardized traffic logs, and store the standardized traffic logs in a standardized database;

[0059] Accordingly, using the traffic logs as raw threat information includes:

[0060] Step 32: Use standardized traffic logs as raw threat information.

[0061] This embodiment considers that the traffic log fields collected by different traffic probes may be ambiguous, or that the same meaning may be expressed by different fields. Therefore, it is necessary to design standardized fields and data tables and to build a standardized database using ELK, that is, Elasticsearch (a search server based on Lucene, an open-source full-text search engine toolkit), Logstash (an open-source data collection engine), and Kibana (an open-source analytics and visualization platform). ELK is mainly deployed in enterprise architectures to collect and integrate log information from multiple services on multiple mobile devices. The traffic logs are converted to a unified format and then stored in the Hive data warehouse of the big data platform (Hive is a data warehouse tool based on Hadoop, a software platform for developing and running large-scale data processing).

[0062] Examples of traffic log standardization are as follows: (1) The data types in the received traffic logs may not conform to the standard. For example, the time field "2022-05-05 11:11:11" is of type String and needs to be converted to type DateTime (which includes date and time). (2) The names of the fields in the traffic logs are not the standardized fields and need to be mapped and converted.
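
The two standardization examples above can be sketched as follows. The mapping table `FIELD_MAP`, the probe-specific names (`sip`, `srcAddr`, `ts`), and the standardized names (`src_ip`, `event_time`) are hypothetical; only the String-to-DateTime conversion and the idea of field-name mapping come from the text.

```python
from datetime import datetime

# Hypothetical mapping from probe-specific field names to standardized field names.
FIELD_MAP = {"sip": "src_ip", "srcAddr": "src_ip", "ts": "event_time", "time": "event_time"}

def standardize(record: dict) -> dict:
    """Rename fields to the standardized names and convert time strings to datetime."""
    out = {}
    for key, value in record.items():
        std_key = FIELD_MAP.get(key, key)  # unknown fields keep their original name
        if std_key == "event_time" and isinstance(value, str):
            value = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        out[std_key] = value
    return out

row = standardize({"sip": "203.0.113.5", "ts": "2022-05-05 11:11:11"})
```

In this sketch the mapping is a plain dictionary so that a new probe can be supported by adding entries rather than changing code.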

[0063] S102: Aggregate the original threat information based on the IP address to obtain the aggregated original threat information.

[0064] The collected raw threat information is aggregated by IP address, and statistical values related to the IP's network behavior are calculated. Raw threat information can be aggregated by time partition and source IP address, and the following statistical values are calculated: access count, attack count, number of access targets, accessed domain names and corresponding counts, attack targets and corresponding counts, and dozens of other aggregated statistical fields.
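
A minimal sketch of this aggregation, assuming each standardized record carries a `day` partition, a `src_ip`, a `target`, a `domain`, and a boolean `is_attack` flag (all hypothetical field names), might look like this. It computes only four of the statistics listed above.

```python
from collections import Counter, defaultdict

def aggregate_by_ip(logs):
    """Group records by (time partition, source IP) and compute behaviour statistics."""
    grouped = defaultdict(list)
    for log in logs:
        grouped[(log["day"], log["src_ip"])].append(log)

    result = {}
    for key, records in grouped.items():
        result[key] = {
            "access_count": len(records),
            "attack_count": sum(1 for r in records if r.get("is_attack")),
            "target_count": len({r["target"] for r in records}),
            # Map-type field: accessed domain name -> number of accesses.
            "domain_counts": dict(Counter(r["domain"] for r in records)),
        }
    return result

logs = [
    {"day": "2022-05-05", "src_ip": "203.0.113.5", "target": "10.0.0.8", "domain": "a.example", "is_attack": True},
    {"day": "2022-05-05", "src_ip": "203.0.113.5", "target": "10.0.0.9", "domain": "a.example", "is_attack": False},
    {"day": "2022-05-05", "src_ip": "198.51.100.7", "target": "10.0.0.8", "domain": "b.example", "is_attack": False},
]
stats = aggregate_by_ip(logs)
```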

[0065] S103: Extract features from the aggregated original threat information to obtain feature vectors.

[0066] In this embodiment, feature extraction is performed on the aggregated original threat information to obtain a feature vector.

[0067] It should be further explained that the above-mentioned feature extraction of the aggregated original threat information to obtain the feature vector may include the following steps:

[0068] Step 41: Extract the feature vectors of the numeric data in the aggregated raw threat information using standard normalization, maximum-minimum normalization, extreme value calculation, and statistical value calculation methods;

[0069] Step 42: Use one-hot encoding to transform the enumerated data in the aggregated original threat information to extract the feature vector of the enumerated data;

[0070] Step 43: Split the map-type key-value pairs in the aggregated raw threat information into separate fields to extract the feature vectors of the map-type data.

[0071] The aggregated original threat information in this embodiment may contain multiple fields of different data types, so a different feature extraction method is adopted for each data type. These methods can include the following three: (1) For numeric data, standard normalization, maximum-minimum normalization, calculation of extreme values (maximum and minimum), and calculation of statistical values (mean and variance) can be used; (2) For enumerated data, one-hot encoding (also called one-bit effective encoding) is used for conversion; (3) For map-type data (a set of key-value pairs), the key-value pairs are directly split into separate fields for processing.
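
The three kinds of feature extraction can be sketched with the Python standard library as below. The function names, the sample values, and the category and key lists are assumptions for illustration; population statistics are used here, which is one possible choice.

```python
from statistics import mean, pstdev, pvariance

def z_score(values):
    """Standard normalization: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]

def min_max(values):
    """Maximum-minimum normalization onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def summary_stats(values):
    """Extreme values and statistical values of a numeric field."""
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "variance": pvariance(values)}

def one_hot(value, categories):
    """One-hot encoding of an enumerated value over a fixed category list."""
    return [1 if value == c else 0 for c in categories]

def split_map(mapping, keys):
    """Split a map-type field into one column per known key (missing keys become 0)."""
    return [mapping.get(k, 0) for k in keys]

access_counts = [10, 20, 40]                      # one numeric field across three IPs
scaled = min_max(access_counts)
protocol_vec = one_hot("tcp", ["tcp", "udp", "icmp"])
domain_vec = split_map({"a.example": 2}, ["a.example", "b.example"])
feature_vector = [scaled[0]] + protocol_vec + domain_vec  # features of the first IP
```

Splitting map-type fields and one-hot encoding both add columns, which is why the resulting feature vectors tend to be wide and sparse.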

[0072] S104: Input the feature vector into the trained neural network model. If the IP exhibits threatening behavior, determine the threat type of the IP based on the trained neural network model, and generate corresponding threat intelligence for the IP based on the threat type.

[0073] In this embodiment, the feature vector is input into a trained neural network model. When the IP exhibits threatening behavior, the trained neural network model outputs the threat type of the IP and generates threat intelligence based on the IP and the threat type.

[0074] It should be further explained that the trained neural network model adopts the Transformer architecture (a model that uses attention mechanism to improve model training speed), uses XGBoost (an optimized distributed gradient boosting library) as the classifier, and adds a Batch Normalization layer after the input layer.

[0075] In this embodiment, the trained neural network model primarily employs the standard Transformer architecture for representation learning, essentially functioning as a feature extractor. XGBoost is used as the classifier, ultimately outputting the corresponding classification result.

[0076] Transformer is a neural network architecture based on the attention mechanism. It overcomes the inability of sequence models such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) to be computed in parallel, and achieves fast and efficient sequence learning. The basic structure of Transformer includes three components: (1) Encoder: The encoder is used to encode the input sequence. It contains multiple identical layers, each of which includes a multi-head attention mechanism and a feedforward neural network. The multi-head attention mechanism focuses on different parts of the input sequence to obtain the feature representation of the sequence, and then the feedforward neural network further learns the sequence features. (2) Decoder: The decoder is used to generate the output sequence. Its structure is basically the same as that of the encoder, but the multi-head attention mechanism focuses on the encoder output and the historical output of the decoder. (3) Positional Encoding: Positional encoding is used to store the relative position information of sequence elements within the sequence. This is because attention mechanisms themselves cannot distinguish the order of elements within a sequence. Transformers can learn global contextual information, which is useful for feature extraction from arbitrarily long sequences. This allows Transformers to learn effective features even for long or complex non-linguistic sequences. Multi-head attention mechanisms can automatically focus on different parts of the sequence, providing a degree of feature selection capability, which is beneficial for learning key features.
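
To make the attention mechanism concrete, the following sketch implements single-head scaled dot-product attention in plain Python. It illustrates the general technique only, under toy inputs chosen here; it is not the model of this embodiment, which would add learned projections, multiple heads, and feedforward layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, one row per query."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how strongly this query attends to each position
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Toy input: both keys are identical, so the query attends to both positions equally.
out = attention(queries=[[1.0, 1.0]],
                keys=[[1.0, 0.0], [1.0, 0.0]],
                values=[[0.0, 2.0], [4.0, 6.0]])
```

Because every query row is computed independently of the others, the rows can be evaluated in parallel, which is the property contrasted above with LSTM and GRU.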

[0077] Furthermore, because feature extraction adds many columns to the feature vector, the data becomes sparse and each attribute follows a different distribution. To improve training efficiency and keep model training stable, a Batch Normalization layer is added after the input layer, which keeps the distribution of the input data close to a standard normal distribution. The Batch Normalization layer primarily addresses the covariate shift problem, in which the distribution of the input variables changes and shifts during training and degrades the model's performance. With the Batch Normalization layer, the attributes in the input vector are kept within a standard normal distribution, which supports efficient and stable model training.
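
The normalization performed by such a layer at training time can be sketched as follows: each feature column is shifted to zero mean and scaled to unit variance over the batch, then optionally rescaled by learnable parameters. The function name, the fixed `gamma` and `beta` values, and the sample batch are assumptions; a real layer would also learn `gamma` and `beta` and track running statistics for inference.

```python
import math

def batch_norm(batch, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalize each feature column of a batch to zero mean and unit variance."""
    n, dims = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)]
    return [
        [gamma * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta
         for j in range(dims)]
        for row in batch
    ]

# Two attributes on very different scales end up on a comparable scale.
normalized = batch_norm([[1.0, 100.0], [3.0, 300.0]])
```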

[0078] The training process of the aforementioned model may include: utilizing previously produced threat intelligence, associating threat types with aggregated raw threat information based on IP addresses to obtain a coarse dataset. Through manual analysis and filtering, data with significant ambiguity or false positives are removed, resulting in a training dataset. When creating the training dataset, IPs of different threat types are selected as evenly as possible from the already produced intelligence. These IPs are then associated with raw threat information, filtered out, and labeled using their corresponding threat types. The model employs an end-to-end training approach, trained on a GPU (Graphics Processing Unit). Based on its performance on the validation set, the model is fine-tuned, finally achieving certain performance requirements on the test set. This yields a trained neural network model used to determine the threat type of an IP.
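
The dataset construction described above, associating labels with aggregated features by IP and selecting threat types as evenly as possible, can be sketched as below. The function name, the `per_class` cap, and the sample labels are hypothetical, and the manual analysis and filtering step is not modeled.

```python
import random
from collections import defaultdict

def build_balanced_dataset(labels_by_ip, features_by_ip, per_class, seed=0):
    """Join labels with features on IP and sample at most per_class IPs per threat type."""
    by_type = defaultdict(list)
    for ip, threat_type in labels_by_ip.items():
        if ip in features_by_ip:  # keep only IPs that have raw threat information
            by_type[threat_type].append(ip)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    dataset = []
    for threat_type, ips in sorted(by_type.items()):
        for ip in rng.sample(ips, min(per_class, len(ips))):
            dataset.append((features_by_ip[ip], threat_type))
    return dataset

labels = {"203.0.113.1": "scan", "203.0.113.2": "scan",
          "203.0.113.3": "scan", "203.0.113.4": "exploit"}
features = {ip: [float(i)] for i, ip in enumerate(labels)}
dataset = build_balanced_dataset(labels, features, per_class=2)
```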

[0079] It should be further explained that, after determining the threat type of the IP based on the trained neural network model and generating corresponding threat intelligence based on the threat type if the IP exhibits threatening behavior, the following steps may also be included:

[0080] Step 61: Obtain the alarm logs corresponding to the IPs within the preset time period;

[0081] Step 62: Obtain the preset strategy mechanism;

[0082] Step 63: Perform correlation analysis between alarm logs and threat types, and determine the final threat type of the IP based on the preset policy mechanism;

[0083] Accordingly, generating threat intelligence corresponding to the IP based on the threat type includes:

[0084] Produce threat intelligence corresponding to the IP based on the final threat type.

[0085] This embodiment uses a trained model to infer and calculate IP addresses and corresponding threat types, and associates the threat types derived by the model with alarm logs. Through a preset policy mechanism, it determines whether the inference result of an IP address based on the original threat information is consistent with the judgment of the security device, and whether there is obvious attack behavior, thereby filtering out false alarms.

[0086] For example: if an IP is classified as a vulnerability exploit based on the model's classification results, and the alerts generated on the security device are also classified as vulnerability exploits, and the number of alerts generated is greater than 2, then the IP's threat type is confirmed as vulnerability exploitation; if an IP is classified as a vulnerability exploit based on the model's classification results, but the alerts generated on the security device are not classified as vulnerability exploits, then the production of threat intelligence for that IP will be abandoned.
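
The example rule above can be written as a small function. The threshold of more than 2 alerts comes from the example; counting only the alerts whose type matches the model's result, and abandoning production when the count is 2 or fewer, are assumptions of this sketch, as are the function and label names.

```python
def final_threat_type(model_type, alert_types, min_exclusive=2):
    """Confirm the model's threat type only if enough matching alerts exist.

    Returns the confirmed threat type, or None when intelligence production
    for the IP is abandoned.
    """
    matching = sum(1 for t in alert_types if t == model_type)
    return model_type if matching > min_exclusive else None

confirmed = final_threat_type("vulnerability_exploit", ["vulnerability_exploit"] * 3)
abandoned = final_threat_type("vulnerability_exploit", ["brute_force"] * 5)
```

Keeping the threshold as a parameter lets the preset policy mechanism differ per threat type.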

[0087] It should be further noted that after generating threat intelligence corresponding to the IP based on the final threat type, the following steps may also be included:

[0088] Step 71: Store the IP address in a standardized format;

[0089] Step 72: Package the threat intelligence and upload it to the cloud to work with security devices to achieve automated generation of threat intelligence.

[0090] This embodiment stores malicious IPs that have undergone association decisions in a standardized format in a database, and periodically packages and uploads them to the cloud to work with security devices to achieve automated threat intelligence production.
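
Steps 71 and 72 can be sketched as building a standardized record per IP and serializing a batch for upload. The record fields (`indicator`, `indicator_type`, `threat_type`, `produced_at`) and the bundle layout are hypothetical, since the embodiment does not define the storage format, and the upload itself is omitted because no cloud interface is specified.

```python
import json
from datetime import datetime, timezone

def to_intel_record(ip, threat_type, produced_at):
    """Build one standardized threat intelligence record (hypothetical schema)."""
    return {
        "indicator": ip,
        "indicator_type": "ipv4",
        "threat_type": threat_type,
        "produced_at": produced_at.isoformat(),
    }

def package(records):
    """Serialize a batch of records into a single JSON bundle ready for upload."""
    return json.dumps({"count": len(records), "records": records}, sort_keys=True)

record = to_intel_record("203.0.113.5", "vulnerability_exploit",
                         datetime(2022, 5, 5, tzinfo=timezone.utc))
bundle = package([record])
```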

[0091] The threat intelligence production method provided in this invention involves acquiring raw threat information within a preset time period, where the raw threat information corresponds to IPs for which no threat intelligence is available; aggregating the raw threat information based on the IPs to obtain aggregated raw threat information; extracting features from the aggregated raw threat information to obtain feature vectors; inputting the feature vectors into a trained neural network model; if the IP exhibits threatening behavior, determining the IP's threat type based on the trained neural network model, and generating corresponding threat intelligence based on the threat type. Compared to existing technologies that produce intelligence based on processed data analysis, this method directly utilizes machine learning technology to analyze raw threat information, ensuring the accuracy of threat intelligence production. Furthermore, the threat intelligence produced by this application can be directly applied to security devices, completing the entire process from production to application, which is beneficial for subsequent security monitoring. 
Furthermore, using traffic logs collected by traffic probes as raw threat information reduces processing difficulty compared to raw code; traffic logs are standardized to ensure a uniform format; targeted feature extraction methods are employed for fields of different data types; a Transformer-based neural network model improves learning speed; and the addition of a Batch Normalization layer maintains the attributes in the input vector within a standard normal distribution, ensuring efficient and stable model training; combined with alert logs, a comprehensive analysis and judgment of IP threat types is performed to produce accurate threat intelligence; and a complete production process exists for generating threat intelligence from raw threat information, standardizing and storing the intelligence, and then uploading it to the cloud for distribution to various security devices.

[0092] The threat intelligence production apparatus provided in the embodiments of the present invention will be described below. The threat intelligence production apparatus described below can be referred to in correspondence with the threat intelligence production method described above.

[0093] Please refer to Figure 2, a schematic diagram of a threat intelligence production apparatus provided in an embodiment of the present invention. The apparatus may include:

[0094] The original threat information acquisition module 100 is used to acquire original threat information within a preset time period, wherein the original threat information is the original threat information corresponding to IPs without threat intelligence.

[0095] The aggregation module 200 is used to aggregate the original threat information based on the IP address to obtain aggregated original threat information;

[0096] Feature extraction module 300 is used to extract features from the aggregated original threat information to obtain feature vectors;

[0097] The threat intelligence production module 400 is used to input the feature vector into a trained neural network model. If the IP has threatening behavior, the threat type of the IP is determined according to the trained neural network model, and threat intelligence corresponding to the IP is generated according to the threat type.

[0098] Furthermore, based on the above embodiments, the original threat information acquisition module 100 may include:

[0099] The acquisition unit is used to acquire the traffic logs collected by the traffic probe within the preset time period;

[0100] The filtering unit is used to filter the traffic logs based on whether the IP has threat intelligence, obtain the traffic logs of the IP, and use the traffic logs as the original threat information.
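The acquisition and filtering units can be pictured with the following minimal sketch, which keeps only traffic logs that fall inside the preset time period and whose IP has no existing threat intelligence. The field names `src_ip` and `timestamp`, the half-open time window, and the set `known_intel_ips` are assumptions for illustration only.

```python
from datetime import datetime
from typing import Iterable, List, Set


def select_raw_threat_info(
    traffic_logs: Iterable[dict],
    known_intel_ips: Set[str],
    window_start: datetime,
    window_end: datetime,
) -> List[dict]:
    """Keep logs inside [window_start, window_end) whose IP has no existing intelligence."""
    return [
        log
        for log in traffic_logs
        if window_start <= log["timestamp"] < window_end
        and log["src_ip"] not in known_intel_ips
    ]


logs = [
    {"src_ip": "203.0.113.7", "timestamp": datetime(2023, 8, 28, 10, 0)},
    {"src_ip": "198.51.100.2", "timestamp": datetime(2023, 8, 28, 10, 5)},
    {"src_ip": "203.0.113.7", "timestamp": datetime(2023, 8, 29, 9, 0)},
]
raw_info = select_raw_threat_info(
    logs,
    known_intel_ips={"198.51.100.2"},
    window_start=datetime(2023, 8, 28),
    window_end=datetime(2023, 8, 29),
)
```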

[0101] Furthermore, based on the above embodiments, the filtering unit may include:

[0102] A standardization processing and storage subunit is used to standardize the traffic logs to obtain standardized traffic logs and store the standardized traffic logs in a standardization database; correspondingly, the step of using the traffic logs as the original threat information includes: using the standardized traffic logs as the original threat information.
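One possible reading of the standardization step is a mapping of probe-specific field names onto a single canonical schema. The alias table below is hypothetical, and the storage of the result in the standardization database is omitted from this sketch.

```python
from typing import Dict

# Hypothetical alias table: maps probe-specific field names to one canonical name.
FIELD_ALIASES: Dict[str, str] = {
    "sip": "src_ip",
    "source_ip": "src_ip",
    "dip": "dst_ip",
    "dest_ip": "dst_ip",
    "proto": "protocol",
}


def standardize_log(raw: dict) -> dict:
    """Return a copy of the log with lower-cased, canonical field names."""
    standardized = {}
    for key, value in raw.items():
        name = key.strip().lower()
        standardized[FIELD_ALIASES.get(name, name)] = value
    return standardized


standardized = standardize_log({"SIP": "203.0.113.7", "Proto": "TCP", "bytes": 120})
```

Field values are left untouched here; an actual implementation might also normalize value formats such as timestamps or protocol names.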

[0103] Furthermore, based on the above embodiments, the feature extraction module 300 may include:

[0104] The numeric data feature extraction unit is used to extract the feature vector of numeric-type data in the aggregated original threat information using standard normalization, min-max normalization, calculation of extreme values, and calculation of statistical values.

[0105] The enumeration data feature extraction unit is used to transform the enumeration data in the aggregated original threat information using a one-hot encoding method in order to extract the feature vector of the enumeration data.

[0106] The Map-type data feature extraction unit is used to split the map-type key-value pairs in the aggregated original threat information into separate fields in order to extract the feature vectors of the map-type data.
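The three feature extraction units can be sketched with standard-library Python as follows. This is only one way to realize the named techniques: the helper names, the choice of mean and population standard deviation as the "statistical values", and the dotted naming of flattened map fields are assumptions of this example.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def z_score(values: Sequence[float]) -> List[float]:
    """Standard normalization: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Min-max normalization onto [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def summary_stats(values: Sequence[float]) -> Dict[str, float]:
    """Extreme values and basic statistics of one numeric field."""
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "std": pstdev(values)}


def one_hot(value: str, categories: Sequence[str]) -> List[int]:
    """One-hot encode an enumeration value against a fixed category list."""
    return [1 if value == c else 0 for c in categories]


def flatten_map(record: dict, field: str) -> dict:
    """Split the key-value pairs of a map-type field into separate fields."""
    flat = {k: v for k, v in record.items() if k != field}
    for key, val in record.get(field, {}).items():
        flat[f"{field}.{key}"] = val
    return flat


scaled = min_max_scale([2, 4, 6])
encoded = one_hot("tcp", ["tcp", "udp", "icmp"])
flat = flatten_map({"src_ip": "203.0.113.7", "http": {"method": "GET"}}, "http")
```

The outputs of these helpers would be concatenated per IP into the feature vector that is passed to the model.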

[0107] Furthermore, based on the above embodiments, the trained neural network model in the threat intelligence production module 400 adopts the Transformer architecture, uses XGBoost as the classifier, and adds a BatchNormalization layer after the input layer.
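To illustrate only the BatchNormalization step (not the Transformer architecture or the XGBoost classifier), the following sketch normalizes each attribute of a batch of feature vectors to approximately zero mean and unit variance. Frameworks normally learn a scale and shift per attribute; the scalar `gamma` and `beta` used here, and the value of `eps`, are simplifying assumptions.

```python
from math import sqrt
from typing import List


def batch_normalize(
    batch: List[List[float]],
    gamma: float = 1.0,
    beta: float = 0.0,
    eps: float = 1e-5,
) -> List[List[float]]:
    """Normalize each column of the batch using the batch mean and variance."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)
    ]
    return [
        [
            gamma * (row[j] - means[j]) / sqrt(variances[j] + eps) + beta
            for j in range(dims)
        ]
        for row in batch
    ]


# Two attributes on very different scales end up on a comparable scale.
normalized = batch_normalize([[1.0, 100.0], [3.0, 300.0]])
```

Placing this step directly after the input layer means that attributes with very different ranges, such as byte counts and port numbers, reach the subsequent layers on a comparable scale.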

[0108] Furthermore, based on any of the above embodiments, the threat intelligence production apparatus may further include:

[0109] The alarm log acquisition module is used to acquire the alarm logs corresponding to the IP within the preset time period, after the threat type of an IP that exhibits threatening behavior has been determined based on the trained neural network model;

[0110] The preset policy mechanism acquisition module is used to acquire the preset policy mechanism;

[0111] The correlation analysis module is used to perform correlation analysis between the alarm logs and the threat types, and determine the final threat type of the IP according to the preset policy mechanism; correspondingly, the step of generating threat intelligence corresponding to the IP based on the threat type includes: generating threat intelligence corresponding to the IP based on the final threat type.
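The embodiments do not disclose the content of the preset policy mechanism. Purely as a hypothetical example, the sketch below assumes a policy in which the dominant alarm type overrides the model's threat type when it accounts for at least a configurable share of the alarm logs; the function name and the threshold `min_alarm_share` are invented for illustration.

```python
from collections import Counter
from typing import List


def final_threat_type(
    model_type: str,
    alarm_types: List[str],
    min_alarm_share: float = 0.6,
) -> str:
    """Combine the model's threat type with alarm-log types under an assumed policy."""
    if not alarm_types:
        # No alarm logs for this IP: keep the model's decision.
        return model_type
    top_type, count = Counter(alarm_types).most_common(1)[0]
    if count / len(alarm_types) >= min_alarm_share:
        # Alarm logs agree strongly on one type: let them decide.
        return top_type
    return model_type


decided = final_threat_type("scanner", ["botnet", "botnet", "scanner"])
```

Other policies, such as weighting by alarm severity or requiring agreement between both sources, would fit the same interface.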

[0112] Furthermore, based on the above embodiments, the threat intelligence production apparatus may further include:

[0113] The following modules, which operate after the threat intelligence corresponding to the IP has been generated based on the final threat type:

[0114] A storage module for storing the IP in a standardized format;

[0115] The upload module is used to package the threat intelligence and upload it to the cloud to work with security devices to automatically generate the threat intelligence.
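A minimal sketch of the storage and packaging steps is given below, assuming a JSON-based record format. The schema (`indicator`, `indicator_type`, `threat_type`, `produced_at`), the envelope fields, and the function names are hypothetical; the actual standardized format and the upload transport are not specified in the embodiments.

```python
import json
from datetime import datetime, timezone
from typing import List


def build_intel_record(ip: str, threat_type: str, produced_at: datetime) -> dict:
    """Build one intelligence record in an assumed standardized layout."""
    return {
        "indicator": ip,
        "indicator_type": "ipv4",
        "threat_type": threat_type,
        "produced_at": produced_at.astimezone(timezone.utc).isoformat(),
    }


def package_for_upload(records: List[dict]) -> bytes:
    """Serialize records into a single UTF-8 JSON payload ready for upload."""
    envelope = {"version": 1, "count": len(records), "items": records}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


record = build_intel_record(
    "203.0.113.7", "botnet", datetime(2023, 8, 28, tzinfo=timezone.utc)
)
payload = package_for_upload([record])
```

The resulting payload could then be sent to the cloud service by whatever transport the deployment uses.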

[0116] It should be noted that the order of the modules and units in the aforementioned threat intelligence production device can be changed without affecting the logic.

[0117] The threat intelligence production apparatus provided in this embodiment of the invention comprises: a raw threat information acquisition module 100, used to acquire raw threat information within a preset time period (the raw threat information is the raw threat information corresponding to IPs without threat intelligence); an aggregation module 200, used to aggregate the raw threat information according to IPs to obtain aggregated raw threat information; a feature extraction module 300, used to extract features from the aggregated raw threat information to obtain feature vectors; and a threat intelligence production module 400, used to input the feature vectors into a trained neural network model and, if an IP exhibits threatening behavior, to determine the threat type of the IP based on the trained neural network model and produce corresponding threat intelligence based on the threat type. The apparatus directly utilizes machine learning technology to analyze raw threat information, ensuring the accuracy of threat intelligence production. Furthermore, the threat intelligence produced by this application can be directly applied to security devices, completing the entire process from production to application, which is beneficial for subsequent security monitoring.
Furthermore, using traffic logs collected by traffic probes as the raw threat information reduces processing difficulty compared with working on raw code. The traffic logs are standardized to ensure a uniform format, and targeted feature extraction methods are applied to fields of different data types. The Transformer-based neural network model improves learning speed, and the added BatchNormalization layer keeps the attributes of the input vector close to a standard normal distribution, which makes model training efficient and stable. The IP threat type is then analyzed and judged comprehensively in combination with alarm logs to produce accurate threat intelligence. Together these steps form a complete production process: threat intelligence is generated from raw threat information, standardized and stored, and then uploaded to the cloud for distribution to various security devices.

[0118] The threat intelligence production equipment provided in the embodiments of the present invention will be described below. The threat intelligence production equipment described below can be referred to in correspondence with the threat intelligence production method described above.

[0119] Please refer to Figure 3. Figure 3 is a schematic diagram of threat intelligence production equipment provided in an embodiment of the present invention, which may include:

[0120] Memory 10 is used to store computer programs;

[0121] Processor 20 is used to execute computer programs to implement the threat intelligence production method described above.

[0122] The memory 10, processor 20, and communication interface 31 all communicate with each other through the communication bus 32.

[0123] In this embodiment of the invention, the memory 10 is used to store one or more programs. The programs may include program code, which includes computer operation instructions. In this embodiment of the invention, the memory 10 may store programs for implementing the following functions:

[0124] Obtain raw threat information within a preset time period. The raw threat information is the raw threat information corresponding to IPs that have no threat intelligence.

[0125] The original threat information is aggregated based on the IP address to obtain the aggregated original threat information.

[0126] Feature vectors are obtained by extracting features from the aggregated raw threat information;

[0127] The feature vector is input into the trained neural network model. If the IP exhibits threatening behavior, the threat type of the IP is determined based on the trained neural network model, and threat intelligence corresponding to the IP is generated based on the threat type.

[0128] In one possible implementation, the memory 10 may include a program storage area and a data storage area, wherein the program storage area may store the operating system and applications required for at least one function; and the data storage area may store data created during use.

[0129] Furthermore, memory 10 may include read-only memory and random access memory, providing instructions and data to the processor. A portion of the memory may also include NVRAM. The memory stores operating systems and operating instructions, executable modules, or data structures, or subsets thereof, or extended sets thereof, wherein the operating instructions may include various operating instructions for implementing various operations. The operating system may include various system programs for implementing various basic tasks and handling hardware-based tasks.

[0130] Processor 20 can be a central processing unit (CPU), an application-specific integrated circuit, a digital signal processor, a field-programmable gate array, or other programmable logic device. Processor 20 can be a microprocessor or any conventional processor. Processor 20 can call programs stored in memory 10.

[0131] Communication interface 31 can be an interface for the communication module, used to connect with other devices or systems.

[0132] Of course, it should be noted that the structure shown in Figure 3 does not constitute a limitation on the threat intelligence production equipment in the embodiments of the present invention. In practical applications, the threat intelligence production equipment may include more or fewer components than those shown in Figure 3, or a combination of certain components.

[0133] The storage medium provided in the embodiments of the present invention is described below. The storage medium described below can be referred to in correspondence with the threat intelligence production method described above.

[0134] The present invention also provides a storage medium storing a computer program, which, when executed by a processor, implements the steps of the threat intelligence production method described above.

[0135] The storage medium can include various media that can store program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0136] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to in the method section.

[0137] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0138] Finally, it should be noted that in this document, relationships such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.

[0139] The foregoing has provided a detailed description of a threat intelligence production method, apparatus, device, and storage medium provided by the present invention. Specific examples have been used to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. A threat intelligence production method, characterized in that it comprises: obtaining raw threat information within a preset time period, wherein the raw threat information is the raw threat information corresponding to IPs for which there is no threat intelligence; aggregating the raw threat information based on the IP to obtain aggregated raw threat information; performing feature extraction on the aggregated raw threat information to obtain a feature vector; and inputting the feature vector into a trained neural network model and, if the IP exhibits threatening behavior, determining the threat type of the IP based on the trained neural network model and generating threat intelligence corresponding to the IP based on the threat type; wherein the step of performing feature extraction on the aggregated raw threat information to obtain a feature vector includes: extracting the feature vectors of numeric-type data in the aggregated raw threat information using standard normalization, min-max normalization, calculation of extreme values, and calculation of statistical values; transforming the enumerated data in the aggregated raw threat information using a one-hot encoding method in order to extract the feature vector of the enumerated data; and splitting the map-type key-value pairs in the aggregated raw threat information into individual fields to extract the feature vectors of the map-type data.

2. The threat intelligence production method according to claim 1, characterized in that obtaining the raw threat information within the preset time period, wherein the raw threat information is the raw threat information corresponding to IPs without threat intelligence, includes: obtaining the traffic logs collected by a traffic probe within the preset time period; and filtering the traffic logs based on whether the IP has threat intelligence to obtain the traffic logs of the IP, and using the traffic logs as the raw threat information.

3. The threat intelligence production method according to claim 2, characterized in that, after filtering the traffic logs based on whether the IP has threat intelligence to obtain the traffic logs of the IP, the method further includes: standardizing the traffic logs to obtain standardized traffic logs, and storing the standardized traffic logs in a standardization database; accordingly, using the traffic logs as the raw threat information includes: using the standardized traffic logs as the raw threat information.

4. The threat intelligence production method according to claim 1, characterized in that the trained neural network model adopts the Transformer architecture, uses XGBoost as the classifier, and adds a BatchNormalization layer after the input layer.

5. The threat intelligence production method according to any one of claims 1 to 4, characterized in that, after determining the threat type of the IP based on the trained neural network model if the IP exhibits threatening behavior, the method further includes: obtaining the alarm logs corresponding to the IP within the preset time period; obtaining a preset policy mechanism; and performing correlation analysis on the alarm logs and the threat type, and determining the final threat type of the IP according to the preset policy mechanism; accordingly, generating threat intelligence corresponding to the IP based on the threat type includes: generating threat intelligence corresponding to the IP based on the final threat type.

6. The threat intelligence production method according to claim 5, characterized in that, after generating the threat intelligence corresponding to the IP based on the final threat type, the method further includes: storing the IP in a standardized format; and packaging the threat intelligence and uploading it to the cloud to work with security devices to automatically generate the threat intelligence.

7. A threat intelligence production apparatus, characterized in that it comprises: a raw threat information acquisition module, used to acquire raw threat information within a preset time period, wherein the raw threat information is the raw threat information corresponding to IPs that have no threat intelligence; an aggregation module, used to aggregate the raw threat information based on the IP to obtain aggregated raw threat information; a feature extraction module, used to extract features from the aggregated raw threat information to obtain a feature vector; and a threat intelligence production module, used to input the feature vector into a trained neural network model and, if the IP exhibits threatening behavior, to determine the threat type of the IP based on the trained neural network model and produce threat intelligence corresponding to the IP based on the threat type; wherein the feature extraction module includes: a numeric data feature extraction unit, used to extract the feature vector of numeric-type data in the aggregated raw threat information using standard normalization, min-max normalization, calculation of extreme values, and calculation of statistical values; an enumeration data feature extraction unit, used to transform the enumeration data in the aggregated raw threat information using a one-hot encoding method in order to extract the feature vector of the enumeration data; and a map-type data feature extraction unit, used to split the map-type key-value pairs in the aggregated raw threat information into separate fields in order to extract the feature vectors of the map-type data.

8. Threat intelligence production equipment, characterized in that it comprises: a memory, used to store a computer program; and a processor, used to execute the computer program to implement the steps of the threat intelligence production method as described in any one of claims 1 to 6.

9. A storage medium, characterized in that the storage medium stores a computer program that, when executed by a processor, implements the steps of the threat intelligence production method as described in any one of claims 1 to 6.