Network anomaly asset discovery method and device based on asset characteristics
By employing a network anomaly asset discovery method based on the asset's own characteristics and utilizing machine learning and outlier detection technologies, the shortcomings of traditional network asset management in identifying unregistered assets and discovering anomalies are addressed, enabling comprehensive, secure, and real-time management of enterprise network assets.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INSTITUTE OF INFORMATION ENGINEERING CHINESE ACADEMY OF SCIENCES
- Filing Date
- 2021-12-27
- Publication Date
- 2026-06-12
AI Technical Summary
Traditional network asset management methods rely on passive alarms and manual inspections, which make it difficult to effectively identify unregistered network assets and their abnormal behavior, resulting in incomplete asset management and insufficient security.
By constructing a network anomaly asset discovery method based on the asset's own characteristics, machine learning technology is used to analyze the traffic data of enterprise network assets. By combining outlier detection and cluster analysis, unknown assets are identified and the asset information database is updated in real time. The results of passive alarms are then integrated to generate comprehensive alarms.
It enables proactive identification and anomaly detection of unregistered devices, improving the comprehensiveness and security of asset management, timely detection of network anomalies, and enhancing the accuracy and real-time nature of asset management.
[Figure: abstract drawing, CN116361720B]
Abstract
Description
Technical Field
[0001] This invention relates to the field of information technology, and in particular to a method and apparatus for discovering network anomaly assets based on the characteristics of the assets themselves.
Background Technology
[0002] Profiling technology and machine learning-based anomaly analysis continue to advance, and internet asset discovery and management play an increasingly prominent role in the internet industry. Traditional network asset management, however, still relies on passive alerts and manual inspections for anomaly detection, and its understanding of the characteristics and attributes of network assets is limited to asset registration information and general knowledge of asset types. Information such as whether a network asset is a user network asset or a server network asset, and the specific uses of server network assets, is hidden in data sources other than asset registration and alerts. An asset's inherent attributes are correlated with its traffic data, and this correlation has to be uncovered through data mining. Feature analysis can identify whether a network asset is a user asset or a server, the type of service it provides, and whether abnormal assets exist, including access by unknown (unregistered) assets, asset anomalies caused by security issues, and equipment malfunctions. Establishing an asset anomaly detection model based on asset profiling therefore plays an indispensable role in improving network asset management.
[0003] The application requirements of this invention are intended to explore how to identify unknown user network assets, server network assets, and specific types of server network assets by comprehensively analyzing the registration data, characteristic attribute descriptions, and private network traffic data of network assets in an enterprise's private network; to discover and identify network asset anomalies; to automatically update the network asset status values in the network asset information database; to improve the characteristic attributes of relevant enterprise network assets; and to achieve real-time discovery and alarm of relevant enterprise network asset profiles and asset anomalies.
Summary of the Invention
[0004] To address the aforementioned needs and achieve network anomaly asset discovery based on asset characteristics, this invention provides a method and apparatus for network anomaly asset discovery based on asset characteristics. Ultimately, the results and characteristics of three types of asset anomaly analysis are integrated to form a real-time, highly accurate enterprise network asset information database. Combined with passive anomaly alarm results, comprehensive asset anomaly alarms are generated. This invention is a powerful supplement to ordinary passive network asset anomaly discovery and asset registration attributes. It supports enterprises in gaining a more comprehensive understanding of the characteristics, attributes, and status of network assets, thereby strengthening enterprise asset management.
[0005] The technical solution of the present invention includes:
[0006] A method for detecting abnormal network assets based on the asset's own characteristics, comprising the following steps:
[0007] A network access asset registration database is constructed using the flow data of known asset P, and based on the network access asset registration database, network flow data is identified to obtain unknown asset flow data.
[0008] The unknown assets are classified based on the stream data features F_u of the unknown asset flow data to obtain the unknown asset types and unknown asset application types;
[0009] Outlier detection is performed on the stream data features F_p of the known assets to identify anomalous known assets P′;
[0010] Based on the unknown asset, unknown asset type, unknown asset application type, and known asset P′, obtain the abnormal asset detection results.
[0011] Furthermore, the stream data features F_u include: common port rules, inflow/outflow traffic ratio, network traffic relationship with registered assets, and online duration.
[0012] Furthermore, the commonly used port rules include: the port numbers used by the Transmission Control Protocol and the User Datagram Protocol.
[0013] Furthermore, the inflow/outflow traffic ratio is the ratio of the number of log entries flowing into the unknown asset to the number flowing out of it, computed for different time periods from the unknown asset flow data logs.
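The ratio described above can be sketched in a few lines of Python. This is only an illustration: the log layout `(timestamp_seconds, src_ip, dst_ip)`, the function name `inflow_outflow_ratio`, and the one-hour period are assumptions and do not come from the patent text.

```python
from collections import defaultdict


def inflow_outflow_ratio(logs, asset_ip, period_seconds=3600):
    """Return {period_start: inflow_logs / outflow_logs} for one asset.

    logs: iterable of (timestamp_seconds, src_ip, dst_ip) tuples (assumed layout).
    A period with inflow but no outflow maps to float("inf").
    """
    counts = defaultdict(lambda: [0, 0])  # period_start -> [inflow, outflow]
    for ts, src_ip, dst_ip in logs:
        if asset_ip not in (src_ip, dst_ip):
            continue  # this log entry does not involve the asset
        period = int(ts // period_seconds) * period_seconds
        if dst_ip == asset_ip:
            counts[period][0] += 1
        if src_ip == asset_ip:
            counts[period][1] += 1
    return {
        period: (inflow / outflow if outflow else float("inf"))
        for period, (inflow, outflow) in sorted(counts.items())
    }


# Hypothetical sample: three logs in the first hour, one in the second.
example_logs = [
    (10, "10.0.0.2", "10.0.0.9"),
    (20, "10.0.0.3", "10.0.0.9"),
    (30, "10.0.0.9", "10.0.0.2"),
    (3700, "10.0.0.2", "10.0.0.9"),
]
ratios = inflow_outflow_ratio(example_logs, "10.0.0.9")
```

A per-period dictionary keeps the time dimension visible, so a later step can compare the ratio against a threshold period by period rather than on a single aggregate.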
[0014] Furthermore, the unknown asset types include: network terminal assets and server assets.
[0015] Furthermore, the application types of server assets include: database server assets, interface server assets, file server assets, real-time computing server assets, and big data platform server assets.
[0016] Furthermore, when classifying, the following strategies are used to obtain unknown asset types and unknown asset application types:
[0017] Based on the ratio of log entries for unknown assets, and using a first threshold, assets with high traffic are labeled as server assets, and assets with low traffic are labeled as terminal device assets.
[0018] Based on the network traffic relationship with the registered assets, and using a second threshold, assets with high traffic are labeled as server assets and assets with low traffic are labeled as terminal device assets.
[0019] Based on a third threshold, online duration is used to label longer durations as server assets and shorter durations as terminal device assets.
[0020] When a server asset is identified, a comprehensive analysis is performed based on common port rules and resource configuration information to determine the application type of the server asset.
[0021] Furthermore, when performing classification, a corresponding weight is assigned to each feature.
[0022] Furthermore, the stream data features F_p include: inherent characteristics of the asset equipment, common port rules, inflow/outflow traffic ratio, and online duration.
[0023] Furthermore, outlier detection is performed through the following steps:
[0024] 1) Cluster the known assets based on the stream data features F_p to obtain several clusters;
[0025] 2) Calculate the centroid of each cluster;
[0026] 3) Calculate the distance and relative distance from each known asset P to the nearest centroid;
[0027] 4) The outlier detection results are obtained by comparing the distance and the relative distance with a given threshold respectively.
[0028] Furthermore, outlier detection is performed through the following steps:
[0029] 1) Perform the first clustering of all known assets P based on the stream data features F_p to obtain several clusters C_1 and an outlier set L_1, where the fit between each known asset in L_1 and every cluster C_1 is below a set value;
[0030] 2) Perform the t-th clustering on the known asset set with the outlier set L_{t-1} removed to obtain several clusters C_t and a new set of outliers, and delete from the outlier set L_{t-1} any known asset that is a strong member of some cluster C_t to obtain a pruned set of outliers;
[0031] 3) Merge the new set of outliers and the pruned set of outliers to obtain the outlier set L_t;
[0032] 4) When the set condition is met, take the outlier set L_t as the outlier detection result.
[0033] Furthermore, unknown assets and known assets P′ are placed in a blacklist, and known assets P″ that are associated with unknown assets are placed in a gray list.
[0034] Furthermore, the accuracy of classification and outlier detection can be improved through the following steps:
[0035] 1) Classify the known assets P based on their stream data features F_u, and learn the classification by comparing the classification results with the known assets P;
[0036] 2) Perform outlier detection on the stream data features F_p of unknown assets whose asset types have been identified, and learn from the outlier detection results by comparing them with the unknown assets;
[0037] 3) Classify and detect outliers of manually discovered unknown assets and known assets P′ in order to learn classification methods or outlier detection methods.
[0038] Furthermore, the alarm information G from the abnormal asset detection results is merged with the system's passive alarm information G′ through the following steps:
[0039] 1) Generate a Watermark based on the alarm information G and passive alarm information G′ received by Flink;
[0040] 2) When the Watermark is later than the stop time of the currently untriggered window, the execution of the corresponding window is triggered to solve the timing problem of data and merge alarm information G and passive alarm information G′.
[0041] A storage medium storing a computer program, wherein the computer program is configured to execute the method described above when run.
[0042] An electronic device includes a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the method described above.
[0043] Compared with the prior art, the present invention has the following advantages:
[0044] 1. This invention can analyze and detect unregistered network access devices and identify them as terminal devices or server devices. For server devices, the basic application type can be determined. This effectively supports the improvement of asset registration management, detects unauthorized network access devices, and strengthens the security of network assets.
[0045] 2. This method, based on the characteristics of the assets themselves, combines machine learning methods to analyze the behavioral flow data of the assets, and combines the discovered abnormal asset information with alarm information. It can discover asset anomalies earlier than traditional passive manual and program inspections; the discovery of anomalies is more proactive and comprehensive.
[0046] 3. This invention employs real-time stream access analysis technology and combines it with historical blacklist and graylist mechanisms to provide real-time alerts for abnormal assets, making anomaly detection more timely.
Description of the Drawings
[0047] Figure 1 shows the method for identifying unknown assets based on the characteristics of enterprise private network assets and streaming data.
[0048] Figure 2 shows the method for discovering known asset anomalies based on the characteristics of enterprise private network assets and streaming data.
[0049] Figure 3 shows enterprise asset data preprocessing.
[0050] Figure 4 shows the module for statistical query and verification of asset anomaly detection results.
Detailed Implementation
[0051] To make the above features and advantages of the present invention more apparent and understandable, specific embodiments are described below, and detailed descriptions are provided in conjunction with the accompanying drawings.
[0052] The network anomaly asset discovery method of the present invention mainly includes three aspects: an unknown asset identification method based on the characteristics of enterprise private network assets and flow data, a known asset anomaly discovery method based on the characteristics of enterprise private network assets and flow data, and anomaly information fusion and comprehensive alarm based on enterprise private network assets.
[0053] A method for identifying unknown assets based on the characteristics of enterprise private network assets and streaming data.
[0054] Enterprise network assets are diverse. Under the client-server architecture, client assets are usually PC terminals and other commonly used mobile terminal devices. Server-side assets mainly include servers with different functions and roles, server-related independent storage devices, the network switches and routers that connect clients to servers and servers to one another, and the firewalls, endpoint antivirus devices, host antivirus devices, server vulnerability scanning devices and other devices used for security protection.
[0055] The data used in this invention includes enterprise private network asset registration data and enterprise routing device flow data. First, enterprise network assets are divided into two main categories: registered assets and unknown assets. Since unknown assets cannot be detected and identified through traditional monitoring and alarm methods and pose a threat to the security of other enterprise network assets, they are classified as a separate category of asset anomalies. On the other hand, registered asset anomalies are generally divided into asset anomalies caused by equipment failure and asset anomalies caused by security anomalies. Based on the impact of asset anomalies on the enterprise, asset anomalies can be further divided into single-point anomalies, local anomalies, and widespread anomalies. Based on anomaly data and combined with asset attribute characteristics, assets with the same anomaly type are clustered. An anomaly classification model is constructed for identifying unknown anomalies.
[0056] Known asset traffic exhibits easily detectable characteristics and patterns. For example, server assets show stable patterns in port usage. Commonly used ports are those reserved by the Internet Corporation for Assigned Names and Numbers (ICANN) for Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). Table 1 lists some common port numbers and their corresponding applications:
[0057]
[0058] Table 1
[0059] Based on the characteristics of server assets themselves, database server assets often require more memory and storage space; interface servers often require more concurrency support and more CPU thread resources; file servers require large disk space or fixed disk arrays; real-time computing servers have larger memory resources; and big data platforms adopt distributed deployment, which usually results in more balanced resources.
[0060] The purpose of a server is determined by comprehensively analyzing its asset registration information, open port information, and resource configuration information. A server is identified as an asset anomaly when it cannot fulfil that purpose, for example when the server is offline, when server resources (CPU, memory, remaining hard disk space, network bandwidth) are insufficient, or when a special-purpose server such as a database server shows abnormal ports or processes for its service.
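As a rough illustration of how open ports can hint at a server's purpose, the sketch below maps a handful of widely used default ports to three of the application types named earlier. The mapping `PORT_HINTS` is an assumption chosen for illustration, not the content of Table 1, and a real deployment would also weigh registration and resource configuration information.

```python
from collections import Counter

# Illustrative assumption: a few widely used default ports per application type.
PORT_HINTS = {
    3306: "database",   # MySQL
    5432: "database",   # PostgreSQL
    1521: "database",   # Oracle listener
    80: "interface",    # HTTP
    443: "interface",   # HTTPS
    8080: "interface",  # alternate HTTP
    21: "file",         # FTP
    445: "file",        # SMB
    2049: "file",       # NFS
}


def guess_application_type(open_ports):
    """Return the application type hinted at by most open ports, or "unknown"."""
    votes = Counter(PORT_HINTS[p] for p in open_ports if p in PORT_HINTS)
    if not votes:
        return "unknown"
    return votes.most_common(1)[0][0]


db_guess = guess_application_type([22, 3306, 5432, 80])
no_guess = guess_application_type([22])
```

Counting hints instead of matching a single port keeps the guess stable when a server exposes an extra management or monitoring port.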
[0061] Analyzing the characteristics of client terminal network assets reveals several key aspects. On the software side, terminal assets typically use relatively fixed operating systems, primarily Windows, Android, and iOS. On the hardware side, terminal assets have significantly lower specifications compared to servers; a typical server uses dual CPUs, while a typical terminal asset has only one. Furthermore, the memory and storage capacity of terminal assets are far smaller than those of a typical server. After client assets are registered with the network, they are usually assigned a fixed IP address, and mobile devices also have a unique ID that can be used to identify their traffic activity. Similarly, the method of excluding registered assets through stream data analysis was used to identify anomalies in unknown assets.
[0062] Regarding the identification of unknown assets, firstly, unknown assets accessing the enterprise network have specific, unknown purposes, and their status is mostly in normal working order, conforming to the judgment model for normal asset types. Analysis is performed based on real-time network flow data sampled from the enterprise's private routing devices. First, IP matching is performed, obtaining unknown asset flow data based on the registered IP address of the asset entering the network. Then, based on the unknown asset flow data, the characteristics of client terminal assets and server assets are identified, determining whether the unknown network asset is a server network asset or a client terminal network asset.
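The IP matching step can be sketched as follows. The record layout (dictionaries with `src_ip` and `dst_ip` keys) and the function names are assumptions for illustration; the only idea taken from the text is that any address absent from the registration database marks an unknown asset and its flows.

```python
def split_flows(flows, registered_ips):
    """Split flow records into (known_flows, unknown_flows).

    A flow counts as unknown when at least one endpoint is not registered.
    """
    known, unknown = [], []
    for flow in flows:
        endpoints = (flow["src_ip"], flow["dst_ip"])
        if all(ip in registered_ips for ip in endpoints):
            known.append(flow)
        else:
            unknown.append(flow)
    return known, unknown


def unknown_asset_ips(flows, registered_ips):
    """Return the set of endpoint addresses that are not registered."""
    return {
        ip
        for flow in flows
        for ip in (flow["src_ip"], flow["dst_ip"])
        if ip not in registered_ips
    }


# Hypothetical sample data.
registered = {"10.0.0.5", "10.0.0.6"}
sample_flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.6"},
    {"src_ip": "10.0.0.99", "dst_ip": "10.0.0.6"},
]
known_flows, unknown_flows = split_flows(sample_flows, registered)
unknown_ips = unknown_asset_ips(sample_flows, registered)
```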
[0063] Unknown server assets and client terminal assets exhibit different behavioral characteristics, which lead to different network traffic patterns. Therefore, by analyzing the flow data of network assets, it is possible to distinguish the behavioral characteristics of different network assets, thereby differentiating between server assets and client terminal assets.
[0064] 1) Unknown server assets conform to port opening rules; use streaming data ports to analyze and determine asset type and function.
[0065] 2) Based on the real-time network traffic logs of an unknown asset, analyze the ratio of the number of log entries flowing into the asset to the number flowing out of it in different time periods. Generally, an asset with high traffic is a server and an asset with low traffic is a terminal device. Unknown assets are labeled according to these traffic characteristics.
[0066] 3) Analyze the network traffic relationship between unknown assets and known registered assets. For example, analyze unknown devices accessing known web servers or FTP file servers. Identify assets through IP association analysis and data interaction analysis. Combine the characteristic of terminal devices having low access volume to servers to identify unknown terminal devices.
[0067] 4) Learn and determine based on the different characteristics of the online duration of terminal devices and server devices;
[0068] 5) Update the asset network information database that includes unknown client terminal assets and unknown servers;
[0069] While each feature can be analyzed individually to determine the equipment type of an unknown asset, the accuracy is low. This invention employs a weighted voting method to comprehensively analyze various features and integrate all features for unknown asset identification. The weighted voting formula is as follows:
[0070]
[0071] In the formula, T represents the number of features, ω represents the feature weights, and j represents the network asset usage method. The output indicates the category label of feature i in usage method j. The final output is the network asset usage method with the highest weighted vote.
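A common way to write such a weighted vote, consistent with the symbols defined above, is $H(x) = c_{\arg\max_j \sum_{i=1}^{T} \omega_i h_i^j(x)}$, where $h_i^j(x)$ is 1 if feature i votes for category j and 0 otherwise; this notation is an assumption rather than the patent's own formula. The Python sketch below combines it with the threshold rules described earlier. The feature names, thresholds, and weights are illustrative assumptions.

```python
from collections import defaultdict


def label_by_threshold(value, threshold):
    """High values suggest a server, low values a terminal device."""
    return "server" if value >= threshold else "terminal"


def weighted_vote(feature_values, thresholds, weights):
    """Return (winning_label, weighted score per label)."""
    scores = defaultdict(float)
    for name, value in feature_values.items():
        label = label_by_threshold(value, thresholds[name])
        scores[label] += weights[name]  # each feature casts one weighted vote
    winner = max(scores, key=scores.get)
    return winner, dict(scores)


# All numbers below are illustrative assumptions, not values from the source.
feature_values = {"log_ratio": 8.0, "registered_traffic": 120, "online_hours": 3}
thresholds = {"log_ratio": 2.0, "registered_traffic": 50, "online_hours": 12}
weights = {"log_ratio": 0.5, "registered_traffic": 0.3, "online_hours": 0.2}
label, scores = weighted_vote(feature_values, thresholds, weights)
```

In this sample two features vote for a server and one for a terminal device, so the weighted total favors the server label even though the online duration alone would have suggested otherwise.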
[0072] A method for detecting known asset anomalies based on the characteristics of enterprise private network assets and streaming data.
[0073] The known assets in this invention mainly refer to the assets registered by the enterprise. The equipment information such as the equipment type of the known assets is relatively clear, and the flow characteristics of the known assets are also relatively clear. The factors affecting the anomalies of the known assets are mainly divided into two categories: asset failure and asset security. Asset failure is usually passively managed through operation and maintenance monitoring, while asset security is usually performed by using specific equipment for security scanning and management.
[0074] This invention combines asset type characteristics and asset flow characteristics for comprehensive analysis, using outlier detection and other methods to analyze the flow of various assets and obtain information on assets with abnormal flow. Outlier analysis of abnormal asset flow first requires data denoising; noise is the random error or variance of the observed variable. Noise is not only invalid data in outlier analysis but can also interfere with the analysis results. This invention performs data cleaning during the data access stage, effectively filtering out noisy data. In outlier detection, it is crucial to understand why the detected outliers were generated by some other mechanism. Typically, various assumptions are made based on the remaining data, and it is proven that the detected outliers significantly violate these assumptions.
[0075] Cluster analysis discovers locally strongly correlated groups of objects, while anomaly detection identifies objects that are not strongly correlated with other objects. Cluster analysis can therefore naturally be used for outlier detection. This invention employs two cluster-based outlier detection methods:
[0076] 1. Obtain small clusters that are far from other clusters:
[0077] One method for detecting outliers using clustering is to identify small clusters that are far from other clusters. Typically, this process can be simplified to identifying all clusters smaller than a certain minimum threshold.
[0078] 2. Prototype-based clustering:
[0079] First, all objects are clustered, and then the degree to which an object belongs to a cluster is evaluated (outlier score). In this method, the degree to which an object belongs to a cluster is measured by its distance from the cluster center. In particular, if removing an object leads to a significant improvement in the objective, that object can be considered an outlier. For example, in the K-means algorithm, removing objects far from their relevant cluster centers can significantly improve the sum of squared errors (SSE) of that cluster.
[0080] For prototype-based clustering, there are two main methods to evaluate the degree to which an object belongs to a cluster (outlier score): one is to measure the distance of the object to the cluster prototype and use it as the outlier score of the object; the other is to take into account that clusters have different densities and measure the relative distance from the object to the prototype, which is the ratio of the distance from the point to the centroid to the median distance from all points in the cluster to the centroid.
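The two scores described above can be sketched with the Python standard library. The sketch assumes that cluster labels already exist (from any clustering algorithm), that the prototype is the cluster mean, and that a point is flagged when either score exceeds its threshold; the last choice is an assumption, since the text only says the two values are compared with thresholds.

```python
import math
from statistics import median


def outlier_scores(points, labels):
    """Return (distances, relative_distances) of each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in clusters.items()
    }
    dists = [math.dist(p, centroids[c]) for p, c in zip(points, labels)]
    medians = {
        c: median(d for d, lab in zip(dists, labels) if lab == c) for c in clusters
    }
    rel = []
    for d, c in zip(dists, labels):
        if medians[c] > 0:
            rel.append(d / medians[c])
        else:
            rel.append(0.0 if d == 0 else math.inf)
    return dists, rel


def flag_outliers(points, labels, dist_threshold, rel_threshold):
    """Indices whose distance or relative distance exceeds its threshold."""
    dists, rel = outlier_scores(points, labels)
    return [
        i for i, (d, r) in enumerate(zip(dists, rel))
        if d > dist_threshold or r > rel_threshold
    ]


# Hypothetical 2-D feature vectors, all assigned to one cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
labels = [0, 0, 0, 0, 0]
flagged = flag_outliers(points, labels, dist_threshold=5.0, rel_threshold=3.0)
```

The relative distance divides by the cluster's median distance, so a point in a loose cluster is not penalized merely because the whole cluster is spread out.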
[0081] It can be noted that in cluster-based outlier detection, whether an object is considered an outlier may be highly dependent on the number of clusters (e.g., noisy clusters when K is large). This invention provides two solutions: one is to change the number of clusters and repeat the detection analysis, and the other is to cluster to obtain a large number of small clusters. In this case, if an outlier exists, it is most likely to be a true outlier.
[0082] The identification steps are as follows:
[0083] 1) Perform clustering. Select a clustering algorithm to cluster the sample set into K clusters and find the centroid of each cluster.
[0084] 2) Calculate the distance of each object to its nearest centroid.
[0085] 3) Calculate the relative distance of each object to its nearest centroid.
[0086] 4) Compare the distance and the relative distance with the given threshold respectively.
[0087] If an object's distance is greater than the threshold, the object is considered an outlier. Based on the above method, the following improvements are made:
[0088] 1) The impact of outliers on initial clustering: When detecting outliers through clustering, outliers can affect the clustering results. To address this issue, the following method is used: cluster objects, remove outliers, and then cluster the objects again.
[0089] 2) Take a special set of objects that do not fit any cluster well; this set represents potential outliers. As the clustering process progresses, the clusters change. Objects that no longer strongly belong to any cluster are added to the potential outlier set; test objects currently in this set, and if they now strongly belong to a cluster, they can be removed from the potential outlier set. Points remaining in this set at the end of the clustering process are classified as outliers.
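The cluster, remove, and re-cluster loop with a potential outlier set can be sketched as below. Everything concrete here is an assumption made for illustration: the tiny k-means with a deterministic start, the use of distance to the nearest centroid as the measure of fit, the single `fit_threshold`, and stopping when the outlier set no longer changes.

```python
import math


def kmeans(points, k, iters=20):
    """Tiny k-means with a deterministic start (evenly spaced input points)."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def fit_distance(point, centroids):
    """Distance to the nearest centroid; small means the point fits a cluster."""
    return min(math.dist(point, c) for c in centroids)


def iterative_outliers(points, k, fit_threshold, max_rounds=10):
    """Re-cluster without the current outliers until the outlier set is stable."""
    outliers = set()
    for _ in range(max_rounds):
        inliers = [i for i in range(len(points)) if i not in outliers]
        centroids = kmeans([points[i] for i in inliers], k)
        # Clustered objects that no longer fit any cluster well.
        new = {i for i in inliers
               if fit_distance(points[i], centroids) > fit_threshold}
        # Earlier outliers stay unless they now strongly belong to a cluster.
        pruned = {i for i in outliers
                  if fit_distance(points[i], centroids) > fit_threshold}
        merged = new | pruned
        if merged == outliers:
            break
        outliers = merged
    return outliers


# Hypothetical data: two tight groups and one stray point at index 8.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (30, 0)]
result = iterative_outliers(points, k=2, fit_threshold=6.0)
```

In this sample the stray point first drags one centroid toward itself; once it is set aside, the second round yields centroids centered on the two real groups, and the point remains in the outlier set.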
[0090] Whether an object is considered an outlier may depend on the number of clusters. One strategy is to repeat the analysis for different numbers of clusters. Another approach is to identify a large number of small clusters.
[0091] 1) Smaller clusters tend to be more aggregated;
[0092] 2) If there are a large number of small clusters, an object that is an outlier is most likely a true outlier.
[0093] On the downside, a group of outliers may itself form a small cluster and therefore escape detection.
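The size-threshold simplification mentioned for the small-cluster method (treating every cluster below a minimum size as suspicious) can be sketched as follows. The function name and `min_size` value are illustrative assumptions, and the sketch does not check how far a small cluster lies from the others.

```python
from collections import Counter


def small_cluster_members(labels, min_size):
    """Return indices of objects whose cluster has fewer than min_size members."""
    sizes = Counter(labels)
    return [i for i, label in enumerate(labels) if sizes[label] < min_size]


# Hypothetical cluster assignment: cluster 2 contains a single object.
cluster_labels = [0, 0, 0, 0, 1, 1, 1, 2]
suspects = small_cluster_members(cluster_labels, min_size=2)
```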
[0094] Anomaly information fusion and comprehensive alarm based on enterprise private network assets.
[0095] The two methods described above analyze and identify asset anomalies in both unknown and known assets based on the inherent attributes and flow data of the enterprise's private network assets. This invention establishes an asset anomaly blacklist and graylist mechanism, adding discovered anomalies in both unknown and known assets to the asset anomaly identification blacklist, and adding registered assets with informational associations with unknown network assets that pose an anomaly risk to the asset anomaly identification graylist.
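The blacklist and graylist mechanism can be sketched with plain sets. The flow layout `(src_ip, dst_ip)` and the reading of "informational association" as "exchanged traffic with an unknown asset" are assumptions made for this illustration.

```python
def build_lists(unknown_assets, anomalous_known, flows, registered_ips):
    """Return (blacklist, graylist) as sets of IP addresses.

    flows: iterable of (src_ip, dst_ip) pairs (assumed layout).
    """
    blacklist = set(unknown_assets) | set(anomalous_known)
    graylist = set()
    for src_ip, dst_ip in flows:
        # A registered asset that talked to an unknown asset is at risk.
        if src_ip in unknown_assets and dst_ip in registered_ips:
            graylist.add(dst_ip)
        if dst_ip in unknown_assets and src_ip in registered_ips:
            graylist.add(src_ip)
    return blacklist, graylist - blacklist


# Hypothetical sample data.
registered = {"10.0.0.5", "10.0.0.6", "10.0.0.7"}
unknown = {"10.0.0.99"}
anomalous = {"10.0.0.5"}
flows = [
    ("10.0.0.99", "10.0.0.6"),
    ("10.0.0.5", "10.0.0.99"),
    ("10.0.0.6", "10.0.0.7"),
]
blacklist, graylist = build_lists(unknown, anomalous, flows, registered)
```

Subtracting the blacklist from the graylist keeps each asset in exactly one list, so an already anomalous asset is not downgraded to a risk-only entry.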
[0096] This method integrates the results of the two methods mentioned above, cross-validating their classification results. During the verification process, some known assets are treated as unknown assets, and the method for discovering unknown assets is used for learning, with the results compared. Some unknown assets whose asset types have been identified are treated as known assets, and the anomaly detection model for known assets is used for learning, with the results compared. In addition, the results of discovering unknown assets and anomaly detection of known assets are manually confirmed and corrected. Through continuous learning and verification, the discovery of anomalies becomes more accurate and comprehensive.
[0097] Using real-time stream computing technology based on distributed memory, the discovered abnormal assets are combined with passive alarm information to obtain a comprehensive alarm result. Watermarks are introduced during the data fusion process to address data timing issues. Because of network and concurrency uncertainties, the input data for real-time machine learning can arrive out of order or incomplete, which affects the timeliness and accuracy of the results. Flink does not receive events strictly in Event Time order. When Flink receives data, it generates a Watermark according to certain rules: the Watermark equals the `maxEventTime` of all data that has arrived so far minus an allowed delay. In other words, the Watermark is generated from the timestamps carried by the data. Once the Watermark is later than the stop time of a currently untriggered window, the execution of that window is triggered. Since Event Time is carried by the data, if no new data arrives, untriggered windows will never be triggered.
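The watermark behavior described above can be imitated in plain Python to make the timing logic concrete. This is a simulation under stated assumptions, not Flink code: tumbling windows of fixed size, integer event times, a watermark of `max_event_time - max_delay`, windows firing once the watermark reaches their end time, and late events being dropped.

```python
def merge_alarms(events, window_size, max_delay):
    """Merge alarms from both sources per event-time window.

    events: (event_time, source, asset_ip) tuples in arrival order, where
    source is "G" (detected anomaly) or "G'" (passive alarm).
    Returns [(window_start, alarms_sorted_by_event_time), ...] in firing order.
    """
    open_windows = {}                # window_start -> buffered alarms
    fired = []
    max_event_time = float("-inf")
    for event_time, source, asset_ip in events:
        start = event_time // window_size * window_size
        if start + window_size <= max_event_time - max_delay:
            continue                 # window already fired: drop the late event
        open_windows.setdefault(start, []).append((event_time, source, asset_ip))
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - max_delay
        ready = sorted(s for s in open_windows if s + window_size <= watermark)
        for s in ready:
            fired.append((s, sorted(open_windows.pop(s))))
    return fired


# Hypothetical arrival order; the event at time 8 arrives after the one at 12.
arrivals = [
    (1, "G", "10.0.0.5"),
    (4, "G'", "10.0.0.5"),
    (12, "G", "10.0.0.9"),
    (8, "G'", "10.0.0.7"),
    (14, "G'", "10.0.0.9"),
    (25, "G", "10.0.0.1"),
]
merged = merge_alarms(arrivals, window_size=10, max_delay=3)
```

In this run the out-of-order alarm at time 8 still lands in the first window because the watermark had only reached 9 when it arrived, and the window starting at 20 never fires because no later event advances the watermark past its end.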
[0098] The specific implementation of this invention comprises seven parts:
[0099] 1) Categorization and data loading of enterprise asset filing information;
[0100] 2) Real-time data collection of enterprise assets;
[0101] 3) Data noise reduction processing;
[0102] 4) Data storage;
[0103] 5) Data Analysis, Modeling, and Machine Learning
[0104] 6) Result storage and display
[0105] 7) Verification of the accuracy of abnormal asset discovery results and comprehensive alerts
[0106] The stages of data aggregation, loading, streaming data acquisition, cleaning, and storage are collectively referred to as the data preprocessing stage. As shown in Figure 3, changes in asset status in the enterprise asset registration information database are captured in real time through change data capture (CDC) technology, which serves as the main data source for known asset data characteristics. Enterprise asset network traffic data is obtained by accessing the enterprise's routing equipment.
[0107] Since the enterprise asset data brought in through dedicated traffic devices contains a huge amount of information, the data is first cleaned according to the actual data requirements of this invention, retaining useful information in the streaming data, such as: source asset IP, source asset port, destination asset IP, destination asset port, etc. The cleaned data is written to the Kafka message queue cluster in real time, waiting for the next step of processing. In the implementation of this invention, different topics are created for each type of data source, and the information retention time in the Kafka message queue is set to 24 hours.
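The cleaning step can be sketched as a projection onto the retained fields plus basic validation. The field names (`src_ip`, `src_port`, `dst_ip`, `dst_port`) and the rule of discarding malformed records are assumptions for illustration; writing the result to Kafka is outside this sketch.

```python
import ipaddress

KEPT_FIELDS = ("src_ip", "src_port", "dst_ip", "dst_port")  # assumed field names


def clean_record(raw):
    """Keep only the useful fields of a raw flow record, or return None if malformed."""
    try:
        record = {field: raw[field] for field in KEPT_FIELDS}
        ipaddress.ip_address(record["src_ip"])   # raises ValueError if invalid
        ipaddress.ip_address(record["dst_ip"])
        for field in ("src_port", "dst_port"):
            record[field] = int(record[field])
            if not 0 <= record[field] <= 65535:
                return None
    except (KeyError, ValueError, TypeError):
        return None
    return record


# Hypothetical raw records.
good = clean_record({"src_ip": "10.0.0.1", "src_port": "443",
                     "dst_ip": "10.0.0.2", "dst_port": 51000, "payload": "x"})
bad = clean_record({"src_ip": "not-an-ip", "src_port": 1,
                    "dst_ip": "10.0.0.2", "dst_port": 2})
```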
[0108] To improve the utilization of cleaned data and make it easier to trace problems back, in the implementation of this invention the cleaned and denoised real-time data is written back to Kafka. The Flink real-time stream computing engine matches it against the registered asset IPs and uses side output streams to split the denoised traffic into unknown device traffic and known device traffic. The two types of raw traffic are saved into files and loaded into the Hive data warehouse in batches at regular intervals. In the Hive data warehouse, tables are created and stored for unknown device traffic, known device traffic, and historical results data of various analyses, for subsequent analysis, display, and accuracy verification.
[0109] Based on the model described in the results, the collected and cleaned data is vectorized, and the data is processed in real time according to the algorithm proposed above. This invention uses Cloudera Manager Community Edition as a big data platform for storage and analysis, and converts the cleaned data in the Hive library into RDDs in real time. It performs correlation analysis and preprocessing through Spark SQL and Spark Streaming, and temporarily stores the preprocessed result set in memory for Spark MLlib machine learning.
[0110] In the implementation of this invention, some known terminal devices and servers are used as the labeled sample set. The scikit-learn decision tree algorithm is employed to calculate the weight of each feature, analyze the characteristics of unknown assets, and label the types of unknown assets. Flink is used to fuse the data, and the fused results are also stored in the ClickHouse database.
[0111] For the demonstration and verification part, as shown in Figure 4, the invention provides a simple web interface for real-time statistics of the total number of various unknown assets and the total number of known abnormal assets, and provides a batch query function for use in accuracy verification.
[0112] Accuracy verification is mainly achieved through a web project, which uploads a batch of known asset traffic and unknown asset traffic information to the system for comparative analysis, and finally outputs the accuracy of the results.
[0113] Based on the above process, the subsystem is designed with a data receiving and storage layer, a data analysis layer, and a data application and presentation layer, corresponding to the data warehouse, outlier detection, and aggregated alarm modules in the diagram, respectively. The data analysis layer uses a known asset anomaly identification method based on a clustering outlier detection algorithm to acquire anomaly assets from known assets in real time; it uses a weighted analysis model combining its own characteristics and asset flow analysis to obtain unknown assets; and finally, asset anomaly detection and comprehensive alarms are achieved through the fusion of asset anomaly information.
[0114] The above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit them. Those skilled in the art can modify or make equivalent substitutions to the technical solutions of the present invention without departing from the spirit and scope of the present invention. The scope of protection of the present invention should be determined by the claims.
Claims
1. A method for detecting network anomaly assets based on the assets' own characteristics, wherein the assets include client assets and server assets, the client assets include PC terminals and mobile terminal devices, and the server assets include servers, server-related independent storage devices, network switches, routers, firewalls for security protection, terminal antivirus devices, host antivirus devices, and server vulnerability scanning devices, the method comprising the steps of: constructing an asset registration database using the network flow data of known assets and, based on the network flow data, obtaining unknown asset flow data by identifying the network flow data; classifying assets according to the streaming data characteristics of the unknown asset flow data to obtain unknown asset types and unknown asset application types; performing outlier detection on the streaming data characteristics of the known assets to identify known assets that exhibit anomalies; obtaining abnormal asset detection results based on the unknown assets, the unknown asset types, the unknown asset application types, and the known assets, wherein the unknown assets and the abnormal known assets are added to an asset anomaly identification blacklist, and registered assets with unknown information associations are placed in a gray list; and fusing the alarm information of the abnormal asset detection results with the passive alarm information of the system; wherein fusing the alarm information of the abnormal asset detection results with the passive alarm information of the system includes: generating a Watermark according to the alarm information and the passive alarm information received by Flink; and, if the time of the Watermark is later than the stop time of any currently untriggered window, triggering the corresponding window, thereby resolving data ordering issues and fusing the alarm information with the passive alarm information.
2. The method as described in claim 1, characterized in that the streaming data characteristics include: common port rules, the ratio of logs flowing into and out of unknown assets, the network traffic relationship with registered assets, and online duration; the common port rules include: the port numbers used by the Transmission Control Protocol and the User Datagram Protocol; the ratio of logs flowing into and out of unknown assets includes: the ratio of inflow logs to outflow logs of unknown assets in different time periods in the logs of the unknown asset flow data; the unknown asset types include: network terminal assets and server assets; and the application types of server assets include: database server assets, interface server assets, file server assets, real-time computing server assets, and big data platform server assets.
3. The method as described in claim 2, characterized in that, when classifying assets, the following strategies are used to obtain the unknown asset types and unknown asset application types: based on the ratio of logs flowing into and out of unknown assets and a first threshold, assets with high traffic are labeled as server assets and assets with low traffic are labeled as terminal device assets; based on the network traffic relationship with registered assets and a second threshold, assets with high traffic are labeled as server assets and assets with low traffic are labeled as terminal device assets; based on online duration and a third threshold, assets with longer durations are labeled as server assets and assets with shorter durations are labeled as terminal device assets; and, when a server asset is identified, a comprehensive analysis is performed based on the common port rules and resource configuration information to determine the application type of the server asset.
4. The method as described in claim 1, characterized in that the streaming data characteristics include: inherent characteristics of the assets and equipment, common port rules, inflow and outflow traffic ratio, and online duration.
5. The method as described in claim 1, characterized in that outlier detection is performed using the following steps: 1) clustering the known assets based on the streaming data characteristics to obtain several clusters; 2) calculating the centroid of each cluster; 3) calculating, for each known asset, the distance to the nearest centroid and the relative distance; 4) obtaining the outlier detection results by comparing the distance and the relative distance with given thresholds, respectively.
6. The method as described in claim 1, characterized in that outlier detection is performed using the following steps: 1) based on the streaming data characteristics, performing a first clustering on all known assets to obtain several clusters and a first set of outliers, wherein the fitted value between each known asset in the first set of outliers and each cluster is less than a set value; 2) performing a second clustering on the set of known assets from which the first set of outliers has been deleted, to obtain several clusters and a set of outlier points, and deleting from the outlier points the known assets that strongly belong to any cluster, to obtain a second set of outliers; 3) merging the first set of outliers and the second set of outliers to obtain a merged set of outliers; 4) when the set conditions are met, taking the merged set of outliers as the outlier detection result.
7. The method as described in claim 1, characterized in that the accuracy of classification and outlier detection is improved by the following steps: 1) classifying known assets according to their streaming data characteristics, and learning the classification by comparing the classification results with the known assets; 2) performing outlier detection on the streaming data characteristics of unknown assets whose asset types have been identified, and learning from the outlier detection results by comparing them with the unknown assets; 3) performing classification and outlier detection on manually discovered unknown assets and known assets, so as to learn the classification method or the outlier detection method.
8. An electronic device comprising a memory and a processor, the memory storing a computer program, the processor being configured to run the computer program to perform the method as claimed in any one of claims 1-7.
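The watermark rule recited in claim 1 can be pictured with a small, library-free sketch. It does not use the Flink API; it only imitates the event-time idea with tumbling windows. The class name `AlarmFuser`, the parameters `window_size` and `max_delay`, the choice of watermark as the largest event time seen minus `max_delay`, the firing condition "window end reached or passed", and the dropping of alarms that arrive after their window has fired are all assumptions of this illustration.

```python
from collections import defaultdict


class AlarmFuser:
    """Toy event-time fusion of detection alarms and passive alarms (not Flink)."""

    def __init__(self, window_size, max_delay):
        self.window_size = window_size    # tumbling window length in seconds
        self.max_delay = max_delay        # tolerated out-of-orderness in seconds
        self.watermark = float("-inf")
        self.windows = defaultdict(list)  # window start -> buffered alarms

    def on_alarm(self, event_time, source, asset):
        """Buffer one alarm, advance the watermark, and return any fired windows."""
        start = event_time - event_time % self.window_size
        if start + self.window_size <= self.watermark:
            return []  # window already fired; late alarms are dropped in this sketch
        self.windows[start].append((event_time, source, asset))
        self.watermark = max(self.watermark, event_time - self.max_delay)

        fired = []
        for s in sorted(self.windows):
            # Fire every untriggered window whose end the watermark has reached.
            if s + self.window_size <= self.watermark:
                merged = defaultdict(set)
                for _, src, a in self.windows.pop(s):
                    merged[a].add(src)
                fired.append((s, {a: sorted(srcs) for a, srcs in merged.items()}))
        return fired


# Made-up alarms: (event time, source, asset). The fourth one arrives out of order.
fuser = AlarmFuser(window_size=60, max_delay=10)
events = [
    (5, "detection", "10.0.0.9"),
    (50, "passive", "10.0.0.9"),
    (65, "passive", "10.0.0.7"),
    (40, "detection", "10.0.0.7"),
    (72, "detection", "10.0.0.7"),
]
out = []
for ts, source, asset in events:
    out.extend(fuser.on_alarm(ts, source, asset))
```

The out-of-order alarm at time 40 is still fused into the first window because the watermark (55 at that point) has not yet reached the window end of 60; the window fires only when the alarm at time 72 pushes the watermark to 62.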