Computer network equipment data processing method and system based on cloud computing

By performing tenant-level splitting and dual-modal fusion encoding on multi-tenant traffic data of shared network devices in a cloud computing environment, the accuracy and integrity issues of fault data processing in mixed and encrypted environments with multi-tenant data superposition are solved, enabling accurate identification of tenant-level fault events and identification of propagation paths.

CN122241275A · Pending · Publication Date: 2026-06-19 · SHENZHEN DIWO VIDEO DIGITAL TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SHENZHEN DIWO VIDEO DIGITAL TECH CO LTD
Filing Date: 2026-04-30
Publication Date: 2026-06-19

IPC: G06F18/231; G06F18/213; G06F18/25; G06F40/30; G06N3/045; G06N3/0895; H04L41/0677; H04L67/10; G06F123/02

AI Tagging

Application Domain

Semantic analysis Biological models

AI Technical Summary

Technical Problem

In cloud computing multi-tenant data centers, existing technologies struggle to effectively isolate and integrate multi-tenant traffic data from shared network devices, resulting in insufficient accuracy and completeness of fault data processing results. This is especially true in encrypted traffic environments, where tenant anomalies are easily masked, leading to serious misjudgments and missed detections.

Method used

The encrypted flow metadata is split by a tenant isolation identifier mapping table to extract tenant-level traffic features. A pre-trained network operation and maintenance semantic encoder is used to generate log semantic embedding vectors. A dual-modal fusion encoder is combined to perform tenant-level fusion processing to generate tenant-level fusion semantic vectors. Finally, hierarchical agglomeration clustering is performed to generate a multimodal semantic event cluster set.

Benefits of technology

It achieves effective isolation and fusion of multi-tenant traffic data, improves the accuracy and completeness of fault data processing, reduces the possibility of misjudgment and missed detection, and can accurately identify tenant-level fault events and propagation paths.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention relates to the field of cloud computing network operation and maintenance technology, and discloses a method and system for processing computer network equipment data based on cloud computing. The method includes: acquiring the original log dataset, encrypted flow metadata record set, and tenant isolation identifier mapping table of shared network equipment; performing tenant-level splitting and traffic feature extraction based on tenant isolation identifiers; generating log semantic embedding vectors and performing tenant-level association pairing; generating tenant-level fused semantic vectors through a dual-modal fusion encoder; performing hierarchical agglomerative clustering according to the tenant dimension to generate a set of fault event clusters; inferring the independent fault propagation link graph of each tenant on the network topology; performing cross-tenant association analysis and generating comprehensive fault data processing results.

Description

Technical Field

[0001] This invention relates to the field of cloud computing network operation and maintenance technology, and more specifically, to a cloud computing-based computer network equipment data processing method and system.

Background Technology

[0002] In a cloud-based multi-tenant data center, network devices such as core switches and top-of-rack switches are shared by multiple tenants, carrying fully encrypted network traffic. During operation, each network device continuously generates two types of heterogeneous data: device operation logs and encrypted stream metadata. When a network device fails, the operations and maintenance platform needs to process and analyze this data to locate the fault propagation path and identify the specific tenants affected.

[0003] Existing cloud computing network equipment data processing technologies typically perform feature extraction and cluster analysis on all data on shared devices, and directly perform fault event classification and link inference based on device-level granularity.

[0004] However, in a cloud computing environment, the data carried on shared network devices is a superimposed record of traffic from multiple tenants. When extracting features from this mixed data, the traffic behavior patterns of different tenants interfere with each other. Normal business changes of one tenant can alter the overall data characteristics of the device, leading to misjudgments, while genuine anomalies of one tenant may be masked by normal traffic from other tenants, resulting in missed detections. Furthermore, in encrypted traffic environments, device log text content is simplified and lacks semantic differentiation. Key fault clues are hidden in the encrypted traffic behavior patterns of specific tenants. Existing technologies do not perform tenant-level fusion processing of log semantics and flow metadata, resulting in insufficient accuracy and completeness of network device fault data processing results.

Summary of the Invention

[0005] This invention provides a cloud computing-based computer network equipment data processing method and system, which solves the technical problems in related technologies such as the difficulty in effectively isolating fault data of shared network equipment in cloud computing according to the tenant dimension, the insufficient ability to fuse and analyze multi-source heterogeneous data, and the difficulty in accurately clustering fault semantic events.

[0006] This invention discloses a cloud computing-based computer network device data processing method, comprising: acquiring a raw log dataset, an encrypted flow metadata record set, and a tenant isolation identifier mapping table of a cloud computing shared network device within a target time range; performing tenant-level splitting on the encrypted flow metadata record set based on the tenant isolation identifier mapping table, separating the flow metadata on the same network device into multiple tenant sub-flow metadata datasets according to tenant identifiers, extracting traffic features from each tenant sub-flow metadata dataset, and generating a tenant-level traffic feature vector; generating a log semantic embedding vector for each log entry in the raw log dataset using a pre-trained network operation and maintenance semantic encoder, and performing spatiotemporal association matching between the timestamps and IP port information in the log entries and the 5-tuples and timestamps in each tenant sub-flow metadata dataset to generate log-flow association pairs with tenant identifiers; for each association pair, inputting the log semantic embedding vector and the corresponding tenant-level traffic feature vector into the dual-modal fusion encoder and generating a tenant-level fusion semantic vector through bidirectional cross-attention calculation; and grouping all tenant-level fusion semantic vectors according to the tenant dimension and performing hierarchical agglomerative clustering within each tenant group, using the cosine distance between vectors as the merging criterion, to generate a set of multimodal semantic event clusters exclusive to each tenant, which is output as the result of fault data processing.

[0007] Furthermore, the extraction of traffic features for each tenant sub-flow metadata includes: calculating the mean, variance, skewness, and kurtosis of the packet size sequence as statistical distribution features; calculating the autocorrelation coefficient of the packet interval time series as a time rhythm feature; counting the occurrence frequency of adjacent state pairs in the TLS handshake phase identifier sequence, dividing the occurrence frequency of each state pair by the total occurrence frequency of the corresponding predecessor state to obtain the transition probability between each handshake state, arranging the transition probabilities of all state pairs in a fixed order and concatenating them into a state transition feature sub-vector; concatenating the above features in each dimension to form a tenant-level traffic feature vector, and applying Z-score standardization to each dimension component.

[0008] Furthermore, for flow records in the 5-tuple where the source and destination belong to different tenants, the flow record is copied to the metadata set of each tenant's sub-flow, and a direction marker field is added to the copied record to distinguish whether the tenant is the sender or receiver of the traffic.

[0009] Furthermore, the spatiotemporal correlation matching includes: in the time dimension, using the timestamp of the log entry as the anchor point, searching for flow records whose timestamps fall within the preset time window as candidates; in the network address dimension, matching the IP address and port information parsed from the log entry with the five-tuple of the candidate flow record, and determining a successful match when the IP address and port information match the source or destination in the five-tuple; for cases where a log entry is successfully matched with the flow records of multiple tenants at the same time, the log entry is included in the association pairs of each matching tenant, and each association pair is processed independently in the subsequent fusion encoding stage.

[0010] Furthermore, the dual-modal fusion encoder comprises a log semantic projection layer, a traffic feature projection layer, a cross-attention layer, and a fusion output layer. Specifically: the log semantic projection layer projects the log semantic embedding vector onto a shared semantic space through a linear transformation, generating a log projection vector; the traffic feature projection layer projects the tenant-level traffic feature vector onto the same shared semantic space through a linear transformation, generating a traffic projection vector; the cross-attention layer uses the log projection vector as the query and the traffic projection vector as the key and value, calculating a first attention output through scaled dot product attention, and simultaneously uses the traffic projection vector as the query and the log projection vector as the key and value, calculating a second attention output through scaled dot product attention, where the scaling factor is the square root of the vector dimension of the shared semantic space; the fusion output layer concatenates the first and second attention outputs and maps them through a fully connected layer to generate a tenant-level fusion semantic vector.

[0011] Furthermore, the training of the dual-modal fusion encoder adopts a contrastive learning approach, using association pairs of the same tenant and the same fault type as positive sample pairs and association pairs of different tenants or different fault types as negative sample pairs. A contrastive loss function is used for training and is computed as follows. For each sample in a batch, the cosine similarity between the sample's tenant-level fusion semantic vector and the tenant-level fusion semantic vector of its positive sample is divided by the temperature hyperparameter, and the exponential of the result is used as the numerator. The cosine similarity between the sample's tenant-level fusion semantic vector and the tenant-level fusion semantic vector of each other sample in the batch is likewise divided by the temperature hyperparameter and exponentiated, and the sum of these exponentials over all other samples is used as the denominator. The negative logarithm of the ratio of the numerator to the denominator is taken, and the average of this value over all samples in the batch is the contrastive loss function value.

[0012] Furthermore, the hierarchical agglomerative clustering includes: taking each vector as the initial cluster, calculating the cosine distance between all cluster pairs in each iteration, merging the two clusters with the smallest distance, and repeating this process until the cosine distance between any two clusters is greater than a preset merging threshold; the calculation of the cosine distance between cluster pairs adopts the average link criterion, that is, taking the average of the cosine distances between all vector pairs in two clusters as the distance metric of the cluster pair, wherein the cosine distance between a single vector pair is equal to 1 minus the cosine similarity of the vector pair.

[0013] Furthermore, it also includes: for each tenant's multimodal semantic event cluster set, extracting the device identifier and timestamp associated with the records in each event cluster, and locating the corresponding node on the network topology graph; for any two event clusters within the same tenant, calculating the topology hop count, time difference, and packet size distribution similarity of the tenant-level traffic feature vectors in the two event clusters; when the shortest path hop count of the network device nodes corresponding to the two event clusters on the network topology graph does not exceed a preset hop count threshold, the difference between the mean timestamps of the two event clusters is within the preset time interval formed by the minimum propagation delay and the maximum propagation delay, and the time sequence is consistent with the topology direction, and the cosine similarity between the packet size distribution features of the two event clusters is not lower than a preset distribution similarity threshold, establishing causal association edges and generating an independent fault propagation link graph for each tenant.
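The edge-creation test in paragraph [0013] can be sketched as a single predicate. This is a minimal illustration, not the patent's implementation: the dictionary fields (`device`, `mean_ts`, `size_feat`), the function names, and the choice to model "consistent with the topology direction" as a directed adjacency list in which the earlier cluster must sit upstream are all assumptions.

```python
import math
from collections import deque

def hop_count(adjacency, src, dst):
    """Shortest-path hop count from src to dst via BFS; None if unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def has_causal_edge(up, down, adjacency, max_hops, min_delay, max_delay, min_sim):
    """True when all conditions for an edge up -> down hold."""
    hops = hop_count(adjacency, up["device"], down["device"])
    delta = down["mean_ts"] - up["mean_ts"]  # positive when `up` happened first
    return (
        hops is not None
        and hops <= max_hops                      # topology hop threshold
        and min_delay <= delta <= max_delay       # propagation delay interval
        and cosine_similarity(up["size_feat"], down["size_feat"]) >= min_sim
    )

# Hypothetical topology and event clusters for one tenant.
adjacency = {"core": ["tor1", "tor2"], "tor1": [], "tor2": []}
up = {"device": "core", "mean_ts": 100.0, "size_feat": [1.0, 0.0]}
down = {"device": "tor1", "mean_ts": 100.5, "size_feat": [0.9, 0.1]}
edge = has_causal_edge(up, down, adjacency, 2, 0.1, 2.0, 0.9)
```

Running the predicate over every ordered pair of event clusters within one tenant would yield that tenant's propagation link graph.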

[0014] Furthermore, it also includes: performing cross-tenant correlation analysis on the independent fault propagation link graphs of all tenants, taking the average timestamp of a fault event cluster of a certain tenant on a certain device node as the center, and extending forward and backward by a preset time radius to form a time interval; when the average timestamp of a fault event cluster of another tenant on the same device node falls within this time interval, it is determined that the fault events of the two tenants have spatiotemporal overlap on that device node, and the device node is marked as a cross-tenant common fault node; the cross-tenant correlation information is attached to the independent fault propagation link graphs of each tenant, generating the comprehensive fault data processing results of cloud computing network devices and storing them in the fault analysis database.
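The spatiotemporal overlap check in paragraph [0014] can be illustrated as follows. This is a sketch under stated assumptions: the record fields (`tenant`, `device`, `mean_ts`) and the function name are hypothetical, and the time radius is taken to be in the same unit as the timestamps.

```python
from collections import defaultdict
from itertools import combinations

def common_fault_nodes(clusters, radius):
    """Map each cross-tenant common fault node to the set of tenants overlapping on it."""
    by_device = defaultdict(list)
    for cluster in clusters:
        by_device[cluster["device"]].append(cluster)
    marked = {}
    for device, items in by_device.items():
        for a, b in combinations(items, 2):
            # Overlap: the other tenant's mean timestamp lies within +/- radius.
            if a["tenant"] != b["tenant"] and abs(a["mean_ts"] - b["mean_ts"]) <= radius:
                marked.setdefault(device, set()).update({a["tenant"], b["tenant"]})
    return marked

# Hypothetical fault event clusters from three tenants.
clusters = [
    {"tenant": "A", "device": "SW-TOR-07", "mean_ts": 100.0},
    {"tenant": "B", "device": "SW-TOR-07", "mean_ts": 103.0},
    {"tenant": "C", "device": "SW-TOR-07", "mean_ts": 200.0},
    {"tenant": "A", "device": "SW-CORE-01", "mean_ts": 100.0},
]
shared = common_fault_nodes(clusters, radius=5.0)
```

Because the interval is symmetric around each cluster's mean timestamp, testing the absolute difference once per pair is equivalent to checking both directions.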

[0015] This invention discloses a cloud computing-based computer network device data processing system, comprising: a multi-source data acquisition module, used to acquire raw log datasets, encrypted flow metadata record sets, and tenant isolation identifier mapping tables of cloud computing shared network devices within a target time range; a tenant-level splitting and feature extraction module, used to perform tenant-level splitting on the encrypted flow metadata record set based on the tenant isolation identifier mapping table, separating the flow metadata on the same network device into multiple tenant sub-flow metadata datasets according to tenant identifiers, extracting traffic features from each tenant sub-flow metadata dataset, and generating a tenant-level traffic feature vector; an association and pairing module, used to generate a log semantic embedding vector for each log entry in the raw log dataset using a pre-trained network operation and maintenance semantic encoder, and to perform spatiotemporal association matching with each tenant sub-flow metadata dataset based on timestamps and IP port information to generate log-flow association pairs with tenant identifiers; a dual-modal fusion encoding module, used to input, for each association pair, the log semantic embedding vector and the corresponding tenant-level traffic feature vector into the dual-modal fusion encoder and to generate tenant-level fusion semantic vectors through bidirectional cross-attention calculation; and a clustering processing module, used to group all tenant-level fusion semantic vectors according to the tenant dimension and to perform hierarchical agglomerative clustering within each tenant group, using the cosine distance between vectors as the merging criterion, to generate a set of multimodal semantic event clusters exclusive to each tenant.

[0016] The beneficial effects of this invention are as follows: This invention performs tenant-level splitting of encrypted flow metadata record sets based on a tenant isolation identifier mapping table. This separates the superimposed traffic data of multiple tenants on a shared network device into independent tenant sub-flow metadata datasets for each tenant during the feature extraction stage. This prevents traffic behavior patterns of different tenants from interfering with each other during feature extraction, reducing the possibility of misjudgments due to fluctuations in mixed features. Furthermore, it preserves the true anomaly signals of a single tenant within an independent feature space, reducing the possibility of missed detections. Simultaneously, a dual-modal fusion encoder performs tenant-level fusion processing on tenant-level traffic feature vectors and log semantic embedding vectors. Utilizing bidirectional computation with cross-attention, it allows insufficiently discriminative parts of the log semantics to obtain supplementary information from the encrypted traffic behavior patterns of the same tenant. This improves the accuracy of event semantic differentiation in fault data processing results, solving the technical problems of insufficient accuracy and completeness in fault data processing results caused by the mixed superposition of multi-tenant data on cloud computing shared network devices and insufficient log semantics in encrypted environments.

Attached Figure Description

[0017] Figure 1 is a flowchart of a cloud computing-based computer network device data processing method provided in an embodiment of the present invention; Figure 2 is a schematic diagram of the distribution of encrypted flow metadata records for each tenant on SW-TOR-07 provided in an embodiment of the present invention; Figure 3 is a schematic diagram of the comparison (after standardization) of key dimensions of tenant-level traffic feature vectors provided in an embodiment of the present invention; Figure 4 is a schematic diagram of the TLS handshake state transition probabilities of tenant A provided in an embodiment of the present invention; Figure 5 is a schematic diagram of the statistical results of the association and pairing of log entries and flow records provided in an embodiment of the present invention; Figure 6 is a schematic diagram comparing the number of clustering input vectors for each tenant with the final number of event clusters provided in an embodiment of the present invention; Figure 7 is a schematic diagram of the vector distribution within each event cluster of tenant A provided in an embodiment of the present invention; Figure 8 is a schematic diagram of the vector dimension changes at each stage of the dual-modal fusion encoder provided in an embodiment of the present invention.

Detailed Implementation

[0018] The subject matter described herein will now be discussed with reference to exemplary embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and changes may be made to the function and arrangement of the elements discussed without departing from the scope of this specification. Various processes or components may be omitted, substituted, or added as needed in the examples. Furthermore, some features described in the examples may be combined in other examples.

[0019] This embodiment provides a cloud computing-based computer network device data processing method which, as shown in Figure 1, includes the following steps:

Step 1: Obtain multi-source heterogeneous datasets from shared network devices in cloud computing.

The system acquires the raw log dataset, encrypted flow metadata record set, and tenant isolation identifier mapping table reported by each shared network device in the cloud computing platform within a target time range, generating a multi-source device data set to be processed. Each encrypted flow metadata record includes 5-tuple information, a packet size sequence, a packet interval time sequence, and a TLS handshake phase identifier sequence. The tenant isolation identifier mapping table records the tenant identifier corresponding to each flow record.

[0020] It should be noted that the mapping relationship between flow records and tenant identifiers in the aforementioned tenant isolation identifier mapping table refers to the correspondence between each flow record and a specific tenant based on the network isolation protocol field. The establishment of this mapping relationship is based on at least one of the VLAN ID, the VXLAN VNI, or the MPLS label. Different multi-tenant isolation schemes use different identifier fields, and the operation and maintenance platform reads the corresponding field to complete the mapping according to the isolation scheme deployed in the data center.

[0021] Step 2: Perform tenant-level splitting and feature extraction on encrypted flow metadata based on tenant isolation identifiers.

Tenant-level splitting is performed on the encrypted flow metadata record set based on the tenant isolation identifier mapping table, separating the flow metadata on the same network device into multiple tenant sub-flow metadata datasets according to the tenant identifier. Traffic features are extracted from each tenant sub-flow metadata dataset to generate a tenant-level traffic feature vector for each tenant on each device.

[0022] It should be noted that the traffic features extracted from each tenant sub-stream metadata dataset mentioned above refer to multi-dimensional statistics calculated independently from the split single-tenant flow records. Specifically, statistical distribution characteristics, including mean, variance, skewness, and kurtosis, are calculated for the packet size sequence; the autocorrelation coefficient is calculated for the packet interval time series to characterize the temporal rhythm of the tenant's traffic; and state transition characteristics are calculated for the TLS handshake phase identifier, i.e., the transition probabilities between each handshake state are statistically analyzed to capture the behavioral patterns of the tenant's encrypted connection establishment process. The above-mentioned dimensional features are concatenated to form the tenant-level traffic feature vector for that tenant on this device.

[0023] Furthermore, the calculation method for the state transition features of the TLS handshake phase identifiers mentioned above is as follows: the TLS handshake phase identifier sequence in each flow record is regarded as a state sequence, the occurrence frequency of adjacent state pairs in the sequence is counted, the occurrence frequency of each state pair is divided by the total occurrence frequency of the corresponding predecessor state to obtain the transition probability between each handshake state, and the transition probabilities of all state pairs are arranged in a fixed order and concatenated into a state transition feature sub-vector, which is then incorporated into the tenant-level traffic feature vector.
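The transition-probability calculation in paragraph [0023] can be sketched as below. The handshake phase labels in `STATES` are hypothetical, since the source does not enumerate them, and the predecessor count is taken over occurrences that actually have a successor, which is one reading of "total occurrence frequency of the corresponding predecessor state".

```python
from collections import Counter
from itertools import product

# Hypothetical handshake phase labels; the source does not list them.
STATES = ["client_hello", "server_hello", "certificate", "finished"]

def transition_features(sequences, states=STATES):
    """Return P(next | prev) for every ordered state pair, in a fixed order."""
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1  # counted only when a successor exists
    # Fixed ordering: row-major over `states`, so vectors are comparable across tenants.
    return [
        pair_counts[(a, b)] / prev_counts[a] if prev_counts[a] else 0.0
        for a, b in product(states, repeat=2)
    ]

feats = transition_features([
    ["client_hello", "server_hello", "certificate", "finished"],
    ["client_hello", "server_hello", "finished"],
])
```

With four states the sub-vector has 16 components, and each row for a state that has at least one successor sums to 1.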

[0024] After completing the feature concatenation of each dimension, the Z-score standardization process is applied to each dimension component in the tenant-level traffic feature vector to eliminate the impact of the difference in the scale between different statistics on subsequent projection and similarity calculations.
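The statistical, rhythm, and standardization steps of paragraphs [0022] and [0024] could look like the sketch below. Several details are assumptions because the source leaves them open: population (not sample) moments, a lag of 1 for the autocorrelation coefficient, and Z-score statistics computed across the collection of feature vectors being processed.

```python
import math

def moments(xs):
    """Mean, population variance, skewness and kurtosis of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return [mean, 0.0, 0.0, 0.0]
    std = math.sqrt(var)
    skew = sum(((x - mean) / std) ** 3 for x in xs) / n
    kurt = sum(((x - mean) / std) ** 4 for x in xs) / n
    return [mean, var, skew, kurt]

def autocorr(xs, lag=1):
    """Autocorrelation coefficient at the given lag (lag=1 is an assumption)."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    if denom == 0 or lag >= n:
        return 0.0
    num = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return num / denom

def zscore_columns(rows):
    """Standardize each dimension across a list of equal-length feature vectors."""
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        # A constant dimension carries no information; map it to zeros.
        out_cols.append([(v - mean) / std if std else 0.0 for v in col])
    return [list(r) for r in zip(*out_cols)]

size_stats = moments([1, 2, 3, 4, 5])
rhythm = autocorr([1, 2, 3, 4, 5])
scaled = zscore_columns([[1.0, 10.0], [3.0, 10.0]])
```

A tenant-level traffic feature vector would then be the concatenation of `moments(packet_sizes)`, `autocorr(intervals)`, and the TLS state transition sub-vector, standardized per dimension.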

[0025] In this embodiment of the application, in order to process the shared flow records existing among multiple tenant identifiers during the splitting process, for flow records in the 5-tuple where the source and destination belong to different tenants, the flow record is copied to the metadata set of each tenant sub-flow involved, and a direction marker field is added to the copied record to distinguish whether the tenant is the sender or receiver of the traffic, thereby completing the independent splitting without omitting cross-tenant traffic information.
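A minimal sketch of the splitting rule, including the cross-tenant copy described in paragraph [0025], is shown below. The `tenant_of_ip` lookup is a simplified, hypothetical stand-in for the tenant isolation identifier mapping table (which the source derives from VLAN, VXLAN, or MPLS fields), and the record field names are illustrative.

```python
from collections import defaultdict

def split_by_tenant(flows, tenant_of_ip):
    """Split flow records into per-tenant sub-flow sets.

    A flow whose source and destination belong to different tenants is copied
    into both tenants' sets with a direction marker on each copy.
    """
    sub_flows = defaultdict(list)
    for flow in flows:
        src_t = tenant_of_ip.get(flow["src_ip"])
        dst_t = tenant_of_ip.get(flow["dst_ip"])
        if src_t is not None and dst_t is not None and src_t != dst_t:
            sub_flows[src_t].append({**flow, "direction": "sender"})
            sub_flows[dst_t].append({**flow, "direction": "receiver"})
        else:
            owner = src_t if src_t is not None else dst_t
            if owner is not None:
                sub_flows[owner].append(dict(flow))
    return dict(sub_flows)

# Hypothetical addresses for two tenants sharing one device.
tenant_of_ip = {"10.0.0.1": "A", "10.0.0.2": "A", "10.0.1.1": "B"}
flows = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.1.1"},
]
result = split_by_tenant(flows, tenant_of_ip)
```

Copying rather than assigning the flow to a single tenant means feature extraction for each tenant sees its full traffic, at the cost of some duplicated records.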

[0026] Step 3: Generate log semantic embedding vectors and perform tenant-level association pairing of log and flow data.

For each log entry in the original log dataset, a pre-trained network operations semantic encoder is used to generate a log semantic embedding vector. Spatiotemporal association matching is performed based on the timestamp and IP port information in the log entry and the 5-tuple and timestamp in each tenant sub-flow metadata dataset. The successfully matched log entries are paired with the corresponding tenant's flow metadata records to generate log-flow association pairs with tenant identifiers.

[0027] It should be noted that the aforementioned network operations semantic encoder refers to a text encoder pre-trained using a large-scale log corpus in the field of network operations as training data and through a masked language modeling task. The input of the network operations semantic encoder is a log text, and the output is a fixed-dimensional log semantic embedding vector. In this embodiment, the network operations semantic encoder is used as a pre-trained inference component and does not involve the training process.

[0028] It should be noted that the aforementioned spatiotemporal correlation matching refers to the process of establishing a correspondence between log entries and tenant sub-flow metadata records based on both the time and network address dimensions. In the time dimension, using the timestamp of the log entry as the anchor point, flow records whose timestamps fall within a preset time window are retrieved as candidates. In the network address dimension, the IP address and port information parsed from the log entry are matched with the five-tuple of the candidate flow records. A successful match is determined when the IP address and port information match the source or destination in the five-tuple.

[0029] In this embodiment of the application, when a log entry is successfully matched with multiple tenants' flow records during the matching process, the log entry is included in the association pair of each matching tenant according to the number of matched flow records, and each association pair is processed independently in the subsequent fusion encoding stage to avoid information overlap between different tenants.
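The two-dimensional matching of paragraphs [0028] and [0029] can be sketched as follows. The field names, the symmetric window around the log timestamp, and the 5-second default are assumptions; the source only speaks of a preset time window.

```python
def match_log_to_flows(log, sub_flows, window=5.0):
    """Return (tenant_id, flow) association pairs for one parsed log entry."""
    pairs = []
    endpoint = (log["ip"], log["port"])
    for tenant_id, flows in sub_flows.items():
        for flow in flows:
            # Time dimension: the log timestamp is the anchor of the window.
            in_window = abs(flow["ts"] - log["ts"]) <= window
            # Address dimension: match either end of the 5-tuple.
            hits = endpoint in (
                (flow["src_ip"], flow["src_port"]),
                (flow["dst_ip"], flow["dst_port"]),
            )
            if in_window and hits:
                pairs.append((tenant_id, flow))
    return pairs

# Hypothetical log entry and per-tenant sub-flow sets.
log = {"ts": 100.0, "ip": "10.0.0.1", "port": 443}
sub_flows = {
    "A": [
        {"ts": 102.0, "src_ip": "10.0.0.1", "src_port": 443, "dst_ip": "10.0.0.9", "dst_port": 5000},
        {"ts": 120.0, "src_ip": "10.0.0.1", "src_port": 443, "dst_ip": "10.0.0.9", "dst_port": 5001},
    ],
    "B": [
        {"ts": 99.0, "src_ip": "10.0.1.7", "src_port": 6000, "dst_ip": "10.0.0.1", "dst_port": 443},
    ],
}
pairs = match_log_to_flows(log, sub_flows)
```

Here the same log entry yields one pair for tenant A and one for tenant B, and each pair would be fused independently in the next step.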

[0030] Step 4: Perform bimodal fusion encoding on the log-flow association pairs.

For each log-flow association pair with a tenant identifier, the log semantic embedding vector and the corresponding tenant-level traffic feature vector are input into a bimodal fusion encoder to generate a tenant-level fused semantic vector. For isolated log entries and isolated flow records that fail to pair, their unimodal vectors are used as semantic representations.

[0031] It should be noted that the aforementioned bimodal fusion encoder consists of four components: a log semantic projection layer, a traffic feature projection layer, a cross-attention layer, and a fusion output layer. The input to the bimodal fusion encoder is an association pair formed by the log semantic embedding vector $e_{log}$ and the tenant-level traffic feature vector $f_{flow}$, and the output is the tenant-level fused semantic vector $z$, where $e_{log}$ is the log semantic embedding vector, $f_{flow}$ is the tenant-level traffic feature vector, and $z$ is the tenant-level fused semantic vector.

[0032] Furthermore, the log semantic projection layer receives the log semantic embedding vector $e_{log}$, projects it onto the shared semantic space through a linear transformation, and outputs the log projection vector $p_{log}$. The calculation process is as follows:

$$p_{log} = W_{log}\, e_{log} + b_{log}$$

where $p_{log}$ is the log projection vector, $W_{log}$ is the weight matrix of the log semantic projection layer, and $b_{log}$ is the bias vector of the log semantic projection layer.

[0033] Furthermore, the traffic feature projection layer receives the tenant-level traffic feature vector $f_{flow}$, projects it onto the same shared semantic space through a linear transformation, and outputs the traffic projection vector $p_{flow}$. The calculation process is as follows:

$$p_{flow} = W_{flow}\, f_{flow} + b_{flow}$$

where $p_{flow}$ is the traffic projection vector, $W_{flow}$ is the weight matrix of the traffic feature projection layer, and $b_{flow}$ is the bias vector of the traffic feature projection layer.

[0034] Furthermore, the cross-attention layer uses $p_{log}$ as the query and $p_{flow}$ as both key and value to compute the first attention output $a_{log \to flow}$, and at the same time uses $p_{flow}$ as the query and $p_{log}$ as both key and value to compute the second attention output $a_{flow \to log}$. The attention outputs in both directions are calculated using scaled dot product attention. The calculation process is as follows:

$$a_{log \to flow} = \mathrm{softmax}\!\left(\frac{p_{log}^{\top}\, p_{flow}}{\sqrt{d}}\right) p_{flow}$$

$$a_{flow \to log} = \mathrm{softmax}\!\left(\frac{p_{flow}^{\top}\, p_{log}}{\sqrt{d}}\right) p_{log}$$

where $d$ is the vector dimension of the shared semantic space, $\top$ indicates transpose, $a_{log \to flow}$ represents the attention of the log modality to the traffic modality, and $a_{flow \to log}$ represents the attention of the traffic modality to the log modality.

[0035] Furthermore, the fusion output layer concatenates $a_{log \to flow}$ and $a_{flow \to log}$ and maps the result through a fully connected layer to generate the tenant-level fused semantic vector $z$. The calculation process is as follows:

$$z = W_{o}\,[\,a_{log \to flow} \,;\, a_{flow \to log}\,] + b_{o}$$

where $W_{o}$ is the weight matrix of the fusion output layer, $b_{o}$ is the bias vector of the fusion output layer, and $[\,\cdot\,;\,\cdot\,]$ indicates a vector concatenation operation.
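The four layers described in paragraphs [0031] to [0035] can be traced end to end with a small pure-Python sketch. It is illustrative only: the parameter names (`W_log`, `b_log`, `W_flow`, `b_flow`, `W_out`, `b_out`), the toy dimensions, and the hand-picked weights are assumptions, and a real encoder would learn its weights during training.

```python
import math

def linear(W, b, x):
    """y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query over key/value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

def fuse(e_log, f_flow, params):
    """Project both modalities, cross-attend in both directions, then fuse."""
    p_log = linear(params["W_log"], params["b_log"], e_log)
    p_flow = linear(params["W_flow"], params["b_flow"], f_flow)
    a_log_to_flow = attend(p_log, [p_flow], [p_flow])  # log queries traffic
    a_flow_to_log = attend(p_flow, [p_log], [p_log])   # traffic queries log
    return linear(params["W_out"], params["b_out"], a_log_to_flow + a_flow_to_log)

# Toy parameters: 2-d log embedding, 1-d traffic feature, 2-d shared space, 1-d output.
params = {
    "W_log": [[1.0, 0.0], [0.0, 1.0]], "b_log": [0.0, 0.0],
    "W_flow": [[1.0], [2.0]], "b_flow": [0.0, 0.0],
    "W_out": [[1.0, 1.0, 1.0, 1.0]], "b_out": [0.5],
}
z = fuse([1.0, 2.0], [3.0], params)
```

One observation on this sketch: when each modality contributes a single vector, the softmax runs over one score and its weight is 1, so each attention output equals the other modality's projection. The source does not say whether the vectors are split into several tokens before attention; `attend` accepts lists of keys and values so that case is covered too.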

[0036] Furthermore, the training of the dual-modal fusion encoder employs a contrastive learning approach. Association pairs of the same tenant and the same fault type are used as positive sample pairs, while association pairs of different tenants or different fault types are used as negative sample pairs. A contrastive loss function drives the tenant-level fusion semantic vectors to cluster similar fault events and separate dissimilar fault events in the semantic space. Specifically, the contrastive loss function $\mathcal{L}$ is calculated as follows. For a positive sample pair, the tenant-level fused semantic vectors are denoted as $z_i$ and $z_i^{+}$, and the same training batch contains $N$ samples:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp\!\left(\mathrm{sim}(z_i, z_i^{+})/\tau\right)}{\sum_{k=1}^{N}\mathbb{1}_{[k \neq i]}\,\exp\!\left(\mathrm{sim}(z_i, z_k)/\tau\right)}$$

where $\mathcal{L}$ is the contrastive loss function; $N$ is the total number of samples within the batch; $i$ is an index variable for summation that takes values from $1$ to $N$; $k$ is an index variable for summation that takes values from $1$ to $N$; $z_i$ is the tenant-level fused semantic vector of the $i$-th sample in the batch; $z_i^{+}$ is the tenant-level fused semantic vector that constitutes a positive sample pair with $z_i$; $z_k$ is the tenant-level fused semantic vector of the $k$-th sample in the batch; $\mathrm{sim}(z_i, z_i^{+})$ represents the cosine similarity between $z_i$ and $z_i^{+}$; $\mathrm{sim}(z_i, z_k)$ represents the cosine similarity between $z_i$ and $z_k$; $\tau$ is a temperature hyperparameter used to control the smoothness of the similarity distribution, where a smaller value gives a sharper distribution that is more sensitive to hard negative samples and a larger value gives a smoother distribution; $\mathbb{1}_{[k \neq i]}$ is an indicator function whose value is 1 when $k \neq i$ and 0 otherwise; and $\exp$ represents the exponential function. The Adam optimization algorithm is used to update all trainable parameters of the log semantic projection layer, traffic feature projection layer, cross-attention layer, and fusion output layer.

[0037] Furthermore, the positive sample pairs in the above-mentioned contrastive loss function are determined as follows: In the training dataset, for each association pair with tenant identifier and fault type label, one is randomly selected from the set of association pairs with the same tenant and the same fault type label as its positive sample pair; negative sample pairs are selected from the set of association pairs with different tenants or different fault type labels, and participate in the loss calculation together with the positive sample pairs in the same training batch.
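The positive-pair selection rule can be sketched as follows. This is only an illustration under stated assumptions: each association pair is reduced to a hypothetical `(tenant_id, fault_type)` label, a sample is never chosen as its own partner, and samples with no eligible partner receive `None`. None of these details are specified in the application.

```python
import random
from collections import defaultdict


def sample_positive_partners(labels, rng):
    """For each association pair, pick a random partner with the same label.

    labels: list of (tenant_id, fault_type) tuples, one per association pair.
    Returns a list whose i-th entry is the index of the chosen positive
    partner, or None when no other pair shares the same label.
    """
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)

    partners = []
    for index, label in enumerate(labels):
        candidates = [j for j in groups[label] if j != index]
        partners.append(rng.choice(candidates) if candidates else None)
    return partners


# Hypothetical labels for five association pairs.
labels = [
    ("A", "tls_handshake_failure"),
    ("A", "tls_handshake_failure"),
    ("B", "tls_handshake_failure"),
    ("A", "buffer_backlog"),
    ("A", "tls_handshake_failure"),
]
partners = sample_positive_partners(labels, random.Random(0))
print(partners)
```

Pairs with a different tenant or a different fault type label never appear as partners, so they can only act as negatives in the batch.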

[0038] Through the bidirectional calculation of the aforementioned cross-attention, the mutual information weights of the two modalities are quantified, enabling the parts of the logs with insufficient semantic distinguishability to obtain supplementary information from the traffic behavior patterns.

[0039] Step 5: Perform hierarchical agglomerative clustering based on the tenant dimension to generate a set of fault event clusters; All tenant-level fused semantic vectors and their corresponding unimodal semantic representations are grouped according to the tenant dimension. Within each tenant group, hierarchical agglomerative clustering is performed using the cosine distance between vectors as the merging criterion to generate a set of multimodal semantic event clusters specific to each tenant. Each event cluster corresponds to a semantically equivalent network device failure event for that tenant, and is output as the result of cloud computing network device failure data processing.

[0040] It should be noted that the above-mentioned hierarchical agglomerative clustering takes as input the set of all tenant-level fused semantic vectors and single-modal semantic representation vectors within a tenant group, and outputs the set of multimodal semantic event clusters for that tenant. The clustering process uses each vector as an initial cluster, calculates the cosine distance between all cluster pairs in each iteration, merges the two clusters with the smallest distance, and repeats this process until the cosine distance between any two clusters is greater than a preset merging threshold. The remaining clusters are the set of multimodal semantic event clusters for that tenant.

[0041] Furthermore, the cosine distance between the above-mentioned cluster pairs is calculated as follows: the average cosine distance between all vector pairs within two clusters is taken as the distance metric for that cluster pair, i.e., the average link criterion is adopted; where the cosine distance between a single vector pair is equal to 1 minus the cosine similarity of that vector pair.

[0042] It should be further clarified that the aforementioned preset merging threshold refers to the upper bound of the cosine distance controlling the termination condition of inter-cluster merging during hierarchical agglomerative clustering, and its value range is [value missing]. The merging threshold determines the granularity of the clustering results: a smaller merging threshold requires the two clusters to be semantically similar, resulting in more event clusters and more uniform semantics within each cluster; a larger merging threshold allows vectors with greater semantic differences to be merged into the same cluster, resulting in fewer event clusters. This merging threshold is set based on the semantic distribution characteristics of fault events in the actual deployment environment.
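The clustering procedure of paragraphs [0040] to [0042] can be sketched in plain Python as follows. The sketch follows the stated rules (each vector starts as its own cluster, average linkage over cosine distance, stop when the smallest inter-cluster distance exceeds the merging threshold); the toy 2-dimensional vectors and function names are illustrative assumptions.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(cluster_a, cluster_b, vectors):
    """Mean cosine distance over all cross-cluster vector pairs."""
    distances = [
        cosine_distance(vectors[i], vectors[j])
        for i in cluster_a
        for j in cluster_b
    ]
    return sum(distances) / len(distances)


def agglomerative_clusters(vectors, merge_threshold):
    """Merge the closest pair of clusters until every pair exceeds the threshold."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > merge_threshold:
            break  # termination condition: all cluster pairs are too far apart
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical vectors: two tight groups pointing in different directions.
vectors = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.05, 0.95]]
clusters = agglomerative_clusters(vectors, merge_threshold=0.35)
print(clusters)
```

Lowering the threshold toward 0 leaves every vector in its own cluster, while raising it far enough merges everything into one cluster, which matches the granularity trade-off described above.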

[0043] In this embodiment of the application, in order to further infer the propagation path of the fault from the fault event cluster, the following steps are also included: Step 6: Infer the independent fault propagation link graph of each tenant on the network topology based on tenant-level event clusters; For each tenant's multimodal semantic event cluster set, the device identifier and timestamp associated with the records within each event cluster are extracted, and the corresponding nodes are located on the network topology graph. For any two event clusters within the same tenant, the topology hop count, time difference, and packet size distribution similarity of the tenant-level traffic feature vectors in the two event clusters are calculated. When the topology adjacency condition, propagation delay constraint, and traffic association threshold are all satisfied, causal association edges are established, generating an independent fault propagation link graph for each tenant.

[0044] It should be noted that the above-mentioned topological adjacency condition means that the shortest path hop count of the network device nodes corresponding to the two event clusters on the network topology graph does not exceed a preset hop count threshold. The above-mentioned propagation delay constraint means that the difference between the mean timestamps of the two event clusters is within a preset time interval formed by the minimum and maximum propagation delays, and the chronological order is consistent with the topological direction. The above-mentioned traffic association threshold means that the similarity between the packet size distribution features in the tenant-level traffic feature vectors contained in each of the two event clusters is not lower than a preset distribution similarity threshold. This similarity is calculated using cosine similarity and is used to determine whether the two event clusters involve the same type of traffic behavior pattern. A causal propagation relationship is determined to exist between the two event clusters when all three conditions are met simultaneously.
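The topological adjacency condition needs a shortest-path hop count between two device nodes. One common way to obtain it on an unweighted topology graph is breadth-first search, sketched below. The use of BFS, the adjacency-dictionary representation, and the node `SW-TOR-08` are assumptions made for illustration; only `SW-TOR-07` and `SW-CORE-02` appear in the application.

```python
from collections import deque


def shortest_hop_count(topology, source, target):
    """Return the minimum number of hops from source to target, or None.

    topology maps each node to the nodes it is directly connected to.
    """
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbor in topology.get(node, ()):
            if neighbor == target:
                return hops + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None  # target is unreachable


# Hypothetical topology fragment; SW-TOR-08 is invented for the example.
topology = {
    "SW-TOR-07": ["SW-CORE-02"],
    "SW-CORE-02": ["SW-TOR-07", "SW-TOR-08"],
    "SW-TOR-08": ["SW-CORE-02"],
}
print(shortest_hop_count(topology, "SW-TOR-07", "SW-CORE-02"))
print(shortest_hop_count(topology, "SW-TOR-07", "SW-TOR-08"))
```

The returned hop count is then compared with the preset hop count threshold.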

[0045] Furthermore, the average timestamp of the two event clusters is calculated as follows: the arithmetic mean of the timestamps of all stream data records in each event cluster is taken as the average timestamp of that event cluster. The time sequence is determined by taking the event cluster with the smaller average timestamp as the propagation source and the event cluster with the larger average timestamp as the propagation destination. It is also required that the network device node corresponding to the propagation source is located upstream of the propagation destination in the topology direction. The two together constitute the directional determination basis for the propagation delay constraint.

[0046] It should be further explained that the aforementioned preset minimum propagation delay and maximum propagation delay together constitute the time interval of the propagation delay constraint. The minimum propagation delay is the lower bound of the physical link delay required for the fault signal to propagate between adjacent network device nodes, and the maximum propagation delay is the upper bound of the maximum time allowed for the fault impact to spread to the next hop device on the network path. Both are set according to the latency characteristics of the network link and the fault propagation rate in the actual deployment environment.

[0047] It should be further clarified that the aforementioned preset distribution similarity threshold refers to the cosine similarity value used to control the lower bound of the similarity between the size distribution characteristics of two event clusters in traffic association determination, and its value range is [missing value]. The distribution similarity threshold determines the strictness of establishing causal relationships: a higher threshold requires that the packet size distribution characteristics of two event clusters be more similar to establish a relationship, resulting in fewer edges and more accurate relationships in the generated independent fault propagation graph; a lower threshold allows relationships to be established between event clusters with significantly different packet size distribution characteristics, resulting in more edges in the generated independent fault propagation graph. This distribution similarity threshold is set based on the consistency of traffic distribution for similar fault events in the actual deployment environment.
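The three-condition test for a causal association edge can be sketched as a single predicate. The data class, its field names, the feature values, and the timestamps below are hypothetical; the thresholds (2 hops, 5 to 60 seconds, similarity 0.75) are the ones used later in the embodiment's worked example.

```python
import math
from dataclasses import dataclass


@dataclass
class EventCluster:
    name: str
    mean_timestamp: float        # seconds, arithmetic mean over the cluster's records
    packet_size_features: list   # packet size distribution features


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def has_causal_edge(source, target, hop_count, source_is_upstream,
                    max_hops=2, min_delay=5.0, max_delay=60.0, min_similarity=0.75):
    """True when adjacency, delay/direction, and traffic similarity all hold."""
    topology_ok = hop_count is not None and hop_count <= max_hops
    delay = target.mean_timestamp - source.mean_timestamp
    delay_ok = min_delay <= delay <= max_delay and source_is_upstream
    traffic_ok = cosine_similarity(
        source.packet_size_features, target.packet_size_features
    ) >= min_similarity
    return topology_ok and delay_ok and traffic_ok


# Hypothetical clusters: 'later' follows 'earlier' by 20 seconds.
earlier = EventCluster("cluster-1", 0.0, [0.4, 1.2, -0.3, 0.8])
later = EventCluster("cluster-2", 20.0, [0.5, 1.0, -0.2, 0.9])
unrelated = EventCluster("cluster-3", 20.0, [-0.5, 0.1, 1.0, -0.9])

print(has_causal_edge(earlier, later, hop_count=1, source_is_upstream=True))
print(has_causal_edge(later, earlier, hop_count=1, source_is_upstream=False))
print(has_causal_edge(earlier, unrelated, hop_count=1, source_is_upstream=True))
```

Raising `min_similarity` makes the predicate stricter and yields fewer edges, which reflects the trade-off described for the distribution similarity threshold.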

[0048] In this embodiment of the application, in order to identify the common source of failure affecting multiple tenants, the following steps are included in addition to step 6: Step 7: Perform cross-tenant correlation analysis and generate comprehensive fault data processing results; Perform cross-tenant correlation analysis on the independent fault propagation path graphs of all tenants. Detect the spatiotemporal overlap of independent fault propagation path graphs of different tenants on the same network device node: when multiple tenants' independent fault propagation path graphs all contain fault event clusters within similar time windows of the same device node, mark that device node as a cross-tenant common fault node. Append the cross-tenant correlation information to the independent fault propagation path graphs of each tenant to generate comprehensive fault data processing results for cloud computing network devices and store them in the fault analysis database.

[0049] It should be noted that the aforementioned similar time window refers to a time interval formed by extending a preset time radius forward and backward from the average timestamp of a tenant's fault event cluster on the same device node. When the average timestamp of another tenant's fault event cluster on the same device node falls within this time interval, it is determined that the fault events of the two tenants have spatiotemporal overlap on that device node.

[0050] Furthermore, the aforementioned preset time radius refers to the time length parameter that controls the coverage of the time window in the cross-tenant spatiotemporal overlap determination. The value of the preset time radius is set according to the maximum time deviation that the same physical fault can produce observable effects on the traffic of different tenants in the actual deployment environment: the larger the preset time radius, the greater the deviation between the average timestamps of the fault event clusters of the two tenants is allowed, the higher the detection rate of common fault nodes across tenants, but the false labeling rate also increases accordingly; the smaller the preset time radius, the more concentrated the fault events of the two tenants are required in time, the lower the false labeling rate, but common faults with large time deviations may be missed.

[0051] The following is an application example of the present invention. As shown in Figures 2-8, the implementation process is as follows: A cloud computing service provider operates a multi-tenant data center, where a core top-of-rack switch (device number: SW-TOR-07) simultaneously carries fully encrypted network traffic for tenant A (financial business), tenant B (e-commerce business), and tenant C (video streaming business). In the early morning of a certain day in 20XX, the operations and maintenance platform received an anomaly alarm from SW-TOR-07. The operation logs and encrypted stream metadata reported by the device had to be processed to locate the fault propagation path and identify the specific affected tenants. The initial conditions were: a target time range of 15 minutes before and after the alarm, resulting in the collection of 47 raw log entries and 312 encrypted stream metadata records.

[0052] The operations and maintenance platform collects the raw log dataset and the encrypted stream metadata record set within the target time range from SW-TOR-07, and reads the tenant isolation identifier mapping table of the device. This data center adopts a VxLAN isolation scheme, establishing a mapping relationship between stream records and tenant identifiers through the VxLAN VNI field: VNI 10101 corresponds to tenant A, VNI 20202 corresponds to tenant B, and VNI 30303 corresponds to tenant C. Each of the 312 encrypted stream metadata records carries five-tuple information, a packet size sequence, a packet interval time sequence, a TLS handshake phase identifier, and the corresponding VNI field.

[0053] Table 1. SW-TOR-07 Tenant Isolation Identifier Mapping Table (Partial Example)

According to statistics, there are 128 records corresponding to tenant A, 107 records corresponding to tenant B, 71 records corresponding to tenant C, and 6 cross-tenant records, forming a multi-source device data set to be processed.

[0054] Based on the VNI field, the 312 stream metadata records were split into three tenant sub-stream metadata datasets. For the 6 cross-tenant records such as FL-005 (source belonging to tenant B, destination belonging to tenant A), they were copied to the tenant B subset and the tenant A subset, and a direction marker field ("sender" or "receiver") was added to each, so that the tenant A subset ultimately contained 134 records, the tenant B subset contained 113 records, and the tenant C subset maintained 71 records.
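The splitting rule and the resulting record counts can be checked with a short sketch. The record layout is an assumption: each record is modeled as a dictionary with hypothetical `src_vni` and `dst_vni` fields so that a cross-tenant record can be recognized, and the flow identifiers are synthetic. Only the VNI-to-tenant mapping and the counts come from the embodiment.

```python
from collections import defaultdict

# VNI-to-tenant mapping as given in the embodiment.
VNI_TO_TENANT = {10101: "A", 20202: "B", 30303: "C"}


def split_by_tenant(records, vni_to_tenant):
    """Split mixed flow records into per-tenant subsets.

    Cross-tenant records are copied into both subsets with a direction marker.
    """
    subsets = defaultdict(list)
    for record in records:
        src_tenant = vni_to_tenant[record["src_vni"]]
        dst_tenant = vni_to_tenant[record["dst_vni"]]
        if src_tenant == dst_tenant:
            subsets[src_tenant].append(dict(record))
        else:
            subsets[src_tenant].append({**record, "direction": "sender"})
            subsets[dst_tenant].append({**record, "direction": "receiver"})
    return subsets


def make_records(prefix, count, src_vni, dst_vni):
    """Build synthetic placeholder records (identifiers are invented)."""
    return [
        {"flow_id": f"{prefix}-{i}", "src_vni": src_vni, "dst_vni": dst_vni}
        for i in range(count)
    ]


records = (
    make_records("a", 128, 10101, 10101)
    + make_records("b", 107, 20202, 20202)
    + make_records("c", 71, 30303, 30303)
    + make_records("x", 6, 20202, 10101)  # source in tenant B, destination in tenant A
)
subsets = split_by_tenant(records, VNI_TO_TENANT)
print({tenant: len(rows) for tenant, rows in sorted(subsets.items())})
```

With 6 cross-tenant records running from tenant B to tenant A, the subsets come out at 134, 113, and 71 records, matching the figures in the embodiment.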

[0055] Subsequently, traffic features were extracted from the three tenant sub-stream metadata datasets. Taking tenant A as an example, the mean, variance, skewness, and kurtosis were calculated for the packet size sequences of its 134 stream records; the autocorrelation coefficient was calculated for the packet interval time series; and the state transition probabilities were computed for the TLS handshake phase identifier sequence. The TLS handshake state transition probability for tenant A is calculated as follows: in the 134 stream records, ServerHello appears 131 times after ClientHello, and ClientHello appears a total of 134 times, so the transition probability is $131/134 \approx 0.978$; an Alert message appears immediately after ClientHello 3 times, so the abnormal transition probability is $3/134 \approx 0.022$. The abnormal transition probability was significantly higher than the normal value, indicating that tenant A exhibited a TLS handshake failure pattern during this period. Z-score standardization was performed on all components after concatenating the features from each dimension to eliminate differences in scale.
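The transition probability calculation for tenant A can be reproduced with a small sketch. It counts adjacent state pairs and divides by the total number of occurrences of the predecessor state. Representing each flow's handshake as a list of state names is an assumption made for illustration.

```python
from collections import Counter


def transition_probabilities(sequences):
    """Probability of each adjacent state pair given its predecessor state.

    sequences: one list of TLS handshake phase identifiers per flow record.
    """
    pair_counts = Counter()
    state_counts = Counter()
    for sequence in sequences:
        state_counts.update(sequence)
        pair_counts.update(zip(sequence, sequence[1:]))
    return {
        pair: count / state_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


# Tenant A figures from the embodiment: 131 normal handshakes, 3 alerts.
sequences = (
    [["ClientHello", "ServerHello"]] * 131
    + [["ClientHello", "Alert"]] * 3
)
probabilities = transition_probabilities(sequences)
normal = probabilities[("ClientHello", "ServerHello")]
abnormal = probabilities[("ClientHello", "Alert")]
print(round(normal, 3), round(abnormal, 3))
```

The two probabilities sum to 1 because every ClientHello in this example is followed by exactly one of the two states.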

[0056] Table 2. Key dimensions of tenant-level traffic feature vectors for three tenants (after standardization)

As shown in Table 2, the standardized value of the TLS abnormal transition probability of tenant A (2.15) is significantly higher and the packet interval autocorrelation coefficient (-1.82) is significantly lower, indicating that the traffic of tenant A is obviously abnormal during this period; the characteristics of tenant B and tenant C are close to the mean level in each dimension, with no obvious abnormal characteristics.

[0057] The operations and maintenance platform inputs each of the 47 original log entries into the network operations and maintenance semantic encoder to generate a log semantic embedding vector with a dimension of 768. Then, using the timestamp of each log entry as the anchor point, it searches for candidate flow records with matching timestamps within a preset time window (±2 seconds), and then compares the IP address and port information parsed from the log with the five-tuple of the candidate flow records.

[0058] Taking log entry LOG-031 as an example: the timestamp of this entry is 02:17:43, and the record content is "Interface Gi0/1 TLS negotiation timeout detected, src 10.10.1.56:52340". The IP address 10.10.1.56 and port 52340 are parsed from the entry. Within the time window, FL-004 (timestamp 02:17:44, source 10.10.1.56:52340) is found. The source address and port are completely consistent, so the match is successful and the entry is assigned to an association pair of tenant A.
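The LOG-031 and FL-004 match can be sketched as follows. The record layout, the regular expression, the placeholder date, the destination endpoint of FL-004, and the second flow `FL-900` are assumptions made for illustration; the timestamps, the source endpoint, and the ±2 second window come from the embodiment.

```python
import re
from datetime import datetime, timedelta

# Matches "ip:port" endpoints such as 10.10.1.56:52340.
ENDPOINT_PATTERN = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})")


def parse_endpoints(message):
    """Extract (ip, port) endpoints mentioned in a log message."""
    return {(ip, int(port)) for ip, port in ENDPOINT_PATTERN.findall(message)}


def match_log_to_flows(log, flows, window_seconds=2):
    """Return flows inside the time window whose source or destination matches."""
    endpoints = parse_endpoints(log["message"])
    window = timedelta(seconds=window_seconds)
    matches = []
    for flow in flows:
        if abs(flow["timestamp"] - log["timestamp"]) > window:
            continue  # time dimension: outside the preset window
        flow_endpoints = {
            (flow["src_ip"], flow["src_port"]),
            (flow["dst_ip"], flow["dst_port"]),
        }
        if endpoints & flow_endpoints:  # network address dimension
            matches.append(flow)
    return matches


day = (2000, 1, 1)  # placeholder date; the embodiment gives only the time of day
log_031 = {
    "id": "LOG-031",
    "timestamp": datetime(*day, 2, 17, 43),
    "message": "Interface Gi0/1 TLS negotiation timeout detected, src 10.10.1.56:52340",
}
flows = [
    {"id": "FL-004", "tenant": "A", "timestamp": datetime(*day, 2, 17, 44),
     "src_ip": "10.10.1.56", "src_port": 52340,
     "dst_ip": "10.10.9.9", "dst_port": 443},   # destination is hypothetical
    {"id": "FL-900", "tenant": "A", "timestamp": datetime(*day, 2, 18, 30),
     "src_ip": "10.10.1.56", "src_port": 52340,
     "dst_ip": "10.10.9.9", "dst_port": 443},   # hypothetical, outside the window
]
matched = match_log_to_flows(log_031, flows)
print([flow["id"] for flow in matched])
```

A log entry that matches flows from several tenants would simply return several flows here, and each resulting association pair would then be encoded independently.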

[0059] Of the 47 log entries, 38 were successfully matched with stream records (35 of which belonged to a single tenant and 3 matched stream records from multiple tenants), and 9 were isolated log entries.

[0060] Table 3. Results of pairing log entries with stream records (partial examples)

For each log-flow association pair with a tenant identifier, the log semantic embedding vector (768 dimensions, denoted as log projection input) and the corresponding tenant-level traffic feature vector (42 dimensions, denoted as traffic projection input) are input into the dual-modal fusion encoder to generate a tenant-level fused semantic vector with a dimension of 256.

[0061] Taking the tenant A association pair consisting of LOG-031 and FL-004 as an example, the data flow of fusion encoding is illustrated as follows. The log semantic projection layer projects the 768-dimensional log semantic embedding vector onto the shared semantic space through a weight matrix (shape 256×768), outputting a 256-dimensional log projection vector; the traffic feature projection layer projects the 42-dimensional traffic feature vector onto the same shared semantic space through a weight matrix (shape 256×42), outputting a 256-dimensional traffic projection vector. In the cross-attention layer, the scaling factor is $\sqrt{256} = 16$. Using the log projection vector as the query and the traffic projection vector as the key and value, the first attention output (256-dimensional) is obtained, which captures the supplementary information that traffic behavior provides to log semantics. Using the traffic projection vector as the query and the log projection vector as the key and value, the second attention output (256-dimensional) is obtained. The fusion output layer concatenates the two attention outputs into a 512-dimensional vector, which is then mapped by a weight matrix (shape 256×512) to output the final 256-dimensional tenant-level fused semantic vector.
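The dimension flow of this example can be traced with a minimal forward-pass sketch in plain Python. It is an illustration under assumptions: the weights are random rather than trained, the projection layers carry no bias, the fusion bias is zero, and each modality contributes a single vector. The function and parameter names are hypothetical.

```python
import math
import random


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights (assumption: small uniform values)."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(cols)] for _ in range(rows)]


def linear(weight, x, bias=None):
    """Compute weight @ x (+ bias) for a list-of-rows matrix."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    return out if bias is None else [o + b for o, b in zip(out, bias)]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V, with d the shared-space dimension."""
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum((w / total) * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def fuse(log_embedding, traffic_features, params):
    log_proj = linear(params["W_log"], log_embedding)             # 768 -> 256
    traffic_proj = linear(params["W_traffic"], traffic_features)  # 42 -> 256
    first = scaled_dot_product_attention([log_proj], [traffic_proj], [traffic_proj])[0]
    second = scaled_dot_product_attention([traffic_proj], [log_proj], [log_proj])[0]
    return linear(params["W_out"], first + second, params["b_out"])  # 512 -> 256


rng = random.Random(0)
params = {
    "W_log": random_matrix(256, 768, rng),
    "W_traffic": random_matrix(256, 42, rng),
    "W_out": random_matrix(256, 512, rng),
    "b_out": [0.0] * 256,
}
log_embedding = [rng.gauss(0.0, 1.0) for _ in range(768)]
traffic_features = [rng.gauss(0.0, 1.0) for _ in range(42)]
fused = fuse(log_embedding, traffic_features, params)
print(len(fused), math.sqrt(256))
```

Note that in this single-vector setting each query sees exactly one key, so the softmax weight is 1 and each attention output equals the other modality's projection vector; the attention weights only become non-trivial if a modality is represented by several vectors.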

[0062] For the nine isolated log entries, including LOG-039, their log semantic embedding vectors (after dimensionality reduction to 256 dimensions) were directly used as the single-modal semantic representations to participate in subsequent clustering.

[0063] Table 4. Specifications of Fusion Encoding Input and Output Data

All 256-dimensional vectors were grouped by tenant identifier: Tenant A group contained 35 fused semantic vectors and 2 single-modal vectors, for a total of 37 vectors; Tenant B group contained 21 vectors; Tenant C group contained 14 vectors; and the single-modal vectors of 9 isolated logs were grouped separately for clustering.

[0064] Hierarchical agglomerative clustering was performed within tenant A group, with a merging threshold set to 0.35 (cosine distance) and the average linkage criterion used. Initially, each vector formed its own cluster, resulting in 37 initial clusters. After iterative merging, the process stopped when the cosine distance between any two clusters exceeded 0.35, ultimately forming 3 event clusters.

[0065] Taking two vectors from tenant A as an example, the calculation of cosine distance is illustrated as follows: the fused semantic vector generated by LOG-028 and FL-001 and the fused semantic vector generated by LOG-031 and FL-004 have a cosine similarity of 0.91. Therefore, the cosine distance is $1 - 0.91 = 0.09$. Since this value is much smaller than the merging threshold of 0.35, the two are merged into the same cluster, corresponding to the same type of TLS handshake failure event.

[0066] Table 5. Summary of clustering results for each tenant

For tenant A's three event clusters, the device identifier and average timestamp associated with the flow metadata records within each cluster are extracted. The device associated with A-cluster 1 is SW-TOR-07, with an average timestamp of 02:17:41; the devices associated with A-cluster 2 are SW-TOR-07 and the upstream core switch SW-CORE-02, with average timestamps of 02:17:53 and 02:18:09, respectively.

[0067] Causal correlation determination was performed between A-cluster 1 and A-cluster 2. The topology hop count was 1 (direct connection from SW-TOR-07 to SW-CORE-02), meeting the hop count threshold (preset to 2 hops). The time difference between the two clusters was within the preset propagation delay range (5 to 60 seconds), and the chronological order was consistent with the topological direction (SW-TOR-07 is upstream). The cosine similarity of the packet size distribution features of the two clusters' traffic feature vectors is 0.87, which is higher than the preset distribution similarity threshold of 0.75. All three conditions are met, and a causal link is established from A-cluster 1 to A-cluster 2, forming an independent fault propagation path for tenant A: SW-TOR-07 (TLS handshake anomaly) → SW-CORE-02 (buffer backlog). The event clusters of tenants B and C exist independently because they do not meet the propagation delay constraints and do not form a propagation path.

[0068] Spatiotemporal overlap detection is performed on the SW-TOR-07 node for the independent fault propagation link graphs of the three tenants. The preset time radius is 30 seconds.

[0069] Centered on the average timestamp of tenant A's A-cluster 1 on SW-TOR-07 (02:17:41), a time window [02:17:11, 02:18:11] is formed by extending 30 seconds forward and backward. The average timestamp of tenant B's cluster B1 on SW-TOR-07 is 02:17:58, falling within this window; the average timestamp of tenant C's cluster C1 on SW-TOR-07 is 02:18:04, also falling within this window. Therefore, SW-TOR-07 is marked as a cross-tenant common fault node, affecting all three tenants.
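The window test in this paragraph can be reproduced with a short sketch using the timestamps and the 30-second radius given above. The function names and the dictionary layout are illustrative assumptions.

```python
from datetime import datetime, timedelta


def clock(text):
    """Parse an HH:MM:SS time of day (the date part is irrelevant here)."""
    return datetime.strptime(text, "%H:%M:%S")


def time_window(center, radius_seconds):
    """Interval formed by extending the radius before and after the center."""
    radius = timedelta(seconds=radius_seconds)
    return center - radius, center + radius


def overlapping_tenants(center, other_timestamps, radius_seconds):
    """Tenants whose cluster timestamp on the same node falls in the window."""
    start, end = time_window(center, radius_seconds)
    return [
        tenant
        for tenant, timestamp in other_timestamps.items()
        if start <= timestamp <= end
    ]


tenant_a_time = clock("02:17:41")
others = {"B": clock("02:17:58"), "C": clock("02:18:04")}

start, end = time_window(tenant_a_time, 30)
overlaps = overlapping_tenants(tenant_a_time, others, 30)
print(start.strftime("%H:%M:%S"), end.strftime("%H:%M:%S"), overlaps)
```

With a radius of 15 seconds instead of 30, neither tenant B nor tenant C would fall inside the window, which illustrates how a smaller radius lowers the false labeling rate at the cost of missed common faults.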

[0070] Information on common fault nodes across tenants is appended to the independent fault propagation chain diagram of each tenant, generating comprehensive fault data processing results and storing them in the fault analysis database.

[0071] Table 6. Results of Comprehensive Fault Data Processing

The data flow throughout the implementation process follows a clear logical progression. Step 1 collects 312 mixed stream metadata records and 47 log entries, which serve as the raw input for all subsequent steps. Step 2 uses the VNI field to split the mixed data into three tenant subsets and extracts standardized tenant-level traffic feature vectors, completely isolating the behavioral patterns of different tenants. Step 3 establishes association pairs between the log semantic embedding vectors and the tenant-level traffic feature vectors produced in Step 2 through spatiotemporal matching, achieving precise alignment of the two types of heterogeneous data. Step 4 inputs the association pairs into the dual-modal fusion encoder, which uses a cross-attention mechanism to supplement the log semantic representation with fault clues from traffic behavior patterns and outputs a 256-dimensional fused semantic vector. Step 5 performs independent clustering within the tenant dimension based on the fused semantic vectors, grouping semantically similar fault events into event clusters; tenant A's TLS handshake anomaly cluster results from the high TLS abnormal transition probability discovered in Step 2 together with the fusion enhancement in Step 4. Step 6 uses the device identifier and timestamp information of the event clusters to establish causal propagation links on the network topology, realizing the inference from isolated event clusters to propagation paths. Step 7 overlays cross-tenant spatiotemporal overlap analysis on the independent results of each tenant, ultimately identifying SW-TOR-07 as the common fault root cause affecting all three tenants and forming a complete comprehensive fault data processing result.

[0072] It is understood that data preprocessing methods known to those skilled in the art include data cleaning, data transformation, and data reduction. Data transformation includes type conversion, normalization, and standardization. Although the dimensions and types of data were omitted in the description of the preceding embodiments, data preprocessing is common technical knowledge for those skilled in the art and a prerequisite step in data processing. Therefore, these well-known data preprocessing steps were not described separately.

[0073] The embodiments of the present invention have been described above. However, the embodiments are not limited to the specific implementation methods described above. The specific implementation methods described above are merely illustrative and not restrictive. Those skilled in the art can make more equivalent embodiments under the guidance of the present embodiments, and all of them are within the protection scope of the present embodiments.

Claims

1. A data processing method for computer network devices based on cloud computing, characterized in that, Includes the following steps: Obtain the raw log dataset, encrypted stream metadata record set, and tenant isolation identifier mapping table of the cloud computing shared network device within the target time range; Based on the tenant isolation identifier mapping table, the encrypted flow metadata record set is split at the tenant level. The flow metadata on the same network device is separated into multiple tenant sub-flow metadata datasets according to the tenant identifier. Traffic features are extracted from each tenant sub-flow metadata dataset to generate a tenant-level traffic feature vector. For each log entry in the original log dataset, a log semantic embedding vector is generated using a pre-trained network operation and maintenance semantic encoder. Spatiotemporal association matching is performed based on the timestamp and IP port information in the log entry and the 5-tuple and timestamp in the metadata set of each tenant sub-flow to generate log-flow association pairs with tenant identifiers. For each of the association pairs, the log semantic embedding vector and the corresponding tenant-level traffic feature vector are input into the dual-modal fusion encoder, and a tenant-level fusion semantic vector is generated through bidirectional computation of cross-attention. All tenant-level fused semantic vectors are grouped according to the tenant dimension. Within each tenant group, hierarchical agglomerative clustering is performed using the cosine distance between vectors as the merging criterion to generate a set of multimodal semantic event clusters exclusive to each tenant, which is output as the result of fault data processing.

2. The data processing method for computer network equipment based on cloud computing according to claim 1, characterized in that, The extraction of traffic features for each tenant sub-flow meta-dataset includes: The mean, variance, skewness, and kurtosis of the packet size sequence are calculated as statistical distribution characteristics. The autocorrelation coefficient of the packet interval time series is calculated as a time rhythm feature; The occurrence count of adjacent state pairs in the TLS handshake phase identifier sequence is counted. The transition probability between each handshake state is obtained by dividing the occurrence count of each state pair by the total occurrence count of the corresponding predecessor state. The transition probabilities of all state pairs are arranged in a fixed order and concatenated to form a state transition feature sub-vector. The above-mentioned features are concatenated to form a tenant-level traffic feature vector, and the Z-score is applied to the components of each dimension for normalization.

3. The data processing method for computer network equipment based on cloud computing according to claim 1, characterized in that, for a flow record whose five-tuple has a source and a destination belonging to different tenants, the flow record is copied to the sub-flow metadata dataset of each of those tenants, and a direction marker field is added to each copied record to distinguish whether the tenant is the sender or the receiver of the traffic.

4. The data processing method for computer network equipment based on cloud computing according to claim 1, characterized in that, The spatiotemporal correlation matching includes: In terms of time dimension, the timestamp of the log entry is used as the anchor point, and stream records whose timestamps fall within the preset time window are retrieved as candidates. At the network address dimension, the IP address and port information parsed from the log entries are matched with the five-tuple of the candidate flow record. When the IP address and port information match the source or destination in the five-tuple, the match is considered successful. If a log entry is successfully matched with the flow records of multiple tenants at the same time, the log entry is included in the association pair of each matching tenant, and each association pair is processed independently in the subsequent fusion encoding stage.

5. The data processing method for computer network equipment based on cloud computing according to claim 1, characterized in that, The dual-modal fusion encoder consists of a log semantic projection layer, a traffic feature projection layer, a cross-attention layer, and a fusion output layer, wherein: The log semantic projection layer projects the log semantic embedding vector onto the shared semantic space through linear transformation, generating a log projection vector. The traffic feature projection layer projects the tenant-level traffic feature vectors onto the same shared semantic space through linear transformation, generating a traffic projection vector. The cross-attention layer uses the log projection vector as the query and the traffic projection vector as the key and value, and calculates the first attention output by scaling dot product attention. At the same time, it uses the traffic projection vector as the query and the log projection vector as the key and value, and calculates the second attention output by scaling dot product attention, where the scaling factor is the square root of the vector dimension of the shared semantic space. The fusion output layer concatenates the first attention output and the second attention output, and then maps them through a fully connected layer to generate a tenant-level fusion semantic vector.

6. The data processing method for computer network equipment based on cloud computing according to claim 5, characterized in that, the training of the dual-modal fusion encoder adopts a contrastive learning approach, with association pairs of the same tenant and the same fault type as positive sample pairs and association pairs of different tenants or different fault types as negative sample pairs, and a contrastive loss function is used for training. The contrastive loss function is as follows: for each sample in the batch, the cosine similarity between the tenant-level fused semantic vector of the sample and the tenant-level fused semantic vector of its positive sample pair is calculated and divided by the temperature hyperparameter, and the exponential value is taken as the numerator. The sum of the exponential values of the cosine similarities between the tenant-level fused semantic vector of the sample and the tenant-level fused semantic vectors of all other samples in the batch, each divided by the temperature hyperparameter, is used as the denominator. The negative logarithm of the ratio of the numerator to the denominator is then calculated and averaged over all samples in the batch to obtain the contrastive loss function value.

7. The data processing method for computer network equipment based on cloud computing according to claim 1, characterized in that, The hierarchical agglomerative clustering includes: Using each vector as the initial cluster, calculate the cosine distance between all cluster pairs in each iteration, merge the two clusters with the smallest distance, and repeat the process until the cosine distance between any two clusters is greater than the preset merging threshold. The cosine distance between the cluster pairs is calculated using the average link criterion, which is to take the average of the cosine distances between all vector pairs within two clusters as the distance metric for that cluster pair. The cosine distance between a single vector pair is equal to 1 minus the cosine similarity of that vector pair.

8. The data processing method for computer network equipment based on cloud computing according to claim 1, characterized in that, Also includes: For each tenant's multimodal semantic event cluster set, extract the device identifier and timestamp associated with the records in each event cluster, and locate the corresponding node on the network topology map; For any two event clusters within the same tenant, calculate the topology hop count, time difference, and packet size distribution similarity of the tenant-level traffic feature vectors in the two event clusters; When the shortest path hop count of the network device nodes corresponding to two event clusters on the network topology graph does not exceed the preset hop count threshold, the difference between the average timestamps of the two event clusters is within the preset time interval formed by the minimum propagation delay and the maximum propagation delay, and the time sequence is consistent with the topology direction, and the cosine similarity between the packet size distribution characteristics of the two event clusters is not lower than the preset distribution similarity threshold, causal association edges are established, and an independent fault propagation link graph for each tenant is generated.

9. The data processing method for computer network equipment based on cloud computing according to claim 8, characterized in that the method further comprises: performing cross-tenant correlation analysis on the independent fault propagation link graphs of all tenants; taking the average timestamp of a fault event cluster of one tenant on a device node as the center, and forming a time interval by extending a preset time radius forward and backward from that center; when the average timestamp of a fault event cluster of another tenant on the same device node falls within this time interval, determining that the fault events of the two tenants overlap spatiotemporally on that device node, and marking the device node as a cross-tenant common fault node; and attaching the cross-tenant association information to the independent fault propagation link graph of each tenant to generate a comprehensive fault data processing result for the cloud computing network devices, which is stored in the fault analysis database.
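The spatiotemporal overlap test of claim 9 reduces to comparing average timestamps of different tenants on the same device node. A minimal sketch follows; the function name `common_fault_nodes` and the `(tenant_id, device, mean_ts)` tuple layout are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations

def common_fault_nodes(fault_clusters, time_radius):
    """Map each cross-tenant common fault node to the set of tenants involved.

    fault_clusters: iterable of (tenant_id, device, mean_ts) tuples.
    A timestamp falls inside [center - radius, center + radius] exactly when
    the absolute difference between the two timestamps is at most the radius.
    """
    by_device = defaultdict(list)
    for tenant, device, mean_ts in fault_clusters:
        by_device[device].append((tenant, mean_ts))

    overlaps = {}
    for device, entries in by_device.items():
        for (t1, ts1), (t2, ts2) in combinations(entries, 2):
            if t1 != t2 and abs(ts1 - ts2) <= time_radius:
                overlaps.setdefault(device, set()).update((t1, t2))
    return overlaps
```

The returned mapping can then be attached to each tenant's link graph as the cross-tenant association information.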

10. A cloud computing-based computer network device data processing system, used to execute the cloud computing-based computer network device data processing method according to any one of claims 1 to 9, characterized in that the system comprises: a multi-source data acquisition module, used to acquire the raw log dataset, the encrypted flow metadata record set, and the tenant isolation identifier mapping table of cloud computing shared network devices within the target time range; a tenant-level splitting and feature extraction module, used to perform tenant-level splitting on the encrypted flow metadata record set based on the tenant isolation identifier mapping table, to separate the flow metadata on the same network device into multiple tenant sub-flow metadata datasets according to the tenant identifier, to extract traffic features for each tenant sub-flow metadata dataset, and to generate tenant-level traffic feature vectors; an association pairing module, used to generate a log semantic embedding vector for each log entry in the raw log dataset using a pre-trained network operation and maintenance semantic encoder, and to perform spatiotemporal association matching with each tenant sub-flow metadata dataset based on timestamp and IP port information to generate log-flow association pairs carrying tenant identifiers; a dual-modal fusion encoding module, used to input, for each association pair, the log semantic embedding vector and the corresponding tenant-level traffic feature vector into the dual-modal fusion encoder, and to generate a tenant-level fused semantic vector through bidirectional cross-attention computation; and a clustering module, used to group all tenant-level fused semantic vectors by tenant and, within each tenant group, to perform hierarchical agglomerative clustering using the cosine distance between vectors as the merging criterion to generate a set of multimodal semantic event clusters exclusive to each tenant.
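The tenant-level splitting step performed by the second module can be sketched as a lookup against the tenant isolation identifier mapping table. The record fields (`device`, `isolation_id`), the function name `split_by_tenant`, and the handling of unmapped records are hypothetical choices for illustration; the claims do not specify the record layout or what form the isolation identifier takes.

```python
from collections import defaultdict

def split_by_tenant(flow_records, isolation_map):
    """Split flow metadata into per-device, per-tenant sub-flow datasets.

    flow_records: iterable of dicts with hypothetical keys
                  "device" and "isolation_id".
    isolation_map: dict mapping an isolation identifier to a tenant identifier.
    Returns (datasets, unmapped), where datasets[device][tenant] is a list of
    records and unmapped holds records whose identifier is not in the table.
    """
    per_device = defaultdict(lambda: defaultdict(list))
    unmapped = []
    for record in flow_records:
        tenant = isolation_map.get(record["isolation_id"])
        if tenant is None:
            unmapped.append(record)  # kept aside rather than guessed
        else:
            per_device[record["device"]][tenant].append(record)
    # Convert nested defaultdicts to plain dicts for predictable downstream use.
    datasets = {dev: dict(tenants) for dev, tenants in per_device.items()}
    return datasets, unmapped
```

Each list in `datasets[device][tenant]` would then feed the traffic feature extraction that produces the tenant-level traffic feature vector.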