A heterogeneous data-driven communication engineering efficiency evaluation cloud platform

By employing technologies such as adaptive acquisition frequency and homomorphic encryption, the problems of accessing and protecting privacy of multi-source heterogeneous data in modern communication networks have been solved, enabling efficient data fusion analysis and accurate performance evaluation, thereby improving network reliability and user experience.

CN122241376A · Pending · Publication Date: 2026-06-19 · GUANGDONG SOUTHERN COMM CONSTR CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUANGDONG SOUTHERN COMM CONSTR CO LTD
Filing Date
2026-04-21
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies cannot effectively solve the problems of unified adaptive access, privacy protection, and data fusion analysis of multi-source heterogeneous data in modern communication networks, resulting in low data quality, high risk of privacy leakage, and inability to accurately locate the root cause and optimize the network.

Method used

A cloud platform for evaluating the performance of communication engineering based on heterogeneous data was designed. Through technologies such as adaptive acquisition frequency, homomorphic encryption, spectral clustering, and federated graph neural networks, it achieves intelligent access, privacy protection, and in-depth analysis of multi-source data, and generates performance evaluation reports.

Benefits of technology

It enables intelligent and compliant access and privacy protection for multi-source heterogeneous data, improves the network reliability, resource utilization and user experience of communication networks, and provides accurate root cause identification and network optimization support.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention belongs to the field of performance evaluation and cloud computing technology, specifically relating to a cloud platform for performance evaluation of communication engineering based on heterogeneous data. The platform includes a data acquisition module, a data processing module, a data mining module, and a performance evaluation module. It accesses multi-source heterogeneous data from the source through adaptive acquisition and privacy labeling; employs an improved boxplot algorithm with fused spectral clustering pre-classification for accurate anomaly detection; innovatively combines federated graph neural networks and secure multi-party computation to achieve cross-domain deep mining and correlation analysis while protecting privacy; and utilizes dynamic Bayesian networks and uncertainty-driven models for adaptive performance evaluation. Finally, through closed-loop feedback, it dynamically optimizes the acquisition strategy and network configuration. This invention achieves synergy between privacy protection, accurate evaluation, and intelligent decision-making, providing an effective solution for the refined operation and maintenance and automated optimization of communication networks.

Description

Technical Field

[0001] This invention belongs to the field of performance evaluation and cloud computing technology, specifically relating to a cloud platform for performance evaluation of communication engineering based on heterogeneous data.

Background Technology

[0002] With the rapid evolution of 5G, IoT, and edge computing, modern communication networks exhibit complex characteristics of ultra-denseness, heterogeneity, and cloud-network integration. Their operation and management generate massive amounts of multi-source heterogeneous data. Analyzing this multi-source heterogeneous data helps determine network performance, identify communication efficiency, and pinpoint root causes of problems. This data originates from various sources, including network devices, performance probes, user signaling, service logs, and external environmental systems, exhibiting significant differences in structure, format, semantics, and timeliness, leading to increasingly prominent "data silos." Furthermore, this data often contains sensitive information such as user privacy, service configurations, and network topology. Therefore, throughout the entire data processing chain, it is crucial to ensure data utility and fusion analysis while effectively implementing end-to-end privacy protection.

[0003] Existing systems comprise three key data processing stages: acquisition and integration, preprocessing, and analysis and evaluation. At the data acquisition and integration level, traditional methods lack unified adaptive access capabilities for multi-source heterogeneous data. Acquisition strategies are mostly statically configured, unable to be dynamically optimized based on data value density, real-time network load, and privacy classification. This can lead to the acquisition process itself potentially over-exposing sensitive fields or relationships, causing not only data quality and timeliness issues but also creating privacy leakage risks for subsequent stages. The preprocessing stage typically relies on anomaly detection methods based on fixed thresholds or standard distribution assumptions, which are insufficiently adaptable to the prevalent non-Gaussian, skewed, and periodic sequences in communication data. Furthermore, during cleaning, alignment, and fusion, only basic desensitization or anonymization methods are usually employed, making it difficult to resist re-identification attacks in complex data association calculations and failing to meet increasingly stringent privacy compliance requirements. Consequently, this delivers a data foundation with insufficient credibility and privacy security to subsequent analysis modules.

[0004] During the data analysis and performance evaluation phase, the defects of the preceding stages carry over and are amplified: the data inputs are low-quality, unintegrated, and carry privacy risks. As a result, the root cause analyses and optimization strategies derived from in-depth mining of multi-source data are one-sided and delayed, and fail to support accurate root cause localization and network optimization.

Summary of the Invention

[0005] To address the aforementioned problems in existing technologies, this invention provides a cloud platform for evaluating the performance of communication engineering based on heterogeneous data. This platform solves the problem that existing technologies cannot comprehensively and accurately evaluate modern complex communication networks due to data silos, inefficient processing, static evaluation, and weak privacy.

[0006] The objective of this invention can be achieved through the following technical solution: a cloud platform for evaluating the performance of communication engineering based on heterogeneous data, the platform comprising the following modules, communicatively connected in sequence:
The data acquisition module is used to collect multi-source indicator data from multiple heterogeneous data sources according to the set adaptive acquisition frequency, and to add to the collected multi-source indicator data an initial privacy identifier corresponding to its sensitivity level.
The data processing module is used to homomorphically encrypt the multi-source indicator data according to the initial privacy identifier, to perform anomaly detection and correction on the encrypted data, and to perform entity alignment, semantic mapping, and data fusion on the corrected data to generate a fusion vector and an association knowledge graph.
The data mining module is used to predict network performance trends and identify abnormal propagation paths using a pre-built predictive diagnostic model based on the fusion vector and the association knowledge graph. At the same time, in the encrypted state, it uses an encrypted association mining algorithm based on a secure multi-party protocol to obtain the associations between data. The predictive diagnostic model is a federated graph neural network that combines an attention mechanism and differential privacy.
The performance evaluation module receives the network performance trend prediction results, the abnormal propagation paths, and the data association analysis results. It uses a preset performance evaluation model to perform multi-dimensional performance quantification and generates a performance evaluation report that includes a comprehensive performance index, indicator weakness analysis, and root cause localization.

[0007] Preferably, in the data acquisition module, the adaptive acquisition frequency is determined by an adaptive heartbeat protocol algorithm from the importance weight of the data source, the real-time network load status, and the data's own rate of change. The formula for calculating the next acquisition interval is: (formula not reproduced in this text). In the formula, $T_i(t)$ is the acquisition interval of data source $i$ at time $t$; $T_i(t+1)$ is the acquisition interval of data source $i$ at time $t+1$; $w_i$ is the importance weight of data source $i$; $L(t)$ is the real-time network load status; $\Delta_i(t)$ is the rate of change of the data; and $\alpha$, $\beta$, $\gamma$ are custom weighting coefficients whose sum is 1.

[0008] Preferably, in the data processing module, anomaly detection is performed on the encrypted multi-source indicator data using a combination of spectral clustering algorithm and box plot algorithm: first, spectral clustering is used to group data subsequences with similar periodicity or trend characteristics; then, within each homogeneous group, a box plot algorithm based on skewness adjustment is independently applied to calculate the outlier determination boundary; finally, outlier detection is performed on the multi-source indicator data based on the outlier determination boundary of the group to which any multi-source indicator data belongs.

[0009] Preferably, in the data processing module, the specific process of using spectral clustering algorithm combined with box plot algorithm to perform anomaly detection on the encrypted multi-source indicator data includes: Obtain the time series data corresponding to the multi-source indicator data, and calculate the time delay embedding matrix of each indicator data in the time series data to construct a phase space representation; Based on the phase space representation, the similarity matrix between data points of each index data is calculated, and the spectral clustering algorithm is applied to divide the data sequence into multiple subgroups with similar intrinsic patterns. Within each subgroup, the first quartile, median, and third quartile of its data are independently calculated, and the interquartile range is calculated. Then, the local skewness adjustment factor for each subgroup is calculated. Based on the local skewness adjustment factor, the outlier determination boundary of the subgroup is calculated; Each data point is compared with the outlier threshold of its subgroup to determine whether it is an outlier.

[0010] Preferably, before publishing the association knowledge graph, the data processing module further adds adaptive noise that satisfies differential privacy to the edge weights between entities in the graph. The edge weight finally published to the knowledge graph is calculated as: $\tilde{w}_{ij} = w_{ij} + \mathrm{Lap}(b)$, with $b = \Delta f / \varepsilon_{\mathrm{eff}}$. In the formula, $\tilde{w}_{ij}$ is the weight value finally published to the knowledge graph; $w_{ij}$ is the original edge weight between entity $i$ and entity $j$; $\mathrm{Lap}(b)$ is Laplace noise; $b$ is the Laplace scale parameter; and $\Delta f$ and $\varepsilon_{\mathrm{eff}}$ are the global sensitivity of the edge weights and the adaptively adjusted effective privacy budget, respectively.
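The edge-weight perturbation described in [0010] can be sketched as follows. This is a minimal illustration that assumes the standard Laplace mechanism, with the scale taken as global sensitivity divided by the effective privacy budget; the function names, the dictionary layout of the graph, and the example values are hypothetical, and the adaptive adjustment of the budget is not modelled.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_edge_weights(weights, sensitivity, epsilon_eff, seed=None):
    """Return a noisy copy of `weights`, a dict mapping (i, j) to a weight.

    `sensitivity` and `epsilon_eff` stand in for the global sensitivity and
    the effective privacy budget; both are supplied by the caller here.
    """
    if sensitivity <= 0 or epsilon_eff <= 0:
        raise ValueError("sensitivity and epsilon_eff must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon_eff  # Laplace scale parameter b
    return {edge: w + laplace_noise(scale, rng) for edge, w in weights.items()}


# Hypothetical graph: two edges between network entities.
graph = {("cell_12", "router_3"): 0.82, ("router_3", "core_1"): 0.40}
noisy = publish_edge_weights(graph, sensitivity=1.0, epsilon_eff=0.5, seed=7)
```

A smaller effective budget gives a larger scale and therefore noisier published weights, which is the usual privacy/utility trade-off of this mechanism.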

[0011] Preferably, in the data mining module, constructing the predictive diagnostic model includes the following process:
Step 1: Each participant initializes a graph attention network as its local model;
Step 2: In each round of training, each participant first calculates the parameter gradient of its local model and performs norm clipping on the gradient to control sensitivity. The clipped gradient is then perturbed by injecting noise that meets the differential privacy requirements, yielding the final perturbed gradient;
Step 3: The final perturbed gradients calculated by the participants are securely aggregated to obtain the aggregated gradient, and the aggregated gradient is used to update the global model parameters. Each participant uses the updated global model parameters for the next round of local model training;
Step 4: After multiple rounds of iteration, the convergence condition is reached and the final predictive diagnostic model is obtained.

[0012] Preferably, the data mining module, in the encrypted state, uses an encrypted association mining algorithm based on a secure multi-party protocol to obtain the associations between data, including the following implementation process: Participant u sets a candidate itemset X and traverses its local transaction database to construct a binary indicator vector for X, whose t-th element is 1 if and only if transaction t in the local transaction database contains itemset X. Each participant sums the elements of its binary indicator vector to obtain the local support, and calculates the corresponding confidence. Using the GMW protocol, frequent itemsets are filtered based on support, and rule strength is filtered based on confidence. A rule that satisfies both filtering conditions is determined to be a strong association rule, and the participating parties then collaborate to decrypt or reconstruct it to obtain the plaintext form of the strong association rule.
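The counting part of [0012] can be illustrated in plaintext. The sketch below only shows how the binary indicator vector, the local support, and the rule confidence relate to each other; the sums are computed in the clear, whereas the platform is described as filtering them under the GMW protocol, which is not implemented here. The function names and the alarm items are hypothetical.

```python
def indicator_vector(transactions, itemset):
    """Element t is 1 if and only if transaction t contains every item of itemset."""
    items = set(itemset)
    return [1 if items <= set(t) else 0 for t in transactions]


def local_support(transactions, itemset):
    """Local support count: the sum of the indicator vector's elements."""
    return sum(indicator_vector(transactions, itemset))


def rule_stats(parties, antecedent, consequent):
    """Support and confidence of `antecedent -> consequent` over all parties.

    Plaintext stand-in for the securely aggregated counts.
    """
    both = set(antecedent) | set(consequent)
    total = sum(len(db) for db in parties)
    count_both = sum(local_support(db, both) for db in parties)
    count_ante = sum(local_support(db, antecedent) for db in parties)
    support = count_both / total if total else 0.0
    confidence = count_both / count_ante if count_ante else 0.0
    return support, confidence


# Two hypothetical participants, each holding a local alarm log.
party_a = [{"link_down", "high_latency"}, {"link_down"}, {"high_latency", "packet_loss"}]
party_b = [{"link_down", "high_latency"}, {"packet_loss"}]
support, confidence = rule_stats([party_a, party_b], {"link_down"}, {"high_latency"})
```

A rule would then be kept only if both values exceed the chosen minimum support and minimum confidence thresholds.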

[0013] Preferably, in step 2, the formula for calculating the final perturbed gradient is: $\tilde{g}_u = \bar{g}_u + \mathcal{N}(0, \sigma^2 C^2 I)$. In the formula, $\tilde{g}_u$ is the final perturbed gradient calculated by participant u; $\bar{g}_u$ is the gradient obtained by clipping participant u's original local gradient; $\mathcal{N}(0, \sigma^2 C^2 I)$ is Gaussian noise with zero mean and covariance matrix $\sigma^2 C^2 I$; $I$ is the identity matrix; $\sigma$ is the noise scale calculated based on the local privacy budget and training epochs; and $C$ is the set clipping norm.
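Step 2 (norm clipping followed by Gaussian perturbation) can be sketched as below. This is an illustration under stated assumptions: gradients are plain Python lists, the noise standard deviation is taken as the noise scale times the clipping norm, and the derivation of the noise scale from the privacy budget is left to the caller. All names are hypothetical.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale `grad` so that its L2 norm does not exceed `clip_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def perturb_gradient(grad, clip_norm, sigma, rng):
    """Clip, then add zero-mean Gaussian noise with std sigma * clip_norm per coordinate."""
    clipped = clip_gradient(grad, clip_norm)
    std = sigma * clip_norm
    return [g + rng.gauss(0.0, std) for g in clipped]


rng = random.Random(42)
local_gradient = [3.0, 4.0]                      # L2 norm 5
clipped = clip_gradient(local_gradient, 1.0)     # rescaled to norm 1
noisy = perturb_gradient(local_gradient, clip_norm=1.0, sigma=0.8, rng=rng)
```

Clipping bounds how much any single participant's data can move the update, which is what lets the added noise be calibrated to the clipping norm.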

[0014] Preferably, in step 3, a multiplicative homomorphic encryption algorithm is used for secure aggregation, and the secure aggregation formula is: $E(G) = \prod_{u=1}^{U} E(\tilde{g}_u)$. In the formula, $G$ is, in plaintext, the sum of the perturbed gradients of all participants; $U$ is the total number of participants in the collaborative training; $E(\cdot)$ is the multiplicative homomorphic encryption function; $E(\tilde{g}_u)$ is the encrypted result of participant u's final perturbed gradient; and $\prod$ denotes multiplication of the ciphertexts.
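The aggregation in [0014] combines ciphertexts by multiplication and recovers a plaintext sum. The source does not name the encryption scheme; the sketch below assumes a Paillier-style scheme, which has exactly this property, and uses toy parameters (two small primes, fixed-point scaling by 1000, a single scalar gradient per participant) chosen only for illustration. It is not secure and is not presented as the platform's implementation. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random

P, Q = 7907, 7919                  # toy primes, far too small for real use
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)               # decryption constant for generator g = N + 1
SCALE = 1000                       # fixed-point scaling for float gradients


def encrypt(m, rng):
    """Encrypt a signed integer m as (1 + m*N) * r^N mod N^2."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return ((1 + (m % N) * N) % N2) * pow(r, N, N2) % N2


def decrypt(c):
    """Decrypt and map the result back to a signed integer."""
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m


def aggregate(ciphertexts):
    """Multiply ciphertexts; the product decrypts to the sum of the plaintexts."""
    total = 1
    for c in ciphertexts:
        total = total * c % N2
    return total


rng = random.Random(3)
perturbed = [0.125, -0.5, 0.75]    # one scalar gradient per participant
ciphertexts = [encrypt(round(g * SCALE), rng) for g in perturbed]
total = decrypt(aggregate(ciphertexts))   # fixed-point sum of the three gradients
```

Dividing `total` by `SCALE` gives the aggregated gradient in floating point; the aggregator never sees an individual participant's value in the clear.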

[0015] The beneficial effects of this invention are as follows: The data acquisition module of this invention achieves intelligent and compliant access to multi-source heterogeneous data through adaptive frequency and privacy labeling, breaking down data silos at the source and laying the foundation for end-to-end privacy protection. The data processing module employs an improved boxplot algorithm with fused spectral clustering pre-classification for adaptive anomaly detection, and combines homomorphic encryption with evidence-based multi-source fusion to solve the problems of cleaning poor-quality non-Gaussian data and integrating cross-source data. The data mining module combines federated graph neural networks with secure multi-party association mining, achieving cross-domain deep analysis and association discovery under privacy protection. The performance evaluation module performs multi-dimensional performance quantification based on preprocessed multi-source data and generates a performance evaluation report. Under the comprehensive management of the privacy control policy distributor and the privacy risk monitoring module, each module achieves a dynamic balance between privacy protection and data utility, enabling the mining of data value from multi-source input data without the data leaving its domain or being decrypted, thus providing core enabling technologies for refined operation and cross-domain collaborative optimization, and significantly improving the network reliability, resource utilization, and user experience of the communication network.

Attached Figure Description

[0016] To facilitate understanding by those skilled in the art, the present invention will be further described below with reference to the accompanying drawings.

[0017] Figure 1 is a schematic diagram of the system structure of the present invention; Figure 2 is a schematic diagram illustrating the process of constructing the predictive diagnostic model of the data mining module of the present invention.

Detailed Implementation

[0018] To further illustrate the technical means and effects of the present invention in achieving its intended purpose, the following detailed description of the specific implementation methods, structures, features, and effects of the present invention, in conjunction with the accompanying drawings and preferred embodiments, is provided.

[0019] Referring to Figures 1-2, this embodiment provides a cloud platform for evaluating communication engineering performance based on heterogeneous data. The cloud platform includes: The data acquisition module, which is used to collect multi-source indicator data from multiple heterogeneous data sources according to the calculated adaptive acquisition frequency, and to add to the collected multi-source indicator data an initial privacy identifier corresponding to its sensitivity level. The adaptive acquisition frequency is determined by an adaptive heartbeat protocol algorithm from the importance weight of the data source, the real-time network load status, and the data's own rate of change. Raw operational data is acquired in real time or near real time from various communication network devices, sensors, log systems, and databases to build a flexible and efficient unified data access layer. In implementation, a corresponding collector (such as an SNMP collector, log file listener, or database synchronization client) is configured for each data source. All collectors adhere to a unified metadata description standard and can automatically report the structure and characteristics of the data source. Based on preset data importance tags and real-time network monitoring, the system dynamically schedules the working frequency and data retrieval volume of these collectors, ensuring that critical data is collected preferentially and frequently, while automatically reducing the collection intensity of non-critical data during network congestion to avoid impacting the production network. The collected raw data, after timestamps, source markers, and other metadata are appended, is pushed to the downstream preprocessing module in a unified message format.

[0020] The data acquisition module includes an adaptive heartbeat protocol algorithm that dynamically adjusts the next acquisition interval in real time based on the importance of the data source, the current network load, and the rate of change of the data itself. This ensures that critical data is not lost while optimizing the efficiency of network resource utilization and avoiding invalid or redundant data acquisition. The formula for calculating the next acquisition interval is as follows: (formula not reproduced in this text). In the formula, $T_i(t)$ is the acquisition interval of data source $i$ at time $t$; $T_i(t+1)$ is the acquisition interval of data source $i$ at time $t+1$; $w_i$ is the importance weight of data source $i$, preset by the operations and maintenance personnel to reflect the criticality of the data source to core business assessment; $L(t)$ is the real-time network load status, a normalized value obtained by continuously monitoring the bandwidth utilization of the data transmission links; $\Delta_i(t)$ is the rate of change of the data, calculated by comparing the current and previous collected values; and $\alpha$, $\beta$, $\gamma$ are custom weighting coefficients whose sum is 1.

[0021] The next data collection interval is calculated by updating the data at a set interval. If a data source is of high importance, the network is idle, and the data changes drastically, the calculation formula will output a smaller value, thereby shortening the waiting time for the next collection. Conversely, the collection interval is extended. The adaptive heartbeat protocol algorithm adjusts the collection interval for the next time step, avoiding the resource waste caused by traditional fixed collection frequency collection when the data is unchanged, or the loss of information caused by data fluctuations. This enables on-demand collection, improving the collection efficiency and resource utilization of the data collection module. When the network is under high load, the collection intensity of non-critical data is actively reduced, effectively preventing the data collection behavior itself from becoming a source of network congestion, ensuring the stable operation of the core network, and reducing the workload of manual optimization by automatically adapting to the characteristics of different data sources and fluctuations in the network environment.
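The qualitative behaviour described in [0020] and [0021] can be sketched in code. The interval formula itself is not reproduced in this text, so the rule below is an assumption chosen only to match the stated behaviour (high importance, an idle network, and fast-changing data shorten the interval; the opposite lengthens it). The function name, the bounds `t_min` and `t_max`, and the default coefficients are all hypothetical.

```python
def next_interval(importance, load, change_rate,
                  alpha=0.4, beta=0.3, gamma=0.3,
                  t_min=1.0, t_max=60.0):
    """Return the next acquisition interval in seconds (illustrative rule).

    importance, load and change_rate are assumed to be normalised to [0, 1];
    alpha, beta and gamma are the weighting coefficients and must sum to 1.
    """
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weighting coefficients must sum to 1")
    for name, value in (("importance", importance), ("load", load),
                        ("change_rate", change_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    # Assumed slack score: 0 means poll as often as allowed, 1 means back off fully.
    slack = alpha * (1.0 - importance) + beta * load + gamma * (1.0 - change_rate)
    return t_min + (t_max - t_min) * slack


# Critical, fast-changing source on an idle network: shortest interval.
urgent = next_interval(importance=1.0, load=0.0, change_rate=1.0)
# Low-priority, static source on a congested network: long interval.
relaxed = next_interval(importance=0.1, load=0.9, change_rate=0.0)
```

Bounding the result between a minimum and a maximum interval is a design choice of this sketch; it keeps a misconfigured weight from either flooding the network or silencing a source entirely.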

[0022] The data processing module is used to homomorphically encrypt the multi-source indicator data according to the initial privacy identifier, and then use a spectral clustering algorithm combined with a box plot algorithm to detect and correct anomalies in the encrypted multi-source indicator data. The corrected data is then subjected to entity alignment, semantic mapping and data fusion to generate a fusion vector and a knowledge graph of association relationships. The data processing module includes the following units that are connected in sequence: a homomorphic encryption processing unit, an anomaly detection and correction unit, an alignment and fusion unit and a knowledge graph generation unit. A homomorphic encryption processing unit is used to perform homomorphic encryption on multi-source indicator data according to the initial privacy identifier; The anomaly detection and correction unit uses a spectral clustering algorithm combined with a box plot algorithm to perform anomaly detection and correction on the encrypted multi-source index data; The alignment and fusion unit is used to perform entity alignment, semantic mapping, and data fusion on the corrected data to generate a fusion vector. The knowledge graph generation unit receives data generated by the alignment and fusion unit, and each unit performs the following functions: The homomorphic encryption unit receives the raw multi-source indicator data set D from the data acquisition module. Each data item in set D is accompanied by an initial privacy identifier, which includes three levels: low, medium, and high, corresponding to the privacy level from low to high. The data items that need to be protected are subjected to additive homomorphic encryption based on the privacy identifier being medium or high.

[0023] Here, the determination of privacy identifiers includes: a) For data item metadata, scanning the metadata (such as field names, table names, and source systems) with predefined regular expressions and keyword libraries to match predefined sensitive patterns; b) For text data, using a pre-trained sensitive information recognition model (such as a NER model fine-tuned from BERT) to identify whether the content contains personally identifiable information (PII) such as names, addresses, and ID card numbers, and determining the initial privacy identifier through a set threshold function; c) For numerical data, determining whether it is an identifier (such as a fixed-length IMSI), location data (such as latitude and longitude within a certain range), or a metric strongly correlated with personal behavior (such as the continuous call record pattern of a specific user); d) If the data item comes from a known data source or data stream that has already been tagged, it inherits the privacy policy tag of its upstream source.
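Rules a) and d) above lend themselves to a short sketch. The example below assigns one of the three levels (low, medium, high) from a field name and lets an upstream tag take precedence when one exists. The keyword patterns are hypothetical placeholders rather than the platform's keyword library, and the model-based rules b) and c) are not covered.

```python
import re

# Hypothetical sensitive patterns; a real deployment would load a maintained library.
HIGH_PATTERNS = [re.compile(p, re.IGNORECASE)
                 for p in (r"imsi", r"id_?card", r"msisdn|phone")]
MEDIUM_PATTERNS = [re.compile(p, re.IGNORECASE)
                   for p in (r"latitude|longitude", r"topology", r"config")]


def initial_privacy_identifier(field_name, upstream_label=None):
    """Return 'low', 'medium' or 'high' for a data item.

    An item from an already tagged upstream source inherits that tag;
    otherwise the field name is matched against the sensitive patterns.
    """
    if upstream_label is not None:
        return upstream_label
    if any(p.search(field_name) for p in HIGH_PATTERNS):
        return "high"
    if any(p.search(field_name) for p in MEDIUM_PATTERNS):
        return "medium"
    return "low"


labels = {name: initial_privacy_identifier(name)
          for name in ("subscriber_imsi", "cell_latitude", "link_latency_ms")}
```

The location pattern spells out `latitude` in full so that it does not accidentally match unrelated fields such as `link_latency_ms`.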

[0024] Furthermore, since the privacy level of data items may change in subsequent data processing, a dynamic context risk assessment mechanism is introduced to adjust the initial label in real time. The dynamic context risk assessment mechanism first calculates the association risk amplification factor. By analyzing the potential association degree between the data item and other entities in the current processing task, it estimates the risk of the data item being re-identified or inferred in the fusion calculation. If the association risk exceeds the threshold, the label level is automatically upgraded. At the same time, based on the sensitivity of the specific processing operations (such as association mining and result publication) that the data item will undergo in the future, the label is further calibrated through the operation sensitivity factor, further refining the adjustment of the privacy label as the privacy level of the data changes.

[0025] Basic anonymization and access control are applied to data items with low privacy ratings to maximize data processing efficiency; Based on the privacy identifiers of medium and high, additive homomorphic encryption is applied to the data items that need protection. This includes: generating a homomorphic encryption key pair for the data to be encrypted, distributing the public key to the data processing module, and securely storing the private key in a trusted module or authorized party. By applying additive homomorphic encryption to the data items that need protection based on the privacy identifiers of medium and high, sensitive information is encrypted at the initial stage of data processing, ensuring that all subsequent calculations are performed in ciphertext or semi-ciphertext state, thus achieving data usability without visibility.

[0026] In the ciphertext domain, or in a controlled environment after decryption (such as a TEE environment), the anomaly detection and correction unit first uses spectral clustering to group data subsequences with similar periodicity or trend characteristics. Then, within each homogeneous group, it independently applies a skewness-adjusted box plot algorithm to calculate outlier determination boundaries. Finally, it performs outlier detection on each item of multi-source indicator data using the outlier determination boundaries of the group to which that item belongs. The specific process includes:
S1: Obtain the time series data X corresponding to the multi-source indicator data, and construct a trajectory matrix for each indicator in the time series data using the time-delay embedding method. The trajectory matrix serves as the phase space representation.
S2: Based on the phase space representation, calculate the similarity matrix between the data points of each indicator, and apply the spectral clustering algorithm to divide the data sequence into multiple subgroups with similar intrinsic patterns. The similarity matrix converts the Euclidean distance between any two embedding vectors of the trajectory matrix into a quantified similarity value in the range (0,1]. The nonlinear exponential transformation ensures that vector pairs that are close (similar patterns) have high similarity values and vector pairs that are far apart (large pattern differences) have low similarity values. This provides subsequent spectral clustering with a matrix input that reflects the similarity of time series patterns. Finally, the clustering algorithm identifies time series subgroups with similar periods or trends.
S2 includes the following execution process:
S21: Based on the phase space representation, the similarity between the data points of each indicator is calculated using the Gaussian radial basis function (RBF) kernel, and a similarity matrix W is constructed. The value of any element of W is calculated as: $W_{ij} = \exp\left(-\frac{\lVert v_i - v_j \rVert^2}{2\sigma^2}\right)$. In the formula, $W_{ij}$, the element in the i-th row and j-th column of the similarity matrix W, is the similarity value between the i-th embedding vector $v_i$ and the j-th embedding vector $v_j$ of the trajectory matrix; $\sigma$ is the preset bandwidth parameter (kernel width) of the Gaussian kernel, which controls the rate at which similarity decays with distance; and $\lVert v_i - v_j \rVert^2$ is the squared Euclidean distance between the two embedding vectors.

[0027] S22: Construct the diagonal matrix D of the similarity matrix. Each element on the diagonal of D is the sum of all elements in the corresponding row of the similarity matrix. Construct the Laplacian matrix from the similarity matrix and the diagonal matrix.
S23: Normalize the Laplacian matrix to eliminate the scale difference in connection strength between different vectors and obtain the normalized Laplacian matrix. Perform eigenvalue decomposition on the normalized Laplacian matrix, extract the eigenvectors corresponding to the k smallest eigenvalues, and combine these vectors column by column to form the feature matrix.
S24: Take the row vectors of the feature matrix as low-dimensional features and perform K-means clustering on them, dividing the original time series into k homogeneous subgroups. Time series patterns with similar periods or trends fall into the same subgroup, which completes the pre-classification.
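Steps S1 through S23 up to the normalized Laplacian can be sketched with the standard library alone. The example assumes the common RBF form with $2\sigma^2$ in the denominator and the symmetric normalization $I - D^{-1/2} W D^{-1/2}$; the source only says the Laplacian is normalized, so that choice is an assumption. The eigendecomposition and K-means steps are omitted and would normally be delegated to a numerical library. All function names are hypothetical.

```python
import math


def delay_embed(series, dim, tau):
    """Trajectory matrix whose rows are [x_t, x_{t+tau}, ..., x_{t+(dim-1)*tau}]."""
    n_rows = len(series) - (dim - 1) * tau
    if n_rows <= 0:
        raise ValueError("series too short for this embedding")
    return [[series[t + k * tau] for k in range(dim)] for t in range(n_rows)]


def rbf_similarity(vectors, sigma):
    """Similarity matrix W with entries exp(-||v_i - v_j||^2 / (2 sigma^2))."""
    n = len(vectors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def normalized_laplacian(w):
    """Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}."""
    degree = [sum(row) for row in w]   # diagonal of D; positive because W_ii = 1
    n = len(w)
    return [[(1.0 if i == j else 0.0) - w[i][j] / math.sqrt(degree[i] * degree[j])
             for j in range(n)] for i in range(n)]


series = [0, 1, 0, 1, 0, 1]            # toy periodic indicator
trajectory = delay_embed(series, dim=2, tau=1)
W = rbf_similarity(trajectory, sigma=1.0)
L_sym = normalized_laplacian(W)
```

In this toy series the embedding vectors alternate between two points, so `W` shows similarity 1 between rows that share the same phase and a lower value between rows of opposite phase, which is the structure spectral clustering then separates.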

[0028] S3: Within each subgroup, independently calculate the first quartile, median, and third quartile of the data items within the subgroup, and calculate the interquartile range. Then calculate the local skewness adjustment factor $\lambda$ for each subgroup, which quantifies the skewness of the data distribution in that subgroup. The calculation formula is: (formula not reproduced in this text). In the formula, $Q_1$ is the first quartile, $M$ is the median, $Q_3$ is the third quartile, and $\mathrm{IQR}$ is the interquartile range, calculated as: $\mathrm{IQR} = Q_3 - Q_1$.

[0029] S4: Based on the local skewness adjustment factor, calculate the outlier determination boundary of the subgroup. The outlier determination boundary consists of a lower bound and an upper bound, calculated as follows: Lower bound: (formula not reproduced in this text); Upper bound: (formula not reproduced in this text). Traditional box plots use a fixed 1.5×IQR boundary, which is only suitable for approximately symmetrically distributed data. S4 improves on this with an improved box plot anomaly detection algorithm, employing a skewness adjustment factor to quantify the skewness of the subgroup data and utilizing an exponential function... to adjust the anomaly boundaries. Dynamically combining the skewness adjustment factor with the box plot boundaries achieves adaptive detection of skewed data. This addresses the non-Gaussian, skewed, and periodic characteristics common in communication engineering data, for which existing technologies, relying mostly on fixed thresholds or a single statistical assumption, perform poorly. Through the improved box plot anomaly detection algorithm and its skewness adjustment factor λ, the data preprocessing module dynamically calculates the skewness of the data distribution and adaptively scales the fixed anomaly boundaries of traditional box plots. This effectively reduces the high false alarm and false negative rates of traditional box plots on skewed communication data and improves the adaptive processing capability for heterogeneous communication data.
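A skewness-adjusted box plot of the kind described in S3 and S4 can be sketched as follows. The exact factor and bound formulas are not reproduced in this text, so the sketch substitutes two well-known ingredients and labels them as assumptions: Bowley's quartile skewness, $(Q_3 + Q_1 - 2M)/\mathrm{IQR}$, as the adjustment factor, and exponential scaling with the coefficients -4 and 3 used by the Hubert-Vandervieren adjusted box plot. Function and parameter names are hypothetical.

```python
import math
import statistics


def adjusted_bounds(values, k=1.5, a=-4.0, b=3.0):
    """Return (lower, upper) outlier bounds for one subgroup.

    Assumed skewness factor: Bowley's quartile skewness.
    Assumed scaling: exp(a * lam) and exp(b * lam), mirrored for negative lam.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lam = (q3 + q1 - 2.0 * median) / iqr if iqr > 0 else 0.0
    if lam >= 0:   # right-skewed: tighten the lower bound, relax the upper bound
        lower = q1 - k * math.exp(a * lam) * iqr
        upper = q3 + k * math.exp(b * lam) * iqr
    else:          # left-skewed: mirror the adjustment
        lower = q1 - k * math.exp(-b * lam) * iqr
        upper = q3 + k * math.exp(-a * lam) * iqr
    return lower, upper


def find_outliers(values, **kwargs):
    """Values of the subgroup that fall outside its adjusted bounds."""
    lower, upper = adjusted_bounds(values, **kwargs)
    return [v for v in values if v < lower or v > upper]


symmetric = list(range(1, 10))
bounds = adjusted_bounds(symmetric)          # reduces to the classic 1.5 x IQR rule
flagged = find_outliers(symmetric + [40])
```

For symmetric data the factor is zero and the bounds reduce to the traditional 1.5×IQR rule; as the subgroup becomes skewed, the bound on the long-tail side moves outward and the other moves inward.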

[0030] Each data point is compared with the outlier determination boundary of its subgroup to determine whether it is an outlier. S5: Perform anomaly correction based on the initial privacy identifier: If the data is non-sensitive, or is sensitive but may be decrypted, the outlier is replaced with the median of the subgroup to which the point belongs, or corrected by interpolation through local regression; If the data is sensitive and must be kept in ciphertext, the outlier is replaced, in the ciphertext state, with the homomorphically encrypted ciphertext of the median. Secure replacement is achieved by using the additive homomorphism of the encryption scheme and calculating the ciphertext distance. By clustering first and detecting afterwards, the anomaly detection and correction unit reduces the high false alarm rate of traditional methods on complex multimodal data. The improved dynamic boundary formula is specifically optimized for skewed distributions and, combined with encrypted computation, achieves accurate anomaly handling under privacy protection.

[0031] The alignment and fusion unit performs entity alignment, semantic mapping and data fusion on the corrected data to generate a fusion vector. The corrected data is a mixed dataset containing ciphertext and plaintext. Entity recognition is performed separately on the structured data, unstructured text data and semi-structured data in the mixed dataset. For structured data, the entities and the corresponding attribute sets are extracted directly through pattern matching. For unstructured data (a piece of text S), the BERT-BiLSTM-CRF model is used to extract the most likely entity label sequence of text S, where the entity label sequence output by the model is: $Y^{*} = \arg\max_{Y \in \mathcal{Y}} P(Y \mid S)$, with $P(Y \mid S) = \exp(F(S, Y)) / Z(S)$ and $Z(S) = \sum_{Y' \in \mathcal{Y}} \exp(F(S, Y'))$; In the formula, the arg max is computed with the Viterbi algorithm, which searches over all possible label sequences Y for the one that maximizes the conditional probability of Y given text S; that sequence is the model's output $Y^{*}$. $Z(S)$ is a normalization factor that ensures the probabilities of all possible label sequences sum to 1. $\mathcal{Y}$ is the set of all possible label sequences, over which the exponential terms are summed to keep the model's output probability valid. F is the CRF feature score, calculated using the following formula: $F(S, Y) = \sum_{i=1}^{n} \sum_{k=1}^{K} w_k f_k(y_{i-1}, y_i, S, i)$; In the formula, n is the number of tokens (sequence length) in the input text S, and K is the total number of feature functions. $w_k$ is the weight of the k-th feature function, obtained through model training; a larger weight indicates a stronger influence of that feature on label prediction. $f_k$ is the k-th feature function, an indicator function that outputs 0 or 1.
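As a non-limiting illustration of the sequence search described above, the following sketch decodes the highest-scoring label sequence with the Viterbi algorithm. It assumes the weighted feature functions have already been aggregated into per-token emission scores and label-to-label transition scores; that decomposition, and the names `viterbi`, `emissions` and `transitions`, are illustrative assumptions rather than part of the embodiment. The normalization factor does not affect the arg max, so it is not computed.

```python
def viterbi(emissions, transitions, labels):
    """Return the label sequence with the highest total CRF score.

    emissions[i][y]   : score of label index y at token i
    transitions[p][y] : score of moving from label index p to label index y
    """
    k = len(labels)
    score = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for i in range(1, len(emissions)):
        new_score, pointers = [], []
        for y in range(k):
            # Best previous label for a path that ends in label y at token i.
            best_prev = max(range(k), key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y] + emissions[i][y])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    y = max(range(k), key=lambda j: score[j])
    path = [y]
    for pointers in reversed(backpointers):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return [labels[j] for j in path]
```

With a positive B-PER to I-PER transition score and a negative B-PER to B-LOC score, the decoder prefers I-PER after B-PER even when the second token's emission score alone slightly favours B-LOC.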

[0032] The introduction of the CRF layer solves the problem that BiLSTM, when outputting labels alone, ignores the transition dependencies between labels. For example, the entity label B-PER is more likely to be followed by I-PER than by B-LOC. Modeling label transition features makes the entity annotation results conform better to grammatical and semantic rules, thereby improving the accuracy of multimodal entity recognition.

[0033] For semi-structured data, XPath / JSONPath is used to extract the entities of the corresponding data items from the platform's runtime logs.

[0034] Entity alignment involves constructing a heterogeneous information network from the entity sets of different sources. For two entities to be aligned, a multi-layer graph attention network (GAT) is used to learn a representation of each entity separately, and a similarity calculation algorithm is then used to calculate the alignment score. If the calculated score exceeds a threshold, the two are determined to be the same entity and are merged. Furthermore, for an attribute `attr` of the aligned entity, if the values from different data items conflict, the credibility weight of each data item is evaluated separately. Dempster-Shafer evidence theory is used for fusion: the credibility function of each candidate value is calculated, and the value of the data item corresponding to the largest credibility function value is selected as the fused value. If the conflicting values are all homomorphic ciphertexts, a weighted average fusion is performed in the ciphertext state (using additive homomorphism). Finally, the fusion vector is generated: for each entity, all attribute values, plaintext and ciphertext alike, are encoded into a unified feature vector using a multi-source fusion method. This unified feature vector, i.e., the fusion vector, serves as the digital representation of the entity in subsequent analysis. The multi-source fusion method includes directly normalizing and concatenating numerical attributes, converting textual attributes into vectors using a sentence encoder (such as Sentence-BERT), and either preserving the ciphertext form of ciphertext attributes or using their plaintext digest (such as a hash value) as a placeholder, with the ciphertext index annotated in the vector.
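As a non-limiting illustration of the conflict resolution step, the following sketch fuses conflicting plaintext attribute values with Dempster's rule of combination. It assumes each data item supports exactly one candidate value with a mass equal to its credibility weight and assigns the remaining mass to the whole frame of candidate values; this mass assignment and all function names are illustrative assumptions.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


def belief(mass, hypothesis):
    """Belief of a hypothesis: total mass of all its subsets."""
    return sum(w for h, w in mass.items() if h <= hypothesis)


def fuse_conflicting_values(claims):
    """claims: list of (candidate_value, credibility in [0, 1]), one per data item."""
    frame = frozenset(value for value, _ in claims)
    fused = {frame: 1.0}  # start from the vacuous mass function
    for value, credibility in claims:
        m = {frozenset([value]): credibility}
        m[frame] = m.get(frame, 0.0) + 1.0 - credibility
        fused = dempster_combine(fused, m)
    best = max(frame, key=lambda v: belief(fused, frozenset([v])))
    return best, fused
```

For two sources claiming "A" with credibility 0.8 and "B" with credibility 0.6, the conflicting mass is 0.48 and the fused belief in "A" is 0.32 / 0.52, so "A" is selected as the fused value.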

[0035] The knowledge graph generation unit receives data from the alignment and fusion unit to generate a relational knowledge graph. The relational knowledge graph includes a node set and an edge set. In the node set, each graph node corresponds to an entity, and the node attributes contain the entity's fusion vector and metadata. In the edge set, each edge represents the relation between two nodes. The relation type and relation strength are determined by semantic mapping. Furthermore, before the relational knowledge graph is published, the data processing module also adds adaptive noise that satisfies differential privacy to the edge weights between highly sensitive entities to protect the privacy of sensitive relationships. The noise level is dynamically adjusted based on the global connectivity of the graph, striking a balance between privacy protection and graph utility: when the graph is tightly connected, less noise is added to preserve the effectiveness of the relationship information; when the graph is loosely connected, more noise is added to hide the true values of the original sensitive weights, preventing attackers from inferring sensitive relationships between entities through the weights. The final formula for calculating the weight values published in the knowledge graph is: $\tilde{w}_{ij} = w_{ij} + \eta_{ij}$, $\eta_{ij} \sim \mathrm{Lap}(b)$, $b = \Delta w / \varepsilon_{\mathrm{eff}}$; In the formula, $\tilde{w}_{ij}$ is the weight value that is ultimately published into the knowledge graph; $w_{ij}$ is the original edge weight between entity i and entity j; $\eta_{ij}$ is Laplace noise, a noise value sampled from the Laplace distribution; b is the Laplace scale parameter; $\Delta w$ and $\varepsilon_{\mathrm{eff}}$ are the global sensitivity of the edge weights and the adaptively adjusted effective privacy budget, respectively, the latter being jointly determined by the original privacy budget and the global clustering coefficient of the graph.
The formula for calculating the effective privacy budget is as follows: ; In the formula, $\varepsilon_{0}$ is the original privacy budget, and α is an adjustable parameter that controls the impact of the global clustering coefficient C on the effective privacy budget, thereby balancing privacy protection and graph utility. The global clustering coefficient C reflects how tightly the nodes in the graph are connected: the larger the value, the tighter the connection of the graph.
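As a non-limiting illustration of the adaptive noise described above, the following sketch perturbs an edge weight with Laplace noise whose scale is the global sensitivity divided by the effective privacy budget. The expression for the effective privacy budget is not reproduced in the text, so the linear form used in `effective_budget` is purely an assumption chosen to match the stated behaviour (a larger clustering coefficient gives a larger budget and therefore less noise). All function and parameter names are hypothetical.

```python
import random


def effective_budget(eps0, alpha, clustering):
    """ASSUMED form: the budget grows linearly with the global clustering coefficient."""
    return eps0 * (1.0 + alpha * clustering)


def publish_weight(weight, sensitivity, eps_eff, rng):
    """Return the edge weight plus Laplace noise of scale sensitivity / eps_eff."""
    scale = sensitivity / eps_eff
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return weight + noise


if __name__ == "__main__":
    rng = random.Random(7)
    tight = effective_budget(eps0=1.0, alpha=0.5, clustering=0.9)
    loose = effective_budget(eps0=1.0, alpha=0.5, clustering=0.1)
    print("tightly connected graph:", publish_weight(0.6, 1.0, tight, rng))
    print("loosely connected graph:", publish_weight(0.6, 1.0, loose, rng))
```

Because the Laplace scale is inversely proportional to the effective budget, a tightly connected graph publishes weights closer to the originals, while a loosely connected graph receives stronger perturbation.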

[0036] As mentioned earlier, existing communication engineering performance evaluation systems suffer from weak data correlation at the data mining level and data privacy leakage risks inherent in traditional distributed data mining. Specifically, in-depth network performance analysis (such as end-to-end fault tracing and cross-domain correlation optimization) relies on the fusion of data from multiple sources. However, because the data belongs to different management domains, operators, or departments, and contains user privacy and commercially sensitive information, it is subject to strict regulations (such as GDPR and cybersecurity laws). Data cannot be directly centralized or exchanged in plaintext, forming data silos. This leads to a one-sided analytical perspective and makes it difficult to discover global and deep-seated patterns. Furthermore, the essence of communication network data is a graph structure (connections between devices, users, and services). Traditional methods struggle to effectively model the dependencies, influences, and propagation mechanisms within this topology, resulting in inaccurate anomaly localization and significant deviations in trend prediction. In addition, traditional distributed data mining schemes (such as centralizing data only after anonymization) pose privacy leakage risks, as anonymized data can still be re-identified through correlation analysis, failing to meet increasingly stringent privacy protection requirements.

[0037] Therefore, a data mining module is set up to predict network performance trends and identify abnormal propagation paths based on the fused vectors and the knowledge graph of association relationships using a predictive diagnostic model. The predictive diagnostic model is a federated graph neural network combining attention mechanisms and differential privacy. Simultaneously, in encrypted state, an encrypted association mining algorithm based on a secure multi-party protocol is used to obtain the associations between data. The data mining module includes a parallel computing identification and prediction unit and an association mining unit. The identification and prediction unit is used to coordinate multiple participants (data holders) to jointly train a federated graph neural network without centralizing the original data, in order to predict network performance trends and identify abnormal propagation paths. The association mining unit uses an encrypted association mining algorithm based on a secure multi-party protocol to obtain the associations between data. The specific implementation process of the two units includes: Identification and prediction unit: Step 1: Multiple participants use the same graph attention network structure locally, and based on their respective partial network topology and node (such as base stations and links) data, initialize and train the GAT model as a local model: First, the model is initialized: It receives local relational knowledge graphs from each participant, fusion vectors corresponding to each knowledge graph, a global privacy budget, and local privacy budgets from each participant. Here, "participants" refers to different entities or logical units that have partial ownership or control of communication network data and participate in collaborative computing. 
These include network operation and maintenance centers in different geographical regions (each managing a portion of network elements), different network functional domains (such as wireless access network domains, core network domains, and service platform domains), different operators or network service providers (collaborating under compliance requirements), and different data centers or business units isolated within the same organization due to security policies. A global graph neural network model is trained using a collaborative training method for node-level (e.g., network element performance) regression / classification prediction and edge-level (e.g., fault propagation) importance assessment, without centralizing the original graph data. Each participant initializes a Graph Attention Network (GAT) as its local model, setting initial features as fusion vectors for nodes in the local relational knowledge graph.

[0038] Then, the LeakyReLU attention calculation formula is used to calculate the attention coefficient of any node in the l-th layer of each local model with respect to its neighboring nodes. Through the attention coefficients, each local model dynamically learns the importance weight of each neighboring node to the central node, which improves its learning of the relationships between entities and thus accurately models the mutual influence between network entities. Step 2: In each round of training, each participant first calculates the parameter gradient of its local model and performs norm clipping on the parameter gradient to control sensitivity. Noise satisfying differential privacy requirements is then injected into the clipped parameter gradient to perturb it. The specific implementation process includes: Each participant calculates the loss of its local model using the mean squared error, and calculates the gradient of the model parameters on its local data to obtain the original gradient vector, which reflects the update required by the local data. The original gradient vector is then clipped to control its sensitivity; the clipping formula is as follows: $\bar{g}_u = g_u \cdot \min\left(1, \frac{C}{\lVert g_u \rVert_2}\right)$; In the formula, $\bar{g}_u$ is the clipped gradient of participant u, obtained from the original local gradient. $g_u$ is the original local gradient calculated by participant u, a vector of partial derivatives of the loss function with respect to the model parameters that reflects the direction and magnitude of the update the local data requires. $\lVert g_u \rVert_2$ is the L2 norm of the original gradient, which measures the overall magnitude of the gradient, and C is the set clipping norm. $\min(1, C / \lVert g_u \rVert_2)$ is the scaling factor for clipping.
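As a non-limiting illustration of the attention coefficient calculation, the following sketch computes the coefficients of one central node over its neighbors. The embodiment names a LeakyReLU attention formula without reproducing it, so the sketch assumes the standard graph attention form: the projected features of the central node and a neighbor are concatenated, scored by an attention vector, passed through LeakyReLU, and normalized with softmax over the neighborhood. The names `W`, `a` and the negative slope 0.2 are illustrative assumptions.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_coefficients(h_center, h_neighbors, W, a):
    """Return softmax-normalized attention coefficients of the center over its neighbors.

    h_center    : feature vector of the central node
    h_neighbors : list of neighbor feature vectors
    W           : projection matrix (list of rows)
    a           : attention vector of length 2 * len(W)
    """
    def project(h):
        return [sum(w * x for w, x in zip(row, h)) for row in W]

    z_center = project(h_center)
    scores = []
    for h_j in h_neighbors:
        concat = z_center + project(h_j)  # [W h_i || W h_j]
        scores.append(leaky_relu(sum(ak * c for ak, c in zip(a, concat))))
    # Numerically stable softmax over the neighborhood.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The coefficients sum to 1 over the neighborhood, so they can be used directly as weights when the local model aggregates neighbor features into the central node's representation.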

[0039] Gaussian noise is then added according to the local privacy budget and the number of training rounds, and the final perturbed gradient is calculated as follows: $\tilde{g}_u = \bar{g}_u + \mathcal{N}(0, \sigma_u^{2} I)$; In the formula, $\tilde{g}_u$ is the final perturbed gradient of participant u, i.e., the gradient after clipping and noise addition, used for subsequent model parameter updates. $\mathcal{N}(0, \sigma_u^{2} I)$ is Gaussian noise with zero mean and covariance matrix $\sigma_u^{2} I$, where I is the identity matrix, which ensures that the added noise is independent and identically distributed in each dimension of the gradient. The noise level $\sigma_u$ is calculated from the local privacy budget and the number of training rounds; the smaller the privacy budget, the larger the noise level.
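As a non-limiting illustration of Step 2, the following sketch clips a local gradient to a maximum L2 norm and then adds independent zero-mean Gaussian noise to each dimension. How the noise level is derived from the local privacy budget and the number of training rounds is not specified in the text, so `sigma` is simply passed in as a parameter; the function names are hypothetical.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale the gradient so that its L2 norm does not exceed clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def perturb_gradient(grad, clip_norm, sigma, rng):
    """Clip the gradient, then add i.i.d. Gaussian noise N(0, sigma^2) per dimension.

    ASSUMPTION: sigma is supplied by the caller; its calibration to the
    privacy budget is outside this sketch.
    """
    clipped = clip_gradient(grad, clip_norm)
    return [g + rng.gauss(0.0, sigma) for g in clipped]


if __name__ == "__main__":
    rng = random.Random(0)
    print(clip_gradient([3.0, 4.0], clip_norm=1.0))  # norm 5 is scaled down to norm 1
    print(perturb_gradient([3.0, 4.0], clip_norm=1.0, sigma=0.1, rng=rng))
```

Clipping bounds the contribution of any single participant's data to the update, which is what makes a fixed noise level sufficient to mask it.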

[0040] The final perturbed gradient, obtained after clipping and noise perturbation, is used for subsequent model parameter updates (such as gradient aggregation in federated learning), protecting local data privacy while keeping model training effective. By adding Gaussian noise to the clipped gradient, the gradient satisfies (ε, δ)-differential privacy, where ε is the local privacy budget; the noise hides the details of the true gradient and prevents attackers from reverse-engineering the participants' original local data from the gradient.

[0041] Step 3: Each participant sends its final perturbation gradient to the designated secure aggregation server or aggregates it through a secure multi-party computation protocol to obtain the aggregated gradient. The central server updates the global model parameters based on the aggregated gradient (total gradient) to obtain the updated aggregated gradient. The central server then distributes the updated aggregated gradient to each participant, and each participant performs the next round of local model update training.

[0042] To keep the local gradient of every participant private, the perturbed gradients of all participants are aggregated in encrypted form. The server only performs multiplication operations on the encrypted gradients, obtaining the ciphertext of the aggregated gradient without decryption. A multiplicative homomorphic encryption algorithm is used for secure aggregation, and the secure aggregation formula is as follows: $\mathrm{Enc}\left(\sum_{u=1}^{U} \tilde{g}_u\right) = \prod_{u=1}^{U} \mathrm{Enc}(\tilde{g}_u)$; In the formula, $\sum_{u=1}^{U} \tilde{g}_u$ is the plaintext sum of the perturbed gradients of all participants, and U is the total number of participants in the collaborative training. $\mathrm{Enc}(\cdot)$ is the homomorphic encryption function; it satisfies the property that multiplication of ciphertexts corresponds to addition of plaintexts, so multiplying the encrypted gradients of all participants is equivalent to summing the plaintext gradients and then encrypting the sum. $\mathrm{Enc}(\tilde{g}_u)$ is the encrypted final perturbed gradient of participant u, and $\prod$ denotes the product.
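As a non-limiting illustration of the aggregation property described above (multiplying ciphertexts adds the underlying plaintexts), the following sketch uses a toy Paillier cryptosystem, which is a scheme known to have this property. The text does not name a concrete scheme, so the choice of Paillier is an assumption. The primes are deliberately tiny and insecure, and gradients are represented as fixed-point integers; all names are hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key generation with g = n + 1. Demo-sized primes, NOT secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Enc(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2


def aggregate(n, ciphertexts):
    """Server side: multiply ciphertexts only; no decryption takes place."""
    acc = 1
    for c in ciphertexts:
        acc = (acc * c) % (n * n)
    return acc


def decrypt(n, private_key, c):
    """Recover the plaintext and map it back to a signed integer."""
    lam, mu = private_key
    m = ((pow(c, lam, n * n) - 1) // n * mu) % n
    return m - n if m > n // 2 else m


if __name__ == "__main__":
    rng = random.Random(1)
    n, private_key = keygen()
    # Fixed-point gradients of three participants (scale 1000): 0.125, -0.5, 0.25
    encrypted = [encrypt(n, g, rng) for g in (125, -500, 250)]
    print(decrypt(n, private_key, aggregate(n, encrypted)) / 1000)
```

The server in this sketch only ever sees ciphertexts and their product; only a holder of the private key can recover the aggregated sum, and no individual gradient is decrypted.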

[0043] These perturbed, encrypted gradients are securely transmitted and aggregated to update a globally shared model; Step 4: After multiple rounds of iteration, the convergence condition is reached, and the final prediction and diagnostic model is obtained.

[0044] The association mining unit is used to discover strong association rules between cross-domain indicators while keeping data encrypted or secretly shared among all parties. This includes receiving local transaction datasets from multiple participants (u), where each transaction consists of a set of indicator items encoded by a fusion vector. The data items are homomorphically encrypted according to privacy tags, and preset minimum support thresholds (minsup) and minimum confidence thresholds (minconf) are used. Without revealing the specific transaction content of any party, the unit collaboratively calculates all frequent itemsets and their support, and derives strong association rules, including the following processes: Each participant locally transforms its transaction data into a securely computable format. For a candidate set X to be counted (e.g., "{wireless signal strength < threshold, surge in user complaint tickets}"), each party locally scans its own database, generating an encrypted or secretly shared "existence indicator bit" for each transaction record. Specifically, if the transaction data contains this combination, a ciphertext or secret share representing "1" is generated; otherwise, a ciphertext or secret share representing "0" is generated. The generation of these ciphertexts or shares relies on a pre-established homomorphic encryption or secret sharing system. Step 2: Measure the support level under encrypted or secret sharing using the Secure Multiparty Computation (SMPC) protocol. Support level is the frequency of a certain itemset appearing in all transaction records, used to measure the universality of a rule, such as {weak signal, user complaints}. Given a candidate set X (the target itemset for which support needs to be calculated), participant u traverses its local transaction database to generate binary indicator vectors for the candidate set. The length of the indicator vector is equal to the length of the local transaction data. 
In the binary indicator vector, the t-th element takes the value 1 if and only if the t-th transaction contains X, otherwise it takes the value 0. The local transaction database here contains a set of structured historical or real-time data records independently held by each participant (such as different regional operation and maintenance centers or different network domains), which is the basic material for subsequent relationship mining.

[0045] Participant u sums the elements of its binary indicator vector to obtain the local support, calculated using the following formula: $\mathrm{sup}_u(X) = \sum_{t=1}^{T_u} b_u^{(t)}(X)$, where $b_u^{(t)}(X)$ is the t-th element of the indicator vector and $T_u$ is the number of local transactions; Then each party uses its own ciphertext or share representing the local count (local support) as input to the SMPC protocol, which can be based on either Shamir secret sharing or garbled circuits. Based on the SMPC protocol, all participants collaboratively calculate the secret-shared value of the global support without needing to know the local count of any party: each participant u sends a share of its local count to the corresponding participant q, and each participant q then sums all the shares it has received to obtain its share of the global support. That is, in the ciphertext or secret-share state, the protocol simulates a distributed adder. Through a series of exchanges, operations and combinations in the encrypted state, each participant (or a designated coordinator) finally obtains the global support in a new encrypted or secret-shared form. Throughout the summation, no party can decrypt or observe the intermediate local count of any other party.
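As a non-limiting illustration of the distributed adder described above, the following sketch counts local support and then sums the local counts with additive secret sharing over a prime field. Additive sharing is used here as a simpler stand-in for the Shamir secret sharing or garbled circuits named in the text, and all parties are simulated in one process; the modulus and the function names are illustrative assumptions.

```python
import random

PRIME = 2_147_483_647  # field modulus; must exceed the largest possible global count


def local_support(transactions, itemset):
    """Sum of the binary indicator vector: transactions that contain the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def split_into_shares(value, n_parties, rng):
    """Split a value into n additive shares that sum to it modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(local_counts, rng):
    """Each participant u sends one share to each participant q; q adds what it receives."""
    n = len(local_counts)
    sent = [split_into_shares(count, n, rng) for count in local_counts]  # sent[u][q]
    return [sum(sent[u][q] for u in range(n)) % PRIME for q in range(n)]


def reconstruct(shares):
    """Recombine the shares of the global support (done only when disclosure is allowed)."""
    return sum(shares) % PRIME
```

Each individual share is uniformly random, so a participant holding its share of the global support learns nothing about another participant's local count; only the combination of all shares yields the global value.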

[0046] Step 3: Determine frequent itemsets using a secure comparison protocol, which checks whether the secret-shared support value (local support) is greater than or equal to the minimum support threshold minsup. Each participant inputs its share of the secret-shared support value and its share of the minimum support threshold into the GC or GMW protocol, which outputs an encrypted or secret-shared Boolean result. If the Boolean result is "yes" (i.e., the itemset is frequent), the itemset proceeds to the next round of association rule generation. All information on itemsets that are not determined to be frequent is securely discarded during the calculation to ensure that it is not leaked.

[0047] Step 4: Calculate the confidence level, which is the conditional probability that L also occurs in a transaction where Y occurs. It is used to quantify the strength of the association between the premise (Y) and the conclusion (L) of the rule and to measure the reliability of the rule.

[0048] For each itemset X determined to be frequent, generate all possible non-empty proper subsets Y, and generate a candidate association rule $Y \Rightarrow (X \setminus Y)$ for each subset Y, meaning that if Y appears, then the rest of X also appears. The confidence of the rule $Y \Rightarrow (X \setminus Y)$ equals the support of itemset X divided by the support of subset Y: $\mathrm{conf}(Y \Rightarrow X \setminus Y) = \mathrm{sup}(X) / \mathrm{sup}(Y)$; Through a secure comparison protocol, the calculated confidence is compared with the set minimum confidence threshold, and an encrypted or secret-shared Boolean value is output. The Boolean value indicates whether the confidence of the current rule is not lower than the preset minimum confidence threshold; the confidence value itself is never decrypted, which avoids leaking additional information.
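As a non-limiting illustration of the rule generation and confidence calculation, the following sketch enumerates the non-empty proper subsets of a frequent itemset and keeps the rules whose confidence reaches the threshold. It operates on plaintext transactions purely to show the arithmetic; in the embodiment the same quantities are computed and compared in ciphertext or secret-share form. The function names and the example item names are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def candidate_rules(transactions, frequent_itemset, minconf):
    """Return (premise, conclusion, support, confidence) for rules meeting minconf."""
    x = frozenset(frequent_itemset)
    sup_x = support(transactions, x)
    if sup_x == 0:
        return []
    rules = []
    for size in range(1, len(x)):  # non-empty proper subsets only
        for subset in combinations(sorted(x), size):
            y = frozenset(subset)
            confidence = sup_x / support(transactions, y)
            if confidence >= minconf:
                rules.append((y, x - y, sup_x, confidence))
    return rules


if __name__ == "__main__":
    tx = [
        frozenset({"weak_signal", "complaints"}),
        frozenset({"weak_signal", "complaints"}),
        frozenset({"weak_signal"}),
        frozenset({"complaints", "other"}),
        frozenset({"other"}),
    ]
    for premise, conclusion, sup, conf in candidate_rules(tx, {"weak_signal", "complaints"}, 0.6):
        print(sorted(premise), "=>", sorted(conclusion), sup, conf)
```

Since Y is a subset of X, the support of Y is never smaller than the support of X, so the confidence always lies between 0 and 1.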

[0049] Step 5: Calculate association rules: for all candidate rules, repeat the above secure confidence calculation in parallel or serially. For each rule, compare the calculated confidence and support with the set thresholds. When both screening conditions are met, the rule is determined to be a strong association rule, and all participants collaborate to decrypt or reconstruct it in plaintext. Finally, all parties jointly obtain a list of strong association rules in plaintext, for example: "{Increased wireless access failure rate, during a specific time period} ⇒ {Core network signaling overload}" (Support = 8.5%, Confidence = 82%). All unselected rules and all intermediate support and confidence values remain encrypted or secret-shared throughout this process and are not known to any single participant.

[0050] In a secure multi-party computation (MPC) environment, the above process securely performs support counting, threshold comparison, and confidence calculation in ciphertext or secret share state through homomorphic encryption or secret sharing technology. Ultimately, it only reveals strong rules that meet the two thresholds, thus achieving privacy-preserving association rule mining.

[0051] Traditional methods, constrained by data silos and privacy limitations, cannot effectively correlate and analyze metrics belonging to different management and technical domains (such as jointly analyzing wireless signal quality, core network signaling load, and user experience rate). The data mining module, by integrating federated graph neural networks and secure multi-party correlation mining techniques, provides a privacy-preserving environment for data mining and analysis, ensuring data remains within its domain, privacy is protected, and knowledge can be shared. This enables network operations personnel to securely discover deep-seated causal and correlational patterns across domains that are undetectable in traditional single-domain analysis. For example, it allows for the precise identification of complex root cause chains spanning multiple network domains that lead to service quality degradation. This provides unprecedented data-driven decision-making capabilities for implementing precise cross-domain collaborative optimization, preventing systemic risks, and improving overall network performance.

[0052] The performance evaluation module receives network performance index data and uses a comprehensive evaluation model (such as weighted summation) based on fixed weights or expert experience weights to quantify and score preset dimensions of the communication network (such as coverage, capacity, and quality), generating a performance evaluation report that includes a comprehensive performance index and scores for individual indicators. The implementation process of the performance evaluation module typically includes: reading pre-processed performance indicators from a database or upstream system; standardizing and weighting each indicator according to a preset, static evaluation indicator system and weighting coefficients; and finally summarizing to obtain a comprehensive performance index, possibly accompanied by simple indicator ranking or compliance analysis. This module focuses on the static, post-hoc, rule-based quantitative evaluation of input indicators.
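As a non-limiting illustration of the weighted-summation evaluation described above, the following sketch standardizes raw indicators to the range 0 to 1 and combines the per-dimension scores with fixed weights into a comprehensive performance index. The min-max standardization, the dimension names and the weights are illustrative assumptions.

```python
def min_max_normalize(value, lo, hi, higher_is_better=True):
    """Map a raw indicator onto [0, 1]; invert it when lower raw values are better."""
    if hi == lo:
        return 0.0
    score = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    return score if higher_is_better else 1.0 - score


def composite_index(scores, weights):
    """Weighted sum of per-dimension scores, divided by the total weight."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


if __name__ == "__main__":
    scores = {
        "coverage": min_max_normalize(90, 0, 100),
        "capacity": min_max_normalize(60, 0, 100),
        "quality": min_max_normalize(20, 0, 100, higher_is_better=False),
    }
    weights = {"coverage": 0.5, "capacity": 0.3, "quality": 0.2}
    print(scores, composite_index(scores, weights))
```

Dividing by the total weight keeps the index on the same 0 to 1 scale as the individual scores even if the weights are not already normalized.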

[0053] The data acquisition module of this invention achieves intelligent and compliant access to multi-source heterogeneous data through adaptive frequency and privacy labeling, breaking down data silos at the source and laying the foundation for end-to-end privacy protection. The data processing module employs an improved boxplot algorithm with fused spectral clustering pre-classification for adaptive anomaly detection, and combines homomorphic encryption with evidence-based multi-source fusion to solve the problems of poor quality non-Gaussian data cleaning and cross-source data integration. The data mining module combines federated graph neural networks with secure multi-party association mining, achieving cross-domain deep analysis and association discovery under privacy protection. This enables deep cross-domain integration of privacy computing, graph neural networks, and communication network analysis through the collaborative cooperation of multiple modules. Specifically, it utilizes an improved boxplot algorithm based on skewed communication network data, a federated graph neural network combining attention and differential privacy (FedGAT-DP), and an efficient and secure multi-party association mining protocol for communication metrics. A graph-association dual-stream parallel privacy mining architecture is constructed. Through a collaborative controller, a closed-loop optimization of the entire chain from privacy-aware data collection to dynamic evaluation feedback is achieved, internalizing privacy protection as an intrinsic attribute of the system rather than an extrinsic one. This allows the system to perform deep cross-domain analysis that traditional systems cannot accurately complete while meeting strict privacy compliance requirements.

[0054] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the present invention. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the present invention. Any simple modifications, equivalent changes and alterations made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the scope of the present invention.

Claims

1. A cloud platform for evaluating the performance of communication engineering based on heterogeneous data, characterized in that: The cloud platform includes the following communication connections: The data acquisition module is used to collect multi-source indicator data from multiple heterogeneous data sources according to the set adaptive acquisition frequency, and add an initial privacy label corresponding to its sensitivity level to the collected multi-source indicator data. The data processing module is used to perform anomaly detection and correction on the multi-source indicator data after homomorphic encryption based on the initial privacy identifier, and to perform entity alignment, semantic mapping and data fusion on the corrected data to generate a fusion vector and a knowledge graph of association relationships. The data mining module is used to predict network performance trends and identify abnormal propagation paths using a pre-built predictive diagnostic model based on fused vectors and association knowledge graphs. At the same time, in the encrypted state, it uses an encrypted association mining algorithm based on a secure multi-party protocol to obtain the association between data. The predictive diagnostic model is a federated graph neural network that combines attention mechanism and differential privacy. The performance evaluation module receives network performance trend prediction results, abnormal propagation paths, and data correlation analysis results. It uses a preset performance evaluation model to perform multi-dimensional performance quantification and generate a performance evaluation report that includes a comprehensive performance index, indicator weakness analysis, and root cause localization.

2. The cloud platform for evaluating communication engineering performance based on heterogeneous data as described in claim 1, characterized in that: In the data acquisition module, the next acquisition interval is calculated and determined by an adaptive heartbeat protocol algorithm from the importance weight of the data source, the real-time network load status, and the data's own rate of change. The formula for calculating the next acquisition interval is as follows: ; In the formula, the terms are, respectively: the data collection interval of the data source at time t; the data collection interval of the data source at time t+1; the importance weight of the data source; the real-time network load status; the rate of change of the data; and three custom weighting coefficients whose sum is 1.

3. The cloud platform for evaluating communication engineering performance based on heterogeneous data as described in claim 1, characterized in that: In the data processing module, a combination of spectral clustering and box plot algorithm is used to detect anomalies in the encrypted multi-source indicator data: First, spectral clustering is used to group data subsequences with similar periodicity or trend characteristics. Then, within each homogeneous group, a box plot algorithm based on skewness adjustment is applied independently to calculate the outlier determination boundary. Finally, outlier detection is performed on the multi-source indicator data based on the outlier determination boundary of the group to which any multi-source indicator data belongs.

4. The cloud platform for evaluating communication engineering performance based on heterogeneous data as described in claim 3, characterized in that: The specific process of anomaly detection in the encrypted multi-source indicator data using the spectral clustering algorithm combined with the box plot algorithm in the data processing module includes: Obtain the time series data corresponding to the multi-source indicator data, and calculate the time delay embedding matrix of each indicator data in the time series data to construct a phase space representation; The similarity matrix between data points of each index is calculated based on phase space representation, and the data sequence is divided into multiple subgroups with similar intrinsic patterns by applying spectral clustering algorithm. Within each subgroup, the first quartile, median, and third quartile of its data are independently calculated, and the interquartile range is calculated. Then, the local skewness adjustment factor for each subgroup is calculated. Based on the local skewness adjustment factor, the outlier determination boundary of the subgroup is calculated; Each data point is compared with the outlier threshold of its subgroup to determine whether it is an outlier.

5. The cloud platform for evaluating communication engineering performance based on heterogeneous data as described in claim 3, characterized in that: Before publishing the relationship knowledge graph, the data processing module further adds adaptive noise that satisfies differential privacy to the edge weights between entities in the relationship knowledge graph. The final calculation formula for the edge weights published to the knowledge graph is as follows: $\tilde{w}_{ij} = w_{ij} + \eta_{ij}$, $\eta_{ij} \sim \mathrm{Lap}(b)$, $b = \Delta w / \varepsilon_{\mathrm{eff}}$; In the formula, $\tilde{w}_{ij}$ is the weight value that is ultimately published into the knowledge graph; $w_{ij}$ is the original edge weight between entity i and entity j; $\eta_{ij}$ is Laplace noise; b is the Laplace scale parameter; $\Delta w$ and $\varepsilon_{\mathrm{eff}}$ are the global sensitivity of the edge weights and the adaptively adjusted effective privacy budget, respectively.

6. The cloud platform for evaluating communication engineering performance based on heterogeneous data as described in claim 1, characterized in that: in the data mining module, constructing the prediction and diagnostic model includes the following process: Step 1: each participant initializes a graph attention network as its local model; Step 2: in each round of training, each participant first calculates the parameter gradient of its local model and performs norm clipping on the parameter gradient to control sensitivity; the clipped parameter gradient is then perturbed by injecting noise that meets the differential privacy requirements to obtain the final perturbed gradient; Step 3: the final perturbed gradients calculated by the participants are securely aggregated to obtain the aggregated gradient, and the aggregated gradient is used to update the global model parameters to obtain the updated global model parameters; each participant uses the updated global model parameters to perform the next round of local model training; Step 4: after multiple rounds of iteration, the convergence condition is reached and the final prediction and diagnostic model is obtained.
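
The four steps of claim 6 can be sketched as the loop below. Several simplifications are assumptions made for illustration: the graph attention network is replaced by a small linear model so that the example stays self-contained, the secure aggregation of step 3 is replaced by a plaintext average, the noise is Gaussian with per-coordinate standard deviation `sigma * C`, and the convergence condition is a threshold on the aggregated gradient norm. All function and parameter names are hypothetical.

```python
import math
import random

def clip(grad, C):
    """Step 2a: scale grad so that its L2 norm is at most C."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, C / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]

def perturb(grad, C, sigma, rng):
    """Step 2b: clip, then add zero-mean Gaussian noise (std sigma * C)."""
    return [g + rng.gauss(0.0, sigma * C) for g in clip(grad, C)]

def local_gradient(w, data):
    """Stand-in for backpropagation through a local graph attention network:
    gradient of the mean squared error of a linear model y ~ w . x."""
    grad = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(data)
    return grad

def train(party_data, dim, C=1.0, sigma=0.0, lr=0.05,
          max_rounds=500, tol=1e-6, seed=0):
    rng = random.Random(seed)
    w = [0.0] * dim  # Step 1: shared initial parameters
    for _ in range(max_rounds):
        perturbed = [perturb(local_gradient(w, d), C, sigma, rng)
                     for d in party_data]
        # Step 3: plaintext average standing in for secure aggregation.
        agg = [sum(col) / len(perturbed) for col in zip(*perturbed)]
        w = [wi - lr * gi for wi, gi in zip(w, agg)]
        # Step 4: stop once the aggregated gradient is small enough.
        if math.sqrt(sum(g * g for g in agg)) < tol:
            break
    return w

# Two participants holding samples of y = 2x; noise disabled to show convergence.
parties = [[([1.0], 2.0), ([2.0], 4.0)], [([3.0], 6.0)]]
w_final = train(parties, dim=1, C=100.0, sigma=0.0)
```

With `sigma > 0` the gradient-norm stopping rule is rarely met because of the injected noise, so in that setting the loop effectively runs for `max_rounds`.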

7. The cloud platform for evaluating communication engineering performance based on heterogeneous data as described in claim 1, characterized in that: the data mining module, operating on encrypted data, uses an encrypted association mining algorithm based on a secure multi-party computation protocol to obtain the associations between data, including the following implementation process: participant u sets a candidate itemset X and traverses its local transaction database to construct a binary indicator vector for X, whose t-th element is 1 if and only if transaction t in the local transaction database contains itemset X; each participant sums the elements of its binary indicator vector to obtain the local support and calculates the corresponding confidence; using the GMW protocol, frequent itemsets are filtered based on local support and rule strength is filtered based on confidence; when both filtering conditions are met, the rule is determined to be a strong association rule, and the participants collaborate to decrypt or reconstruct it to obtain the plaintext form of the strong association rule.
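
The local quantities in claim 7 can be illustrated in plaintext as follows. This sketch covers only the per-participant computation of the indicator vector, support, and confidence; the GMW-based threshold comparison is replaced by an ordinary comparison, which is an assumption made purely to keep the example runnable. The function names, item names, and thresholds are hypothetical.

```python
def indicator_vector(transactions, itemset):
    """Element t is 1 if and only if transaction t contains every item of itemset."""
    items = set(itemset)
    return [1 if items <= set(t) else 0 for t in transactions]

def local_support(transactions, itemset):
    """Local support is the sum of the indicator vector's elements."""
    return sum(indicator_vector(transactions, itemset))

def local_confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent within the local database."""
    sup_a = local_support(transactions, antecedent)
    if sup_a == 0:
        return 0.0
    sup_both = local_support(transactions, set(antecedent) | set(consequent))
    return sup_both / sup_a

def is_strong_rule(support, confidence, min_support, min_confidence):
    """Plaintext stand-in for the two secure threshold checks."""
    return support >= min_support and confidence >= min_confidence

db = [
    {"alarm", "drop"},
    {"alarm"},
    {"alarm", "drop", "latency"},
    {"latency"},
]
vec = indicator_vector(db, {"alarm", "drop"})
sup = local_support(db, {"alarm", "drop"})
conf = local_confidence(db, {"alarm"}, {"drop"})
strong = is_strong_rule(sup, conf, min_support=2, min_confidence=0.6)
```

In the claimed scheme the two comparisons would be evaluated jointly under the GMW protocol, so that no participant learns another participant's support or confidence, only whether the rule passes.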

8. The cloud platform for evaluating communication engineering performance based on heterogeneous data as described in claim 6, characterized in that: in step 2, the final perturbed gradient is calculated as: $\tilde{g}_u = \bar{g}_u + \mathcal{N}(0, \sigma^2 C^2 I)$; in the formula, $\tilde{g}_u$ is the final perturbed gradient calculated by participant u; $\bar{g}_u$ is the clipped gradient obtained from the original local gradient calculated by participant u; $\mathcal{N}(0, \sigma^2 C^2 I)$ is Gaussian noise with zero mean and covariance matrix $\sigma^2 C^2 I$; $I$ is the identity matrix; $\sigma$ is the noise scale calculated based on the local privacy budget and the training epochs; and $C$ is the set clipping norm.
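
As a worked illustration of the clip-then-perturb step in claim 8, write $g_u$ for participant u's original local gradient, $C$ for the clipping norm, and $\sigma$ for the noise scale. The claim does not give the clipping rule, so the standard L2 norm clipping below is an assumption, as is the covariance $\sigma^2 C^2 I$ for the noise.

```latex
% Assumed L2 norm clipping, followed by Gaussian perturbation
\bar{g}_u = g_u \cdot \min\!\left(1, \frac{C}{\lVert g_u \rVert_2}\right),
\qquad
\tilde{g}_u = \bar{g}_u + \mathcal{N}\!\left(0, \sigma^2 C^2 I\right)

% Example: g_u = (3, 4), C = 1, \sigma = 0.5
\lVert g_u \rVert_2 = 5, \qquad
\bar{g}_u = (3, 4) \cdot \tfrac{1}{5} = (0.6,\ 0.8), \qquad
\lVert \bar{g}_u \rVert_2 = 1 = C

% Each coordinate of the noise has standard deviation \sigma C = 0.5
\tilde{g}_u = (0.6 + n_1,\ 0.8 + n_2), \qquad n_1, n_2 \sim \mathcal{N}(0,\ 0.25)
```

Clipping bounds each participant's contribution by $C$, which is what makes a noise level proportional to $C$ meaningful for the privacy guarantee.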

9. The cloud platform for evaluating communication engineering performance based on heterogeneous data as described in claim 6, characterized in that: in step 3, a multiplicative homomorphic encryption algorithm is used for secure aggregation, and the secure aggregation formula is: $E(G) = \prod_{u=1}^{U} E(\tilde{g}_u)$; in the formula, $G$ is, in plaintext, the sum of the final perturbed gradients of all participants; $U$ is the total number of participants in the collaborative training; $E(\cdot)$ is the multiplicative homomorphic encryption function; $E(\tilde{g}_u)$ is the encrypted result of the final perturbed gradient of participant u; and $\prod$ is the multiplication (product) function.
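
The aggregation in claim 9 multiplies ciphertexts and obtains a plaintext sum. The claim does not name a concrete scheme; one well-known scheme with this behavior is Paillier, and the sketch below assumes it as a stand-in. The toy primes, the fixed-point `SCALE`, and all function names are hypothetical, and the key size is far too small for real use.

```python
import math
import random

SCALE = 1000  # fixed-point factor for encoding real-valued gradients as integers

def keygen(p=1009, q=1013):
    """Toy Paillier key pair (illustration only; real keys need large primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m, rng):
    n, g = pub
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m % n, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    m = ((pow(c, lam, n * n) - 1) // n * mu) % n
    return m - n if m > n // 2 else m  # map residues back to signed integers

def aggregate(pub, ciphertexts):
    """Multiply ciphertexts modulo n^2; the product encrypts the plaintext sum."""
    n, _ = pub
    acc = 1
    for c in ciphertexts:
        acc = (acc * c) % (n * n)
    return acc

rng = random.Random(0)
pub, priv = keygen()
grads = [0.25, -0.10, 0.40]  # one perturbed gradient coordinate per participant
encoded = [round(x * SCALE) for x in grads]
cipher_sum = aggregate(pub, [encrypt(pub, m, rng) for m in encoded])
total = decrypt(pub, priv, cipher_sum) / SCALE
```

The aggregator only ever handles ciphertexts, so it learns the sum after decryption but not any individual participant's gradient; the encoded sum must stay below half the modulus for the signed decoding to be correct.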