A robust multi-view clustering method and system based on cross-view adaptive fusion and cluster center enhancement

CN122286342A · Pending · Publication Date: 2026-06-26 · GUANGXI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUANGXI UNIV
Filing Date
2026-03-26
Publication Date
2026-06-26

Abstract

This invention discloses a robust multi-view clustering method and system based on cross-view adaptive fusion and cluster center enhancement, belonging to the field of artificial intelligence and big data processing technology. The invention introduces a dual weighting mechanism at the view level and sample level through a cross-view adaptive fusion module, weighting and summing the projection features of each view to generate a fused feature representation. A dual-driven cluster center enhancement framework calculates the cluster centers of single views and the fused view cluster centers, and constructs alignment loss and separation loss to align the cluster structure and enhance inter-cluster separability. A second-order proximity graph embedding module calculates a second-order similarity matrix based on target features to correct false negative samples, generating a corrected target matrix. A unified objective function is constructed by combining cluster center enhancement loss and contrastive learning loss to guide network training, outputting the final fused features for clustering. This invention achieves excellent clustering performance in both complete and incomplete multi-view scenarios.

Description

Technical Field

[0001] This invention belongs to the field of artificial intelligence and big data processing technology, and in particular relates to a robust multi-view clustering method and system based on cross-view adaptive fusion and cluster center enhancement.

Background Technology

[0002] With the rapid development of data acquisition technology, multi-view data is becoming increasingly common in various fields. Multi-view clustering aims to reveal the underlying structure of data by integrating complementary information from multiple perspectives. Compared with single-view clustering, it can provide a more comprehensive understanding of data by leveraging the complementarity between different views. In recent years, deep learning methods have significantly improved clustering performance through end-to-end representation learning. Among them, cross-view contrastive learning has become the mainstream direction. Its core mechanism is to independently encode each view, construct positive and negative sample pairs across views, and update the model by minimizing the distance between positive sample pairs and maximizing the distance between negative sample pairs.

[0003] While existing contrastive multi-view clustering methods have achieved some success, they still have significant limitations in information fusion, structural alignment, and sample identification. First, existing multi-view information fusion strategies often lack effectiveness. Most current fusion methods employ simple averaging or splicing operations, ignoring the differences in importance between different views and samples, easily leading to over- or under-fusion of heterogeneous information. Simple averaging or splicing may result in the loss of view-specific features or the inability to effectively integrate complementary information, leaving views isolated and thus reducing clustering performance. Second, most existing research focuses on optimizing false negative identification strategies, neglecting the alignment of cluster centers between single views and fused views. In multi-view clustering, the cluster structures of different views may be inconsistent. This misalignment between single-view cluster centers and fused view cluster centers creates ambiguous assignments, weakens the discriminative patterns of features, and results in a loose cluster structure. Furthermore, existing methods do not make sufficient use of higher-order neighborhood similarity when dealing with the false negative problem in contrastive learning. Traditional methods or methods based on first-order neighborhood and random walks often fail to capture samples that are far apart but structurally related in the feature space. Because they cannot effectively utilize the structure of higher-order neighborhoods, these methods are prone to misjudgment when identifying structurally similar samples, which in turn reduces the quality of representation learning.

[0004] In summary, existing multi-view clustering methods fail to effectively address the problem of adaptive weighted fusion of cross-view features, lack an explicit alignment mechanism between single-view and fused-view cluster centers, and are insufficient in utilizing higher-order neighborhood information to correct false negatives. There is an urgent need for a robust multi-view clustering method that can dynamically balance view and sample weights, achieve dual cluster center alignment, and effectively utilize second-order proximity.

Summary of the Invention

[0005] To address the aforementioned technical problems, this invention proposes a robust multi-view clustering method and system based on cross-view adaptive fusion and cluster center enhancement, thereby resolving the issues present in the prior art.

[0006] In a first aspect, to achieve the above objectives, this invention provides a robust multi-view clustering method based on cross-view adaptive fusion and cluster center enhancement, comprising the following steps: obtaining a dataset containing the original data of multiple samples across multiple views; extracting initial features for each view using an encoder, and separating from the initial features the projection features used for contrastive learning and the target features used for constructing neighborhood structures; calculating the sample-level weight of each sample in each view based on the projection features and the predefined view-level weights, and performing a weighted summation of the projection features using the sample-level weights to obtain the fusion features of each sample; calculating the higher-order neighborhood similarity between samples based on the target features, and using it to generate a target matrix for correcting false negative samples in the contrastive learning process; calculating the cluster centers of the fused view and of each individual view based on the fusion features and the projection features, respectively, and constructing constraints based on the cluster centers to align the cluster structure and enhance inter-cluster separation; combining the constraints with the contrastive learning loss function, and using the target matrix to guide the network parameter updates until the model converges; and outputting the final fused features for clustering.

[0007] Optionally, extracting the initial features of each view using the encoder includes: processing the raw data of each view with a shared-weight encoder to obtain the initial features of each view; separating a first branch from the initial features as contrastive features and a second branch as target features; and inputting the contrastive features to a projection head, which maps the contrastive features of different views into a unified common representation space to obtain the projection features.

[0008] Optionally, calculating the sample-level weights for each sample in each view includes: performing a Softmax operation on the learnable parameters of each view to obtain the view-level weights; taking the dot product between the projection features of each sample and the view-level weights of the corresponding view to obtain the raw attention score; performing a Softmax-like normalization on the raw attention scores to obtain the sample-level weights; and multiplying the projection features of each view by the corresponding sample-level weights and summing them across views to obtain the fused features.

[0009] Optionally, generating the target matrix for correcting false negative samples includes: calculating the similarity between samples based on the target features and constructing a Gaussian affinity matrix; sparsifying the Gaussian affinity matrix by retaining the maximum-similarity connections of each sample, and normalizing the sparsified matrix to obtain a transition probability matrix; multiplying the transition probability matrix by its transpose to obtain a second-order similarity matrix; and taking the weighted sum of the second-order similarity matrix and the identity matrix as the target matrix.

[0010] Optionally, constructing the constraints includes: calculating, using high-confidence pseudo-labels, the single-view cluster center of each cluster under each view and the fused cluster center of each cluster under the fused view; constructing a first constraint term that minimizes the cosine distance (that is, maximizes the cosine similarity) between the centers of the same cluster under different views; constructing a second constraint term that maximizes the separation between the centers of different clusters under the fused view; and using the sum of the first constraint term and the second constraint term as the constraint.

[0011] Optionally, combining the constraints with the contrastive learning loss function includes: in the first training phase, using the identity matrix as the target matrix and updating the network parameters with the contrastive learning loss function only; in the second training phase, adding the constraints to the total loss function, replacing the identity matrix with the corrected target matrix, and updating the network parameters jointly with the contrastive learning loss function and the constraints.

[0012] Optionally, the process of outputting the final fusion features for clustering includes: after the model training converges, extracting the fusion features of all samples; and performing the K-means clustering algorithm on the fusion features to obtain the cluster assignment result for each sample.

[0013] In a second aspect, the present invention also provides a robust multi-view clustering system based on cross-view adaptive fusion and cluster center enhancement, for implementing the above robust multi-view clustering method, the system comprising: a data acquisition module, which acquires a dataset containing the original data of multiple samples across multiple views; a feature extraction and separation module, which uses the encoder to extract the initial features of each view and separates from them the projection features for contrastive learning and the target features for constructing the neighborhood structure; a cross-view adaptive fusion module, which calculates the sample-level weight of each sample in each view based on the projection features and the predefined view-level weights, and uses the sample-level weights to perform a weighted summation of the projection features to obtain the fusion features of each sample; a neighborhood graph embedding module, which calculates the higher-order neighborhood similarity between samples based on the target features and uses it to generate a target matrix for correcting false negative samples in the contrastive learning process; a cluster center enhancement module, which calculates the cluster centers of the fused view and of each individual view based on the fusion features and the projection features, and constructs constraints based on the cluster centers to align the cluster structure and enhance inter-cluster separation; and a joint training and output module, which combines the constraints with the contrastive learning loss function, uses the target matrix to guide the network parameter updates until the model converges, and outputs the final fused features for clustering.

[0014] In a third aspect, the present invention also provides a computer terminal device, comprising: one or more processors; and a memory, coupled to the processors, for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the steps of the robust multi-view clustering method based on cross-view adaptive fusion and cluster center enhancement in the first aspect described above.

[0015] In a fourth aspect, the present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the robust multi-view clustering method based on cross-view adaptive fusion and cluster center enhancement in the first aspect described above.

[0016] Compared with the prior art, the present invention has the following advantages and technical effects: This invention provides a robust multi-view clustering method and system based on cross-view adaptive fusion and cluster center enhancement. By introducing a dual-weight mechanism at the view and sample levels through a cross-view adaptive fusion module, it dynamically balances view-specific reliability and sample-specific confidence, effectively solving the problems of over-fusion and under-fusion in heterogeneous information integration, preserving view-specific features and integrating complementary information. A dual-driven cluster center enhancement framework achieves dual alignment between single-view cluster centers and fused-view cluster centers, maximizing the separability between clusters while maintaining cluster consistency, thus obtaining a more discriminative clustering structure. A second-order proximity graph embedding method utilizes high-order neighborhood structural similarity instead of direct feature distance, effectively identifying and correcting false negative samples, significantly improving the robustness of feature learning. This invention achieves excellent clustering performance in both complete and incomplete multi-view scenarios, demonstrating its robustness and effectiveness in processing multi-view data.

Attached Figure Description

[0017] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an undue limitation of the invention. In the drawings:
Figure 1 is a schematic diagram comparing the effects of different fusion strategies in multi-view learning according to an embodiment of the present invention;
Figure 2 is a flowchart of the robust multi-view clustering method based on cross-view adaptive fusion and cluster center enhancement according to an embodiment of the present invention;
Figure 3 is a schematic diagram of the workflow of the Cross-View Adaptive Fusion (CAF) module according to an embodiment of the present invention;
Figure 4 is a schematic diagram of the workflow of the Dual-Driven Cluster Center Enhancement (DCCE) module according to an embodiment of the present invention;
Figure 5 is a line graph showing the clustering performance of this invention on the Scene-15 dataset under different view missing rates;
Figure 6 is a convergence analysis graph of an embodiment of the present invention on the Cifar100 dataset;
Figure 7 is a convergence analysis graph of an embodiment of the present invention on the Scene-15 dataset;
Figure 8 is a convergence analysis graph of an embodiment of the present invention on the Reuters dataset;
Figure 9 is a convergence analysis graph of an embodiment of the present invention on the LandUse21 dataset;
Figure 10 is a convergence analysis graph of an embodiment of the present invention on the CUB dataset;
Figure 11 is a convergence analysis graph of an embodiment of the present invention on the Hdigit dataset;
Figure 12 is a schematic diagram of the parameter sensitivity analysis of the ACC metric on the CUB dataset according to an embodiment of the present invention;
Figure 13 is a schematic diagram of the parameter sensitivity analysis of the ACC metric on the Scene-15 dataset according to an embodiment of the present invention;
Figure 14 is a schematic diagram of the parameter sensitivity analysis of the NMI metric on the CUB dataset according to an embodiment of the present invention;
Figure 15 is a schematic diagram of the parameter sensitivity analysis of the ACC metric on the Scene-15 dataset according to an embodiment of the present invention;
Figure 16 is a schematic diagram of the parameter sensitivity analysis of the NMI metric on the CUB dataset according to an embodiment of the present invention;
Figure 17 is a schematic diagram of the parameter sensitivity analysis of the ACC metric on the Scene-15 dataset according to an embodiment of the present invention;
Figure 18 is a schematic diagram of the parameter sensitivity analysis of the F-Score metric on the CUB dataset according to an embodiment of the present invention;
Figure 19 is a schematic diagram of the parameter sensitivity analysis of the F-Score metric on the Scene-15 dataset according to an embodiment of the present invention;
Figure 20 is a t-SNE visualization of the original features on the Reuters dataset according to an embodiment of the present invention;
Figure 21 is a t-SNE visualization of the features after model training on the Reuters dataset according to an embodiment of the present invention;
Figure 22 is a t-SNE visualization of the original features on the Hdigit dataset according to an embodiment of the present invention;
Figure 23 is a t-SNE visualization of the features after model training on the Hdigit dataset according to an embodiment of the present invention.

Detailed Implementation

[0018] It should be noted that, unless otherwise specified, the embodiments and features described in the present invention can be combined with each other. The present invention will now be described in detail with reference to the accompanying drawings and embodiments.

[0019] It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order than that shown here.

[0020] This embodiment provides a robust multi-view clustering method based on cross-view adaptive fusion and cluster center enhancement, including: obtaining a dataset containing the original data of multiple samples across multiple views; extracting initial features for each view using an encoder, and separating from the initial features the projection features used for contrastive learning and the target features used for constructing neighborhood structures; calculating the sample-level weight of each sample in each view based on the projection features and the predefined view-level weights, and performing a weighted summation of the projection features using the sample-level weights to obtain the fusion features of each sample; calculating the higher-order neighborhood similarity between samples based on the target features, and using it to generate a target matrix for correcting false negative samples in the contrastive learning process; calculating the cluster centers of the fused view and of each individual view based on the fusion features and the projection features, respectively, and constructing constraints based on the cluster centers to align the cluster structure and enhance inter-cluster separation; combining the constraints with the contrastive learning loss function, and using the target matrix to guide the network parameter updates until the model converges; and outputting the final fused features for clustering.

[0021] Furthermore, extracting the initial features of each view using the encoder includes: processing the raw data of each view with a shared-weight encoder to obtain the initial features of each view; separating a first branch from the initial features as contrastive features and a second branch as target features; and inputting the contrastive features to a projection head, which maps the contrastive features of different views into a unified common representation space to obtain the projection features.

[0022] Furthermore, calculating the sample-level weights for each sample in each view includes: performing a Softmax operation on the learnable parameters of each view to obtain the view-level weights; taking the dot product between the projection features of each sample and the view-level weights of the corresponding view to obtain the raw attention score; performing a Softmax-like normalization on the raw attention scores to obtain the sample-level weights; and multiplying the projection features of each view by the corresponding sample-level weights and summing them across views to obtain the fused features.
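
The dual weighting described in this paragraph can be sketched in NumPy as follows. This is only an illustrative sketch, not the implementation of the invention: the function names, the shape of the learnable parameters (one d-dimensional vector per view, so that the dot product is defined), the axis of the first Softmax, and the choice of a plain Softmax across views for the "Softmax-like" normalization are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def cross_view_adaptive_fusion(H, theta):
    """Fuse per-view projection features with view- and sample-level weights.

    H:     array (V, N, d), projection features of V views for N samples.
    theta: array (V, d), learnable parameters of each view
           (assumed shape: one d-dimensional vector per view).
    Returns the fused features (N, d) and the sample-level weights (V, N).
    """
    # View-level weights: Softmax over each view's parameter vector (assumed axis).
    w = softmax(theta, axis=1)                    # (V, d)
    # Raw attention score: dot product of each sample with its view's weights.
    scores = np.einsum("vnd,vd->vn", H, w)        # (V, N)
    # Sample-level weights: normalize the scores across views (assumed Softmax).
    beta = softmax(scores, axis=0)                # (V, N)
    # Weighted sum of the projection features across views.
    fused = np.einsum("vn,vnd->nd", beta, H)      # (N, d)
    return fused, beta

rng = np.random.default_rng(0)
H = rng.normal(size=(2, 5, 4))     # 2 views, 5 samples, 4 dimensions
theta = np.zeros((2, 4))           # untrained parameters
fused, beta = cross_view_adaptive_fusion(H, theta)
print(fused.shape, beta.sum(axis=0))
```

Because the sample-level weights of every sample sum to one across views, the fused feature is a convex combination of that sample's per-view features; two identical views therefore reproduce the shared feature unchanged.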

[0023] Furthermore, generating the target matrix for correcting false negative samples includes: calculating the similarity between samples based on the target features and constructing a Gaussian affinity matrix; sparsifying the Gaussian affinity matrix by retaining the maximum-similarity connections of each sample, and normalizing the sparsified matrix to obtain a transition probability matrix; multiplying the transition probability matrix by its transpose to obtain a second-order similarity matrix; and taking the weighted sum of the second-order similarity matrix and the identity matrix as the target matrix.

[0024] Furthermore, constructing the constraints includes: calculating, using high-confidence pseudo-labels, the single-view cluster center of each cluster under each view and the fused cluster center of each cluster under the fused view; constructing a first constraint term that minimizes the cosine distance (that is, maximizes the cosine similarity) between the centers of the same cluster under different views; constructing a second constraint term that maximizes the separation between the centers of different clusters under the fused view; and using the sum of the first constraint term and the second constraint term as the constraint.

[0025] Furthermore, combining the constraints with the contrastive learning loss function includes: in the first training phase, using the identity matrix as the target matrix and updating the network parameters with the contrastive learning loss function only; in the second training phase, adding the constraints to the total loss function, replacing the identity matrix with the corrected target matrix, and updating the network parameters jointly with the contrastive learning loss function and the constraints.
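
The switch between the two training phases can be illustrated with a small scheduling helper. This is a sketch under stated assumptions: `stage_config`, `total_loss`, and the single balancing factor `weight` are hypothetical names introduced here, and the loss values in the usage loop are placeholder numbers rather than results.

```python
import numpy as np

def stage_config(epoch, warmup_epochs, corrected_target):
    """Return (target_matrix, use_constraints) for a 0-based epoch index."""
    n = corrected_target.shape[0]
    if epoch < warmup_epochs:
        # Phase 1: identity target matrix, contrastive loss only.
        return np.eye(n), False
    # Phase 2: corrected target matrix plus cluster center constraints.
    return corrected_target, True

def total_loss(contrastive_loss, constraint_loss, use_constraints, weight=1.0):
    """Combine the loss terms; `weight` is a hypothetical balancing factor."""
    if use_constraints:
        return contrastive_loss + weight * constraint_loss
    return contrastive_loss

# Placeholder corrected target matrix for two samples.
corrected = np.array([[1.0, 0.3], [0.3, 1.0]])
for epoch in (0, 99, 100):
    target, use_constraints = stage_config(epoch, 100, corrected)
    print(epoch, use_constraints, total_loss(2.0, 0.5, use_constraints))
```

Keeping the phase decision in one function makes the boundary between the two phases explicit and easy to test independently of the network.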

[0026] Furthermore, the process of outputting the final fusion features for clustering includes: after the model training converges, extracting the fusion features of all samples; and performing the K-means clustering algorithm on the fusion features to obtain the cluster assignment result for each sample.

[0027] Specifically, the implementation process of this embodiment is as follows. As shown in Figures 1-4, this invention provides a robust multi-view clustering method based on cross-view adaptive fusion and cluster center enhancement, intended for complete or incomplete multi-view datasets. In this application scenario, the input is a multi-view dataset $\{X^{(v)}\}_{v=1}^{V}$ containing $N$ samples, where $V$ denotes the total number of views; the output is the predicted assignment of the $N$ samples to $K$ clusters.

[0028] The method specifically includes the following steps: Step 1. Feature extraction and projection: First, a feature extraction module based on a deep neural network is constructed. For the data $X^{(v)}$ of the $v$-th view, a shared-weight encoder extracts two types of features: contrastive features $Z^{(v)}$ and target features $R^{(v)}$.

[0029] To address the misalignment of features across different views, a multilayer perceptron (MLP) is introduced as a projector that maps the contrastive features $Z^{(v)}$ onto a common representation space, yielding the projected features $H^{(v)}$: $H^{(v)} = \mathrm{MLP}(Z^{(v)})$, where $H^{(v)} \in \mathbb{R}^{N \times d}$ and $d$ denotes the dimension of the projected features.

[0030] Step 2. Cross-view adaptive fusion (CAF): To address the over- or under-fusion caused by simple averaging or concatenation in existing methods, this step introduces a dual weighting mechanism (view-level and sample-level) for feature fusion.

[0031] (1) Calculate view-level weights: a set of learnable parameters $\theta^{(v)}$ is defined for each view, and the global importance weight $w^{(v)}$ of the $v$-th view is computed with the Softmax function: $w^{(v)} = \mathrm{Softmax}(\theta^{(v)})$; (2) Calculate sample-level weights: to measure the reliability of each sample in a specific view, the dot product of the projected feature $h_i^{(v)}$ of sample $i$ and the view-level weight $w^{(v)}$ is taken as the raw attention score $s_i^{(v)} = \langle h_i^{(v)}, w^{(v)} \rangle$; the raw scores are then normalized across views in a Softmax-like manner to give the sample-level fusion weight $\beta_i^{(v)}$.

[0032] (3) Generate fusion features: the above weights are used to perform a weighted summation of the projected features of each view, giving the fused feature representation $f_i$ of the $i$-th sample: $f_i = \sum_{v=1}^{V} \beta_i^{(v)} h_i^{(v)}$, where $\beta_i^{(v)}$ is the sample-level weight and $h_i^{(v)}$ the projected feature of sample $i$ in view $v$.
Step 3. Second-order proximity graph embedding (SPGM): To address the false negative problem, this step utilizes the higher-order neighborhood structure to correct the target matrix.

[0033] (1) Construct the Gaussian affinity matrix: based on the target features, the similarity between samples $i$ and $j$ is calculated to construct the Gaussian affinity matrix $A$; (2) Generate a sparse transition matrix: for each sample, the top 100 largest similarity values are retained to generate a sparse matrix $\hat{A}$, which is normalized to obtain the transition probability matrix $P$.

[0034] (3) Calculate the second-order similarity: the second-order similarity matrix $S$ is calculated by comparing the neighborhood distributions of the samples, i.e., $S = P P^{\top}$, where $P$ is the transition probability matrix. This allows samples that are far apart in the feature space but share similar neighborhood structures to be identified as similar.

[0035] (4) Correct the target matrix: the second-order similarity matrix $S$ is combined with the identity matrix $I$ by a weighted sum to obtain the final corrected target matrix $T$, where the weights are balancing parameters.
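
The four sub-steps of Step 3 can be sketched in NumPy as follows. This is a minimal sketch, not the implementation of the invention: the function name, the bandwidth `sigma`, the number of retained neighbors `k`, the balancing parameter `lam`, and the specific weighted-sum form `I + lam * S` are assumptions chosen for illustration.

```python
import numpy as np

def second_order_target(target_feats, k=2, sigma=1.0, lam=0.5):
    """Build a corrected target matrix from target features of shape (N, d)."""
    X = np.asarray(target_feats, dtype=float)
    n = X.shape[0]
    # (1) Gaussian affinity from pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    # (2) Sparsify: keep the k largest affinities in every row ...
    idx = np.argsort(-A, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    A_sparse = np.zeros_like(A)
    A_sparse[rows, idx] = A[rows, idx]
    # ... and row-normalize into a transition probability matrix.
    P = A_sparse / A_sparse.sum(axis=1, keepdims=True)
    # (3) Second-order similarity: overlap of neighborhood distributions.
    S = P @ P.T
    # (4) Weighted sum with the identity matrix (assumed form).
    return np.eye(n) + lam * S

# Two tight groups of points that are far apart from each other.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
T = second_order_target(feats, k=2)
print(np.round(T, 3))
```

In this toy example, samples in the same group share neighbors and receive a positive off-diagonal target, while samples from different groups keep a zero target and so remain negatives in the contrastive objective.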

[0036] Step 4. Dual-Driven Cluster Center Enhancement (DCCE): To align the clustering structures of the single views and the fused view, this step designs a cluster center enhancement mechanism.

[0037] (1) Calculate cluster centers: using high-confidence pseudo-labels (for example, obtained through K-means), calculate the center $c_k^{(v)}$ of the $k$-th cluster in each view $v$ and the center $c_k$ of the $k$-th cluster in the fused view.

[0038] (2) Cross-view alignment loss ($\mathcal{L}_{align}$): minimize the distance between the centers of the same cluster $k$ in different views (such as views $u$ and $v$), ensuring a consistent identity for the cluster; (3) In-view separation loss ($\mathcal{L}_{sep}$): maximize the distance between the centers of different clusters (clusters $k$ and $l$) in the fused view, enhancing discriminative power; (4) Total DCCE loss: $\mathcal{L}_{DCCE} = \mathcal{L}_{align} + \mathcal{L}_{sep}$.
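
One possible reading of the DCCE loss is sketched below in NumPy. It is an illustration under assumptions, not the formula of the invention: the alignment term is taken as the mean cosine distance between same-cluster centers over all view pairs, the separation term as the mean cosine similarity between different fused-view centers (so that minimizing it pushes centers apart), every cluster is assumed to contain at least one sample, and at least two clusters are assumed. All function names are hypothetical.

```python
import numpy as np

def cluster_centers(X, labels, n_clusters):
    """Mean feature vector of every cluster; X is (N, d), labels is (N,)."""
    return np.stack([X[labels == k].mean(axis=0) for k in range(n_clusters)])

def cosine_matrix(A, B):
    """Pairwise cosine similarity between the rows of A and the rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def dcce_loss(view_feats, fused_feats, labels, n_clusters):
    """Return (total, alignment, separation) for pseudo-labeled features."""
    centers = [cluster_centers(H, labels, n_clusters) for H in view_feats]
    fused_centers = cluster_centers(fused_feats, labels, n_clusters)
    # Alignment: cosine distance between same-cluster centers of view pairs.
    align, pairs = 0.0, 0
    for u in range(len(centers)):
        for v in range(u + 1, len(centers)):
            same = np.diag(cosine_matrix(centers[u], centers[v]))
            align += float(np.mean(1.0 - same))
            pairs += 1
    align /= max(pairs, 1)
    # Separation: similarity between different cluster centers, fused view.
    sim = cosine_matrix(fused_centers, fused_centers)
    sep = float(np.mean(sim[~np.eye(n_clusters, dtype=bool)]))
    return align + sep, align, sep

labels = np.array([0, 0, 1, 1])
view_a = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
total, align, sep = dcce_loss([view_a, view_a], view_a, labels, 2)
print(total, align, sep)
```

With two identical views and orthogonal cluster centers both terms vanish; reversing the sample order of one view swaps its cluster centers and raises the alignment term.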

[0039] Step 5. Two-stage training and optimization: This invention employs a two-stage training framework: (1) Warm-up stage: during the epochs of the warm-up stage, only the contrastive loss $\mathcal{L}_{con}$ is used to train the network, and the target matrix $T$ is set to the identity matrix. $\mathcal{L}_{con}$ includes the cross-view contrastive loss and the intra-view contrastive loss.

[0040] (2) Rectify stage: once the epoch count exceeds the warm-up length, the target matrix $T$ is updated using SPGM and the DCCE module is introduced. A unified objective loss function combining the contrastive loss and the DCCE loss is constructed for joint optimization, with two balancing hyperparameters weighting its terms.

[0041] (3) Output results: after training, the final clustering result is obtained by performing K-means clustering on the fused features.
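
The final clustering step can be illustrated with a compact Lloyd-style K-means in NumPy. This is a sketch under assumptions: the random initialization, the iteration cap, and the function name are choices made here for illustration, and a library implementation would normally be used in practice.

```python
import numpy as np

def kmeans(X, n_clusters, n_iter=100, seed=0):
    """Cluster the rows of X; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centers with distinct randomly chosen samples.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every sample to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy fused features: two well-separated groups.
fused = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = kmeans(fused, n_clusters=2)
print(labels)
```

The cluster indices themselves are arbitrary; only the grouping of samples is meaningful, which is why clustering metrics match predicted clusters to ground-truth classes before scoring.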

[0042] Experimental verification: The robust multi-view clustering method based on cross-view adaptive fusion and cluster center enhancement proposed in this invention (hereinafter referred to as RELIABLE) is evaluated with respect to the following research questions:
RQ1: Does RELIABLE outperform other state-of-the-art (SOTA) baseline models in clustering performance in the complete multi-view scenario?
RQ2: How robust is RELIABLE in incomplete multi-view scenarios (where views are missing)?
RQ3: Do the key components proposed in this invention (Cross-view Adaptive Fusion, CAF; Dual-Driven Cluster Center Enhancement, DCCE; and Second-Order Proximity Graph Embedding, SPGM) each make a substantial contribution to the performance improvement?
RQ4: How sensitive is the model to the key hyperparameters (such as the balancing parameters)?
RQ5: How well does the model converge, and how well do the learned features cluster when visualized?
1. Dataset and Experiment Configuration
Detailed explanation of benchmark datasets: To fully verify the effectiveness and robustness of the proposed RELIABLE model in handling multi-view clustering tasks, this embodiment conducted detailed experiments on six widely used public multi-view benchmark datasets. These datasets cover different data types (images, text, handwritten digits, satellite remote sensing, etc.), different sample sizes (from 600 to 50,000), different numbers of classes (from 6 to 100), and different view feature dimensions, as shown in Table 1.

[0043] Table 1

[0044] The datasets are described below. Cifar100: This is a large-scale image dataset containing 50,000 images of size 32x32 distributed across 100 fine-grained categories. To fully capture the complementary information of the visual data, this embodiment extracts feature representations for three different views in the experiments. This dataset is primarily used to validate the model's clustering ability in large-scale, multi-class complex scenes.

[0045] Scene15: This dataset contains 4,485 images belonging to 15 different scene categories. Following the feature extraction scheme proposed by Yang et al., this embodiment uses Pyramid of Histograms of Oriented Gradients (PHOG) features and GIST descriptors as two complementary views for the experiments.

[0046] Hdigit: This is a classic handwritten digit recognition dataset, consisting of two sources: the MNIST handwritten digit set and the USPS handwritten digit set. The dataset contains a total of 10,000 samples, and these two sources constitute two distinct views.

[0047] Reuters: This is a multilingual news text dataset containing 18,758 samples across six news topic categories. This embodiment uses a standard autoencoder to map the English and French news texts into a 10-dimensional latent space each, constructing two text views.

[0048] LandUse21: This is a dataset containing 2,100 satellite remote sensing images, covering 21 different land cover categories (such as agriculture, commerce, ports, etc.). This embodiment uses PHOG features and Local Binary Patterns (LBP) as the two feature views of this dataset.

[0049] CUB: This is a fine-grained bird image dataset. This experiment used a subset containing 600 samples from 10 different bird categories. The dataset uses deep visual features extracted by GoogLeNet as the first view and text features generated by doc2vec as the second view, forming a typical image-text multimodal scenario.

[0050] Experimental environment and implementation details: All experiments in this invention are conducted on a unified hardware and software platform to ensure the fairness and reproducibility of the results.

[0051] Hardware configuration: The experiment was conducted on a high-performance workstation equipped with an NVIDIA GeForce RTX 4090 GPU, with Ubuntu 22.04 as the operating system.

[0052] Software Framework: The model is implemented based on the PyTorch 2.4.0 deep learning framework. Network Training Strategy: This embodiment adopts a two-stage training framework. First, a pre-training stage is performed, training all encoders and decoders for 100 epochs; then, a rectification stage is entered, where the network is trained again for 100 epochs on each dataset.
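The two-stage schedule can be sketched as a small helper that reports, for a given epoch, which target matrix and which loss terms are active. This is a minimal sketch under stated assumptions: the function name and the returned keys are hypothetical, and the stage contents follow the wording of claim 6 (identity target and contrastive loss only in the first stage; corrected target matrix plus cluster center constraints in the second stage).

```python
def training_stage(epoch, pretrain_epochs=100, rectify_epochs=100):
    """Return the (hypothetical) configuration used at a 0-based epoch index."""
    total = pretrain_epochs + rectify_epochs
    if not 0 <= epoch < total:
        raise ValueError("epoch must be in [0, %d)" % total)
    if epoch < pretrain_epochs:
        # Stage 1: identity matrix as the contrastive target, no center constraints.
        return {"stage": "pretrain", "target": "identity", "use_center_loss": False}
    # Stage 2: corrected target matrix and cluster center constraints are enabled.
    return {"stage": "rectify", "target": "second_order", "use_center_loss": True}


if __name__ == "__main__":
    print(training_stage(0))
    print(training_stage(150))
```

With the default arguments, epochs 0 to 99 fall in the pre-training stage and epochs 100 to 199 in the rectification stage.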

[0053] Optimizer and Hyperparameters: The Adam optimizer is used for parameter updates, with a default learning rate of 0.0002 and a batch size of 1024. ReLU is used as the activation function. The total loss function uses two balancing hyperparameters, whose optimal values are determined by a grid search over a predefined set of candidate values.

[0054] Missing View Simulation: To verify the performance of incomplete multi-view clustering, this embodiment simulates missing data by randomly removing some samples' views, with the missing rate set between 10% and 90%. For comparison, this embodiment inputs the concatenated embeddings of all views and uses the K-means algorithm to obtain the clustering results.
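One way to simulate missing views is sketched below. It rests on assumptions that the text does not state: the missing rate is interpreted as the fraction of samples that lose at least one view, every sample keeps at least one view, and the function name `simulate_missing_views` is hypothetical.

```python
import random


def simulate_missing_views(n_samples, n_views, missing_rate, seed=0):
    """Return an n_samples x n_views 0/1 mask; 1 means the view is observed."""
    if n_views < 2:
        raise ValueError("at least two views are required")
    if not 0.0 <= missing_rate < 1.0:
        raise ValueError("missing_rate must be in [0, 1)")
    rng = random.Random(seed)
    mask = [[1] * n_views for _ in range(n_samples)]
    n_incomplete = round(missing_rate * n_samples)
    for i in rng.sample(range(n_samples), n_incomplete):
        # Assumption: an incomplete sample keeps a random non-empty proper
        # subset of its views, so no sample is removed entirely.
        n_keep = rng.randint(1, n_views - 1)
        kept = set(rng.sample(range(n_views), n_keep))
        mask[i] = [1 if v in kept else 0 for v in range(n_views)]
    return mask


if __name__ == "__main__":
    mask = simulate_missing_views(n_samples=10, n_views=2, missing_rate=0.5)
    for row in mask:
        print(row)
```

The mask can then be used to drop the corresponding view data before training, leaving the complete samples untouched.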

[0055] Evaluation Metrics and Benchmarks: This embodiment uses three widely accepted clustering performance metrics for evaluation: Accuracy (ACC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI). Higher values for these metrics indicate better clustering performance.
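For reference, the three metrics can be computed as in the following sketch. It is an illustration, not the evaluation code of this embodiment: ACC uses a brute-force search over label permutations (practical only for a small number of clusters, whereas the Hungarian algorithm is normally used), and NMI is assumed to be normalized by the arithmetic mean of the two entropies.

```python
from collections import Counter
from itertools import permutations
from math import comb, log


def clustering_accuracy(y_true, y_pred):
    """ACC under the best one-to-one mapping from clusters to classes."""
    classes, clusters = sorted(set(y_true)), sorted(set(y_pred))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    pair_counts = Counter(zip(y_pred, y_true))
    best = 0
    for perm in permutations(classes, len(clusters)):
        hits = sum(pair_counts[(c, t)] for c, t in zip(clusters, perm))
        best = max(best, hits)
    return best / len(y_true)


def nmi(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(y_true)
    joint, pt, pp = Counter(zip(y_true, y_pred)), Counter(y_true), Counter(y_pred)
    mi = sum(c / n * log(c * n / (pt[t] * pp[p])) for (t, p), c in joint.items())
    h_t = -sum(c / n * log(c / n) for c in pt.values())
    h_p = -sum(c / n * log(c / n) for c in pp.values())
    denom = (h_t + h_p) / 2
    return mi / denom if denom > 0 else 1.0


def ari(y_true, y_pred):
    """Adjusted Rand Index from pair counts of the contingency table."""
    n = len(y_true)
    sum_joint = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    sum_t = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_p = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_t * sum_p / comb(n, 2)
    max_index = (sum_t + sum_p) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    pred = [1, 1, 2, 2, 0, 0]  # same partition, different label names
    print(clustering_accuracy(truth, pred), nmi(truth, pred), ari(truth, pred))
```

All three metrics are invariant to how the cluster labels are named, which is why a relabeled but otherwise identical partition scores as a perfect result.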

[0056] To demonstrate the superiority of this invention, this embodiment compares RELIABLE with the following 12 state-of-the-art multi-view clustering methods: DAIMC, BMVC, COMPLETER, DCP, DSIMVC, EEIMVC, SURE, GCFAgg, ProImp, DIVIDE, IMC-MCL, and L2DC. For a fair comparison, the parameters of all compared methods were configured according to the settings recommended in their original papers.

[0057] 2. Complete Multi-View Clustering Performance Analysis (RQ1): To answer RQ1, namely, to verify the effectiveness of the proposed RELIABLE model in a complete multi-view scenario, this embodiment comprehensively compares RELIABLE with 12 existing state-of-the-art benchmark methods on six datasets. The experimental results are shown in Tables 2 and 3, and a detailed analysis follows. Table 2 shows the experimental results of complete multi-view clustering on the Cifar100, Scene15, and Reuters datasets.

[0058] Table 2

[0059] (1) Overall performance advantages: On all six benchmark datasets, the RELIABLE method of this invention achieved the best or second-best results on the vast majority of metrics. This demonstrates that RELIABLE can effectively integrate multi-view information and mine potential cluster structures, whether on image data (such as Cifar100 and Scene15), text data (such as Reuters), or multimodal data (such as CUB).

[0060] Table 3 shows the experimental results of complete multi-view clustering on the LandUse21, CUB, and Hdigit datasets.

[0061] Table 3

[0062] (2) Specific comparison with shallow and deep methods: Compared to shallow representation learning methods such as BMVC, RELIABLE demonstrates an overwhelming advantage. For example, on the Cifar100 dataset, BMVC's ACC is only 8.95%, while RELIABLE reaches 99.16%; on the Scene15 dataset, BMVC's ACC is 40.50%, while RELIABLE improves to 52.84%. This indicates that feature extraction based on deep neural networks combined with the enhancement mechanism of this invention can capture nonlinear patterns that are far more complex than those of shallow models.

[0063] Comparison with deep contrastive learning methods: Many existing deep clustering methods (such as COMPLETER, DCP, EEIMVC, and GCFAgg) perform worse than expected on the more challenging Reuters dataset. For example, DCP's ACC on Reuters is only 18.87%. This is mainly because Reuters contains multilingual text, and the heterogeneity between different views is extremely high, making it difficult for existing methods to align features in a unified space. In contrast, RELIABLE achieved a significant breakthrough on this dataset, with an ACC of 63.41% and an NMI of 41.92%, representing improvements of 1.06% and 4.10%, respectively, over the second-best methods.

[0064] (3) Attribution analysis of performance improvement: Experimental results show that simply having multi-view data is not enough to guarantee the clustering effect. The key is to make full use of the separability and similarity between views.

[0065] RELIABLE outperforms the second-best method in terms of NMI by 1.43% on the Scene15 dataset; on the Hdigit dataset, RELIABLE achieves near-perfect clustering (ACC 99.56%). This performance improvement is mainly attributed to the Dual-Driven Cluster Center Enhancement (DCCE) module introduced in this invention, which ensures the separability of cluster centers by minimizing inter-cluster overlap; at the same time, the Second-Order Proximity Graph Embedding (SPGM) module effectively captures high-order neighborhood similarity and corrects false negative samples, thus generating high-quality clustering results even on complex datasets.

[0066] 3. Incomplete multi-view clustering and robustness analysis (RQ2): Table 4 shows the comparative experimental results of incomplete multi-view clustering on the Cifar100, Scene15, and Reuters datasets.

[0067] Table 4

[0068] To answer RQ2, which assesses RELIABLE’s robustness in the face of missing views, this embodiment designed two sets of rigorous experiments: one is a horizontal comparison at a fixed missing rate (50%), and the other is a vertical trend analysis at a full range of missing rates (0% to 90%).

[0069] (1) The performance under a 50% missing rate is shown in Tables 4 and 5. Under the condition of randomly removing 50% of the view data, RELIABLE still maintains excellent performance, which is significantly better than other benchmark methods for handling incomplete views (such as COMPLETER, DSIMVC, ProImp, etc.).

[0070] Table 5 shows the comparative experimental results of incomplete multi-view clustering on the LandUse21, CUB, and Hdigit datasets.

[0071] Table 5

[0072] On the Reuters dataset, RELIABLE achieves an NMI of 42.62%, which is 5.93% higher than the second-best method (ProImp), and an ACC of 60.83%, which is 6.13% higher than the second-best method. On the LandUse21 dataset, RELIABLE achieves improvements of 2.26% and 2.01% over the second-best model in NMI and ACC, respectively. On the CUB dataset, RELIABLE achieves performance gains of 5.83% and 3.57% in NMI and ACC, respectively.

[0073] These data strongly demonstrate that even under substantial data loss, the Cross-View Adaptive Fusion (CAF) module of this invention can still accurately identify and utilize the effective information in the remaining views through a dynamic weighting mechanism to compensate for the impact of missing views.

[0074] (2) Robustness trend across the full range of missing rates (0%-90%): To further explore the limits of the model's capability, this embodiment tested the performance changes on the Scene15 dataset at different missing rates from 0% to 90% (as shown in Figure 5).

[0075] Gentle performance degradation: As the missing data rate increased from 0% to 90%, RELIABLE's performance declined very gradually. Specifically, ACC decreased by only 7.64%, NMI by only 8.37%, and ARI by only 7.40%. This indicates that the model did not collapse due to the drastic reduction in data.

[0076] Comparison with benchmark methods: In contrast, the performance curves of benchmark methods (such as COMPLETER, DCP, SURE, ProImp) show a sharp decline as the missing data rate increases. For example, when the missing data rate reaches 60%-70%, the metrics of many benchmark methods have dropped significantly, and they are unable to maintain an effective clustering structure.

[0077] Contribution of higher-order neighborhoods: Notably, RELIABLE outperforms the DIVIDE method, which uses random walk graph embeddings. This further confirms that the proposed Second-Order Proximity Graph Embedding (SPGM) can more accurately measure neighborhood similarity between samples when dealing with sparse and incomplete data, thus effectively mitigating the false negative problem caused by missing data.

[0078] In summary, the experimental results demonstrate that RELIABLE not only performs exceptionally well on complete data but also possesses strong robustness against interference. It can extract valuable features from highly incomplete data and effectively fuse them, maintaining high clustering accuracy even in extreme cases with a missing data rate as high as 90%. This proves the enormous potential of this invention for handling imperfect data in practical applications.

[0079] 4. Ablation experiments and analysis of key components (RQ3): To thoroughly verify the effectiveness of each innovative component in the proposed RELIABLE model and its contribution to the overall clustering performance, this embodiment conducted detailed ablation experiments on three datasets: Scene15, LandUse21, and CUB. This embodiment focused on evaluating the following three core components: Dual-Driven Cluster Center Enhancement (DCCE), Second-Order Proximity Graph Embedding (SPGM), and Cross-View Adaptive Fusion (CAF). The experimental results are shown in Tables 6 and 7, and the specific analysis is as follows.

Table 6

[0080] 1) Validation of the effectiveness of Dual-Driven Cluster Center Enhancement (DCCE): The DCCE module proposed in this invention aims to solve the problem of misalignment between single-view cluster centers and fused-view cluster centers, and enhances the separability between clusters through a contrastive mechanism. Experimental setup: This embodiment constructs a variant model with the DCCE module removed, i.e., neither the cross-view alignment loss nor the in-view separation loss is calculated during training.

[0081] Performance Differences: Experimental data show that removing DCCE significantly degrades model performance. Specifically, on the Scene15 dataset, accuracy (ACC) decreased by 4.52% (from 52.84% to 48.32%), and normalized mutual information (NMI) decreased by 3.24% (from 50.22% to 46.98%); on the CUB dataset, ACC decreased by 2.83% (from 81.50% to 78.67%). Principle Analysis: These results strongly demonstrate that the DCCE module plays a crucial role in balancing cluster separation in fused views with cluster cohesion in individual views. By minimizing inter-cluster overlap and maintaining structural consistency within views, DCCE achieves more clearly defined cluster centers, thus significantly improving clustering accuracy.
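The two DCCE terms can be illustrated with the sketch below. It is a simplified reading, not the exact loss of this invention: cluster centers are plain means over pseudo-labeled samples (the high-confidence filtering is assumed to have been applied already), alignment is assumed to be one minus the cosine similarity between each single-view center and the fused center of the same cluster, and separation is assumed to be the mean cosine similarity between fused centers of different clusters, as worded in claim 5. All function names are hypothetical.

```python
from math import sqrt


def cluster_centers(features, labels, n_clusters):
    """Mean feature vector of each cluster (zero vector for an empty cluster)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for x, y in zip(features, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += x[d]
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def cosine(u, v):
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def center_enhancement_terms(view_feats, fused_feats, labels, n_clusters):
    """Return (alignment, separation); both are lower when centers are better."""
    fused_c = cluster_centers(fused_feats, labels, n_clusters)
    align_terms = []
    for feats in view_feats:
        view_c = cluster_centers(feats, labels, n_clusters)
        # Same cluster, single view vs. fused view: pull the centers together.
        align_terms += [1.0 - cosine(view_c[k], fused_c[k]) for k in range(n_clusters)]
    # Different clusters in the fused view: push the centers apart.
    sep_terms = [cosine(fused_c[k], fused_c[j])
                 for k in range(n_clusters) for j in range(k + 1, n_clusters)]
    align = sum(align_terms) / len(align_terms)
    sep = sum(sep_terms) / len(sep_terms) if sep_terms else 0.0
    return align, sep


if __name__ == "__main__":
    feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(center_enhancement_terms([feats, feats], feats, [0, 0, 1, 1], 2))
```

In the ablation variant described above, both terms would simply be dropped from the total loss.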

[0082] 2) Validation of the Second-Order Proximity Graph Embedding (SPGM): The SPGM module aims to correct the false negative problem in contrastive learning by using higher-order neighborhood structural similarity instead of direct feature distance.

[0083] Experimental setup: In this embodiment, the SPGM module is removed, and only the identity matrix is retained as the correction target matrix; the second-order similarity is no longer calculated.

[0084] Performance differences: As shown in Table 6, the absence of SPGM leads to a significant drop in model performance. On the Scene15 dataset, ACC, NMI, and ARI decreased by 5.56%, 3.39%, and 4.73%, respectively (compared to ACC 52.84% for the complete model vs. 47.28% for the model without SPGM).

[0085] Comparison with other strategies (Table 7): To further demonstrate the superiority of SPGM, this embodiment compares the SPGM strategy of the present invention with existing false negative identification strategies, including the k-neighborhood method and the random walk method. On the Scene15 dataset, SPGM's ACC (52.84%) is significantly higher than that of the k-neighborhood method (47.61%) and the random walk method (47.34%). On the LandUse21 dataset, SPGM's NMI (37.84%) is also superior to that of k-neighborhood (29.83%) and the random walk method (34.02%).

[0086] Table 7

[0087] Principle Analysis: This comparative result shows that traditional methods based on first-order neighborhoods or random walks struggle to capture complex structural relationships, easily introducing false positives or missing false negatives. In contrast, the SPGM of this invention, by comparing the neighborhood distributions of samples, can accurately identify samples that are far apart in feature space but structurally closely related, thus more effectively mitigating the false negative problem.
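The construction of the corrected target matrix, as outlined in claim 4, can be sketched as follows. Several details are assumptions made for illustration only: the Gaussian bandwidth `sigma`, the number `k` of retained connections per sample, the exclusion of self-connections, and the convex combination weight `lam` between the identity matrix and the second-order similarity matrix.

```python
from math import exp


def second_order_target(features, k=2, sigma=1.0, lam=0.5):
    """Return (1 - lam) * I + lam * P P^T, with P a sparsified transition matrix."""
    n = len(features)
    # Gaussian affinity from squared Euclidean distances (no self-connections).
    aff = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                aff[i][j] = exp(-d2 / (2.0 * sigma ** 2))
    # Keep the k strongest connections per row, then row-normalize.
    trans = []
    for i in range(n):
        top = set(sorted(range(n), key=lambda j: aff[i][j], reverse=True)[:k])
        row = [aff[i][j] if j in top else 0.0 for j in range(n)]
        total = sum(row)
        trans.append([x / total if total > 0 else 0.0 for x in row])
    # Second-order similarity: two samples are similar if they share neighbors.
    second = [[sum(trans[i][m] * trans[j][m] for m in range(n)) for j in range(n)]
              for i in range(n)]
    return [[(1.0 - lam) * (1.0 if i == j else 0.0) + lam * second[i][j]
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    for row in second_order_target(pts):
        print([round(x, 3) for x in row])
```

In this toy example, samples in the same group share neighbors and receive a positive off-diagonal target, while samples from different groups share none and keep a target of zero, which is the behavior used to soften false negatives.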

[0088] 3) Verification of the effectiveness of cross-view adaptive fusion (CAF): This module dynamically solves the problems of "over-fusion" and "under-fusion" in heterogeneous information fusion through a dual attention mechanism at the view level and the sample level.

[0089] Experimental Setup: This embodiment replaces the adaptive weighted fusion mechanism of this invention with a simple view concatenation or averaging operation. Performance Differences: Removing the CAF module alone resulted in a performance decrease on all three datasets. For example, on Scene15, ACC dropped from 52.84% to 47.72%. More importantly, the performance degradation was even more drastic when CAF was removed together with another module (such as CAF+DCCE or CAF+SPGM).

[0090] Principle Analysis: Experimental results confirm that the dual-weighting mechanism of CAF can dynamically allocate weights based on the reliability of the view and the confidence level of the samples. This adaptability provides the subsequent DCCE module with more balanced and higher-quality fusion features, enabling DCCE to calculate cluster centers based on more accurate representations. Once CAF is lost, the interference of noisy views will significantly weaken the optimization effect of subsequent modules.
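A minimal sketch of the dual weighting is given below, following the steps listed in claim 3. The details are assumptions, because the text does not fix them: each view is given a hypothetical learnable parameter vector of the same dimension as the projection features, the view-level weights are the Softmax of that vector, the raw attention score is the dot product of a sample's projection feature with the view-level weights, and the "Softmax-like normalization" is taken to be a plain Softmax across views.

```python
from math import exp


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    es = [exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def adaptive_fusion(projections, view_params):
    """projections[v][i]: projection feature of sample i in view v.
    view_params[v]: hypothetical learnable parameter vector of view v.
    Returns (fused features, per-sample view weights)."""
    view_weights = [softmax(p) for p in view_params]  # view-level weights
    n_views, n_samples = len(projections), len(projections[0])
    fused, sample_weights = [], []
    for i in range(n_samples):
        # Raw attention score of sample i in each view.
        scores = [sum(z * w for z, w in zip(projections[v][i], view_weights[v]))
                  for v in range(n_views)]
        alphas = softmax(scores)  # sample-level weights, summing to 1 over views
        dim = len(projections[0][i])
        fused.append([sum(alphas[v] * projections[v][i][d] for v in range(n_views))
                      for d in range(dim)])
        sample_weights.append(alphas)
    return fused, sample_weights


if __name__ == "__main__":
    z = [[[1.0, 0.0], [0.0, 1.0]],   # view 0, two samples
         [[0.5, 0.5], [0.0, 1.0]]]   # view 1, two samples
    params = [[0.0, 0.0], [1.0, -1.0]]
    fused, weights = adaptive_fusion(z, params)
    print(fused)
    print(weights)
```

The ablation variant described above corresponds to replacing `alphas` with uniform weights (averaging) or skipping the summation and concatenating the views.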

[0091] 4) Synergy Analysis Between Modules: Comparing row 8 of Table 6 (with all components removed: DCCE, SPGM, and CAF are all ×) with the complete model (row 9) reveals a significant performance gap. For example, on Scene15, ACC drops sharply from 52.84% to 45.13%.

[0092] Furthermore, the performance after removing a single module (rows 1-3) is consistently better than the performance after removing two modules (rows 4-6). This indicates that the three core components of this invention do not work in isolation, but rather exhibit a significant synergy: CAF provides high-quality input features for DCCE, SPGM optimizes the target matrix for contrastive learning, and DCCE further aligns the fused representations generated by CAF. Together, these three components establish the RELIABLE model's leading position in multi-view clustering tasks.

[0093] 5. Hyperparameter sensitivity, convergence, and visualization analysis (RQ4 & RQ5): To comprehensively evaluate the stability and intrinsic learning mechanism of the RELIABLE model proposed in this invention, this embodiment conducts in-depth experimental verification from three dimensions: hyperparameter sensitivity, training convergence behavior, and feature space visualization. First, for the two key hyperparameters that balance the multi-view contrastive learning loss and the dual-driven cluster center enhancement loss in the model's unified objective function, this embodiment performs a detailed grid search and sensitivity analysis over the parameter space. This embodiment records the changes in multiple clustering performance metrics, including accuracy, normalized mutual information, adjusted Rand index, and F-score, on two representative datasets, CUB and Scene15.

[0094] As shown in Figures 12-15 and Figures 16-19, the hyperparameter experiments show that the model's sensitivity to parameters varies across datasets. On the CUB dataset, clustering performance fluctuates significantly as the combination of the two hyperparameters changes; in particular, when the contrastive learning weight is small, all metrics are significantly lower, indicating that a larger contrastive learning weight is needed to drive feature alignment when processing complex multimodal data. Conversely, on the Scene15 dataset, the model performance shows strong stability, maintaining a high clustering level under most parameter combinations. Combining the experimental performance on both datasets, the model achieves the best clustering results in most cases when the two hyperparameters are set to 1 and 0.1, respectively. Therefore, this parameter configuration is used in the subsequent model training in this embodiment.
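A grid search over the two balancing hyperparameters can be organized as in the sketch below. The candidate values and the scoring function in the usage example are purely hypothetical placeholders; in practice `evaluate` would train the model with the given pair of weights and return a clustering metric such as ACC.

```python
from itertools import product


def grid_search(evaluate, grid_a, grid_b):
    """Return (best_score, a, b) over all combinations of the two candidate lists."""
    best = None
    for a, b in product(grid_a, grid_b):
        score = evaluate(a, b)
        if best is None or score > best[0]:
            best = (score, a, b)
    return best


if __name__ == "__main__":
    # Hypothetical candidate values, not the set used in this embodiment.
    candidates = (0.01, 0.1, 1, 10)

    def toy_score(a, b):
        # Stand-in for "train, cluster, and measure"; peaks at a=1, b=0.1.
        return -((a - 1) ** 2 + (b - 0.1) ** 2)

    print(grid_search(toy_score, candidates, candidates))
```

Recording the score of every combination, rather than only the best one, yields the sensitivity surfaces of the kind discussed for CUB and Scene15.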

[0095] Secondly, to verify the convergence and stability of the optimization algorithm of this invention, this embodiment performed convergence analysis on all six benchmark datasets: Cifar100, Scene15, Reuters, LandUse21, CUB, and Hdigit. As shown in Figures 6-11, the changes in the loss function value and the clustering evaluation metrics were recorded at each iteration during training. On all datasets, the total loss decreases rapidly in the early stage of training, then decreases steadily, and finally converges to a minimum value, demonstrating the effectiveness of the optimization strategy of this invention. Meanwhile, the clustering performance metrics increase significantly in the initial iterations and gradually stabilize. Specifically, for the relatively simple Hdigit and Reuters datasets, the model converges very quickly, typically reaching optimal performance before 100 epochs; for the Cifar100 and CUB datasets, which have more categories and more complex data distributions, the performance metrics increase steadily and converge at the end of training. This convergence behavior further confirms that the RELIABLE method has good robustness and learning ability when dealing with large-scale and complex multi-view data.

[0096] Finally, to visually demonstrate the effectiveness of the model in feature representation learning, this embodiment uses t-SNE to perform dimensionality reduction and visualization of the feature distributions of the Reuters and Hdigit datasets before and after training. As shown in Figures 20-23, at the initial stage of training on the raw data, samples of different categories exhibit a scattered and chaotic distribution in the feature space, with significant overlap between classes. This resulted in clustering accuracies of only 38.66% and 48.85% on the Reuters and Hdigit datasets, respectively, indicating a lack of sufficient discriminative power in the original features. However, after 200 epochs of training with the model of this invention, the feature distribution of the samples changed qualitatively: sample points of the same category were tightly clustered, forming highly cohesive cluster structures, while the boundaries between clusters of different categories became clear, significantly improving the separation. At this point, the accuracy on the Reuters dataset increased to 59.89%, and the accuracy on the Hdigit dataset reached 99.46%. These visualization results demonstrate that the RELIABLE model can gradually learn highly discriminative feature representations and successfully separate different cluster centers in the feature space, thus achieving high-quality clustering results.

[0097] Based on the same general inventive concept, this invention also provides a robust multi-view clustering system based on cross-view adaptive fusion and cluster center enhancement. The system described below and the method described above may be referred to in conjunction with each other. The system includes: a data acquisition module, which acquires a dataset containing the original data of multiple samples across multiple views; a feature extraction and separation module, which uses the encoder to extract the initial features of each view, and separates the projection features for contrastive learning and the target features for constructing the neighborhood structure from the initial features; a cross-view adaptive fusion module, which calculates the sample-level weight of each sample in each view based on the projection features and predefined view-level weights, and uses the sample-level weights to perform a weighted summation of the projection features to obtain the fusion features of each sample; a neighborhood graph embedding module, which calculates the higher-order neighborhood similarity between samples based on the target features, and uses the higher-order neighborhood similarity to generate a target matrix for correcting false negative samples in the contrastive learning process; a cluster center enhancement module, which calculates the cluster centers of the fused view and of each individual view based on the fusion features and the projection features, respectively, and constructs constraints based on the cluster centers to align the cluster structure and enhance the inter-cluster separation; and a joint training and output module, which combines the constraints with the contrastive learning loss function and uses the target matrix to guide the network parameter update until the model converges, and outputs the final fused features for clustering.

[0098] In this embodiment, a computer terminal device is provided, including: one or more processors; and a memory, coupled to the processors, for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the steps of the robust multi-view clustering method based on cross-view adaptive fusion and cluster center enhancement described above.

[0099] In this embodiment, a computer-readable storage medium is also provided, on which a computer program is stored. When the computer program is executed by a processor, it implements the steps of the robust multi-view clustering method based on cross-view adaptive fusion and cluster center enhancement described above.

[0100] This invention provides a robust multi-view clustering method and system based on cross-view adaptive fusion and cluster center enhancement. By introducing a dual-weight mechanism at the view and sample levels through a cross-view adaptive fusion module, it dynamically balances view-specific reliability and sample-specific confidence, effectively solving the problems of over-fusion and under-fusion in heterogeneous information integration, preserving view-specific features and integrating complementary information. A dual-driven cluster center enhancement framework achieves dual alignment between single-view cluster centers and fused-view cluster centers, maximizing the separability between clusters while maintaining cluster consistency, thus obtaining a more discriminative clustering structure. A second-order proximity graph embedding method utilizes high-order neighborhood structural similarity instead of direct feature distance, effectively identifying and correcting false negative samples, significantly improving the robustness of feature learning. This invention achieves excellent clustering performance in both complete and incomplete multi-view scenarios, demonstrating its robustness and effectiveness in processing multi-view data.

[0101] The above are merely preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A robust multi-view clustering method based on cross-view adaptive fusion and cluster center enhancement, characterized in that it comprises the following steps: obtaining a dataset containing the original data of multiple samples across multiple views; extracting initial features for each view using an encoder, and separating, from the initial features, the projection features for contrastive learning and the target features for constructing the neighborhood structure; calculating the sample-level weight of each sample in each view based on the projection features and predefined view-level weights, and performing a weighted summation of the projection features using the sample-level weights to obtain the fusion features of each sample; calculating the higher-order neighborhood similarity between samples based on the target features, and using the higher-order neighborhood similarity to generate a target matrix for correcting false negative samples in the contrastive learning process; calculating the cluster centers of the fused view and of each individual view based on the fusion features and the projection features, respectively, and constructing constraints based on the cluster centers to align the cluster structure and enhance the inter-cluster separation; and combining the constraints with the contrastive learning loss function, using the target matrix to guide the network parameter update until the model converges, and outputting the final fused features for clustering.

2. The method according to claim 1, characterized in that extracting initial features for each view using the encoder includes: processing the raw data of each view with an encoder with shared weights to obtain initial features for each view; separating a first branch from the initial features as contrast features and a second branch as target features; and inputting the contrast features to a projection head, the projection head mapping the contrast features of different views to a unified common representation space to obtain the projection features.

3. The method according to claim 1, characterized in that calculating the sample-level weights for each sample in each view includes: performing a Softmax operation on the learnable parameters of each view to obtain view-level weights; performing a dot product between the projection features of each sample and the view-level weights of the corresponding view to obtain the original attention score; performing Softmax-like normalization on the original attention score to obtain sample-level weights; and multiplying the projection features of each view by the corresponding sample-level weights and then summing them across views to obtain the fusion features.

4. The method according to claim 1, characterized in that generating the target matrix for correcting false negative samples includes: calculating the similarity between samples based on the target features, and constructing a Gaussian affinity matrix; sparsifying the Gaussian affinity matrix, retaining the maximum similarity connection for each sample, and normalizing the sparsified matrix to obtain a transition probability matrix; multiplying the transition probability matrix by its transpose to obtain a second-order similarity matrix; and weighting and summing the second-order similarity matrix with the identity matrix, the result being used as the target matrix.

5. The method according to claim 1, characterized in that the process of constructing constraints includes: calculating the single-view cluster center of each cluster under each view and the fusion cluster center of each cluster under the fusion view using high-confidence pseudo-labels; constructing a first constraint term to minimize the cosine similarity between the cluster centers of the same cluster under different views; constructing a second constraint term to maximize the separation between the cluster centers of different clusters under the fusion view; and using the sum of the first constraint term and the second constraint term as the constraint condition.

6. The method according to claim 1, characterized in that the process of combining constraints with a contrastive learning loss function includes: in the first training phase, using the identity matrix as the target matrix, and updating the network parameters using only the contrastive learning loss function; and in the second training phase, adding the constraints to the total loss function, replacing the identity matrix with the target matrix, and updating the network parameters jointly using the contrastive learning loss function and the constraints.

7. The method according to claim 1, characterized in that the process of outputting the final fusion features for clustering includes: after the model training converges, extracting the fusion features of all samples; and performing the K-means clustering algorithm on the fusion features to obtain the cluster assignment result for each sample.

8. A robust multi-view clustering system based on cross-view adaptive fusion and cluster center enhancement, characterized in that the system is used to implement the method of any one of claims 1-7 and comprises: a data acquisition module, which acquires a dataset containing the original data of multiple samples across multiple views; a feature extraction and separation module, which uses the encoder to extract the initial features of each view, and separates the projection features for contrastive learning and the target features for constructing the neighborhood structure from the initial features; a cross-view adaptive fusion module, which calculates the sample-level weight of each sample in each view based on the projection features and predefined view-level weights, and uses the sample-level weights to perform a weighted summation of the projection features to obtain the fusion features of each sample; a neighborhood graph embedding module, which calculates the higher-order neighborhood similarity between samples based on the target features, and uses the higher-order neighborhood similarity to generate a target matrix for correcting false negative samples in the contrastive learning process; a cluster center enhancement module, which calculates the cluster centers of the fused view and of each individual view based on the fusion features and the projection features, respectively, and constructs constraints based on the cluster centers to align the cluster structure and enhance the inter-cluster separation; and a joint training and output module, which combines the constraints with the contrastive learning loss function and uses the target matrix to guide the network parameter update until the model converges, and outputs the final fused features for clustering.

9. A computer terminal device, characterized in that it includes: one or more processors; and a memory, coupled to the processors, for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors perform the steps of the method as described in any one of claims 1-7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method as described in any one of claims 1-7.