Sliced optimal transport method and system based on spherical parameterized projection

This sliced optimal transport method combines spherical parameterized projection with a sparse-update caching mechanism. It addresses two problems in aligning spherical data distributions: random projections that search blindly, and the distortion introduced by stereographic projection. The method aims to align distributions efficiently and accurately and to improve the training of self-supervised learning and generative models.

CN122244477A · Pending · Publication Date: 2026-06-19 · NANJING UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
NANJING UNIV OF SCI & TECH
Filing Date
2026-03-04
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing spherical sliced optimal transport methods suffer from several problems when aligning high-dimensional spherical data distributions: random projections search blindly, gradient noise is high, computational complexity is high, and the projections distort the geometry. As a result, these methods struggle to meet the real-time training and geometric accuracy requirements of deep learning tasks.

Method used

We adopt a sliced optimal transport method based on spherical parameterized projection. A learnable parameterized projection network, combined with a sparse-update caching strategy, finds the most discriminative slice directions. This reduces computational complexity and improves the accuracy and robustness of distribution alignment.

Benefits of technology

The method reduces computational complexity and improves the accuracy and robustness of distribution alignment while preserving spherical geometric consistency. It suits complex tasks such as self-supervised learning and generative modeling, and it improves model stability and scalability.

✦ Generated by Eureka AI based on patent content.

Figure CN122244477A_ABST

Abstract

This invention discloses a sliced optimal transport method and system based on spherical parameterized projection. The method includes the following operations:
- performing spherical mapping preprocessing on source-domain and target-domain features;
- constructing a projection module with learnable parameters, together with a cache unit;
- using a sparse update scheduling strategy: at non-update steps, the static orthonormal bases stored in the cache are reused to compute the distance, and gradient backpropagation is blocked to reduce computational overhead;
- constraining parameter updates with a projection diversity regularization term, which maximizes the differences between slice distributions while suppressing subspace overlap and thereby preserves the geometric structure of the spherical manifold.

The invention achieves stable constraint and efficient alignment of spherical latent distributions without high-frequency random sampling. It improves representation quality in self-supervised learning and the latent-space regularization of generative models, and it enhances the applicability and generalization of the algorithm.

Description

Technical Field

[0001] This invention belongs to the field of sliced optimal transport and distribution discrepancy measurement. It specifically relates to a sliced optimal transport method and system based on spherical parameterized projection.

Background Technology

[0002] In deep representation learning and generative modeling, measuring the discrepancy between high-dimensional probability distributions and aligning them geometrically are central to model optimization. In spherical generative models such as the Sliced-Wasserstein Auto-Encoder (SWAE), the discrepancy between the latent variable distribution and a preset spherical prior (such as the von Mises-Fisher distribution) is minimized. This constrains the generator's latent topology and improves generation quality. In self-supervised learning (SSL), the feature distributions of augmented views are aligned on a unit hypersphere and their uniformity is optimized, so that the learned representations are discriminative and robust.

[0003] However, traditional divergence measures such as the Kullback-Leibler and Jensen-Shannon divergences are typically based on pointwise comparison of probability densities. When the supports of two distributions do not overlap, or overlap very little, these divergences provide no useful gradient signal. The result is vanishing training gradients or mode collapse, which limits model convergence.

[0004] Optimal Transport (OT) theory, with its distinctive geometric properties, has become a powerful tool for distribution alignment in deep learning. OT treats distribution alignment as the problem of transporting mass from one distribution to another at minimum total cost. The induced Wasserstein distance has three core advantages: first, it naturally captures the global geometric structure of the data space; second, it handles distributions whose supports do not overlap; and third, it provides stable gradient signals where traditional divergences fail.

Despite these properties, the engineering use of OT faces severe computational bottlenecks. Solving the classical OT problem is equivalent to solving a linear program, with computational complexity of roughly $O(n^3\log n)$ for $n$ samples. Entropic regularization (the Sinkhorn algorithm) adds an entropy term and reduces the cost to about $O(n^2)$ per iteration. Even so, computation and memory costs remain very large for the large, high-dimensional batches processed by deep neural networks. In addition, the regularization parameter must be chosen as a trade-off between computation speed and solution accuracy.
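Entropic regularization is usually solved with Sinkhorn iterations, which alternately rescale the rows and columns of a Gibbs kernel. The pure-Python sketch below illustrates that standard algorithm. It is not part of the claimed method. The function name `sinkhorn` and the parameters `eps` (regularization strength) and `n_iter` are our own choices. Each iteration touches every entry of the $n\times m$ kernel, so one iteration costs $O(nm)$.

```python
import math


def sinkhorn(a, b, cost, eps=0.1, n_iter=500):
    """Entropic OT between histograms a (length n) and b (length m).

    cost[i][j] is the ground cost between bin i of a and bin j of b.
    Returns (plan, transport_cost), where transport_cost = <plan, cost>.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / eps); a smaller eps approaches exact OT but converges more slowly
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternating scaling: match the row marginals to a, then the column marginals to b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    transport_cost = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, transport_cost


# Identical two-bin histograms: almost all mass stays on the zero-cost diagonal.
plan, c = sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], eps=0.05)
```

Lowering `eps` brings the plan closer to the unregularized optimum, but the kernel entries underflow sooner. This is the speed/accuracy trade-off mentioned above.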

[0005] Sliced Optimal Transport (SOT) and its derivative, the Sliced Wasserstein distance (SW), emerged to overcome the computational barrier of high-dimensional optimal transport (OT). The core idea rests on the Radon transform: a linear projection maps a high-dimensional probability distribution to one dimension. The one-dimensional Wasserstein distance has a closed-form solution, obtained by sorting. The expectation of this one-dimensional distance over many projection directions therefore approximates the high-dimensional distance. Compared with classical OT, SW greatly reduces computational complexity and parallelizes easily. It is therefore widely used in tasks such as generative modeling, for example in Sliced Wasserstein Autoencoders (SWAE).
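The following pure-Python sketch illustrates the slicing idea by estimating the squared Sliced Wasserstein distance between two equal-size point clouds. Each random direction projects the points onto a line, where the one-dimensional $W_2$ is computed in closed form by pairing sorted values. The function names and the direction sampler (a normalized Gaussian vector) are our own choices.

```python
import math
import random


def wasserstein_1d_sq(u, v):
    """Squared W2 between equal-size 1D empirical samples: pair the sorted values."""
    u_sorted, v_sorted = sorted(u), sorted(v)
    return sum((a - b) ** 2 for a, b in zip(u_sorted, v_sorted)) / len(u)


def random_direction(d, rng):
    """Uniformly distributed unit vector in R^d (normalized standard Gaussian)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def sliced_wasserstein_sq(X, Y, n_proj=100, seed=0):
    """Monte Carlo estimate of SW_2^2 between point clouds X and Y (lists of d-vectors)."""
    rng = random.Random(seed)
    d = len(X[0])
    total = 0.0
    for _ in range(n_proj):
        theta = random_direction(d, rng)
        # The linear projection <x, theta> sends every point to the real line
        px = [sum(a * t for a, t in zip(x, theta)) for x in X]
        py = [sum(a * t for a, t in zip(y, theta)) for y in Y]
        total += wasserstein_1d_sq(px, py)
    return total / n_proj
```

For example, translating a 2D cloud by $(3, 0)$ gives an expected value of $9/2$, because each projection shifts every point by $3\theta_1$.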

[0006] However, Euclidean slicing methods have significant limitations for data with specific geometric constraints, particularly spherical data. In self-supervised learning methods such as SimCLR and MoCo, and in directional statistical modeling, feature vectors are typically normalized and strictly constrained to the unit hypersphere to eliminate scale ambiguity. In such settings, applying Euclidean linear slicing directly ignores the intrinsic manifold structure of the data and also introduces severe geometric mismatch.

Two approaches have been proposed to address this. Spherical Sliced Wasserstein (SSW) measures distribution discrepancies by slicing the hypersphere along great circles. As an alternative, Stereographic Spherical Sliced Wasserstein (S3W) maps spherical data to Euclidean space by stereographic projection and then applies established linear slicing methods.
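The S3W idea can be made concrete with a minimal sketch of stereographic projection from the north pole and its inverse. These are the standard formulas. The function names are ours, and no particular S3W library is implied.

```python
def stereographic(x):
    """Send a point of S^{d-1} other than the north pole e_d to R^{d-1}: x[:-1] / (1 - x[-1])."""
    denom = 1.0 - x[-1]
    if denom <= 0.0:
        raise ValueError("the north pole has no finite image")
    return [xi / denom for xi in x[:-1]]


def inverse_stereographic(z):
    """Map z in R^{d-1} back onto S^{d-1}: (2z, |z|^2 - 1) / (|z|^2 + 1)."""
    s = sum(zi * zi for zi in z)
    return [2.0 * zi / (s + 1.0) for zi in z] + [(s - 1.0) / (s + 1.0)]


# A point about 36.87 degrees from the north pole already lands at distance 3 from the origin.
print(stereographic([0.0, 0.6, 0.8]))
```

The south pole maps to the origin, while points near the north pole are sent arbitrarily far away. Equal geodesic steps on the sphere therefore become unequal Euclidean steps after projection, which is the curvature-related distortion discussed for S3W.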

[0007] Although the above methods have theoretically adapted to spherical geometry to a certain extent, they still have significant technical defects and application bottlenecks in actual deep learning training scenarios.

[0008] The Spherical Sliced Wasserstein (SSW) method has the following limitations in deep training.

First, random projection suffers from severe sampling failure and causes training instability. SSW approximates the integral by Monte Carlo sampling, drawing a fresh set of projection directions from a uniform distribution at every training step. In high-dimensional spherical space, however, measure concentrates. With very high probability, a randomly chosen direction is nearly orthogonal to the low-dimensional subspace that carries the main differences between the data distributions. Such directions rarely separate the two distributions clearly. This "blind" sampling gives the SSW distance estimate a large variance, which introduces significant gradient noise and leads to slow convergence or even oscillating training.

Second, to sample great-circle slices uniformly on the sphere, SSW performs a QR decomposition of a newly sampled random matrix at every step. This high-frequency matrix operation is time-consuming, and its cost grows rapidly with the feature dimension, making it a major bottleneck for training speed.
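The concentration effect behind this "blind" sampling can be checked numerically. The sketch below draws uniform random unit directions $\theta$ in $\mathbb{R}^d$ and measures how strongly each one aligns with a fixed unit vector $e_1$. Here $e_1$ stands in for a single discriminative direction; that choice is an assumption made purely for illustration. The mean of $\langle\theta,e_1\rangle^2$ is $1/d$, so in high dimension a random direction almost never aligns with $e_1$. The names `random_unit_vector` and `alignment_stats` are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniform direction on S^{d-1}: normalize a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def alignment_stats(d, n_samples=2000, threshold=0.5, seed=0):
    """Return (mean of <theta, e_1>^2, fraction of draws with |<theta, e_1>| > threshold)."""
    rng = random.Random(seed)
    sq_sum, hits = 0.0, 0
    for _ in range(n_samples):
        c = random_unit_vector(d, rng)[0]  # cosine between theta and e_1
        sq_sum += c * c
        hits += abs(c) > threshold
    return sq_sum / n_samples, hits / n_samples


print(alignment_stats(512))
```

In $d=3$, about half of the directions have $|\cos|>0.5$. In $d=512$, essentially none do, and the mean squared cosine is about $1/512$.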

[0009] S3W, which is based on stereographic projection, and its variants (including RI-S3W and ARI-S3W) have the following limitations in deep training.

First, geometric consistency is insufficient. The projections and distances are still built within the Euclidean linear slicing framework. This framework only approximates the geodesic structure of the sphere and does not explicitly perform geometrically consistent slicing on the sphere. When the goal is spherical uniformity or fitting a spherical distribution, curvature-related systematic errors result.

Second, the compensation strategies consume substantial resources. To alleviate projection distortion, improved strategies such as RI-S3W and ARI-S3W rely on multiple random rotations, a large pool of candidate rotations, and repeated re-evaluation. Stacking samples in this way makes time and memory overhead grow linearly or even exponentially with the number of rotations and the pool size. This limits training efficiency and scalability on large-scale, high-dimensional data.

[0010] Overall, existing spherical sliced optimal transport methods remain significantly limited for spherical distribution alignment and for its use in self-supervised learning and generative modeling.

On the one hand, basic spherical slicing methods such as SSW rely on random projection and per-step orthogonalization. In high-dimensional spherical spaces these methods suffer from the curse of dimensionality and from concentration of measure. Randomly selected projection directions often miss the discriminative subspace that carries the main distribution differences, which produces high-variance distance estimates and significant gradient noise. This hampers the convergence of feature uniformity in self-supervised learning and the mode coverage of generative models.

On the other hand, improved methods such as S3W and its variants raise metric accuracy or geometric fit to some extent through stereographic projection correction, rotation pools, or optimized search. However, they depend on maintaining costly rotation-matrix pools and on multiple random rotations, which leads to high computation time and memory overhead. Moreover, stereographic projection inevitably introduces curvature-related geometric distortion and systematic error.

Together these factors limit the efficiency and scalability of optimal transport for high-dimensional spherical distribution alignment and make it difficult to meet the real-time training and geometric accuracy requirements of deep learning tasks.
Therefore, there is a pressing need for a new sliced optimal transport method based on spherical parameterized projection and a caching mechanism. Such a method should overcome the blindness of random projection and the geometric distortion of stereographic projection. It should also balance three goals: amortized computational cost, discriminative projection directions, and consistency with the manifold geometry. The aim is more efficient and accurate spherical distribution measurement and a more robust foundation for tasks such as self-supervised representation learning and spherical generative models.

Summary of the Invention

[0011] The purpose of this invention is to solve the problems described in the background. It proposes a sliced optimal transport method and system based on spherical parameterized projection. A learnable parameterized projection network finds the most discriminative slice directions. The closed-form solution of circular optimal transport is combined with a sparse-update caching strategy. Together, these reduce computational complexity and improve the accuracy and robustness of distribution alignment while preserving spherical geometric consistency.

[0012] The technical solution that achieves this objective is a sliced optimal transport method based on spherical parameterized projection, the method comprising:

[0013] Step 1: Feature extraction and spherical embedding. Obtain training data and input it into the feature extraction network of the target model to obtain a first feature set and a second feature set. Apply unit-sphere normalization to both sets, constraining their feature vectors to the unit hypersphere, to obtain normalized spherical feature data;

[0014] Step 2: Parameterized projection network construction and cache initialization. Initialize the parameterized projection network with learnable parameters. Orthogonalize the learnable parameters to generate orthogonal projection matrices that satisfy the spherical geometric constraints, and write these matrices into the cache unit to complete cache initialization;

[0015] Step 3: Subspace projection mapping and circular transport aggregation. Read the orthogonal projection matrices from the cache unit, project the spherical feature data onto the two-dimensional subspaces the matrices span, and map the projections to circular angle coordinates. From these coordinates, compute the per-slice optimal transport distances between the first and second feature sets, and aggregate them into the sliced optimal transport distance based on spherical parameterized projection;

[0016] Step 4: Sparse update scheduling and cache consistency maintenance. Execute the sparse update scheduling strategy. During training, update the parameters of the parameterized projection network at a preset interval of time steps and refresh the cache unit after each update. At time steps where no update is triggered, keep the parameters unchanged and reuse the orthogonal projection matrices stored in the cache unit;

[0017] Step 5: Model parameter update and result output. Use the sliced optimal transport distance based on spherical parameterized projection as part of the total loss function and perform backpropagation to update the target model parameters. Output the updated parameters and the trained target model to achieve the downstream task objectives and perform data analysis.

[0018] The present invention also provides a sliced optimal transport system based on spherical parameterized projection for implementing the above method, the system comprising:

[0019] The feature extraction and spherical embedding module acquires training data and inputs it into the feature extraction network of the target model to obtain a first feature set and a second feature set. It applies unit-sphere normalization to both sets, constraining their feature vectors to the unit hypersphere, to obtain normalized spherical feature data.

[0020] The parameterized projection network construction and cache initialization module initializes the parameterized projection network with learnable parameters. It orthogonalizes the learnable parameters to generate orthogonal projection matrices that satisfy the spherical geometric constraints and writes these matrices into the cache unit to complete cache initialization.

[0021] The subspace projection mapping and circular transport aggregation module reads the orthogonal projection matrices from the cache unit, projects the spherical feature data onto the two-dimensional subspaces the matrices span, and maps the projections to circular angle coordinates. From these coordinates, it computes and aggregates the per-slice optimal transport distances between the first and second feature sets to obtain the sliced optimal transport distance based on spherical parameterized projection.

[0022] The sparse update scheduling and cache consistency maintenance module executes the sparse update scheduling strategy. During training, it updates the parameters of the parameterized projection network at a preset interval of time steps and refreshes the cache unit after each update. At time steps where no update is triggered, the parameters remain unchanged and the orthogonal projection matrices stored in the cache unit are reused.

[0023] The model parameter update and result output module uses the sliced optimal transport distance based on spherical parameterized projection as part of the total loss function and performs backpropagation to update the target model parameters. It then outputs the updated parameters and the trained target model to achieve the downstream task objectives and perform data analysis.

[0024] It should be noted that the technical details in the system and method provided in this application correspond one-to-one. The technical details of the system will not be repeated here. Please refer to the method provided in this application for details.

[0025] An electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method described above.

[0026] A computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the steps of the above-described method.

[0027] A computer program product includes a computer program that, when executed by a processor, implements the steps of the above-described method.

[0028] Compared with the prior art, the significant advances of this invention are as follows.

(1) Geometric consistency and distortion-free measurement. Unlike spherical slicing methods based on stereographic projection, this invention determines the projection directions through parameterized orthogonal matrices and projects high-dimensional spherical data directly onto a one-dimensional circular submanifold. This avoids the metric distortion caused by the stereographic transformation and reflects the geodesic distance relationships among spherical data, so the topological structure is characterized more accurately.

(2) Strong discriminability and robust convergence. Spherical slicing methods such as SSW rely on a random trial-and-error strategy. By contrast, this invention actively searches for the optimal slicing directions using learnable parameterized orthogonal bases. This avoids interference from uninformative random projections in distance estimation and reduces the variance of the Monte Carlo estimate. Model training therefore receives accurate and stable gradient guidance and converges faster.

(3) The method scales well and can be applied to a variety of distribution comparison tasks, improving model stability and reliability.

[0029] To illustrate the functional characteristics and structural parameters of the present invention more clearly, further explanation is provided below with reference to the accompanying drawings and specific embodiments.

Description of the Drawings

[0030] Figure 1 is a flowchart of the overall method of the present invention;

[0031] Figure 2 is a schematic diagram of the spherical feature distribution obtained by an embodiment of the present invention;

[0032] Figure 3 is a visualization of the spherical feature distribution obtained with the Spherical Sliced Wasserstein (SSW) embodiment;

[0033] Figure 4 is a schematic diagram of the spherical feature distribution obtained with the Stereographic Spherical Sliced Wasserstein (S3W) embodiment.

Detailed Implementation

[0034] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0035] As shown in Figure 1, this invention proposes a Parameterized Spherical Sliced Wasserstein (PSSW) method for aligning high-dimensional spherical data distributions. The method is executed by a computer to match and constrain spherical feature distributions during training of the target model. A learnable parameterized projection matrix finds more discriminative slice directions. A sparse update scheduling strategy and a cache consistency mechanism reduce computational overhead while preserving the topology of the spherical manifold. The method applies broadly to tasks based on distances between distributions. This embodiment focuses on its implementation in two core scenarios: Sliced Wasserstein Autoencoders (SWAE) and Self-Supervised Learning (SSL).

[0036] Step 1: Feature extraction and spherical embedding preprocessing;

[0037] Step 1-1: Obtain the first feature set $X=\{x_i\}_{i=1}^{B}\in\mathbb{R}^{B\times d}$ and the second feature set $Y=\{y_i\}_{i=1}^{B}\in\mathbb{R}^{B\times d}$, where $B$ is the batch size and $d$ is the feature dimension. The training data include input training samples and reference data. Feeding the input training samples into the feature extraction network of the target model gives the first feature set $X$. The second feature set $Y$ can be obtained in either of two ways:
- (i) by feeding a second dataset into the feature extraction network, where the second dataset is target-domain sample data or augmented-view data that shares its source with the first dataset; or
- (ii) by having a reference generation module produce reference vectors from a preset reference distribution.

[0038] Step 1-2: Apply unit-sphere normalization to the first feature set $X$ and the second feature set $Y$ so that they lie on the unit hypersphere:

[0039] $$\tilde{x}_i=\frac{x_i}{\|x_i\|_2},\qquad \tilde{y}_i=\frac{y_i}{\|y_i\|_2},\qquad i=1,\dots,B,$$
which gives $\tilde{X}=\{\tilde{x}_i\}_{i=1}^{B}\subset\mathbb{S}^{d-1}$ and $\tilde{Y}=\{\tilde{y}_i\}_{i=1}^{B}\subset\mathbb{S}^{d-1}$;

[0040] here $\tilde{x}_i$ and $\tilde{y}_i$ are the feature vectors $x_i$ and $y_i$ after unit-sphere normalization, and $\|\cdot\|_2$ denotes the Euclidean norm. $\tilde{X}$ and $\tilde{Y}$ are the normalized first and second feature sets. $\mathbb{S}^{d-1}=\{z\in\mathbb{R}^{d}:\|z\|_2=1\}$ denotes the $(d-1)$-dimensional unit hypersphere in $\mathbb{R}^{d}$.

[0041] This step constrains every feature to the $(d-1)$-dimensional unit hypersphere $\mathbb{S}^{d-1}$. The preprocessing removes the scale ambiguity of the feature norm. Subsequent slicing and distance computations therefore respect the geometry of the spherical Riemannian manifold and avoid the geometric mismatch caused by Euclidean metrics.
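Step 1-2 can be sketched in pure Python as follows. The sketch assumes features are given as a list of equal-length rows. The small `eps` guard against zero-norm vectors is our addition; the text does not specify one.

```python
import math


def to_unit_sphere(features, eps=1e-12):
    """Row-wise L2 normalization x_i / ||x_i||_2, placing each row on S^{d-1}."""
    normalized = []
    for row in features:
        norm = math.sqrt(sum(v * v for v in row))
        # eps (an assumption) avoids division by zero for all-zero features
        normalized.append([v / max(norm, eps) for v in row])
    return normalized


X_tilde = to_unit_sphere([[3.0, 4.0], [0.0, -2.0]])  # [[0.6, 0.8], [0.0, -1.0]]
```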

[0042] Step 2: Constructing the parametric projection network and initializing the cache;

[0043] Step 2-1: Construct the parameterized projection tensor $\Theta\in\mathbb{R}^{L\times d\times 2}$. Its $l$-th slice is $\Theta_l\in\mathbb{R}^{d\times 2}$, $l=1,\dots,L$, where $L$ is the total number of slices.

[0044] Step 2-2: Cache initialization. For each slice $\Theta_l$, apply the two-dimensional orthogonalization operator $\mathcal{O}(\cdot)$ to obtain the two-dimensional orthonormal basis $U_l=\mathcal{O}(\Theta_l)\in\mathbb{R}^{d\times 2}$, which satisfies $U_l^{\top}U_l=I_2$. Here $(\cdot)^{\top}$ denotes the transpose and $I_2$ is the $2\times 2$ identity matrix. The set of orthonormal bases of all slices, $\mathcal{U}=\{U_l\}_{l=1}^{L}$, is written to the cache unit and reused for projection in subsequent iterations.

[0045] Step 2-3: The orthogonalization operator $\mathcal{O}(\cdot)$ uses a Gram-Schmidt retraction. The two columns of each slice $\Theta_l$ are normalized and orthogonalized in turn:
$$u_{l,1}=\frac{\theta_{l,1}}{\|\theta_{l,1}\|_2},\qquad u_{l,2}=\frac{\theta_{l,2}-(u_{l,1}^{\top}\theta_{l,2})\,u_{l,1}}{\big\|\theta_{l,2}-(u_{l,1}^{\top}\theta_{l,2})\,u_{l,1}\big\|_2},\qquad U_l=[u_{l,1},u_{l,2}],$$
so that $U_l^{\top}U_l=I_2$. Here $\theta_{l,1}$ and $\theta_{l,2}$ are the first and second column vectors of $\Theta_l$. $u_{l,1}$ is the first basis vector, obtained by normalizing $\theta_{l,1}$. $u_{l,2}$ is the second basis vector, obtained by orthogonalizing $\theta_{l,2}$ against $u_{l,1}$ and normalizing the result.
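Steps 2-1 to 2-3 can be sketched as follows, under these assumptions:
- each slice is stored as its two columns `(theta1, theta2)`;
- the initial parameters are drawn from a standard Gaussian, since the text does not fix an initialization;
- the "cache unit" is simply a Python list.

`gram_schmidt_2d` plays the role of the two-dimensional orthogonalization operator. All names are hypothetical.

```python
import math
import random


def gram_schmidt_2d(theta1, theta2):
    """Orthonormalize the two columns of a d x 2 slice (Gram-Schmidt)."""
    n1 = math.sqrt(sum(v * v for v in theta1))
    u1 = [v / n1 for v in theta1]
    # Remove the component of theta2 along u1, then normalize the remainder
    proj = sum(a * b for a, b in zip(u1, theta2))
    r = [b - proj * a for a, b in zip(u1, theta2)]
    n2 = math.sqrt(sum(v * v for v in r))
    return u1, [v / n2 for v in r]


def init_projection_cache(d, n_slices, seed=0):
    """Draw a Gaussian parameter tensor and fill the cache with one orthonormal pair per slice."""
    rng = random.Random(seed)
    theta = [([rng.gauss(0.0, 1.0) for _ in range(d)],
              [rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n_slices)]
    cache = [gram_schmidt_2d(t1, t2) for t1, t2 in theta]
    return theta, cache


theta, cache = init_projection_cache(d=8, n_slices=4)
```

Gram-Schmidt on two columns costs $O(d)$ per slice, far less than a full QR of a $d\times d$ matrix, and the cache means it runs only when the parameters change.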

[0046] Step 3: Subspace projection mapping and circular transport aggregation. In this embodiment, Step 3 uses the cached orthonormal bases to map high-dimensional spherical features to one-dimensional circular coordinates. It then computes the distribution discrepancy efficiently through a closed-form solution. The procedure is as follows:

[0047] Step 3-1: Two-dimensional subspace projection. Project the spherical feature data onto the two-dimensional subspace spanned by the orthonormal basis $U_l$ of each slice $l$. For any $i=1,\dots,B$, the normalized feature vectors $\tilde{x}_i$ and $\tilde{y}_i$ are projected as

[0048] $$p_i^{(l)}=U_l^{\top}\tilde{x}_i\in\mathbb{R}^{2},\qquad q_i^{(l)}=U_l^{\top}\tilde{y}_i\in\mathbb{R}^{2};$$

[0049] here $p_i^{(l)}$ and $q_i^{(l)}$ are the coordinates on the $l$-th two-dimensional slice plane of the $i$-th feature vector in the normalized first feature set and of the $i$-th feature vector in the normalized second feature set, respectively.

[0050] Step 3-2: Normalization onto the unit circle. Normalize the two-dimensional projected coordinates $p_i^{(l)}$ and $q_i^{(l)}$ by their Euclidean norm to obtain two-dimensional vectors on the unit circle:

[0051] $$\hat{p}_i^{(l)}=\frac{p_i^{(l)}}{\|p_i^{(l)}\|_2},\qquad \hat{q}_i^{(l)}=\frac{q_i^{(l)}}{\|q_i^{(l)}\|_2};$$

[0052] here $\hat{p}_i^{(l)}$ and $\hat{q}_i^{(l)}$ are the unit-circle coordinates on the $l$-th slice plane of the $i$-th feature vector in the normalized first and second feature sets, respectively. This step removes the scaling introduced by projection and ensures that the subsequent angle computation is accurate.

[0053] Step 3-3: Mapping to circular angles. Map the unit-circle coordinates to circular angles and linearly normalize them to the unit interval. Write the unit-circle coordinates on the slice plane as $\hat{p}_i^{(l)}=(\hat{p}_{i,1}^{(l)},\hat{p}_{i,2}^{(l)})$ and $\hat{q}_i^{(l)}=(\hat{q}_{i,1}^{(l)},\hat{q}_{i,2}^{(l)})$, where the subscripts 1 and 2 denote the first and second components. The corresponding normalized angles are

[0054] $$\alpha_i^{(l)}=\frac{\operatorname{atan2}\big(\hat{p}_{i,2}^{(l)},\hat{p}_{i,1}^{(l)}\big)+\pi}{2\pi},\qquad \beta_i^{(l)}=\frac{\operatorname{atan2}\big(\hat{q}_{i,2}^{(l)},\hat{q}_{i,1}^{(l)}\big)+\pi}{2\pi};$$

[0055] here $\operatorname{atan2}(\cdot,\cdot)$ is the four-quadrant arctangent, with output range $[-\pi,\pi]$. Adding $\pi$ and dividing by $2\pi$ gives a normalized circular parameter in $[0,1]$, whose endpoints 0 and 1 represent the same point on the circle. $\alpha_i^{(l)}$ and $\beta_i^{(l)}$ are the circular angles on the $l$-th slice plane of the $i$-th feature vector in the normalized first and second feature sets, respectively.
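Steps 3-1 to 3-3 for one feature vector and one slice, and the batch-level angle set of Step 3-4, can be sketched as follows. The function names are hypothetical. The basis vectors `u1` and `u2` are assumed to be orthonormal, as produced by the orthogonalization operator. Step 3-2 is undefined when a feature is orthogonal to the slice plane. The text does not address this case, so raising an error there is our choice.

```python
import math


def circular_coordinate(x, u1, u2):
    """Normalized circle coordinate of unit vector x on the slice spanned by orthonormal u1, u2."""
    # Step 3-1: two-dimensional coordinates p = U^T x
    p1 = sum(a * b for a, b in zip(u1, x))
    p2 = sum(a * b for a, b in zip(u2, x))
    # Step 3-2: rescale onto the unit circle (impossible if x is orthogonal to the plane)
    r = math.hypot(p1, p2)
    if r == 0.0:
        raise ValueError("feature is orthogonal to the slice plane")
    p1, p2 = p1 / r, p2 / r
    # Step 3-3: four-quadrant arctangent in [-pi, pi], shifted and scaled to [0, 1]
    return (math.atan2(p2, p1) + math.pi) / (2.0 * math.pi)


def slice_angles(features, u1, u2):
    """Angle set of one slice for a batch of unit-norm features (Step 3-4)."""
    return [circular_coordinate(x, u1, u2) for x in features]
```

For the slice spanned by $e_1, e_2$ in $\mathbb{R}^3$, the features $e_1$ and $e_2$ map to 0.5 and 0.75. A feature such as $(0.6, 0, 0.8)$ also maps to 0.5, because only its direction within the plane matters.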

[0056] Step 3-4: Construct the angle sets. Aggregating the samples in the batch gives the first angle set and the second angle set of slice $l$:

[0057] $$\mathcal{S}_X^{(l)}=\{\alpha_i^{(l)}\}_{i=1}^{B},\qquad \mathcal{S}_Y^{(l)}=\{\beta_i^{(l)}\}_{i=1}^{B};$$

[0058] here $\mathcal{S}_X^{(l)}$ and $\mathcal{S}_Y^{(l)}$ are the normalized first and second feature angle sets, respectively, and $B$ is the batch size.

[0059] Step 3-5: Circular optimal transport distance. Use the second-order (2-)Wasserstein distance to compute the distribution discrepancy between the normalized first feature angle set $\mathcal{S}_X^{(l)}$ and the normalized second feature angle set $\mathcal{S}_Y^{(l)}$. The computation has two key steps, sequence sorting and optimal cyclic alignment:

[0060] Sequence sorting: sort both angle sets in ascending order to obtain the sorted sequences, where $\alpha_{(i)}^{(l)}$ and $\beta_{(i)}^{(l)}$ denote the angle values at the $i$-th position after sorting:

[0061] $$\alpha_{(1)}^{(l)}\le\alpha_{(2)}^{(l)}\le\cdots\le\alpha_{(B)}^{(l)};$$

[0062] $$\beta_{(1)}^{(l)}\le\beta_{(2)}^{(l)}\le\cdots\le\beta_{(B)}^{(l)};$$

[0063] Optimal cyclic alignment based on binary search: on the circle $[0,1)$, introduce a cyclic displacement $\tau$. The optimal displacement $\tau^{*}$ is the one that minimizes the mean squared error between the two sorted sequences, i.e., that achieves optimal cyclic alignment. The circular 2-Wasserstein distance of slice $l$ is defined as

[0064] $$\mathrm{CW}_2^2\big(\mathcal{S}_X^{(l)},\mathcal{S}_Y^{(l)}\big)=\min_{\tau\in[0,1)}\frac{1}{B}\sum_{i=1}^{B}\Big(\big[\alpha_{(i)}^{(l)}-\beta_{(i)}^{(l)}-\tau\big]_{\bmod 1}\Big)^{2};$$

[0065] here $\alpha_{(i)}^{(l)}$ and $\beta_{(i)}^{(l)}$ are the angle values at the $i$-th sorted position. $[\cdot]_{\bmod 1}$ reduces its argument modulo 1 so that the result lies in $[-\tfrac{1}{2},\tfrac{1}{2})$. The optimal displacement $\tau^{*}$ can be found by binary search. The distance evaluated at $\tau^{*}$ is taken as the final transport cost of slice $l$, $\mathrm{CW}_2^2\big(\mathcal{S}_X^{(l)},\mathcal{S}_Y^{(l)}\big)$.
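The sketch below computes a squared circular 2-Wasserstein cost between two equal-size angle sets in $[0,1)$. It is a swapped-in, easy-to-verify variant, not the binary search over the cyclic displacement described above. It works as follows:
1. sort both sets;
2. pair them under every cyclic index shift;
3. measure each pair by the wrapped difference in $[-\tfrac12,\tfrac12)$;
4. keep the cheapest shift.

The cost is $O(B^2)$ for batch size $B$. We assume the classical property that, for equal-size, uniformly weighted samples and a convex cost of arc length, some optimal matching on the circle is a cyclic shift of the sorted order. Function names are hypothetical.

```python
import math


def wrap_diff(a, b):
    """Signed circular difference a - b, wrapped into [-1/2, 1/2) on a circle of length 1."""
    t = a - b
    return t - math.floor(t + 0.5)


def circular_w2_sq(alpha, beta):
    """Squared circular W2 between equal-size angle sets with values in [0, 1)."""
    if len(alpha) != len(beta):
        raise ValueError("this sketch assumes equal sample sizes")
    a, b = sorted(alpha), sorted(beta)
    n = len(a)
    best = math.inf
    for k in range(n):  # try every cyclic shift of the sorted second sequence
        cost = sum(wrap_diff(a[i], b[(i + k) % n]) ** 2 for i in range(n)) / n
        best = min(best, cost)
    return best


# Wrap-around matters: 0.02 and 0.98 are only 0.04 apart on the circle.
print(circular_w2_sq([0.02, 0.5], [0.98, 0.52]))
```

A plain (non-circular) sorted matching would pair 0.02 with 0.52 here and report a much larger cost. The cyclic shift instead pairs 0.02 with 0.98 across the seam of the circle.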

[0066] Step 3-6: Aggregate the spherical slice distances. Aggregating the distances of all $L$ slices gives the sliced optimal transport distance based on spherical parameterized projection, $\mathrm{PSSW}_2^2$:

[0067] $$\mathrm{PSSW}_2^2\big(\tilde{X},\tilde{Y};\Theta\big)=\frac{1}{L}\sum_{l=1}^{L}\mathrm{CW}_2^2\big(\mathcal{S}_X^{(l)},\mathcal{S}_Y^{(l)}\big);$$

[0068] here $\tilde{X}$ and $\tilde{Y}$ are the normalized first and second feature sets, $\Theta$ is the set of learnable parameters that generate the parameterized projection network, and $L$ is the total number of slices.

[0069] Step 4: Warm-up initialization, sparse update scheduling, and cache consistency maintenance. This embodiment uses a sparse update scheduling strategy to remove the computational bottleneck caused by high-frequency orthogonalization. Set a warm-up length $T_{w}$ and an update interval $K$. At the current iteration step $t$, the following scheduling logic is executed:

[0070] Step 4-1: Warm-up stage, which first establishes the initial orthonormal bases. Early in training, the features of the backbone network are not yet stable, so the projection directions need not be finely tuned. During this stage, projections are sampled as in spherical sliced optimal transport, i.e., at random, to generate a set of two-dimensional orthonormal bases. These bases are written directly back into the parameter tensor to complete initialization. A cache refresh is then triggered, and the generated orthonormal bases are stored in the cache unit. This stage keeps early training exploratory and helps avoid poor local optima.

[0071] Step 4-2: Sparse reuse stage. When the iteration step $t$ is not divisible by the update interval $K$ (i.e., $t \bmod K\neq 0$), the step is a non-update step and the projection directions do not need updating. The parameterized projection network is held fixed: its gradient is truncated, backpropagation into it is skipped, and the expensive matrix orthogonalization is avoided. The system performs a cache consistency check by comparing version identifiers. It then reuses the two-dimensional orthonormal basis set stored in the cache unit for the subsequent projection and distance computations.

[0072] Step 4-3: Update and project diversity regularization term, when and Can be During division, the optimal discrimination direction is sought; when in the update step, it is determined that the projection direction needs to be refreshed to adapt to the changes in the current feature distribution. At this time, the backbone network parameters are frozen, the parameterized projection network is updated to maximize the optimal transmission distance of the spherical slice, and a projection diversity regularization term is introduced to suppress the overlap of different slice subspaces. for:

[0073] $\mathcal{R}(\mathcal{U})=\sum_{l=1}^{L}\sum_{m=1,\,m\neq l}^{L}\left\|U_l^{\top}U_m\right\|_F^{2}$,

[0074] where $\mathcal{U}=\{U_l\}_{l=1}^{L}$ is the set of two-dimensional orthogonal bases, $L$ is the total number of slices, $\|\cdot\|_F$ is the Frobenius norm, and $U_l$ and $U_m$ are the orthogonal bases of the $l$-th and $m$-th slices. Based on this regularization term, an adversarial update objective is constructed:

[0075] $\mathcal{J}(\Theta)=\mathrm{PSSW}(\tilde{X},\tilde{Y};\Theta)-\lambda\,\mathcal{R}(\mathcal{U})$,

[0076] where $\mathrm{PSSW}(\tilde{X},\tilde{Y};\Theta)$ is the sliced optimal transport distance based on spherical parametric projection, $\mathcal{U}$ is the set of two-dimensional orthogonal bases, and $\lambda$ is the weighting coefficient of the projection diversity regularization term. The regularization term measures the overlap between the projection subspaces of different slices; a larger value indicates stronger overlap, so subtracting $\lambda\,\mathcal{R}(\mathcal{U})$ from the objective encourages projection diversity. Differentiating the adversarial objective yields a gradient ascent update of the parameter tensor $\Theta$; each slice is then retracted by the orthogonalization operator, and a cache refresh is performed so that the cached data always satisfy the orthogonality constraint.
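To make the diversity term concrete, the sketch below assumes the pairwise form $\mathcal{R}(\mathcal{U})=\sum_{l\neq m}\|U_l^{\top}U_m\|_F^{2}$ over ordered pairs (the exact normalization is not recoverable from the text) and evaluates $\mathcal{J}=\mathrm{PSSW}-\lambda\mathcal{R}$ for a PSSW value supplied as a number. Because each $U_l$ has orthonormal columns, $\|U_l^{\top}U_m\|_F^{2}$ equals the sum of squared cosines of the principal angles between the two slice planes, so it ranges from 0 (orthogonal planes) to 2 (identical planes). The gradient ascent and retraction step is omitted because it needs automatic differentiation; all names are hypothetical.

```python
from typing import List

Vector = List[float]
Basis = List[Vector]  # hypothetical layout: [e1, e2], orthonormal columns in R^d


def _dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def overlap_sq(U: Basis, V: Basis) -> float:
    """||U^T V||_F^2: entry (a, b) of U^T V is <u_a, v_b>."""
    return sum(_dot(u, v) ** 2 for u in U for v in V)


def diversity_reg(bases: List[Basis]) -> float:
    """Assumed form of R(U): sum of pairwise overlaps over ordered pairs l != m."""
    n = len(bases)
    return sum(overlap_sq(bases[l], bases[m])
               for l in range(n) for m in range(n) if l != m)


def adversarial_objective(pssw_value: float, bases: List[Basis], lam: float) -> float:
    """J = PSSW - lambda * R, to be maximized w.r.t. the projection parameters."""
    return pssw_value - lam * diversity_reg(bases)
```

Two slices spanning the same plane contribute the maximum overlap, which is exactly the redundancy the subtraction in $\mathcal{J}$ penalizes.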

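[0077] Step 4-4: cache consistency is maintained through a version identification mechanism. The parameter tensor $\Theta$ carries a version identifier $v_{\Theta}$, and the set of two-dimensional orthogonal bases $\mathcal{U}$ carries a cache version identifier $v_{\mathcal{U}}$. When an update is triggered and $v_{\Theta}\neq v_{\mathcal{U}}$ is detected, each slice is retracted by the orthogonalization operator, $U_l=\mathcal{P}(\Theta_l)$, the cache unit is refreshed, and $v_{\mathcal{U}}\leftarrow v_{\Theta}$ is set; when $v_{\Theta}=v_{\mathcal{U}}$, the cache is reused directly. The objects stored in the cache are two-dimensional orthogonal bases satisfying $U_l^{\top}U_l=I_2$, where $I_2$ is the $2\times 2$ identity matrix.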
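The version check of Step 4-4 can be sketched as a small cache object. The class and attribute names are hypothetical, and `retract` stands in for the orthogonalization operator $\mathcal{P}$ (for example, a Gram-Schmidt retraction) passed in as a plain function; the sketch only shows when the expensive retraction runs.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class VersionedBasisCache:
    """Cache of orthogonal bases U_l = P(Theta_l), kept consistent via version counters."""
    retract: Callable[[Any], Any]  # orthogonalization operator P, applied per slice
    theta: List[Any]               # parameter slices Theta_l
    v_theta: int = 0               # version identifier of Theta
    v_cache: int = -1              # version identifier of the cached bases (-1: empty)
    cached: Optional[List[Any]] = None
    refreshes: int = 0             # number of (expensive) retraction passes performed

    def update_theta(self, new_theta: List[Any]) -> None:
        """Called on an update step: install new parameters and bump the version."""
        self.theta = new_theta
        self.v_theta += 1

    def bases(self) -> List[Any]:
        """Return cached bases, re-orthogonalizing only when the versions differ."""
        if self.v_cache != self.v_theta or self.cached is None:
            self.cached = [self.retract(s) for s in self.theta]
            self.v_cache = self.v_theta
            self.refreshes += 1
        return self.cached
```

On reuse steps `bases()` returns the same list object without calling `retract`, which is where the savings of the sparse schedule come from.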

[0078] Step 5: Model parameter update and result output. The computed PSSW distance $\mathrm{PSSW}(\tilde{X},\tilde{Y};\Theta)$ is used as part of the loss function, and the backbone network parameters are updated by gradient backpropagation to achieve the specific downstream task objectives and perform data analysis.

[0079] Step 5-1: Construct the total loss function by introducing the PSSW distance obtained in Step 3-6 into the overall optimization objective $\mathcal{L}_{\text{total}}$;

[0080] In the spherical generative model scenario (Sliced Wasserstein Autoencoder, SWAE), the PSSW distance serves as a regularization term that constrains the latent variable distribution to approximate the prior distribution. The total loss is defined as the weighted sum of the reconstruction loss and the distribution matching loss:

[0081] $\mathcal{L}_{\text{SWAE}}=\mathrm{MSE}(x,\hat{x})+\lambda\,\mathrm{PSSW}\big(E(x),P;\Theta\big)$;

[0082] where $x$ is the original input data and $\hat{x}$ the reconstructed data; the MSE reconstruction loss measures the difference between the reconstructed data and the original input; $P$ is the prior distribution, predefined as the uniform distribution on the unit hypersphere; $\mathrm{PSSW}(\cdot)$ is the PSSW distance with parameterized tensor $\Theta$; $E(\cdot)$ is the encoder network, so that $\mathrm{PSSW}(E(x),P;\Theta)$ serves as the distribution matching regularization term that constrains the latent variable distribution to approximate the prior; and $\lambda$ is the weighting coefficient that balances the reconstruction loss against the distribution matching regularization.
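A minimal sketch of the SWAE loss, assuming the encoder outputs `z` are already normalized onto the unit hypersphere and that MSE averages over all entries. The `dist` argument is a hypothetical placeholder for the PSSW distance (any callable taking two point sets). The uniform spherical prior is sampled by normalizing standard Gaussian vectors, which yields a uniform distribution on $\mathbb{S}^{d-1}$ because the Gaussian is rotation invariant.

```python
import math
import random
from typing import Callable, List

Matrix = List[List[float]]


def sample_uniform_sphere(n: int, d: int, rng: random.Random) -> Matrix:
    """Uniform samples on S^{d-1}: normalize standard Gaussian vectors."""
    out = []
    while len(out) < n:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:  # guard against the (measure-zero) zero vector
            out.append([c / norm for c in g])
    return out


def mse(x: Matrix, x_hat: Matrix) -> float:
    """Mean squared error over all entries."""
    diffs = [(a - b) ** 2 for row, row_hat in zip(x, x_hat) for a, b in zip(row, row_hat)]
    return sum(diffs) / len(diffs)


def swae_loss(x: Matrix, x_hat: Matrix, z: Matrix,
              dist: Callable[[Matrix, Matrix], float],
              lam: float, rng: random.Random) -> float:
    """MSE(x, x_hat) + lam * dist(z, prior sample); `dist` stands in for PSSW."""
    prior = sample_uniform_sphere(len(z), len(z[0]), rng)
    return mse(x, x_hat) + lam * dist(z, prior)
```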

[0083] In the self-supervised learning (SSL) scenario, an MSE alignment term acts as the contrastive alignment loss that narrows the feature distance between views of the same sample, while the PSSW distance acts as a uniformity regularizer. The total loss can be defined as:

[0084] $\mathcal{L}_{\text{SSL}}=\mathrm{MSE}(z_1,z_2)+\lambda\,\mathrm{PSSW}\big(z_1,P;\Theta\big)$

[0085] where $z_1$ and $z_2$ are the representations of the same data extracted by the encoder under two different augmented views and mapped onto the unit hypersphere; the MSE alignment loss shortens the feature distance between views of the same sample; $P$ is the prior distribution, predefined as the uniform distribution on the unit hypersphere; $\Theta$ is the parameterized tensor; the PSSW distance between the spherical features and $P$ acts as a uniformity regularization term that prevents feature collapse; and $\lambda$ is the weighting coefficient that balances the alignment loss against the uniformity regularization.
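A minimal sketch of the SSL objective, assuming the alignment term is the batch mean of squared Euclidean distances between L2-normalized view pairs and that the uniformity term (a PSSW value in this document) is supplied as a number. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def l2_normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def alignment_loss(z1: Matrix, z2: Matrix) -> float:
    """Mean squared Euclidean distance between paired views on the unit sphere."""
    total = 0.0
    for a, b in zip(z1, z2):
        a, b = l2_normalize(a), l2_normalize(b)
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return total / len(z1)


def ssl_loss(z1: Matrix, z2: Matrix, uniformity: float, lam: float) -> float:
    """Alignment term plus lam * uniformity, where `uniformity` would be a PSSW value."""
    return alignment_loss(z1, z2) + lam * uniformity
```

For unit vectors $\|a-b\|_2^{2}=2-2\,a^{\top}b$, so minimizing this alignment term is equivalent to maximizing the cosine similarity between the two views; the uniformity term is what keeps this from collapsing all features to a single point.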

[0086] Based on the proposed Parametric Spherical Sliced Wasserstein (PSSW) method, this invention addresses the training of deep models in computer vision and machine learning that rely on unit-hypersphere representations. Systematic experiments evaluate the geometric consistency and computational efficiency of high-dimensional spherical feature distribution alignment and compare the invention with existing spherical sliced optimal transport methods. The experiments cover two typical application scenarios. The first is spherical generative model training with Sliced Wasserstein Autoencoders (SWAE): image data are taken as input, and the distribution of the encoder's spherical latent representations is matched to a preset reference distribution to improve latent-space distribution consistency and reconstruction stability. The second is self-supervised learning (SSL): unlabeled images and their augmented views are taken as input, and alignment and uniformity regularization are applied to the spherical feature distribution to suppress representation collapse and improve the quality of transferable representations for downstream tasks such as image classification, image retrieval, and recognition. These experiments aim to verify that, without relying on frequent random sampling of projection directions, the invention still achieves stable constraint and efficient alignment of spherical distributions, thereby improving the numerical stability and engineering usability of training and broadening the applicability of the algorithm.

[0087] Referring to Figures 2 to 4, Figures 3 and 4 show the coverage blind spots and structural biases of existing methods when approximating a uniform spherical distribution. Figure 3 corresponds to the spherical sliced Wasserstein (SSW) method, which relies on randomly sampled slice directions and therefore cannot actively seek the directions along which the feature distributions are most discriminative. The visualization shows obvious local clustering of feature points on the sphere, with large empty regions; this uneven distribution means the model fails to make full use of the latent space, which limits feature richness and makes training prone to local optima. Figure 4 corresponds to the stereographic-projection-based spherical sliced Wasserstein (S3W) method, which maps the sphere onto a plane by stereographic projection and thereby introduces unavoidable area distortion and density bias. The visualization shows a distinct banded structure and compression effect in the feature distribution, which fails to reflect the geometric relationships of the data on the original manifold; this geometric distortion weakens the model's discrimination of certain categories and thus lowers downstream classification accuracy. Figure 2 corresponds to the PSSW method of this invention, which actively searches for and optimizes informative slice directions through parametric projection and sparse updating, while measuring distances strictly along spherical geodesics to avoid the distortion introduced by stereographic projection. The visualization shows a highly uniform and continuous feature distribution on the hypersphere, free of banded structures and local compression, achieving full, distortion-free coverage of the latent space and preserving the geometric topology and semantic discriminability of the data.
This distortion-free and discriminative spherical representation indicates that the method captures the essential structure of the data manifold more accurately, thereby improving the model's ability to distinguish features and laying a foundation for higher accuracy in downstream classification tasks.

[0088] As shown in Table 1, this invention performs markedly better in the linear evaluation of features learned in self-supervised tasks. Among existing techniques, the spherical sliced Wasserstein (SSW) method extracts features blindly because it relies on random slices, while the stereographic spherical sliced Wasserstein (S3W) method has limited feature expressiveness because stereographic projection introduces area and density distortion. To address these shortcomings, the PSSW method of this invention replaces random sampling and planar projection with learnable parameterized slices, aligning feature distributions efficiently while preserving the spherical topology. Experimental data show that the invention achieves 79.98% on Acc. E and 75.11% on Acc. P without additional computational burden, demonstrating its effectiveness in reconciling the tension between spherical feature collapse and alignment and in improving the representation quality of self-supervised learning.

[0089] Table 1: Linear evaluation of features in self-supervised learning (SSL) tasks

[0090]

[0091] Table 2 details the quantitative evaluation of latent-space distribution alignment in the Sliced Wasserstein Autoencoder (SWAE) generative task. To verify whether the model effectively prevents mode collapse, the experiment examines the Wasserstein distance between the generated distribution and the prior distribution. The data show that the spherical sliced Wasserstein (SSW) method converges inefficiently because it relies on a large number of random samples to cover the high-dimensional space, while the stereographic spherical sliced Wasserstein (S3W) method, although it uses a deterministic projection, distorts the intrinsic geometry of the sphere and introduces metric errors. In contrast, this invention defines parameterized orthogonal slices directly on the sphere, eliminating projection distortion and achieving more accurate distribution matching. The results show that PSSW outperforms the compared methods on … (-3.4052) and NLL (-0.0048), while its training time (15.4054 s/epoch) is only about 60% of that of SSW, achieving efficient latent-space regularization while preserving geometric fidelity.

[0092] Table 2: Evaluation of latent distribution alignment in SWAE generative tasks

[0093]

[0094] The experimental results show that the parametric spherical sliced Wasserstein (PSSW) method of the present invention outperforms existing methods in both discriminative and generative task scenarios. It resolves the randomness and distortion problems of conventional spherical sliced methods, accurately captures the spherical topology, and has both theoretical significance and application value.

[0095] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A sliced optimal transport method based on spherical parametric projection, characterized in that the method comprises: Step 1, feature extraction and spherical embedding: obtaining training data, inputting the training data into the feature extraction network of a target model to obtain a first feature set and a second feature set, and performing unit spherical normalization on the first and second feature sets so that their feature vectors are constrained to a unit hypersphere, thereby obtaining normalized spherical feature data; Step 2, parametric projection network construction and cache initialization: initializing a parametric projection network with learnable parameters, orthogonalizing the learnable parameters to generate orthogonal projection matrices that satisfy the spherical geometric constraints, and writing the orthogonal projection matrices into a cache unit to complete cache initialization; Step 3, subspace projection mapping and circular transport aggregation: reading the orthogonal projection matrices from the cache unit, projecting the spherical feature data onto the two-dimensional subspaces spanned by the orthogonal projection matrices and mapping them to circumferential angle coordinates, and computing, based on the circumferential angle coordinates, the per-slice optimal transport distances between the first and second feature sets and aggregating them to obtain the sliced optimal transport distance based on spherical parametric projection; Step 4, sparse update scheduling and cache consistency maintenance: executing a sparse update scheduling strategy in which, during training, the parameters of the parametric projection network are updated at a preset step interval and the cache unit is refreshed after each parameter update, while at steps where no update is triggered the parameters remain unchanged and the orthogonal projection matrices in the cache unit are reused for calculation;
Step 5, model parameter update and result output: using the sliced optimal transport distance based on spherical parametric projection as part of a total loss function, performing backpropagation to update the target model parameters, and outputting the updated target model parameters and the trained target model to achieve the downstream task objectives and perform data analysis.

2. The method according to claim 1, characterized in that Step 1, feature extraction and spherical normalization, comprises: Step 1-1: obtaining a first feature set $X\in\mathbb{R}^{B\times d}$ and a second feature set $Y\in\mathbb{R}^{B\times d}$, wherein the training data comprise input training sample data and reference data; the input training sample data are input into the feature extraction network of the target model to obtain the first feature set $X$; the second feature set $Y$ is obtained in one of the following ways: first, by inputting a second dataset into the feature extraction network, wherein the second dataset is target-domain sample data or augmented-view data originating from the same source as the first dataset; second, by generating reference vector data with a reference generation module based on a preset reference distribution; wherein $B$ is the batch size and $d$ is the feature dimension, where …; Step 1-2: performing unit spherical normalization on the first feature set $X$ and the second feature set $Y$ so that they lie on the unit hypersphere: $\tilde{x}_i=x_i/\|x_i\|_2$, $\tilde{y}_j=y_j/\|y_j\|_2$, giving $\tilde{X},\tilde{Y}\subset\mathbb{S}^{d-1}$; wherein $\tilde{x}_i$ and $\tilde{y}_j$ denote the feature vectors $x_i$ and $y_j$ after unit spherical normalization, $\|\cdot\|_2$ denotes the Euclidean norm, $\tilde{X}$ and $\tilde{Y}$ are the normalized first and second feature sets, and $\mathbb{S}^{d-1}$ denotes the unit hypersphere in $\mathbb{R}^{d}$.

3. The method according to claim 2, characterized in that Step 2, parametric projection construction and cache initialization, comprises: Step 2-1: constructing the parameterized projection tensor $\Theta=\{\Theta_l\}_{l=1}^{L}$, each slice of which is $\Theta_l\in\mathbb{R}^{d\times 2}$, $l=1,\dots,L$, where $L$ denotes the total number of slices; Step 2-2: cache initialization: applying the two-dimensional orthogonalization operator $\mathcal{P}$ to each slice $\Theta_l$ to obtain a two-dimensional orthogonal basis $U_l=\mathcal{P}(\Theta_l)\in\mathbb{R}^{d\times 2}$ satisfying $U_l^{\top}U_l=I_2$, where $(\cdot)^{\top}$ denotes the transpose operator and $I_2$ is the $2\times 2$ identity matrix; and writing the set of two-dimensional orthogonal bases of all slices $\mathcal{U}=\{U_l\}_{l=1}^{L}$ into the cache unit as the object reused for projection calculation in subsequent iterations; Step 2-3: the orthogonalization operator $\mathcal{P}$ employs Gram-Schmidt retraction: each slice $\Theta_l=[a_1,a_2]$ is successively normalized and orthogonalized, $e_1=a_1/\|a_1\|_2$, $e_2=\dfrac{a_2-(e_1^{\top}a_2)\,e_1}{\|a_2-(e_1^{\top}a_2)\,e_1\|_2}$, giving the two-dimensional orthogonal basis $U_l=[e_1,e_2]$ satisfying $U_l^{\top}U_l=I_2$; wherein $a_1$ and $a_2$ are respectively the first and second column vectors of $\Theta_l$, $e_1$ is the first basis vector obtained by normalizing $a_1$, $e_2$ is the second basis vector obtained by orthogonalizing $a_2$ against $e_1$ and normalizing, $(\cdot)^{\top}$ denotes the transpose operator, and $I_2$ is the $2\times 2$ identity matrix.

4. The method according to claim 3, characterized in that Step 3, subspace projection mapping and circular transport aggregation, comprises: Step 3-1, two-dimensional subspace projection: projecting the spherical feature data onto the two-dimensional subspace spanned by the orthogonal basis $U_l$; for any $i$, the normalized feature vectors $\tilde{x}_i$ and $\tilde{y}_i$ are projected as $p_i^{(l)}=U_l^{\top}\tilde{x}_i$ and $q_i^{(l)}=U_l^{\top}\tilde{y}_i$, where $p_i^{(l)}$ and $q_i^{(l)}$ are the coordinates, on the $l$-th two-dimensional slice plane, of the feature vectors in the normalized first and second feature sets respectively; Step 3-2, normalization to the unit circle: the projected coordinates are normalized by the Euclidean norm, $\hat{p}_i^{(l)}=p_i^{(l)}/\|p_i^{(l)}\|_2$ and $\hat{q}_i^{(l)}=q_i^{(l)}/\|q_i^{(l)}\|_2$, where $\hat{p}_i^{(l)}$ and $\hat{q}_i^{(l)}$ are the corresponding unit-circle coordinates on the $l$-th slice plane; Step 3-3, mapping to circumferential angles: the unit-circle coordinates are mapped to circumferential angles that are linearly normalized to the unit interval; writing $\hat{p}_{i,1}^{(l)},\hat{p}_{i,2}^{(l)}$ and $\hat{q}_{i,1}^{(l)},\hat{q}_{i,2}^{(l)}$ for the first and second components, the normalized angles are $\theta_i^{(l)}=\dfrac{\operatorname{atan2}(\hat{p}_{i,2}^{(l)},\hat{p}_{i,1}^{(l)})+\pi}{2\pi}$ and $\phi_i^{(l)}=\dfrac{\operatorname{atan2}(\hat{q}_{i,2}^{(l)},\hat{q}_{i,1}^{(l)})+\pi}{2\pi}$, where $\operatorname{atan2}$ is the four-quadrant arctangent with output range $[-\pi,\pi]$, so that adding $\pi$ and dividing by $2\pi$ yields normalized circumferential parameters in $[0,1]$, and $\theta_i^{(l)}$ and $\phi_i^{(l)}$ are the circumferential angles, on the $l$-th slice plane, of the feature vectors in the normalized first and second feature sets respectively; Step 3-4, constructing the angle sets: the samples within the batch are aggregated to obtain the angle sets of the $l$-th slice,
$\mathcal{T}_X^{(l)}=\{\theta_i^{(l)}\}_{i=1}^{B}$ and $\mathcal{T}_Y^{(l)}=\{\phi_i^{(l)}\}_{i=1}^{B}$, where $\mathcal{T}_X^{(l)}$ and $\mathcal{T}_Y^{(l)}$ are the normalized first and second feature angle sets and $B$ is the batch size; Step 3-5, computing the optimal circular transport distance: based on the second-order Wasserstein distance, the distribution difference between $\mathcal{T}_X^{(l)}$ and $\mathcal{T}_Y^{(l)}$ is computed in two key steps, sequence sorting and optimal cyclic alignment. Sequence sorting: both angle sets are sorted in ascending order to obtain $\theta_{(1)}^{(l)}\le\dots\le\theta_{(B)}^{(l)}$ and $\phi_{(1)}^{(l)}\le\dots\le\phi_{(B)}^{(l)}$, where $\theta_{(i)}^{(l)}$ and $\phi_{(i)}^{(l)}$ denote the angle values at the $i$-th position after sorting. Optimal cyclic alignment based on binary search: a cyclic displacement $\alpha\in[0,1)$ is introduced on the circle, and the optimal displacement $\alpha^{*}$ that minimizes the mean squared error between the two sorted sequences is sought; the circular second-order Wasserstein distance of the $l$-th slice is defined as $W_2^{2}\big(\mathcal{T}_X^{(l)},\mathcal{T}_Y^{(l)}\big)=\min_{\alpha}\dfrac{1}{B}\sum_{i=1}^{B}\Big(\big[\theta_{(i)}^{(l)}-\phi_{(i)}^{(l)}-\alpha\big]_{1}\Big)^{2}$, where $[\cdot]_{1}$ denotes taking the value modulo 1 so that the result lies within the interval $[-1/2,1/2)$; the optimal displacement $\alpha^{*}$ is obtained by binary search, and the distance computed at $\alpha^{*}$ is taken as the final transport cost $d_l$ of the $l$-th slice; Step 3-6, aggregating spherical slice distances: the distances of all slices are aggregated to obtain the sliced optimal transport distance based on spherical parametric projection, $\mathrm{PSSW}(\tilde{X},\tilde{Y};\Theta)=\dfrac{1}{L}\sum_{l=1}^{L}d_l$, where $\tilde{X}$ and $\tilde{Y}$ are the normalized first and second feature sets, $\Theta$ denotes the set of learnable parameters of the parametric projection network, and $L$ is the total number of slices.
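The sketch below strings Steps 3-1 to 3-6 together in plain Python for a small batch. Two choices go beyond the text and are assumptions: the per-slice cost is computed exactly by checking all $B$ cyclic shifts of the sorted angle sequences (an $O(B^2)$ substitute for the binary search over the displacement $\alpha$ described in Step 3-5, relying on the known result that, for equal-size samples and a convex cost such as the squared geodesic distance, an optimal circular matching can be taken as a cyclic shift of the sorted order), and the slice costs are averaged. Bases are given as pairs of orthonormal columns (a hypothetical layout), and a feature orthogonal to a slice plane is mapped to an arbitrary angle rather than handled specially.

```python
import math
from typing import List, Sequence

Vector = List[float]
Basis = List[Vector]  # hypothetical layout: [e1, e2], orthonormal columns in R^d


def normalize(v: Sequence[float]) -> Vector:
    """Step 1-2: project a feature vector onto the unit hypersphere."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def slice_angle(x: Sequence[float], basis: Basis) -> float:
    """Steps 3-1 to 3-3: coordinates in span(e1, e2), then angle mapped into [0, 1]."""
    e1, e2 = basis
    p1 = sum(a * b for a, b in zip(x, e1))
    p2 = sum(a * b for a, b in zip(x, e2))
    # atan2 ignores positive scaling, so the unit-circle normalization of Step 3-2
    # does not change the result and is skipped here.
    return (math.atan2(p2, p1) + math.pi) / (2.0 * math.pi)


def circular_dist(a: float, b: float) -> float:
    """Geodesic distance on the circle [0, 1) with 0 and 1 identified."""
    diff = abs(a - b) % 1.0
    return min(diff, 1.0 - diff)


def circular_w2_sq(theta: Sequence[float], phi: Sequence[float]) -> float:
    """Step 3-5 (exact variant): squared circular W2 for equal-size samples."""
    if len(theta) != len(phi) or not theta:
        raise ValueError("expects two non-empty samples of equal size")
    a, b = sorted(theta), sorted(phi)
    n = len(a)
    best = math.inf
    for k in range(n):  # every cyclic shift of the sorted order
        cost = sum(circular_dist(a[i], b[(i + k) % n]) ** 2 for i in range(n)) / n
        best = min(best, cost)
    return best


def pssw(X: Sequence[Sequence[float]], Y: Sequence[Sequence[float]],
         bases: Sequence[Basis]) -> float:
    """Step 3-6: mean of the per-slice circular costs (mean aggregation assumed)."""
    xs = [normalize(x) for x in X]
    ys = [normalize(y) for y in Y]
    total = 0.0
    for basis in bases:
        theta = [slice_angle(x, basis) for x in xs]
        phi = [slice_angle(y, basis) for y in ys]
        total += circular_w2_sq(theta, phi)
    return total / len(bases)
```

For instance, with a single slice spanning the first two coordinate axes, the features $(2,0,0)$ and $(0,1,0)$ land a quarter turn apart, so the squared circular cost is $0.25^2=0.0625$.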

5. The method according to claim 4, characterized in that Step 4, warm-up initialization, sparse update scheduling, and cache consistency maintenance, comprises: Step 4-1, warm-up initialization: setting a warm-up length $T_w$; when the iteration step count $t<T_w$, performing random projection sampling as in spherical sliced optimal transport to generate orthogonal projection matrices, writing them back to the parameterized projection network to complete initialization, and refreshing the cache unit; Step 4-2, sparse reuse determination: setting an update interval $K$; when $t\ge T_w$ and $t$ is not divisible by $K$, keeping the parameters of the parameterized projection network unchanged and reusing the orthogonal projection matrices stored in the cache unit to perform the calculation described in Step 3; Step 4-3, update and projection diversity regularization: when $t\ge T_w$ and $t$ is divisible by $K$, updating the parameters of the parametric projection network by maximizing the objective function $\mathcal{J}(\Theta)$, that is, performing a gradient update on the parameter tensor $\Theta$, retracting with the orthogonalization operator, and refreshing the cache unit, the objective being defined as $\mathcal{J}(\Theta)=\mathrm{PSSW}(\tilde{X},\tilde{Y};\Theta)-\lambda\,\mathcal{R}(\mathcal{U})$; wherein $\mathrm{PSSW}(\tilde{X},\tilde{Y};\Theta)$ is the sliced optimal transport distance based on spherical parametric projection, $\mathcal{U}$ is the set of two-dimensional orthogonal bases, $\lambda$ is the weighting coefficient of the projection diversity regularization term, and the regularization term $\mathcal{R}(\mathcal{U})$ measures the overlap between the projection subspaces of different slices, a larger value indicating stronger overlap, so that subtracting $\lambda\,\mathcal{R}(\mathcal{U})$ from the objective function
encourages projection diversity; Step 4-4, cache consistency maintenance through a version identification mechanism: the parameter tensor $\Theta$ carries a version identifier $v_{\Theta}$ and the set of two-dimensional orthogonal bases $\mathcal{U}$ carries a cache version identifier $v_{\mathcal{U}}$; when an update is triggered and $v_{\Theta}\neq v_{\mathcal{U}}$ is detected, each slice is retracted by the orthogonalization operator, $U_l=\mathcal{P}(\Theta_l)$, the cache unit is refreshed, and $v_{\mathcal{U}}\leftarrow v_{\Theta}$ is set; when $v_{\Theta}=v_{\mathcal{U}}$, the cache is reused directly; the objects stored in the cache are two-dimensional orthogonal bases satisfying $U_l^{\top}U_l=I_2$, where $I_2$ is the $2\times 2$ identity matrix.

6. The method according to claim 5, characterized in that, in Step 4-3, when the current step is determined to be an update step, a projection diversity regularization term is introduced to suppress overlap between the subspaces of different slices, the regularization term being $\mathcal{R}(\mathcal{U})=\sum_{l=1}^{L}\sum_{m=1,\,m\neq l}^{L}\left\|U_l^{\top}U_m\right\|_F^{2}$, where $L$ is the total number of slices, $\|\cdot\|_F$ is the Frobenius norm, and $U_l$ and $U_m$ are the orthogonal bases of the $l$-th and $m$-th slices.

7. A sliced optimal transport system based on spherical parametric projection, characterized in that the system is used to implement the method of any one of claims 1 to 6 and comprises: a feature extraction and spherical embedding module, which acquires training data, inputs the training data into the feature extraction network of the target model to obtain a first feature set and a second feature set, and performs unit spherical normalization on the first and second feature sets so that their feature vectors are constrained to a unit hypersphere, thereby obtaining normalized spherical feature data; a parametric projection network construction and cache initialization module, which initializes the parametric projection network with learnable parameters, orthogonalizes the learnable parameters to generate orthogonal projection matrices that satisfy the spherical geometric constraints, and writes the orthogonal projection matrices into the cache unit to complete cache initialization; a subspace projection mapping and circular transport aggregation module, which reads the orthogonal projection matrices from the cache unit, projects the spherical feature data onto the two-dimensional subspaces spanned by the orthogonal projection matrices, maps them to circumferential angle coordinates, and, based on these coordinates, computes and aggregates the per-slice optimal transport distances between the first and second feature sets to obtain the sliced optimal transport distance based on spherical parametric projection; a sparse update scheduling and cache consistency maintenance module, which executes the sparse update scheduling strategy, updating the parametric projection network parameters at a preset step interval during training and refreshing the cache unit after each parameter update,
while at steps where no update is triggered the parameters remain unchanged and the orthogonal projection matrices in the cache unit are reused for calculation; and a model parameter update and result output module, which takes the sliced optimal transport distance based on spherical parametric projection as part of the total loss function, performs backpropagation to update the target model parameters, and outputs the updated target model parameters and the trained target model to achieve the downstream task objectives and perform data analysis.

8. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, the steps of the method according to any one of claims 1 to 6 are implemented.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1 to 6 are implemented.

10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 6 are implemented.