A topology-aware hierarchical index construction and load balancing method for multimedia content retrieval
Constructing a topology-aware hierarchical index and applying a load balancing method resolves the bottleneck of geometric metric partitioning in large-scale multimedia data processing, enabling efficient and stable multimedia similarity search with improved query performance and recall.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TAIYUAN UNIVERSITY OF TECHNOLOGY
- Filing Date
- 2026-04-03
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies rely on geometric metric partitioning when processing large-scale multimedia data, resulting in high construction costs, a sharp increase in memory consumption, query performance bottlenecks, and poor performance when dealing with the complex semantic distribution of multimedia data.
A topology-aware hierarchical indexing method is adopted, which constructs a global nearest neighbor topology graph, multi-level graph partitioning, and parallel HNSW sub-indexes. Combined with lightweight gated neural networks and soft-label supervised training, it achieves high recall, low latency, and high scalability for multimedia similarity search.
The method significantly improves recall, eliminates long-tail latency, maximizes multi-core CPU utilization, reduces query latency and build time, and improves the model's generalization ability and robustness.
Figure CN122309766A_ABST
Description
Technical Field
[0001] This invention relates to the field of multimedia content retrieval, specifically to a topology-aware hierarchical index construction and load balancing method for multimedia content retrieval.
Background Technology
[0002] With the rapid development of mobile internet and artificial intelligence technologies, multimedia content such as images, videos, and audio is experiencing explosive growth in various application scenarios. To achieve efficient management and retrieval of massive amounts of multimedia content, the industry commonly employs deep learning models to extract features from multimedia data, transforming them into high-dimensional feature vectors, and then calculating the similarity between these vectors to complete the retrieval task. Therefore, Approximate Nearest Neighbor Search (ANNS) technology has become the core computing engine of multimedia content retrieval systems.
[0003] Current mainstream ANNS indexing technologies mainly include spatial partitioning, hashing, product quantization, and graph-based index types. Among them, graph indexes, represented by HNSW, are widely used due to their excellent performance. However, with the surge in the scale of multimedia data (billions or even tens of billions) and the increase in vector dimensions, the construction cost and memory consumption of a single graph index have risen sharply, and query performance has also reached a bottleneck.
[0004] To address performance issues in large-scale scenarios, existing technologies have proposed two-stage hybrid indexing schemes. These schemes (such as HVS, ScaNN, SPANN, and Elpis) typically utilize spatial partitioning strategies like tree structures and clustering (such as K-means) to divide the vast multimedia data space into multiple clusters, and then build a local index within each cluster. During the query phase, the system first calculates the distance between the query vector and the centroids of each cluster, selects the closest candidate clusters, and then searches for the final result in the local indexes of these candidate clusters.
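The centroid-based routing step of such two-stage schemes can be sketched as follows. This is a minimal illustration, not code from any of the named systems; the function name `nearest_centroids` and the parameter `n_probe` are hypothetical. It shows that routing requires one distance computation per cluster centroid.

```python
import math

def nearest_centroids(query, centroids, n_probe):
    """Return the indices of the n_probe centroids closest to the query (Euclidean)."""
    # One distance computation per centroid, so routing cost grows with the cluster count.
    dists = sorted((math.dist(query, c), i) for i, c in enumerate(centroids))
    return [i for _, i in dists[:n_probe]]

# Toy example with three cluster centroids in 2-D (illustrative values only).
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
candidates = nearest_centroids((4.0, 1.0), centroids, n_probe=2)
```

The local indexes of the returned candidate clusters would then be searched for the final result.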
[0005] In addition, existing related technologies include classic high-dimensional indexes (such as LiteHST based on spatial partitioning, LIDER based on hashing, and RaBitQ based on product quantization), hybrid indexes (such as DIDS and SHG), and learning indexes (such as ML-index and Piecewise SFCs). However, when dealing with the complex semantic distribution of multimedia data, most of these methods still have not broken through the core logic of traditional geometric metric partitioning.
Summary of the Invention
[0006] To address the problem that existing technologies still rely on geometric metric partitioning paradigms when processing large-scale multimedia data, this invention proposes a topology-aware hierarchical index construction and load balancing method for multimedia content retrieval. This method abandons the traditional partitioning paradigm that relies solely on geometric metrics, and through an innovative index construction process and query mechanism, with topology awareness and load balancing as its core, it achieves high recall, low latency, and high scalability in multimedia similarity search.
[0007] This invention is implemented using the following technical solution: a topology-aware hierarchical index construction and load balancing method for multimedia content retrieval, comprising two core processes: index construction and querying. Index construction includes three stages: topology graph construction, METIS-based multi-level graph partitioning, and parallel HNSW sub-index construction. The topology graph construction stage builds a global nearest neighbor topology graph $G$ that captures the local topological relationships between data points of the multimedia content to be retrieved; $G$ serves as the basic input for the subsequent clustering algorithm. The METIS-based multi-level graph partitioning stage uses the METIS algorithm to partition the topology graph $G$ into $K$ non-overlapping clusters $C_1, \dots, C_K$. The parallel HNSW sub-index construction stage constructs an independent HNSW subgraph $H_i$ in each cluster $C_i$, executing the sub-index construction for all clusters simultaneously and ultimately obtaining the sub-index set $\{H_1, \dots, H_K\}$. The query process comprises three stages: feature modulation based on a gating mechanism, cluster association probability inference, and probability-driven parallel search and fusion. The feature modulation stage introduces a gated multilayer perceptron to enhance the features of the query vector $q$ of the multimedia content to be retrieved; the cluster association probability inference stage transforms the enhanced query vector into a cluster association probability distribution $P$; and the probability-driven parallel search and fusion stage uses $P$ to execute candidate cluster selection, parallel search, and result fusion.
[0008] The above-described method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval includes the following specific process in the topology graph construction phase:
[0009] (1) Build an HNSW index for the high-dimensional dataset $D$ of the multimedia content to be retrieved;
[0010] (2) For each data point in the high-dimensional dataset $D$, execute a $k$-nearest-neighbor search on the HNSW index to extract its set of approximate nearest neighbors, then connect the data point to each data point in that set to form edges;
[0011] (3) Further remove self-loop edges and isolated data points during the construction of the topology graph, and retain only high-confidence connections induced by approximate nearest neighbor relationships;
[0012] (4) Finally, obtain the global nearest neighbor topology graph $G = (V, E)$, where $V$ is the set of all data points in the high-dimensional dataset $D$ and $E$ is the edge set that captures the nearest neighbor relationships between data points.
[0013] In the above-mentioned method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval, the METIS-based multi-level graph partitioning stage proceeds as follows:
[0014] (1) In the coarsening stage, the Heavy Edge Matching (HEM) strategy is applied to the topology graph $G$, merging adjacent data points to construct a series of progressively coarser graphs of decreasing size;
[0015] (2) When the graph is coarsened to a sufficiently small scale, the algorithm performs K-way greedy graph growth on the coarsest-grained graph to generate the initial partition;
[0016] (3) Project the initial partition back to the topology graph $G$ layer by layer, performing local refinement after each projection, ultimately yielding the set of clusters $\{C_1, \dots, C_K\}$ into which $G$ is partitioned;
[0017] The METIS-based multi-level graph partitioning stage simultaneously satisfies both minimum edge cut and strict load balancing constraints:
[0018] Minimum edge cut objective: minimize the sum of the weights of edges that cross different clusters, i.e. $\min \sum_{(u,v) \in E} w(u,v) \cdot \mathbb{1}[c(u) \neq c(v)]$, where $(u,v) \in E$ is an edge in the topology graph $G = (V, E)$, $w(u,v)$ is the weight of that edge, $\mathbb{1}[\cdot]$ is an indicator function that takes the value 1 when the condition is true and 0 otherwise, $c(u)$ is the number of the cluster to which data point $u$ belongs, and $c(v)$ is the number of the cluster to which data point $v$ belongs;
[0019] Strict load balancing constraint: $|C_i| \le (1 + \epsilon)\,\frac{|V|}{K}$, where $\epsilon$ is the allowed imbalance factor, $|V|$ is the total number of nodes in the topology graph $G$, $K$ is the number of clusters, and $|C_i|$ is the number of data points in the $i$-th cluster $C_i$.
[0020] The aforementioned method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval includes the following steps in the parallel HNSW sub-index construction stage:
[0021] (1) Based on the multi-level graph partitioning results, construct in each cluster $C_i$ an independent HNSW subgraph $H_i$ that contains only the data points within that cluster and their local topological connections;
[0022] (2) Since the data between clusters does not overlap and the load is balanced, parallel working units are started to build the $K$ sub-indexes synchronously, yielding the final sub-index set $\{H_1, \dots, H_K\}$.
[0023] In the above-mentioned method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval, the feature modulation stage based on a gating mechanism proceeds as follows:
[0024] (1) Feature extraction: the query vector $q$ is fed into the feature branch of the gated neural network, which extracts its high-dimensional semantic features through a fully connected network;
[0025] (2) Gating mask generation: in parallel, the query vector $q$ is fed into the gating branch of the gated neural network, which generates a feature selection mask through a fully connected layer and a sigmoid activation function;
[0026] (3) Feature fusion: the high-dimensional semantic features are multiplied element-wise with the feature selection mask, which nonlinearly modulates the query vector $q$ and enhances the model's sensitivity to cluster boundaries.
[0027] In the above-mentioned method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval, the cluster association probability inference stage proceeds as follows:
[0028] (1) The nonlinearly modulated query vector $\tilde{q}$ is mapped through the linear output layer to a $K$-dimensional vector;
[0029] (2) The Softmax function then transforms the $K$-dimensional vector into the cluster association probability distribution $P = (p_1, \dots, p_K)$, where $p_i$ is the probability that the query vector $q$ is topologically associated with the HNSW subgraph $H_i$.
[0030] In the above-mentioned method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval, the probability-driven parallel search and fusion stage proceeds as follows:
[0031] (1) Candidate cluster selection: based on the cluster association probability distribution $P$, select the $m$ clusters with the highest probability values as candidate clusters;
[0032] (2) Parallel local search: start $m$ parallel working units, which perform local kNN searches in parallel in the HNSW sub-indexes corresponding to the $m$ selected candidate clusters;
[0033] (3) Result fusion: merge the search results returned by all sub-indexes, remove duplicates, perform a global sort based on the original distance, and keep the final top-$k$ results.
[0034] The above-mentioned method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval proposes a nearest-neighbor-aware soft-label supervised training strategy to train the gated neural network model. The specific implementation steps are as follows:
[0035] Step S1: Construction of the topology-sensitive training set: the data points $x$ in the high-dimensional dataset $D$ are used as training input;
[0036] Step S2: Generating soft labels based on neighborhood statistics:
[0037] (1) Neighborhood sampling: for each data point $x$, search for its $k$ true nearest neighbor data points, which form the local neighborhood set $N(x)$ of $x$;
[0038] (2) Distribution statistics: count the number $n_j$ of data points in the local neighborhood set $N(x)$ that belong to each cluster $C_j$;
[0039] (3) Soft label construction: compute the ratio $y_j = n_j / |N(x)|$ to generate a $K$-dimensional probability distribution vector $Y = (y_1, \dots, y_K)$. This distribution characterizes the proportion of the local nearest neighbor set of data point $x$ that falls in each cluster and is regarded as a soft-label representation of its true neighborhood structure at the cluster level;
[0040] Step S3: Distribution alignment training based on KL divergence:
[0041] (1) Constructing the loss function: KL divergence is used as the loss function to compute the information-difference loss $\mathcal{L}_{KL}$ between the cluster association probability distribution $P$ and the true soft-label distribution $Y$;
[0042] (2) Minimize the loss function: By minimizing this loss, the gated neural network model is forced to learn the high-dimensional manifold structure of the data rather than a simple geometric partition;
[0043] Step S4: Adaptive optimization iteration: The Adam optimizer is used to update the model parameters end-to-end; in order to further improve the convergence stability and generalization ability of the model, a cosine annealing learning rate scheduling strategy is introduced to dynamically adjust the learning rate during training to ensure that the model can escape local optima and finally obtain a robust model that can accurately map high-dimensional semantic features to cluster association probability distribution.
[0044] Compared with the prior art, the present invention has the following advantages:
[0045] (1) By using the global nearest neighbor graph and minimum edge cut partitioning, this scheme explicitly preserves the local connectivity of the data; combined with the gated neural network trained based on "soft label", the model can accurately identify the query point located at the cluster boundary and associate it with multiple related neighboring clusters with a high probability, thereby effectively avoiding "missed queries" and significantly improving the recall rate.
[0046] (2) Strict cardinality constraints are imposed during the graph partitioning phase so that the data volumes of all $K$ clusters are almost exactly equal. The load on each thread is therefore highly balanced during parallel building and parallel querying, which eliminates long-tail latency, maximizes the utilization of multi-core CPUs, and significantly improves QPS (queries per second) and build efficiency.
[0047] (3) The pre-trained gated neural network is used for inference. No matter how the number of clusters increases, the candidate cluster selection only requires one forward propagation, and the time complexity is always O(1). At the same time, since the gating mechanism filters out redundant features, the network is more lightweight and faster than traditional deep learning models.
[0048] (4) The soft label supervision mechanism enables the model to learn the probability distribution characteristics of the neighborhood, giving it stronger generalization ability. Even if the query vector is noisy or lies in a complex non-convex cluster boundary region, the model can still give reasonable candidate cluster recommendations based on the probability distribution. Compared with deterministic decisions based on hard labels, this significantly reduces the probability of misjudgment in boundary scenarios and effectively improves the stability and robustness of the retrieval results.
Attached Figure Description
[0049] Figure 1 is a flowchart illustrating the logic of the TALBI index building phase.
[0050] Figure 2 is a flowchart illustrating the logic of the TALBI online query phase.
[0051] Figure 3 is a diagram of the index structure.
[0052] Figure 4 is a schematic diagram of model training and querying.
Detailed Implementation
[0053] A topology-aware hierarchical index construction and load balancing method for multimedia content retrieval specifically includes two core processes, index construction and querying, which are explained in detail below with reference to Figure 1 (index construction flowchart) and Figure 2 (query flowchart):
[0054] (I) Index Building Process (see Figures 1 and 3)
[0055] Index building is an offline process, consisting of three stages: topology graph construction, METIS-based multi-level graph partitioning, and parallel HNSW sub-index construction. Each step is executed sequentially and is closely linked.
[0056] Step S1: Construction of global topology graph based on HNSW acceleration
[0057] This phase aims to construct a global nearest neighbor topology graph that can perceive the local topological relationships between data points. This serves as the basic input for subsequent clustering algorithms. The specific process is as follows:
[0058] (1) To mine the latent local manifold structure of the high-dimensional data, an HNSW (Hierarchical Navigable Small World) index is first built for the high-dimensional dataset $D$ of the multimedia content to be retrieved; building this index has approximately $O(N \log N)$ complexity for $N$ data points.
[0059] (2) For each data point in the high-dimensional dataset $D$, a $k$-nearest-neighbor search is executed on the HNSW index to efficiently extract its set of approximate nearest neighbors, and the data point is then connected to each data point in that set to form edges.
[0060] (3) To ensure the effectiveness of the topology, self-loop edges and isolated data points are further removed during the construction process, and only high-confidence connections induced by approximate nearest neighbor relationships are retained.
[0061] (4) Finally, the global nearest neighbor topology graph $G = (V, E)$ is obtained, where $V$ is the set of all data points in the high-dimensional dataset $D$ and $E$ is the edge set that captures the nearest neighbor relationships between data points.
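The graph construction steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: an exact brute-force neighbor search stands in for the HNSW approximate search so that the example has no dependencies, and the function name `build_knn_graph` is hypothetical.

```python
import math

def build_knn_graph(points, k):
    """Build an undirected k-nearest-neighbor graph as (vertices, edges), edges stored as (u, v) with u < v.

    Assumption: exact brute-force search replaces the HNSW approximate
    search described in the text, purely to keep the sketch dependency-free.
    """
    edges = set()
    for u, p in enumerate(points):
        # Rank every other point by distance; excluding u itself prevents self-loops.
        ranked = sorted((math.dist(p, q), v) for v, q in enumerate(points) if v != u)
        for _, v in ranked[:k]:
            edges.add((min(u, v), max(u, v)))
    # Keep only vertices that appear in at least one edge (drops isolated points).
    vertices = {u for edge in edges for u in edge}
    return vertices, edges

# Toy dataset: two well-separated groups of 2-D points (illustrative values only).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
V, E = build_knn_graph(points, k=1)
```

With `k=1` the two groups stay disconnected from each other, which is the kind of local structure a later edge-cut partitioner can exploit.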
[0062] Step S2: Graph Partitioning Based on Minimum Edge Cut and Strict Cardinality Constraints
[0063] This stage uses the METIS algorithm to partition the topology graph $G = (V, E)$ into $K$ non-overlapping clusters $C_1, \dots, C_K$ while simultaneously satisfying the minimum edge cut objective and the strict load balancing constraint:
[0064] Constraint 1 (Minimum Edge Cut Objective): minimize the sum of the weights of edges that cross different clusters, i.e. $\min \sum_{(u,v) \in E} w(u,v) \cdot \mathbb{1}[c(u) \neq c(v)]$, where $(u,v)$ is an edge in the topology graph $G$, $w(u,v)$ is the weight of that edge, $\mathbb{1}[\cdot]$ is an indicator function that takes the value 1 when the condition is true and 0 otherwise, $c(u)$ is the number of the cluster to which data point $u$ belongs, and $c(v)$ is the number of the cluster to which data point $v$ belongs.
[0065] Constraint 2 (Strict Load Balancing Constraint): $|C_i| \le (1 + \epsilon)\,\frac{|V|}{K}$, where $\epsilon$ is the allowed imbalance factor, $|V|$ is the total number of nodes in the topology graph $G$, and $|C_i|$ is the number of nodes in the $i$-th cluster $C_i$.
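Both constraints are easy to evaluate for a given partition. The sketch below is illustrative only: the helper names are hypothetical, and the balance bound is assumed to take the form "cluster size at most (1 + epsilon) times the average cluster size", which is the common convention for an imbalance factor.

```python
def edge_cut(edges, weights, assignment):
    """Sum the weights of edges whose endpoints fall in different clusters."""
    return sum(w for (u, v), w in zip(edges, weights) if assignment[u] != assignment[v])

def is_balanced(assignment, num_clusters, epsilon):
    """Check that every cluster holds at most (1 + epsilon) * |V| / K nodes (assumed bound)."""
    sizes = [0] * num_clusters
    for cluster in assignment.values():
        sizes[cluster] += 1
    limit = (1 + epsilon) * len(assignment) / num_clusters
    return all(size <= limit for size in sizes)

# A path graph 0-1-2-3 split into two clusters of two nodes each.
edges = [(0, 1), (1, 2), (2, 3)]
weights = [1.0, 2.0, 1.0]
assignment = {0: 0, 1: 0, 2: 1, 3: 1}
cut = edge_cut(edges, weights, assignment)        # only edge (1, 2) is cut
balanced = is_balanced(assignment, num_clusters=2, epsilon=0.05)
```

A partitioner such as METIS searches for an assignment that keeps `cut` small while `is_balanced` remains true.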
[0066] The specific process is as follows:
[0067] (1) In the coarsening stage, the Heavy Edge Matching (HEM) strategy is applied to the topology graph $G$: adjacent data points are merged to construct a series of progressively coarser graphs of decreasing size.
[0068] (2) When the graph is coarsened to a sufficiently small scale, the algorithm performs K-way greedy graph growth on the coarsest-grained graph to generate the initial partition. Specifically, coarsening stops when the number of nodes in the coarsened graph falls to $c \times K$, where $c$ is an empirical parameter that controls the coarsening stop size.
[0069] (3) The initial partition is projected back to the topology graph $G$ layer by layer, with local refinement after each projection, yielding the optimized cluster set $\{C_1, \dots, C_K\}$.
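To make the coarsening step concrete, the following sketch performs a single heavy-edge-matching pass. It is a simplified illustration rather than the METIS implementation: vertices are visited in index order (METIS typically randomizes the order), and the function name `heavy_edge_matching` is hypothetical.

```python
def heavy_edge_matching(num_vertices, weighted_edges):
    """One coarsening pass: match each vertex with its heaviest still-unmatched neighbor.

    weighted_edges maps (u, v) pairs to positive weights.
    Returns a list mapping each fine vertex to its coarse vertex id.
    """
    neighbors = {u: [] for u in range(num_vertices)}
    for (u, v), w in weighted_edges.items():
        neighbors[u].append((w, v))
        neighbors[v].append((w, u))
    coarse_id = [None] * num_vertices
    next_id = 0
    for u in range(num_vertices):
        if coarse_id[u] is not None:
            continue  # already merged into an earlier coarse vertex
        coarse_id[u] = next_id
        # Candidates are neighbors that have not been matched yet.
        candidates = [(w, v) for w, v in neighbors[u] if coarse_id[v] is None]
        if candidates:
            _, best = max(candidates)  # heaviest incident edge wins
            coarse_id[best] = next_id
        next_id += 1
    return coarse_id

# Path graph 0-1-2-3 with a light middle edge (illustrative weights).
edges = {(0, 1): 5.0, (1, 2): 1.0, (2, 3): 4.0}
mapping = heavy_edge_matching(4, edges)
```

The heavy edges (0, 1) and (2, 3) are collapsed, so only the light edge (1, 2) survives in the coarser graph, where it is cheap to cut.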
[0070] Step S3: Parallel local sub-index construction
[0071] In each cluster $C_i$, an independent HNSW subgraph $H_i$ is constructed that contains only the data points within that cluster and their local topological connections. Parallel working units are started to construct the sub-indexes of all clusters synchronously, ultimately yielding the sub-index set $\{H_1, \dots, H_K\}$.
[0072] (1) Based on the partitioning results of step S2, an independent HNSW subgraph $H_i$ is constructed in each cluster $C_i$; it contains only the data points within the cluster and their local topological connections.
[0073] (2) Since the data between clusters does not overlap and the load is balanced, the system starts parallel working units to build the $K$ sub-indexes synchronously, which maximizes the use of hardware parallelism and significantly reduces build time.
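Because clusters share no data, the per-cluster builds are independent tasks. A minimal sketch of that dispatch pattern is shown below; `build_sub_index` is a hypothetical placeholder for a real HNSW build, and a thread pool is used only for illustration (in CPython, CPU-bound pure-Python work would need processes or a native library that releases the GIL to run truly in parallel).

```python
from concurrent.futures import ThreadPoolExecutor

def build_sub_index(cluster_points):
    """Placeholder for building one HNSW sub-index; here it only stores a sorted copy."""
    return sorted(cluster_points)

def build_all_sub_indexes(clusters, max_workers=4):
    """Build one sub-index per cluster concurrently; results keep the cluster order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(build_sub_index, clusters))

# Two equally sized, non-overlapping clusters (illustrative data).
clusters = [[3, 1, 2], [6, 5, 4]]
sub_indexes = build_all_sub_indexes(clusters)
```

Equal cluster sizes matter here: the slowest task determines the total build time, so balanced clusters keep all workers busy for roughly the same duration.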
[0074] (II) Query Process (see Figures 2 and 4)
[0075] TALBI proposes a candidate cluster selection strategy based on learned probabilities. This strategy is based on a lightweight gated neural network model, which is trained offline and performs only one forward inference during querying, thus completing candidate cluster selection in O(1) time complexity. The specific query process includes the following three stages:
[0076] Step S1: Feature modulation based on gating mechanism
[0077] The core of this stage is to introduce a gated multilayer perceptron (gMLP) to enhance the features of the query vector of the multimedia content to be retrieved, thus addressing the limitation of traditional geometric distance strategies in "failing to capture high-dimensional topological relationships." The specific process is as follows:
[0078] (1) Feature extraction. The query vector $q$ is first fed into the feature branch of the gated neural network, which extracts its high-dimensional semantic features through a fully connected network.
[0079] (2) Gating mask generation. In parallel, the query vector $q$ is fed into the gating branch of the gated neural network, which generates a feature selection mask through a fully connected layer and a sigmoid activation function. Elements of the mask close to 1 indicate feature dimensions with high discriminative power for cluster classification, while elements close to 0 indicate redundant dimensions.
[0080] (3) Feature fusion. The high-dimensional semantic features are multiplied element-wise with the feature selection mask, which nonlinearly modulates the query vector $q$ and enhances the model's sensitivity to cluster boundaries.
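The two-branch modulation can be sketched as a single forward pass. This is a minimal illustration, not the patented network: the layer sizes and weights are made up, and the ReLU activation in the feature branch is an assumption, since the text only specifies "a fully connected network".

```python
import math

def linear(x, weight, bias):
    """Compute weight @ x + bias, with the weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def gated_features(q, w_feat, b_feat, w_gate, b_gate):
    """Return relu(W_f q + b_f) * sigmoid(W_g q + b_g), multiplied element-wise."""
    features = [max(0.0, h) for h in linear(q, w_feat, b_feat)]         # feature branch (ReLU assumed)
    mask = [1.0 / (1.0 + math.exp(-g)) for g in linear(q, w_gate, b_gate)]  # gating branch
    return [f * m for f, m in zip(features, mask)]

# Identity feature branch and an all-zero gating branch, so every mask entry is 0.5.
q = [1.0, 2.0]
modulated = gated_features(q, [[1, 0], [0, 1]], [0, 0], [[0, 0], [0, 0]], [0, 0])
```

Training would push mask entries toward 1 for discriminative dimensions and toward 0 for redundant ones; the constant 0.5 mask here only demonstrates the arithmetic.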
[0081] Step S2: Cluster Association Probability Inference
[0082] (1) The nonlinearly modulated query vector $\tilde{q}$ is mapped through the linear output layer to a $K$-dimensional vector (the logits of the $K$ clusters).
[0083] (2) The Softmax function then transforms the $K$-dimensional vector into the cluster association probability distribution $P = (p_1, \dots, p_K)$, where $p_i$ is the probability that the query vector $q$ is topologically associated with the HNSW subgraph $H_i$.
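The Softmax step that turns per-cluster logits into a probability distribution can be written as follows. This is a generic, numerically stable formulation; the logit values are illustrative only.

```python
import math

def softmax(logits):
    """Convert logits to probabilities, subtracting the maximum for numerical stability."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits for three clusters; a larger logit yields a larger association probability.
probs = softmax([2.0, 1.0, 0.0])
```

Because the output is a full distribution rather than a single label, a query near a cluster boundary can assign substantial probability to several neighboring clusters.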
[0084] Step S3: Probability-driven parallel search and fusion
[0085] Based on the cluster association probability distribution, an efficient workflow of "precise candidate cluster selection + parallel search + result fusion" is executed. Specifically:
[0086] (1) Candidate cluster selection: based on the cluster association probability distribution $P$, the $m$ clusters with the highest probability values are selected as candidate clusters. Because selection uses only the predicted probabilities, it avoids the overhead of computing the geometric distance between the query point and each cluster centroid.
[0087] (2) Parallel local search: the system starts $m$ parallel working units, which perform local kNN searches in parallel in the HNSW sub-indexes corresponding to the $m$ selected candidate clusters.
[0088] (3) Result fusion: the search results returned by all sub-indexes are merged, duplicates are removed, a global sort based on the original distance is performed, and the final top-$k$ results are kept.
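The three steps of this stage (select, search in parallel, fuse) can be sketched end to end. The sketch makes several assumptions: a brute-force scan stands in for each HNSW sub-index search, each sub-index is modeled as a dict from point id to vector, and the names `local_knn`, `search`, and `n_probe` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def local_knn(query, sub_index, k):
    """Brute-force stand-in for an HNSW sub-index search; returns (distance, id) pairs."""
    return sorted((math.dist(query, vec), pid) for pid, vec in sub_index.items())[:k]

def search(query, probs, sub_indexes, n_probe, k):
    """Probe the n_probe most probable clusters in parallel and fuse their results."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:n_probe]
    with ThreadPoolExecutor(max_workers=n_probe) as pool:
        partial = pool.map(lambda i: local_knn(query, sub_indexes[i], k), order)
        merged = [hit for hits in partial for hit in hits]
    best = {}
    for dist, pid in merged:  # de-duplicate by id, keeping the smallest distance
        if pid not in best or dist < best[pid]:
            best[pid] = dist
    # Global sort by original distance, then keep the top k ids.
    return [pid for pid, _ in sorted(best.items(), key=lambda kv: kv[1])[:k]]

sub_indexes = [
    {0: (0.0, 0.0), 1: (1.0, 0.0)},
    {2: (0.0, 2.0), 3: (5.0, 5.0)},
    {4: (9.0, 9.0)},
]
result = search((0.0, 0.5), [0.6, 0.3, 0.1], sub_indexes, n_probe=2, k=3)
```

The third cluster is never searched because its predicted probability is lowest, and the fused result still contains the three closest points from the two probed clusters.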
[0089] (III) Model training process (supporting query phase performance)
[0090] To endow gated neural network models with the ability to perceive complex topological structures in high-dimensional spaces, and especially to address the problem of unclear identification at cluster boundaries in existing technologies, this invention proposes a nearest-neighbor perception soft-label supervised training strategy. This strategy abandons the black-and-white hard-label paradigm of traditional classification tasks and instead adopts a probability distribution alignment method for training. The specific implementation steps are as follows:
[0091] Step S1: Construction of the topology-sensitive training set
[0092] The data points $x$ in the high-dimensional dataset $D$ are used as training input. Unlike traditional classification tasks, which use only each sample's own label, this invention uses the local neighborhood information of the sample in the topology graph as the supervision signal.
[0093] Step S2: Generating soft labels based on neighborhood statistics
[0094] To address the problem of lost boundary information caused by traditional methods using "hard labels" (i.e., forcibly assigning samples to a unique geometric center cluster), this invention designs a statistically based soft label generation method to capture cross-cluster local structural characteristics. Specifically:
[0095] (1) Neighborhood sampling. For each data point $x$, search for its $k$ true nearest neighbor data points, which form the local neighborhood set $N(x)$ of $x$.
[0096] (2) Distribution statistics. Count the number $n_j$ of data points in the local neighborhood set $N(x)$ that belong to each cluster $C_j$.
[0097] (3) Soft label construction. Compute the ratio $y_j = n_j / |N(x)|$ to generate a $K$-dimensional probability distribution vector $Y = (y_1, \dots, y_K)$. This distribution characterizes the proportion of the local nearest neighbors of data point $x$ that fall in each cluster and can be regarded as a soft-label representation of its true neighborhood structure at the cluster level. The soft label $Y$ precisely quantifies the degree of fuzzy membership of data point $x$ across cluster boundaries. For example, if 60% of a boundary data point's neighborhood belongs to cluster A and 40% belongs to cluster B, its soft label is {cluster A: 0.6, cluster B: 0.4, others: 0}.
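The soft-label computation reduces to counting neighbors per cluster and normalizing. The sketch below reproduces the 60%/40% example from the text; the function name `soft_label`, the point ids, and the cluster assignment are illustrative assumptions.

```python
from collections import Counter

def soft_label(neighbor_ids, cluster_of, num_clusters):
    """Return the fraction of a point's neighbors that fall in each cluster."""
    counts = Counter(cluster_of[n] for n in neighbor_ids)
    total = len(neighbor_ids)
    return [counts.get(c, 0) / total for c in range(num_clusters)]

# Five neighbors of a boundary point: three in cluster 0 (A), two in cluster 1 (B).
cluster_of = {10: 0, 11: 0, 12: 0, 13: 1, 14: 1}
label = soft_label([10, 11, 12, 13, 14], cluster_of, num_clusters=3)
```

A hard label would collapse this point to cluster 0 alone and discard the information that 40% of its neighbors live in cluster 1.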
[0098] Step S3: Distribution Alignment Training Based on KL Divergence
[0099] To enable the gated neural network to accurately predict the aforementioned topological relationships, this invention constructs a loss function based on Kullback-Leibler (KL) divergence as the optimization objective. Specifically:
[0100] (1) Constructing the loss function. The Kullback-Leibler (KL) divergence is used as the loss function to compute the information-difference loss $\mathcal{L}_{KL}$ between the distribution $P$ predicted by the neural network and the true soft-label distribution $Y$.
[0101] (2) Minimize the loss function. By minimizing this loss, the gated neural network is forced to learn the high-dimensional manifold structure of the data rather than a simple geometric partition. The goal of training is to minimize this difference, forcing the output of the gated neural network to not only predict a classification result, but also to approximate the true neighborhood distribution of the sample. In this way, the model is forced to learn and "memorize" the complex boundary topology in high-dimensional space.
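As a worked illustration of the loss, the sketch below evaluates the KL divergence between a soft label and a prediction. The direction KL(target || predicted) is an assumption, since the text does not state it; the small `eps` floor is likewise an implementation convenience to avoid division by zero.

```python
import math

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) = sum_i y_i * log(y_i / p_i), skipping terms where y_i == 0."""
    return sum(y * math.log(y / max(p, eps)) for y, p in zip(target, predicted) if y > 0)

soft = [0.6, 0.4, 0.0]                          # soft label from neighborhood statistics
loss_match = kl_divergence(soft, [0.6, 0.4, 0.0])   # prediction equals the soft label
loss_close = kl_divergence(soft, [0.5, 0.5, 0.0])   # mildly wrong prediction
loss_hard = kl_divergence(soft, [1.0, 0.0, 0.0])    # hard decision ignoring cluster B
```

The loss is zero for a perfect match, small for a nearby distribution, and very large when the prediction assigns no probability to a cluster that actually contains neighbors, which is exactly the boundary failure the training strategy targets.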
[0102] Step S4: Adaptive optimization iteration.
[0103] The Adam optimizer is used to update the model parameters end-to-end. To further improve the model's convergence stability and generalization ability, a cosine annealing learning rate scheduling strategy is introduced. The learning rate is dynamically adjusted during training to ensure that the model can escape local optima, ultimately obtaining a robust model that can accurately map high-dimensional semantic features to cluster association probability distributions.
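The cosine annealing schedule mentioned above follows a simple closed form. The sketch shows a single-cycle variant without warm restarts; the step count and learning rate bounds are illustrative assumptions, not values from the source.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate at a given step under a single-cycle cosine annealing schedule."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# The rate starts at lr_max, passes the midpoint value halfway, and decays to lr_min.
lr_start = cosine_annealing_lr(0, 100, lr_max=0.1)
lr_mid = cosine_annealing_lr(50, 100, lr_max=0.1)
lr_end = cosine_annealing_lr(100, 100, lr_max=0.1)
```

In a training loop, the value returned for the current step would be passed to the Adam optimizer as its learning rate before each parameter update.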
[0104] This invention aims to protect a method for constructing, querying, and training a Topology Awareness and Load Balancing Index (TALBI) for high-dimensional similarity search. This method combines several original technical features:
[0105] (1) An index construction method based on an approximate global graph and strict equilibrium constraints:
[0106] Unlike traditional partitioning based on geometric space (such as K-means, hyperplane), this invention proposes a strategy of "constructing the graph first, then partitioning".
[0107] (a) Use HNSW to accelerate the construction of an approximate global nearest neighbor graph, replacing the high-cost exact graph construction;
[0108] (b) The Multilevel Graph Partitioning (METIS) algorithm is adopted, and the dual objectives of "minimizing edge cuts" and "strict cardinality constraints" are combined. That is, while ensuring that the nearest neighbor edges cut are minimized (preserving the topology), the number of nodes in each cluster is forced to be equal within a very small error range (achieving load balancing).
[0109] (2) A candidate cluster selection method based on a lightweight gated neural network:
[0110] Unlike traditional geometric routing methods that calculate the distance from a query point to its centroid, this invention uses a deep learning model to predict topology affiliation.
[0111] (a) Design a two-stream neural network architecture that includes feature branches and gating branches.
[0112] (b) A gating mask mechanism is introduced. The element-wise product of the feature branch and the gating branch nonlinearly modulates the query vector features, enhancing the model's ability to suppress irrelevant dimensions and its sensitivity to discriminative features. The output layer directly generates the posterior probability distribution over the $K$ clusters, which is used to guide the selection of candidate clusters.
[0113] (3) A model training strategy based on nearest neighbor perception soft labels:
[0114] This addresses the issue of decreased recall caused by hard classification ("black or white") when the query point is located at the cluster boundary.
[0115] (a) Soft label generation: Based on the distribution ratio of the local neighborhood of the training sample in each cluster in the global graph, a probability distribution vector is constructed as a supervision signal (e.g., {cluster A: 0.6, cluster B: 0.4}), instead of a single hard label ({cluster A: 1.0}).
[0116] (b) Loss function: KL divergence is used as the loss function to minimize the difference between the predicted distribution and the true neighborhood distribution, forcing the model to learn the high-dimensional manifold structure of the data.
[0117] (4) End-to-end parallel index and query architecture:
[0118] A cluster structure based on load balancing enables full-process parallelization.
[0119] During the construction phase, non-overlapping local HNSW sub-indexes are built in parallel; during the query phase, once the top $m$ candidate clusters have been selected by probability, $m$ threads are started in parallel, each searching its corresponding sub-index, and the results are merged.
Claims
1. A method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval, characterized in that: the method includes two core processes, index construction and querying. Index construction comprises three stages: topology graph construction, METIS-based multi-level graph partitioning, and parallel HNSW sub-index construction. The topology graph construction stage builds a global nearest neighbor topology graph $G$ that captures the local topological relationships between data points of the multimedia content to be retrieved; $G$ serves as the basic input for the subsequent clustering algorithm. The METIS-based multi-level graph partitioning stage uses the METIS algorithm to partition the topology graph $G$ into $K$ non-overlapping clusters $C_1, \dots, C_K$. The parallel HNSW sub-index construction stage constructs an independent HNSW subgraph $H_i$ in each cluster $C_i$, executing the sub-index construction for all clusters simultaneously and ultimately obtaining the sub-index set $\{H_1, \dots, H_K\}$. The query process comprises three stages: feature modulation based on a gating mechanism, cluster association probability inference, and probability-driven parallel search and fusion. The feature modulation stage introduces a gated multilayer perceptron to enhance the features of the query vector $q$ of the multimedia content to be retrieved; the cluster association probability inference stage transforms the enhanced query vector into a cluster association probability distribution $P$; and the probability-driven parallel search and fusion stage uses $P$ to execute candidate cluster selection, parallel search, and result fusion.
2. The method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval according to claim 1, characterized in that: the specific process of the topology graph construction stage is as follows: (1) build an HNSW index for the high-dimensional dataset of the multimedia content to be retrieved; (2) for each data point in the high-dimensional dataset, execute a nearest neighbor search in the HNSW index to extract that data point's set of approximate nearest neighbors, and connect the data point to each data point in that set to form edges; (3) remove self-loop edges and isolated data points during construction of the topology graph, retaining only the high-confidence connections induced by approximate nearest neighbor relationships; (4) finally obtain the global nearest neighbor topology graph, whose vertex set consists of all data points of the high-dimensional dataset and whose edge set captures the nearest neighbor relationships between data points.
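For illustration only, steps (2) to (4) of claim 2 can be sketched as follows. This is a minimal sketch, not the claimed implementation: exact brute-force search stands in for the HNSW-based approximate search, and the function name `build_knn_graph`, the parameter `k`, and the sample points are hypothetical.

```python
import math


def build_knn_graph(points, k):
    """Return (vertices, edges) of an undirected k-nearest-neighbor graph.

    Assumption: exact brute-force search replaces the approximate
    HNSW search named in the claim.
    """
    edges = set()
    for i, p in enumerate(points):
        # Distances from point i to every other point. Skipping j == i
        # means no self-loop edge can ever be created.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        for _, j in sorted(dists)[:k]:
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    # Keep only vertices that appear in at least one edge (drops isolated points).
    vertices = sorted({v for e in edges for v in e})
    return vertices, sorted(edges)


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
vertices, edges = build_knn_graph(points, k=1)
```

With these sample points, the two groups end up in separate connected components, which is the kind of structure the partitioning stage is meant to exploit.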
3. The method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval according to claim 2, characterized in that: the specific process of the multi-level graph partitioning stage based on METIS is as follows: (1) in the coarsening stage, the Heavy Edge Matching strategy is applied to the topology graph to merge adjacent data points, constructing a series of coarsened graphs of decreasing size; (2) when the graph has been coarsened to a sufficiently small scale, the algorithm performs K-way greedy graph growing on the coarsest graph to generate the initial partition; (3) the initial partition is projected back to the topology graph layer by layer, and local refinement is performed after each projection, to obtain the optimized cluster set. The METIS-based multi-level graph partitioning stage simultaneously satisfies a minimum edge cut objective and a strict load balancing constraint. Minimum edge cut objective: minimize the sum of the weights of edges that cross different clusters, i.e. $\min \sum_{(u,v) \in E} w(u,v)\, \mathbb{I}[\pi(u) \neq \pi(v)]$, where $(u,v)$ denotes an edge in the edge set $E$ of the topology graph, $w(u,v)$ is the weight of that edge, $\mathbb{I}[\cdot]$ is an indicator function that takes the value 1 when the condition is true and 0 otherwise, $\pi(u)$ is the number of the cluster to which data point $u$ belongs, and $\pi(v)$ is the number of the cluster to which data point $v$ belongs. Strict load balancing constraint: $|C_i| \le (1 + \epsilon)\, N / K$ for every cluster, where $\epsilon$ is the allowed imbalance factor, $N$ is the total number of nodes in the topology graph, $K$ is the number of clusters, and $|C_i|$ is the number of data points in the $i$-th cluster $C_i$.
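The two conditions in claim 3 can be checked for any given partition with a few lines of code. The sketch below does not run METIS; it only evaluates the edge cut and the balance condition for a partition supplied by hand. The function names, the balance bound written as (1 + eps) * N / K, and the toy graph are assumptions made for illustration.

```python
def edge_cut(edges, weights, part):
    """Sum of the weights of edges whose endpoints lie in different clusters."""
    return sum(w for (u, v), w in zip(edges, weights) if part[u] != part[v])


def is_balanced(part, num_clusters, eps):
    """Check that every cluster size is at most (1 + eps) * N / K."""
    sizes = [0] * num_clusters
    for cluster_id in part:
        sizes[cluster_id] += 1
    limit = (1 + eps) * len(part) / num_clusters
    return all(size <= limit for size in sizes)


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
weights = [1.0] * len(edges)
part = [0, 0, 0, 1, 1, 1]  # part[i] is the cluster number of data point i

cut = edge_cut(edges, weights, part)
balanced = is_balanced(part, num_clusters=2, eps=0.05)
```

Splitting the graph between the two triangles cuts only the bridging edge and keeps both clusters the same size, so this partition satisfies both conditions at once.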
4. The method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval according to claim 3, characterized in that: the parallel HNSW sub-index construction stage specifically includes the following steps: (1) based on the multi-level graph partitioning result, an HNSW subgraph is constructed independently within each cluster, containing only the data points within that cluster and their local topological connections; (2) because the data of different clusters do not overlap and the load is balanced, parallel working units are started to build the sub-indexes synchronously, yielding the final set of sub-indexes.
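The parallel construction pattern of claim 4 might look like the sketch below. It is an illustration under stated assumptions: each "sub-index" is just a list of (id, vector) pairs standing in for a real HNSW subgraph, a thread pool stands in for the parallel working units, and the names `build_sub_index` and `build_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def build_sub_index(cluster_id, member_ids, points):
    """Stand-in for building one HNSW sub-index: keep only this cluster's points."""
    return cluster_id, [(i, points[i]) for i in member_ids]


def build_all(points, part, num_clusters):
    """Group points by cluster, then build one sub-index per cluster in parallel."""
    members = [[] for _ in range(num_clusters)]
    for i, cluster_id in enumerate(part):
        members[cluster_id].append(i)
    with ThreadPoolExecutor(max_workers=num_clusters) as pool:
        futures = [
            pool.submit(build_sub_index, c, members[c], points)
            for c in range(num_clusters)
        ]
        # Each result is a (cluster_id, sub_index) pair.
        return dict(f.result() for f in futures)


# Hypothetical data: four points already assigned to two clusters.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
part = [0, 0, 1, 1]
sub_indexes = build_all(points, part, num_clusters=2)
```

Because the clusters share no data points, the workers need no locking; balanced cluster sizes mean the workers finish at roughly the same time.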
5. The method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval according to claim 4, characterized in that: the specific process of the feature modulation stage based on the gating mechanism is as follows: (1) feature extraction: the query vector is input to the feature branch of the gated neural network, which extracts its high-dimensional semantic features through a fully connected network; (2) gating mask generation: simultaneously, the query vector is input to the gating branch of the gated neural network, which generates a feature selection mask through a fully connected layer and a sigmoid activation function; (3) feature fusion: the high-dimensional semantic features are multiplied element-wise with the feature selection mask, achieving nonlinear modulation of the query vector and enhancing the model's sensitivity to cluster boundaries.
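A forward pass through the two branches of claim 5 can be sketched in plain Python. The claim does not name the activation of the feature branch, so the ReLU used here is an assumption, as are the single-layer branches, the function names, and the toy weights.

```python
import math


def linear(x, weight, bias):
    """Fully connected layer: one output per row of the weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_modulation(q, w_feat, b_feat, w_gate, b_gate):
    # (1) Feature branch: fully connected layer plus ReLU (ReLU is an assumption).
    features = [max(0.0, h) for h in linear(q, w_feat, b_feat)]
    # (2) Gating branch: fully connected layer plus sigmoid gives a mask in (0, 1).
    mask = [sigmoid(z) for z in linear(q, w_gate, b_gate)]
    # (3) Fusion: element-wise product of features and mask.
    return [f * m for f, m in zip(features, mask)]


# Hypothetical toy weights: identity feature branch, all-zero gating branch.
q = [1.0, -2.0]
modulated = gated_modulation(
    q,
    w_feat=[[1.0, 0.0], [0.0, 1.0]], b_feat=[0.0, 0.0],
    w_gate=[[0.0, 0.0], [0.0, 0.0]], b_gate=[0.0, 0.0],
)
```

With an all-zero gating branch the mask is 0.5 everywhere, so the output is simply half of the rectified features; trained gating weights would instead scale each feature by a query-dependent amount.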
6. The method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval according to claim 5, characterized in that: the specific process of the cluster association probability inference stage is as follows: (1) the nonlinearly modulated query vector is mapped by the linear output layer to a vector whose dimension equals the number of clusters; (2) the Softmax function then transforms this vector into the cluster association probability distribution, each component of which represents the probability that the query vector is topologically associated with the corresponding HNSW subgraph.
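The mapping in claim 6 from a modulated query vector to a probability distribution over clusters can be sketched as below. The function names, the three-cluster toy setup, and the weight values are hypothetical; the max-subtraction in the softmax is a standard numerical-stability measure, not something the claim specifies.

```python
import math


def linear(x, weight, bias):
    """Linear output layer: one logit per cluster (one row of weights per cluster)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits):
    """Turn logits into probabilities; subtracting the max avoids overflow."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical modulated query vector and output layer for three clusters.
modulated = [0.5, 0.0]
w_out = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0]]
b_out = [0.0, 0.0, 0.0]

logits = linear(modulated, w_out, b_out)  # one value per cluster
probs = softmax(logits)                   # cluster association probabilities
```

The resulting list has one entry per cluster and sums to one, so it can be ranked directly to choose candidate clusters.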
7. The method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval according to claim 6, characterized in that: the specific process of the probability-driven parallel search and fusion stage is as follows: (1) candidate cluster selection: based on the cluster association probability distribution, a number of clusters with the highest probability values are selected as candidate clusters; (2) parallel local search: a corresponding number of parallel working units are started, which perform local kNN searches in parallel in the HNSW sub-indexes corresponding to the selected candidate clusters; (3) result fusion: the search results returned by all sub-indexes are merged, duplicates are removed, a global sort is performed on the original distances, and the final nearest neighbor results are selected.
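The three steps of claim 7 can be sketched end to end as follows. This is a hedged illustration: brute-force search replaces the HNSW sub-index search, a thread pool stands in for the parallel working units, and the names `local_knn` and `search` and the parameters `m` (number of candidate clusters) and `k` (number of results) are hypothetical.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def local_knn(sub_index, query, k):
    """Brute-force kNN inside one sub-index (stand-in for an HNSW search)."""
    return heapq.nsmallest(
        k, ((math.dist(query, vec), pid) for pid, vec in sub_index)
    )


def search(query, probs, sub_indexes, m, k):
    # (1) Candidate clusters: the m clusters with the highest probability.
    candidates = heapq.nlargest(m, range(len(probs)), key=probs.__getitem__)
    # (2) Parallel local search: one worker per candidate cluster.
    with ThreadPoolExecutor(max_workers=m) as pool:
        partial = list(
            pool.map(lambda c: local_knn(sub_indexes[c], query, k), candidates)
        )
    # (3) Fusion: merge, de-duplicate by point id, sort by distance, keep k.
    best = {}
    for dist, pid in (hit for hits in partial for hit in hits):
        if pid not in best or dist < best[pid]:
            best[pid] = dist
    return sorted((d, pid) for pid, d in best.items())[:k]


# Hypothetical sub-indexes: lists of (point id, vector) pairs, one per cluster.
sub_indexes = [
    [(0, (0.0, 0.0)), (1, (0.0, 1.0))],
    [(2, (5.0, 5.0)), (3, (6.0, 5.0))],
    [(4, (0.5, 0.0))],
]
probs = [0.6, 0.1, 0.3]
result = search((0.0, 0.0), probs, sub_indexes, m=2, k=2)
```

In this toy run the low-probability cluster 1 is never searched, while a close neighbor that lives in cluster 2 is still recovered because two candidate clusters are probed.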
8. The method for constructing and load balancing a topology-aware hierarchical index for multimedia content retrieval according to claim 7, characterized in that: a nearest-neighbor-aware soft-label supervised training strategy is used to train the gated neural network model, and the specific implementation steps are as follows: Step S1: construction of the topology-sensitive training set: the data points of the high-dimensional dataset are used as training input; Step S2: soft label generation based on neighborhood statistics: (1) neighborhood sampling: for each data point, its true nearest neighbor data points are retrieved to form the local neighborhood set of that data point; (2) distribution statistics: the number of data points in the local neighborhood set that belong to each cluster is counted; (3) soft label construction: the ratio of each count to the size of the local neighborhood set is calculated, generating a probability distribution vector with one component per cluster; this distribution characterizes the proportions of the data point's local nearest neighbor set that fall in the different clusters, and is regarded as a soft label representation of its true neighborhood structure at the cluster level; Step S3: distribution alignment training based on KL divergence: (1) loss function construction: KL divergence is used as the loss function to measure the information difference between the predicted cluster association probability distribution and the true soft label distribution; (2) loss minimization: by minimizing this loss, the gated neural network model is forced to learn the high-dimensional manifold structure of the data rather than a simple geometric partition; Step S4: adaptive optimization iteration: the Adam optimizer is used to update the model parameters end-to-end; to further improve the convergence stability and generalization ability of the model, a cosine annealing learning rate scheduling strategy is introduced to dynamically adjust the learning rate during training, so that the model can escape local optima and a robust model is finally obtained that accurately maps high-dimensional semantic features to the cluster association probability distribution.
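The soft-label construction of step S2, the KL loss of step S3, and the learning rate schedule of step S4 can be illustrated with the sketch below. Several points are assumptions rather than details taken from the claim: the KL direction (soft label relative to prediction), the standard cosine annealing formula without restarts, the clamp value `tiny`, and all function and parameter names. The Adam update itself is not shown.

```python
import math


def soft_label(neighbor_ids, part, num_clusters):
    """Fraction of a point's true nearest neighbors that fall in each cluster."""
    counts = [0] * num_clusters
    for j in neighbor_ids:
        counts[part[j]] += 1
    return [c / len(neighbor_ids) for c in counts]


def kl_divergence(target, predicted, tiny=1e-12):
    """KL(target || predicted); entries with target == 0 contribute nothing."""
    return sum(
        t * math.log(t / max(p, tiny))
        for t, p in zip(target, predicted)
        if t > 0
    )


def cosine_lr(step, total_steps, lr_max, lr_min):
    """Cosine annealing from lr_max at step 0 down to lr_min at total_steps."""
    cos_term = 1 + math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term


# Hypothetical data: cluster number of each point and one point's neighbor ids.
part = [0, 0, 1, 1, 1]
label = soft_label([0, 1, 2, 3], part, num_clusters=2)

loss_good = kl_divergence(label, [0.5, 0.5])  # prediction equals the soft label
loss_bad = kl_divergence(label, [0.9, 0.1])   # prediction ignores cluster 1
```

A point whose neighbors straddle two clusters receives a label that spreads probability over both, so a prediction that commits to only one of them is penalized.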