Vector retrieval method, device, computer equipment, readable storage medium and program product

The vector retrieval method combines a global vector index based on a single-layer hierarchical navigable small-world (HNSW) graph with a local inverted file flat (IVFFlat) vector index, dynamically adjusts search parameters, and expands neighboring clusters through the graph topology. This addresses the problems of high memory consumption and insufficient recall precision in high-dimensional spaces and achieves efficient vector retrieval.

CN121542272B · Active · Publication Date: 2026-06-16 · CHINA TELECOM CLOUD TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA TELECOM CLOUD TECH CO LTD
Filing Date: 2026-01-19
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

Existing vector retrieval methods suffer from high memory consumption and insufficient recall precision in high-dimensional spaces. In particular, under large-scale datasets, the query latency of the IVFFlat index increases and the memory overhead of the HNSW index is huge. Hybrid indexing schemes have failed to effectively solve these problems.

Method used

A global vector index based on a single-layer hierarchical navigable small-world (HNSW) graph is used for the initial cluster search and is combined with a local inverted file flat (IVFFlat) vector index for the expanded search. Dynamically adjusting the search parameters and expanding neighboring clusters through the graph topology improves recall accuracy and reduces memory usage.

Benefits of technology

It significantly improves recall accuracy and search efficiency in large-scale vector spaces, reduces memory consumption, and addresses the problems of high memory consumption and insufficient recall accuracy in high-dimensional spaces that exist in traditional indexing methods.

✦ Generated by Eureka AI based on patent content.


Abstract

The application relates to a vector retrieval method and device, computer equipment, a computer readable storage medium and a computer program product. The method comprises the following steps: receiving a query vector; performing a search on a global vector index of a vector index according to the query vector based on a search parameter to obtain an initial cluster set, wherein the global vector index is determined based on a single-layer hierarchical navigable small world graph and the search parameter is used to control the size of the initial cluster set; in the case that the vector distribution in the initial cluster set does not satisfy a preset condition, performing an expansion search in the global vector index starting from the initial cluster set to obtain each adjacent cluster; determining a target cluster set based on the initial cluster set and each adjacent cluster; and performing a search on the target cluster set in a local vector index of the vector index to obtain a target vector, wherein the local vector index is determined based on an inverted file flat (IVFFlat) vector index. The method can improve the vector recall precision.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, and in particular to a vector retrieval method, apparatus, computer device, computer-readable storage medium, and computer program product.

Background Technology

[0002] With the rapid development of big data and artificial intelligence technologies, vector retrieval has become a core technical component in key applications such as modern vector databases, search engines, recommendation systems, and multimedia analytics. In these application scenarios, systems typically need to achieve millisecond-level similarity searches from massive datasets containing millions or even billions of high-dimensional vectors. Faced with such massive data volumes and stringent performance requirements, efficient vector indexing technology is crucial.

[0003] Among current mainstream indexing methods, Inverted File with Flat Index (IVFFlat) indexing and Hierarchical Navigable Small World (HNSW) indexing are widely used, but they also reveal significant limitations. IVFFlat vector indexing technology has the advantage of high memory efficiency, but when the number of cluster centers (nlist) becomes extremely large and uneven, query latency increases significantly, key clusters are missed, and the recall accuracy of vector retrieval is severely affected. HNSW vector indexing technology has the advantage of fast retrieval speed, but its memory overhead is huge and its performance is unstable in high-dimensional spaces. When faced with a large number of vectors, memory consumption is high and the accuracy of less frequently accessed queries is insufficient.

[0004] Traditional hybrid vector indexing schemes (such as IVF+HNSW in the FAISS library) attempt to combine the advantages of both technologies. However, simply building a full HNSW index over the IVFFlat cluster centers to accelerate candidate cluster selection fails to fundamentally solve the above problems and instead introduces new defects: resource waste due to static parameters and poor vector recall accuracy due to missed vectors.

Summary of the Invention

[0005] Therefore, it is necessary to provide a vector retrieval method, apparatus, computer equipment, computer-readable storage medium, and computer program product that can improve the accuracy of vector recall in order to address the above-mentioned technical problems.

[0006] In a first aspect, this application provides a vector retrieval method, the method comprising:

[0007] Receive a query vector;

[0008] Based on the search parameters, the global vector index of the vector index is searched with the query vector to obtain an initial cluster set; wherein the global vector index is determined based on a single-layer hierarchical navigable small-world graph, and the search parameters are used to control the size of the initial cluster set.

[0009] If the vector distribution in the initial cluster set does not meet the preset conditions, an extended search is performed in the global vector index, starting from the initial cluster set, to obtain each neighboring cluster.

[0010] Based on the initial cluster set and each of the neighboring clusters, the target cluster set is determined;

[0011] A search is performed on the target cluster set in the local vector index of the vector index to obtain the target vector, wherein the local vector index is determined based on the inverted file flat (IVFFlat) vector index.

[0012] In one embodiment, if the vector distribution in the initial cluster set does not meet a preset condition, before performing an expanded search in the global vector index, starting from the initial cluster set, the method further includes:

[0013] Calculate the variance of the distance from the query vector to each cluster center in the initial cluster set;

[0014] If the distance variance is less than the distance threshold, it is determined that the vector distribution in the initial cluster set does not meet the preset conditions.

[0015] In one embodiment, performing a search on the target cluster set in the local vector index of the vector index to obtain the target vector includes:

[0016] For each cluster in the target cluster set, calculate the distance between the query vector and each vector in the cluster in the local vector index corresponding to the cluster;

[0017] Determine the initial vectors of each cluster based on the distances calculated for that cluster;

[0018] Merge the initial vectors of each of the clusters to form an initial vector set;

[0019] The initial vectors in the initial vector set are sorted to obtain the target number of target vectors.

[0020] In one embodiment, the method for determining the search parameters includes:

[0021] Obtain the current state of the query vector for this query, the current state including query complexity, average recall rate and average latency;

[0022] Based on the query complexity, the average recall rate, and the average latency, select a current action from the preset action space;

[0023] The current action is used as the search parameter for this query.

[0024] In one embodiment, the search parameters are updated in the following ways:

[0025] Obtain the actual recall and actual latency for this query;

[0026] The reward for this query is calculated based on the actual recall rate and the actual latency.

[0027] Based on the current state, the current action, the reward, and the next state, determine the next action corresponding to the next state;

[0028] Use the next action as the updated search parameter.
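The update steps above resemble a tabular Q-learning loop. The following is a minimal sketch under that assumption. The reward weights `w_recall` and `w_latency`, the learning rate `lr`, the discount factor `gamma`, and the purely greedy choice of the next action are illustrative assumptions and are not values taken from this application.

```python
def reward(recall, latency_ms, w_recall=1.0, w_latency=0.001):
    """Hypothetical reward: recall is rewarded, latency is penalized."""
    return w_recall * recall - w_latency * latency_ms


def q_update(q_table, state, action, r, next_state, actions, lr=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the next action.

    q_table maps (state, action) -> value; missing entries count as 0.0.
    """
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + lr * (r + gamma * best_next - old)
    # Greedy choice for the next query (an assumption; exploration is omitted).
    return max(actions, key=lambda a: q_table.get((next_state, a), 0.0))
```

Here each action stands for one candidate efSearch value, so the returned action would be used as the search parameter of the next query.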

[0029] In one embodiment, the method for determining the vector index includes:

[0030] Obtain the vector dataset;

[0031] The vector dataset is processed using cluster analysis to obtain a set of cluster centers;

[0032] A global vector index for a single-layer hierarchical navigable small-world graph is constructed based on cluster center sets;

[0033] For each cluster center in the cluster center set, a local vector index is constructed.
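The construction steps above can be sketched as follows. This is a simplified illustration, not the construction procedure of this application: it assumes plain Lloyd's k-means for the cluster analysis, and it links each cluster center to its `m` nearest centers instead of performing HNSW-style incremental insertion. The names `kmeans`, `build_index`, `m`, `iters`, and `seed` are hypothetical.

```python
import math
import random


def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = [tuple(v) for v in rng.sample(list(vectors), k)]

    def nearest(v):
        return min(range(k), key=lambda c: math.dist(v, centers[c]))

    for _ in range(iters):
        assign = [nearest(v) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Recompute the assignment so it matches the final centers.
    return centers, [nearest(v) for v in vectors]


def build_index(vectors, k, m=2):
    """Build a two-layer index: a graph over centers plus one flat list per cluster."""
    centers, assign = kmeans(vectors, k)
    # Global layer: one graph node per cluster center, linked to its m nearest centers.
    graph = {c: set() for c in range(k)}
    for c in range(k):
        others = sorted((o for o in range(k) if o != c),
                        key=lambda o: math.dist(centers[c], centers[o]))
        for o in others[:m]:
            graph[c].add(o)
            graph[o].add(c)
    # Local layer: one flat list of (vector id, vector) per cluster.
    lists = {c: [] for c in range(k)}
    for vid, (v, a) in enumerate(zip(vectors, assign)):
        lists[a].append((vid, tuple(v)))
    return centers, graph, lists
```

Only the cluster centers enter the graph, so the graph's size depends on the number of clusters and not on the number of stored vectors.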

[0034] In one embodiment, the construction of a single-layer hierarchical navigable small-world graph global vector index based on cluster center sets includes:

[0035] Create an initial graph structure, which includes nodes and edges connecting the nodes; wherein the initial graph structure is an empty graph structure, and the nodes are used to store cluster centers;

[0036] Starting from any node of the initial graph structure, traverse each cluster center in the cluster center set and insert each cluster center into the initial graph structure as a node.

[0037] In one embodiment, constructing a local vector index for each cluster center in the cluster center set includes:

[0038] For each cluster center, obtain the vector set of the cluster corresponding to that cluster center;

[0039] Obtain the storage space for each of the vector sets;

[0040] The vectors in each of the vector sets are copied to the corresponding storage space to form the local vector index.

[0041] In one embodiment, the vector index is updated in the following way:

[0042] When the update method is to add a vector, the new vector is obtained, and the target cluster center of the new vector is found in the global vector index. The target cluster is determined based on the target cluster center.

[0043] Insert the newly added vector into the local vector index corresponding to the target cluster;

[0044] Update the target cluster center of the target cluster into which the new vector is inserted;

[0045] If the updated target cluster center meets the preset conditions, the target node corresponding to the updated target cluster center in the global vector index and the edges connecting the target node are recalculated.

[0046] In one embodiment, before recalculating the target node corresponding to the updated target cluster center in the global vector index and the edges connecting the target node when the updated target cluster center meets the preset conditions, the method further includes:

[0047] Obtain the original target cluster center of the target cluster;

[0048] If the offset between the updated target cluster center and the original target cluster center exceeds a preset offset threshold, then the updated target cluster center is determined to meet the preset condition.
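The insertion path above can be illustrated with a short sketch. It assumes that each cluster center is the mean of its members and is updated incrementally, and it uses a linear scan over the centers in place of the global-index search for brevity. The function name `add_vector` and the default `offset_threshold` are hypothetical. The sketch only reports whether the node and its edges would need to be recalculated.

```python
import math


def add_vector(centers, lists, vid, vec, offset_threshold=0.5):
    """Insert vec into its nearest cluster and report whether the center moved too far.

    centers: list of center tuples; lists: dict cluster id -> list of (vid, vector).
    Returns (cluster id, needs_relink).
    """
    # Linear scan stands in for the search of the global vector index.
    cid = min(range(len(centers)), key=lambda c: math.dist(vec, centers[c]))
    lists[cid].append((vid, tuple(vec)))
    n = len(lists[cid])
    old = centers[cid]
    # Incremental mean update, assuming old was the mean of the previous n - 1 members.
    new = tuple(o + (x - o) / n for o, x in zip(old, vec))
    centers[cid] = new
    needs_relink = math.dist(new, old) > offset_threshold
    return cid, needs_relink
```

When `needs_relink` is true, the caller would recompute the graph node for that center and the edges attached to it.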

[0049] In one embodiment, the vector index is updated in the following way:

[0050] When the update method is to delete a vector, obtain the deletion instruction for the target vector;

[0051] According to the deletion instruction, the target vector is marked as deleted in the local vector index corresponding to the cluster to which the target vector belongs;

[0052] A scheduled task counts the vectors marked as deleted;

[0053] If the number of vectors marked as deleted reaches a certain threshold, a local vector index reconstruction operation is performed on the cluster containing vectors marked as deleted.
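The deletion flow above corresponds to a tombstone scheme. The sketch below is one possible reading: deletion only marks an id, and a maintenance routine, which a scheduled task would call, rebuilds the flat store once the number of marked ids reaches a threshold. The class name `FlatCluster` and the default `rebuild_threshold` are hypothetical.

```python
class FlatCluster:
    """Flat per-cluster vector store with tombstone deletion."""

    def __init__(self, rebuild_threshold=1000):
        self.items = {}        # vector id -> vector
        self.deleted = set()   # ids marked as deleted but not yet purged
        self.rebuild_threshold = rebuild_threshold

    def add(self, vid, vec):
        self.items[vid] = tuple(vec)

    def mark_deleted(self, vid):
        """Mark a vector as deleted without touching the stored data."""
        if vid in self.items:
            self.deleted.add(vid)

    def live_ids(self):
        """Ids that a search should still consider."""
        return [vid for vid in self.items if vid not in self.deleted]

    def maintenance(self):
        """Meant to be run by a scheduled task; rebuilds once enough ids are marked."""
        if len(self.deleted) < self.rebuild_threshold:
            return False
        self.items = {vid: v for vid, v in self.items.items() if vid not in self.deleted}
        self.deleted.clear()
        return True
```

Marking keeps deletions cheap on the query path, while the rebuild reclaims storage in batches.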

[0054] In a second aspect, this application provides a vector retrieval device, the device comprising:

[0055] The receiving module is used to receive query vectors;

[0056] The first search module is used to search, based on the search parameters, the global vector index of the vector index with the query vector to obtain an initial cluster set; wherein the global vector index is determined based on a single-layer hierarchical navigable small-world graph, and the search parameters are used to control the size of the initial cluster set;

[0057] An extended search module is used to perform an extended search in the global vector index, starting from the initial cluster set, to obtain each neighboring cluster when the vector distribution in the initial cluster set does not meet the preset conditions.

[0058] The determination module is used to determine the target cluster set based on the initial cluster set and each of the neighboring clusters;

[0059] The second search module is used to perform a search on the target cluster set in the local vector index of the vector index to obtain the target vector, wherein the local vector index is determined based on the inverted file flat (IVFFlat) vector index.

[0060] In a third aspect, this application provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the above-described method.

[0061] In a fourth aspect, this application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the above-described method.

[0062] In a fifth aspect, this application provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the above-described method.

[0063] The aforementioned vector retrieval method, apparatus, computer device, computer-readable storage medium, and computer program product first receive a query vector and then, based on search parameters, search the global vector index of the vector index with the query vector to obtain an initial cluster set, wherein the global vector index is determined based on a single-layer hierarchical navigable small-world graph and the search parameters are used to control the size of the initial cluster set. In the global layer, a single-layer HNSW structure is used to index the cluster centers, retaining only basic connections. This achieves efficient coarse-grained candidate cluster screening and avoids the high memory consumption and graph construction complexity caused by multi-layer HNSW structures, while the nearest neighbor graph navigation capability of HNSW enables rapid screening of candidate clusters, significantly shortening the initial retrieval path and improving the efficiency of coarse-grained search in a large-scale vector space. Secondly, if the vector distribution in the initial cluster set does not meet the preset conditions, an extended search is performed in the global vector index, starting from the initial cluster set, to obtain neighboring clusters, and the target cluster set is determined based on the initial cluster set and the neighboring clusters. When the number of initial clusters in the global layer is insufficient or their distribution is sparse, neighboring clusters are added through the HNSW graph topology to alleviate the missed detection problem caused by fuzzy cluster boundaries and further improve recall accuracy. Finally, a search is performed on the target cluster set in the local vector index of the vector index to obtain the target vector, wherein the local vector index is determined based on the inverted file flat (IVFFlat) vector index. In the local layer, an independent flat vector store is maintained for each cluster based on the IVFFlat index, which integrates the advantages of the IVFFlat index: it ensures the accuracy of the local search while significantly reducing memory consumption.

Attached Figure Description

[0064] To more clearly illustrate the technical solutions in the embodiments of this application or related technologies, the drawings used in the description of the embodiments of this application or related technologies will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0065] Figure 1 is a flowchart illustrating a vector retrieval method in one embodiment;

[0066] Figure 2 is a schematic diagram of the process for obtaining the target vector in one embodiment;

[0067] Figure 3 is a flowchart illustrating the method for updating search parameters in one embodiment;

[0068] Figure 4 is a flowchart illustrating how a vector index is determined in one embodiment;

[0069] Figure 5 is a flowchart illustrating the method for updating a vector index in one embodiment;

[0070] Figure 6 is a flowchart illustrating the vector index update method in another embodiment;

[0071] Figure 7 is a flowchart illustrating the vector retrieval method in another embodiment;

[0072] Figure 8 is a structural block diagram of a vector retrieval device in one embodiment;

[0073] Figure 9 is an internal structural diagram of a computer device in one embodiment.

Detailed Implementation

[0074] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0075] In one embodiment, as shown in Figure 1, a vector retrieval method is provided. This embodiment illustrates the application of this method to a terminal. It is understood that this method can also be applied to a server, and further to a system including both a terminal and a server, in which case it is implemented through the interaction between the terminal and the server. In this embodiment, the method includes the following steps S102 to S110:

[0076] Step S102: Receive the query vector.

[0077] Optionally, the terminal receives the input query vector $q$, $q \in \mathbb{R}^d$, where $\mathbb{R}^d$ represents the d-dimensional real space.

[0078] Step S104: Based on the search parameters, search the global vector index of the vector index with the query vector to obtain the initial cluster set.

[0079] The global vector index is predetermined based on a single-layer hierarchical navigable small-world (HNSW) graph index. The search parameter efSearch controls the size of the initial cluster set. efSearch is given a preset initial value and is automatically optimized after each retrieval so that the value used in the next retrieval is appropriate, thus balancing recall accuracy against retrieval efficiency.

[0080] Optionally, the terminal uses the global vector index, such as an HNSW graph index, to search the predetermined cluster centers for those nearest to the query vector q, obtaining the candidate cluster index set, as shown in formula (1). Because this search runs on a single-layer graph whose nodes are only the cluster centers, it is fast (typically <5 ms), far faster than searching the original vectors directly.

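[0081] $C_{\text{init}} = \operatorname{Search}_{\text{HNSW}}(q, \text{efSearch})$ Formula (1)

[0082] In the formula, $C_{\text{init}}$ represents the candidate cluster index set, $q$ is the query vector, and efSearch is the search parameter.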
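As an illustration of the coarse search, the following sketch runs a best-first search on a single-layer proximity graph over the cluster centers, which is a common way to search one HNSW layer. It is a sketch under stated assumptions and not the procedure of this application: the adjacency dictionary `neighbors`, the entry node `entry`, and the function name `search_centers` are hypothetical, and L2 distance is assumed.

```python
import heapq
import math


def search_centers(query, centers, neighbors, entry, ef_search):
    """Return up to ef_search center ids, nearest first.

    centers: list of center vectors; neighbors: dict node id -> iterable of node ids.
    """
    d0 = math.dist(query, centers[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap of nodes still to expand
    best = [(-d0, entry)]        # max-heap (negated distance) of current results
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop when the closest unexpanded node is worse than the worst kept result.
        if len(best) >= ef_search and d > -best[0][0]:
            break
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, centers[nb])
            if len(best) < ef_search or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef_search:
                    heapq.heappop(best)
    return [i for _, i in sorted((-nd, i) for nd, i in best)]
```

A larger `ef_search` keeps more candidates alive, which returns a larger initial cluster set at the cost of more distance computations.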

[0083] Step S106: If the vector distribution in the initial cluster set does not meet the preset conditions, start from the initial cluster set and perform an extended search in the global vector index to obtain each neighboring cluster.

[0084] Optionally, if the vector distribution in the initial cluster set does not meet the preset conditions, the terminal takes the initial cluster set as the starting point and performs an extended search in the global vector index to obtain each neighboring cluster.

[0085] Furthermore, if the vector distribution of the initial cluster set is sparse, neighboring clusters are added. If the vector distribution in the initial cluster set does not meet the preset conditions, the terminal determines that the query vector q is located at the boundary of multiple clusters, where there is a risk of missed detection, and triggers an expanded search: starting from each cluster center in the initial cluster set, a local breadth-first search (BFS) is performed on the HNSW graph, supplementing at most a preset number of neighboring clusters, for example 8.

[0086] Step S108: Determine the target cluster set based on the initial cluster set and each neighboring cluster.

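[0087] Optionally, the terminal determines the target cluster set based on the initial cluster set and each neighboring cluster, as shown in formula (2).

[0088] $C_{\text{target}} = C_{\text{init}} \cup C_{\text{nbr}}$ Formula (2)

[0089] In the formula, $C_{\text{init}}$ is the initial cluster set, $C_{\text{nbr}}$ is the set of neighboring clusters obtained by the expanded search, and the size of the target cluster set $C_{\text{target}}$ does not exceed 32.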

[0090] Traditional IVF uses a fixed nprobe count, which leads to a sharp drop in recall in boundary regions, potentially missing key vectors that are relevant to the query but belong to sparse or distant clusters. Dynamically expanding the initial cluster set can significantly improve the recall of boundary queries while avoiding the performance waste caused by expanding the search scope for all queries.
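The expansion and merge steps can be sketched as a bounded breadth-first search over the center graph. The limits of 8 added clusters and 32 total clusters follow the examples in the text. The hop limit `max_hops`, the adjacency dictionary `neighbors`, and the function name `expand_clusters` are assumptions made for illustration.

```python
from collections import deque


def expand_clusters(initial, neighbors, max_extra=8, max_total=32, max_hops=1):
    """Return the initial clusters followed by neighbors found by a bounded BFS."""
    initial = list(dict.fromkeys(initial))[:max_total]  # dedupe, keep order
    seen = set(initial)
    queue = deque((c, 0) for c in initial)
    extra = []

    def full():
        return len(extra) >= max_extra or len(initial) + len(extra) >= max_total

    while queue and not full():
        node, hops = queue.popleft()
        if hops >= max_hops:
            continue  # do not expand beyond the hop limit
        for nb in neighbors.get(node, ()):
            if nb in seen:
                continue
            seen.add(nb)
            extra.append(nb)
            queue.append((nb, hops + 1))
            if full():
                break
    return initial + extra
```

Because expansion is called only for queries that fail the distribution check, well-separated queries keep the smaller initial set.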

[0091] Step S110: Perform a search on the target cluster set in the local vector index of the vector index to obtain the target vector.

[0092] The local vector index is predetermined based on the inverted file flat (IVFFlat) vector index.

[0093] Optionally, the terminal performs a search on the target cluster set in the local IVFFlat vector index of the vector index, returns the nearest neighbor results, and obtains the target vector.

[0094] In the aforementioned vector retrieval method, the first step is to receive a query vector. Based on search parameters, the global vector index of the vector index is searched with the query vector to obtain an initial cluster set. The global vector index is determined based on a single-layer hierarchical navigable small-world graph. Search parameters control the size of the initial cluster set. A single-layer HNSW structure is used in the global layer to index the cluster centers, retaining only basic connections. This enables efficient coarse-grained candidate cluster screening, avoiding the high memory consumption and graph construction complexity of multi-layer HNSW structures. Simultaneously, the nearest neighbor graph navigation capability of HNSW is utilized to achieve rapid screening of candidate clusters, significantly shortening the initial retrieval path and improving the efficiency of coarse-grained search in a large-scale vector space. Secondly, if the vector distribution in the initial cluster set does not meet preset conditions, an expansion search is performed in the global vector index, starting from the initial cluster set, to obtain neighboring clusters. Based on the initial cluster set and neighboring clusters, the target cluster set is determined. When the number of initial clusters in the global layer is insufficient or their distribution is sparse, neighboring clusters are added through the HNSW graph topology to alleviate the missed detection problem caused by blurred cluster boundaries, further improving recall accuracy. Finally, a search is performed on the target cluster set in the local vector index of the vector index to obtain the target vector. The local vector index is determined based on the inverted file flat (IVFFlat) vector index. At the local level, an independent flat vector store is maintained for each cluster based on the IVFFlat index, which further integrates the advantages of the IVFFlat index. IVFFlat is also used to ensure the accuracy of the local search, while significantly reducing memory usage.

[0095] In an exemplary embodiment, if the vector distribution in the initial cluster set does not meet the preset conditions, before performing an expanded search in the global vector index starting from the initial cluster set, the method further includes: calculating the distance variance from the query vector to each cluster center in the initial cluster set; if the distance variance is less than the distance threshold, it is determined that the vector distribution in the initial cluster set does not meet the preset conditions.

[0096] Optionally, the terminal computes the variance of the distances from the query vector q to the cluster centers in the initial cluster set. If this distance variance is less than a threshold, for example 0.05 (indicating that the centers are too concentrated), it is determined that the vector distribution in the initial cluster set does not meet the preset conditions. At this point, the terminal determines that the query vector q is located at the boundary of multiple clusters, where there is a risk of missed detection, so the initial cluster set needs to be expanded.

[0097] In this embodiment, fuzzy queries located at the boundaries of multiple clusters are identified by analyzing the variance of the distance from the query vector to the initial cluster center. When high distribution sparsity is detected, a local breadth-first search (BFS) on the HNSW graph is automatically triggered to expand neighboring clusters, effectively alleviating the problem of missed detections at cluster boundaries in traditional IVF methods and significantly improving the overall recall rate in complex distribution scenarios.
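The variance check can be sketched in a few lines. The threshold 0.05 is the example value from the text. The use of the population variance, the L2 distance, and the rule that fewer than two centers also triggers expansion are assumptions, since the text does not fix them.

```python
import math
from statistics import pvariance


def needs_expansion(query, initial_centers, threshold=0.05):
    """True when the distances from the query to the initial centers are nearly equal."""
    if len(initial_centers) < 2:
        return True  # assumption: too few clusters also fails the condition
    dists = [math.dist(query, c) for c in initial_centers]
    return pvariance(dists) < threshold
```

A query that is roughly equidistant from several centers yields a small variance, which is the boundary case the expansion is meant to cover.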

[0098] In one exemplary embodiment, as shown in Figure 2, performing a search on the target cluster set in the local vector index of the vector index to obtain the target vector includes the following steps S202 to S208. Wherein:

[0099] Step S202: For each cluster in the target cluster set, calculate the distance between the query vector and each vector in the cluster in the local vector index corresponding to the cluster.

[0100] Optionally, for each cluster in the target cluster set, the terminal calculates the distance between the query vector and each vector in the cluster in the local vector index corresponding to the cluster: it traverses all vectors stored in the cluster and calculates the distance $d_i$ between the query vector $q$ and each vector $x_i$. The distance metric can be a vector distance metric such as L2 distance, inner product or cosine similarity.

[0101] Step S204: Determine the initial vector of each cluster based on the distances of each cluster.

[0102] Optionally, the terminal sorts the vectors within each cluster in ascending order of distance and selects the L vectors with the smallest distances as the initial vectors of that cluster, forming a local Top-L candidate set. The number of distances is equal to the number of vectors in each cluster.

[0103] Step S206: Merge the initial vectors of each cluster to form an initial vector set.

[0104] Optionally, the terminal traverses all clusters and adds all vectors from the candidate cluster set corresponding to each cluster to the global candidate pool to form an initial vector set.

[0105] Step S208: Sort the initial vectors in the initial vector set to obtain the target number of target vectors.

[0106] Optionally, the terminal obtains the distance between each initial vector in the initial vector set and the query vector (if it has been calculated and cached in the previous steps, it can be used directly); sorts all initial vectors by distance from smallest to largest; selects the top K vectors after sorting as the final target vectors, i.e., the global Top-K approximate nearest neighbor result; and outputs the target vector set containing the target number of target vectors, where K is the target number, which can be set freely according to the actual situation.

[0107] In this embodiment, performing an exact linear scan (brute-force search) within each cluster avoids the precision loss caused by quantization or compression and ensures that the true nearest neighbor vectors are found within each local range, thus guaranteeing retrieval accuracy. At the same time, the local searches of the clusters are independent of one another, so they naturally support parallel processing and can be executed simultaneously on different CPU cores or different computing nodes, significantly improving the overall retrieval throughput.
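Steps S202 to S208 can be sketched as a per-cluster exact scan followed by a global merge. L2 distance is assumed here, although the text also allows inner product or cosine similarity. The data layout `cluster_vectors` (cluster id mapped to a list of `(vector id, vector)` pairs) and the function name `search_clusters` are hypothetical.

```python
import heapq
import math


def search_clusters(query, cluster_vectors, target_clusters, top_l, top_k):
    """Exact scan of each target cluster, then merge into a global Top-K.

    Returns a list of (distance, vector id) pairs, nearest first.
    """
    pool = []
    for cid in target_clusters:
        # S202: distance from the query to every vector stored in this cluster.
        scored = ((math.dist(query, vec), vid) for vid, vec in cluster_vectors[cid])
        # S204 and S206: keep the local Top-L and add it to the global pool.
        pool.extend(heapq.nsmallest(top_l, scored))
    # S208: sort the pooled candidates and keep the K nearest.
    return heapq.nsmallest(top_k, pool)
```

The loop body touches only one cluster, so each iteration could be handed to a separate worker, with the final merge applied to the combined pool.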

[0108] In an exemplary embodiment, the method for determining search parameters includes: obtaining the current state of the query vector for this query, the current state including query complexity, average recall rate, and average latency; selecting a current action from a preset action space based on the query complexity, average recall rate, and average latency; and using the current action as the search parameter used for this query.

[0109] Optionally, the terminal uses a Q-learning-based adaptive parameter engine to dynamically adjust the HNSW search parameter efSearch, achieving performance self-optimization. This engine consists of three parts: a state space, an action space, and a reward function. Specifically:

[0110] (1) State space $S$: describes the current query context and consists of three parts: the query complexity, the average recall rate (e.g., the average recall rate of the most recent queries), and the average latency (e.g., the average latency or the P95 latency of the most recent queries), as shown in formula (3).

[0111] $s_t = (\sigma_q^2, \bar{R}, \bar{L})$ Formula (3)

[0112] In the formula, $\sigma_q^2$ is the query complexity of the query vector, defined as the variance of the distances from the query vector $q$ to its nearest cluster centers. It reflects the fuzziness of the local distribution of the query vector in the cluster space: the smaller the variance, the more likely it is that the query vector is close to multiple centers and located in a cluster boundary region, which makes retrieval difficult; the larger the variance, the clearer the affiliation and the easier the retrieval. $\bar{R}$ is the average recall rate of the most recent queries (sliding window) and reflects the quality of recent searches; the higher the value, the better the search quality. $\bar{L}$ is the average latency (or P95 latency) of the most recent queries and reflects the current system load level; a higher value indicates higher latency. The state space can be discretized by bucketing (e.g., classifying variance, recall, and latency as low / medium / high) to control its size and avoid dimensionality explosion.

[0113] (2) Action space $A$: defines the set of optional efSearch parameter values, as shown in formula (4).

[0114] $A = \{a_1, a_2, \ldots, a_m\}$ Formula (4)

[0115] In the formula, $m$ is the number of efSearch values, and each $a_i$ represents an action, that is, an efSearch value, typically ranging from 10 to 200, for example ... .

[0116] (3) Reward function $r$: quantifies the overall performance of this query, as shown in formula (5).

[0117] $r_t = \alpha \cdot R_t - \beta \cdot L_t$ Formula (5)

[0118] In the formula, $R_t$ indicates the recall rate of this query, $L_t$ indicates the latency of this query (or the P95 latency), and $\alpha$ and $\beta$ are weighting coefficients, which can be chosen, for example, to prioritize accuracy. This function guides the system to automatically balance high recall with low latency.

[0119] (4) Initialize the Q-table as shown in formula (6).

[0120] $Q(s, a) = 0,\ \forall s \in S,\ a \in A$ Formula (6)

[0121] In the formula, $s$ is a state in the state space $S$, and $a$ is an action in the action space $A$.

[0122] By initializing Q-table, the retrieval parameter efSearch can be dynamically adjusted to achieve performance self-optimization.

[0123] Optionally, the current state of the query vector for this query is obtained, including query complexity, average recall, and average latency. The terminal retrieves the Z nearest cluster centers of the query vector q through the single-layer HNSW index, calculates the set of distances from q to these cluster centers, and calculates the variance of these distances to determine the query complexity. The terminal maintains a sliding window of size D, records the recall rates of the most recent D queries, and calculates the average recall rate within the sliding window. The terminal maintains a sliding window of the same size, records the response latency of the most recent D queries, and calculates the average or P95 value of the query latency within the sliding window.

[0124] Optionally, regarding how the historical average recall rate rhist is obtained, and in particular how the recall rate of the current query is obtained, one feasible approach is as follows. A high-precision reference index (such as a Flat index based on brute-force search) is constructed, and an exact nearest-neighbor search is performed on it for the input query to obtain the true Top-K nearest-neighbor result set. The approximate Top-K result set returned by the current retrieval system is then obtained, and the recall rate of the single query is calculated as the proportion of the true Top-K results that also appear in the approximate result set. The historical recall rate is finally updated dynamically using a sliding window or an exponentially weighted average, and serves as part of the reinforcement learning state to guide the adaptive adjustment of the efSearch parameter.
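
The recall measurement described above can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the function and class names are hypothetical, the exact Top-K ids are assumed to come from a brute-force reference index, and only the sliding-window variant of the historical average is shown.

```python
from collections import deque


def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact Top-K ids that also appear in the approximate result."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)


class SlidingAverage:
    """Mean of the most recent `window` observations (the window size D in the text)."""

    def __init__(self, window):
        self._values = deque(maxlen=window)

    def add(self, value):
        self._values.append(value)

    def mean(self):
        return sum(self._values) / len(self._values) if self._values else 0.0


# Example: Top-5 ground truth from a brute-force index vs. an approximate result.
r_query = recall_at_k([1, 2, 3, 4, 5], [1, 2, 3, 9, 5])
r_hist = SlidingAverage(window=3)
for r in (1.0, 0.5, r_query, 1.0):
    r_hist.add(r)
```

With a window of 3, only the three most recent recall values contribute to the average, so the oldest observation is dropped automatically.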

[0125] Furthermore, the terminal maps the continuous current state to a discrete current state, such as st = (low variance, medium recall, high latency). For example, σq² is divided into three levels: low variance (<0.02), medium variance (0.02-0.05), and high variance (>0.05); rhist is divided into four levels: poor (<0.85), medium (0.85-0.92), good (0.92-0.97), and excellent (>0.97); lhist is divided into three levels: low latency (<50 ms), medium latency (50-100 ms), and high latency (>100 ms). Based on the current state st, the terminal selects the current action at from the preset action space, and the selected current action at is used as the efSearch parameter value for this query.
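
The bucketing described above can be sketched as follows. The bucket boundaries come from the text; the function names are hypothetical, and the handling of a value that falls exactly on a boundary (it goes to the upper bucket) is an assumption, since the text leaves it open.

```python
from bisect import bisect_right

# Bucket boundaries taken from the text; boundary handling is an assumption.
VARIANCE_EDGES = (0.02, 0.05)
RECALL_EDGES = (0.85, 0.92, 0.97)
LATENCY_EDGES_MS = (50.0, 100.0)

VARIANCE_LABELS = ("low", "medium", "high")
RECALL_LABELS = ("poor", "medium", "good", "excellent")
LATENCY_LABELS = ("low", "medium", "high")


def bucket(value, edges, labels):
    """Map a continuous value to a label; a value equal to an edge goes to the upper bucket."""
    return labels[bisect_right(edges, value)]


def discretize_state(variance, avg_recall, avg_latency_ms):
    """Return the discrete state tuple (variance level, recall level, latency level)."""
    return (
        bucket(variance, VARIANCE_EDGES, VARIANCE_LABELS),
        bucket(avg_recall, RECALL_EDGES, RECALL_LABELS),
        bucket(avg_latency_ms, LATENCY_EDGES_MS, LATENCY_LABELS),
    )


state = discretize_state(0.01, 0.90, 120.0)
```

With three, four, and three levels respectively, the discrete state space has 36 states, which keeps a tabular Q-table small.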

[0126] In one exemplary embodiment, as shown in Figure 3, the method for updating the search parameters includes steps S302 to S308, wherein:

[0127] Step S302: Obtain the actual recall rate and actual latency of this query.

[0128] Optionally, after determining the search parameters to be used in this query, the terminal performs the search using those parameters. Upon completion of the search, the terminal obtains the actual recall rate and the actual latency of this query.

[0129] Step S304: Calculate the reward for this query based on the actual recall rate and the actual latency.

[0130] Optionally, the terminal uses formula (5) to determine the reward from the actual recall rate and the actual latency.

[0131] Step S306: Based on the current state, current action, reward, and next state, determine the next action corresponding to the next state.

[0132] Optionally, the terminal updates the Q-table based on the reward of this query using formula (7).

[0133] $$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ R_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$ Formula (7)

[0134] In the formula, $Q(s_t, a_t)$ represents the Q-value (i.e., the expected cumulative reward) of taking the current action $a_t$ in the current state $s_t$; $\alpha$ is the learning rate, which controls how strongly new information overrides the old Q-value and ranges from 0 to 1, a larger value giving a larger update; $\gamma$ is the discount factor, which represents the importance attached to future rewards and ranges from 0 to 1, a value closer to 1 placing more importance on future rewards; $s_t$ is the current state; $a_t$ is the current action; $s_{t+1}$ is the next state; $R_t$ is the reward; $\leftarrow$ is the assignment operation, indicating that the Q-value on the left is updated with the result computed on the right; $\max_{a'} Q(s_{t+1}, a')$ is the maximum Q-value for the next state; and $a'$ ranges over the possible actions in the next state.

[0135] Step S308: Use the next action as the updated search parameters.

[0136] Optionally, the terminal uses formula (8) to take the next action as the updated search parameter. The recommended efSearch value for the next query is then output.

[0137] $$a_{t+1} = \arg\max_{a'} Q(s_{t+1}, a')$$ Formula (8)

[0138] In the formula, $a_{t+1}$ is the parameter recommended for the next query; $\arg\max$ returns the argument that maximizes the function value, i.e., the action that produces the maximum Q-value; and $a'$ is a possible action in the next state, ranging over all efSearch values in the action space.
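
The Q-table update and the selection of the next action can be sketched with tabular Q-learning as follows. This is an illustrative sketch under stated assumptions: the candidate efSearch values, the learning rate, the discount factor, and the zero initialization of the Q-table are all assumed values, and the reward is passed in directly rather than computed from formula (5).

```python
from collections import defaultdict

ACTIONS = (10, 50, 100, 200)   # hypothetical efSearch candidates within the 10-200 range
ALPHA = 0.1                    # learning rate (assumed value)
GAMMA = 0.9                    # discount factor (assumed value)

# Q-table: state -> {action: value}; zero initialization is an assumption.
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def update_q(state, action, reward, next_state):
    """Tabular Q-learning update for one observed query."""
    best_next = max(q_table[next_state].values())
    old = q_table[state][action]
    q_table[state][action] = old + ALPHA * (reward + GAMMA * best_next - old)


def recommend_ef_search(next_state):
    """Greedy choice: the action with the largest Q-value in the next state."""
    row = q_table[next_state]
    return max(row, key=row.get)


s_t = ("low", "medium", "high")
s_next = ("low", "good", "medium")
update_q(s_next, 100, reward=1.0, next_state=s_next)   # seed one entry for the demo
update_q(s_t, 50, reward=0.5, next_state=s_next)
next_ef = recommend_ef_search(s_next)
```

A purely greedy choice never tries untested actions, so a practical system would probably add some exploration (for example, occasionally picking a random action); the text does not specify one.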

[0139] In this embodiment, the efSearch value is innovatively obtained through reinforcement learning. efSearch is the size of the candidate set during HNSW index search. Traditional systems rely on manual parameter tuning, which adapts poorly to dynamic query loads. This engine automatically identifies "fuzzy queries" and increases efSearch for them through online learning, while lowering the parameter for "explicit queries" to save resources, achieving dynamic adaptive optimization and reducing operational costs. Obtaining the historical P95 latency is relatively simple, as mature vector databases all expose this metric. Rewards are used to update the reinforcement learning model online, adjusting its strategy so that it provides more accurate efSearch parameters for the retrieval phase and gradually reaches a dynamic balance between retrieval accuracy and response latency.

[0140] In one exemplary embodiment, as shown in Figure 4, the method for determining the vector index includes steps S402 to S408, wherein:

[0141] Step S402: Obtain the vector dataset.

[0142] Optionally, before receiving the query vector, the terminal obtains an initial vector dataset and normalizes it, for example by L2 (Euclidean) normalization, to obtain the vector dataset $X = \{x_1, x_2, \ldots, x_N\}$ with $x_i \in \mathbb{R}^d$, where $N$ is the total number of vectors and $\mathbb{R}^d$ is the d-dimensional real space. Normalization ensures $\|x_i\|_2 = 1$, which unifies the scale of the vectors and improves the stability of subsequent clustering and distance calculation.

[0143] Step S404: Process the vector dataset based on cluster analysis to obtain the cluster center set.

[0144] Optionally, the terminal processes the vector dataset X with a clustering analysis method, such as K-Means++ clustering, to generate M cluster centers $C = \{c_1, c_2, \ldots, c_M\}$, where C is the set of cluster centers, j takes values from 1 to M, and M is a positive integer greater than 1. The value of M is usually chosen empirically. The terminal then assigns each vector $x_i$ to its nearest cluster center according to formula (9).

[0145] $$\mathrm{assign}(x_i) = \arg\min_{j \in \{1, \ldots, M\}} \| x_i - c_j \|_2^2$$ Formula (9)

[0146] In the formula, $\mathrm{assign}(\cdot)$ is the assignment function; $\arg\min$ returns the argument that minimizes the expression; $c_j$ represents the j-th cluster center; $x_i$ represents the i-th vector in the dataset; and $\| x_i - c_j \|_2^2$ represents the square of the Euclidean distance between the vector $x_i$ and $c_j$.

[0147] It is worth noting that this setting of M is an empirical balance: too few clusters make each cluster too large, leading to high IVF search overhead, whereas too many clusters increase the complexity of the HNSW coarse screening. The chosen setting offers a good trade-off between accuracy and efficiency. Each cluster center represents one cluster, and a cluster is the set of vectors belonging to that cluster center, so the number of clusters equals the number of cluster centers.
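
The normalization and nearest-center assignment steps can be sketched as follows. This sketch does not implement K-Means++ itself: the two cluster centers are hand-picked hypothetical values, and the function names are illustrative only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vec, centers):
    """Index j of the cluster center with the smallest squared Euclidean distance."""
    return min(range(len(centers)), key=lambda j: squared_l2(vec, centers[j]))


def group_by_cluster(vectors, centers):
    """Return {cluster index: [vector indices]} using nearest-center assignment."""
    clusters = {j: [] for j in range(len(centers))}
    for i, vec in enumerate(vectors):
        clusters[assign(vec, centers)].append(i)
    return clusters


data = [l2_normalize(v) for v in ([3.0, 4.0], [1.0, 0.0], [0.0, 2.0], [1.0, 0.1])]
centers = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical centers, not produced by K-Means++
clusters = group_by_cluster(data, centers)
```

Because minimizing the squared distance and minimizing the distance select the same center, the square root can be skipped during assignment.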

[0148] Step S406: Construct a global vector index for a single-layer hierarchical navigable small-world graph based on the cluster center set.

[0149] Optionally, the terminal builds an HNSW index on the cluster center set C, but configures it as a single-level graph structure (the number of levels is limited to one), which means that multi-level sampling and hierarchical mapping are not performed. The entry point can be set to the global cluster center or to any cluster center.

[0150] A single-layer HNSW graph structure is constructed on the cluster center set C, serving as the coarse-grained index, i.e., the global vector index. The parameters to be set are the distance metric (L2 distance), the average number of neighbors per node, and the construction parameter efConstruction. It is worth noting that the cluster centers are indexed with a single-level HNSW index rather than a standard multi-level HNSW index. This is because the number of cluster centers is already small enough that the search efficiency gained from a standard multi-level HNSW is limited. Furthermore, the redundant connections and index metadata in the multi-level graph structure introduce 20% to 80% additional memory overhead. Building a single-level HNSW therefore retains the efficient nearest-neighbor graph navigation of HNSW while significantly reducing memory consumption. In hnswlib, the level multiplier (level mult) is the core parameter controlling the probability with which nodes are assigned to higher levels. By setting the level multiplier accordingly, all nodes are forced onto the same level (usually level 0), yielding a purely single-level graph structure rather than a standard multi-level HNSW. This innovation reduces the memory footprint of the HNSW index from that of a multi-layer graph to that of a single layer; the theoretical savings rate is determined by the number of cluster centers, the number of standard HNSW layers, and the average number of neighbors per node. For example, assuming that storing one neighbor node ID requires 4 bytes, comparing the memory occupied by a multi-layer HNSW with that of a single-layer HNSW gives savings of up to approximately 89.1%. Furthermore, the entry point can be set to the global centroid or to any default centroid, ensuring graph reachability. At this scale, the search efficiency of a single-layer HNSW is almost identical to that of a multi-layer structure (the difference is <1 ms in actual tests), while the memory savings are significant, making it particularly suitable for embedded or resource-constrained environments.

[0151] Step S408: For each cluster center in the cluster center set, construct a local vector index.

[0152] Optionally, for each cluster center in the cluster center set, the terminal constructs an IVFFlat local vector index: the terminal groups the vector data according to the clustering results and, according to formula (10), builds an IVFFlat local vector index Ij for the j-th cluster.

[0153] $$I_j = \mathrm{IVFFlat}(X_j)$$ Formula (10)

[0154] In the formula, $\mathrm{IVFFlat}(X_j)$ indicates that an IVFFlat index is built on the set $X_j$ (i.e., all vectors are stored directly and brute-force search is supported), and $X_j$ is the set of all vectors assigned to the j-th cluster.

[0155] The IVFFlat structure stores the original vectors directly (uncompressed), supporting exact brute-force distance calculations within a cluster. IVFFlat avoids the precision loss associated with compression encodings such as PQ and SQ, ensuring high recall for local searches. Given the average size of each cluster, a linear scan on a modern CPU takes only about 1-2 ms, which is acceptable. Furthermore, in distributed deployments, each local vector index can be stored and queried independently, supporting parallel processing and improving throughput.

[0156] In this embodiment, the constructed global vector index based on a single-layer HNSW and the local vector index based on IVFFlat can provide a foundation for subsequent vector retrieval.

[0157] In an exemplary embodiment, constructing a global vector index for a single-layer hierarchical navigable small-world graph based on a set of cluster centers includes: creating an initial graph structure, which includes nodes and edges connecting the nodes; starting from any node of the initial graph structure, traversing each cluster center in the set of cluster centers, and inserting each cluster center into each node of the initial graph structure.

[0158] The initial graph structure includes nodes and edges connecting the nodes; the initial graph structure is an empty graph structure, and the nodes are used to store cluster centers.

[0159] Optionally, the terminal first creates an empty graph structure, called the initial graph structure, and selects a starting node, which can be any cluster center or the global cluster center (global centroid) of the entire set; this starting node is also the initial entry point. The terminal then traverses each center cj in the cluster center set C and inserts it into the initial graph structure, performing the following operations for each cj. (1) Determine the node level. In a standard multi-level HNSW, the level l of each node is drawn from an exponentially decaying probability distribution, such as floor(-ln(uniform(0,1)) * ml), which makes high-level nodes scarce. By setting level mult = 1.0, this probability distribution is disabled, so that the level l of each node cj is fixed to 0. In this way, all nodes exist at the same level, forming a single-level graph. (2) Search for the nearest neighbors in the current graph. Starting from the current entry point, the search is performed only at level 0 because the graph has a single level: check all neighbors of the current node, jump to the neighbor that is closer to the node cj to be inserted, and repeat until no closer neighbor can be found. This final node is called the local nearest neighbor. To find more, and more accurate, candidate nodes, this process uses a dynamic candidate list whose size is controlled by the efConstruction parameter. Finally, the top-M nearest neighbors (M here being the number of connections per node) are selected from this candidate list as candidates for the subsequent connections. (3) Establish edges. M connections are created for the newly inserted node cj, linking it to the M nearest existing nodes in the graph. The construction process ends when all cluster centers cj have been inserted into the graph structure G.
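
The three insertion steps above can be sketched in plain Python as follows. This is a simplified illustration, not the hnswlib implementation: every node is placed at level 0 directly, neighbor-degree pruning is omitted, node 0 is assumed to be the entry point, and the names `search_layer` and `build_single_layer_graph` are hypothetical.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_layer(points, graph, query, entry, ef):
    """Best-first search on the single layer; returns up to `ef` (distance, node) pairs, nearest first."""
    d0 = squared_l2(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]        # min-heap of nodes still to expand
    results = [(-d0, entry)]          # max-heap (negated distances) of the best nodes found
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0] and len(results) >= ef:
            break                     # nearest open candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = squared_l2(query, points[nb])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-neg, node) for neg, node in results)


def build_single_layer_graph(points, m=2, ef_construction=8):
    """Insert every point at level 0 and link it to its `m` nearest found neighbors."""
    graph = {0: set()}                # node 0 acts as the entry point (assumption)
    for idx in range(1, len(points)):
        found = search_layer(points, graph, points[idx], 0, ef_construction)
        graph[idx] = set()
        for _, nb in found[:m]:
            graph[idx].add(nb)        # edges are kept bidirectional;
            graph[nb].add(idx)        # degree pruning is omitted in this sketch
    return graph


centers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
graph = build_single_layer_graph(centers)
nearest = search_layer(centers, graph, [4.9, 5.2], entry=0, ef=3)[0][1]
```

The same `search_layer` routine serves both construction (with `ef_construction`) and querying (with the efSearch value), which is why the candidate-list size directly trades accuracy against work per query.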

[0160] In this embodiment, by constructing a single-layer global vector index, the query vector can be quickly navigated to its nearest cluster centers in the subsequent vector retrieval stage, thereby greatly improving the efficiency of the coarse screening stage.

[0161] In an exemplary embodiment, for each cluster center in the cluster center set, a local vector index is constructed, including: for each cluster center, obtaining the vector set of the cluster corresponding to the cluster center; obtaining the storage space of each vector set; copying the vectors in each vector set to the corresponding storage space, thus forming a local vector index.

[0162] Optionally, for each of the M cluster centers, the terminal obtains a set of vectors belonging to the same cluster, such as the set Xj of all vectors assigned to the j-th cluster, allocates a contiguous memory space for the vector set Xj in memory, and copies all vectors in the vector set Xj to the contiguous memory space to form an IVFFlat local vector index Ij.
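
A flat per-cluster store of this kind can be sketched as follows. The class and function names are hypothetical, and the standard-library `array` type (32-bit floats) stands in for the contiguous memory space; a production index would typically use a native buffer instead.

```python
from array import array


class FlatClusterIndex:
    """Uncompressed vectors of one cluster, stored back to back in a single float array."""

    def __init__(self, dim):
        self.dim = dim
        self.ids = []                 # original vector ids, in storage order
        self.data = array("f")        # contiguous buffer of len(ids) * dim floats

    def add(self, vec_id, vec):
        """Append one vector to the end of the contiguous buffer."""
        if len(vec) != self.dim:
            raise ValueError("dimension mismatch")
        self.ids.append(vec_id)
        self.data.extend(vec)

    def vector(self, pos):
        """Return the vector stored at position `pos`."""
        start = pos * self.dim
        return self.data[start:start + self.dim]

    def __len__(self):
        return len(self.ids)


def build_local_indexes(vectors, assignments, num_clusters):
    """Copy each vector into the flat index of the cluster it was assigned to."""
    dim = len(vectors[0])
    indexes = [FlatClusterIndex(dim) for _ in range(num_clusters)]
    for vec_id, (vec, j) in enumerate(zip(vectors, assignments)):
        indexes[j].add(vec_id, vec)
    return indexes


vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.25]]
local = build_local_indexes(vectors, assignments=[0, 1, 0, 1], num_clusters=2)
```

Adding a vector later is the same `add` call, which matches the append-only insertion into a flat structure.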

[0163] The IVFFlat local vector index Ij stores the vector itself without any compression or quantization.

[0164] In this embodiment, by establishing local vector indexes for clusters corresponding to each cluster center, it is possible to ensure accurate searches within clusters without loss of precision.

[0165] In one exemplary embodiment, as shown in Figure 5, the vector index update method includes steps S502 to S508, wherein:

[0166] Step S502: When the update method is to add a vector, obtain the added vector, and search the global vector index to find the target cluster center of the added vector, and determine the target cluster based on the target cluster center.

[0167] Optionally, when the update method is to add a vector, the terminal obtains the added vector and uses the single-layer HNSW global index built on the cluster center set to quickly search for the cluster center nearest to the added vector. By calculating the distance between the added vector and the cluster centers, the nearest cluster center cj is determined as the target cluster center, and the cluster j corresponding to the target cluster center is then determined as the target cluster.

[0168] Step S504: Insert the newly added vector into the local vector index corresponding to the target cluster.

[0169] Optionally, the terminal adds the added vector directly to the IVFFlat local vector index Ij of the target cluster j. Since the IVFFlat index uses a flat storage structure, this operation essentially appends the added vector to the contiguous memory storage space corresponding to the cluster.

[0170] Step S506: Update the target cluster center of the target cluster into which the newly inserted vector is inserted.

[0171] Optionally, after the added vector has been inserted, the terminal recalculates the position of the target cluster center (centroid) of target cluster j. The update formula is shown in formula (11).

[0172] $$c_j' = \frac{n_j c_j + x_{new}}{n_j + 1}$$ Formula (11)

[0173] In the formula, $n_j$ is the number of vectors in cluster j before the update, $x_{new}$ is the added vector, $c_j$ is the cluster center (centroid) of cluster j before the update, and $c_j'$ is the cluster center after the update.

[0174] Step S508: If the updated target cluster center meets the preset conditions, recalculate the target node and the edge connecting the target node corresponding to the updated target cluster center in the global vector index.

[0175] Optionally, if the updated target cluster center meets the preset conditions, the terminal triggers a local reconstruction of the global HNSW index. This reconstruction process only targets the moved cluster center node, recalculates the target node corresponding to the updated target cluster center in the global vector index and the edges connecting the target node, without needing to reconstruct the entire HNSW graph.

[0176] In this embodiment, by employing a local update strategy, a full reconstruction of the entire index structure is avoided, significantly reducing the computational overhead and system latency caused by data updates and improving resource utilization efficiency. This approach is particularly suitable for application scenarios with high-frequency data updates. It also ensures that newly added vectors can participate in subsequent similarity calculations in a timely manner, while maintaining the representativeness of clusters through centroid updates. This guarantees the accuracy and recall of vector retrieval from both global and local perspectives.

[0177] In an exemplary embodiment, before recalculating the target node and the edge connecting the target node in the global vector index corresponding to the updated target cluster center in the case that the updated target cluster center meets the preset conditions, the method further includes: obtaining the target cluster center of the target cluster; if the offset between the updated target cluster center and the target cluster center exceeds a preset offset threshold, then it is determined that the updated target cluster center meets the preset conditions.

[0178] Optionally, if the updated target cluster center meets the preset conditions, before recalculating the target node corresponding to the updated target cluster center in the global vector index and the edge connecting the target node, the terminal calculates the target cluster center of the target cluster; if the offset between the updated target cluster center and the target cluster center exceeds the preset offset threshold as shown in formula (12), then it is determined that the updated target cluster center meets the preset conditions.

[0179] Formula (12)

[0180] It is worth noting that if the offset between the updated target cluster center and the previous target cluster center exceeds the preset offset threshold, HNSW local reconstruction is triggered. Local reconstruction means that only the connections between the updated cluster center node and its neighboring nodes are updated in the HNSW graph, which avoids global reconstruction.

[0181] It should be understood that, in some cases, the terminal needs to globally rebuild the HNSW global vector index.

[0182] In this embodiment, by judging the centroid offset and reconstructing the HNSW graph locally, it is ensured that the global navigation index can be adjusted in a timely manner when the data distribution changes, maintaining its efficient navigation capability and preventing the decline in retrieval performance caused by data drift.
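
The centroid update and the offset check can be sketched as follows. The incremental-mean update follows from the definition of a centroid; measuring the offset as the Euclidean distance between the old and new centers, and the threshold values used, are assumptions made for illustration.

```python
import math


def update_centroid(center, count, new_vec):
    """Incremental mean: centroid of `count` vectors after one more vector joins."""
    return [(count * c + x) / (count + 1) for c, x in zip(center, new_vec)]


def centroid_offset(old_center, new_center):
    """Euclidean distance moved by the centroid (assumed offset measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(old_center, new_center)))


def needs_local_rebuild(old_center, new_center, offset_threshold):
    """True when the centroid moved farther than the preset offset threshold."""
    return centroid_offset(old_center, new_center) > offset_threshold


old = [1.0, 1.0]
new = update_centroid(old, count=3, new_vec=[5.0, 1.0])
rebuild = needs_local_rebuild(old, new, offset_threshold=0.5)
```

The incremental form needs only the old center and the cluster size, so the stored vectors of the cluster do not have to be rescanned on each insertion.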

[0183] In one exemplary embodiment, as shown in Figure 6, the vector index update method includes steps S602 to S608, wherein:

[0184] Step S602: When the update method is to delete a vector, obtain the deletion instruction of the target vector.

[0185] Optionally, the terminal receives a deletion instruction for a specific target vector and performs permission verification and data consistency checks to ensure the legality and security of the deletion operation and to prevent accidental deletion or data inconsistency. The deletion instruction contains a unique identifier for the target vector or its position information in the index. Deletion instructions can originate from user-initiated actions, business system triggers, or data lifecycle management strategies.

[0186] Step S604: According to the deletion instruction, mark the target vector as deleted in the local vector index corresponding to the cluster to which the target vector belongs.

[0187] Optionally, the terminal uses the identifier of the target vector to locate the cluster j to which it belongs and its specific position in the IVFFlat local vector index Ij. The terminal then does not physically remove the target vector immediately; instead, it sets the status bit corresponding to the target vector to "deleted" in a separate marker bitmap or status table.

[0188] Optionally, when performing a marking operation, the terminal records the deletion timestamp and operation context for easier subsequent auditing and data analysis. During queries, the system skips all vectors marked as deleted to ensure logical correctness.

[0189] Step S606: Based on the scheduled task, statistically analyze the vectors marked as deleted.

[0190] Optionally, the terminal deploys a periodic scheduled task (e.g., executed every 5 minutes) to scan the IVFFlat local vector indexes of all clusters and count the number of vectors currently marked as deleted. The global deletion rate ρ = Ndeleted / Ntotal is then calculated, where Ndeleted is the total number of vectors that have been marked as deleted, and Ntotal is the total number of vectors in the system.

[0191] Optionally, during the statistical process, the terminal will record the number of deleted vectors by cluster, providing data support for subsequent local vector index reconstruction. Simultaneously, the terminal will monitor the growth trend of the deletion rate, providing early warnings of potential large-scale data changes.

[0192] Step S608: If the number of vectors marked as deleted reaches a threshold, perform a local vector index reconstruction operation for the cluster containing vectors marked as deleted.

[0193] Optionally, when the global deletion rate ρ exceeds a preset threshold (e.g., 10%), the terminal initiates a batch reconstruction process. For each cluster j containing vectors marked as deleted, the terminal creates a new IVFFlat local vector index from a new vector set containing only the vectors in that cluster that have not been marked as deleted, and releases the memory space occupied by the original IVFFlat local vector index. The cluster center of cluster j is recalculated based on the new vector set. If the offset of the recalculated cluster center exceeds the preset offset threshold, HNSW local reconstruction is triggered. Local reconstruction means that only the connections between the updated cluster center node and its neighboring nodes are updated in the HNSW graph, which avoids global reconstruction.

[0194] It should be understood that, in some cases, the terminal needs to globally rebuild the HNSW global vector index.

[0195] In this embodiment, the soft deletion mechanism transforms immediate physical deletion into delayed logical deletion, significantly reducing the overhead of a single deletion operation and supporting high-concurrency deletion requests. The batch reconstruction strategy merges multiple scattered deletion operations into a single centralized index maintenance, reducing the overhead caused by memory fragmentation and frequent memory allocation and release, and improving memory utilization efficiency.
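
The soft-deletion bookkeeping can be sketched as follows. The class name and its methods are hypothetical, a Python set stands in for the marker bitmap, and the 10% threshold is the example value from the text.

```python
class SoftDeleteTracker:
    """Tombstone set per cluster plus the global deletion rate."""

    def __init__(self, cluster_sizes):
        self.sizes = dict(cluster_sizes)                 # cluster id -> stored vectors
        self.deleted = {j: set() for j in self.sizes}    # cluster id -> deleted positions

    def mark_deleted(self, cluster, pos):
        """Logical deletion: record the position, leave the vector in place."""
        self.deleted[cluster].add(pos)

    def is_live(self, cluster, pos):
        """Queries skip positions for which this returns False."""
        return pos not in self.deleted[cluster]

    def deletion_rate(self):
        total = sum(self.sizes.values())
        marked = sum(len(d) for d in self.deleted.values())
        return marked / total if total else 0.0

    def clusters_to_rebuild(self, threshold=0.10):
        """Clusters holding tombstones, once the global rate exceeds the threshold."""
        if self.deletion_rate() <= threshold:
            return []
        return sorted(j for j, d in self.deleted.items() if d)

    def compact(self, cluster):
        """Account for a rebuilt cluster index that keeps only live vectors."""
        self.sizes[cluster] -= len(self.deleted[cluster])
        self.deleted[cluster].clear()


tracker = SoftDeleteTracker({0: 10, 1: 10})
tracker.mark_deleted(0, 3)
before = tracker.clusters_to_rebuild()      # 1 of 20 marked, below the threshold
tracker.mark_deleted(0, 7)
tracker.mark_deleted(1, 2)
after = tracker.clusters_to_rebuild()       # 3 of 20 marked, above the threshold
```

Marking costs one set insertion per deletion, while the expensive copy into a new flat index happens only for the clusters returned by `clusters_to_rebuild`.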

[0196] In one exemplary embodiment, as shown in Figure 7, the search parameters are preset before the query vector is received. The terminal obtains the current state of the query vector for this query, which includes the query complexity, the average recall rate, and the average latency. Specifically, the terminal retrieves the Z nearest cluster centers of the query vector q using the single-layer HNSW index, computes the set of distances from q to these cluster centers, and takes the variance of these distances, σq², as the query complexity. The terminal maintains a sliding window of size D that records the recall rates of the most recent D queries, and computes the average recall rate rhist within the window. The terminal also maintains a sliding window of the same size that records the response latencies of the most recent D queries, and computes the average (or P95) query latency lhist within the window. The terminal maps the continuous current state to a discrete current state, such as st = (low variance, medium recall, high latency). For example, σq² is divided into three levels: low variance (<0.02), medium variance (0.02-0.05), and high variance (>0.05); rhist is divided into four levels: poor (<0.85), medium (0.85-0.92), good (0.92-0.97), and excellent (>0.97); lhist is divided into three levels: low latency (<50 ms), medium latency (50-100 ms), and high latency (>100 ms). Based on the current state st, the terminal selects the current action at from the preset action space, and the selected current action is used as the efSearch parameter value for this query.

[0197] The terminal receives the input query vector q. Using the global vector index (for example, the HNSW graph index), the terminal searches the predetermined cluster centers, as shown in formula (1), for the cluster centers nearest to q, obtaining a candidate cluster index set, such as cluster centers C1 to C4. Because this process runs only on a single-layer graph whose nodes are the cluster centers, and the number of such nodes is small, the search is fast (typically <5 ms), far faster than searching directly over the original vectors.

[0198] The terminal computes the variance of the distances from the query vector q to the cluster centers in the initial cluster set. If this distance variance is less than a threshold, for example 0.05 (indicating that the centers are too concentrated), it is determined that the vector distribution in the initial cluster set does not meet the preset condition. In this case the terminal determines that the query vector q lies on the boundary of multiple clusters, so there is a risk of missed results and the initial cluster set needs to be expanded. The expanded search is triggered: starting from each center in the initial cluster set, the terminal performs a local breadth-first search (BFS) on the HNSW graph and supplements at most a preset number of neighboring clusters, for example 8. The terminal then determines the target cluster set from the initial cluster set and the neighboring clusters using formula (2).
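
The variance check and the neighbor expansion can be sketched as follows. This is an illustrative sketch under stated assumptions: the population variance is used (the text does not say which variance estimator applies), the BFS is bounded only by the number of added clusters and not by depth, and the adjacency dictionary is a hypothetical stand-in for the single-layer HNSW graph.

```python
from collections import deque
from statistics import pvariance


def needs_expansion(distances, threshold=0.05):
    """True when the query-to-center distances are too similar (low variance)."""
    return pvariance(distances) < threshold


def expand_clusters(graph, initial, max_extra=8):
    """Breadth-first search from the initial clusters; adds at most `max_extra` neighbors."""
    seen = set(initial)
    queue = deque(initial)
    extra = []
    while queue and len(extra) < max_extra:
        node = queue.popleft()
        for nb in sorted(graph[node]):
            if nb not in seen:
                seen.add(nb)
                extra.append(nb)
                queue.append(nb)
                if len(extra) == max_extra:
                    break
    return extra


def target_clusters(graph, initial, distances, threshold=0.05, max_extra=8):
    """Initial clusters, plus graph neighbors when the query looks ambiguous."""
    if needs_expansion(distances, threshold):
        return list(initial) + expand_clusters(graph, initial, max_extra)
    return list(initial)


graph = {0: {1, 2}, 1: {0, 3}, 2: {0, 4}, 3: {1}, 4: {2}, 5: set()}
ambiguous = target_clusters(graph, [0, 1], distances=[0.50, 0.52], max_extra=2)
clear_cut = target_clusters(graph, [0, 1], distances=[0.10, 0.90], max_extra=2)
```

In the first call the two distances are nearly equal, so two graph neighbors are added; in the second the query clearly belongs to one cluster and the initial set is used unchanged.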

[0199] For each cluster in the target cluster set, the terminal calculates, in the local vector index corresponding to the cluster, the distance between the query vector and each vector in the cluster: it traverses all vectors stored in the cluster and calculates the distance di between the query vector q and each vector xi, so that the number of distances equals the number of vectors in the cluster. The distance metric can be L2 distance, inner product, cosine similarity, or the like. The terminal sorts the vectors within each cluster in ascending order of distance and selects the L vectors with the smallest distances as the initial vectors of the cluster, forming a local Top-L candidate set. The terminal traverses all clusters and adds the vectors in the candidate set of each cluster to a global candidate pool, forming the initial vector set. The terminal then obtains the distance between each initial vector in the initial vector set and the query vector (already calculated and cached in the previous steps, so it is used directly), sorts all initial vectors in ascending order of distance, and selects the top K vectors as the final target vectors, i.e., the global Top-K approximate nearest-neighbor result. Finally, the terminal outputs the target vector set containing the target number of target vectors, where K is the target number and can be set according to actual needs.
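
The local Top-L scan and the global Top-K merge can be sketched as follows. The function names and the in-memory layout of a cluster (a list of id and vector pairs) are assumptions for illustration, and squared L2 distance is used as the metric because it produces the same ordering as L2 distance.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_cluster(query, cluster, top_l):
    """Brute-force scan of one cluster; returns its `top_l` nearest (distance, id) pairs."""
    scored = ((squared_l2(query, vec), vec_id) for vec_id, vec in cluster)
    return heapq.nsmallest(top_l, scored)


def search_clusters(query, clusters, top_l, top_k):
    """Merge the per-cluster candidates and keep the global `top_k` nearest."""
    pool = []
    for cluster in clusters:
        # Distances travel with the candidates, so they are not recomputed at merge time.
        pool.extend(search_cluster(query, cluster, top_l))
    return heapq.nsmallest(top_k, pool)


clusters = [
    [(0, [0.0, 0.0]), (1, [1.0, 0.0]), (2, [3.0, 3.0])],
    [(3, [0.0, 1.0]), (4, [0.1, 0.1])],
]
result = search_clusters([0.0, 0.0], clusters, top_l=2, top_k=3)
ids = [vec_id for _, vec_id in result]
```

Choosing L at least as large as K guarantees that the merge cannot miss a true nearest neighbor that lies inside one of the scanned clusters.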

[0200] After determining the search parameters to be used in this query, the terminal performs the search using those parameters. Once the search is complete, the terminal obtains the actual recall rate and the actual latency of this query, and uses formula (5) to determine the reward from them. The terminal updates the Q-table based on the reward for this query using formula (7), and uses formula (8) to take the next action as the updated search parameter. The recommended efSearch value for the next query is then output.

[0201] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.

[0202] Based on the same inventive concept, this application also provides a vector retrieval device for implementing the vector retrieval method described above. The solution provided by this device is similar to the implementation described in the above method; therefore, the specific limitations in one or more vector retrieval device embodiments provided below can be found in the limitations of the vector retrieval method described above, and will not be repeated here.

[0203] In one exemplary embodiment, as shown in Figure 8, a vector retrieval device is provided, comprising: a receiving module 801, a first search module 802, an extended search module 803, a determining module 804, and a second search module 805, wherein:

[0204] The receiving module 801 is used to receive the query vector.

[0205] The first search module 802 is used to perform a search based on the query vector and the global vector index of the vector index according to the search parameters to obtain an initial cluster set; wherein, the global vector index is determined based on a single-layer hierarchical navigable small world graph; the search parameters are used to control the size of the initial cluster set.

[0206] The extended search module 803 is used to perform an extended search in the global vector index, starting from the initial cluster set, to obtain each neighboring cluster when the vector distribution in the initial cluster set does not meet the preset conditions.

[0207] The determination module 804 is used to determine the target cluster set based on the initial cluster set and each neighboring cluster.

[0208] The second search module 805 is used to perform a search on the target cluster in the local vector index of the vector index to obtain the target vector, wherein the local vector index is determined based on the inverted file plane vector index.

[0209] In an exemplary embodiment, a vector retrieval device further includes: a determination module, configured to calculate the distance variance from the query vector to each cluster center in the initial cluster set; if the distance variance is less than a distance threshold, it is determined that the vector distribution in the initial cluster set does not meet a preset condition.

[0210] In an exemplary embodiment, the second search module 805 is further configured to, for each cluster in the target cluster set, calculate the distance between the query vector and each vector in the cluster in the local vector index corresponding to the cluster; determine the initial vector of each cluster based on the distances of each cluster; merge the initial vectors of each cluster to form an initial vector set; and sort the initial vectors in the initial vector set to obtain the target number of target vectors.

[0211] In an exemplary embodiment, a vector retrieval device further includes: a search parameter determination module, configured to obtain the current state of the query vector for the current query, the current state including query complexity, average recall rate, and average latency; select a current action from a preset action space based on the query complexity, average recall rate, and average latency; and use the current action as the search parameter used for the current query.
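
One way to picture this selection step is a tabular, epsilon-greedy policy. The sketch below is only an assumption about how the action could be chosen: the action values in `ACTIONS`, the bucket boundaries in `discretize`, and the Q-table layout are all hypothetical and are not taken from the text.

```python
import random

# Hypothetical preset action space: candidate values of the search parameter.
ACTIONS = [8, 16, 32, 64]


def discretize(complexity, avg_recall, avg_latency_ms):
    """Map the continuous state to a small tuple of buckets.
    The cut-offs are illustrative; complexity is assumed to lie in [0, 1]."""
    return (
        min(int(complexity * 3), 2),
        0 if avg_recall < 0.9 else 1,
        0 if avg_latency_ms < 10.0 else 1,
    )


def select_action(q_table, state, epsilon=0.1, rng=random):
    """Epsilon-greedy choice: usually take the best-known action for this
    state, occasionally pick a random one to keep exploring."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = [q_table.get((state, a), 0.0) for a in range(len(ACTIONS))]
    return values.index(max(values))
```

The returned index selects an entry of `ACTIONS`, which would then be used as the search parameter for the current query.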

[0212] In an exemplary embodiment, a vector retrieval device further includes: a search parameter update module, configured to obtain the actual recall rate and actual latency of the current query; calculate the reward for the current query based on the actual recall rate and actual latency; determine the next action corresponding to the next state based on the current state, the current action, the reward, and the next state; and use the next action as the updated search parameter.
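
The update described above resembles a temporal-difference rule. The following sketch assumes a tabular Q-learning update and a reward that trades recall against latency with a linear penalty; the reward formula, the weights, and the names are assumptions, since the text does not fix them.

```python
def reward(actual_recall, actual_latency_ms, latency_weight=0.01):
    """Hypothetical reward: higher recall is better, higher latency is worse."""
    return actual_recall - latency_weight * actual_latency_ms


def update_and_pick_next(q_table, state, action, r, next_state, n_actions,
                         alpha=0.5, gamma=0.9):
    """Apply one Q-learning update for (state, action), then return the
    greedy action for next_state, to be used as the updated search parameter."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in range(n_actions))
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (r + gamma * best_next - old)
    # Re-read the table so the choice reflects the update when next_state == state.
    values = [q_table.get((next_state, a), 0.0) for a in range(n_actions)]
    return values.index(max(values))
```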

[0213] In one exemplary embodiment, a vector retrieval device further includes: a vector index determination module, used to obtain a vector dataset; process the vector dataset based on a clustering analysis method to obtain a cluster center set; construct a global vector index of a single-layer hierarchical navigable small-world graph based on the cluster center set; and construct a local vector index for each cluster center in the cluster center set.
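
The clustering step can be sketched with a plain Lloyd-style k-means. This is an assumption: the text only requires "a clustering analysis method" and does not name k-means. The naive initialization from the first `k` vectors is chosen purely to keep the example deterministic.

```python
import math


def kmeans(vectors, k, iterations=10):
    """Plain Lloyd iterations; returns (centers, assignments)."""
    centers = [list(v) for v in vectors[:k]]  # naive init: first k vectors
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest center.
        for i, v in enumerate(vectors):
            assignments[i] = min(range(k), key=lambda c: math.dist(v, centers[c]))
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assignments[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignments
```

The resulting centers would become the nodes of the global index, and the assignments determine which vectors go into each local index.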

[0214] In an exemplary embodiment, the vector index determination module is further configured to create an initial graph structure, which includes nodes and edges connecting the nodes; wherein the initial graph structure is an empty graph structure, and the nodes are used to store cluster centers; starting from any node of the initial graph structure, each cluster center in the cluster center set is traversed, and each cluster center is inserted into the initial graph structure as a node.
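
A minimal sketch of such an incremental construction follows. It assumes each inserted cluster center is linked in both directions to its `m` nearest already-inserted centers, found here by an exhaustive scan rather than by a graph search, and it omits any pruning of node degrees. These simplifications and the names are assumptions made for illustration.

```python
import math


def build_center_graph(centers, m=2):
    """Insert cluster centers one by one into an initially empty graph.

    Returns an adjacency list: neighbors[i] holds the ids linked to node i.
    """
    neighbors = []  # the empty initial graph structure
    for new_id, center in enumerate(centers):
        # Rank the nodes inserted so far by distance to the new center.
        existing = sorted(range(new_id), key=lambda j: math.dist(center, centers[j]))
        links = existing[:m]
        neighbors.append(list(links))
        for j in links:
            neighbors[j].append(new_id)  # keep edges bidirectional
    return neighbors
```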

[0215] In an exemplary embodiment, the vector index determination module is further configured to, for each cluster center, obtain the vector set of the cluster corresponding to the cluster center; obtain the storage space of each vector set; and copy the vectors in each vector set to the corresponding storage space to form the local vector index.
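
As a rough illustration, the local indexes can be modeled as one flat buffer per cluster into which the member vectors are copied. The dictionary layout and the use of the standard `array` module for contiguous storage are assumptions; the text does not prescribe a storage format.

```python
from array import array


def build_local_indexes(vectors, assignments, n_clusters):
    """Group vectors by cluster and copy each group into its own flat buffer.

    Returns one record per cluster with the vector dimension, the ids of the
    member vectors, and their components stored back to back.
    """
    dim = len(vectors[0])
    ids = [[] for _ in range(n_clusters)]
    buffers = [array("d") for _ in range(n_clusters)]  # one storage area per cluster
    for vid, (vec, cid) in enumerate(zip(vectors, assignments)):
        ids[cid].append(vid)
        buffers[cid].extend(vec)  # copy the vector into the cluster's storage
    return [{"dim": dim, "ids": ids[c], "data": buffers[c]} for c in range(n_clusters)]
```

The vector at position `i` of a cluster occupies `data[i * dim:(i + 1) * dim]`, which keeps a sequential scan of the cluster cache-friendly.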

[0216] In an exemplary embodiment, a vector retrieval device further includes: a vector index update module, which is further configured to: obtain a new vector when the update method is to add a new vector; search for the target cluster center of the new vector in the global vector index; determine a target cluster based on the target cluster center; insert the new vector into the local vector index corresponding to the target cluster; update the target cluster center of the target cluster into which the new vector is inserted; and, if the updated target cluster center meets preset conditions, recalculate the target node and the edges connecting the target node corresponding to the updated target cluster center in the global vector index.

[0217] In an exemplary embodiment, the vector index update module is further configured to obtain the original target cluster center of the target cluster; if the offset between the updated target cluster center and the original target cluster center exceeds a preset offset threshold, then the updated target cluster center is determined to meet the preset conditions.
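
The insertion path and the offset check can be sketched as below. The sketch assumes the cluster center is maintained as a running mean, that the "original" center is the position last used to compute the node's edges in the global index (called `anchor_centers` here), and that the nearest center is found by a linear scan rather than a graph search. All of these, along with the names, are illustrative assumptions.

```python
import math


def add_vector(vec, centers, counts, local_lists, anchor_centers, offset_threshold):
    """Insert vec into its nearest cluster and update that cluster's center.

    Returns (cluster_id, relink_needed). relink_needed is True when the center
    has moved farther than offset_threshold from the position recorded in the
    global index, signalling that the node and its edges should be recomputed.
    """
    cid = min(range(len(centers)), key=lambda c: math.dist(vec, centers[c]))
    local_lists[cid].append(vec)
    counts[cid] += 1
    n = counts[cid]
    # Incremental mean: new_center = old_center + (vec - old_center) / n
    centers[cid] = [c + (x - c) / n for c, x in zip(centers[cid], vec)]
    if math.dist(centers[cid], anchor_centers[cid]) > offset_threshold:
        anchor_centers[cid] = list(centers[cid])  # record the new graph position
        return cid, True
    return cid, False
```

Deferring the graph update until the offset threshold is crossed keeps most insertions local to a single cluster.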

[0218] In an exemplary embodiment, the vector index update module is further configured to: obtain a deletion instruction for the target vector when the update method is to delete a vector; mark the target vector as deleted in the local vector index corresponding to the cluster to which the target vector belongs, according to the deletion instruction; count the vectors marked as deleted based on a timed task; and, if the number of vectors marked as deleted reaches a threshold, perform a local vector index reconstruction operation for the cluster containing vectors marked as deleted.
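
The mark-then-rebuild deletion scheme can be sketched as follows, assuming each cluster's local index is a dictionary holding its vectors and a set of deletion marks. The layout and function names are hypothetical; the periodic task itself (for example a scheduler calling `compact_if_needed`) is left out.

```python
def mark_deleted(cluster, vector_id):
    """Logical delete: record a mark, leave the stored vector in place."""
    cluster["deleted"].add(vector_id)


def live_vectors(cluster):
    """Vectors a search should still consider (marked ones are skipped)."""
    return {vid: v for vid, v in cluster["vectors"].items()
            if vid not in cluster["deleted"]}


def compact_if_needed(cluster, threshold):
    """Meant to be called from a periodic task: rebuild the local index once
    the number of marked vectors reaches the threshold."""
    if len(cluster["deleted"]) < threshold:
        return False
    cluster["vectors"] = live_vectors(cluster)
    cluster["deleted"].clear()
    return True
```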

[0219] Each module in the aforementioned vector retrieval device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the operations corresponding to each module.

[0220] In one exemplary embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 9. The computer device includes a processor, memory, an input/output (I/O) interface, and a communication interface. The processor, memory, and I/O interface are connected via a system bus, and the communication interface is also connected to the system bus via the I/O interface. The processor provides computational and control capabilities. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores the operating system, computer programs, and a database. The internal memory provides the runtime environment for the operating system and computer programs stored in the non-volatile storage medium. The database stores vectors. The I/O interface is used for exchanging information between the processor and external devices. The communication interface is used for communicating with external terminals via a network connection. When executed by the processor, the computer program implements a vector retrieval method.

[0221] Those skilled in the art will understand that the structure shown in Figure 9 is merely a block diagram of the portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0222] In one embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.

[0223] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps in the above method embodiments.

[0224] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.

[0225] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile memory and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided in this application may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be, but are not limited to, general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, and artificial intelligence (AI) processors.

[0226] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this application.

[0227] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A vector retrieval method, characterized in that the method includes: receiving a query vector; performing a search based on the query vector and the global vector index of the vector index according to search parameters to obtain an initial cluster set, wherein the global vector index is determined based on a single-layer hierarchical navigable small-world graph, and the search parameters are used to control the size of the initial cluster set; if the vector distribution in the initial cluster set does not meet the preset conditions, performing an extended search in the global vector index, starting from the initial cluster set, to obtain each neighboring cluster; determining the target cluster set based on the initial cluster set and each of the neighboring clusters; and performing a search on the target cluster set in the local vector index of the vector index to obtain the target vector, wherein the local vector index is determined based on the inverted file plane vector index; wherein the method for determining the search parameters includes: obtaining the current state of the query vector for the current query, the current state including query complexity, average recall rate, and average latency; selecting a current action from the preset action space based on the query complexity, the average recall rate, and the average latency; and using the current action as the search parameter for the current query.

2. The method according to claim 1, characterized in that, before performing the extended search in the global vector index, starting from the initial cluster set, if the vector distribution in the initial cluster set does not meet the preset conditions, the method further includes: calculating the variance of the distances from the query vector to each cluster center in the initial cluster set; and, if the distance variance is less than the distance threshold, determining that the vector distribution in the initial cluster set does not meet the preset conditions.

3. The method according to claim 1, characterized in that the step of performing a search on the target cluster set in the local vector index of the vector index to obtain the target vector includes: for each cluster in the target cluster set, calculating the distance between the query vector and each vector in the cluster in the local vector index corresponding to the cluster; determining the initial vectors of each cluster based on the distances calculated for that cluster; merging the initial vectors of each of the clusters to determine the initial vector set; and sorting the initial vectors in the initial vector set to obtain the target number of target vectors.

4. The method according to claim 1, characterized in that the method for updating the search parameters includes: obtaining the actual recall rate and actual latency of the current query; calculating the reward for the current query based on the actual recall rate and the actual latency; determining the next action corresponding to the next state based on the current state, the current action, the reward, and the next state; and using the next action as the updated search parameter.

5. The method according to claim 1, characterized in that the method for determining the vector index includes: obtaining a vector dataset; processing the vector dataset based on a cluster analysis method to obtain a cluster center set; constructing a global vector index of a single-layer hierarchical navigable small-world graph based on the cluster center set; and constructing a local vector index for each cluster center in the cluster center set.

6. The method according to claim 5, characterized in that the step of constructing a global vector index of a single-layer hierarchical navigable small-world graph based on the cluster center set includes: creating an initial graph structure, which includes nodes and edges connecting the nodes, wherein the initial graph structure is an empty graph structure and the nodes are used to store cluster centers; and, starting from any node of the initial graph structure, traversing each cluster center in the cluster center set and inserting each cluster center into the initial graph structure as a node.

7. The method according to claim 5, characterized in that the step of constructing a local vector index for each cluster center in the cluster center set includes: for each cluster center, obtaining the vector set of the cluster corresponding to that cluster center; obtaining the storage space for each of the vector sets; and copying the vectors in each of the vector sets to the corresponding storage space to form the local vector index.

8. The method according to claim 1, characterized in that the method for updating the vector index includes: when the update method is to add a vector, obtaining the new vector; searching for the target cluster center of the new vector in the global vector index; determining the target cluster based on the target cluster center; inserting the new vector into the local vector index corresponding to the target cluster; updating the target cluster center of the target cluster into which the new vector is inserted; and, if the updated target cluster center meets the preset conditions, recalculating the target node corresponding to the updated target cluster center in the global vector index and the edges connecting the target node.

9. The method according to claim 8, characterized in that, before recalculating the target node corresponding to the updated target cluster center in the global vector index and the edges connecting the target node when the updated target cluster center meets the preset conditions, the method further includes: obtaining the original target cluster center of the target cluster; and, if the offset between the updated target cluster center and the original target cluster center exceeds a preset offset threshold, determining that the updated target cluster center meets the preset conditions.

10. The method according to claim 1, characterized in that the method for updating the vector index includes: when the update method is to delete a vector, obtaining the deletion instruction for the target vector; according to the deletion instruction, marking the target vector as deleted in the local vector index corresponding to the cluster to which the target vector belongs; counting the vectors marked as deleted based on a scheduled task; and, if the number of vectors marked as deleted reaches a threshold, performing a local vector index reconstruction operation on the cluster containing the vectors marked as deleted.

11. A vector retrieval device, characterized in that the device includes: a receiving module, used to receive a query vector; a search parameter determination module, used to obtain the current state of the query vector for the current query, the current state including query complexity, average recall rate, and average latency, to select a current action from the preset action space based on the query complexity, the average recall rate, and the average latency, and to use the current action as the search parameter for the current query; a first search module, used to perform a search based on the query vector and the global vector index of the vector index according to the search parameters to obtain an initial cluster set, wherein the global vector index is determined based on a single-layer hierarchical navigable small-world graph, and the search parameters are used to control the size of the initial cluster set; an extended search module, used to perform an extended search in the global vector index, starting from the initial cluster set, to obtain each neighboring cluster when the vector distribution in the initial cluster set does not meet the preset conditions; a determination module, used to determine the target cluster set based on the initial cluster set and each of the neighboring clusters; and a second search module, used to perform a search on the target cluster set in the local vector index of the vector index to obtain the target vector, wherein the local vector index is determined based on the inverted file plane vector index.

12. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 10.

13. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 10.

14. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 10.