Lock-free learning index method and apparatus, and electronic device, storage medium and program product

By dividing the data to be indexed into multiple leaf nodes and constructing a lock-free linked list, the problem that learning indexing technology cannot support multi-threaded load is solved, and efficient querying, insertion and updating under multi-threading is achieved.

WO2026130190A1 · PCT designated stage · Publication Date: 2026-06-25 · CHINA MOBILE (SUZHOU) SOFTWARE TECH CO LTD +1

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
CHINA MOBILE (SUZHOU) SOFTWARE TECH CO LTD
Filing Date
2025-12-10
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Learning indexing technology only supports single-threaded queries, inserts, and updates, and cannot support efficient multi-threaded loads.

Method used

The data to be indexed is divided into multiple leaf nodes. The model is trained using each leaf node, a lock-free linked list is built, and the target index structure is constructed, supporting multi-threaded access.

Benefits of technology

It implements multi-threaded query, insert, and update operations, supporting efficient multi-threaded workloads.



Abstract

Provided in the present disclosure are a lock-free learning index method and apparatus, and an electronic device, a computer-readable storage medium and a computer program product. The method comprises: performing grouping and space partitioning processing on data to be indexed, so as to obtain a plurality of pieces of leaf node data, and performing model training processing on the basis of the leaf node data, so as to obtain a first model; performing memory pre-allocation processing in the first model, so as to obtain a storage space, and storing a corresponding lock-free linked list in the storage space; determining a remaining space corresponding to the storage space, performing recombination optimization processing on the leaf node data corresponding to the storage space the remaining space of which is less than a preset space threshold value, so as to obtain an initial index structure, and on the basis of the lock-free linked list, connecting the initial index structure to a preset tree structure, so as to obtain a target index structure; and performing index processing on a target element by means of the target index structure, so as to obtain target position information of the target element in the data to be indexed.

Description

Lock-free learning indexing methods, devices, electronic devices, storage media, and program products

[0001] Cross-references to related applications

[0002] This disclosure claims priority to Chinese Patent Application No. 2024118838426, filed on December 19, 2024, entitled "Lockless Learning Indexing Method and Apparatus, Electronic Device", the entire contents of which are incorporated herein by reference.

Technical Field

[0003] This disclosure relates to the field of data processing technology and, in particular but without limitation, to a lock-free learning indexing method, apparatus, electronic device, computer-readable storage medium, and computer program product.

Background Technology

[0004] Learning indexing technology combines artificial intelligence with traditional databases, replacing traditional index structures with machine learning models. By analyzing the distribution characteristics of business data or the patterns of workload, it learns the distribution patterns of underlying data and establishes a mapping relationship between key values and positions, making traditional database systems intelligent. This has become a research hotspot in the database field. The key idea of learning indexing is to use a learning model to approximate the index. The model is trained using the key values and positions of records, and then the trained model is used to predict the position of a specified key value. Learning indexing effectively reduces storage overhead and improves query efficiency by replacing tree traversal operations with mathematical calculations.

[0005] Learning indexes primarily focus on optimizing query performance. In terms of insertion and update, there are two common methods: in-place algorithms and delta-buffer. Both in-place array insertion and delta-buffer writing are single-threaded scenarios.

[0006] Therefore, how to solve the problem that learning indexes only support single-threaded queries, inserts, and updates and cannot support efficient multi-threaded loads is an urgent issue to be addressed.

Summary of the Invention

[0007] This disclosure provides a lock-free learning index method, apparatus, electronic device, computer-readable storage medium, and computer program product. Its main objective is to address the problem that learning indexes only support single-threaded queries, inserts, and updates, and cannot support efficient multi-threaded workloads.

[0008] According to a first aspect of the present disclosure, a lock-free learning indexing method is provided, comprising:

[0009] The data to be indexed is divided into groups to obtain multiple leaf node data. A model is trained based on each leaf node data to obtain a first model corresponding to each leaf node data. Each leaf node data includes at least multiple elements to be indexed and the position information corresponding to each element.

[0010] In each of the first models, memory pre-allocation is performed to obtain the storage space corresponding to each of the first models, and the corresponding lock-free linked list is stored in the storage space; wherein, the lock-free linked list is a singly linked lock-free linked list composed of incremental modifications based on the corresponding leaf node data;

[0011] Determine the remaining space corresponding to each of the storage spaces, reorganize and optimize the leaf node data corresponding to the storage spaces with remaining space less than a preset space threshold to obtain an initial index structure, and connect the initial index structure to a preset tree structure according to the lock-free linked list to obtain the target index structure.

[0012] The target element is indexed using the target index structure to obtain the target position information of the target element in the data to be indexed, wherein the target element is any element among the elements to be indexed.

[0013] According to a second aspect of the present disclosure, a lock-free learning indexing apparatus is provided, comprising:

[0014] The partitioning unit is configured to group and partition the data to be indexed into multiple leaf node data.

[0015] The training unit is configured to perform model training processing based on each leaf node data to obtain a first model corresponding to each leaf node data; wherein each leaf node data includes at least multiple elements to be indexed and position information corresponding to each element to be indexed.

[0016] The allocation unit is configured to perform memory pre-allocation processing in each of the first models to obtain the storage space corresponding to each of the first models, and to store the corresponding lock-free linked list in the storage space; wherein, the lock-free linked list is a singly linked lock-free linked list composed of incremental modifications based on the corresponding leaf node data;

[0017] The optimization unit is configured to determine the remaining space corresponding to each of the storage spaces, reorganize and optimize the leaf node data corresponding to the storage spaces whose remaining space is less than a preset space threshold to obtain an initial index structure, and connect the initial index structure to a preset tree structure according to the lock-free linked list to obtain a target index structure.

[0018] The indexing unit is configured to index a target element through the target index structure to obtain the target position information of the target element in the data to be indexed, wherein the target element is any element among the elements to be indexed.

[0019] According to a third aspect of the present disclosure, an electronic device is provided, comprising:

[0020] At least one processor; and

[0021] A memory communicatively connected to the at least one processor; wherein,

[0022] The memory stores instructions that can be executed by the at least one processor to enable the at least one processor to perform the method described in the first aspect above.

[0023] According to a fourth aspect of the present disclosure, a computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are configured to cause the computer to perform the method described in the first aspect above.

[0024] According to a fifth aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method described in the first aspect above.

[0025] Compared with related technologies, the embodiments of this disclosure divide the data to be indexed into different underlying data groups, namely leaf node data. Each group corresponds to a linear model prediction, namely the first model. Lock-free linked lists are added inside the groups as multi-threaded lock-free structures. Finally, a target index structure based on a preset tree structure is constructed, and indexing is performed through the target index structure. This enables multi-threaded high-concurrency access, supports multi-threaded querying and insertion / update, and supports efficient multi-threaded load.

[0026] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this application, nor is it intended to limit the scope of this application. Other features of this application will become readily apparent from the following description.

Attached Figure Description

[0027] The accompanying drawings are provided to better understand this disclosure and are not intended to limit it. Wherein:

[0028] Figure 1 is a flowchart illustrating a lock-free learning indexing method provided in an embodiment of this disclosure;

[0029] Figure 2 is a schematic diagram of a first model corresponding to leaf node data provided in an embodiment of this disclosure;

[0030] Figure 3 is a schematic diagram of the properties of a lock-free linked list provided in an embodiment of this disclosure;

[0031] Figure 4 is a schematic diagram of a memory pre-allocation process provided in an embodiment of this disclosure;

[0032] Figure 5 is a schematic diagram of a target index structure provided in an embodiment of this disclosure;

[0033] Figure 6 is a schematic diagram of a grouping space partitioning process provided in an embodiment of this disclosure;

[0034] Figure 7 is a schematic diagram illustrating the principle of grouping space partitioning provided in an embodiment of this disclosure;

[0035] Figure 8 is a schematic diagram of the error of a first model provided in an embodiment of this disclosure;

[0036] Figure 9 is a schematic diagram of an adaptive data reorganization optimization structure provided in an embodiment of this disclosure;

[0037] Figure 10 is a schematic diagram of a point query reorganization process provided in an embodiment of this disclosure;

[0038] Figure 11 is a schematic diagram of the structure of a lock-free self-organizing linked list provided in an embodiment of this disclosure;

[0039] Figure 12 is a schematic diagram of another lock-free self-organizing linked list provided in an embodiment of this disclosure;

[0040] Figure 13 is a schematic diagram of a lock-free B+ tree structure provided in an embodiment of this disclosure;

[0041] Figure 14 is a schematic diagram illustrating the principle of element insertion provided in an embodiment of this disclosure;

[0042] Figure 15 is a schematic diagram illustrating the principle of a non-unique index processing method provided in an embodiment of this disclosure;

[0043] Figure 16 is a schematic diagram of the structure of a lock-free learning indexing device provided in an embodiment of this disclosure;

[0044] Figure 17 is a schematic diagram of another lock-free learning indexing device provided in an embodiment of this disclosure;

[0045] Figure 18 is a schematic block diagram of an electronic device provided in an embodiment of this disclosure.

Detailed Implementation

[0046] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0047] The following description, with reference to the accompanying drawings, outlines a lock-free learning indexing method, apparatus, electronic device, computer-readable storage medium, and computer program product according to embodiments of the present disclosure.

[0048] Figure 1 is a flowchart illustrating a lock-free learning indexing method provided in an embodiment of this disclosure. As shown in Figure 1, the method includes the following steps:

[0049] Step 101: The data to be indexed is divided into groups to obtain multiple leaf node data, and a model training process is performed on each leaf node data to obtain a first model corresponding to each leaf node data; wherein, each leaf node data includes at least multiple elements to be indexed and the position information corresponding to each element to be indexed.

[0050] The lock-free learning indexing method provided in this disclosure requires the construction of a tree-based target index structure to index the elements (target elements) that need to be indexed. The target index structure is divided into two parts: an upper-level tree structure and a lower-level leaf node grouping (multiple leaf node data).

[0051] At the same time, each piece of data to be indexed corresponds to a target index structure. That is, when it is necessary to index the target element in the data to be indexed, the corresponding target index structure needs to be established based on the data to be indexed, and then the target element is indexed in the data to be indexed based on the target index structure.

[0052] In this embodiment of the disclosure, the lower-level leaf node groups of the target index structure are determined by partitioning the data to be indexed; that is, each leaf node data is one lower-level leaf node group of the target index structure. The data to be indexed is the entire dataset that needs to be indexed, and its type includes but is not limited to a key-value sequence. This embodiment of the disclosure does not limit the type and content of the data to be indexed.

[0053] To facilitate understanding of the lock-free learning indexing method provided in this disclosure, the following embodiments use a key-value sequence as the example of the data to be indexed.

[0054] In the process of partitioning the grouping space, it is necessary to determine the error region corresponding to each element to be indexed in the data to be indexed, and to determine the grouping of each element to be indexed (i.e., the leaf node data where each element to be indexed is located) based on the error region. The error region is the region determined according to the preset error corresponding to each element to be indexed. The preset error of each element to be indexed is the maximum acceptable error between the actual position and the predicted position of the element to be indexed, which is set by a user. For example, the partitioning process of the grouping space can be carried out in the following way, but is not limited to: If the data to be indexed is a key-value sequence k, each element to be indexed is a key value, the preset error is δ, and the data point is a point in the plane coordinate system corresponding to each key value and its corresponding actual position, when the next data point pnext arrives, two inequalities are derived based on pstart (the starting data point), pnext, and δ to determine the two boundaries of the first error region. Then, the next data point is read to form two new boundaries, which intersect with the first error region to obtain a new error region. As data points arrive consecutively, the error region gradually shrinks until it becomes empty at a certain data point. This data point is then used as the starting point for the next group, and the data point preceding it is used as the segmentation point for the current segment. This process is repeated until the entire key-value sequence is grouped.

[0055] During model training, each leaf node data corresponds to one model. The first model is a trained linear model. The training method of the first model includes, but is not limited to, least squares (minimizing the L2 norm of the error). This embodiment of the disclosure does not limit the training method of the first model.
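As one illustration of the training step, the following Python sketch fits position against key for a single group by ordinary least squares. The function name `fit_linear_model`, the use of 0-based positions within the group, and the sample keys are assumptions made for illustration; the disclosure does not prescribe a particular training routine.

```python
from typing import List, Tuple


def fit_linear_model(keys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of position (0..n-1) against key for one group.

    Returns (slope, intercept) such that position ~= key * slope + intercept.
    """
    n = len(keys)
    if n == 0:
        raise ValueError("a group needs at least one key")
    positions = range(n)
    mean_k = sum(keys) / n
    mean_p = sum(positions) / n
    var_k = sum((k - mean_k) ** 2 for k in keys)
    if var_k == 0:
        # A single key (or all keys equal): fall back to a constant model.
        return 0.0, mean_p
    cov = sum((k - mean_k) * (p - mean_p) for k, p in zip(keys, positions))
    slope = cov / var_k
    return slope, mean_p - slope * mean_k


# Evenly spaced keys are fitted exactly by a straight line.
slope, intercept = fit_linear_model([10, 20, 30, 40])
```

For evenly spaced keys the fit is exact, so the prediction error is zero; for skewed data the residual is what the preset error δ bounds during grouping.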

[0056] After model training is complete, each leaf node data can be represented as a quintuple (keys, slope, intercept, pre_group, next_group). Here, keys is the key-value array that stores the data, i.e., the array of elements to be indexed in the leaf node data; slope is the slope of the first model; intercept is the intercept of the first model; pre_group is the address of the previous leaf node data; and next_group is the address of the next leaf node data.

[0057] Meanwhile, the first model can be represented by, but is not limited to, formula (1):

f_s(k) = k × slope + intercept    Formula (1)

[0058] Here f_s(k) is the predicted position of the key (target element), and k is the key.
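The quintuple and formula (1) can be sketched in Python as follows. The class name `LeafGroup`, the rounding and clamping of the prediction, and the `find` method that scans a window of ±δ around the predicted position are assumptions added for illustration; the sample keys and model parameters are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LeafGroup:
    """One leaf node: the quintuple (keys, slope, intercept, pre_group, next_group)."""
    keys: List[int]
    slope: float
    intercept: float
    pre_group: Optional["LeafGroup"] = None
    next_group: Optional["LeafGroup"] = None

    def predict(self, k: int) -> int:
        """Formula (1), rounded and clamped to a valid array index."""
        pos = round(k * self.slope + self.intercept)
        return max(0, min(len(self.keys) - 1, pos))

    def find(self, k: int, delta: int) -> Optional[int]:
        """Assumed final step: scan the window [pred - delta, pred + delta]."""
        pred = self.predict(k)
        lo = max(0, pred - delta)
        hi = min(len(self.keys) - 1, pred + delta)
        for i in range(lo, hi + 1):
            if self.keys[i] == k:
                return i
        return None


# Hypothetical group covering keys 27..46 with hand-picked parameters.
group = LeafGroup(keys=[27, 30, 33, 36, 40, 46], slope=0.27, intercept=-7.3)
```

Here the model predicts position 2 for key 36 while the true position is 3, so the lookup only succeeds because the scan window covers the preset error.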

[0059] Regarding the first model, this disclosure also provides a schematic diagram of the first model corresponding to the leaf node data, as shown in Figure 2, which represents the first model corresponding to a set of leaf node data with key values [27, 46].

[0060] Step 102: Perform memory pre-allocation processing in each of the first models to obtain the storage space corresponding to each of the first models, and store the corresponding lock-free linked list in the storage space; wherein, the lock-free linked list is a single-linked lock-free linked list composed of incremental modifications based on the corresponding leaf node data.

[0061] In this embodiment of the disclosure, the lock-free linked list includes at least records of modifications made to the leaf node data, the node status, and the latest attributes of the node. This embodiment of the disclosure does not limit the content of the lock-free linked list.

[0062] Regarding lock-free linked lists, this disclosure provides an attribute diagram of a lock-free linked list, as shown in Figure 3. Size: records the size of the leaf node data, i.e., the number of key values (elements to be indexed). In Figure 3, after Δdelete[K1, V1] (deleting key value [K1, V1]) is executed, the size of the LeafNode (leaf node data) is 6; after Δinsert[K2, V2] (inserting key value [K2, V2]) is executed, the size is 7.

[0063] Depth: records the distance between the current operation record and the LeafNode in the lock-free linked list. In Figure 3, the Depth of the Δdelete[K1, V1] operation is 2, and the Depth of the Δinsert[K2, V2] operation is 1.

[0064] Offset: records the position, in the leaf node data, of the key-value pair being operated on. In Figure 3, K1 in the Δdelete[K1, V1] operation is at the first position of the LeafNode, so its Offset is 0; K2 in the Δinsert[K2, V2] operation is at the second position of the LeafNode, so its Offset is 1.
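The bookkeeping of Size, Depth, and Offset can be illustrated with a short Python sketch. It shows only how the three attributes of a new record follow from the current head of the chain; it is single-threaded, and publishing the new head atomically (for example with a compare-and-swap) is left out. The names `Delta` and `make_delta`, the base array of six keys, and the order of operations (insert first, then delete) are assumptions chosen so that the values match those quoted for Figure 3.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Delta:
    """One immutable incremental record in a leaf node's linked list."""
    op: str                   # "insert" or "delete"
    key: int
    value: object
    size: int                 # number of keys in the leaf after this operation
    depth: int                # distance from this record to the base LeafNode
    offset: int               # position of the key in the base key array
    older: Optional["Delta"]  # next-older record; None means the base LeafNode


def make_delta(base_keys: List[int], head: Optional[Delta],
               op: str, key: int, value: object = None) -> Delta:
    """Build the record that would become the new head of the chain."""
    if op not in ("insert", "delete"):
        raise ValueError("op must be 'insert' or 'delete'")
    prev_size = head.size if head else len(base_keys)
    prev_depth = head.depth if head else 0
    size = prev_size + (1 if op == "insert" else -1)
    offset = bisect.bisect_left(base_keys, key)
    return Delta(op, key, value, size, prev_depth + 1, offset, head)


base = [10, 20, 30, 40, 50, 60]                 # base LeafNode with 6 keys
d1 = make_delta(base, None, "insert", 15, "v")  # lands at the second position
d2 = make_delta(base, d1, "delete", 10)         # removes the first key
```

Because each record is immutable and only points at older records, readers that already hold a head can keep traversing while a writer prepares a new one.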

[0065] During memory pre-allocation, space is pre-allocated within the first model corresponding to each leaf node data to store the lock-free linked list. This disclosure provides a schematic diagram of memory pre-allocation, as shown in Figure 4. The first model is stored at the high-address end of the pre-allocated Memory Chunk, and the lock-free linked list is stored from the high-address end toward the low-address end (from right to left in Figure 4). Because the connection direction of the lock-free linked list is opposite to the memory growth direction, traversal is efficient: reading incremental records from newest to oldest coincides with linear memory access from low to high addresses, which makes full use of the hardware prefetching of the CPU (Central Processing Unit) and improves the read and write performance of the multi-threaded lock-free linked list.
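The layout in Figure 4 can be modelled with a small Python sketch. It is only a model of the idea: a list of slots stands in for the Memory Chunk, the last slot plays the role of the high-address end, and a lock emulates the atomic slot claim that a native implementation would presumably perform with a hardware instruction. The class name `MemoryChunk` and its methods are hypothetical and do not come from the disclosure; the usage shown is single-threaded.

```python
import threading
from typing import List, Optional


class MemoryChunk:
    """Fixed-size slot array standing in for one pre-allocated memory chunk.

    Slot capacity-1 (the "high address" end) holds the model; delta records
    are placed in descending slot order, so the newest record always sits at
    the lowest used slot.
    """

    def __init__(self, capacity: int, model: object) -> None:
        if capacity < 2:
            raise ValueError("need room for the model and at least one record")
        self.slots: List[Optional[object]] = [None] * capacity
        self.slots[-1] = model
        self._next_free = capacity - 2
        # Python exposes no atomic fetch-and-decrement, so a lock emulates one
        # here. A native version would use a hardware atomic instead.
        self._claim = threading.Lock()

    def append(self, record: object) -> bool:
        """Claim the next lower slot; return False when the chunk is full."""
        with self._claim:
            slot = self._next_free
            if slot < 0:
                return False
            self._next_free -= 1
        self.slots[slot] = record
        return True

    def remaining_fraction(self) -> float:
        """Share of record slots still unused (1.0 = empty, 0.0 = full)."""
        return (self._next_free + 1) / (len(self.slots) - 1)

    def newest_to_oldest(self) -> List[object]:
        """Walk records in ascending slot order, i.e. newest first."""
        return self.slots[self._next_free + 1:len(self.slots) - 1]


chunk = MemoryChunk(4, model=(0.1, -1.0))
chunk.append("d1")
chunk.append("d2")
order = chunk.newest_to_oldest()
```

Reading `order` walks the slots from low index to high index, which is the newest-to-oldest direction described above.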

[0066] Step 103: Determine the remaining space corresponding to each of the storage spaces, reorganize and optimize the leaf node data corresponding to the storage spaces with remaining space less than a preset space threshold to obtain an initial index structure, and connect the initial index structure to the preset tree structure according to the lock-free linked list to obtain the target index structure.

[0067] In this embodiment of the disclosure, the preset space threshold is a user-defined threshold, such as 0% or 2%. This embodiment of the disclosure does not impose any restrictions on the preset space threshold.

[0068] The reorganization optimization of leaf node data can be performed in, but is not limited to, the following way: when the storage space of a leaf node data is full (i.e., the remaining space is 0%), reorganization optimization of that leaf node data is triggered. That is, as data is continuously inserted, the number of records stored in the lock-free linked list of each group (leaf node data) increases. To prevent the lock-free linked list from becoming too long and increasing the workload of merging and reorganization, reorganization optimization is triggered when the number of records in the lock-free linked list exceeds the upper limit.
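A minimal Python sketch of the trigger and of the merge that reorganization implies is given below. Both helpers are hypothetical: the comparison uses `<=` (an assumption, so that a threshold of 0% fires exactly when the chunk is full), keys are assumed unique, and the incremental records are reduced to (operation, key) pairs replayed from oldest to newest. After such a merge the group's linear model would presumably be retrained on the new key array.

```python
import bisect
from typing import Iterable, List, Tuple


def needs_reorganization(remaining: float, threshold: float) -> bool:
    """Assumed comparison: '<=' so that a 0% threshold fires on a full chunk."""
    return remaining <= threshold


def merge_deltas(base_keys: List[int],
                 deltas_oldest_first: Iterable[Tuple[str, int]]) -> List[int]:
    """Replay ("insert" | "delete", key) records onto a copy of the sorted base."""
    keys = list(base_keys)
    for op, key in deltas_oldest_first:
        i = bisect.bisect_left(keys, key)
        present = i < len(keys) and keys[i] == key
        if op == "insert" and not present:
            keys.insert(i, key)
        elif op == "delete" and present:
            del keys[i]
    return keys


merged = merge_deltas([10, 20, 30],
                      [("insert", 15), ("delete", 10), ("insert", 25)])
```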

[0069] The target index structure is divided into two parts: an upper-level tree structure and lower-level leaf node groups (multiple leaf node data). The preset tree structure is a user-defined tree structure, such as an R-tree. This disclosure provides a schematic diagram of the target index structure, as shown in Figure 5. The R-tree is the upper-level tree structure (preset tree structure), and multiple groups serve as the leaf node data, which are indexed through the R-tree. The R-tree stores the start_key of each group node, and the address of each group is Group[i]; that is, the upper-level tree structure stores <start_key, Group[i]> key/value pairs. Each lower-level leaf node group (leaf node data) contains a key-value array and is connected to the upper-level tree structure via a lock-free linked list (delta-nodes). Each delta-node stores an incremental data record; every insert/delete operation generates a new delta-node that is linked to the previous delta-node. A linear model (the first model) is trained on the data in each group (leaf node data); its input is the key (target element), and its output is the predicted data location (target position information). Furthermore, when partitioning the data to be indexed into groups, a greedy algorithm can be used to divide it into multiple groups (multiple leaf node data), minimizing the number of groups while staying within the linear model's error range and thereby reducing index memory space.

[0070] Step 104: Index the target element using the target index structure to obtain the target position information of the target element in the data to be indexed, wherein the target element is any element among the elements to be indexed.

[0071] In this embodiment of the disclosure, the target element can be indexed in, but is not limited to, the following way: search the preset tree structure for the group (leaf node data) to which the target element belongs; traverse the lock-free linked list (delta-nodes) of that group to look for the target element; and use the linear model (first model) of that group to obtain the predicted position, thereby obtaining the target position information.

[0072] The search within the preset tree structure works as follows: the address of each group (leaf node data) is stored in an R-tree (with `start_key` as the key and `Group[i]` as the value). The search first finds the group to which the target element belongs by traversing the R-tree (preset tree structure) from the root to the leaves using a standard tree traversal algorithm, and terminates when a leaf node containing the target element is reached. Since the R-tree locates groups rather than individual points, the time complexity of finding the group to which the target element belongs is O(log_b(p)), where b is the fanout of the preset tree structure and p is the number of groups created (i.e., the number of leaf node data).
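The three lookup stages (locate the group, check the incremental records, fall back to the model) can be sketched in Python as follows. This is an illustration under stated assumptions, not the disclosed implementation: a binary search over the sorted `start_key` values stands in for the tree (so it costs O(log2 p) rather than O(log_b p)), incremental records are reduced to (operation, key) pairs ordered newest first, and the final scan of ±δ around the predicted position is assumed. All names are hypothetical.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Group:
    start_key: int
    keys: List[int]                 # sorted base key array
    slope: float
    intercept: float
    deltas: List[Tuple[str, int]] = field(default_factory=list)  # newest first


def locate_group(start_keys: List[int], k: int) -> int:
    """Index of the last group whose start_key <= k (stand-in for the tree)."""
    return max(0, bisect.bisect_right(start_keys, k) - 1)


def lookup(groups: List[Group], k: int,
           delta: int) -> Optional[Tuple[int, str, int]]:
    """Return (group index, "delta" | "array", position) or None if absent."""
    gi = locate_group([g.start_key for g in groups], k)
    g = groups[gi]
    # The newest record that mentions k decides whether k currently exists.
    for i, (op, key) in enumerate(g.deltas):
        if key == k:
            return (gi, "delta", i) if op == "insert" else None
    # Otherwise predict a position and scan the error window around it.
    pred = round(k * g.slope + g.intercept)
    lo = max(0, pred - delta)
    hi = min(len(g.keys) - 1, pred + delta)
    for pos in range(lo, hi + 1):
        if g.keys[pos] == k:
            return (gi, "array", pos)
    return None


groups = [
    Group(10, [10, 20, 30, 40], 0.1, -1.0, [("delete", 20), ("insert", 15)]),
    Group(100, [100, 101, 102, 103], 1.0, -100.0),
]
```

Key 20 is still present in the base array of the first group, yet the lookup reports it as absent because a newer delete record shadows it.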

[0073] The lock-free learning indexing method disclosed herein divides the data to be indexed into groups and spaces to obtain multiple leaf node data. Model training is then performed on each leaf node data to obtain a first model corresponding to each leaf node data. Each leaf node data includes at least multiple elements to be indexed and their corresponding position information. Memory pre-allocation is performed on each first model to obtain its own storage space, and a corresponding lock-free linked list is stored in the storage space. This lock-free linked list is a singly linked lock-free linked list composed of incremental modifications to the corresponding leaf node data. The remaining space for each storage space is determined, and the leaf node data corresponding to storage spaces with remaining space less than a preset space threshold are reorganized and optimized to obtain an initial index structure. The initial index structure is then connected to a preset tree structure based on the lock-free linked list to obtain a target index structure. The target element is indexed using the target index structure to obtain its target position information within the data to be indexed, where the target element is any element among the elements to be indexed. Compared with related technologies, the embodiments of this disclosure divide the data to be indexed into different underlying data groups, namely leaf node data. Each group corresponds to a linear model prediction, namely the first model. Lock-free linked lists are added inside the groups as multi-threaded lock-free structures. Finally, a target index structure based on a preset tree structure is constructed, and indexing is performed through the target index structure. This enables multi-threaded high-concurrency access, supports multi-threaded querying and insertion / update, and supports efficient multi-threaded load.

[0074] In one possible implementation of this disclosure, regarding the grouping space partitioning process for the data to be indexed, this disclosure provides a schematic flowchart of grouping space partitioning, as shown in Figure 6, including:

[0075] Step 601: Based on each element to be indexed in the data to be indexed and its corresponding position information, coordinate point construction processing is performed to obtain the target data point corresponding to each element to be indexed.

[0076] In this embodiment of the disclosure, when constructing coordinate points, the element to be indexed is used as the horizontal coordinate and the position information corresponding to the element to be indexed is used as the vertical coordinate. For ease of understanding, this embodiment of the disclosure provides a schematic diagram of the principle of grouping space division, as shown in Figure 7, where p0, ..., p5 are the target data points corresponding to six elements to be indexed in the data to be indexed.

[0077] Step 602: Determine the first upper boundary and the first lower boundary of the error for the first data point based on the preset error, and determine the error range based on the first upper boundary and the first lower boundary of the error to obtain the first error region corresponding to the first data point; wherein, the first data point is any target data point.

[0078] In this embodiment of the disclosure, the preset error is the user-defined maximum acceptable error between the actual position and the predicted position of the element to be indexed. The first error region can be determined in, but is not limited to, the following way: assume the preset error is δ and p0(x0, y0) is the starting data point of the key value sequence (data to be indexed), which is also the starting point of the first model. When the second data point p1(x1, y1) is read, the predicted ordinate at x1 must lie between the points p1⊤ and p1⊥, the upper and lower boundary points of p1 obtained as y1 ± δ, so that |p1⊤ − p1| = |p1 − p1⊥| = δ. Any line between the upper line u1 and the lower line l1 satisfies the error requirement of p1, and the area between u1 and l1 is the feasible space of the data point p1. If the first data point is p1, then the first upper boundary of the error is u1, the first lower boundary of the error is l1, and the first error region is the area between u1 and l1.
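Under the assumption that every candidate line passes through the starting point p0 (which is consistent with p0 being the starting point of the first model, but is not stated explicitly for u1 and l1), the feasible space of p1 can be written as an interval of admissible slopes a. The symbol a is introduced here for illustration only:

```latex
\[
  \lvert\, y_0 + a\,(x_1 - x_0) - y_1 \,\rvert \le \delta
  \quad\Longleftrightarrow\quad
  \underbrace{\frac{(y_1 - \delta) - y_0}{x_1 - x_0}}_{\text{slope of } l_1}
  \;\le\; a \;\le\;
  \underbrace{\frac{(y_1 + \delta) - y_0}{x_1 - x_0}}_{\text{slope of } u_1},
  \qquad x_1 > x_0 .
\]
```

Read this way, the error region of each later data point is another slope interval, and two error regions overlap exactly when the two intervals share at least one slope.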

[0079] Step 603: Determine the upper boundary and lower boundary of the second error for the second data point based on the preset error, and determine the error range based on the upper boundary and lower boundary of the second error to obtain the second error region corresponding to the second data point.

[0080] The second data point is the next adjacent target data point of the first data point.

[0081] In this embodiment of the disclosure, when a new data point p2(x2, y2) is read, any line between the upper line u2 and the lower line l2 satisfies the error requirement of p2, and the area between the upper line u2 and the lower line l2 is the feasible space of the data point p2. At this time, if the second data point is represented by p2, then the upper boundary of the second error is u2, the lower boundary of the second error is l2, and the second error area is the area between the upper line u2 and the lower line l2.

[0082] Step 604: In the case where the first error region and the second error region overlap, the element to be indexed corresponding to the first data point and the element to be indexed corresponding to the second data point are assigned to the first leaf node data.

[0083] In this embodiment, when a new data point is read, the error region is incrementally updated. For example, in Figure 7, p1 is taken as the first data point and p2 as the second data point. When data point p2 is read, two upper and lower boundary lines u2 and l2 are obtained. The area between u2 and l2 is the second error region of p2. In the region formed by the intersection of the second error region and the first error region, any straight line can satisfy the error requirements of p1 and p2. Therefore, it can be determined that p1 and p2 belong to the same group, that is, to the same leaf node data.

[0084] Step 605: In the case that there is no overlap between the first error region and the second error region, the element to be indexed corresponding to the first data point is divided into first leaf node data, the element to be indexed corresponding to the second data point is divided into second leaf node data, and the second data point is used as the starting data point of the second leaf node data.

[0085] In this embodiment, when a new data point is read, the error region is incrementally updated. For example, in Figure 7, p4 is taken as the first data point and p5 as the second data point. When data point p5 is read, the upper and lower boundary lines u5 and l5 are obtained. The area between u5 and l5 is the second error region of p5. The second error region does not intersect the first error region, so no straight line can satisfy the error requirements of both p4 and p5. Therefore, it can be determined that p4 and p5 do not belong to the same group, that is, they do not belong to the same leaf node data.

[0086] Step 606: Repeat the above steps until all the data to be indexed is grouped and the multiple leaf node data are obtained.

[0087] In this embodiment of the disclosure, the incremental update process of the error region is repeated until the intersection region of the error regions becomes empty. When the intersection region of the error regions becomes empty, it means that no subsequent data point (including the current data point) can be approximated by a straight line within the error range. Therefore, the previous data point will be the endpoint of the current segment and also the new starting data point of the next group segment.

[0088] For example, as shown in the right figure of Figure 7, the overall process of this embodiment includes, but is not limited to, the following: after p1 and p2, the next two data points p3 and p4 are read sequentially. At this time, there is a common intersection region [u3, l2] between p1, p2, p3, and p4. After p5 is read, the common intersection region between p5 and p1, p2, p3, and p4 becomes empty, because the upper boundary line u5 is lower than the lower boundary line l2. Therefore, data point p4 is the grouping endpoint. The line connecting p0 and p4 is an approximation of the data points between p0 and p4. p5 is used as the starting data point of the next group, and this process is repeated until all the data to be indexed is grouped.
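The grouping procedure of steps 602 to 606 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the position of a key is taken to be its array index, keys are strictly increasing, the feasible space is tracked as a slope interval of lines through the group's starting point, and the midpoint of that interval is chosen as the group slope. The names `segment_keys` and `_pick_slope` are hypothetical.

```python
from typing import List, Tuple

INF = float("inf")


def _pick_slope(lo: float, hi: float) -> float:
    # A one-point group has no slope constraint; 0.0 is an arbitrary choice.
    if lo == -INF or hi == INF:
        return 0.0
    # Assumption: any slope in [lo, hi] is valid, so take the midpoint.
    return (lo + hi) / 2.0


def segment_keys(keys: List[float], delta: float) -> List[Tuple[int, int, float]]:
    """Greedily group strictly increasing keys.

    Returns (start_index, end_index, slope) per group, end inclusive, such that
    slope * (key - start_key) is within delta of (index - start_index).
    """
    segments: List[Tuple[int, int, float]] = []
    if not keys:
        return segments
    start = 0
    lo, hi = -INF, INF  # feasible slope interval for the current group
    for i in range(1, len(keys)):
        dx = keys[i] - keys[start]
        dy = i - start
        # Slopes of the lower and upper boundary lines for this data point.
        new_lo = max(lo, (dy - delta) / dx)
        new_hi = min(hi, (dy + delta) / dx)
        if new_lo > new_hi:
            # Intersection is empty: the previous point ends the group and
            # the current point starts the next one.
            segments.append((start, i - 1, _pick_slope(lo, hi)))
            start = i
            lo, hi = -INF, INF
        else:
            lo, hi = new_lo, new_hi
    segments.append((start, len(keys) - 1, _pick_slope(lo, hi)))
    return segments
```

For instance, with `delta = 0.5` the keys `[1, 2, 3, 4, 100, 101, 102]` fall into two groups, because the jump from 4 to 100 empties the feasible slope interval.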

[0089] In one possible implementation of this disclosure, when performing model training based on leaf node data, the following methods can also be used, but are not limited to: performing position prediction processing on the elements to be indexed using a preset linear model to obtain the predicted position information corresponding to each element to be indexed; performing error analysis based on the position information and the predicted position information to obtain the prediction error corresponding to each element to be indexed; and performing model optimization processing on the preset linear model based on the prediction error, the elements to be indexed, and the position information to obtain the first model corresponding to each leaf node data.

[0090] In this embodiment of the disclosure, the prediction error is calculated first, and the model is then optimized according to the prediction error using a preset optimization algorithm to obtain the first model. Regarding the prediction error, this embodiment of the disclosure provides an error diagram of the first model, as shown in Figure 8. If the offset of the data point (x2, y2) from the line of the linear model is greater than the preset error, the model spanning (x1, y1) to (x3, y3) is invalid, and the model therefore needs to be optimized according to the prediction error. The optimization algorithm is user-defined, for example a second-order method; this embodiment of the disclosure does not limit it.
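The predict, measure, and optimize loop described above can be illustrated with a short sketch. The disclosure leaves the optimization algorithm open, so ordinary closed-form least squares is swapped in here purely as a stand-in; `fit_linear` and `prediction_errors` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def fit_linear(keys: Sequence[float], positions: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least-squares fit; returns (slope, intercept)."""
    n = len(keys)
    mean_x = sum(keys) / n
    mean_y = sum(positions) / n
    sxx = sum((x - mean_x) ** 2 for x in keys)
    if sxx == 0:
        # All keys identical: fall back to a flat line.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(keys, positions))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def prediction_errors(keys: Sequence[float], positions: Sequence[float],
                      slope: float, intercept: float) -> List[float]:
    """Absolute error between predicted and actual position for each key."""
    return [abs(slope * x + intercept - y) for x, y in zip(keys, positions)]
```

A model would then be accepted for a leaf node only if `max(prediction_errors(...))` does not exceed the preset error.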

[0091] In one possible implementation of this disclosure, the reorganization optimization process can also be implemented in the following ways, but is not limited to: processing the lock-free linked list corresponding to the target leaf node data to obtain a first dataset and a second dataset; wherein, the target leaf node data is the leaf node data corresponding to the storage space with remaining space less than a preset space threshold; constructing a lock-free self-organizing linked list and a lock-free B+ tree in the target buffer corresponding to the target leaf node data based on the first dataset; wherein, the first dataset includes the existing data of the lock-free linked list after insertion and / or deletion operations, and the second dataset includes the data deleted from the lock-free linked list after insertion and / or deletion operations; performing point query reorganization processing on the target leaf node data through the lock-free self-organizing linked list to obtain a first index structure, and performing range query reorganization processing on the first index structure through the lock-free B+ tree to obtain an initial index structure.

[0092] For ease of understanding, this disclosure provides a schematic diagram of an adaptive data reorganization optimization structure, as shown in Figure 9. The target leaf node data has a corresponding buffer, namely the target buffer, which is divided into a lock-free self-organizing linked list and a lock-free B+ tree with delta-nodes (the lock-free B+ tree). The lock-free self-organizing linked list handles point queries, and the lock-free B+ tree handles range queries. First, a background thread processes the delta-node (lock-free linked list) corresponding to the target leaf node data and outputs two datasets, $L_{exist}$ (the first dataset) and $L_{delete}$ (the second dataset). The incremental insertion and deletion records are added to $L_{exist}$ and $L_{delete}$ so that inserted and deleted values are not overwritten by newly added values. $L_{exist}$ stores the existing data of the delta-node, and the data in $L_{exist}$ is used to construct the lock-free self-organizing linked list and the lock-free B+ tree. The query result is $(L_{buffer} \cup result) - L_{delete}$, where $L_{buffer}$ is the data returned from the buffer and $result$ is the result returned from the group.
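A minimal single-threaded sketch of how the two datasets and the merged query result could be computed is given below. It assumes that delta records are `(op, key, value)` tuples replayed oldest first and that a later record for a key supersedes an earlier one; neither detail is specified in the disclosure, and `replay_delta` and `merge_query` are hypothetical names.

```python
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

Record = Tuple[str, Hashable, Optional[object]]  # (op, key, value)


def replay_delta(records: Iterable[Record]) -> Tuple[Dict, Set]:
    """Replay delta-node records into (existing data, deleted keys)."""
    exist: Dict = {}
    deleted: Set = set()
    for op, key, value in records:
        if op == "insert":
            exist[key] = value
            deleted.discard(key)  # a re-insert cancels an earlier delete
        elif op == "delete":
            exist.pop(key, None)
            deleted.add(key)
        else:
            raise ValueError(f"unknown operation: {op}")
    return exist, deleted


def merge_query(buffer_hits: Dict, group_hits: Dict, deleted: Set) -> Dict:
    """Compute (buffer hits UNION group hits) MINUS deleted keys."""
    merged = dict(group_hits)
    merged.update(buffer_hits)  # assumption: buffered data is newer and wins
    return {k: v for k, v in merged.items() if k not in deleted}
```

For example, replaying insert 5, delete 3, insert 3, delete 5 leaves only key 3 in the existing set and key 5 in the deleted set.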

[0093] In one possible implementation of this disclosure, when performing point query reorganization, this disclosure provides a flowchart illustrating the point query reorganization process, as shown in Figure 10, including:

[0094] Step 1001: Perform window partitioning processing on the lock-free self-organizing linked list to obtain a first window, and perform region partitioning processing on the first window according to a preset ratio to obtain a first region and a second region.

[0095] In this embodiment of the disclosure, the preset ratio is a user-defined ratio, such as 1% of the total linked list length or 20% of the total linked list length; this embodiment of the disclosure does not limit the preset ratio.

[0096] For ease of understanding, this disclosure provides a schematic diagram of a lock-free self-organizing linked list, as shown in Figure 11. The list combines the Least Frequently Used (LFU) and Least Recently Used (LRU) strategies to keep the data in the first window fresh and to adapt between LFU and LRU. A first window is partitioned at the front of the lock-free self-organizing linked list to store both frequency data and recency data. The window is further divided into regions A1 (the first region) and A2 (the second region): 1% of the total linked list length is used as region A1 to store recency data (the most recently accessed data), and a further 20% of the total linked list length is used as region A2 to store frequency data (data accessed more frequently). In other words, recency data is stored in the first region, and frequency data is stored in the second region.

[0097] Step 1002: Store recency data in the first region; wherein the recency data is the most recently accessed data among the target leaf node data.

[0098] In this embodiment of the disclosure, 1% of the total linked list length is used as region A1 to store recency data, that is, recency data can be stored directly in the first region.

[0099] Step 1003: Monitor the data storage records in the first window to obtain monitoring results. If it is determined from the monitoring results that there are added records in the data storage records, perform numerical calculation processing on the frequency data and store the processed frequency data in the second area; wherein, the frequency data is the access frequency of accessing the target leaf node data; wherein, the first index structure includes at least the first area and the second area.

[0100] In this embodiment of the disclosure, in order to prevent the frequency counts from growing without bound and to adapt to changes in the data access pattern, each time a record is added to the first window, the frequency values of all records in the first window are divided by 2. Division by 2 can be implemented efficiently in hardware using a shift register, and it serves as count decay.
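The count decay can be sketched in a few lines. This is a single-threaded illustration only; the class name `FirstWindowCounter`, the `touch` method, and the choice to start a new record at a count of 1 are assumptions, not details from the disclosure.

```python
class FirstWindowCounter:
    """Tracks access frequencies for records in the first window."""

    def __init__(self) -> None:
        self.freq = {}

    def touch(self, key) -> None:
        # Count one more access to a record already in the window.
        self.freq[key] += 1

    def add_record(self, key) -> None:
        # Decay: halve every existing count with a right shift, then add
        # the new record (assumed to start at 1).
        for existing in list(self.freq):
            self.freq[existing] >>= 1
        self.freq[key] = 1
```

With this scheme a record that stops being accessed loses half of its count at each insertion, so stale hot data gradually gives way to newer data.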

[0101] For example, point query reorganization through the lock-free self-organizing linked list reorganizes and merges the data in $L_{exist}$. Applying the lock-free self-organizing linked list to concurrent scenarios makes full use of multi-core processors and also covers data deletion scenarios. Regarding the lock-free self-organizing linked list, this disclosure also provides another schematic diagram of its structure, as shown in Figure 12, in which the lock-free self-organizing linked list includes $L_{exist}$ and a bitmap. The bitmap records whether the key (the element to be indexed) corresponding to $L_{exist}$ exists in the array, and its length is $L_{exist\_max\_key} - L_{exist\_min\_key}$. When multiple threads access the lock-free self-organizing linked list, if one thread moves a node to the head of the list, other threads still searching for that node may fail to find it, resulting in a "node does not exist" error. To address this data inconsistency caused by concurrent access, if a thread does not find the data but the bitmap indicates that the key exists, the thread returns to the head of the list and searches again. Node movement is achieved by modifying node pointers with atomic Compare-and-Swap (CAS) operations, which allows concurrent access to and modification of the linked list structure without locks.
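The bitmap check and the retry rule can be illustrated with a simplified, single-threaded sketch. It does not implement CAS or real concurrency; a list traversal is modelled by a hypothetical `snapshot` callable that yields `(key, value)` pairs from the head. The bitmap here is sized `max_key - min_key + 1` so that the maximum key is addressable, which is an assumption that differs slightly from the stated length. `KeyBitmap` and `find_with_retry` are hypothetical names.

```python
from typing import Callable, Iterable, Tuple


class KeyBitmap:
    """One bit per integer key in [min_key, max_key]."""

    def __init__(self, min_key: int, max_key: int) -> None:
        self.min_key = min_key
        self.size = max_key - min_key + 1
        self.bits = bytearray((self.size + 7) // 8)

    def set(self, key: int) -> None:
        idx = key - self.min_key
        if not 0 <= idx < self.size:
            raise KeyError(key)
        self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, key: int) -> bool:
        idx = key - self.min_key
        if not 0 <= idx < self.size:
            return False
        return bool(self.bits[idx >> 3] & (1 << (idx & 7)))


def find_with_retry(snapshot: Callable[[], Iterable[Tuple[int, object]]],
                    key: int, bitmap: KeyBitmap, max_retries: int = 3):
    """Search from the head; retry if the bitmap says the key should exist."""
    for _ in range(max_retries + 1):
        for k, value in snapshot():
            if k == key:
                return value
        if key not in bitmap:
            return None  # the key really is absent
        # Otherwise the node may have moved during traversal: search again.
    return None
```

The retry bound `max_retries` is also an assumption; it only keeps the sketch from looping indefinitely.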

[0102] In one possible implementation of this disclosure, range query reorganization can be performed in, but is not limited to, the following way: initializing the inserted element to obtain a first insertion range, wherein performing range query reorganization on the first index structure includes at least performing insertion sorting on the first index structure, the inserted element is the element in the first index structure that undergoes insertion sorting, and the first offset range is the offset search range corresponding to the inserted element; traversing the lock-free linked list and comparing the inserted element with a first element to be compared to obtain a comparison result, wherein the first element to be compared is the first element to be compared that is traversed in the lock-free linked list, and an element to be compared is an element of insertion and / or deletion type in the lock-free linked list; if it is determined from the comparison result that the inserted element is the same as the first element to be compared, converging the first offset range according to a first range to be compared to obtain a second offset range, wherein the first range to be compared is the offset search range corresponding to the first element to be compared; if it is determined from the comparison result that the inserted element is greater than the first element to be compared, and the starting value of the first offset range is less than the starting value of the first range to be compared, setting the starting value of the first offset range to the starting value of the first range to be compared to obtain the second offset range; if it is determined from the comparison result that the inserted element is less than the first element to be compared, and the ending value of the first offset range is less than the ending value of the first range to be compared, setting the ending value of the first offset range to the ending value of the first range to be compared to obtain the second offset range; and continuing until the inserted element has been compared with all elements to be compared, thereby obtaining the target offset range corresponding to the inserted element, and inserting the inserted element within the target offset range to obtain the initial index structure.

[0103] In the embodiments of the present disclosure, an ordered lock-free B+ tree structure is used to process range queries. The lock granularity of a B+ tree structure is relatively coarse: a read lock is added for select, and a write lock is added for insert / delete. Therefore, to avoid such locks, a delta-node (lock-free linked list) is added between the leaf nodes and tree nodes of the lock-free B+ tree structure to record incremental operations, as shown in FIG. 13. FIG. 13 is a schematic structural diagram of a lock-free B+ tree structure provided by the embodiments of the present disclosure.

[0104] The unordered data of the lock-free self-organizing linked list is sorted into the ordered lock-free B+ tree structure, and modifications from new data are placed in the lock-free linked list (delta-node), which avoids locking the lock-free B+ tree structure. When querying, the delta-node is traversed first. Range query reorganization includes sorting the key-value data and retraining the model. To reduce the time complexity of sorting, the Offset attribute of the delta-node is used for insertion sorting to find the insertion position of the inserted element in the leaf node data. When a thread traverses the delta-node, it initializes the binary search range of key (the inserted element) as [start, end] (the first insertion range). During the traversal, each time an insert / delete type key-value key' (an element to be compared) is scanned, key is compared with key'. If they are equal, the offset search range is converged to [Offset1, Offset2] (the first offset range). If Offset1 > start and key > key', start is set to Offset1; otherwise, if Offset2 < end and key < key', end is set to Offset2. This continues until the traversal of the delta-node ends. For example, regarding the execution process for the inserted element, the embodiments of the present disclosure provide a schematic diagram of the principle of element insertion, as shown in FIG. 14, in which the inserted key value with key = 7 is placed at the position with an offset of 3 in the array. The running time is O(log n), compared with O(n) and O(n log n) for other sorting algorithms, which reduces part of the time complexity of grouped data reorganization.
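The range convergence rule can be sketched as follows. The sketch assumes that each insert / delete record in the delta-node carries a key and an offset range `(key', Offset1, Offset2)`, that offsets are positions in the sorted leaf array, and that the final position is found by binary search inside the converged range; the record layout and the names `narrow_range` and `insertion_offset` are hypothetical.

```python
from bisect import bisect_left
from typing import Iterable, Sequence, Tuple

DeltaRecord = Tuple[int, int, int]  # (key', Offset1, Offset2), assumed layout


def narrow_range(key: int, records: Iterable[DeltaRecord],
                 start: int, end: int) -> Tuple[int, int]:
    """Converge [start, end] while traversing the delta-node records."""
    for other, off1, off2 in records:
        if key == other:
            start, end = off1, off2
        elif key > other and off1 > start:
            start = off1
        elif key < other and off2 < end:
            end = off2
    return start, end


def insertion_offset(sorted_keys: Sequence[int], key: int,
                     records: Iterable[DeltaRecord]) -> int:
    """Binary search for the insertion position inside the converged range."""
    start, end = narrow_range(key, records, 0, len(sorted_keys))
    return bisect_left(sorted_keys, key, start, end)
```

With the array `[1, 3, 5, 9, 11]` and records for keys 5 and 9 at offsets 2 and 3, the range for key 7 converges to `[2, 3]` and the key is placed at offset 3, in line with the example of FIG. 14.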

[0105] In one possible implementation of this disclosure, a target element can be indexed in, but is not limited to, the following way: searching for the target element in the target index structure to determine the leaf node data to be indexed, wherein the leaf node data to be indexed is the leaf node data containing the target element in the target index structure; traversing the lock-free linked list to be indexed to determine the element type of the target element, wherein the lock-free linked list to be indexed is the lock-free linked list contained in the leaf node data to be indexed; if the target element is determined to be a non-unique element based on the element type, performing data analysis processing on the lock-free linked list to be indexed to obtain a third dataset and a fourth dataset, wherein the third dataset includes the existing data of the lock-free linked list to be indexed after insertion and / or deletion operations, and the fourth dataset includes the data deleted from the lock-free linked list to be indexed after insertion and / or deletion operations; performing data analysis processing on the target element based on the third dataset and the fourth dataset to obtain the unique key value corresponding to the target element; and inputting the unique key value into the first model to be indexed for position prediction processing to obtain the target position information, wherein the first model to be indexed is the first model corresponding to the leaf node data to be indexed.

[0106] For example, but not limited to: if the target element is determined to be a unique element based on the element type, the target element is input into the first model to be indexed for location prediction processing to obtain the target location information.

[0107] In this embodiment of the disclosure, the delta-node (lockless linked list) corresponding to the group to which the target element belongs is traversed to find the target element; the linear model (first model) corresponding to the group to which the target element belongs is used to obtain the predicted position, and the target position information is obtained.

[0108] When traversing the delta-nodes corresponding to the group to which the target element belongs, the search proceeds from top to bottom. If the target element is determined to be a non-unique element, a non-unique index is executed through the target index structure. Regarding non-unique indexes, this disclosure provides a schematic diagram of the principle of non-unique index processing, as shown in Figure 15. When traversing the delta-nodes, two datasets, $L_{exist}$ and $L_{delete}$, are output through data processing: $L_{exist}$ stores the existing key-value pairs, and $L_{delete}$ stores the deleted key-value pairs. It should be noted that the first and third datasets are of the same type and can both be represented by $L_{exist}$, and the second and fourth datasets are of the same type and can both be represented by $L_{delete}$. For example, an algorithm for handling non-unique indexes can be represented by, but is not limited to, the following algorithms:

[0109] Non-unique indexes ensure the integrity of the result set when querying using non-unique elements.

[0110] If the target element is determined to be unique, the position of the target element in the group is found. When the group is created by the FSW algorithm, the error between the stored position of an element to be indexed and the position determined by the linear model does not exceed a constant. To calculate the approximate position of the target element in the leaf node to be indexed, the first element to be indexed in that leaf node is subtracted from the target element, and the difference is multiplied by the slope of the first model to be indexed, Group.slope. For example, this can be expressed by formulas (2) and (3):
$$pred\_pos = (k - start\_key) \times Group.slope \qquad \text{Formula (2)}$$
$$true\_pos \in [pred\_pos - err,\ pred\_pos + err] \qquad \text{Formula (3)}$$

[0111] In this way, after the position is obtained by interpolation, the actual position of the key (key_pos) is guaranteed to lie within the error threshold of the predicted position. The cost of searching for the key within the group is therefore bounded: the running time of locating the key is O(log2(error)), where error is a constant.
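Formulas (2) and (3) translate into a prediction followed by a binary search over a window of at most 2 × err + 1 positions. The sketch below assumes integer positions within one group, a sorted key array, and rounding of the predicted position to the nearest integer; `locate` is a hypothetical name.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def locate(keys: Sequence[float], start_key: float, slope: float,
           err: int, k: float) -> Optional[int]:
    """Return the position of k in keys, or None if k is absent."""
    n = len(keys)
    pred_pos = int(round((k - start_key) * slope))  # Formula (2)
    # Formula (3): the true position lies in [pred_pos - err, pred_pos + err],
    # clamped here to the bounds of the array.
    lo = min(max(0, pred_pos - err), n)
    hi = max(lo, min(n, pred_pos + err + 1))
    i = bisect_left(keys, k, lo, hi)
    if i < n and keys[i] == k:
        return i
    return None
```

Because the search window depends only on `err`, the lookup cost inside a group does not grow with the number of keys in the group.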

[0112] In summary, the embodiments disclosed herein can achieve the following technical effects:

[0113] 1. The embodiments of this disclosure divide the data to be indexed into different underlying data groups, namely leaf node data. Each group corresponds to a linear model prediction, namely the first model. Lock-free linked lists are added inside the groups as multi-threaded lock-free structures. Finally, a target index structure based on a preset tree structure is constructed. Indexing is performed through the target index structure, which can perform multi-threaded high-concurrency access, support multi-threaded querying and insertion updates, and support efficient multi-threaded load.

[0114] 2. The embodiments of this disclosure implement a lock-free structure through a learning index, which supports multi-threaded high-concurrency access. Compared with locks, it reduces lock waiting time and improves high-concurrency performance.

[0115] 3. The embodiments of this disclosure pre-allocate memory and store data from high address to low address, which is consistent with the linear memory access method from low to high. This fully utilizes the memory hardware prefetch function of modern CPUs, improves the performance of multi-threaded delta-node linked list read and write, and only requires binary search to traverse the delta-node linked list to locate new data, avoiding traversing the entire key-value array for writing.

[0116] 4. The embodiments of this disclosure support multi-threaded non-blocking through lock-free self-organizing linked lists, efficiently process cached data, and improve the efficiency of hot data access based on LFU & LRU load awareness.

[0117] 5. The embodiments of this disclosure ensure the integrity of the result set when querying using non-unique elements through non-unique indexes.

[0118] Corresponding to the lock-free learning indexing method described above, this invention also proposes a lock-free learning indexing apparatus. Since the apparatus embodiments of this invention correspond to the method embodiments described above, details not disclosed in the apparatus embodiments can be referred to in the method embodiments described above, and will not be repeated here.

[0119] Figure 16 is a schematic diagram of a lock-free learning indexing device provided in an embodiment of this disclosure. As shown in Figure 16, the device includes: a partitioning unit 161, configured to perform grouping space partitioning processing on the data to be indexed to obtain multiple leaf node data; a training unit 162, configured to perform model training processing on each of the leaf node data to obtain a first model corresponding to each of the leaf node data, wherein each leaf node data includes at least multiple elements to be indexed and position information corresponding to each of the elements to be indexed; an allocation unit 163, configured to perform memory pre-allocation processing on each of the first models to obtain a storage space corresponding to each of the first models, and store a corresponding lock-free linked list in the storage space, wherein the lock-free linked list is a single-linked lock-free linked list composed of incremental modifications to the corresponding leaf node data; an optimization unit 164, configured to determine the remaining space corresponding to each of the storage spaces, reorganize and optimize the leaf node data corresponding to the storage space with the remaining space less than a preset space threshold to obtain an initial index structure, and connect the initial index structure to a preset tree structure according to the lock-free linked list to obtain a target index structure; and an indexing unit 165, configured to index the target element through the target index structure to obtain the target position information of the target element in the data to be indexed, wherein the target element is any element among the elements to be indexed.

[0120] The lock-free learning indexing device disclosed herein divides the data to be indexed into groups and spaces to obtain multiple leaf node data. Model training is then performed on each leaf node data to obtain a first model corresponding to each leaf node data. Each leaf node data includes at least multiple elements to be indexed and their respective position information. Memory pre-allocation is performed on each first model to obtain a corresponding storage space, and a corresponding lock-free linked list is stored in the storage space. The lock-free linked list is a single-linked lock-free linked list composed of incremental modifications to the corresponding leaf node data. The remaining space corresponding to each storage space is determined, and the leaf node data corresponding to storage spaces with remaining space less than a preset space threshold are reorganized and optimized to obtain an initial index structure. The initial index structure is then connected to a preset tree structure based on the lock-free linked list to obtain a target index structure. The target element is indexed using the target index structure to obtain the target position information of the target element in the data to be indexed, where the target element is any element among the elements to be indexed. Compared with related technologies, the embodiments of this disclosure divide the data to be indexed into different underlying data groups, namely leaf node data. Each group corresponds to a linear model prediction, namely the first model. Lock-free linked lists are added inside the groups as multi-threaded lock-free structures. Finally, a target index structure based on a preset tree structure is constructed, and indexing is performed through the target index structure. This enables multi-threaded high-concurrency access, supports multi-threaded querying and insertion / update, and supports efficient multi-threaded load.

[0121] For example, in one possible implementation of this disclosure embodiment, as shown in FIG17, the partitioning unit 161 includes: a construction module 1611, configured to perform coordinate point construction processing based on each element to be indexed in the data to be indexed and the position information corresponding to each element to be indexed, to obtain a target data point corresponding to each element to be indexed; a determination module 1612, configured to determine a first upper boundary and a first lower boundary of error for a first data point based on a preset error, and determine an error range based on the first upper boundary and the first lower boundary of error, to obtain a first error region corresponding to the first data point, wherein the first data point is any target data point; the determination module 1612 is further configured to determine a second upper boundary and a second lower boundary of error for a second data point based on the preset error, and determine an error range based on the second upper boundary and the second lower boundary of error, to obtain a second error region corresponding to the second data point, wherein the second data point is the next adjacent target data point of the first data point; a partitioning module 1613, configured to partition the element to be indexed corresponding to the first data point and the element to be indexed corresponding to the second data point into first leaf node data when there is an overlapping area between the first error region and the second error region; the partitioning module 1613 is further configured to partition the element to be indexed corresponding to the first data point into first leaf node data and the element to be indexed corresponding to the second data point into second leaf node data when there is no overlapping area between the first error region and the second error region, and to use the second data point as the starting data point of the second leaf node data; and to repeat the above steps until all the data to be indexed is grouped to obtain the plurality of leaf node data.

[0122] Exemplarily, in one possible implementation of this disclosure embodiment, as shown in FIG17, the training unit 162 includes:

[0123] Prediction module 1621 is configured to perform position prediction processing on the elements to be indexed using a preset linear model to obtain the predicted position information corresponding to each element to be indexed.

[0124] Analysis module 1622 is configured to perform error analysis based on the location information and the predicted location information to obtain the prediction error corresponding to each of the elements to be indexed.

[0125] The optimization module 1623 is configured to perform model optimization processing on the preset linear model based on the prediction error, the element to be indexed, and the position information, so as to obtain the first model corresponding to each leaf node data.

[0126] Exemplarily, in one possible implementation of this disclosure embodiment, as shown in FIG17, the optimization unit 164 includes:

[0127] Processing module 1641 is configured to process the lockless linked list corresponding to the target leaf node data to obtain a first dataset and a second dataset; wherein, the target leaf node data is the leaf node data corresponding to the storage space where the remaining space is less than the preset space threshold;

[0128] The construction module 1642 is configured to construct a lock-free self-organizing linked list and a lock-free B+ tree in the target buffer corresponding to the target leaf node data based on the first dataset; wherein, the first dataset includes the existing data of the lock-free linked list after insertion and / or deletion operations, and the second dataset includes the data deleted from the lock-free linked list after insertion and / or deletion operations;

[0129] The reorganization module 1643 is configured to perform point query reorganization processing on the target leaf node data through the lock-free self-organizing linked list to obtain a first index structure, and perform range query reorganization processing on the first index structure through the lock-free B+ tree to obtain the initial index structure.

[0130] Exemplarily, in one possible implementation of this disclosure embodiment, as shown in FIG17, the recombination module 1643 is further configured as follows:

[0131] Perform window partitioning processing on the lock-free self-organizing linked list to obtain a first window, and perform region partitioning processing on the first window according to a preset ratio to obtain a first region and a second region;

[0132] Store recency data in the first region; wherein the recency data is the most recently accessed data among the target leaf node data;

[0133] The data storage records in the first window are monitored to obtain monitoring results. If it is determined from the monitoring results that there are added records in the data storage records, the frequency data is numerically calculated and processed, and the processed frequency data is stored in the second area; wherein, the frequency data is the access frequency of accessing the target leaf node data;

[0134] The first index structure includes at least the first region and the second region.

[0135] Exemplarily, in one possible implementation of this disclosure embodiment, as shown in FIG17, the recombination module 1643 is further configured as follows:

[0136] Initialize the inserted element to obtain a first insertion range; wherein performing range query reorganization processing on the first index structure includes at least performing insertion sorting on the first index structure, the inserted element is the element in the first index structure that undergoes insertion sorting, and the first offset range is the offset search range corresponding to the inserted element;

[0137] Traverse the lock-free linked list and compare the inserted element with the first element to be compared to obtain the comparison result; wherein the first element to be compared is the first element to be compared that is traversed in the lock-free linked list, and an element to be compared is an element of insertion and / or deletion type in the lock-free linked list;

[0138] If the inserted element is determined to be the same as the first element to be compared based on the comparison result, the first offset range is converged based on the first range to be compared to obtain the second offset range; wherein the first range to be compared is the offset search range corresponding to the first element to be compared.

[0139] If, based on the comparison result, it is determined that the inserted element is greater than the first element to be compared, and the starting value of the first offset range is less than the starting value of the first range to be compared, the starting value of the first offset range is set to the starting value of the first range to be compared, thus obtaining the second offset range.

[0140] If, based on the comparison result, it is determined that the inserted element is smaller than the first element to be compared, and the end value of the first offset range is smaller than the end value of the first range to be compared, the end value of the first offset range is set to the end value of the first range to be compared, thus obtaining the second offset range.

[0141] The process continues until the inserted element is compared with all the elements to be compared, the target offset range corresponding to the inserted element is obtained, and the inserted element is inserted within the target offset range to obtain the initial index structure.
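As a non-limiting illustration, the range adjustment of paragraphs [0136] to [0141] can be sketched in Python as follows. The sketch is single-threaded and applies the three comparison cases exactly as worded in paragraphs [0138] to [0140]. The names `DeltaRecord` and `target_offset_range` are hypothetical, and treating "converged" in the equal case as the intersection of the two ranges is an assumption, since the text does not define it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaRecord:
    """Hypothetical element to be compared, with its offset search range."""
    key: int
    lo: int  # starting value of the range to be compared
    hi: int  # end value of the range to be compared


def target_offset_range(key, first_range, records):
    """Adjust the offset range of `key` against every record in the list."""
    lo, hi = first_range
    for rec in records:  # traverse the (here: plain Python) linked list
        if key == rec.key:
            # Equal case: assumed to mean intersecting the two ranges.
            lo, hi = max(lo, rec.lo), min(hi, rec.hi)
        elif key > rec.key and lo < rec.lo:
            # Greater case: take the starting value of the compared range.
            lo = rec.lo
        elif key < rec.key and hi < rec.hi:
            # Smaller case, as worded: take the end value of the compared range.
            hi = rec.hi
    return lo, hi


records = [DeltaRecord(10, 2, 6), DeltaRecord(30, 8, 12), DeltaRecord(20, 4, 9)]
result = target_offset_range(20, (0, 5), records)
```

With these sample records the range moves from (0, 5) to (2, 5), then to (2, 12), and finally to (4, 9), which is the target offset range within which the insertion element would be placed.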

[0142] Exemplarily, in one possible implementation of the embodiments of this disclosure, as shown in FIG17, the indexing unit 165 includes:

[0143] Search module 1651 is configured to perform search processing on the target element in the target index structure to determine the leaf node data to be indexed; wherein, the leaf node data to be indexed is the leaf node data containing the target element in the target index structure;

[0144] Traversal module 1652 is configured to traverse the lock-free linked list to be indexed to determine the element type of the target element; wherein, the lock-free linked list to be indexed is the lock-free linked list contained in the leaf node data to be indexed;

[0145] Analysis module 1653 is configured to perform data analysis processing on the lock-free linked list to be indexed when the target element is determined to be a non-unique element based on the element type, to obtain a third dataset and a fourth dataset; wherein, the third dataset includes the data that remains in the lock-free linked list to be indexed after insertion and/or deletion operations, and the fourth dataset includes the data deleted from the lock-free linked list to be indexed after insertion and/or deletion operations;

[0146] The analysis module 1653 is further configured to perform data analysis processing on the target element based on the third dataset and the fourth dataset to obtain the unique key value corresponding to the target element.

[0147] The prediction module 1654 is configured to input the unique key value into the first model to be indexed for location prediction processing to obtain the target location information, wherein the first model to be indexed is the first model corresponding to the leaf node data to be indexed.

[0148] Exemplarily, in one possible implementation of the embodiments of this disclosure, as shown in FIG17, the indexing unit 165 is further configured to, when the target element is determined to be a unique element according to the element type, input the target element into the first model to be indexed for position prediction processing to obtain the target position information.
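As a non-limiting illustration, the lookup flow of paragraphs [0143] to [0148] can be sketched in Python as follows. The sketch is single-threaded and rests on several assumptions that are not stated in this disclosure: the leaf is located by binary search over the first key of each leaf, the first model is a linear function with hypothetical fields `slope` and `intercept`, an element is treated as "non-unique" when the leaf's linked list holds at least one record for it, and a key whose latest record is a deletion yields no position. The names `Leaf`, `replay` and `lookup` are likewise hypothetical.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Leaf:
    first_key: int
    slope: float       # assumed linear first model: pos = slope * key + intercept
    intercept: float
    deltas: list = field(default_factory=list)  # [(op, key)], oldest first

    def predict(self, key):
        return round(self.slope * key + self.intercept)


def replay(deltas):
    """Split the delta records into existing data and deleted data."""
    existing, deleted = set(), set()
    for op, key in deltas:
        if op == "insert":
            existing.add(key)
            deleted.discard(key)
        else:  # "delete"
            existing.discard(key)
            deleted.add(key)
    return existing, deleted


def lookup(leaves, key):
    # Search step: find the leaf whose key range contains `key`.
    starts = [leaf.first_key for leaf in leaves]
    leaf = leaves[max(bisect.bisect_right(starts, key) - 1, 0)]
    # Traversal step: a key with no delta record is treated as unique.
    if all(k != key for _, k in leaf.deltas):
        return leaf.predict(key)
    # Analysis step: resolve the key against existing and deleted data.
    existing, deleted = replay(leaf.deltas)
    if key in existing and key not in deleted:
        return leaf.predict(key)
    return None


leaves = [
    Leaf(0, 1.0, 0.0),
    Leaf(100, 0.5, 50.0, [("insert", 120), ("delete", 120), ("insert", 130)]),
]
```

Here `lookup(leaves, 7)` takes the unique-element path and goes straight to the model, while `lookup(leaves, 120)` and `lookup(leaves, 130)` first replay the delta records of the second leaf.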

[0149] It should be noted that the foregoing explanations of the method embodiments also apply to the apparatus of the embodiments of this disclosure; the implementation principles are similar, and the embodiments of this disclosure are not limited in this respect. According to embodiments of this disclosure, this disclosure also provides an electronic device, a readable storage medium, and a computer program product. FIG18 shows a schematic block diagram of an example electronic device 1800 that can be used to implement embodiments of this disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementation of the disclosure described and/or claimed herein.

[0150] As shown in FIG18, device 1800 includes a computing unit 1801, which can perform various appropriate actions and processes according to a computer program stored in ROM (Read-Only Memory) 1802 or loaded from storage unit 1808 into RAM (Random Access Memory) 1803. RAM 1803 can also store various programs and data required for the operation of device 1800. The computing unit 1801, ROM 1802, and RAM 1803 are interconnected via bus 1804. An I/O (Input/Output) interface 1805 is also connected to bus 1804.

[0151] Multiple components in device 1800 are connected to I/O interface 1805, including: input unit 1806, such as a keyboard or mouse; output unit 1807, such as various types of monitors and speakers; storage unit 1808, such as a magnetic disk or optical disk; and communication unit 1809, such as a network card, modem, or wireless transceiver. Communication unit 1809 allows device 1800 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.

[0152] The computing unit 1801 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1801 include, but are not limited to, CPUs, GPUs (Graphics Processing Units), various special-purpose AI (Artificial Intelligence) computing chips, various computing units running machine learning model algorithms, DSPs (Digital Signal Processors), and any suitable processor, controller, microcontroller, etc. The computing unit 1801 performs the various methods and processes described above, such as the lock-free learning indexing method. For example, in some embodiments, the lock-free learning indexing method can be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 1808. In some embodiments, part or all of the computer program can be loaded and / or installed on device 1800 via ROM 1802 and / or communication unit 1809. When the computer program is loaded into RAM 1803 and executed by the computing unit 1801, one or more steps of the methods described above can be performed. Alternatively, in other embodiments, the computing unit 1801 may be configured to perform the aforementioned lock-free learning indexing method by any other suitable means (e.g., by means of firmware).

[0153] Various implementations of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, FPGAs (Field Programmable Gate Arrays), ASICs (Application-Specific Integrated Circuits), ASSPs (Application-Specific Standard Products), SOCs (System-on-Chips), CPLDs (Complex Programmable Logic Devices), computer hardware, firmware, software, and / or combinations thereof. These various implementations may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0154] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0155] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, RAM, ROM, EPROM (Erasable Programmable Read-Only Memory) or flash memory, optical fiber, CD-ROM (Compact Disc Read-Only Memory), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0156] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (Cathode-Ray Tube) or LCD (Liquid Crystal Display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0157] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or middleware components (e.g., application servers), or frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include LANs (Local Area Networks), WANs (Wide Area Networks), the Internet, and blockchain networks.

[0158] Computer systems can include clients and servers. Clients and servers are generally remote from each other and typically interact via communication networks. The client-server relationship is created by computer programs running on the respective computers and having a client-server relationship with each other. A server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product within a cloud computing service system that addresses the shortcomings of traditional physical hosts and VPS (Virtual Private Server) services, such as high management difficulty and weak business scalability. A server can also be a server in a distributed system or a server integrated with a blockchain. It should be noted that artificial intelligence (AI) is the study of enabling computers to simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and encompasses both hardware and software technologies. AI hardware technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, and big data processing; AI software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.

[0159] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution of this disclosure can be achieved, and this is not limited herein.

[0160] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.

Claims

1. A lock-free learning indexing method, comprising: grouping and space-partitioning data to be indexed to obtain a plurality of pieces of leaf node data, and performing model training on each piece of leaf node data to obtain a first model corresponding to each piece of leaf node data, wherein each piece of leaf node data includes at least a plurality of elements to be indexed and position information corresponding to each element to be indexed; performing memory pre-allocation on each first model to obtain a storage space corresponding to each first model, and storing a corresponding lock-free linked list in the storage space, wherein the lock-free linked list is a singly linked lock-free list composed of incremental modifications to the corresponding leaf node data; determining the remaining space corresponding to each storage space, reorganizing and optimizing the leaf node data corresponding to any storage space whose remaining space is less than a preset space threshold to obtain an initial index structure, and connecting the initial index structure to a preset tree structure based on the lock-free linked list to obtain a target index structure; and indexing a target element by means of the target index structure to obtain target position information of the target element in the data to be indexed, wherein the target element is any one of the elements to be indexed.

2. The method according to claim 1, wherein grouping and space-partitioning the data to be indexed to obtain the plurality of pieces of leaf node data comprises: constructing coordinate points based on each element to be indexed in the data to be indexed and its corresponding position information, to obtain a target data point corresponding to each element to be indexed; determining a first upper error boundary and a first lower error boundary of a first data point according to a preset error, and determining an error range according to the first upper error boundary and the first lower error boundary to obtain a first error region corresponding to the first data point, wherein the first data point is any one of the target data points; determining a second upper error boundary and a second lower error boundary of a second data point according to the preset error, and determining an error range according to the second upper error boundary and the second lower error boundary to obtain a second error region corresponding to the second data point, wherein the second data point is the target data point immediately following the first data point; when the first error region and the second error region overlap, assigning the element to be indexed corresponding to the first data point and the element to be indexed corresponding to the second data point to first leaf node data; when the first error region and the second error region do not overlap, assigning the element to be indexed corresponding to the first data point to the first leaf node data, assigning the element to be indexed corresponding to the second data point to second leaf node data, and using the second data point as the starting data point of the second leaf node data; and repeating the above steps until all of the data to be indexed has been grouped, to obtain the plurality of pieces of leaf node data.

3. The method according to claim 1, wherein performing model training on each piece of leaf node data to obtain the first model corresponding to each piece of leaf node data comprises: performing position prediction on the elements to be indexed by means of a preset linear model to obtain predicted position information corresponding to each element to be indexed; performing error analysis based on the position information and the predicted position information to obtain a prediction error corresponding to each element to be indexed; and optimizing the preset linear model based on the prediction error, the elements to be indexed, and the position information to obtain the first model corresponding to each piece of leaf node data.

4. The method according to claim 1, wherein reorganizing and optimizing the leaf node data corresponding to the storage space whose remaining space is less than the preset space threshold to obtain the initial index structure comprises: performing data analysis on the lock-free linked list corresponding to target leaf node data to obtain a first dataset and a second dataset, wherein the target leaf node data is the leaf node data corresponding to the storage space whose remaining space is less than the preset space threshold; constructing, based on the first dataset, a lock-free self-organizing linked list and a lock-free B+ tree in a target buffer corresponding to the target leaf node data, wherein the first dataset includes the data that remains in the lock-free linked list after insertion and/or deletion operations, and the second dataset includes the data deleted from the lock-free linked list after insertion and/or deletion operations; performing point query reorganization on the target leaf node data by means of the lock-free self-organizing linked list to obtain a first index structure; and performing range query reorganization on the first index structure by means of the lock-free B+ tree to obtain the initial index structure.

5. The method according to claim 4, wherein performing point query reorganization on the target leaf node data by means of the lock-free self-organizing linked list to obtain the first index structure comprises: performing window partitioning in the lock-free self-organizing linked list to obtain a first window, and dividing the first window by a preset ratio into a first region and a second region; storing near-term data in the first region, wherein the near-term data is the most recently accessed data among the accesses to the target leaf node data; and monitoring the data storage records in the first window to obtain a monitoring result, and, if the monitoring result indicates that records have been added to the data storage records, performing numerical calculation on frequency data and storing the processed frequency data in the second region, wherein the frequency data is the frequency of accesses to the target leaf node data; wherein the first index structure includes at least the first region and the second region.

6. The method according to claim 4, wherein performing range query reorganization on the first index structure by means of the lock-free B+ tree to obtain the initial index structure comprises: initializing an insertion element to obtain a first offset range, wherein the range query reorganization of the first index structure includes at least insertion sorting of the first index structure, the insertion element is the element of the first index structure on which insertion sorting is performed, and the first offset range is the offset search range corresponding to the insertion element; traversing the lock-free linked list and comparing the insertion element with a first element to be compared to obtain a comparison result, wherein the first element to be compared is the first element to be compared that is reached while traversing the lock-free linked list, and an element to be compared is an element of insertion and/or deletion type in the lock-free linked list; if it is determined from the comparison result that the insertion element is the same as the first element to be compared, converging the first offset range based on a first range to be compared to obtain a second offset range, wherein the first range to be compared is the offset search range corresponding to the first element to be compared; if it is determined from the comparison result that the insertion element is greater than the first element to be compared, and the starting value of the first offset range is less than the starting value of the first range to be compared, setting the starting value of the first offset range to the starting value of the first range to be compared to obtain the second offset range; if it is determined from the comparison result that the insertion element is smaller than the first element to be compared, and the end value of the first offset range is smaller than the end value of the first range to be compared, setting the end value of the first offset range to the end value of the first range to be compared to obtain the second offset range; and repeating the above process until the insertion element has been compared with all of the elements to be compared, to obtain a target offset range corresponding to the insertion element, and inserting the insertion element within the target offset range to obtain the initial index structure.

7. The method according to claim 1, wherein indexing the target element by means of the target index structure to obtain the target position information of the target element in the data to be indexed comprises: performing search processing on the target element in the target index structure to determine leaf node data to be indexed, wherein the leaf node data to be indexed is the leaf node data containing the target element in the target index structure; traversing a lock-free linked list to be indexed to determine the element type of the target element, wherein the lock-free linked list to be indexed is the lock-free linked list contained in the leaf node data to be indexed; if the target element is determined to be a non-unique element based on the element type, performing data analysis processing on the lock-free linked list to be indexed to obtain a third dataset and a fourth dataset, wherein the third dataset includes the data that remains in the lock-free linked list to be indexed after insertion and/or deletion operations, and the fourth dataset includes the data deleted from the lock-free linked list to be indexed after insertion and/or deletion operations; performing data analysis processing on the target element based on the third dataset and the fourth dataset to obtain a unique key value corresponding to the target element; and inputting the unique key value into a first model to be indexed for position prediction processing to obtain the target position information, wherein the first model to be indexed is the first model corresponding to the leaf node data to be indexed.

8. The method according to claim 7, wherein, after traversing the lock-free linked list to be indexed to determine the element type of the target element, the method further comprises: if the target element is determined to be a unique element based on the element type, inputting the target element into the first model to be indexed for position prediction processing to obtain the target position information.

9. A lock-free learning indexing device, comprising: a partitioning unit configured to group and space-partition data to be indexed to obtain a plurality of pieces of leaf node data; a training unit configured to perform model training processing based on each piece of leaf node data to obtain a first model corresponding to each piece of leaf node data, wherein each piece of leaf node data includes at least a plurality of elements to be indexed and position information corresponding to each element to be indexed; an allocation unit configured to perform memory pre-allocation processing on each first model to obtain a storage space corresponding to each first model, and to store a corresponding lock-free linked list in the storage space, wherein the lock-free linked list is a singly linked lock-free list composed of incremental modifications to the corresponding leaf node data; an optimization unit configured to determine the remaining space corresponding to each storage space, reorganize and optimize the leaf node data corresponding to any storage space whose remaining space is less than a preset space threshold to obtain an initial index structure, and connect the initial index structure to a preset tree structure according to the lock-free linked list to obtain a target index structure; and an indexing unit configured to index a target element by means of the target index structure to obtain target position information of the target element in the data to be indexed, wherein the target element is any one of the elements to be indexed.

10. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

11. A computer-readable storage medium storing computer instructions configured to cause a computer to perform the method of any one of claims 1-8.

12. A computer program product comprising a computer program that, when executed by a processor, implements the method of any one of claims 1-8.