Indexing method and device based on a key-value pair (KV) system, electronic device and medium
A key-value pair and indexing technology in the computer field that addresses problems such as low indexing efficiency, improving reading speed and reading efficiency while saving memory resources.
Pending Publication Date: 2020-06-05
BEIJING BAIDU NETCOM SCI & TECH CO LTD
7 Cites 5 Cited by
AI-Extracted Technical Summary
Problems solved by technology
[0004] The embodiment of the present application provides an indexing method, device, electronic device and medium based on a key-value pair K...
Method used
The technical solution provided by the embodiments of the present application uses the obtained key value as the sequence number of a fixed-length type index, which reduces system memory usage. It determines the address information of the storage location of the value from the sequence number and then reads the value, which improves the speed and efficiency of reading values.
The technical solution provided by the embodiments of the present application establishes a two-level index and determines a primary index sequence number and a secondary index sequence number; from these, it determines in the secondary index area the address information of the value in the key-value pair to be queried. This avoids the excessive memory usage that arises when only a primary index is used to read the value in the key-value pair to be queried and the data volume is very large. The technical solution of this embodiment is especially applicable to KV systems with non-fixed-length storage space.
[0097] By using the KV system provided by the embodiment of the present application to store the point adjacency table of the graph database, the problem of low indexing efficiency caused by excessive data volume in the graph database can be avoided. Certainly, the technical solutions of the embodiments of the present a...
Abstract
The invention discloses an indexing method and device based on a key-value pair (KV) system, an electronic device and a medium, and relates to the technical field of indexing. According to a specific implementation, the method comprises: acquiring the key value in a key-value pair to be queried, where the key values of the key-value pairs in the KV system are increasing integers; using the key value as the sequence number of a fixed-length type index, and determining from the sequence number the address information of the storage location of the value in the key-value pair to be queried; and reading that value from the storage space according to the address information. Because the acquired key value serves as the sequence number of the fixed-length type index, system memory usage is reduced; and because the address information of the value's storage location is determined from the sequence number before the value is read, the speed and efficiency of reading values are improved.
Application Domain
Special data processing applications; Database indexing
Technology Topic
Numeric Value; Engineering (+5)
Examples
Example Embodiment
[0064] Example one
[0065] Figure 1A is a schematic flowchart of an indexing method based on a key-value pair (KV) system provided in Embodiment 1 of the present application. This embodiment is applicable to querying the value corresponding to a key value in a KV system. It can be executed by the KV system-based indexing device provided in the embodiments of the present application, and the device can be implemented in software and/or hardware. As shown in Figure 1A, the method may include:
[0066] S101. Obtain a key value in a key-value pair to be queried, wherein the key value of each key-value pair in the KV system is an increasing integer number.
[0067] A key-value pair consists of a key value (Key) and a value, and represents the correspondence between the value and the key value. One key value can correspond to one value or to multiple values. For example, if the key value is "age", the corresponding values may include "20 years old", "30 years old" or "40 years old"; as another example, if the key value is "education", the corresponding values may include "junior college", "undergraduate" or "graduate".
[0068] Specifically, the key value of each key-value pair in the KV system of this embodiment is an increasing integer number, where an integer number is numeric data without a fractional part. For example, the key values in this embodiment may be expressed as "1", "2", "3", ..., "N", or alternatively as "001", "002", "003", ..., "NNN".
[0069] Obtaining the key value in the key-value pair to be queried provides the basis for subsequently obtaining, from that key value, the address information of the storage location of the value. The key-value pairs to be queried can be determined from various index requests and query requests, and are generally index requests determined by the physical layer of the KV system. For example, when the values of the 1st to 100th key-value pairs need to be queried, 100 index requests can be determined, each of which identifies one key-value pair to be queried.
[0070] S102. Use the key value as the serial number of the fixed-length type index, and determine the address information of the storage location where the value in the key-value pair to be queried is located according to the serial number.
[0071] Specifically, the key value itself serves as an element of the index, namely the sequence number of the fixed-length type index. The fixed-length type index can be implemented either by an index area or by the ordering of a fixed-length storage space; each case is described below. Based on the index sequence number, the address of the storage location of the value can be looked up or calculated and used as the address information.
[0072] For example, when the fixed-length type index is implemented by an index area, each element in the index area occupies a fixed byte length, for example 4 bytes or 8 bytes per element space. Each element space in the index area stores the address information of the storage location of at least one value in the key-value pair to be queried, and the address information may be the offset address of that storage location.
[0073] Specifically, the key value is used as the sequence number of the fixed-length type index: for example, key value "1" is the sequence number of the first element space in the index area, and key value "2" is the sequence number of the second element space. The sequence number locates the element space in the index area directly, and from it the address information of the storage location of the value in the key-value pair to be queried is determined.
[0074] Optionally, if the value of the KV system is stored in a non-fixed length type storage space, S102 includes:
[0075] Use the key value as the sequence number of the fixed-length type index; in the fixed-length index area, read, from the element space corresponding to the sequence number, the offset address of the storage location of the value in the key-value pair to be queried, as the address information.
[0076] Specifically, as shown in Figure 1B, each element space in the index area stores the offset address (offset) of the storage location of at least one value in the key-value pair to be queried. According to the obtained sequence number of the fixed-length type index, the element space corresponding to that sequence number is determined in the fixed-length index area, and the element space is then accessed to read the offset address of the storage location of the value as the address information. The value itself can be stored in a data file.
[0077] Exemplarily, if the key value of the key-value pair to be queried is "1", the key value "1" is used as sequence number 1 of the fixed-length type index. In the fixed-length index area, the element space corresponding to sequence number 1 is determined to be the first element space of the index area; this element space is then accessed, and the offset address it stores for the value in the key-value pair to be queried is read as the address information.
[0078] Exemplarily, if the key value of the key-value pair to be queried is "2", the key value "2" is used as sequence number 2 of the fixed-length type index. In the fixed-length index area, the element space corresponding to sequence number 2 is determined to be the second element space of the index area; this element space is then accessed, and the offset address it stores for the value in the key-value pair to be queried is read as the address information.
[0079] When the values of the KV system are stored in non-fixed-length type storage space, the key value is used as the sequence number of the fixed-length type index, the element space is located in the fixed-length index area by that sequence number, and the address information of the value is read from it. The KV system therefore does not need to store key values in memory, which saves memory space, and it does not need to search for key values in memory, which shortens indexing time and improves indexing efficiency.
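The lookup in [0075] to [0079] can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes the index area is a byte buffer of 8-byte little-endian offsets and that key values start at 1, so key k selects element k - 1. The name `lookup_offset` is hypothetical.

```python
import struct

ELEMENT_SIZE = 8  # assumed width of one element space (an 8-byte offset)


def lookup_offset(index_area: bytes, key: int) -> int:
    """Use the key itself as the sequence number into a fixed-length index area.

    Element k - 1 holds the offset of the value for key k (keys assumed to
    start at 1). The key is never stored or searched for; it only selects
    the slot, which is why no key set has to be kept in memory.
    """
    count = len(index_area) // ELEMENT_SIZE
    if not 1 <= key <= count:
        raise KeyError(key)
    (offset,) = struct.unpack_from("<Q", index_area, (key - 1) * ELEMENT_SIZE)
    return offset
```

For an index area built as `struct.pack("<3Q", 0, 37, 120)`, key 2 resolves to offset 37 with a single slice, with no comparison against stored keys.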
[0080] In another indexing method, the address information can instead be calculated from the sequence number, as follows:
[0081] Optionally, if the value of the KV system is stored in a fixed-length type storage space, S102 includes:
[0082] Use the key value as the sequence number of the fixed-length type index; according to the sequence number and the length of the storage units in the fixed-length type storage space, calculate the offset address of the storage unit where the value in the key-value pair to be queried is located, as the address information.
[0083] Specifically, as shown in Figure 1C, when the values of the KV system are stored in fixed-length type storage space (i.e., POD, plain old data, type storage space), all storage units of this type of storage space have the same fixed length. The fixed-length storage space includes several storage units for storing values, and the values are ordered in the storage units in the same order as the key values. The key value is used as the sequence number, the storage unit holding the value corresponding to the key value is determined from the sequence number, and id*sizeof(value) is then calculated to obtain the offset address of the target storage unit as the address information. Here id is the sequence number and sizeof(value) is the byte length of a storage unit, such as 4 bytes or 8 bytes.
[0084] Exemplarily, assuming the key value is "100", the key value "100" is used as the sequence number, and in fixed-length storage space with 8-byte storage units, 100*8 = 800 bytes is the offset address of the storage unit holding the value in the key-value pair to be queried, that is, the address information.
[0085] When the values of the KV system are stored in fixed-length storage space, the key value is used as the index sequence number, and the offset address of the storage unit holding the value is calculated from the sequence number and the storage unit length as the address information. For fixed-length values the KV system thus needs no index area to obtain the address information, which further reduces memory usage; and because the address no longer has to be read from an element space of an index area, the indexing speed and efficiency of the KV system improve accordingly.
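For POD values, the address is pure arithmetic, as in [0083] and [0084]. The sketch below follows the text's formula id*sizeof(value) literally (so key 0 maps to the first unit) and assumes, for illustration only, that each storage unit holds one unsigned 64-bit integer; `unit_offset` and `read_pod` are hypothetical names.

```python
import struct

VALUE_SIZE = 8  # assumed byte length of each storage unit, e.g. one uint64


def unit_offset(key: int, value_size: int = VALUE_SIZE) -> int:
    """Offset of the storage unit for `key`: id * sizeof(value)."""
    if key < 0:
        raise ValueError("key values are non-negative integers")
    return key * value_size


def read_pod(storage: bytes, key: int) -> int:
    """Read the value straight from its computed offset; no index area is consulted."""
    (value,) = struct.unpack_from("<Q", storage, unit_offset(key))
    return value
```

`unit_offset(100)` gives 800, matching the 100*8-byte example above. The only per-key cost is one multiplication, which is where the memory saving over an index area comes from.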
[0086] On the basis of the foregoing embodiment, before reading the value in the key-value pair to be queried from the storage space according to the address information, the method further includes:
[0087] According to the address information or the sequence number, check in the storage unit bitmap whether the corresponding storage unit holds a value; if it does, continue with the value reading operation.
[0088] Specifically, as shown in Figure 1C, the KV system in this embodiment maintains a storage unit bitmap associated with the storage units. When a value is stored in a storage unit, the corresponding bit may be set to "1"; when no value is stored, it may be set to "0".
[0089] The value reading operation continues only when the storage unit bitmap, consulted by address information or sequence number, shows that the corresponding storage unit holds a value. This avoids reading empty values and improves the efficiency of reading values in the KV system.
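A storage unit bitmap of the kind described in [0087] to [0089] can be sketched with one bit per storage unit, indexed by sequence number. The byte-packed layout (bit `seq % 8` of byte `seq // 8`) and the names `UnitBitmap` and `read_if_present` are assumptions for illustration.

```python
class UnitBitmap:
    """One bit per storage unit: 1 = a value is stored, 0 = empty (sketch)."""

    def __init__(self, num_units: int) -> None:
        self.bits = bytearray((num_units + 7) // 8)

    def set(self, seq: int) -> None:
        self.bits[seq // 8] |= 1 << (seq % 8)

    def clear(self, seq: int) -> None:
        self.bits[seq // 8] &= ~(1 << (seq % 8)) & 0xFF

    def is_set(self, seq: int) -> bool:
        return bool((self.bits[seq // 8] >> (seq % 8)) & 1)


def read_if_present(bitmap: UnitBitmap, seq: int, read_value):
    """Consult the bitmap first and skip the storage read entirely for empty units."""
    if not bitmap.is_set(seq):
        return None
    return read_value(seq)
```

Because the check touches a single in-memory bit, an empty unit costs no disk access at all.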
[0090] S103. Read the value in the key-value pair to be queried from the storage space according to the address information.
[0091] The storage space includes, but is not limited to, disk space. The value in the key-value pair to be queried is stored in a file block in the storage space. Besides the value, the file block can also store information such as the size, the update timestamp (timestamp) and control flag information (control flag), as shown in Figure 1B.
[0092] Specifically, according to the address information obtained in S102, the corresponding file block is accessed from the storage space, and the value in the key-value pair to be queried is read from the file block.
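Reading a value from a file block at a given offset (S103) can be sketched as follows. The source lists size, update timestamp and control flag as block contents but gives no field widths or order, so the header layout below (u32 size, u64 timestamp, u8 flag, then the value bytes) is entirely hypothetical, as are `write_block` and `read_block`.

```python
import struct
import time

# Hypothetical block header: size (u32), update timestamp (u64), control flag (u8).
HEADER = struct.Struct("<IQB")


def write_block(buf: bytearray, value: bytes, flag: int = 0) -> int:
    """Append a file block to `buf` and return its offset address."""
    offset = len(buf)
    buf += HEADER.pack(len(value), int(time.time()), flag)
    buf += value
    return offset


def read_block(buf: bytes, offset: int):
    """Parse the header at `offset`, then slice out exactly `size` value bytes."""
    size, timestamp, flag = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    return bytes(buf[start:start + size]), timestamp, flag
```

Storing the size in the header is what lets a single offset from the index area be enough to recover a variable-length value.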
[0093] Reading the value in the key-value pair to be queried from the storage space according to the address information means that the value is ultimately retrieved from storage using only the key value of the key-value pair to be queried.
[0094] The technical solution provided by the embodiments of the present application uses the obtained key value as the sequence number of the fixed-length type index, which reduces system memory usage; by determining the address information of the storage location of the value from the sequence number and then reading the value, it improves the speed and efficiency of reading values.
[0095] On the basis of the above embodiment, the KV system can be used to store the point adjacency table of a graph database, where the key value of each key-value pair is the point identifier of a point in the graph and the value is the point identifiers of the points adjacent to that point in the graph.
[0096] Specifically, the point adjacency table of a graph database reflects the relationships between points in the graph database. A graph is composed of points, and the adjacency relationship between two points is an edge. To record adjacency relationships, the graph database stores adjacency tables, as shown in Figure 1D. The point adjacency table records the adjacent points of each point through key-value pairs: the key value is the point identifier, and the value records the point identifiers of that point's adjacent points. The point identifiers used as key values are sequentially increasing integers. Exemplarily, as shown in Figure 1D, the point identifier "1" is used as the key value and the identifiers of its adjacent points are "100", "105", "107" and "110"; in the key-value pair, the key value is therefore "1" and the values are "100", "105", "107" and "110".
[0097] Using the KV system provided by the embodiments of the present application to store the point adjacency table of a graph database avoids the low indexing efficiency caused by excessive data volume in the graph database. The technical solutions of the embodiments can of course also be applied to other KV systems whose key values are increasing integers. For example, in an employee record table, each employee's number is usually an increasing integer, and the employee's details are recorded as the value.
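Turning a graph into the key-value form of [0095] and [0096] can be sketched as below. It assumes point identifiers are the consecutive integers 1..N (the increasing-integer condition the KV system requires) and treats each edge pair as directed from source to target; `build_adjacency_table` is a hypothetical helper, not something the source names.

```python
from collections import defaultdict


def build_adjacency_table(num_points: int, edges):
    """Turn an edge list into KV pairs: key = point id, value = adjacent point ids.

    Point ids are assumed to be 1..num_points, so every key is an increasing
    integer and can be used directly as an index sequence number.
    """
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return [(pid, sorted(adj[pid])) for pid in range(1, num_points + 1)]
```

With the Figure 1D example, `build_adjacency_table(110, [(1, 105), (1, 100), (1, 110), (1, 107)])[0]` is `(1, [100, 105, 107, 110])`. Points without neighbours still get a key, which keeps the key sequence dense.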
[0098] On the basis of the foregoing embodiment, it further includes:
[0099] When the values of the KV system are stored in non-fixed-length storage space and a data write request is received, the data is stored as a value at the end of the KV system and assigned the next integer in sequence as its key value; the offset address of the storage location of the value is then added to the element space of the index area corresponding to the sequence number of that key value.
[0100] Specifically, when a data write request is received, the data is stored as a value at the end of the file block, the next integer is assigned in sequence at the end of the index area as the key value of the newly added value, and finally the offset address of the newly added value is written into the element space of the index area corresponding to the sequence number of that key value.
[0101] Storing new data as a value at the end of the KV system, assigning it an integer key value and writing its offset address into the element space for that key implements the writing of new data without disrupting the order of existing key-value pairs, which avoids index errors. Because data is always written at the end, key-value pairs never need to be reordered, which also increases write speed.
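The append-only write path of [0099] to [0101] can be sketched as a small class. This is an illustration under stated assumptions: `bytearray` objects stand in for the data file and the index area, values are length-prefixed with a u32 (a layout the source does not specify), keys start at 1, and `AppendOnlyKV` is a hypothetical name.

```python
import struct


class AppendOnlyKV:
    """Non-fixed-length KV store: values appended to a data file, offsets to an index area."""

    def __init__(self) -> None:
        self.data = bytearray()   # stand-in for the file block holding values
        self.index = bytearray()  # fixed-length index area, one 8-byte offset per key

    def put(self, value: bytes) -> int:
        """Write `value` at the end and assign it the next integer key."""
        offset = len(self.data)
        self.data += struct.pack("<I", len(value)) + value  # length-prefixed value
        self.index += struct.pack("<Q", offset)             # element space for the new key
        return len(self.index) // 8                         # keys start at 1

    def get(self, key: int) -> bytes:
        (offset,) = struct.unpack_from("<Q", self.index, (key - 1) * 8)
        (size,) = struct.unpack_from("<I", self.data, offset)
        return bytes(self.data[offset + 4:offset + 4 + size])
```

Both buffers only ever grow at the end, so no existing element space or value moves when new data arrives.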
[0102] On the basis of the foregoing embodiment, it further includes:
[0103] When the values of the KV system are stored in fixed-length storage space and a data write request is received, an integer is assigned to the data as its key value, and the data is stored as the value in the storage unit of the fixed-length storage space that corresponds to that key value.
[0104] Specifically, when the values of the KV system are of POD type, the offset address of each value is determined in advance from its key value, so the storage space for the values can be allocated in advance. When a new value is written, it is stored directly in the storage space corresponding to its key value. For example, if the key value of the written value is "100", the new value is stored in the 100th storage unit of the fixed-length storage space.
[0105] Assigning an integer key value to the data on a write request and storing the data in the storage unit corresponding to that key value stores values according to the preset relationship between key values and storage space.
[0106] On the basis of the above embodiment, the KV system in this embodiment also supports deleting and modifying key-value pairs. For example, to delete a key-value pair, the relationship between the key value in the logical layer of the KV system and the key-value pair to be deleted in the physical layer is removed, so that the logical layer can no longer reach that key-value pair in the physical layer; the KV system then releases disk space periodically to delete the key-value pair from the physical layer completely. To modify a key-value pair, the old key-value pair can first be deleted and the new key-value pair then written.
[0107] Deleting and modifying key-value pairs keeps the KV system up to date, which ensures the accuracy and reliability of the index.
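The POD write path of [0103] to [0105] needs no index update at all: space is allocated up front and the key alone fixes where the value goes. The sketch assumes, for illustration, that each value is one signed 64-bit integer and that storage is a pre-sized `bytearray`; `FixedLengthStore` is a hypothetical name.

```python
import struct

VALUE = struct.Struct("<q")  # assumed POD value: one signed 64-bit integer


class FixedLengthStore:
    """POD values written straight into the storage unit selected by the key (sketch)."""

    def __init__(self, capacity: int) -> None:
        # Allocated in advance, so every key already has a known offset.
        self.space = bytearray(capacity * VALUE.size)

    def put(self, key: int, value: int) -> None:
        VALUE.pack_into(self.space, key * VALUE.size, value)  # offset = id * sizeof(value)

    def get(self, key: int) -> int:
        return VALUE.unpack_from(self.space, key * VALUE.size)[0]
```

Writing key 100 touches only bytes 800 to 807; unwritten units read back as zero here, which is one reason a presence bitmap is useful alongside this layout.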
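The delete and modify behaviour of [0106] can be sketched on top of an append-only layout: deletion only removes the logical-layer mapping, and a periodic `release` rebuilds the physical layer from live values. Two choices here are assumptions rather than the source's: a modified pair keeps its original key, and values are u32 length-prefixed. `MutableKV` is a hypothetical name.

```python
import struct


class MutableKV:
    """Logical delete, periodic release and modify on an append-only layout (sketch)."""

    def __init__(self) -> None:
        self.data = bytearray()  # physical layer: length-prefixed values
        self.offsets = []        # logical layer: key k -> offsets[k - 1], None once deleted

    def _append(self, value: bytes) -> int:
        offset = len(self.data)
        self.data += struct.pack("<I", len(value)) + value
        return offset

    def put(self, value: bytes) -> int:
        self.offsets.append(self._append(value))
        return len(self.offsets)

    def get(self, key: int):
        offset = self.offsets[key - 1]
        if offset is None:
            return None  # deleted: the logical layer no longer reaches the bytes
        (size,) = struct.unpack_from("<I", self.data, offset)
        return bytes(self.data[offset + 4:offset + 4 + size])

    def delete(self, key: int) -> None:
        self.offsets[key - 1] = None  # bytes stay in self.data until release()

    def modify(self, key: int, value: bytes) -> None:
        self.delete(key)                             # delete the old pair first,
        self.offsets[key - 1] = self._append(value)  # then write the new one

    def release(self) -> None:
        """Periodic cleanup: rebuild the physical layer from live values only."""
        live = [self.get(k) for k in range(1, len(self.offsets) + 1)]
        self.data = bytearray()
        self.offsets = [None if v is None else self._append(v) for v in live]
```

Deferring the physical removal to `release` keeps deletes as cheap as a single slot update, at the cost of temporarily holding dead bytes.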
Example Embodiment
[0108] Example two
[0109] Figure 2A is a schematic flowchart of an indexing method based on a key-value pair (KV) system provided in Embodiment 2 of the present application. This embodiment provides a specific implementation of the foregoing embodiment and is suitable for solving the prior-art problem of excessive memory usage when only a primary index is used for a very large amount of data. As shown in Figure 2A, the method may include:
[0110] S201. Obtain a key value in a key-value pair to be queried, where the key value of each key-value pair in the KV system is an increasing integer number.
[0111] S202. Use the key value as the primary index sequence number, and determine the secondary index sequence number of the index block corresponding to the sequence number range of the primary index sequence number, where each index block corresponds to a set number of primary index sequence numbers.
[0112] Exemplarily, as shown in Figure 2B, assume each index block covers 10 million primary index sequence numbers, that is, 10 million key values. The index block with secondary index sequence number "1" covers primary index sequence numbers 0 to 10 million, the index block with secondary index sequence number "2" covers 10 million to 20 million, the index block with secondary index sequence number "3" covers 20 million to 30 million, and so on. If the primary index sequence number obtained from the key value is 12 million, which lies in the range 10 million to 20 million, the secondary index sequence number of the corresponding index block is "2".
[0113] S203: According to the secondary index sequence number, read, from the corresponding element space of the fixed-length primary index area, the offset address of the storage location of the index block.
[0114] Specifically, as shown in Figure 2B, the primary index area (range index) contains multiple element spaces, each corresponding to a different range of primary index sequence numbers and storing the offset address of the index block for that range. According to the secondary index sequence number obtained in S202, the corresponding element space is determined in the primary index area, and the offset address of the storage location of the index block is read from it.
[0115] Exemplarily, if the acquired secondary index sequence number is "2", the offset address of the index block with secondary index sequence number "2" is read from the second element space of the primary index area.
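The mapping in S202 from a primary index sequence number to its index block is integer division. The text does not say which block owns a boundary value such as exactly 10 million, so the half-open ranges below ([0, 10M) for block 1, [10M, 20M) for block 2, and so on) are an assumption; the function names are hypothetical.

```python
BLOCK_KEYS = 10_000_000  # primary index sequence numbers per index block (example value from the text)


def secondary_index_no(primary_no: int, block_keys: int = BLOCK_KEYS) -> int:
    """1-based number of the index block whose (half-open) range contains `primary_no`."""
    return primary_no // block_keys + 1


def position_in_block(primary_no: int, block_keys: int = BLOCK_KEYS) -> int:
    """Element position of `primary_no` inside its index block."""
    return primary_no % block_keys
```

For the worked example, `secondary_index_no(12_000_000)` is 2 and `position_in_block(12_000_000)` is 2,000,000, the same element the text locates in [0120].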
[0116] S204: Locate the secondary index area of the index block from the storage space according to the offset address of the index block.
[0117] Different index blocks correspond to different secondary index areas.
[0118] Specifically, the storage area located in the storage space according to the offset address of the index block is used as the secondary index area (index), which is generally kept in memory.
[0119] S205. In the fixed-length type secondary index area, read the offset address of the storage location of the value in the key-value pair to be queried from the element space corresponding to the primary index sequence number, as the address information.
[0120] Specifically, as shown in Figure 2B, the secondary index area is divided into multiple element spaces, each storing the offset address of at least one value in the key-value pair to be queried. Exemplarily, suppose the primary index sequence number, that is, the key value, is 12 million, the secondary index sequence number of the index block is "2", and each index block covers 10 million primary index sequence numbers. Then 12 million corresponds to the 2 millionth element space of index block "2", and the offset address of the storage location of the value in the key-value pair to be queried is read from that element space in the secondary index area of index block "2" as the address information.
[0121] Normally an offset address uses 8 bytes, but when the amount of data is very large, the offset addresses occupy a lot of memory: for example, the offset addresses of 1 billion data items occupy 8 GB of memory, and those of 10 billion data items would occupy 80 GB. To solve this problem, in this embodiment the offset address stored in each element space of the secondary index area uses 4 bytes, that is, an 8-byte offset address is split into two 4-byte parts stored in the element spaces of two adjacent secondary index areas. This reduces the memory occupied by offset addresses; however, because a 4-byte offset address can only address a 4 GB file, if the values in the key-value pairs to be queried exceed that range, another 4 GB file needs to be accessed.
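Steps S202 to S205 chain two fixed-length lookups. The sketch below assumes both levels hold 8-byte little-endian offsets and, to keep the example readable, uses index blocks of 4 keys instead of the text's 10 million; `two_level_lookup` is a hypothetical name.

```python
import struct

BLOCK_KEYS = 4  # tiny block size for the example; the text uses 10 million


def two_level_lookup(range_index: bytes, storage: bytes, key: int,
                     block_keys: int = BLOCK_KEYS) -> int:
    """Key -> index block -> element space -> value offset (sketch of S202 to S205).

    range_index: fixed-length primary index area, one 8-byte offset per index block.
    storage:     bytes holding each index block's secondary index area, an array
                 of `block_keys` 8-byte value offsets.
    """
    block_no = key // block_keys + 1  # S202: secondary index sequence number
    (block_off,) = struct.unpack_from("<Q", range_index, (block_no - 1) * 8)  # S203
    inner = key % block_keys          # S204: position inside the located block
    (value_off,) = struct.unpack_from("<Q", storage, block_off + inner * 8)   # S205
    return value_off
```

Only the small range index has to be resident to find any block, which is the point of the two-level design: each secondary index area can be loaded on demand.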
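The figures in [0121] can be checked with a little arithmetic, and the split of an 8-byte offset into two adjacent 4-byte elements can be sketched with `struct`. Storing the low word first is an assumption; the text does not give the order. The helper names are hypothetical.

```python
import struct

GIB = 1 << 30


def offset_table_bytes(num_values: int, offset_width: int) -> int:
    """Memory needed to keep one offset of `offset_width` bytes per value."""
    return num_values * offset_width


def max_addressable(offset_width: int) -> int:
    """Largest file size (in bytes) an offset of this width can address."""
    return 1 << (8 * offset_width)


def split_offset(offset: int) -> bytes:
    """Store a 64-bit offset as two adjacent 4-byte element spaces (low word first)."""
    return struct.pack("<II", offset & 0xFFFFFFFF, offset >> 32)


def join_offset(two_elements: bytes) -> int:
    low, high = struct.unpack("<II", two_elements)
    return (high << 32) | low
```

`offset_table_bytes(1_000_000_000, 8)` is 8,000,000,000 bytes (the 8 GB of the text), and `max_addressable(4)` is 4 GiB, the per-file limit that motivates splitting values across 4 GB file blocks.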
[0122] S206: Read the value in the key-value pair to be queried from the storage space according to the address information.
[0123] Specifically, in this embodiment the file holding the values in the key-value pairs to be queried is split into several 4 GB file blocks in the storage space, which improves the utilization of multiple disks and avoids wasting resources.
[0124] The technical solution provided by the embodiments of the present application establishes a two-level index and determines a primary index sequence number and a secondary index sequence number, from which the address information of the value in the key-value pair to be queried is determined in the secondary index area. This avoids the excessive memory usage caused by using only a primary index to read values when the amount of data is very large. The technical solution of this embodiment is particularly suitable for KV systems with non-fixed-length storage space.
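Splitting the value file into 4 GB file blocks means a logical position becomes a (file block, 4-byte local offset) pair. The text says only that the blocks spread load across several disks; the round-robin placement and path naming in `file_path` are purely hypothetical illustrations, as is `locate`.

```python
FILE_BLOCK = 1 << 32  # 4 GiB: the largest file a 4-byte offset can address


def locate(global_offset: int) -> tuple[int, int]:
    """Map a logical byte position to (file block number, offset inside that block)."""
    return divmod(global_offset, FILE_BLOCK)


def file_path(block_no: int, disks: list[str]) -> str:
    """Hypothetical round-robin placement of file blocks over several disks."""
    return f"{disks[block_no % len(disks)]}/values.{block_no:05d}"
```

The local offset returned by `locate` always fits in 4 bytes, so it can be stored in the compact element spaces described in [0121].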
Example Embodiment
[0125] Example three
[0126] Figure 3A is a schematic flowchart of an indexing method based on a key-value pair (KV) system provided in Embodiment 3 of the present application. This embodiment provides a specific implementation of the above embodiments and is suitable for reading the value in the key-value pair to be queried when the values are of POD type. As shown in Figure 3A, the method may include:
[0127] S301. Obtain the key value in the key-value pair to be queried, where the key value of each key-value pair in the KV system is an increasing integer number.
[0128] S302. Use the key value as the primary index sequence number, and determine the secondary index sequence number of the index block corresponding to the sequence number range of the primary index sequence number, where each index block corresponds to a set number of primary index sequence numbers.
[0129] Specifically, in this embodiment the index blocks correspond one-to-one with the element spaces of the primary index area, and the file storage space of each index block is placed on a different disk to increase the resource utilization of each disk.
[0130] S303: According to the secondary index serial number, read the offset address of the file storage space corresponding to the index block in the corresponding element space of the fixed-length primary index area.
[0131] Exemplarily, assuming that the secondary index serial number of the index block is "2", the file corresponding to the index block with the secondary index serial number "2" is read in the second element space of the fixed-length primary index area The offset address of the storage space.
[0132] S304. In the fixed-length file storage space, calculate the offset address of the storage unit holding the value in the key-value pair to be queried from the primary index sequence number and the storage unit length of the fixed-length file storage space, as the address information.
[0133] Specifically, as shown in Figure 3B, the offset address of the storage unit holding the value in the key-value pair to be queried is calculated as blocksize*blockid + block-inner-offset, where blocksize is the key value range of an index block, such as 10 million (1000W); blockid is the secondary index sequence number currently being queried, such as 2; and block-inner-offset is the internal offset address of the key value, that is, of the primary index sequence number, within the index block. For example, the key value 12 million (1200W) falls in the 2 millionth (200W) storage unit of the fixed-length file storage space of the second index block. As in the foregoing embodiment, the internal offset address can be calculated from the byte length of the storage unit and the sequence number.
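The POD two-level address of [0133] can be sketched with `divmod`. To make blocksize*blockid + block-inner-offset reproduce the example (12 million = 10 million × 1 + 2 million), this sketch takes blockid as zero-based, so the text's block "2" is block 1 here; that reading, the 8-byte unit size and the name `pod_address` are assumptions.

```python
VALUE_SIZE = 8           # assumed byte length of one POD storage unit
BLOCK_SIZE = 10_000_000  # key range per index block ("1000W" in the text)


def pod_address(key: int, block_size: int = BLOCK_SIZE, value_size: int = VALUE_SIZE):
    """Return (zero-based block id, byte offset inside that block's file) for a POD value.

    key == block_size * block_id + inner, and the byte offset inside the
    block's fixed-length file storage space is inner * sizeof(value).
    """
    block_id, inner = divmod(key, block_size)
    return block_id, inner * value_size
```

`pod_address(12_000_000)` returns `(1, 16_000_000)`: the 2,000,000th unit of the second block, 16 MB into its file. Since each block's file lives on its own disk, lookups for different blocks can proceed in parallel.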
[0134] S305. Read the value in the key-value pair to be queried from the storage space according to the address information.
[0135] The technical solution provided by the embodiments of the present application establishes a two-level index and, when the file storage space is of fixed-length type, calculates the offset address of the storage unit holding the value in the key-value pair to be queried as the address information. This expands data capacity and improves index concurrency; and because the file storage space of each index block is placed on a different disk, the resource utilization of each disk also increases.