Method for creating index based on abbreviated comparison of numeric data in Opengauss
By performing abbreviated comparisons and sorting of numeric data type tuples in the Opengauss database, the problem of low efficiency in creating numeric data type indexes is solved, and the index creation process is optimized and its speed is improved.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents(China)
- Current Assignee / Owner: BEIJING VASTDATA TECH
- Filing Date: 2022-11-16
- Publication Date: 2026-06-26
AI Technical Summary
In existing technologies, index creation for numeric data types is inefficient, especially in the Opengauss database. Extremely long numeric data is stored as character arrays, and comparing such arrays character by character is slow, which hampers the index creation process.
An abbreviated comparison algorithm converts the numeric values of the tuples into int type abbreviation keys; the tuples are then sorted by these keys and inserted into the B-tree index, which optimizes the index creation process.
By using the abbreviated comparison algorithm, the efficiency of creating numeric data type indexes is improved, the number of CPU operations is reduced, and the speed of index creation is increased.
[Figure: CN115687361B_ABST]
Description
Technical Field
[0001] This invention relates to the field of index creation technology, and in particular to a method for creating an index of numeric data in OpenGauss based on abbreviation comparison.
Background Technology
[0002] Indexes are one of the most important functions of a database. Essentially, an index is a structure that sorts the values of one or more columns in a database table, allowing for quick access to specific information within the table. A primary purpose of indexes is to accelerate data retrieval within a table; that is, to assist information seekers in quickly finding record IDs that meet specific criteria.
[0003] Index creation is fundamental to implementing index functionality. Taking the most commonly used B-tree index as an example, the most time-consuming part of creating a B-tree index is sorting the values in the database table; the performance of this sorting is crucial to the speed of index creation.
[0004] In practical applications, when certain data types are implemented using character arrays, and these character arrays become excessively long, the index creation process can be severely affected. For example, numeric is a commonly used data type. Because numeric values in Opengauss can be extremely long, they are stored as character arrays. As a result, when numeric values are compared during sorting, conventional numeric comparison functions cannot be used and a character-by-character comparison algorithm is required, leading to very low index creation efficiency.
[0005] Therefore, improving the efficiency of creating indexes on numeric data types has become a pressing technical problem.
Summary of the Invention
[0006] In view of this, in order to overcome the shortcomings of the prior art, the main problem solved by the present invention is to optimize the efficiency of sorting numeric data and improve the efficiency of creating indexes on numeric data in Opengauss.
[0007] In one aspect, this invention provides a method for creating an index for numeric data in OpenGauss based on abbreviation comparison, comprising:
[0008] Step S1: Determine whether the column of the data to be indexed is of type numeric, and set optimization flags for abbreviated comparison for the tuples in the target table based on the determination result;
[0009] Step S2: Scan the tuples in the target table one by one, and save the value of the tuple index column according to the optimization flag of the tuple;
[0010] Step S3: Sort the tuples in Step S2 using an abbreviated comparison algorithm;
[0011] Step S4: Insert the sorted tuples from step S3 into the btree index.
[0012] Furthermore, in step S1 of the method for creating an index for numeric data based on abbreviation comparison in Opengauss of the present invention, setting optimization flags for abbreviated comparison for the tuples in the target table according to the determination result includes:
[0013] If the column containing the data to be indexed is of type numeric, set the optimization flag to true;
[0014] If the column containing the data to be indexed is not of type numeric, set the optimization flag to false.
[0015] Furthermore, in step S2 of the Opengauss numeric data index creation method based on abbreviation comparison of the present invention, saving the value of the tuple index column according to the optimization flag of the tuple includes:
[0016] If the value of the optimization flag of the tuple is true, convert the value of the index column of the tuple from numeric type to int type abbreviation key, and save the converted abbreviation key and the value of the original field of the index column of the tuple;
[0017] If the value of the optimization flag for a tuple is false, save the value of the tuple index column with the data type numeric.
[0018] Furthermore, in the Opengauss index creation method for numeric data based on abbreviation comparison of the present invention, converting the value of the tuple index column from numeric type to an int type abbreviation key includes: when the value of the tuple index column is positive, converting the value of the tuple index column in the following manner:
[0019] If the weight of the most significant digit of the numeric data is greater than 20, the maximum value of the int type is used to represent the abbreviation key;
[0020] If the weight of the most significant digit of the numeric data is less than -11, 0 is used to represent the abbreviation key;
[0021] If the weight of the most significant digit of the numeric data is between -11 and 20, obtain the first seven significant digits of the numeric data to be converted, represented as an int and denoted as intermediate result A; shift the weight of the most significant digit of the numeric data left by 24 bits, represented as an int and denoted as intermediate result B; perform a bitwise OR operation between intermediate result A and intermediate result B, and use the int obtained from the bitwise OR operation to represent the abbreviation key.
[0022] Furthermore, in the method for creating an index based on abbreviation comparison for numeric data in Opengauss of the present invention, converting the value of the tuple index column from numeric type to int type abbreviation key includes: when the value of the tuple index column is negative, multiplying the value of the tuple index column by -1 to obtain a positive value, converting the positive value, multiplying the conversion result by -1 to obtain a negative value, and using the negative value to represent the abbreviation key.
[0023] Furthermore, step S3 of the method for creating an index of numeric data based on abbreviation comparison in Opengauss of the present invention includes:
[0024] Compare the abbreviation keys of two tuples. If the abbreviation keys of the two tuples are not equal, sort the two tuples according to the comparison result.
[0025] If the abbreviation keys of two tuples are equal, compare the values of the original fields of the index columns of the two tuples, and sort the two tuples according to the comparison results.
[0026] Furthermore, step S3 of the numeric data index creation method based on abbreviation comparison in Opengauss of the present invention includes: sorting the tuples in step S2 in ascending order using the abbreviated comparison algorithm.
[0027] Furthermore, step S3 of the numeric data index creation method based on abbreviation comparison in Opengauss of the present invention includes: sorting the tuples in step S2 in descending order using the abbreviated comparison algorithm.
[0028] In another aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed, performs the steps of the above-described method for creating an index of numeric data based on abbreviation comparison in Opengauss.
[0029] Finally, the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the above-described method for creating an index of numeric data based on abbreviation comparison in Opengauss.
[0030] The index creation method for numeric data in Opengauss based on abbreviation comparison provided by the present invention has the following beneficial effects: by using the "abbreviation key" comparison method, the prior-art character-by-character comparison of character arrays is replaced by a direct comparison of single int values. This improves the performance of the comparison algorithm, eliminates a large number of CPU operations, and greatly reduces the time spent creating numeric data type indexes, thereby improving the efficiency of index creation.
Attached Figure Description
[0031] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0032] Figure 1 is a flowchart of the method for creating an index of numeric data based on abbreviation comparison in Opengauss according to an exemplary first embodiment of the present invention.
[0033] Figure 2 is an execution flowchart of the method for creating an index of numeric data based on abbreviation comparison in Opengauss according to the exemplary first embodiment of the present invention.
Detailed Implementation
[0034] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
[0035] It should be noted that, in the absence of conflict, the following embodiments and features can be combined with each other; and, based on the embodiments of this disclosure, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this disclosure.
[0036] It should be noted that various aspects of embodiments within the scope of the appended claims are described below. It will be apparent that the aspects described herein can be embodied in a wide variety of forms, and any particular structure and/or function described herein is merely illustrative. Based on this disclosure, those skilled in the art will understand that one aspect described herein can be implemented independently of any other aspect, and two or more of these aspects can be combined in various ways. For example, any number of aspects set forth herein can be used to implement the device and/or practice the method. Additionally, this device and/or method can be implemented using structures and/or functionalities other than one or more of the aspects set forth herein.
[0037] Figure 1 is a flowchart of a method for creating an index of numeric data in Opengauss based on abbreviation comparison according to an exemplary first embodiment of the present invention. Figure 2 is an execution flowchart of the same method according to the exemplary first embodiment. As shown in Figure 1 and Figure 2, the method in this embodiment includes:
[0038] Step S1: Determine whether the column of the data to be indexed is of type numeric, and set optimization flags for abbreviated comparison for the tuples in the target table based on the determination result;
[0039] Step S2: Scan the tuples in the target table one by one, and save the value of the tuple index column according to the optimization flag of the tuple;
[0040] Step S3: Sort the tuples in Step S2 using an abbreviated comparison algorithm;
[0041] Step S4: Insert the sorted tuples from step S3 into the btree index.
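The four steps can be sketched end to end as follows. This is a minimal illustration and not the Opengauss implementation: the names `build_index` and `floor_key` are hypothetical, a plain Python list stands in for the btree, and `floor_key` is a deliberately simple stand-in for the abbreviation function (any int key whose order never contradicts the order of the original values would serve here).

```python
from decimal import Decimal, ROUND_FLOOR


def floor_key(value):
    # Hypothetical stand-in abbreviation: the floor of the value as an int.
    # It is monotone, so distinct keys already decide the order correctly.
    return int(value.to_integral_value(rounding=ROUND_FLOOR))


def build_index(rows, column_is_numeric, abbreviate=floor_key):
    """rows is an iterable of (row_id, column_value) pairs."""
    # S1: derive the optimization flag from the column type.
    use_abbrev = bool(column_is_numeric)

    # S2: scan the tuples; keep the abbreviation key next to the original value.
    entries = []
    for row_id, value in rows:
        key = abbreviate(value) if use_abbrev else None
        entries.append((key, value, row_id))

    # S3: sort. Tuple ordering compares the int key first and only looks at
    # the original value when two keys are equal.
    if use_abbrev:
        entries.sort(key=lambda e: (e[0], e[1]))
    else:
        entries.sort(key=lambda e: e[1])

    # S4: "insert" in sorted order; the list stands in for the btree.
    index = []
    for _, value, row_id in entries:
        index.append((value, row_id))
    return index


rows = [
    (1, Decimal("2.50")),
    (2, Decimal("-1.25")),
    (3, Decimal("2.25")),
    (4, Decimal("10")),
]
index = build_index(rows, column_is_numeric=True)
```

Rows 3 and 1 share the stand-in key 2, so their relative order is decided by the original values, which is the tie-break the method relies on.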
[0042] The second exemplary embodiment of the present invention provides a method for creating an index of numeric data in OpenGauss based on abbreviation comparison. This embodiment is a preferred embodiment of the method shown in Figure 1 and Figure 2. In step S1 of this embodiment, setting optimization flags for abbreviated comparison for the tuples in the target table based on the determination result includes:
[0043] If the column containing the data to be indexed is of type numeric, set the optimization flag to true;
[0044] If the column containing the data to be indexed is not of type numeric, set the optimization flag to false.
[0045] The third exemplary embodiment of the present invention provides a method for creating an index for numeric data in OpenGauss based on abbreviation comparison. This embodiment is a preferred embodiment of the method shown in Figure 1 and Figure 2. In step S2 of this embodiment, saving the value of the tuple index column based on the optimization flag of the tuple includes:
[0046] If the value of the optimization flag of the tuple is true, convert the value of the index column of the tuple from numeric type to int type abbreviation key, and save the converted abbreviation key and the value of the original field of the index column of the tuple;
[0047] If the value of the optimization flag for a tuple is false, save the value of the tuple index column with the data type numeric.
[0048] In practical applications, the method in this embodiment converts the value of the tuple index column from numeric type to an int type abbreviation key, including:
[0049] When the value of the tuple index column is positive, the value of the tuple index column is converted as follows:
[0050] If the weight of the most significant digit of the numeric data is greater than 20, the maximum value of the int type is used to represent the abbreviation key;
[0051] If the weight of the most significant digit of the numeric data is less than -11, 0 is used to represent the abbreviation key;
[0052] If the weight of the most significant digit of the numeric data is between -11 and 20, obtain the first seven significant digits of the numeric data to be converted, represented as an int and denoted as intermediate result A. Shift the weight of the most significant digit of the numeric data left by 24 bits, represented as an int and denoted as intermediate result B. Perform a bitwise OR operation on intermediate result A and intermediate result B, and use the int obtained from the bitwise OR operation to represent the abbreviation key. The bitwise OR operation is a binary operation that ORs the corresponding binary bits of its two operands: if either of the corresponding bits is 1, the result bit is 1. When an operand is negative, both operands are represented in two's complement form.
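The bit layout produced by this rule can be illustrated with a short sketch. The helper names `pack` and `unpack` are hypothetical and do not come from the patent. Because seven decimal digits never exceed 9999999, which is below 2^24, intermediate result A always fits in the low 24 bits and never collides with the shifted weight, so one int comparison orders keys by weight first and by leading digits second.

```python
def pack(weight, first_seven):
    # Intermediate result B is the weight shifted left by 24 bits;
    # intermediate result A occupies the low 24 bits.
    assert 0 <= first_seven <= 9_999_999 < (1 << 24)
    return (weight << 24) | first_seven


def unpack(key):
    # Python's >> is an arithmetic shift, so a negative weight is recovered
    # with its sign, consistent with two's complement semantics.
    return key >> 24, key & 0xFFFFFF


key = pack(9, 1234567)
```

With this layout, `pack(9, 1234567) < pack(10, 1000000)` holds because the weight dominates, and `pack(9, 1234567) < pack(9, 1234568)` holds because the leading digits break the tie within one weight.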
[0053] Taking the integer 123456789 as an example, the conversion process of this embodiment is further explained. In the numeric data structure, the specific variable will store the following values.
[0054]
[0055] Since the weight of the most significant digit of 123456789 is 9, which falls between -11 and 20, the following transformation is performed:
[0056] Let A be the first seven significant digits, that is, A = 1234567. A is of type int, and its binary representation is 100101101011010000111.
[0057] Let B be the weight of the most significant digit, i.e., 9, shifted left by 24 bits, which gives B = 150994944, or 1001000000000000000000000000 in binary.
[0058] Performing a bitwise OR operation on A and B yields 1001000100101101011010000111, which is 152229511 in decimal and is used as the abbreviation key.
[0059] When the value of the tuple index column is negative, the value of the tuple index column is multiplied by -1 to obtain a positive value. After the positive value is converted, the conversion result is multiplied by -1 to obtain a negative value. The negative value is used to represent the abbreviation key.
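The conversion rules of this embodiment can be put together in one function. The following is a sketch under stated assumptions, not the Opengauss source: `abbrev_key` is a hypothetical name, `int` is taken to be a 32-bit signed integer, the weight is defined so that 123456789 has weight 9 as in the worked example, values with fewer than seven significant digits are zero-padded on the right, and zero (which the text does not cover) maps to 0.

```python
from decimal import Decimal

INT_MAX = 2**31 - 1  # assumption: "int" is a 32-bit signed integer


def abbrev_key(value):
    """Convert a finite Decimal to an int abbreviation key."""
    if value.is_zero():
        return 0  # assumption: zero is not described in the text
    if value.is_signed():
        # Negative value: convert its positive counterpart, then negate the key.
        return -abbrev_key(value.copy_negate())

    # Hypothetical weight definition: position of the leading digit,
    # so that 123456789 -> 9 and 0.5 -> 0.
    weight = value.adjusted() + 1
    if weight > 20:
        return INT_MAX
    if weight < -11:
        return 0

    # Intermediate result A: first seven significant digits (zero-padded).
    digits = value.as_tuple().digits
    a = int("".join(str(d) for d in digits[:7]).ljust(7, "0"))
    # Intermediate result B: weight shifted left by 24 bits.
    b = weight << 24
    return a | b


example = abbrev_key(Decimal("123456789"))
```

For 123456789 the function evaluates (9 << 24) | 1234567, which is 152229511, and for -123456789 it returns -152229511.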
[0060] The fourth exemplary embodiment of the present invention provides a method for creating an index for numeric data in OpenGauss based on abbreviation comparison. This embodiment is a preferred embodiment of the method shown in Figure 1 and Figure 2. Step S3 of the method in this embodiment includes:
[0061] Compare the abbreviation keys of two tuples. If the abbreviation keys of the two tuples are not equal, sort the two tuples according to the comparison result.
[0062] If the abbreviation keys of two tuples are equal, compare the values of the original fields of the index columns of the two tuples, and sort the two tuples according to the comparison results.
[0063] In practical applications, the method in this embodiment sorts the tuples in step S2 in ascending order using the abbreviated comparison algorithm. Alternatively, the tuples in step S2 can be sorted in descending order using the abbreviated comparison algorithm.
[0064] In another aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed, performs the steps of the above-described method for creating an index of numeric data based on abbreviation comparison in Opengauss.
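The two-level comparison of step S3 can be sketched as a comparator over (abbreviation key, original value) pairs. The names `compare_abbreviated` and `sort_tuples` are hypothetical, and the keys in the sample data were computed by hand from the seven-digit rule, assuming zero-padding for short values.

```python
from decimal import Decimal
from functools import cmp_to_key


def compare_abbreviated(left, right):
    """Compare two (abbreviation_key, original_value) pairs."""
    left_key, left_value = left
    right_key, right_value = right
    if left_key != right_key:
        # Fast path: a single int comparison decides the order.
        return -1 if left_key < right_key else 1
    # Equal keys: fall back to comparing the original field values.
    if left_value == right_value:
        return 0
    return -1 if left_value < right_value else 1


def sort_tuples(pairs, descending=False):
    return sorted(pairs, key=cmp_to_key(compare_abbreviated), reverse=descending)


pairs = [
    (152229511, Decimal("123456789")),
    (43454432, Decimal("99")),
    (152229511, Decimal("123456700")),  # same first seven digits and weight
]
ascending = sort_tuples(pairs)
descending = sort_tuples(pairs, descending=True)
```

The first and third entries share a key, so only that pair reaches the slower comparison of original values; every other pair is ordered by the int keys alone.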
[0065] Finally, the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the above-described method for creating an index of numeric data based on abbreviation comparison in Opengauss.
[0066] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A method for creating an index for numeric data in OpenGauss based on abbreviation comparison, characterized in that the method includes:
Step S1: Determine whether the column containing the data to be indexed is of type numeric, and based on the determination result, set optimization flags for abbreviated comparison for the tuples in the target table, including:
If the column containing the data to be indexed is of type numeric, set the optimization flag to true;
If the column containing the data to be indexed is not of type numeric, set the optimization flag to false;
Step S2: Scan each tuple in the target table, and save the value of the tuple index column based on the optimization flag of the tuple, including:
(1) If the value of the optimization flag of the tuple is true, convert the value of the index column of the tuple from numeric type to an int type abbreviation key, including:
When the value of the tuple index column is positive, the value of the tuple index column is converted as follows:
If the weight of the most significant digit of the numeric data is greater than 20, the maximum value of the int type is used to represent the abbreviation key;
If the weight of the most significant digit of the numeric data is less than -11, 0 is used to represent the abbreviation key;
If the weight of the most significant digit of the numeric data is between -11 and 20, obtain the first seven significant digits of the numeric data to be converted, represented as an int and denoted as intermediate result A; shift the weight of the most significant digit of the numeric data left by 24 bits, represented as an int and denoted as intermediate result B; perform a bitwise OR operation between intermediate result A and intermediate result B, and use the int obtained from the bitwise OR operation to represent the abbreviation key;
When the value of the tuple index column is negative, the value of the tuple index column is multiplied by -1 to obtain a positive value; after the positive value is converted, the conversion result is multiplied by -1 to obtain a negative value, and the negative value is used to represent the abbreviation key;
Save the converted abbreviation key and the original field value of the index column of the tuple;
(2) If the value of the optimization flag of the tuple is false, save the value of the tuple index column with data type numeric;
Step S3: Sort the tuples from Step S2 using an abbreviated comparison algorithm, including:
Compare the abbreviation keys of two tuples; if the abbreviation keys of the two tuples are not equal, sort the two tuples according to the comparison result;
If the abbreviation keys of two tuples are equal, compare the values of the original fields of the index columns of the two tuples, and sort the two tuples according to the comparison result;
Step S4: Insert the sorted tuples from step S3 into the btree index.
2. The method for creating an index for numeric data in OpenGauss based on abbreviation comparison according to claim 1, characterized in that step S3 includes: sorting the tuples in step S2 in ascending order using the abbreviated comparison algorithm.
3. The method for creating an index for numeric data in OpenGauss based on abbreviation comparison according to claim 1, characterized in that step S3 includes: sorting the tuples in step S2 in descending order using the abbreviated comparison algorithm.
4. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, which, when executed, performs the method as described in any one of claims 1-3.
5. A computer device, characterized in that the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method according to any one of claims 1-3.