Methods, storage media and equipment for collecting database statistical information
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CETC JINCANG (BEIJING) TECH CO LTD
- Filing Date
- 2023-02-09
- Publication Date
- 2026-06-30
Smart Images

Figure CN116149945B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of database technology, and in particular to a method, storage medium, and device for collecting database statistical information. Background Technology
[0002] In database usage, the execution of a query statement typically involves sequentially passing through the parser, optimizer, and executor stages before returning a result. During the optimizer stage, the database system selects the optimal execution strategy from various options, thereby executing the query statement more efficiently. Currently, databases primarily employ cost-based query optimizers, which calculate execution costs using database statistics and select the plan with the lowest cost for the executor to execute. Therefore, database statistics are one of the key factors influencing the optimization effectiveness of the query optimizer.
[0003] Currently, database statistics are primarily collected using two methods. One method involves analyzing the entire dataset in a table. The advantage of this method is the accuracy of the obtained statistics, but its disadvantage is its time-consuming nature. As the amount of data in the table grows, this time consumption may become unacceptable. The second method involves sampling and analyzing the data table. The advantage of this method is its shorter processing time, but its disadvantage is that the obtained statistics are not accurate enough, affecting the optimization performance of the query optimizer. Furthermore, both methods require table read operations, which means they compete for resources with user operations, impacting database processing performance. Summary of the Invention
[0004] One object of the present invention is to provide a method, storage medium and device for collecting database statistics that can reduce the access pressure on tables while collecting statistical information.
[0005] A further objective of this invention is to make the collection of statistical information more convenient.
[0006] Specifically, the present invention provides a method for collecting database statistical information, comprising:
[0007] Start collecting statistical information;
[0008] Retrieve log information since the last statistics update;
[0009] Filter out modified data information related to statistical information from the log information;
[0010] Update the statistics based on the modified data.
[0011] Optionally, the steps before initiating the collection of statistical information include:
[0012] Configure the trigger conditions for enabling statistical information collection;
[0013] Once the trigger condition is detected, proceed with the step of enabling the collection of statistical information.
[0014] Optionally, the triggering conditions include reaching a preset interval time, the number of rows added reaching a preset value, and the amount of log increase reaching a preset condition.
[0015] Optionally, the steps for filtering modified data information related to statistical information from log information include:
[0016] Filter out log messages containing information about added rows, deleted rows, updated rows, and table modifications.
[0017] Optionally, the step of filtering modified data information related to statistical information from log information includes:
[0018] The modified data is categorized as follows: information that modifies the table is classified as the first category, information that adds rows is classified as the second category, information that deletes rows is classified as the third category, and information that updates rows is classified as either deleted rows or inserted rows, which are then classified into the third and second categories, respectively.
[0019] Optionally, the steps for updating statistical information based on modified data include:
[0020] The first and second incremental statistical information are calculated based on the modified data information.
[0021] Update the statistical information based on the first incremental statistical information, the second incremental statistical information, and the original statistical information;
[0022] The first incremental statistical information is calculated using the second type of information, the second incremental information is calculated using the third type of information, and the original statistical information is the statistical information of the current state.
[0023] Optionally, the step of updating the statistical information based on the first incremental statistical information, the second incremental statistical information, and the original statistical information includes:
[0024] If the statistical information belongs to the first type, the statistical information is updated by comparing the first incremental statistical information, the second incremental statistical information, and the original statistical information.
[0025] Optionally, the step of updating the statistical information based on the first incremental statistical information, the second incremental statistical information, and the original statistical information includes:
[0026] If the statistical information belongs to the second type, the statistical information is updated by calculating the first incremental statistical information, the second incremental statistical information, and the original statistical information.
[0027] According to another aspect of the present invention, a machine-readable storage medium is also provided, on which a machine-executable program is stored, which, when executed by a processor, implements the method for collecting database statistics according to any of the preceding claims.
[0028] According to another aspect of the present invention, a computer device is also provided, including a memory, a processor, and a machine-executable program stored in the memory and running on the processor, wherein the processor, when executing the machine-executable program, implements a method for collecting database statistical information according to any of the preceding claims.
[0029] The database statistics collection method of this invention obtains log information, then filters out modified data information related to the statistics from the log information, and then updates the statistics based on the modified data information. In other words, statistics are collected or updated by accessing log information. Therefore, on the one hand, there is no need to access the table, thus avoiding resource contention with user operations, reducing the burden of table access, and thus avoiding a significant impact on database processing performance. On the other hand, accessing log information facilitates the acquisition of incremental information, that is, the modified data information of the table since the last statistics update, thereby updating the statistics. Compared with existing technologies, this solution does not require the acquisition of the entire data table, improving the efficiency of statistics collection. Moreover, because all modified data information since the last statistics update is acquired, the accuracy of the statistics can be guaranteed.
[0030] Furthermore, the database statistical information collection method of the present invention, by configuring trigger conditions, initiates the statistical information collection process after the trigger conditions are met. This allows the database system to automatically start a new round of statistical information collection upon triggering of the conditions, eliminating the need for manual intervention. This makes the statistical information collection process more convenient. The trigger conditions can be selected as needed, making the statistical information collection process more flexible.
[0031] The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments of the invention in conjunction with the accompanying drawings. Attached Figure Description
[0032] The following sections will describe some specific embodiments of the invention in detail by way of example and not limitation, with reference to the accompanying drawings. The same reference numerals in the drawings denote the same or similar parts or portions. Those skilled in the art should understand that these drawings are not necessarily drawn to scale. In the drawings:
[0033] Figure 1This is a schematic flowchart of a method for collecting database statistical information according to an embodiment of the present invention;
[0034] Figure 2 This is a schematic flowchart of a method for collecting database statistical information according to another embodiment of the present invention;
[0035] Figure 3 This is a schematic flowchart of a method for collecting database statistical information according to yet another embodiment of the present invention;
[0036] Figure 4 This is a schematic flowchart of a method for collecting database statistical information according to yet another embodiment of the present invention;
[0037] Figure 5 This is a schematic diagram of a machine-readable storage medium according to an embodiment of the present invention;
[0038] Figure 6 This is a schematic diagram of a computer device according to an embodiment of the present invention. Detailed Implementation
[0039] Those skilled in the art should understand that the embodiments described below are merely a part of the embodiments of the present invention, and not all of the embodiments of the present invention. These partial embodiments are intended to explain the technical principles of the present invention and are not intended to limit the scope of protection of the present invention. Based on the embodiments provided by the present invention, all other embodiments obtained by those skilled in the art without creative effort should still fall within the scope of protection of the present invention.
[0040] It should be noted that the logic and / or steps represented in the flowchart or otherwise described herein, for example, can be considered as a sequenced list of executable instructions for implementing logical functions, and can be specifically implemented in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus or device (such as a computer-based system, a processor-included system or other system that can fetch and execute instructions from, an instruction execution system, apparatus or device).
[0041] like Figure 1 As shown, in one embodiment, the method for collecting database statistics generally includes:
[0042] Step S101: Start collecting statistical information. This means starting the statistical information update process.
[0043] Step S102: Obtain log information since the last statistics update. Specifically, this means obtaining log information generated within the time period from the last time statistics updates were started to the current time updates. It should be noted that statistics are created from scratch; that is, the first generation of statistics also counts as a statistics update.
[0044] Step S103: Filter the log information to extract modified data related to statistical information. Specifically, filter the log information for added rows, deleted rows, updated rows, and table modifications. Added row information refers to log information generated when a row is added to the table; deleted row information refers to log information generated when a row is deleted from the table; updated row information refers to log information where data in a row of the table is modified; modified table information includes log information for adding a column to the table, deleting a column from the table, and deleting the entire table.
[0045] Step S104: Update the statistical information based on the modified data. Specifically, the statistical information is divided into table-level statistics and column-level statistics. Table-level statistics include the number of rows and pages, while column-level statistics include the maximum value, minimum value, and high-frequency value. Since the modified data related to the statistical information has already been obtained, the statistical information can be updated based on the added and deleted information.
[0046] For example, regarding the number of rows, by using information on added and deleted rows, we can determine how many rows have been added or removed since the last statistical update, and thus update the row count in the statistics. Similarly, regarding the maximum value in a column, by using information on added, deleted, and updated rows in that column, we can determine which values have been added, removed, or changed since the last statistical update, and thus update the maximum value in that column.
[0047] In this embodiment, log information is acquired, and then modified data information related to statistical information is filtered out from the log information. The statistical information is then updated based on the modified data information. In other words, statistical information is collected or updated by accessing the log information. Therefore, on the one hand, there is no need to access the table, thereby avoiding resource competition with user operations, reducing the burden of accessing the table, and thus avoiding a significant impact on the database processing performance.
[0048] On the other hand, accessing log information facilitates the acquisition of incremental information, namely, the modified data of the table since the last statistical update, thereby updating the statistical information. Compared with existing technologies, this solution does not require acquiring the full data of the data table, improving the efficiency of statistical information collection. Moreover, because it acquires all modified data since the last statistical update, the accuracy of the statistical information can be guaranteed.
[0049] like Figure 2 As shown, in one embodiment, the method for collecting database statistics generally includes:
[0050] Step S201: Start collecting statistical information.
[0051] Step S202: Obtain log information after the last statistical information update.
[0052] Step S203: Filter out information from the log messages that includes added rows, deleted rows, updated rows, and modified tables.
[0053] Step S204: Classify the modified data. Information about modified tables is classified as the first category, information about added rows as the second category, information about deleted rows as the third category, and information about updated rows is further divided into deleted rows and inserted rows, which are then classified into the third and second categories respectively. Because updating rows modifies data in a specific row, the original data is deleted first, and then the new data is inserted, thus allowing the information in updated rows to be categorized into deleted rows and inserted rows.
[0054] Specifically, statistical information is divided into table-level statistics and column-level statistics. Modifying table information involves adding a column, deleting a column, or deleting log information. Therefore, modifying table statistics means adding a new column, deleting a column from the existing statistics, or deleting the entire table's statistics without affecting the original statistics; this falls under the first category. Adding or deleting rows both modify the original table-level and column-level statistics, thus falling under the second and third categories respectively.
[0055] Step S205: Calculate the first incremental statistical information and the second incremental statistical information based on the modified data information.
[0056] Step S206: Update the statistical information based on the first incremental statistical information, the second incremental statistical information, and the original statistical information. The first incremental statistical information is calculated using the second type of information, the second incremental information is calculated using the third type of information, and the original statistical information is the statistical information of the current state.
[0057] Specifically, the information of all added rows in the second type of information is treated as a data table to obtain the first incremental statistical information, obtained in the same way as ordinary statistical information. That is, the first incremental statistical information includes statistics such as the number of rows, maximum value, and minimum value. Similarly, the information of all deleted rows in the third type of information is treated as a data table to obtain the second incremental statistical information, which also includes statistics such as the number of rows, maximum value, and minimum value. The original statistical information is the statistical information generated from the previous statistical information update.
[0058] In addition, during the calculation of incremental statistics, non-updated columns are not included in the statistics.
[0059] Continue to refer to Figure 2 As shown, the steps for updating statistical information based on the first incremental statistical information, the second incremental statistical information, and the original statistical information further include:
[0060] Step S207: If the statistical information belongs to the first type, the statistical information is updated by comparing the first incremental statistical information, the second incremental statistical information, and the original statistical information. Specifically, the statistical information of the first type is the statistical information that can be obtained by simply comparing the first incremental statistical information, the second incremental statistical information, and the original statistical information, including the maximum value and the minimum value.
[0061] For example, for a certain column, the maximum value in the original statistics is 80, the maximum value in the first increment statistics is 90, and the maximum value in the second increment statistics is 70. Then, by comparison, the updated maximum value should be 90. Similarly, for a certain column, the minimum value in the original statistics is 50, the minimum value in the first increment statistics is 60, and the minimum value in the second increment statistics is 55. Then, by comparison, the updated minimum value should be 50.
[0062] Step S208: If the statistical information belongs to the second type, the statistical information is updated by calculating the first incremental statistical information, the second incremental statistical information, and the original statistical information. Specifically, the second type of statistical information is the statistical information that can only be obtained by calculating the first incremental statistical information, the second incremental statistical information, and the original statistical information, including the number of rows, histogram, high-frequency values, etc.
[0063] Specifically, the calculation method is: original statistics - second incremental statistics + first incremental statistics. For example, regarding the number of rows, if the original statistics have 1000 rows, the first incremental statistics have 200 rows, and the second incremental statistics have 100 rows, then the updated statistics should have 1100 rows.
[0064] It should be noted that the "+" and "-" signs in the formula are for illustrative purposes only and do not necessarily represent addition or subtraction of numerical values. The calculation involves the fusion of the original statistical information, the first incremental statistical information, and the second incremental statistical information.
[0065] In this embodiment, the modified data information is categorized to obtain first incremental statistical information and second incremental statistical information. For some statistical information, updates can be made directly by comparing the first incremental statistical information, the second incremental statistical information, and the original statistical information. For other statistical information, updates can be made after calculating the first incremental statistical information, the second incremental statistical information, and the original statistical information. Calculating the first and second incremental statistical information first helps improve the efficiency of updating statistical information.
[0066] like Figure 3 As shown, in one embodiment, the method for collecting database statistics generally includes:
[0067] Step S301: Configure the triggering conditions for enabling statistical information updates. Specifically, the triggering conditions include reaching a preset interval time, the number of rows added reaching a preset value, and the amount of log increases reaching a preset condition.
[0068] Step S302: If the trigger condition is detected, proceed to step S303. Specifically, if the trigger condition is configured to reach a preset interval, for example, a preset interval of 24 hours, then after 24 hours since the last statistical information update, the trigger condition is determined to have been triggered, and a new round of statistical information collection will automatically begin. It should be noted that the preset interval can also be 12 hours, 30 hours, etc.
[0069] If the trigger condition is configured as the number of rows added reaching a preset value, then if the number of rows added to the data table reaches the preset value after the last statistics update, the trigger condition is considered triggered, and a new round of statistics collection will automatically begin. It should be noted that the trigger condition can also be the number of rows deleted or modified reaching a preset value.
[0070] If the trigger condition is configured as the log increase reaching a preset condition, then after the log size (storage usage) increases by the preset condition since the last statistics update, the trigger condition is determined to be triggered, and a new round of statistics collection will automatically begin. Alternatively, the trigger condition may be the increase in the number of log lines reaching a preset condition.
[0071] Step S303: Start collecting statistical information.
[0072] Step S304: Obtain log information after the last statistics update.
[0073] Step S305: Filter out the modified data information related to the statistical information from the log information.
[0074] Step S306: Update the statistical information based on the modified data information.
[0075] In this embodiment, by configuring trigger conditions, the collection of statistical information is initiated after the trigger conditions are met. This allows the database system to automatically start a new round of statistical information collection upon triggering, eliminating the need for manual intervention. This makes the collection of statistical information more convenient. The trigger conditions can be selected as needed, making the collection of statistical information more flexible.
[0076] like Figure 4 As shown, in one embodiment, the method for collecting database statistics generally includes:
[0077] Step S401: Set trigger conditions. Specifically, trigger conditions include reaching a preset interval time, the number of rows added reaching a preset value, and the amount of log increases reaching a preset condition.
[0078] Step S402: Trigger statistical information collection. Specifically, if the trigger condition is met, the collection of statistical information will begin. For example, if the trigger condition is configured to be a preset interval, then after the preset interval has elapsed since the last statistical information update, the trigger condition will be determined to have been triggered, and a new round of statistical information collection will automatically begin.
[0079] Step S403 involves collecting incremental information from logs and then cleaning, classifying, and statistically analyzing this information. Specifically, this means acquiring log information generated within the time period from the last time statistical information updates were initiated to the current time of initiation. The incremental information is then cleaned to filter out useful log data, specifically information related to added rows, deleted rows, updated rows, and table modifications.
[0080] Then, the cleaned log information is categorized: information about table modifications is classified as the first category, information about added rows as the second category, information about deleted rows as the third category, and information about updated rows is further divided into deleted rows and inserted rows, which are then classified into the third and second categories respectively. Finally, first and second incremental statistics are obtained.
[0081] Step S404: Update the statistical table using the information obtained from the statistical analysis. For statistical information that can be directly updated, update and write it directly to the statistical table through simple comparison. For statistical information that cannot be directly updated, perform a fusion calculation first, and then update and write it to the statistical table.
[0082] In this embodiment, log information is obtained, and then statistical information is updated using the log information. In other words, statistical information is collected or updated by accessing the log information. Therefore, on the one hand, there is no need to access the table, thus avoiding resource contention with user operations, reducing the burden of table access, and consequently avoiding a significant impact on database processing performance.
[0083] On the other hand, accessing log information facilitates the acquisition of incremental information, namely, the modified data of the table since the last statistical update, thereby updating the statistical information. Compared with existing technologies, this solution does not require acquiring the full data of the data table, improving the efficiency of statistical information collection. Moreover, because it acquires all modified data since the last statistical update, the accuracy of the statistical information can be guaranteed.
[0084] Furthermore, by configuring trigger conditions, statistical information collection can be initiated only after the conditions are met. This allows the database system to automatically start a new round of statistical information collection upon triggering, eliminating the need for manual intervention. This makes statistical information collection more convenient. Trigger conditions can be selected as needed, making statistical information collection more flexible.
[0085] This embodiment also provides a machine-readable storage medium and a computer device. Figure 5 This is a schematic diagram of a machine-readable storage medium 10 according to an embodiment of the present invention. Figure 6 This is a schematic diagram of a computer device 20 according to an embodiment of the present invention.
[0086] The machine-readable storage medium 10 stores a machine-executable program 11 thereon, which, when executed by a processor, implements the database statistics information collection method of any of the above embodiments.
[0087] Computer device 20 may include memory 210, processor 220 and machine-executable program 11 stored on memory 210 and running on processor 220, and processor 220 implements the database statistical information collection method of any of the above embodiments when executing machine-executable program 11.
[0088] It should be noted that the logic and / or steps represented in the flowchart or otherwise described herein, for example, can be considered as a sequenced list of executable instructions for implementing logical functions, and can be specifically implemented in any machine-readable storage medium for use by, or in conjunction with, an instruction execution system, apparatus or device (such as a computer-based system, a processor-based system or other system that can fetch and execute instructions from, an instruction execution system, apparatus or device).
[0089] For the purposes of this embodiment, the machine-readable storage medium 10 can be any means capable of containing, storing, communicating, propagating, or transmitting a program for use by or in conjunction with an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer disk drive (magnetic device), random access memory (RAM), read-only memory (ROM), erasable and editable read-only memory (EPROM or flash memory), fiber optic devices, and portable optical disc read-only memory (CDROM). Furthermore, the machine-readable storage medium 10 can even be paper or other suitable media on which the program can be printed, since the program can be obtained electronically, for example, by optically scanning the paper or other medium, followed by editing, interpreting, or otherwise processing as necessary, and then stored in a computer memory.
[0090] It should be understood that various parts of the present invention can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system.
[0091] Computer device 20 can be, for example, a server, desktop computer, laptop computer, tablet computer, or smartphone. In some examples, computer device 20 can be a cloud computing node. Computer device 20 can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system. Typically, program modules can include routines, programs, object programs, components, logic, data structures, etc., that perform specific tasks or implement specific abstract data types. Computer device 50 can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules can reside on local or remote computing system storage media, including storage devices.
[0092] Computer device 20 may include a processor 220 adapted to execute stored instructions and a memory 210 that provides temporary storage space for the operation of said instructions during operation. Processor 220 may be a single-core processor, a multi-core processor, a computing cluster, or any other configuration. Memory 210 may include random access memory (RAM), read-only memory, flash memory, or any other suitable storage system.
[0093] The processor 220 can be connected via a system interconnect (e.g., PCI, PCI-Express, etc.) to an I / O interface (input / output interface) suitable for connecting the computer device 20 to one or more I / O devices (input / output devices). I / O devices may include, for example, a keyboard and indicating devices, where indicating devices may include a touchpad or touchscreen, etc. I / O devices may be built into the computer device 20 or may be external devices connected to the computing device.
[0094] The processor 220 may also be linked via a system interconnect to a display interface suitable for connecting the computer device 20 to a display device. The display device may include a display screen that is a built-in component of the computer device 20. The display device may also include an external computer monitor, television, or projector connected to the computer device 20. Furthermore, a network interface controller (NIC) may be adapted to connect the computer device 20 to a network via a system interconnect. In some embodiments, the NIC may use any suitable interface or protocol (such as an Internet Minicomputer System Interface) to transmit data. The network may be a cellular network, a radio network, a wide area network (WAN), a local area network (LAN), or the Internet, etc. Remote devices may connect to the computing device via the network.
[0095] The flowchart provided in this embodiment is not intended to indicate that the operations of the method will be performed in any particular order, or that all operations of the method are included in every case. Furthermore, the method may include additional operations. Within the scope of the technical concept provided by the method in this embodiment, additional variations can be made to the above method.
[0096] Therefore, those skilled in the art should recognize that although numerous exemplary embodiments of the present invention have been shown and described in detail herein, many other variations or modifications conforming to the principles of the present invention can be directly determined or derived from the disclosure of the present invention without departing from the spirit and scope of the invention. Thus, the scope of the present invention should be understood and construed as covering all such other variations or modifications.
Claims
1. A method for collecting database statistical information, comprising: The collection of the statistical information is initiated. The statistical information includes table-level statistical information and column-level statistical information. The table-level statistical information includes the number of rows and the number of pages. The column-level statistical information includes the maximum value, the minimum value, and the high-frequency value. Retrieve log information since the last statistics update; Filter out the modified data information related to the statistical information from the log information; Update the statistical information based on the modified data information; The step of filtering out modified data information related to the statistical information from the log information includes: Filter out information from the log messages that includes added rows, deleted rows, updated rows, and table modifications; The step of filtering out modified data information related to the statistical information from the log information includes: The modified data is classified as follows: the information of the modified table is classified as the first type of information, the information of the added row is classified as the second type of information, the information of the deleted row is classified as the third type of information, and the information of the updated row is divided into deleted row and inserted row, which are respectively classified into the third type and the second type. The step of updating the statistical information based on the modified data information includes: The first incremental statistical information and the second incremental statistical information are calculated based on the modified data information; The statistical information is updated based on the first incremental statistical information, the second incremental statistical information, and the original statistical information; The first incremental statistical information is calculated using the second type of information, the second incremental information is calculated using the third type of information, and the original statistical information is the statistical information of the current state.
2. The method for collecting database statistical information according to claim 1, wherein, Prior to initiating the step of collecting the statistical information, the following steps are included: Configure the trigger conditions for enabling statistical information collection; If the triggering condition is detected, the step of starting the collection of statistical information is executed.
3. The method for collecting database statistical information according to claim 2, wherein, The triggering conditions include reaching a preset interval time, the number of rows added reaching a preset value, and the amount of log increase reaching a preset condition.
4. The method for collecting database statistical information according to claim 1, wherein, The step of updating the statistical information based on the first incremental statistical information, the second incremental statistical information, and the original statistical information includes: If the statistical information belongs to the first type, the statistical information is updated by comparing the first incremental statistical information, the second incremental statistical information, and the original statistical information.
5. The method for collecting database statistical information according to claim 1, wherein, The step of updating the statistical information based on the first incremental statistical information, the second incremental statistical information, and the original statistical information includes: If the statistical information belongs to the second type, the statistical information is updated by calculating the first incremental statistical information, the second incremental statistical information, and the original statistical information.
6. A machine-readable storage medium having a machine-executable program stored thereon, wherein the machine-executable program, when executed by a processor, implements the method for collecting database statistics according to any one of claims 1 to 5.
7. A computer device comprising a memory, a processor, and a machine-executable program stored in the memory and running on the processor, wherein the processor, when executing the machine-executable program, implements the method for collecting database statistical information according to any one of claims 1 to 5.