Abnormality processing method, device and equipment of database

By registering the process identifier of the management component in the main database, real-time reporting of exception types and data disk updates are achieved, solving the problem of low efficiency in database exception handling, improving the real-time performance and accuracy of exception detection, reducing the risk of business interruption, and ensuring the reliability of database recovery and business continuity.

CN122019239B · Active · Publication Date: 2026-06-23 · TIANJIN NANKAI UNIV GENERAL DATA TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TIANJIN NANKAI UNIV GENERAL DATA TECH
Filing Date
2026-04-07
Publication Date
2026-06-23


Abstract

The application provides an exception processing method, device and equipment of a database, which can be applied to the technical field of computers. The method comprises the following steps: in response to receiving an exception notification sent by a master database, determining an exception type triggered by the master database based on the exception notification; in the case that the exception type is an exception existing in an operation log used for recording a data change process in the master database, comparing an exception time of the operation log in the exception notification with a marking time to obtain a comparison result; in the case that the comparison result represents that the marking time is earlier than the exception time, sending an update instruction to the master database, the update instruction being used for instructing the master database to update corresponding second table data in a physical disk based on first table data to obtain updated second table data, and updating the marking time to a target marking time.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and more specifically to a method, apparatus, and device for handling database anomalies.

Background Technology

[0002] In database applications, database stability and high availability are crucial for ensuring business continuity. During operation, databases may encounter various anomalies such as downtime, data corruption, and log errors. Failure to detect and handle these anomalies promptly will lead to database service disruptions and even data loss.

[0003] However, monitoring modules in related technologies typically use polling to detect database anomalies, and can usually only detect whether the database service is available, making it difficult to automatically pinpoint the specific type of anomaly. Especially in scenarios involving corrupted operation logs, it is difficult to determine whether the corruption will affect database restarts or data synchronization between primary and standby databases. The processing is complex and time-consuming, which can easily lead to prolonged business interruptions.

Summary of the Invention

[0004] In view of the above problems, the present invention provides a method, apparatus and device for handling database anomalies.

[0005] According to one aspect of the present invention, a database exception handling method applied to a management component is provided, comprising: in response to receiving an exception notification sent by a master database, determining the exception type triggered by the master database based on the exception notification, wherein the master database stores process identifiers pre-registered by the management component and sends the exception notification to the management component based on the process identifiers when an exception is detected; if the exception type is an exception in the operation log used to record data change processes in the master database, comparing the exception time of the operation log carried in the exception notification with a marked time to obtain a comparison result, wherein the marked time is the completion time of updating the corresponding second table data in the physical disk using the first table data stored in memory during business processing by the master database; and if the comparison result indicates that the marked time is earlier than the exception time, sending an update instruction to the master database, wherein the update instruction instructs the master database to update the corresponding second table data in the physical disk based on the first table data to obtain updated second table data, and to update the marked time to a target marked time, so that in the event of a master database restart or data synchronization between the master database and a backup database, data loading is performed based on the updated second table data and / or log data in the operation log located after the target marked time.

[0006] Another aspect of the present invention provides an anomaly handling apparatus for a database, applied to a management component, comprising: a determination module, configured to, in response to receiving an anomaly notification sent by a master database, determine the anomaly type triggered by the master database based on the anomaly notification, wherein the master database stores process identifiers pre-registered by the management component, and sends the anomaly notification to the management component based on the process identifiers when an anomaly is detected; a comparison module, configured to, when the anomaly type is an anomaly in the operation log used to record data change processes in the master database, compare the anomaly time in the operation log in the anomaly notification with a marked time to obtain a comparison result, wherein the marked time is the completion time of updating the corresponding second table data in the physical disk using the first table data stored in memory during business processing by the master database; and a sending module, configured to, when the comparison result indicates that the marked time is earlier than the anomaly time, send an update instruction to the master database, wherein the update instruction instructs the master database to update the corresponding second table data in the physical disk based on the first table data, thereby obtaining updated second table data, and updating the marked time to a target marked time, so that in the event of a master database restart or data synchronization between the master database and a backup database, data loading is performed based on the updated second table data and / or log data in the operation log located after the target marked time.

[0007] Another aspect of the present invention provides an electronic device comprising: one or more processors; and a memory for storing one or more computer programs, wherein the one or more processors execute the one or more computer programs to implement the steps of the method described above.

[0008] Another aspect of the present invention provides a computer-readable storage medium having a computer program or instructions stored thereon, which, when executed by a processor, implement the steps of the above-described method.

[0009] Another aspect of the present invention provides a computer program product, including a computer program or instructions that, when executed by a processor, implement the steps of the above-described method.

[0010] According to the database anomaly handling method of the present invention, by pre-registering the process identifier of the management component with the master database, the master database can proactively send anomaly notifications to the management component based on the identifier when an anomaly is detected, thereby achieving real-time and accurate reporting of anomaly types. When the reported anomaly type is an operation log anomaly, the anomaly time carried in the anomaly notification is further compared with the marking time of the master database. If the marking time is earlier than the anomaly time, data is written to disk and the marking time is updated. This ensures that corrupted log data is not used during subsequent master database restarts or master-slave database data synchronization, but rather the latest data and / or log data after the corruption point is used, reducing recovery failures or business interruptions caused by operation log corruption. This at least partially solves the technical problems of low efficiency in database anomaly handling and the resulting long-term business interruptions in related technologies, achieving the technical effects of improving the real-time performance and accuracy of database anomaly detection, reducing database recovery failures caused by operation log corruption, and lowering the risk of business interruption.

Brief Description of the Drawings

[0011] The above-mentioned contents, as well as other objects, features and advantages of the present invention, will become clearer from the following description of embodiments of the present invention with reference to the accompanying drawings.

[0012] Figure 1 illustrates an application scenario of a database exception handling method and apparatus according to an embodiment of the present invention.

[0013] Figure 2 shows a flowchart of a database exception handling method according to an embodiment of the present invention.

[0014] Figure 3 illustrates the interaction between the master database, the backup database, and the management component according to an embodiment of the present invention.

[0015] Figure 4 shows a schematic diagram of a data synchronization process based on operation logs according to an embodiment of the present invention.

[0016] Figure 5 shows a structural block diagram of a database exception handling apparatus according to an embodiment of the present invention.

[0017] Figure 6 shows a block diagram of an electronic device suitable for implementing a database exception handling method according to an embodiment of the present invention.

Detailed Description

[0018] Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. However, it should be understood that these descriptions are exemplary only and are not intended to limit the scope of the invention. In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the embodiments of the invention for ease of explanation. However, it will be apparent that one or more embodiments may be practiced without these specific details. Furthermore, descriptions of well-known structures and techniques are omitted in the following description to avoid unnecessarily obscuring the concept of the invention.

[0019] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. The terms “comprising,” “including,” etc., as used herein indicate the presence of the stated features, steps, operations, and / or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.

[0020] All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein are to be interpreted in a manner consistent with the context of this specification, and not in an idealized or overly rigid way.

[0021] When using expressions such as "at least one of A, B and C", they should generally be interpreted in accordance with the meaning that is commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" should include, but is not limited to, a system having A alone, a system having B alone, a system having C alone, a system having A and B, a system having A and C, a system having B and C, and / or a system having A, B and C, etc.).

[0022] In database applications, customer data may be corrupted due to unexpected events such as power outages in the data center or hard drive failures. Therefore, it is crucial to be able to detect and repair such corruption promptly to prevent business disruptions. However, in the database architecture of related technologies, monitoring modules often struggle to detect internal database corruption in a timely manner, typically only passively discovering problems when the failure manifests as "failure to start" or "data synchronization interruption." Furthermore, during the rebuilding of a primary / standby database, the lack of effective protection measures can easily lead to business disruptions or even simultaneous corruption of both primary and standby database files.

[0023] Therefore, this invention provides a database anomaly handling method to achieve real-time detection and automatic repair of database corruption, thereby improving business continuity and data security.

[0024] Figure 1 illustrates an application scenario of a database exception handling method and apparatus according to an embodiment of the present invention.

[0025] As shown in Figure 1, application scenario 100 according to this embodiment may include management component 110 and main database 120.

[0026] The management component 110 interacts with the main database 120 via a communication link to monitor and handle anomalies in the main database 120. The communication link can be an internal bus, a local area network, or an inter-process communication mechanism, used to transmit instructions, status information, and data between the management component 110 and the main database 120.

[0027] The management component 110 can be deployed independently on a dedicated server or coexist with the main database 120 on the same physical device (e.g., a server). It is responsible for receiving anomaly notifications actively reported by the main database 120, analyzing the anomaly, and issuing corresponding instructions according to the anomaly type.

[0028] The primary database 120 is used to store data, such as business data, and monitors the consistency of business data (e.g., table data), operation logs, and database indexes in real time during operation, automatically detecting anomalies such as data corruption or operation log corruption. When an anomaly is detected, the primary database 120 proactively sends an anomaly notification based on the pre-registered process identifier of the management component 110; simultaneously, the primary database 120 also performs operations such as data write-to-disk, index rebuilding, log cleanup, or primary / standby switchover according to instructions issued by the management component 110, and feeds back the execution results to the management component 110.

[0029] It should be noted that the database exception handling method provided in this embodiment of the invention can generally be executed by the management component 110. Accordingly, the database exception handling device provided in this embodiment of the invention can generally be set in the management component 110.

[0030] It should be understood that the number of management components and master databases shown in Figure 1 is merely illustrative. Depending on implementation needs, there can be any number of management components and master databases.

[0031] The database exception handling method of the embodiments of the invention is described in detail below with reference to Figures 2-4, based on the scenario described in Figure 1.

[0032] Figure 2 shows a flowchart of a database exception handling method according to an embodiment of the present invention.

[0033] As shown in Figure 2, the method includes operations S210 to S230, which can be executed by the management component.

[0034] In operation S210, in response to receiving an exception notification from the master database, the type of exception triggered by the master database is determined based on the exception notification. The master database stores process identifiers pre-registered by the management component, and sends the exception notification to the management component based on the process identifier when an exception is detected.

[0035] In operation S220, if the exception type is an exception in the operation log used to record the data change process in the main database, the exception time of the operation log carried in the exception notification is compared with the marked time to obtain the comparison result. The marked time is the completion time of updating the corresponding second table data in the physical disk using the first table data stored in memory during business processing by the main database.

[0036] In operation S230, if the comparison result indicates that the marked time is earlier than the abnormal time, an update instruction is sent to the main database. The update instruction is used to instruct the main database to update the corresponding second table data in the physical disk based on the first table data, obtain the updated second table data, and update the marked time to the target marked time, so that in the event of a restart of the main database or data synchronization between the main database and the standby database, data loading is performed based on the updated second table data and / or log data in the operation log that is located after the target marked time.

[0037] During implementation, the management component can register its process identifier (PID) with the main database through the main database's active damage notification interface when the main database starts for the first time. After the main database detects data anomalies, it will actively notify the management component by sending a signal.

[0038] The process identifier can be the process identifier of the management component in the operating system, or it can be the communication address or socket information of the management component. It is used to establish a directed communication channel between the master database and the management component. Through this registration mechanism, the master database can determine which target to send a notification to when an exception occurs, so that the exception information can be transmitted to the management component in a timely and accurate manner for timely repair.
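
The registration mechanism described above can be illustrated with a minimal in-process sketch. Everything named here is hypothetical: the `MasterDatabase` class, the `register_manager` and `report_anomaly` methods, the PID value 4242, and the use of a Python callback in place of a real signal or socket channel are assumptions made for illustration, not details taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical delivery channel: a callback stands in for whatever a real
# deployment would use (an OS signal to a PID, a socket address, and so on).
Notify = Callable[[dict], None]


@dataclass
class MasterDatabase:
    # Maps each pre-registered identifier to its delivery channel.
    registered: Dict[int, Notify] = field(default_factory=dict)

    def register_manager(self, pid: int, notify: Notify) -> None:
        """Store the management component's identifier before any anomaly occurs."""
        self.registered[pid] = notify

    def report_anomaly(self, anomaly_type: str, anomaly_time: float) -> int:
        """Push a notification to every registered identifier; return how many were notified."""
        notification = {"type": anomaly_type, "time": anomaly_time}
        for notify in self.registered.values():
            notify(notification)
        return len(self.registered)


# The management component registers once, then simply waits to be notified.
received: List[dict] = []
db = MasterDatabase()
db.register_manager(pid=4242, notify=received.append)
db.report_anomaly("operation_log", anomaly_time=105.0)
```

The point of the design is that the database pushes to a known target, so the management component does not need to poll.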

[0039] The main database can integrate an anomaly detection component that monitors its operational status in real time, including the health of table data, operation logs, and database indexes. The anomaly detection component can identify and classify anomalies in various ways. For example, it can capture error codes generated within the main database during service provision or file access, and determine the specific anomaly based on the mapping between error codes and anomaly types. Captured anomalies may include: checksum errors discovered during data access; table or index content anomalies; missing transaction logs or unexpected transaction identifiers discovered when accessing operation logs (such as the write-ahead log (xlog) or commit log (clog)); and missing data files or read / write errors during underlying input / output (I / O) processing.
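
A mapping from error codes to anomaly types, as described above, might look like the following sketch. The error code strings and the `AnomalyType` members are invented for illustration; the specification does not list concrete codes, and assigning checksum errors to the table-data category is an assumption.

```python
from enum import Enum
from typing import Optional


class AnomalyType(Enum):
    TABLE_DATA = "table_data"
    INDEX = "index"
    OPERATION_LOG = "operation_log"
    IO = "io"


# Hypothetical error codes; real codes depend on the database engine.
ERROR_CODE_TO_ANOMALY = {
    "E_CHECKSUM_MISMATCH": AnomalyType.TABLE_DATA,   # assumed category
    "E_INDEX_CONTENT": AnomalyType.INDEX,
    "E_LOG_MISSING": AnomalyType.OPERATION_LOG,
    "E_UNEXPECTED_TXID": AnomalyType.OPERATION_LOG,
    "E_FILE_MISSING": AnomalyType.IO,
    "E_IO_FAILURE": AnomalyType.IO,
}


def classify(error_code: str) -> Optional[AnomalyType]:
    """Return the anomaly type for a captured error code, or None if it is unmapped."""
    return ERROR_CODE_TO_ANOMALY.get(error_code)
```

Returning `None` for unmapped codes leaves it to the caller to decide whether an unknown error should be reported at all.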

[0040] Alternatively, a pre-defined periodic task can be used to call a dedicated verification interface to perform anomaly checks on operation logs, table data, and database indexes to determine if the data is corrupted. Through this mechanism, the anomaly detection component can distinguish between different types of anomalies, such as corrupted operation logs and corrupted table data.

[0041] In some embodiments, when the master database detects an anomaly, it can also record the anomaly in the data anomaly record table, and mark the database status as "abnormal" by itself or by the management component.

[0042] Once the master database detects an anomaly and determines the anomaly type according to preset rules, it can proactively send an anomaly notification to the management component based on the previously stored process identifiers. This notification may include the anomaly type, the anomaly time, and contextual information related to the anomaly, such as identifiers of the affected data.

[0043] After receiving an exception notification from the master database, the management component can determine the type of exception triggered by the master database by parsing the notification content. Based on the exception type, the management component can then execute differentiated handling strategies for different types of exceptions.

[0044] When the exception type reported by the primary database is an operation log exception, the management component can extract the exception time of the operation log from the exception notification and obtain the latest marking time from the primary database. The exception time is then compared with the marking time to obtain the comparison result. The marking time refers to the moment at which the first table data held in memory during business processing (i.e., service provision) finishes being written to the corresponding second table data on the physical disk. This time reflects the most recent point at which data was persisted.

[0045] If the comparison result indicates that the marked time is earlier than the anomaly time, meaning the data persistence completion time occurred before the anomaly occurred in the operation log, the management component sends an update command to the primary database. Based on this update command, the primary database writes the latest data from memory to the physical disk, resulting in the updated second table data. After completing the disk data update, the primary database further updates the marked time to the target marked time, which is the completion time of this data persistence operation.
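
The comparison and the resulting update can be sketched as follows. This is a simplified model under stated assumptions: the `MasterDatabase` class, its `flush` method, the logical `clock` field, and the returned strings are all hypothetical. The source only covers the cases where the marked time is earlier or later than the anomaly time; this sketch arbitrarily treats equal times like the "later" case.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MasterDatabase:
    memory_table: Dict[str, str] = field(default_factory=dict)  # first table data
    disk_table: Dict[str, str] = field(default_factory=dict)    # second table data
    marked_time: float = 0.0   # completion time of the last persist
    clock: float = 0.0         # hypothetical logical clock

    def flush(self) -> float:
        """Persist the in-memory table to disk and advance the marked time."""
        self.disk_table = dict(self.memory_table)
        self.marked_time = self.clock  # target marked time
        return self.marked_time


def handle_log_anomaly(db: MasterDatabase, anomaly_time: float) -> str:
    """Decide which instruction the management component would send."""
    if db.marked_time < anomaly_time:
        # Data was last persisted before the log went bad, so changes made
        # since then exist only in memory and in the damaged log.
        db.flush()
        return "update"
    # Persisted data already covers the damaged part of the log.
    return "state_switch"


demo = MasterDatabase(
    memory_table={"k": "v2"}, disk_table={"k": "v1"},
    marked_time=100.0, clock=110.0,
)
first = handle_log_anomaly(demo, anomaly_time=105.0)
second = handle_log_anomaly(demo, anomaly_time=105.0)
```

In the demo, the first call flushes and moves the marked time to 110.0; the second call for the same anomaly time then takes the state-switch path because the marked time is now later.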

[0046] Through the above processing, in the event of a subsequent restart of the primary database or data synchronization between the primary and standby databases, the system can perform data loading or recovery based on the updated second table data already written to disk. Simultaneously, for log data in the operation log after the target marker time, since the data changes corresponding to this part of the log have been directly written to disk through the first table data in memory, the system can selectively use this part of the log, or, in the event of log corruption, rely entirely on the updated second table data already written to disk to complete data loading or recovery. Therefore, even if the operation log is corrupted, the primary database can still complete data loading or recovery based on a secure data copy, thus ensuring the reliability of the recovery or loading process and the continuity of business operations.

[0047] Data synchronization between the primary and standby databases can include: normal data synchronization, primary-standby failover, and dual-machine rebuild. Normal data synchronization refers to the primary database continuously transmitting operation logs to the standby database while providing services. The standby database replays these logs to maintain data consistency with the primary database. Primary-standby failover refers to switching business traffic from the primary database to the standby database when the primary database fails. The standby database is then upgraded to become the new primary database to continue providing services, and data consistency between the primary and standby databases must be maintained throughout this process. Dual-machine rebuild refers to performing a full data synchronization based on the primary database when there are significant data differences between the primary and standby databases, the standby database is severely corrupted, or a new standby node is added. After the full synchronization is complete, incremental log synchronization is initiated to re-establish a reliable primary-standby replication relationship.

[0048] According to the database anomaly handling method of the present invention, by pre-registering the process identifier of the management component with the master database, the master database can proactively send anomaly notifications to the management component based on the identifier when an anomaly is detected, thereby achieving real-time and accurate reporting of anomaly types. When the reported anomaly type is an operation log anomaly, the anomaly time carried in the anomaly notification is further compared with the marking time of the master database. If the marking time is earlier than the anomaly time, data is written to disk and the marking time is updated. This ensures that corrupted log data is not used during subsequent master database restarts or master-slave database data synchronization, but rather the latest data and / or log data after the corruption point is used, reducing recovery failures or business interruptions caused by operation log corruption. This at least partially solves the technical problems of low efficiency in database anomaly handling and the resulting long-term business interruptions in related technologies, achieving the technical effects of improving the real-time performance and accuracy of database anomaly detection, reducing database recovery failures caused by operation log corruption, and lowering the risk of business interruption.

[0049] According to an embodiment of the present invention, the abnormal time is the start time of the abnormal period in which the operation log shows an abnormality; the operation log has a corresponding status flag; the method may further include the following operations.

[0050] If the comparison result indicates that the marked time is later than the abnormal time, a state switching instruction is sent to the master database. The state switching instruction is used to instruct the master database to mark the log data in the operation log corresponding to the abnormal period as invalid and switch the log status of the operation log from the abnormal state to the normal state.

[0051] An abnormal moment in the operation log can refer to the starting point of an abnormal period in which the operation log becomes corrupted or experiences write errors, such as the moment when the operation log begins to show signs of corruption or write errors. Furthermore, the operation log has corresponding status flags to indicate whether the log is currently in a normal or abnormal state, allowing the system to identify log availability during recovery.

[0052] If the comparison result indicates that the marked time is later than the abnormal time, that is, data persistence completed after the abnormal period in the operation log, the abnormal portion of the log will not affect processes such as dual-machine rebuild, normal data synchronization, and master-slave switchover between the master and slave databases, or a rebuild of the master database.

[0053] At this point, the abnormal flags in the primary database and operation logs can be cleared, or the abnormal state can be switched to the normal state. In this case, no additional data write-to-disk operation is required, which can ensure that the subsequent primary database restart and primary-standby database data synchronization process can be executed normally.

[0054] Furthermore, after the corresponding second table data on the physical disk has been updated based on the first table data and the marked time has been updated to the target marked time, a state switching instruction can also be sent to the main database to mark the abnormal log data as invalid and switch the log status to normal. The database status can likewise be updated to normal.

[0055] Specifically, marking the log data corresponding to the abnormal period in the operation log as invalid indicates that this part of the log data cannot be used for subsequent data recovery operations. At the same time, the main database switches the log status of the operation log from abnormal to normal, so that newly added logs after the abnormal period can be written and identified normally.
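
The effect of the state switching instruction can be sketched as below. The `LogRecord` and `OperationLog` classes, the `valid` flag, and the explicit end of the abnormal period are assumptions for illustration; the source defines only the start of the abnormal period and a log status flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogRecord:
    timestamp: float
    payload: str
    valid: bool = True  # hypothetical per-record validity flag


@dataclass
class OperationLog:
    records: List[LogRecord] = field(default_factory=list)
    status: str = "abnormal"

    def apply_state_switch(self, period_start: float, period_end: float) -> int:
        """Invalidate records inside the abnormal period and mark the log normal.

        Records are kept, not deleted, so they remain available for later analysis.
        """
        invalidated = 0
        for record in self.records:
            if period_start <= record.timestamp <= period_end and record.valid:
                record.valid = False
                invalidated += 1
        self.status = "normal"
        return invalidated

    def replayable(self) -> List[LogRecord]:
        """Records that recovery or synchronization is allowed to use."""
        return [r for r in self.records if r.valid]


log = OperationLog(records=[
    LogRecord(100.0, "a"), LogRecord(105.0, "b"), LogRecord(120.0, "c"),
])
invalidated = log.apply_state_switch(period_start=103.0, period_end=110.0)
usable = [r.payload for r in log.replayable()]
```

Keeping invalidated records in place, rather than truncating the log, matches the stated goal of preserving abnormal-period data for traceability.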

[0056] According to embodiments of the present invention, the above processing can retain log data from abnormal periods for traceability analysis while avoiding the use of invalid log data during subsequent restarts or data synchronization between primary and backup databases, thereby improving the fault tolerance and recovery reliability of the database.

[0057] Furthermore, the above process relies on a proactive notification mechanism between the primary database and the management components. It can detect anomaly types in real time without external polling and automatically execute differentiated processing strategies when operation logs are corrupted, by comparing the anomaly time with the marking time. If the marking time is earlier than the anomaly time, the impact of corrupted logs is avoided by proactively writing data to disk. If the marking time is later than the anomaly time, a state switch prevents the corrupted logs from participating in subsequent recovery processes. The entire process is automatically triggered by the management components based on anomaly notifications. Upon receiving the instruction, the primary database completes data writing to disk or status updates online without interrupting service, thus ensuring uninterrupted business operations and improving the continuous availability of the database system.

[0058] According to embodiments of the present invention, the database exception handling method may further include the following operations.

[0059] In response to receiving a notification from the primary database that the update of the second table data is complete, a data sending instruction is sent to the primary database. The data sending instruction is used to instruct the primary database to send the updated second table data to the backup database.

[0060] Once the primary database completes the update operation on the corresponding second table on the physical disk based on the data in the first table in memory, according to the update instructions sent by the management component, the primary database can send an update completion notification for the second table data to the management component. This notification informs the management component that the data write-to-disk operation has been successfully executed, and the data in the second table on the disk has been updated to the latest data in memory.

[0061] Upon receiving the update completion notification, the management component can send a data transmission command to the primary database. This command instructs the primary database to send the updated data in the second table to the standby database. Based on this command, the primary database synchronizes the latest data, now written to disk, to the standby database, ensuring data consistency between the two databases.
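
The hand-off from update completion to standby synchronization can be sketched as follows. The class names, the `event` field, its value `second_table_update_complete`, and the `send_table_data` method are hypothetical; the source does not specify a message format or transport.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Database:
    disk_table: Dict[str, str] = field(default_factory=dict)  # second table data

    def send_table_data(self, target: "Database") -> None:
        """Ship the persisted table data to another database (here: a plain copy)."""
        target.disk_table = dict(self.disk_table)


@dataclass
class ManagementComponent:
    primary: Database
    standby: Database

    def on_notification(self, notification: dict) -> bool:
        """Issue a data sending instruction only after an update-complete notice."""
        if notification.get("event") != "second_table_update_complete":
            return False
        # The management component only instructs; the primary does the sending.
        self.primary.send_table_data(self.standby)
        return True


primary = Database(disk_table={"k": "v2"})
standby = Database(disk_table={"k": "v1"})
manager = ManagementComponent(primary=primary, standby=standby)
ignored = manager.on_notification({"event": "something_else"})
handled = manager.on_notification({"event": "second_table_update_complete"})
```

Gating the send on the completion notification ensures the standby never receives table data that is still being written on the primary.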

[0062] According to an embodiment of the present invention, through the above process, when the primary database subsequently fails or when data synchronization between the primary and backup databases is required, the backup database already has a copy of the table data corresponding to the abnormal part in the operation log. Thus, data synchronization can be achieved without the abnormal part of the log during subsequent data synchronization, further ensuring the reliability and data integrity of business switching.

[0063] According to embodiments of the present invention, the database exception handling method may further include the following operations.

[0064] If the exception type is a table data exception, a table data repair command is sent to the main database.

[0065] The table data repair command instructs the main database to perform the following operations: scan the target table data containing anomalies in the first and second table data to identify expired rows in the target table data. Expired rows are historical rows that have been deleted or updated. Identify the data status of each expired row and obtain the identification result. If the identification result indicates that there are abnormal expired rows in the target table data with a visible data status, update the data status of the abnormal expired rows from visible to invisible. Mark the storage space occupied by the expired rows as free. The storage space is either memory or physical disk space.

[0066] The data in the first table and the data in the second table can be from a table file or can be a portion of the data from a table file.

[0067] After receiving a table data repair command, the primary database can scan the target table data for anomalies. During the scan, the primary database can traverse the data pages of the target table, identifying expired rows one by one. Expired rows refer to historical rows that have been marked as invalid through delete or update operations. These rows are no longer referenced by any queries but still occupy storage space.

[0068] For each expired row, the primary database identifies its data status. Data status includes visible and invisible states. Visible status indicates that the row is visible to the current transaction, while invisible status indicates that the row has been marked as invalid. During normal expired row cleanup, expired rows should be in an invisible state. However, in scenarios where table data is corrupted, an anomaly may occur where some expired rows are incorrectly marked as visible. This can cause the system to mistakenly treat these rows as valid data when accessing them, leading to data consistency errors.

[0069] If the identification results indicate that there are abnormally expired rows in the target table that are currently visible, the main database updates the data status of these abnormally expired rows from visible to invisible. After completing the data status update, the main database can mark the storage space occupied by the expired rows as free, so that it can be reused for subsequent insertion or update operations.
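
As a rough illustration of the repair steps above, the following sketch models each row with an expired flag and a visibility status. The `Row` fields and the function name are assumptions made for the example; they do not describe the on-disk format of any particular database.

```python
from dataclasses import dataclass


@dataclass
class Row:
    row_id: int
    expired: bool            # True for historical rows that were deleted or updated
    visible: bool            # data status: visible or invisible
    space_free: bool = False


def repair_table_data(rows):
    """Hide expired rows wrongly marked visible and free the space of expired rows.

    Returns the ids of the abnormal expired rows that were corrected.
    """
    corrected = []
    for row in rows:
        if not row.expired:
            continue              # live rows are left untouched
        if row.visible:           # abnormal: expired but still visible
            row.visible = False
            corrected.append(row.row_id)
        row.space_free = True     # space can be reused by later inserts or updates
    return corrected


rows = [
    Row(1, expired=False, visible=True),
    Row(2, expired=True, visible=False),
    Row(3, expired=True, visible=True),   # abnormal expired row
]
fixed = repair_table_data(rows)
```

Only row 3 has its status changed, while the space of both expired rows (2 and 3) is marked as free.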

[0070] In addition, table data repair commands can also be used to instruct the main database to perform the following operations: update the statistics of the target table data, including the number of data pages, the number of active rows, and the number of expired rows. They also reclaim and reorganize index entries that point to expired rows, ensuring consistency between the indexes and the target table data.
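
A small sketch of these two follow-up steps is given below. It assumes a fixed number of rows per data page and represents the index as a mapping from key to physical location; both are simplifications introduced only for illustration.

```python
import math


def refresh_statistics(rows, rows_per_page):
    """Recompute page and row counts for the target table data."""
    expired = sum(1 for r in rows if r["expired"])
    return {
        "data_pages": math.ceil(len(rows) / rows_per_page),
        "active_rows": len(rows) - expired,
        "expired_rows": expired,
    }


def prune_index(index, rows):
    """Drop index entries that point to expired rows."""
    expired_ctids = {r["ctid"] for r in rows if r["expired"]}
    return {key: ctid for key, ctid in index.items() if ctid not in expired_ctids}


rows = [
    {"ctid": (0, 1), "expired": True},
    {"ctid": (0, 2), "expired": True},
    {"ctid": (1, 1), "expired": False},
]
index = {"old_key_a": (0, 1), "old_key_b": (0, 2), "current_key": (1, 1)}

stats = refresh_statistics(rows, rows_per_page=2)
index = prune_index(index, rows)
```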

[0071] Throughout the entire repair process, the primary database can continue to provide services without interruption. All repair operations can be completed online without affecting normal read and write operations, thus ensuring uninterrupted business operations. Once the repair is complete, the primary database can send a repair completion notification to the management component, updating the database status to a normal state.

[0072] According to an embodiment of the present invention, table data may become abnormal due to incorrect transaction status markings in some rows. The above-described cleanup logic allows for online correction of data status and space reclamation without interrupting business operations.

[0073] According to embodiments of the present invention, the database exception handling method may further include the following operations.

[0074] In response to receiving a repair failure notification from the primary database, the primary and backup databases are controlled to perform a primary / backup switchover so that the backup database acts as the new primary database for business processing. The repair failure notification indicates that the repair command failed to repair the target table data.

[0075] If the target table data is still abnormal after the repair, and the standby database is detected to be in a normal state, the management component can trigger the primary / standby failover process.

[0076] In addition, before a primary / standby switchover, the management component can instruct the primary database to check for corruption in the operation logs after the latest marked time. If corruption is found, the process of updating the corresponding second table data on the physical disk based on the first table data in memory must be executed first, the data must be written to disk and the marked time updated to ensure the integrity of the data after the marked time, and then the primary / standby switchover can be performed.
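
The ordering constraint described above can be sketched as a small planning function. The step names are hypothetical labels, and returning an empty plan when the standby is unhealthy is an assumption, since the text only describes the case where the standby database is in a normal state.

```python
def plan_after_repair_failure(standby_healthy, log_corrupt_after_marker):
    """Return the ordered steps the management component would request."""
    if not standby_healthy:
        return []   # assumption: no switchover is attempted without a healthy standby
    steps = []
    if log_corrupt_after_marker:
        # Write memory data to disk and advance the marked time first,
        # so data after the old marked time does not depend on the corrupted log.
        steps += ["update_second_table_on_disk", "update_marked_time"]
    steps.append("primary_standby_switchover")
    return steps


plan_with_corruption = plan_after_repair_failure(True, True)
plan_clean_log = plan_after_repair_failure(True, False)
plan_no_standby = plan_after_repair_failure(False, True)
```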

[0077] According to an embodiment of the present invention, through the above mechanism, the system can prioritize attempting automated repair to maintain business continuity in the event of table data corruption, and perform primary / backup switchover if the repair fails and the backup database is available, thereby better achieving the stability and reliability of database services.

[0078] According to an embodiment of the present invention, controlling the primary database and the backup database to perform primary / backup switching may include the following operations.

[0079] In response to the backup database storing updated data in the second table, a log transmission instruction is sent to the primary database, instructing the primary database to send the operation log to the backup database. A log application instruction is also sent to the backup database, instructing it to receive and apply the operation log based on the updated data in the second table, ensuring consistency between the backup database's data and the primary database's data. Upon receiving an application completion instruction from the backup database, a promotion instruction is sent to the backup database to upgrade it to the new primary database, and a demotion instruction is sent to the primary database to suspend business processing or convert it to a new backup database.

[0080] Figure 3 illustrates the interaction between the primary database, the backup database, and the management component according to an embodiment of the present invention.

[0081] As shown in Figure 3, the management component can establish communication connections with both the primary and backup databases. During implementation, the management component can register process identifiers with both the primary and backup databases. Once the primary or backup database detects a data anomaly, it will proactively notify the management component by sending an anomaly notification.
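
The registration and notification pattern can be sketched as follows. This is an in-process stand-in: the class and method names are hypothetical, and a real system would deliver the notification to the registered process identifier through some inter-process mechanism that the text does not specify.

```python
class ManagementComponent:
    def __init__(self, pid):
        self.pid = pid
        self.inbox = []

    def receive(self, notification):
        self.inbox.append(notification)


class Database:
    def __init__(self, name):
        self.name = name
        self.registered = {}   # process identifier -> management component

    def register(self, component):
        # The management component registers its process identifier in advance.
        self.registered[component.pid] = component

    def detect_anomaly(self, exception_type):
        # On detection, the database pushes a notification to every registered
        # identifier instead of waiting to be polled.
        for component in self.registered.values():
            component.receive({"source": self.name, "type": exception_type})


manager = ManagementComponent(pid=4242)
primary = Database("primary")
backup = Database("backup")
primary.register(manager)
backup.register(manager)

primary.detect_anomaly("operation_log_exception")
backup.detect_anomaly("table_data_exception")
```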

[0082] Once the standby database has completed storing the updated data in the second table, the management component can send a log transmission command to the primary database. Upon receiving this command, the primary database sends the operation logs to the standby database, enabling the standby database to access the incremental log data generated by the primary database after the data has been written to disk.

[0083] During primary / standby failover or data synchronization, the management component can send log application commands to the standby database. Upon receiving the log application command, the standby database applies the received operation logs based on the already stored updated data in the second table. It then replays each data change operation recorded in the logs into the standby database's data files, ensuring that the data in the standby database is consistent with the data in the primary database.

[0084] Both the primary and backup databases can store table data, log data, and index data, i.e., the database indexes.

[0085] Figure 4 shows a schematic diagram of a data synchronization process based on operation logs according to an embodiment of the present invention.

[0086] As shown in Figure 4, in the primary database, data is stored on disk in two forms: one is the data file itself, i.e., the table data, and the other is the operation log. The table data synchronization process in the backup database depends on the operation log of the primary database.

[0087] For example, the operation log may include the following record items: the transaction identifier for creating the record (xmin), the transaction identifier for deleting or updating the record (xmax), the command identifier (cid), the physical location identifier of the data (ctid), and the data change type of the record (data).

[0088] Item 401 illustrates the process of inserting record 1 for the first time. In record 1, the transaction identifier for creating the record is 10, the transaction identifier for deleting or updating the record is 0 (indicating that the record has not been modified), the command identifier is 0 (indicating that the insert operation is the first command in the current transaction), the physical location identifier of the data is (0, 1), that is, the first entry of data block 0, and the data change type is insert.

[0089] Item 402 illustrates the process of updating record 1 to generate record 2. In record 2, the transaction identifier for creating the record is 12, the transaction identifier for deleting or updating the record is 0, the command identifier is 0, the physical location identifier of the data is (0, 2), which is the second entry in data block 0, and the data change type is update. Furthermore, in record 1, the transaction identifier for deleting or updating the record is updated to 12, and the physical location identifier of the data is updated to (0, 2), which is the second entry in data block 0.

[0090] Item 403 illustrates the process of updating record 2 to generate record 3. In record 3, the transaction identifier for creating the record is 15, the transaction identifier for deleting or updating the record is 0, the command identifier is 0, the physical location identifier of the data is (1, 1), which is the first entry in data block 1, and the data change type is update. Furthermore, in record 2, the transaction identifier for deleting or updating the record is updated to 15, and the physical location identifier of the data is updated to (1, 1), which is the first entry in data block 1.
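
The three steps 401 to 403 can be traced with a short sketch. The `Record` fields mirror the record items listed above (xmin, xmax, cid, ctid, data); the helper functions and the dictionary used as a stand-in for data blocks are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    xmin: int      # transaction that created the record
    xmax: int      # transaction that deleted or updated it (0 = not modified)
    cid: int       # command identifier within the transaction
    ctid: tuple    # (data block, entry) physical location identifier
    data: str      # data change type


def apply_insert(heap, slot, xid):
    heap[slot] = Record(xmin=xid, xmax=0, cid=0, ctid=slot, data="insert")


def apply_update(heap, old_slot, new_slot, xid):
    # The new version is written to a new slot ...
    heap[new_slot] = Record(xmin=xid, xmax=0, cid=0, ctid=new_slot, data="update")
    # ... and the old version is stamped with the updating transaction
    # and pointed at the new location.
    heap[old_slot].xmax = xid
    heap[old_slot].ctid = new_slot


def latest_version(heap, slot):
    """Follow ctid pointers from a slot to the newest version of the record."""
    while heap[slot].ctid != slot:
        slot = heap[slot].ctid
    return slot


heap = {}
apply_insert(heap, (0, 1), xid=10)           # 401: record 1
apply_update(heap, (0, 1), (0, 2), xid=12)   # 402: record 1 -> record 2
apply_update(heap, (0, 2), (1, 1), xid=15)   # 403: record 2 -> record 3
newest = latest_version(heap, (0, 1))
```

After the three steps, record 1 carries xmax 12 and points to (0, 2), record 2 carries xmax 15 and points to (1, 1), and record 3 is the only version with xmax 0.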

[0091] If the marker time corresponding to the updated second table data already stored in the standby database is earlier than the times of the two updates 402 and 403, the data in the standby database can be aligned with the primary database by replaying updates 402 and 403 on top of the updated second table data.
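
A minimal sketch of this alignment is shown below, under the assumption that each log entry carries a timestamp and a key/value change. The integer times and the `v1` to `v3` values are illustrative only, and skipping entries at exactly the marker time is a choice made for the example.

```python
def replay_after_marker(snapshot, log, marker_time):
    """Apply log entries newer than the marker time on top of the stored table copy."""
    table = dict(snapshot)
    for entry in sorted(log, key=lambda e: e["time"]):
        if entry["time"] <= marker_time:
            continue   # already reflected in the table data written to disk
        table[entry["key"]] = entry["value"]
    return table


snapshot = {"record": "v1"}   # second table data held by the standby
log = [
    {"time": 1, "key": "record", "value": "v1"},   # 401, before the marker
    {"time": 2, "key": "record", "value": "v2"},   # 402
    {"time": 3, "key": "record", "value": "v3"},   # 403
]
standby_table = replay_after_marker(snapshot, log, marker_time=1)
```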

[0092] Furthermore, when data files in the primary database, such as table data, are corrupted, they can also be repaired based on the operation log. For example, if the second entry (item) in data block 0 of a table is found to be corrupted, the xmin and xmax values of that data block in the operation log can be checked, and a redo operation can then be performed based on the log records associated with the corrupted data, thereby repairing the corrupted data in the data block. In this example, the two related log records, 402 and 403, are used for verification and repair.
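
The lookup of related log records can be sketched as follows. The log layout (`ref`, `xid`, `old_ctid`, `new_ctid`) is a hypothetical simplification of the record items described earlier, and the redo logic is only a toy reconstruction of the xmin, xmax, and ctid fields of the damaged entry.

```python
LOG = [
    {"ref": 401, "xid": 10, "old_ctid": None, "new_ctid": (0, 1)},
    {"ref": 402, "xid": 12, "old_ctid": (0, 1), "new_ctid": (0, 2)},
    {"ref": 403, "xid": 15, "old_ctid": (0, 2), "new_ctid": (1, 1)},
]


def related_records(log, ctid):
    """Log records that either created the entry or later superseded it."""
    return [r for r in log if ctid in (r["old_ctid"], r["new_ctid"])]


def redo_entry(log, ctid):
    """Rebuild the header fields of a corrupted entry from its related log records."""
    entry = {"xmin": 0, "xmax": 0, "ctid": ctid}
    for record in related_records(log, ctid):
        if record["new_ctid"] == ctid:      # the record that created this entry
            entry["xmin"] = record["xid"]
        if record["old_ctid"] == ctid:      # the record that superseded this entry
            entry["xmax"] = record["xid"]
            entry["ctid"] = record["new_ctid"]
    return entry


refs = [r["ref"] for r in related_records(LOG, (0, 2))]
repaired = redo_entry(LOG, (0, 2))
```

For the entry at (0, 2), the lookup selects records 402 and 403, matching the example in the text.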

[0093] Once the standby database has completed log application, it can send an application completion command to the management component. Upon receiving this command, the management component sends a promotion command to the standby database. The standby database is then promoted to the new primary database based on this command and begins handling read and write requests. Simultaneously, the management component can send a demotion command to the original primary database. Upon receiving this command, the original primary database either suspends business processing or becomes a new standby database, ceasing to provide read and write services and instead operating as a synchronous standby database for the new primary database.
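
The switchover sequence can be summarized in a short sketch. The `Node` class, the step labels, and the use of a table comparison as a stand-in for the application completion message are all assumptions for illustration; the original primary is converted to a standby here, which is one of the two options the text allows.

```python
class Node:
    def __init__(self, name, role, table, log=()):
        self.name = name
        self.role = role
        self.table = dict(table)
        self.log = list(log)   # (key, value) changes recorded after the marker time


def switchover(primary, standby):
    """Run log transmission, log application, promotion, and demotion in order."""
    steps = []
    shipped = list(primary.log)            # log transmission instruction
    steps.append("log_transmission")
    for key, value in shipped:             # log application instruction
        standby.table[key] = value
    steps.append("log_application")
    if standby.table != primary.table:
        return steps                       # no completion reported, nobody is promoted
    standby.role = "primary"               # promotion instruction
    steps.append("promotion")
    primary.role = "standby"               # demotion instruction
    steps.append("demotion")
    return steps


old_primary = Node("db-a", "primary", {"r": "v3"}, log=[("r", "v2"), ("r", "v3")])
old_standby = Node("db-b", "standby", {"r": "v1"})
steps = switchover(old_primary, old_standby)
```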

[0094] According to an embodiment of the present invention, the above process enables a smooth primary / backup switchover in the event of table data repair failure, thereby improving the high availability and data consistency of the database service.

[0095] Furthermore, if an anomaly notification is received from the standby database, a dual-machine rebuild between the primary and standby databases can be performed. This involves rebuilding the database using data from the primary database or synchronizing data from the standby database. Dual-machine database rebuilds should also incorporate protective measures to prevent rebuilding the standby database with corrupted database files. Specifically, during a dual-machine rebuild, the primary database should be checked for anomalies first; if no anomalies are found, rebuilding is permitted.
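
The protective check can be sketched as a guard in front of the rebuild. The file names, the `"CORRUPT"` sentinel, and the health-check callback are hypothetical; the text does not specify how the primary database is checked.

```python
def dual_machine_rebuild(primary_files, primary_is_healthy):
    """Rebuild the standby from the primary only if the primary passes its check."""
    if not primary_is_healthy(primary_files):
        return None                 # refuse to copy possibly corrupted files
    return dict(primary_files)      # new standby contents


def no_corrupt_files(files):
    # Stand-in health check: a real check would verify logs, tables, and indexes.
    return all(content != "CORRUPT" for content in files.values())


good = {"table.dat": "rows", "index.dat": "entries"}
bad = {"table.dat": "CORRUPT"}
rebuilt = dual_machine_rebuild(good, no_corrupt_files)
refused = dual_machine_rebuild(bad, no_corrupt_files)
```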

[0096] According to embodiments of the present invention, the database exception handling method may further include the following operations.

[0097] In the event of a database index exception, an index rebuild instruction is sent to the primary database. The index rebuild instruction is used to instruct the primary database to rebuild the database index based on the table data in the primary database.

[0098] There are no restrictions on the index rebuild command; it can be any command, such as `reindex`. `reindex` is a command used to rebuild indexes. When an index experiences performance degradation or query errors due to data corruption, bloat, or outdated statistics, the `reindex` operation can delete the old index data and rebuild the index to restore its correctness and query efficiency.

[0099] According to an embodiment of the present invention, since the reconstruction process supports online concurrent mode, the main database can continue to provide services without interruption during index reconstruction, and read and write operations can continue normally, thus achieving index recovery without interruption of business operations.
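
As one possible illustration, the sketch below builds an index rebuild statement in the style of PostgreSQL's `REINDEX INDEX CONCURRENTLY`. The text does not name a database engine, so the PostgreSQL-style syntax, the function name, and the index name `orders_pkey` are assumptions.

```python
def build_reindex_statement(index_name, concurrently=True):
    """Return a PostgreSQL-style statement that rebuilds one index."""
    # Accept only simple identifiers so the name can be quoted safely.
    if not index_name.replace("_", "").isalnum():
        raise ValueError(f"unsupported index name: {index_name!r}")
    mode = " CONCURRENTLY" if concurrently else ""
    return f'REINDEX INDEX{mode} "{index_name}";'


online_statement = build_reindex_statement("orders_pkey")
offline_statement = build_reindex_statement("orders_pkey", concurrently=False)
```

In an engine that supports a concurrent mode, the first form lets reads and writes continue during the rebuild, which corresponds to the online behavior described above.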

[0100] According to embodiments of the present invention, the database exception handling method may further include the following operations.

[0101] In response to an exception type indicating a primary database restart, a marker file query instruction is sent to the primary database. This instruction instructs the primary database to perform a marker file query, where the marker file is generated by the primary database during a normal shutdown. In response to receiving a query result from the primary database indicating that the marker file does not exist, it is determined that the primary database restarted after a power outage, and the primary database is instructed to perform anomaly detection on the operation log, the data in the first table, the data in the second table, and the database indexes.

[0102] During the normal shutdown process of the main database, a marker file can be created in the specified directory. This marker file is used to identify whether the database is shut down normally or due to an abnormal power failure.

[0103] If the management component receives a query result indicating that the marker file does not exist, it determines that the primary database was restarted after a power outage, which is an abnormal shutdown scenario. In this case, the management component can instruct the primary database to perform anomaly checks on the operation logs, data in the first table, data in the second table, and database indexes.

[0104] For example, the main database starts its internal anomaly monitoring module, which calls a dedicated verification interface to perform anomaly checks on operation logs, table data, and database indexes. This allows for the timely detection of data corruption that may result from abnormal power outages, and sends anomaly notifications to the management component, triggering the appropriate repair process. Through this proactive detection and repair mechanism, potential data corruption can be identified and addressed early, improving the database's reliability in abnormal power outage scenarios.

[0105] If the management component receives a query result indicating the existence of a marker file, it can be determined that the primary database restarted normally after a shutdown, without needing to initiate an anomaly detection process. The management component can send a marker file deletion command to the primary database, which will then delete the marker file and regenerate it the next time the database monitoring module stops the database, thus maintaining the effectiveness of the marker file mechanism.
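
The marker file mechanism can be sketched with the standard library. The file name `clean_shutdown.marker` and the function names are hypothetical, and a temporary directory stands in for the specified directory of the database.

```python
import tempfile
from pathlib import Path

MARKER_NAME = "clean_shutdown.marker"   # hypothetical file name


def clean_shutdown(data_dir: Path) -> None:
    """Create the marker file as part of a normal shutdown."""
    (data_dir / MARKER_NAME).write_text("stopped normally\n")


def classify_restart(data_dir: Path) -> str:
    """Decide the restart type from the marker file, deleting it if present."""
    marker = data_dir / MARKER_NAME
    if marker.exists():
        marker.unlink()   # removed so the next normal stop regenerates it
        return "normal_restart"
    return "power_failure_restart"   # triggers checks on logs, tables, and indexes


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    after_crash = classify_restart(data_dir)        # no marker was ever written
    clean_shutdown(data_dir)
    after_clean_stop = classify_restart(data_dir)   # marker found and deleted
    marker_left = (data_dir / MARKER_NAME).exists()
```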

[0106] In addition, in some embodiments, the marker file may also be generated by the management component and transmitted to the main database for storage, without limitation.

[0107] According to embodiments of the present invention, by generating a marker file when the main database stops normally and determining the stop type based on the existence of the file during startup, accurate differentiation of database restart scenarios is achieved. This enables timely triggering of comprehensive detection and automatic repair in abnormal power outage scenarios, allowing potential data corruption problems to be detected and handled early, thereby improving the reliability of the database under abnormal shutdown scenarios.

[0108] In some embodiments, the primary and standby databases can be checked periodically each day during periods of low business traffic, such as early morning. If the primary database is damaged, an attempt is made to repair it; if repair is unsuccessful, a primary-standby switchover is performed. If the standby database is damaged, a primary-standby dual-machine rebuild is performed.
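
The policy for the scheduled check can be written as a small decision function. The action labels and the `repair` callback are hypothetical, and the case where both databases are damaged at once is not covered because the text does not describe it.

```python
def scheduled_check_actions(damaged_node, repair=lambda: True):
    """Return the actions for one scheduled check.

    damaged_node is "primary", "standby", or None; repair() reports whether
    the attempted repair of the primary succeeded.
    """
    if damaged_node == "primary":
        actions = ["repair_primary"]
        if not repair():
            actions.append("primary_standby_switchover")
        return actions
    if damaged_node == "standby":
        return ["dual_machine_rebuild"]
    return []


repaired_ok = scheduled_check_actions("primary", repair=lambda: True)
repair_failed = scheduled_check_actions("primary", repair=lambda: False)
standby_case = scheduled_check_actions("standby")
healthy_case = scheduled_check_actions(None)
```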

[0109] According to embodiments of the present invention, the above method can promptly detect database corruption and take appropriate protective measures to prevent business losses due to database corruption, and automatically repair the damaged database in a timely manner.

[0110] Based on the above-mentioned database exception handling method, this invention also provides a database exception handling device. The device is described in detail below with reference to Figure 5.

[0111] Figure 5 shows a structural block diagram of a database exception handling apparatus according to an embodiment of the present invention.

[0112] As shown in Figure 5, the database exception handling device 500 of this embodiment includes a determination module 510, a comparison module 520, and a sending module 530.

[0113] The determination module 510 is used to determine, in response to receiving an exception notification sent by the main database, the type of exception triggered by the main database based on the exception notification. The main database stores a process identifier pre-registered by the management component and, when an exception is detected, sends the exception notification to the management component based on the process identifier.

[0114] The comparison module 520 is used to compare, when the exception type is an exception in the operation log used to record the data change process in the main database, the exception time of the operation log carried in the exception notification with the marking time to obtain a comparison result. The marking time is the time at which, during business processing, the first table data stored in memory in the main database was used to update the corresponding second table data on the physical disk.

[0115] The sending module 530 is used to send an update instruction to the main database when the comparison result indicates that the marker time is earlier than the abnormal time. The update instruction is used to instruct the main database to update the corresponding second table data in the physical disk based on the first table data, obtain the updated second table data, and update the marker time to the target marker time, so that in the event of a restart of the main database or data synchronization between the main database and the backup database, data loading is performed based on the updated second table data and / or log data in the operation log that is located after the target marker time.

[0116] According to an embodiment of the present invention, the abnormal time is the start time of the abnormal period in which the operation log becomes abnormal. The operation log has a corresponding status flag. The database exception handling device 500 further includes a flagging module.

[0117] The marking module is used to send a state switching instruction to the main database when the comparison result indicates that the marking time is later than the abnormal time. The state switching instruction is used to instruct the main database to mark the log data in the operation log corresponding to the abnormal period as invalid and switch the log status of the operation log from abnormal status to normal status.
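
The branching between the sending module and the marking module can be sketched as a comparison of two timestamps. The instruction labels are hypothetical, and returning `None` for equal times is an assumption, since the text does not address that case.

```python
from datetime import datetime


def choose_instruction(marking_time: datetime, abnormal_time: datetime):
    """Pick the instruction to send based on the comparison result."""
    if marking_time < abnormal_time:
        # Data after the marking time exists only in memory and the damaged log:
        # write it to disk and advance the marking time.
        return "update_instruction"
    if marking_time > abnormal_time:
        # The abnormal period is already covered by data on disk:
        # invalidate that part of the log and restore normal log status.
        return "state_switching_instruction"
    return None   # equal times: not specified in the text


marked = datetime(2026, 4, 7, 10, 0, 0)
earlier_marker = choose_instruction(marked, datetime(2026, 4, 7, 10, 5, 0))
later_marker = choose_instruction(marked, datetime(2026, 4, 7, 9, 55, 0))
same_time = choose_instruction(marked, marked)
```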

[0118] According to an embodiment of the present invention, the database exception handling device 500 further includes a data sending module.

[0119] The data sending module is used to send a data sending instruction to the master database in response to receiving a notification from the master database that the update of the second table data is complete. The data sending instruction is used to instruct the master database to send the updated second table data to the backup database.

[0120] According to an embodiment of the present invention, the database exception handling device 500 further includes a data repair module.

[0121] The data repair module sends a table data repair command to the main database when the anomaly type is table data anomaly. This command instructs the main database to perform the following operations: scan the target table data (first and second tables) for anomalies to identify expired rows (deleted or updated historical rows); identify the data status of each expired row; if the identification results indicate the presence of anomaly expired rows in the target table with a visible data status, update the data status of these rows from visible to invisible; and mark the storage space occupied by the expired rows as free (memory or physical disk space).

[0122] According to an embodiment of the present invention, the database exception handling device 500 further includes a primary / backup switching module.

[0123] The primary / backup switching module is used to control, in response to receiving a repair failure notification from the primary database, the primary and backup databases to perform a switchover so that the backup database acts as the new primary database for business processing. The repair failure notification indicates that the repair command failed to repair the target table data.

[0124] According to an embodiment of the present invention, the primary / standby switching module includes: a first switching submodule, a second switching submodule, and a third switching submodule.

[0125] The first switching submodule is used to send a log transmission command to the primary database in response to the backup database storing updated data in the second table. The log transmission command is used to instruct the primary database to send the operation log to the backup database.

[0126] The second switching submodule is used to send log application instructions to the standby database. The log application instructions are used to instruct the standby database to receive and apply operation logs based on the updated data in the second table, so that the data in the standby database is consistent with the data in the primary database.

[0127] The third switching submodule is used to send, in response to receiving an application completion instruction from the backup database, a promotion instruction to the backup database to promote it to the new primary database, and a demotion instruction to the primary database to make it suspend business processing or become the new backup database.

[0128] According to an embodiment of the present invention, the database exception handling device 500 further includes a reconstruction module.

[0129] The rebuild module is used to send an index rebuild instruction to the main database when the exception type is a database index exception. The index rebuild instruction is used to instruct the main database to rebuild the database index based on the table data in the main database.

[0130] According to an embodiment of the present invention, the database exception handling device 500 further includes a query module and a detection module.

[0131] The query module is used to send a marker file query command to the primary database when the exception type is a restart of the primary database. The marker file query command instructs the primary database to perform a marker file query. The marker file is generated by the primary database when it shuts down normally.

[0132] The detection module is used to respond to a query result from the main database indicating that a marker file does not exist, determine that the main database has been restarted after a power outage, and instruct the main database to perform anomaly detection on the operation log, the data in the first table, the data in the second table, and the database indexes.

[0133] According to embodiments of the present invention, any plurality of modules among the determining module 510, comparing module 520, and transmitting module 530 may be combined into one module, or any one of these modules may be split into multiple modules. Alternatively, at least a portion of the functionality of one or more of these modules may be combined with at least a portion of the functionality of other modules and implemented in one module. According to embodiments of the present invention, at least one of the determining module 510, comparing module 520, and transmitting module 530 may be at least partially implemented as hardware circuitry, such as a field-programmable gate array (FPGA), a programmable logic array (PLA), a system-on-a-chip, a system-on-a-substrate, a system-on-package, an application-specific integrated circuit (ASIC), or any other reasonable means of integrating or packaging circuitry, or implemented in software, hardware, or firmware, or in any suitable combination of any of these three implementation methods. Alternatively, at least one of the determining module 510, comparing module 520, and transmitting module 530 may be at least partially implemented as a computer program module, which, when run, can perform corresponding functions.

[0134] Figure 6 shows a block diagram of an electronic device suitable for implementing an exception handling method for a database according to an embodiment of the present invention.

[0135] As shown in Figure 6, an electronic device 600 according to an embodiment of the present invention includes a processor 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The processor 601 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and / or an associated chipset and / or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), etc. The processor 601 may also include onboard memory for caching purposes. The processor 601 may include a single processing unit or multiple processing units for performing different actions of the method flow according to an embodiment of the present invention.

[0136] RAM 603 stores various programs and data required for the operation of electronic device 600. Processor 601, ROM 602, and RAM 603 are interconnected via bus 604. Processor 601 executes various operations of the method flow according to embodiments of the present invention by executing programs in ROM 602 and / or RAM 603. It should be noted that the programs may also be stored in one or more memories other than ROM 602 and RAM 603. Processor 601 may also execute various operations of the method flow according to embodiments of the present invention by executing programs stored in said one or more memories.

[0137] According to an embodiment of the present invention, the electronic device 600 may further include an input / output (I / O) interface 605, which is also connected to a bus 604. The electronic device 600 may also include one or more of the following components connected to the input / output (I / O) interface 605: an input section 606 including a keyboard, mouse, etc.; an output section 607 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 608 including a hard disk, etc.; and a communication section 609 including a network interface card such as a LAN card, modem, etc. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the input / output (I / O) interface 605 as needed. A removable medium 611, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 610 as needed so that computer programs read from it can be installed into the storage section 608 as needed.

[0138] The present invention also provides a computer-readable storage medium, which may be included in the device / apparatus / system described in the above embodiments; or it may exist independently and not assembled into the device / apparatus / system. The computer-readable storage medium carries one or more programs, which, when executed, implement the method according to the embodiments of the present invention.

[0139] According to embodiments of the present invention, a computer-readable storage medium may be a non-volatile computer-readable storage medium, such as including, but not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In the present invention, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. For example, according to embodiments of the present invention, a computer-readable storage medium may include ROM 602 and / or RAM 603 and / or one or more memories other than ROM 602 and RAM 603 described above.

[0140] Embodiments of the present invention also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowchart. When the computer program product is run on a computer system, the program code is used to enable the computer system to implement the database exception handling method provided in the embodiments of the present invention.

[0141] When the computer program is executed by the processor 601, it performs the functions defined in the system / apparatus of this invention. According to embodiments of the invention, the systems, apparatuses, modules, units, etc., described above can be implemented by computer program modules.

[0142] In one embodiment, the computer program may rely on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of signals over a network medium, and downloaded and installed via the communication section 609, and / or installed from the removable medium 611. The program code contained in the computer program can be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination thereof.

[0143] In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the processor 601, it performs the functions defined in the system of this embodiment of the invention. According to embodiments of the invention, the systems, devices, apparatuses, modules, units, etc., described above can be implemented by computer program modules.

[0144] According to embodiments of the present invention, program code for executing the computer programs provided in the embodiments of the present invention can be written in any combination of one or more programming languages. Specifically, these computer programs can be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code can be executed entirely on the user's computing device, partially on the user's device, partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).

[0145] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0146] Those skilled in the art will understand that the features described in the various embodiments of the present invention can be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly described in the present invention. In particular, the features described in the various embodiments of the present invention can be combined and/or integrated in various ways without departing from the spirit and teachings of the present invention. All such combinations and/or integrations fall within the scope of the present invention.

[0147] The embodiments of the present invention have been described above. However, these embodiments are merely illustrative and not intended to limit the scope of the invention. Although various embodiments have been described above, this does not mean that the measures in the various embodiments cannot be used advantageously in combination. Various substitutions and modifications can be made by those skilled in the art without departing from the scope of the invention, and all such substitutions and modifications should fall within the scope of the invention.

Claims

1. A database exception handling method, characterized in that the method is applied to a management component and comprises: in response to receiving an exception notification sent by a primary database, determining, based on the exception notification, an exception type triggered by the primary database, wherein the primary database stores a process identifier pre-registered by the management component and, when an exception is detected, sends the exception notification to the management component based on the process identifier; in the case that the exception type is an exception in an operation log used for recording a data change process in the primary database, comparing an exception time of the operation log carried in the exception notification with a marking time to obtain a comparison result, wherein the marking time is the time at which first table data, stored in memory by the primary database during business processing, was used to update corresponding second table data in a physical disk; and in the case that the comparison result indicates that the marking time is earlier than the exception time, sending an update instruction to the primary database, wherein the update instruction is used to instruct the primary database to update the corresponding second table data in the physical disk based on the first table data to obtain updated second table data, and to update the marking time to a target marking time, so that when the primary database restarts or synchronizes data with a backup database, data loading is performed based on the updated second table data and/or the log data in the operation log located after the target marking time.

2. The exception handling method according to claim 1, characterized in that the exception time is the start time of the abnormal period during which the operation log is abnormal; the operation log has a corresponding status flag; and the method further comprises: in the case that the comparison result indicates that the marking time is later than the exception time, sending a state switching instruction to the primary database, wherein the state switching instruction is used to instruct the primary database to mark the log data in the operation log corresponding to the abnormal period as invalid and to switch the log status of the operation log from an abnormal status to a normal status.
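By way of illustration only, the comparison recited in claims 1 and 2 can be sketched as a small decision function. The names `Instruction`, `LogExceptionNotification` and `decide_instruction` are hypothetical and do not appear in the claims; times are assumed to be comparable numeric timestamps. The claims do not state what happens when the two times are equal, so the sketch returns `None` in that case rather than guessing.

```python
from dataclasses import dataclass
from enum import Enum


class Instruction(Enum):
    # Claim 1: write the in-memory first table data to the second table data on disk.
    UPDATE_DISK = "update"
    # Claim 2: mark the abnormal log segment invalid and reset the log status.
    SWITCH_LOG_STATE = "switch"


@dataclass(frozen=True)
class LogExceptionNotification:
    # Start time of the abnormal period in the operation log (assumed numeric).
    exception_time: float


def decide_instruction(notification, marking_time):
    """Choose the instruction to send for an operation-log exception."""
    if marking_time < notification.exception_time:
        # The last disk update predates the abnormal period.
        return Instruction.UPDATE_DISK
    if marking_time > notification.exception_time:
        # The disk already reflects changes made after the abnormal period began.
        return Instruction.SWITCH_LOG_STATE
    return None  # Equal times are not covered by the claims.
```

For example, with an exception time of 100.0, a marking time of 90.0 selects the update instruction, while a marking time of 110.0 selects the state switching instruction.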

3. The exception handling method according to claim 1 or 2, characterized in that the method further comprises: in response to receiving, from the primary database, a notification that the update of the second table data is complete, sending a data sending instruction to the primary database, wherein the data sending instruction is used to instruct the primary database to send the updated second table data to the backup database.

4. The exception handling method according to claim 1, characterized in that the method further comprises: in the case that the exception type is a table data exception, sending a table data repair instruction to the primary database, wherein the table data repair instruction is used to instruct the primary database to perform the following operations: scanning the target table data containing the exception, among the first table data and the second table data, to identify expired row data in the target table data, wherein the expired row data is historical row data that has been deleted or updated; identifying the data status of each piece of expired row data to obtain an identification result; in the case that the identification result indicates that the target table data contains abnormal expired row data whose data status is visible, updating the data status of the abnormal expired row data from visible to invisible; and marking the storage space occupied by the expired row data as free, wherein the storage space is space in the memory or in the physical disk.
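By way of illustration only, the repair operations of claim 4 can be sketched over an in-memory list of rows. The `Row` type and the `repair_table` function are hypothetical. The sketch assumes that the storage of every expired row is marked free, not only that of the abnormal ones; the claim wording leaves this open.

```python
from dataclasses import dataclass


@dataclass
class Row:
    row_id: int
    expired: bool             # True for historical rows that were deleted or updated
    visible: bool             # data status: visible or invisible
    space_free: bool = False  # whether the row's storage space is marked free


def repair_table(rows):
    """Hide expired rows that are still visible and free expired-row storage.

    Returns the ids of the abnormal expired rows whose status was changed.
    """
    fixed = []
    for row in rows:
        if not row.expired:
            continue  # live rows are left untouched
        if row.visible:
            # Abnormal case: an expired row is still visible.
            row.visible = False
            fixed.append(row.row_id)
        # Assumption: storage of every expired row is reclaimed.
        row.space_free = True
    return fixed
```

Given one live row, one expired row that is still visible, and one expired row that is already invisible, only the second row has its status changed, and both expired rows have their space marked free.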

5. The exception handling method according to claim 4, characterized in that the method further comprises: in response to receiving a repair failure notification sent by the primary database, controlling the primary database and the backup database to perform a primary/backup switchover, so that the backup database performs business processing as the new primary database, wherein the repair failure notification indicates that the repair result of the table data repair instruction on the target table data is a repair failure.

6. The exception handling method according to claim 5, characterized in that controlling the primary database and the backup database to perform the primary/backup switchover comprises: in response to the backup database storing the updated second table data, sending a log transmission instruction to the primary database, wherein the log transmission instruction is used to instruct the primary database to send the operation log to the backup database; sending a log application instruction to the backup database, wherein the log application instruction is used to instruct the backup database to receive the operation log and apply it on the basis of the updated second table data, so that the data of the backup database is consistent with the data of the primary database; and in response to receiving an application completion instruction sent by the backup database, sending a promotion instruction to the backup database to promote it to the new primary database, and sending a demotion instruction to the primary database to cause it to suspend business processing or to convert it into a new backup database.
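By way of illustration only, the ordering of instructions in claim 6 can be sketched as follows. The function `switchover`, the instruction names, and the two callables `send` and `wait_for_apply_complete` are hypothetical stand-ins for whatever transport the management component actually uses. The sketch simply aborts when a precondition is not met, a case the claim does not address.

```python
def switchover(backup_has_updated_table, send, wait_for_apply_complete):
    """Drive a primary/backup switchover in the order recited in claim 6.

    send(target, instruction) delivers one instruction to "primary" or "backup".
    wait_for_apply_complete() returns True once the backup reports that the
    operation log has been applied.
    """
    if not backup_has_updated_table:
        return False  # precondition of claim 6 not met; nothing is sent

    # Step 1: the primary ships its operation log to the backup.
    send("primary", "TRANSMIT_LOG")
    # Step 2: the backup applies the log on top of the updated second table data.
    send("backup", "APPLY_LOG")

    if not wait_for_apply_complete():
        return False  # assumption: no role change without a completion report

    # Step 3: roles are exchanged only after the backup has caught up.
    send("backup", "PROMOTE")
    send("primary", "DEMOTE")
    return True
```

Promotion is issued only after the completion report, which matches the claim's requirement that the backup data be consistent with the primary data before the backup takes over business processing.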

7. The exception handling method according to claim 1, characterized in that the method further comprises: in the case that the exception type is a database index exception, sending an index rebuild instruction to the primary database, wherein the index rebuild instruction is used to instruct the primary database to rebuild the database index based on the table data in the primary database.

8. The exception handling method according to claim 1, characterized in that the method further comprises: in the case that the exception type is a restart of the primary database, sending a marker file query instruction to the primary database, wherein the marker file query instruction is used to instruct the primary database to query for a marker file, the marker file being generated by the primary database during a normal shutdown; and in response to receiving, from the primary database, a query result indicating that the marker file does not exist, determining that the primary database has restarted after a power failure, and instructing the primary database to perform exception detection on the operation log, the first table data, the second table data, and the database index.
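By way of illustration only, the marker file check of claim 8 can be sketched with the standard `pathlib` module. The file name `clean_shutdown.marker`, the directory layout and all function names are assumptions made for this sketch; the claim only requires that a marker file be generated during a normal shutdown and that its absence after a restart be treated as a restart following a power failure.

```python
from pathlib import Path

MARKER_NAME = "clean_shutdown.marker"  # hypothetical file name

# Items that claim 8 lists for exception detection after a power-failure restart.
CHECK_TARGETS = (
    "operation log",
    "first table data",
    "second table data",
    "database index",
)


def write_marker(data_dir):
    """Called on a normal shutdown: leave a marker file behind."""
    (Path(data_dir) / MARKER_NAME).touch()


def restarted_after_power_failure(data_dir):
    """A missing marker file means the previous shutdown was not normal."""
    return not (Path(data_dir) / MARKER_NAME).exists()


def checks_after_restart(data_dir):
    """Return the items to inspect, or an empty tuple after a normal shutdown."""
    if restarted_after_power_failure(data_dir):
        return CHECK_TARGETS
    return ()
```

A real implementation would presumably also remove the marker file once startup completes, so that a later power failure is detected; that step is not recited in the claim and is omitted here.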

9. An exception handling device for a database, characterized in that the device is applied to a management component and comprises: a determination module, configured to, in response to receiving an exception notification sent by a primary database, determine, based on the exception notification, an exception type triggered by the primary database, wherein the primary database stores a process identifier pre-registered by the management component and, when an exception is detected, sends the exception notification to the management component based on the process identifier; a comparison module, configured to, in the case that the exception type is an exception in an operation log used for recording a data change process in the primary database, compare an exception time of the operation log carried in the exception notification with a marking time to obtain a comparison result, wherein the marking time is the time at which first table data, stored in memory by the primary database during business processing, was used to update corresponding second table data in a physical disk; and a sending module, configured to send an update instruction to the primary database in the case that the comparison result indicates that the marking time is earlier than the exception time, wherein the update instruction is used to instruct the primary database to update the corresponding second table data in the physical disk based on the first table data to obtain updated second table data, and to update the marking time to a target marking time, so that when the primary database restarts or synchronizes data with a backup database, data loading is performed based on the updated second table data and/or the log data in the operation log located after the target marking time.

10. An electronic device, comprising: one or more processors; and a memory for storing one or more computer programs, characterized in that the one or more processors execute the one or more computer programs to implement the steps of the exception handling method according to any one of claims 1 to 9.