A data recovery method, system, device and storage medium of a database

CN122220149A · Pending · Publication Date: 2026-06-16 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date
2024-12-12
Publication Date
2026-06-16

Abstract

This application discloses a database data recovery method, system, device, and storage medium. The log stream of the database is parsed to determine the first log entries containing source data of the database. First index data is then established, using as successive index levels the table name of the data table to which the source data in each first log entry belongs, the tag pair, the first log sequence number of the first log entry, and the first timestamp corresponding to the source data contained in the first log entry. Next, according to a data recovery request from a target object, the target deletion operation for data recovery is determined and corresponding recovery query information is generated. A matching query is performed in the first index data according to the recovery query information to determine the matching target log entries, and data recovery processing is performed on the database using the source data in the target log entries. The application can perform data recovery accurately and improve processing efficiency. The technical solution of the application can be widely applied in the field of database technology.

Description

Technical Field

[0001] This application relates to the field of database technology, and in particular to a database data recovery method, system, device and storage medium.

Background Technology

[0002] A database is a repository for organizing, storing, and managing data according to a specified data structure, stored electronically within a computer system. Generally, databases store data in a series of tables using rows and columns, supporting convenient access, management, modification, and organization of data. During database use, data may be accidentally deleted due to operational errors or program mistakes, affecting the normal operation of business.

[0003] In related technologies, data recovery is required for accidentally deleted data in a database. Currently, a common data recovery strategy uses a cold backup and a log stream as data sources and starts an additional database instance. The cold backup is imported first, and then the log stream is used to restore the data to a specified point in time. However, in practice, the amount of data contained in the log stream is often large, so this approach recovers a significant amount of redundant data, resulting in a long processing cycle and low efficiency. Furthermore, the additional database instance is not the actual affected database instance, so further data transfer is required, which makes database use inconvenient and prone to anomalies.

Summary of the Invention

[0004] This application provides a database data recovery method, system, device, and storage medium, which can perform data recovery processing locally in the database, reduce the probability of database anomalies, and improve the efficiency of recovery processing.

[0005] One aspect of this application provides a database data recovery method, including:

[0006] The log stream of the database is parsed to determine the first log entry containing the source data of the database;

[0007] Based on the first log entry, first index data is established; wherein, the index hierarchy of the first index data is, in order, the table name of the data table to which the source data belongs, the tag pair, the first log sequence number of the first log entry, and the first timestamp corresponding to the source data contained in the first log entry;

[0008] In response to a data recovery request from a target object for the database, the target deletion operation specified by the target object for data recovery is determined;

[0009] Based on the target deletion operation, recovery query information is determined; wherein, the recovery query information includes at least one of the attribute information or time range information of the data table corresponding to the target deletion operation, and the attribute information includes the table name or the tag pair;

[0010] Based on the recovery query information and the first index data, a matching target log entry is determined in the first log entry;

[0011] Data recovery processing is performed on the database based on the source data contained in the target log item.
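
The four-level hierarchy of the first index data described above can be illustrated with a minimal sketch. It assumes the index is held as nested in-memory dictionaries and that a tag pair is a `(key, value)` tuple; the `LogEntry` type, its fields, and `build_first_index` are hypothetical names introduced for illustration and are not taken from the application.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical shape of a parsed log entry that carries source data."""
    lsn: int       # first log sequence number
    table: str     # table the source data belongs to
    tags: tuple    # tag pairs, e.g. (("host", "a"),)
    points: tuple  # (timestamp, value) pairs written by this entry


def build_first_index(entries):
    """Index levels, in order: table name -> tag pair -> LSN -> timestamps."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for entry in entries:
        for tag_pair in entry.tags:
            for timestamp, _value in entry.points:
                index[entry.table][tag_pair][entry.lsn].append(timestamp)
    return index


entries = [
    LogEntry(lsn=101, table="cpu", tags=(("host", "a"),),
             points=((1000, 0.5), (1010, 0.6))),
    LogEntry(lsn=102, table="cpu", tags=(("host", "b"),),
             points=((1005, 0.9),)),
    LogEntry(lsn=103, table="mem", tags=(("host", "a"),),
             points=((1010, 42.0),)),
]
first_index = build_first_index(entries)
```

With this layout, a lookup narrows from table to tag pair before it ever touches an LSN, so a query that names a table or a tag pair only visits the matching branch of the index.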

[0012] In another aspect, embodiments of this application provide a database data recovery system, including:

[0013] The parsing unit is used to parse the log stream of the database and determine the first log item containing the source data of the database;

[0014] The establishment unit is used to establish first index data based on the first log item; wherein, the index hierarchy of the first index data is, in order, the table name of the data table to which the source data belongs, the tag pair, the first log sequence number of the first log item, and the first timestamp corresponding to the source data contained in the first log item;

[0015] A response unit is configured to, in response to a data recovery request from a target object for the database, determine the target deletion operation specified by the target object for data recovery;

[0016] An execution unit is configured to determine recovery query information based on the target deletion operation; wherein the recovery query information includes at least one of attribute information or time range information of the data table corresponding to the target deletion operation, and the attribute information includes the table name or the tag pair;

[0017] The matching unit is used to determine, based on the recovery query information and the first index data, the matching target log entry among the first log entries;

[0018] The processing unit is used to perform data recovery processing on the database based on the source data contained in the target log item.

[0019] Optionally, in some embodiments, the execution unit is specifically used for:

[0020] Detect the deletion type of the target deletion operation;

[0021] If the deletion type of the target deletion operation is table deletion, the first recovery query information is generated based on the table name and time range information of the data table corresponding to the target deletion operation;

[0022] If the deletion type of the target deletion operation is timeline deletion, the second recovery query information is generated based on the tag pair and time range information of the data table corresponding to the target deletion operation.
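
The branching on deletion type can be sketched as follows. This is only an illustration under assumptions: the `DeleteOp` and `RecoveryQuery` types, the string values `"table"` and `"timeline"`, and the inclusive time range are hypothetical choices, not details stated in the application.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DeleteOp:
    """Hypothetical description of the target deletion operation."""
    kind: str                                   # "table" or "timeline"
    table: str
    time_range: Tuple[int, int]                 # inclusive (start, end)
    tag_pair: Optional[Tuple[str, str]] = None  # used by timeline deletion


@dataclass(frozen=True)
class RecoveryQuery:
    time_range: Tuple[int, int]
    table: Optional[str] = None
    tag_pair: Optional[Tuple[str, str]] = None


def make_recovery_query(op: DeleteOp) -> RecoveryQuery:
    if op.kind == "table":
        # First recovery query information: table name + time range.
        return RecoveryQuery(time_range=op.time_range, table=op.table)
    if op.kind == "timeline":
        if op.tag_pair is None:
            raise ValueError("timeline deletion needs a tag pair")
        # Second recovery query information: tag pair + time range.
        return RecoveryQuery(time_range=op.time_range, tag_pair=op.tag_pair)
    raise ValueError(f"unknown deletion type: {op.kind}")


table_query = make_recovery_query(DeleteOp("table", "cpu", (1000, 2000)))
timeline_query = make_recovery_query(
    DeleteOp("timeline", "cpu", (1000, 2000), tag_pair=("host", "a"))
)
```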

[0023] Optionally, in some embodiments, the matching unit is specifically used for:

[0024] Using the table name and time range information in the first recovery query information as index conditions, determine the matching first target sequence number among the first log sequence numbers of the first index data; or, using the tag pair and time range information in the second recovery query information as index conditions, determine the matching first target sequence number among the first log sequence numbers of the first index data.

[0025] The first log entry corresponding to the first target sequence number is identified as the matching target log entry.
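
A minimal sketch of this matching query, assuming the first index data is a nested mapping of table name, tag pair, log sequence number, and timestamps, and assuming the time range is inclusive. The function name `match_target_lsns` and the sample index contents are hypothetical.

```python
def match_target_lsns(index, time_range, table=None, tag_pair=None):
    """Return the LSNs whose timestamps fall inside the time range.

    Pass `table` for a table deletion or `tag_pair` for a timeline
    deletion; a condition left as None is not applied.
    """
    start, end = time_range
    matched = set()
    for table_name, by_tag in index.items():
        if table is not None and table_name != table:
            continue
        for pair, by_lsn in by_tag.items():
            if tag_pair is not None and pair != tag_pair:
                continue
            for lsn, timestamps in by_lsn.items():
                if any(start <= ts <= end for ts in timestamps):
                    matched.add(lsn)
    return sorted(matched)


first_index = {
    "cpu": {
        ("host", "a"): {101: [1000, 1010], 104: [2000]},
        ("host", "b"): {102: [1005]},
    },
    "mem": {("host", "a"): {103: [1010]}},
}

by_table = match_target_lsns(first_index, (1000, 1010), table="cpu")
by_tag = match_target_lsns(first_index, (1000, 1010), tag_pair=("host", "a"))
```

In this example the table query yields LSNs 101 and 102, while the tag-pair query yields 101 and 103; LSN 104 is excluded in both cases because its timestamp lies outside the range.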

[0026] Optionally, in some embodiments, the database data recovery system further includes a second establishment unit, which is specifically used for:

[0027] The log stream of the database is parsed to determine the second log entry corresponding to the historical deletion operation;

[0028] Based on the second log entry, a second index data is established; wherein, the index hierarchy of the second index data is, in order, the second log sequence number of the second log entry and the second timestamp corresponding to the historical deletion operation in the second log entry.

[0029] Optionally, in some embodiments, the response unit is specifically used for:

[0030] In response to the data recovery request from the target object for the database, determine the historical time period for which the target object is to perform data recovery;

[0031] Query the second index data for second timestamps that fall within the historical time period, and determine the second log sequence number corresponding to each such second timestamp as a second target sequence number.

[0032] The historical deletion operation corresponding to the second target sequence number is identified as the target deletion operation specified by the target object for data recovery.
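
The second index data and the lookup by historical time period might look like the sketch below. It assumes each historical deletion is available as an `(lsn, timestamp)` pair and that the period is inclusive; `build_second_index` and `find_target_deletions` are hypothetical names.

```python
def build_second_index(delete_entries):
    """Second index levels: second LSN -> second timestamp."""
    return {lsn: timestamp for lsn, timestamp in delete_entries}


def find_target_deletions(second_index, period):
    """Return the LSNs of deletions whose timestamp lies in the period."""
    start, end = period
    return sorted(
        lsn for lsn, timestamp in second_index.items()
        if start <= timestamp <= end
    )


second_index = build_second_index([(201, 900), (205, 1500), (209, 1700)])
candidates = find_target_deletions(second_index, (1400, 1800))
```

Here `candidates` contains LSNs 205 and 209, which would be presented as the deletion operations eligible for recovery in that period.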

[0033] Optionally, in some embodiments, the data recovery system for the database further includes an encoding unit, which is specifically used for:

[0034] The frequency of occurrence of each field in the first index data is detected, and the fields whose frequency of occurrence is greater than a preset threshold are identified as target fields.

[0035] Different identifier information is assigned to each of the target fields, and the association between the target fields and the identifier information is stored; wherein the number of characters in the identifier information is less than the number of characters in the target field;

[0036] Each target field is replaced with its identifier information to obtain the updated first index data.
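
This encoding step resembles dictionary encoding and can be sketched as follows. The identifier scheme (`#0`, `#1`, ...) is an assumption made for illustration, as is the premise that raw fields never begin with `#`; the application only requires that an identifier be shorter than the field it replaces.

```python
from collections import Counter


def encode_frequent_fields(fields, threshold):
    """Replace fields seen more than `threshold` times with short ids."""
    counts = Counter(fields)
    mapping = {}
    for field, count in counts.items():
        if count > threshold:
            identifier = f"#{len(mapping)}"  # hypothetical id scheme
            # Substitute only when the identifier is strictly shorter.
            if len(identifier) < len(field):
                mapping[field] = identifier
    encoded = [mapping.get(field, field) for field in fields]
    return encoded, mapping


def decode_fields(encoded, mapping):
    """Restore the original fields from the stored association."""
    reverse = {identifier: field for field, identifier in mapping.items()}
    return [reverse.get(item, item) for item in encoded]


fields = [
    "datacenter=shenzhen", "host=a", "datacenter=shenzhen",
    "datacenter=shenzhen", "host=b",
]
encoded, mapping = encode_frequent_fields(fields, threshold=2)
```

Storing the association alongside the index keeps the encoding reversible, so lookups can translate a query field to its identifier before searching.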

[0037] Optionally, in some embodiments, the processing unit is specifically used for:

[0038] The source data contained in the target log item is merged to obtain the target data;

[0039] Based on the target data, perform data recovery processing on the database.

[0040] Optionally, in some embodiments, the processing unit is specifically used for:

[0041] Detect the table name, label pairs, indicator value labels, and first timestamp of each data table to which the source data belongs;

[0042] If multiple pieces of first data have the same table name, tag pair, indicator value label, and first timestamp, the first log sequence number of the target log entry in which each piece of first data is located is detected; wherein, the first data is any source data;

[0043] Compare the size of the first log sequence number corresponding to each of the first data;

[0044] The first data corresponding to the largest first log sequence number is retained, and the other first data are discarded.
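
The merge rule above keeps, for each duplicated data point, the version written by the latest log entry. A minimal sketch, assuming each piece of source data is a dictionary and that the indicator value label is stored under a hypothetical `field` key:

```python
def merge_source_data(rows):
    """Keep only the row with the largest LSN for each identical key."""
    latest = {}
    for row in rows:
        key = (row["table"], row["tag_pair"], row["field"], row["timestamp"])
        kept = latest.get(key)
        if kept is None or row["lsn"] > kept["lsn"]:
            latest[key] = row
    return list(latest.values())


rows = [
    {"table": "cpu", "tag_pair": ("host", "a"), "field": "usage",
     "timestamp": 1000, "lsn": 101, "value": 0.5},
    {"table": "cpu", "tag_pair": ("host", "a"), "field": "usage",
     "timestamp": 1000, "lsn": 107, "value": 0.7},
    {"table": "cpu", "tag_pair": ("host", "a"), "field": "usage",
     "timestamp": 1010, "lsn": 102, "value": 0.6},
]
target_data = merge_source_data(rows)
```

Because log sequence numbers increase over time, the largest LSN identifies the most recent write, so the recovered value at timestamp 1000 is 0.7 rather than 0.5.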

[0045] Optionally, in some embodiments, the processing unit is specifically used for:

[0046] Based on the recovery query information, a query is performed in the cold backup corresponding to the database to obtain the second data;

[0047] The source data and the second data are merged to obtain the target data;

[0048] Based on the target data, perform data recovery processing on the database.
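
One way to combine the log-derived source data with the second data queried from the cold backup is sketched below. The application does not state which side wins when both contain the same data point; this sketch assumes that log-derived data takes precedence because it was written after the cold backup was taken. The row layout and the function name are hypothetical.

```python
def merge_with_cold_backup(source_rows, backup_rows):
    """Union both inputs; on a key clash, the log-derived row wins."""
    merged = {}
    for row in backup_rows:
        merged[(row["table"], row["tag_pair"], row["timestamp"])] = row
    # Applied second, so log data overrides backup data (an assumption).
    for row in source_rows:
        merged[(row["table"], row["tag_pair"], row["timestamp"])] = row
    return sorted(merged.values(), key=lambda row: row["timestamp"])


backup_rows = [
    {"table": "cpu", "tag_pair": ("host", "a"), "timestamp": 900, "value": 0.1},
    {"table": "cpu", "tag_pair": ("host", "a"), "timestamp": 1000, "value": 0.2},
]
source_rows = [
    {"table": "cpu", "tag_pair": ("host", "a"), "timestamp": 1000, "value": 0.5},
    {"table": "cpu", "tag_pair": ("host", "a"), "timestamp": 1010, "value": 0.6},
]
target_data = merge_with_cold_backup(source_rows, backup_rows)
```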

[0049] Optionally, in some embodiments, the processing unit is specifically used for:

[0050] Obtain the time interval information corresponding to each data shard in the database;

[0051] Based on the time interval information, the target data is divided to obtain a data subset corresponding to each data shard;

[0052] According to the corresponding time interval information, each data subset is imported into the data shards of the database.
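
Dividing the target data by shard can be sketched as below, assuming each shard covers a half-open time interval `[start, end)` and that intervals do not overlap. The shard identifiers and the function name `split_by_shard` are hypothetical.

```python
def split_by_shard(rows, shard_intervals):
    """Group rows into the shard whose time interval contains them."""
    subsets = {shard_id: [] for shard_id in shard_intervals}
    for row in rows:
        for shard_id, (start, end) in shard_intervals.items():
            if start <= row["timestamp"] < end:
                subsets[shard_id].append(row)
                break
        else:
            # No interval matched: surface the gap instead of dropping data.
            raise ValueError(f"no shard covers timestamp {row['timestamp']}")
    return subsets


shard_intervals = {"shard-0": (0, 1000), "shard-1": (1000, 2000)}
target_data = [
    {"timestamp": 950, "value": 0.1},
    {"timestamp": 1000, "value": 0.5},
    {"timestamp": 1500, "value": 0.6},
]
subsets = split_by_shard(target_data, shard_intervals)
```

Each subset can then be imported into its own shard independently of the others.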

[0053] Optionally, in some embodiments, the processing unit is specifically used for:

[0054] Query the format information of the memory table used by each of the data shards;

[0055] Based on the format information, format conversion is performed on the data subset corresponding to the data shard;

[0056] The data subset obtained after format conversion is imported into the corresponding data shard.
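
The application does not enumerate the memory table formats, so the following sketch invents two purely for illustration: a `"row"` format (a list of records) and a `"columnar"` format (one list per column). It assumes every row in a subset has the same keys; the format names and `convert_for_memtable` are hypothetical.

```python
def convert_for_memtable(rows, memtable_format):
    """Convert a data subset into the layout a shard's memory table uses."""
    if memtable_format == "row":
        return [dict(row) for row in rows]
    if memtable_format == "columnar":
        columns = {}
        for row in rows:
            for name, value in row.items():
                columns.setdefault(name, []).append(value)
        return columns
    raise ValueError(f"unsupported memory table format: {memtable_format}")


subset = [
    {"timestamp": 1000, "value": 0.5},
    {"timestamp": 1010, "value": 0.6},
]
as_rows = convert_for_memtable(subset, "row")
as_columns = convert_for_memtable(subset, "columnar")
```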

[0057] In another aspect, embodiments of this application provide an electronic device, including a processor and a memory;

[0058] The memory is used to store computer programs;

[0059] The processor executes the computer program to implement the aforementioned database data recovery method.

[0060] In another aspect, embodiments of this application provide a computer-readable storage medium storing a computer program that is executed by a processor to implement the aforementioned database data recovery method.

[0061] In another aspect, embodiments of this application also provide a computer program product, which includes a computer program stored in a computer-readable storage medium. The processor of a computer device reads the computer program from the computer-readable storage medium and executes the computer program, causing the computer device to perform the aforementioned database data recovery method.

[0062] The embodiments of this application include at least the following beneficial effects: This application provides a database data recovery method, system, device, and storage medium. This application parses the database log stream to determine the first log item containing the database source data. First index data is established by sequentially using the table name of the data table to which the source data belongs in each first log item, the tag pair, the first log sequence number of the first log item, and the first timestamp corresponding to the source data contained in the first log item as different index levels. Then, based on the data recovery request of the target object, the target deletion operation for data recovery is determined, and corresponding recovery query information is generated. Based on the recovery query information, a matching query is performed in the first index data, which can easily determine the matching target log item. Thus, the source data in the target log item can be used to perform data recovery processing on the database. This application establishes first index data based on the database log stream, which facilitates finding the mistakenly deleted source data according to the target deletion operation specified by the target object during subsequent data recovery processing. This allows for accurate data recovery processing, reduces the amount of data to be processed, and improves processing efficiency. Furthermore, performing data recovery processing locally on the database reduces the probability of database anomalies and improves the user experience.

Attached Figure Description

[0063] The accompanying drawings are used to provide a further understanding of the technical solutions of this application and constitute a part of the specification. They are used together with the embodiments of this application to explain the technical solutions of this application and do not constitute a limitation on the technical solutions of this application.

[0064] Figure 1 is a schematic diagram illustrating an application of data recovery processing in related technologies;

[0065] Figure 2 is a system architecture diagram of a database data recovery method provided in an embodiment of this application;

[0066] Figure 3 is a schematic diagram illustrating version updates and iterations of a game application provided in an embodiment of this application;

[0067] Figure 4 is a flowchart illustrating a database data recovery method provided in an embodiment of this application;

[0068] Figure 5 is a schematic diagram of first index data provided in an embodiment of this application;

[0069] Figure 6 is a schematic diagram illustrating a target deletion operation provided in an embodiment of this application;

[0070] Figure 7 is a schematic diagram of a database architecture provided in an embodiment of this application;

[0071] Figure 8 is a schematic diagram illustrating the principle of a log summary generation module provided in an embodiment of this application;

[0072] Figure 9 is a schematic diagram of a data recovery table provided in an embodiment of this application;

[0073] Figure 10 is a schematic diagram illustrating how data is imported into different data shards, as provided in an embodiment of this application;

[0074] Figure 11 is a schematic diagram illustrating data import into a memory table provided in an embodiment of this application;

[0075] Figure 12 is a structural block diagram of a database data recovery system provided in an embodiment of this application;

[0076] Figure 13 is a structural block diagram of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0077] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0078] It is understood that the terms “first,” “second,” etc., used in this application may be used to describe various concepts herein, but unless otherwise stated, these concepts are not limited by these terms. These terms are used only to distinguish one concept from another.

[0079] As used in this application, "at least one" includes one, two, or more; "multiple" includes two or more; "each" refers to every one of the corresponding multiple items; and "any" refers to any one of the multiple items.

[0080] Before providing a further detailed description of the embodiments of this application, the nouns and terms used in the embodiments of this application are explained, and the nouns and terms used in the embodiments of this application shall be interpreted as follows:

[0081] 1) A Database Management System (DBMS) is a software system that can be used to create, maintain, and manage databases.

[0082] 2) Write-Ahead Log (WAL) is a database technology used to ensure data consistency and recoverability. The core idea of WAL is to record any changes in a separate log file before writing them to the actual data files. The purpose of this is to allow recovery of incomplete transactions from the log file in the event of a system failure, thereby guaranteeing data consistency and integrity.

[0083] 3) Time Series Database (TSDB): A database specifically designed for storing and processing time-series data. Time-series data typically features high write frequency and large volume, such as sensor data from IoT devices, transaction records from financial markets, and server monitoring metrics.

[0084] 4) Log Sequence Number (LSN): A unique identifier used in a database management system to identify log entries. LSNs are typically identified using incrementing numbers and play a crucial role in database recovery mechanisms and transaction consistency.

[0085] 5) MemTable: An in-memory data structure used to store data before it is flushed to the SST file. It can be used for both reading and writing.

[0086] 6) SST (Sorted String Table) file is a data storage format used in distributed database systems. It is mainly used to persist sorted data to disk for efficient read operations.
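
The write-ahead logging idea from item 2), together with the incrementing log sequence numbers from item 4), can be illustrated with a toy store. This is a teaching sketch only: the `TinyStore` class is hypothetical, and an in-memory list stands in for the durable log file a real database would use.

```python
class TinyStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self):
        self.log = []   # stand-in for a durable log file
        self.data = {}  # stand-in for the actual data files

    def put(self, key, value):
        lsn = len(self.log) + 1                    # incrementing LSN
        self.log.append((lsn, "put", key, value))  # 1. record the change
        self.data[key] = value                     # 2. then apply it
        return lsn

    @classmethod
    def recover(cls, log):
        """Rebuild the data purely by replaying the log in LSN order."""
        store = cls()
        for entry in sorted(log):
            _lsn, op, key, value = entry
            if op == "put":
                store.data[key] = value
            store.log.append(entry)
        return store


store = TinyStore()
store.put("a", 1)
store.put("a", 2)
last_lsn = store.put("b", 3)
recovered = TinyStore.recover(store.log)
```

Because every change reaches the log first, the log alone is enough to reconstruct the data, which is the property the recovery method in this application relies on.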

[0087] A database is a repository for organizing, storing, and managing data according to a specified data structure, stored electronically within a computer system. Generally, databases store data in a series of tables using rows and columns, supporting convenient access, management, modification, and organization of data. In the current field of computer technology, databases are widely used and play a vital role in various scenarios.

[0088] For example, in e-commerce applications, online shopping platforms often build databases to store registered account information, merchant information, product data, order data, and other related information (such as shopper reviews and interactions between shoppers and merchants). In financial services applications, banking systems and financial transaction platforms often build databases to store customer account information, balances, and transaction records. This information can be used to provide related financial services or to assist in customer credit assessment and risk management. In social media applications, social media platforms often build databases to store information about social media accounts, such as profile pictures, avatars, and interests, as well as records of posts, articles, images, and videos published by these accounts. Social media platform databases can also store relationships between social media accounts, such as following lists and friend relationships, and chat data between various social media accounts.

[0089] It is understandable that the data types and uses of databases may differ in different application scenarios. Furthermore, the actual application scenarios of databases are not limited to the types shown above. For example, in other examples, databases can also be used in application scenarios such as game services, the Internet of Things, education platforms, logistics management, and healthcare, which will not be elaborated upon here.

[0090] In the process of using a database, data may be accidentally deleted. There are many factors that can cause accidental data deletion. For example, in some scenarios, it may be due to human error. For instance, a database administrator might select the wrong data when performing a delete operation. The administrator's intention might be to delete a specific record in the table, but due to an interface error, multiple records might be accidentally selected and deleted. Alternatively, errors might occur in the related command statements during the delete operation, such as forgetting to add a scope-limiting clause, resulting in the deletion of the entire table. In other scenarios, it may be due to system errors. For example, logical errors might occur in the application associated with the database, leading to incorrect condition judgments and the accidental deletion of data that should not be deleted. Or, the database management system itself might have some malfunctions or performance issues, resulting in data loss or corruption, thus causing accidental deletion. Of course, it should be noted that the actual situations involving accidental data deletion are not limited to the above scenarios. In this embodiment, the factors related to accidental deletion for data recovery are not limited.

[0091] Accidental deletion of data in a database can trigger a series of problems, impacting normal business operations. For example, in some scenarios, accidental data deletion may lead to business interruption. For instance, if order data is accidentally deleted from the database of an online retail website, it will directly affect subsequent order processing, causing the business to malfunction. In some embodiments, accidental data deletion may also cause losses to customers. For example, in financial services, it may result in direct financial losses for customers, and in social media services, it may lead to the loss of interactive information, affecting communication efficiency.

[0092] Therefore, in related technologies, data recovery is often performed when data is accidentally deleted from a database. For example, Figure 1 illustrates an application of a data recovery process in related technologies. In the technical solution shown in Figure 1, the database uses cold backups combined with log streams as backup data during runtime. The data recovery strategy uses the cold backup and the log stream as data sources: another database instance is started outside the original database instance, the cold backup is imported first, and the log stream is then used to restore the data to a specified point in time. As shown in Figure 1, suppose the database performed a cold backup at 20:00 and the relevant data was saved as a snapshot, while data written after 20:00 is initially stored as a log stream. Suppose it is then discovered that data has been accidentally deleted from the database, and the data is to be restored to its state at 22:00. During recovery, another database instance (Instance 2) is created outside the original database instance (Instance 1), and the cold backup data from 20:00 is imported into Instance 2. Then, using the log data from 20:00 to 22:00, the data in Instance 2 is brought to its state at 22:00, thus achieving database data recovery.
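
The related-art strategy amounts to replaying the log on top of a snapshot up to a chosen time. The sketch below models it under simplifying assumptions: times are minutes since midnight (20:00 is 1200, 22:00 is 1320), the snapshot is a dictionary, and log entries are `(time, op, key, value)` tuples. All names and sample records are hypothetical.

```python
def restore_to_point_in_time(snapshot, log_entries, target_time):
    """Build a second instance from the snapshot, then replay the log."""
    instance = dict(snapshot)  # Instance 2; the original is left untouched
    for time, op, key, value in sorted(log_entries, key=lambda e: e[0]):
        if time > target_time:
            break
        # Every entry up to the target time is replayed, including the
        # many entries that have nothing to do with the deleted data.
        if op == "put":
            instance[key] = value
        elif op == "delete":
            instance.pop(key, None)
    return instance


snapshot_2000 = {"order:1": "paid"}
log_entries = [
    (1230, "put", "order:2", "new"),
    (1300, "put", "order:1", "shipped"),
    (1330, "delete", "order:2", None),  # accidental deletion at 22:10
]
instance_2 = restore_to_point_in_time(snapshot_2000, log_entries, 1320)
```

The result lives in a separate instance, so the recovered records still have to be moved back into the instance that actually lost them.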

[0093] In practical applications, it has been found that the log stream in the technical solution shown in Figure 1 often contains a large amount of data. Even if only a portion of the full log stream is used during data recovery, a significant amount of redundant data (i.e., data that was never deleted) still needs to be processed, resulting in a long overall processing cycle and low recovery efficiency. Furthermore, this implementation requires starting another database instance in addition to the existing one. The resulting database instance is not the original, affected instance. Using the newly created instance may require numerous additional operations to integrate with business logic. Transferring data between database instances can also be inconvenient and increases the risk of errors.

[0094] In view of this, this application provides a database data recovery method, system, device, and storage medium. This application parses the database log stream to determine the first log item containing the database's source data. First index data is established by sequentially using the table name of the data table to which the source data belongs, the tag pair, the first log sequence number of the first log item, and the first timestamp corresponding to the source data contained in the first log item as different index levels. Next, the target deletion operation for data recovery is determined based on the target object's data recovery request. Then, corresponding recovery query information is generated. Based on the recovery query information, a matching query is performed in the first index data, which can easily determine the matching target log item. Thus, the source data in the target log item can be used to perform data recovery processing on the database. This application establishes first index data based on the database log stream, which facilitates finding the mistakenly deleted source data according to the target deletion operation specified by the target object during subsequent data recovery processing. This allows for accurate data recovery processing, reduces the amount of data to be processed, and improves processing efficiency. Furthermore, performing data recovery processing locally on the database reduces the probability of database anomalies and improves the user experience.

[0095] System architecture and scenario description used in the embodiments of this application

[0096] Figure 2 is a system architecture diagram of a database data recovery method provided in an embodiment of this application. The architecture includes a terminal device 240, the Internet 230, a gateway 220, a backend server 210, and so on.

[0097] In this embodiment, the terminal device 240 can install and run related applications, such as video playback applications and browser applications. Users of the terminal device can initiate resource acquisition requests for related resources based on the applications on the terminal device 240. The terminal device 240 can take various forms, including desktop computers, laptops, PDAs (personal digital assistants), mobile phones, in-vehicle terminals, home theater terminals, and dedicated terminals. Furthermore, it can be a single device or a collection of multiple devices. The terminal device 240 can communicate with the Internet 230 via wired or wireless means to exchange data.

[0098] A backend server 210 refers to a computer system that can provide certain services to terminal devices 240. Compared to ordinary terminal devices 240, backend servers 210 have higher requirements in terms of stability, security, and performance. A backend server 210 can be a single high-performance computer in a network platform, a cluster of multiple high-performance computers, a portion of a single high-performance computer (e.g., a virtual machine), or a combination of portions of multiple high-performance computers (e.g., virtual machines).

[0099] Gateway 220, also known as an internetwork connector or protocol converter, is a computer system or device that acts as a translator, enabling network interconnection at the transport layer. It bridges the gap between two systems using different communication protocols, data formats, languages, or even completely different architectures. Gateways can also provide filtering and security functions. Messages sent from terminal device 240 to backend server 210 are forwarded to the corresponding backend server 210 via gateway 220. Messages sent from backend server 210 to terminal device 240 are also forwarded to the corresponding terminal device 240 via gateway 220.

[0100] The database data recovery method provided in this application embodiment can be executed on a backend server 210. The backend server 210 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.

[0101] The database data recovery method provided in this application embodiment can be executed in various scenarios, and the following is an exemplary description of it.

[0102] (I) Game application scenarios

[0103] The method described in this application embodiment can be applied to game application scenarios.

[0104] For example, consider a game operator offering an online multiplayer game where players can create characters, purchase virtual items, and participate in various activities. During the game's operation, frequent version updates and iterations are required. Figure 3 is a schematic diagram of version updates and iterations of a game application provided in an embodiment of this application. During version updates and iterations of the game application, because a large amount of program content is updated, data may be accidentally deleted from the game application's database. This could lead to problems such as loss of player progress and disappearance of virtual items, severely impacting the gameplay experience and game operation.

[0105] Based on the data recovery method in this application embodiment, when an anomaly is detected during a version update, i.e., when data is accidentally deleted, timely data recovery can be performed. For example, the data can be restored to its state before the version update, implementing a rollback mechanism. This effectively ensures the reliability of game application updates, reduces the probability of business anomalies due to data loss, and improves the stability of game operation.

[0106] (II) Online shopping scenarios

[0107] Currently, online shopping is widely popular. E-commerce websites typically contain a large number of product images, videos, and related documents. Shoppers can log in to these websites to make purchases. A good e-commerce website should have high-performance data transmission capabilities and reliable data storage capabilities.

[0108] Generally, e-commerce websites process a large amount of business data daily, potentially involving various transactions. Data loss in such cases can lead to significant losses. In this scenario, the data recovery method described in this application can be applied to the e-commerce website's database to promptly recover from accidental deletions, thereby improving the reliability of website operations, indirectly promoting sales growth, and enhancing customer satisfaction.

[0109] General Description of Embodiments in this Application

[0110] Please refer to Figure 4, which shows a schematic flowchart of a database data recovery method provided in an embodiment of this application. As shown in Figure 4, the database data recovery method according to an embodiment of this application includes, but is not limited to, the following steps:

[0111] Step 410: Parse the database log stream to determine the first log entry containing the source data of the database;

[0112] Step 420: Based on the first log entry, establish the first index data; wherein, the index hierarchy of the first index data is, in order, the table name of the data table to which the source data belongs, the tag pair, the first log sequence number of the first log entry, and the first timestamp corresponding to the source data contained in the first log entry;

[0113] Step 430: In response to the target object's data recovery request for the database, determine the target deletion operation specified by the target object for data recovery;

[0114] Step 440: Determine the recovery query information based on the target deletion operation; wherein, the recovery query information includes at least one of the following: attribute information or time range information of the data table corresponding to the target deletion operation; the attribute information includes the table name or tag pairs;

[0115] Step 450: Based on the recovery query information and the first index data, determine the matching target log entry in the first log entry;

[0116] Step 460: Perform data recovery processing on the database based on the source data contained in the target log item.

[0117] This application provides a database data recovery method that enables local data recovery processing within the database, reducing the probability of database anomalies and improving recovery efficiency. Specifically, the data recovery method provided in this application can be used for databases in various application scenarios, including but not limited to e-commerce, financial services, social media, gaming services, the Internet of Things, education platforms, logistics management, and healthcare. This application does not impose any limitations on these applications.

[0118] In the technical solution provided in this application embodiment, for each database, its corresponding log data can be recorded during its operation. The log data here can be a series of records generated by the database management system during the execution of transactions. These records are arranged in chronological order to form a continuously updated data stream, which is referred to as a log stream in this application embodiment.

[0119] A log stream can include multiple log files, which are physical files storing the log data in the log stream. Each log file can contain multiple log entries, which record various events occurring in the database. In this embodiment, there are no restrictions on the data content, data size, number of log files, or number of log entries in the log stream corresponding to the database.

[0120] For example, databases can generate log entries and log files using Write-Ahead Log (WAL) technology. The write-ahead log records changes in a log file before they are actually written to the database. Specifically, when a database transaction begins, a corresponding transaction identifier is generated, and a log file is created. For each operation within the transaction (such as insert, delete, update, etc.), the database generates a corresponding log entry and writes it to the log file corresponding to the transaction identifier.
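The write-ahead ordering described above can be illustrated with a minimal sketch. The class `TinyWAL`, the `LogEntry` fields, and the in-memory storage are all hypothetical names chosen for illustration; a real database would persist the log to disk and is considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LogEntry:
    lsn: int                     # log sequence number, assigned in append order
    txn_id: int                  # hypothetical transaction identifier
    op: str                      # "insert", "update" or "delete"
    key: str
    value: Optional[str] = None  # source data; None for a delete


@dataclass
class TinyWAL:
    """Illustrative in-memory write-ahead log, not a real storage engine."""
    entries: List[LogEntry] = field(default_factory=list)
    table: Dict[str, Optional[str]] = field(default_factory=dict)

    def execute(self, txn_id: int, op: str, key: str, value: Optional[str] = None) -> int:
        # 1. Record the change in the log first.
        entry = LogEntry(len(self.entries) + 1, txn_id, op, key, value)
        self.entries.append(entry)
        # 2. Only then apply the change to the data itself.
        if op == "delete":
            self.table.pop(key, None)
        else:
            self.table[key] = value
        return entry.lsn


wal = TinyWAL()
wal.execute(1, "insert", "sensor-1", "21.5")
wal.execute(1, "delete", "sensor-1")
```

After the delete, the table is empty, but the inserted value is still present in the log entry with LSN 1, which is the property the recovery method relies on.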

[0121] Generally, log entry types can include various categories, such as delete operation log entries, source data log entries, metadata operation log entries, and empty log entries. Delete operation log entries record information related to deletion operations; source data log entries record the content of the source data; metadata operation log entries record the content of database metadata (such as table structure, indexes, views, etc.) to facilitate maintaining database consistency; and empty log entries record no actual operation content, typically used as placeholders or to mark certain special states. Of course, it is understood that the types of log entries in actual log files are not limited to those described above, and this application does not impose any restrictions on this.

[0122] In step 410, when performing data recovery processing on the database, for the database requiring data recovery, its corresponding log stream can be queried, the database log stream can be parsed, each log file can be extracted, and then the log entries containing the source data of the database can be determined from these log files. In this embodiment of the application, the log entry containing the source data of the database is recorded as the first log entry.

[0123] Specifically, in step 410, when parsing the database log stream to determine the first log item, the data format and log file format used by the log stream can be identified first. Then, the first log item can be searched from the log file of the log stream based on specific keywords or fields. For example, source data log items often contain specific fields such as "INSERT" and "UPDATE". In this embodiment, the first log item can be determined by locating these fields, which can be achieved using a command-line tool.

[0124] Of course, in other embodiments, the first log entry can also be determined in other ways, such as using regular expressions to match log entries that conform to a specific operation type, or using relevant log analysis tools. This application does not limit this.
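As a rough illustration of the keyword and regular-expression approaches mentioned above, the following sketch filters a textual log for entries that carry source data. The line format `<lsn> <OPERATION> <payload>` and the function name `find_first_log_entries` are assumptions made for this example only; actual log formats are database-specific and often binary.

```python
import re
from typing import Iterable, List

# Hypothetical text log format: "<lsn> <OPERATION> <payload>".
SOURCE_DATA_PATTERN = re.compile(
    r"^(?P<lsn>\d+)\s+(?P<op>INSERT|UPDATE)\s+(?P<payload>.+)$"
)


def find_first_log_entries(lines: Iterable[str]) -> List[dict]:
    """Keep only the log entries that carry source data (INSERT or UPDATE)."""
    matches = []
    for line in lines:
        m = SOURCE_DATA_PATTERN.match(line.strip())
        if m:
            matches.append(
                {"lsn": int(m["lsn"]), "op": m["op"], "payload": m["payload"]}
            )
    return matches


log_lines = [
    "101 INSERT temperature,location=lab value=21.5 1700000000",
    "102 DELETE temperature",
    "103 NOOP",
    "104 UPDATE temperature,location=lab value=22.0 1700000060",
]
first_entries = find_first_log_entries(log_lines)
```

Here only the entries with sequence numbers 101 and 104 are kept; the delete entry and the empty entry are skipped.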

[0125] It should be noted that databases often store a large amount of data. Therefore, in this embodiment, the number of first log entries determined from the database log stream is generally multiple. Each first log entry may include at least one source data entry. This application does not limit the specific number of first log entries determined or the number of source data entries included in each log entry.

[0126] For each piece of source data in the database, its data model can be flexibly set according to the database's needs. For example, in some embodiments, the database in this application can be a time series database (TSDB), and its data model can be represented as follows:

[0127] measurement tagpairs fieldpairs timestamp

[0128] In this context, "measurement" represents a table name in the database, used to identify a set of related data. Typically, a collection of data of the same type can use the same measurement, such as temperature data, humidity data, or CPU utilization. Each measurement can contain multiple tag pairs and field pairs. Tag pairs are key-value pairs, each consisting of a tag key (tagkey) and a tag value (tagvalue). Tag pairs describe the metadata of the data and serve as the identifiers by which the data is indexed, so related data can be quickly located based on tag pairs. Tag pairs can also be used to distinguish data from different data sources or data points within the same measurement, such as data corresponding to different device identifiers or different device types. Field pairs are also key-value pairs and store the actual data values. Field pairs are not indexed and are therefore not suitable as query conditions. "timestamp" is the timestamp corresponding to each data entry, representing the point in time at which the data was collected. Generally, time series databases automatically add a timestamp to each data point, accurate to milliseconds or microseconds; this application does not limit this.

[0129] For example, consider a temperature monitoring system whose database records temperature data from different locations and devices. Using the data model described above, the measurement can be `temperature`; `location` and `device_id` can be used as tag keys in `tagpairs`; `fieldpairs` can store the specific temperature values; and `timestamp` can record the point in time at which each temperature reading was collected. Of course, it is understood that the actual data content that can be stored in a database is diverse. The above example is only used to introduce and illustrate the data model involved in the embodiments of this application and does not imply any limitation on the actual application of this application.
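The data model above can be sketched as a small record type. The textual serialization parsed here (space-separated sections, comma-separated `key=value` pairs) and the names `Point`, `parse_pairs`, and `parse_point` are assumptions for illustration; the application does not prescribe a concrete encoding.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Point:
    measurement: str          # table name
    tags: Dict[str, str]      # indexed tag pairs
    fields: Dict[str, float]  # non-indexed field pairs holding the values
    timestamp: int            # collection time, here in milliseconds


def parse_pairs(text: str) -> Dict[str, str]:
    """Split 'k1=v1,k2=v2' into a dictionary."""
    return dict(pair.split("=", 1) for pair in text.split(","))


def parse_point(line: str) -> Point:
    """Parse 'measurement tagpairs fieldpairs timestamp' (assumed format)."""
    measurement, tagpairs, fieldpairs, timestamp = line.split()
    fields = {k: float(v) for k, v in parse_pairs(fieldpairs).items()}
    return Point(measurement, parse_pairs(tagpairs), fields, int(timestamp))


point = parse_point(
    "temperature location=lab,device_id=d42 value=21.5 1700000000000"
)
```

In this example, `location` and `device_id` end up in `point.tags`, while the temperature reading is stored in `point.fields`.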

[0130] In step 420, after determining the first log entries, the first log entries can be summarized, and a set of index data can be established based on them. In this embodiment, this is recorded as the first index data. Please refer to Figure 5, which shows a schematic diagram of the first index data provided in an embodiment of this application. In this embodiment, the index hierarchy of the first index data may include multiple levels, from higher to lower: the table name of the data table to which the source data belongs, the tag pair, the first log sequence number of the first log entry, and the first timestamp corresponding to the source data contained in the first log entry. The highest index level is the table name of the data table to which the source data belongs, followed by the tag pair to which the source data belongs (specifically including tag key and tag value). It is understood that, generally, the source data contained in a first log entry is often a small part of the data table and belongs to a certain indexable tag pair. Therefore, in this embodiment, the first log entries can be grouped at tag pair granularity based on the table name and tag pair of the data table to which the source data in each first log entry belongs.

[0131] As shown in Figure 5, in the first index data, the highest index level is the table name (measurement). Based on the table name, several table name index entries can be created, such as the index entry "table_name_1" in Figure 5. For ease of description, Figure 5 shows only the table name index entry "table_name_1". It is understood that databases often contain multiple tables, and the source data contained in different first log entries may belong to different tables. Therefore, when creating the first index data, there can be more than one table name index entry; for example, it can also include index entries such as "table_name_2" and "table_name_3." Different table name index entries correspond to different tables. In this embodiment, the number of table name index entries contained in the first index data is not limited.

[0132] Under the table name (measurement) index level, there may be sub-index entries at the tag key level. In this embodiment, the index entries at the tag key level are denoted as tag key index entries. As shown in Figure 5, under the index entry "table_name_1" there are multiple tag key index entries, namely "tag_key_a", "tag_key_b", etc. In this embodiment, the number of tag key index entries under a table name index entry can be one or more. Similarly, under the tag key index level, there can also be sub-index entries at the tag value level. In this embodiment, the index entries at the tag value level are denoted as tag value index entries. As shown in Figure 5, under the index entry "tag_key_b" there are multiple tag value index entries, namely "tag_value_b_1", "tag_value_b_2", etc. Likewise, the number of tag value index entries under a tag key index entry can be one or more, and this application does not limit this.

[0133] As described above, in this embodiment of the application, the source data contained in a first log entry is often a small part of the data table and belongs to a certain indexable tag pair. In other words, each first log entry can be assigned to a tag value index entry as a sub-index entry under it. However, the data content recorded in a first log entry is itself relatively large. To reduce the data volume of the first index data and make lookups easier, in this embodiment the log sequence number (LSN) of each log entry can be used as the identification information for indexing that log entry. In the first index data, a log entry index level can be established based on the log sequence numbers of the first log entries belonging to each tag pair. For example, as shown in Figure 5, suppose the source data in a batch of first log entries is all located in the data table named "table_name_1" and belongs to the tag key "tag_key_b" and the tag value "tag_value_b_2". The log sequence numbers of this batch of first log entries can then be queried. Assuming there are X first log entries (X is a positive integer), the determined log sequence numbers are recorded as first log sequence number_1, first log sequence number_2, first log sequence number_3, and so on, up to first log sequence number_X. These first log sequence numbers can then be used as sub-index entries under the tag value index entry "tag_value_b_2".

[0134] In this embodiment, within the first index data, timestamp data can be introduced as a sub-index item for each source data entry at the index level of the log entries. Specifically, timestamp index items can be further established at the index level of the log entries. For each first log entry, there may be multiple source data entries. Each source data entry corresponds to a timestamp data entry; therefore, at the timestamp index level, each timestamp data entry can be used as a timestamp index item. In this embodiment, the timestamp corresponding to the source data is recorded as the first timestamp, and the lowest level index level in the first index data is the first timestamp index level.

[0135] In summary, in this embodiment of the application, the first index data can be composed of the table name of the data table to which the source data belongs, the tag pair, the first log sequence number corresponding to the first log item, and the first timestamp corresponding to the source data contained in the first log item. The index hierarchy can be represented as: table name → tag pair → first log sequence number → first timestamp.
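The four-level hierarchy (table name → tag pair → first log sequence number → first timestamp) can be sketched as nested dictionaries. This is only one possible representation; the `Record` tuple layout and the function name `build_first_index` are assumptions for this example, and the tag pair level is split into tag key and tag value as in Figure 5.

```python
import bisect
from typing import Dict, List, Tuple

# One record per source data point found in a first log entry:
# (lsn, table_name, tag pairs, timestamp). All values below are made up.
Record = Tuple[int, str, Dict[str, str], int]


def build_first_index(records: List[Record]) -> dict:
    """Build: table name -> tag key -> tag value -> LSN -> sorted timestamps."""
    index: dict = {}
    for lsn, table, tags, timestamp in records:
        for tag_key, tag_value in tags.items():
            timestamps = (
                index.setdefault(table, {})
                .setdefault(tag_key, {})
                .setdefault(tag_value, {})
                .setdefault(lsn, [])
            )
            # Keep the lowest level ordered so later range checks are simple.
            bisect.insort(timestamps, timestamp)
    return index


records = [
    (1, "table_name_1", {"tag_key_a": "tag_value_a_1", "tag_key_b": "tag_value_b_2"}, 1000),
    (1, "table_name_1", {"tag_key_b": "tag_value_b_2"}, 1010),
    (2, "table_name_1", {"tag_key_b": "tag_value_b_2"}, 2000),
]
index = build_first_index(records)
```

Only the log sequence numbers and timestamps are stored in the index, not the source data itself, which keeps the index small, as the description intends.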

[0136] It should be noted that in some embodiments, the number of source data entries in a first log entry may be large. If every first timestamp is used as a timestamp index entry, indexing may become slow. For a given first log entry, the first timestamps of its source data are often clustered: for example, the source data may have been written during different time periods, so the corresponding first timestamps fall into different time ranges. Therefore, first timestamps that are close to each other can be aggregated into a single timestamp range. As shown in Figure 5, for the first log entry with log sequence number "first log sequence number_2", the first timestamps corresponding to the source data can be divided into multiple timestamp ranges, and each timestamp range can be used as a timestamp index entry. In this way, during a later lookup, the relevant first timestamps can be quickly determined based on the timestamp range, which helps to improve indexing efficiency.
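One simple way to aggregate nearby timestamps into ranges is to start a new range whenever the gap to the previous timestamp exceeds a threshold. The threshold `max_gap` and both function names are assumptions for this sketch; the application does not specify how the ranges are chosen.

```python
from typing import List, Tuple


def to_timestamp_ranges(timestamps: List[int], max_gap: int) -> List[Tuple[int, int]]:
    """Merge timestamps whose neighbours are at most max_gap apart into (start, end) ranges."""
    ranges: List[Tuple[int, int]] = []
    for ts in sorted(timestamps):
        if ranges and ts - ranges[-1][1] <= max_gap:
            # Close enough to the current range: extend its end.
            ranges[-1] = (ranges[-1][0], ts)
        else:
            # Too far away (or first timestamp): open a new range.
            ranges.append((ts, ts))
    return ranges


def range_overlaps(ts_range: Tuple[int, int], start: int, end: int) -> bool:
    """True if the indexed range intersects the queried interval [start, end]."""
    return ts_range[0] <= end and start <= ts_range[1]


ranges = to_timestamp_ranges([100, 105, 110, 500, 505, 900], max_gap=10)
# ranges == [(100, 110), (500, 505), (900, 900)]
```

A lookup then only needs to test a few ranges with `range_overlaps` instead of every individual timestamp. An overlapping range is a candidate only; individual timestamps may still need to be checked when the source data is read.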

[0137] Of course, in this embodiment of the application, the implementation shown in Figure 5 is only used to illustrate the first index data in the embodiments of this application. In practice, when creating the first index data, the index items of each index level can be flexibly set according to specific needs. This application does not impose any restrictions on this.

[0138] In this embodiment of the application, the first index data can be synchronously updated in real time when log items are generated in the log stream, so as to facilitate responding to data recovery requests at any point in time.

[0139] In step 430, in the event of accidental deletion in the database, the target object can initiate a data recovery request, thereby triggering the relevant data recovery processing flow. Here, the target object can be a database user, administrator, or maintenance personnel; in this embodiment, their identity is not specifically limited. Of course, in some scenarios, to improve database security, the target object's identity can be verified before initiating a data recovery request. Only after confirming that the target object has the authority to execute the data recovery task can the corresponding data recovery processing flow be executed. This application does not restrict the specific authentication method; it can be implemented based on existing technologies in the relevant field.

[0140] In this application, the target object can initiate a data recovery request in various ways. For example, in some embodiments, a relevant operation entry point can be set within the database management system. When a problem of accidental data deletion is discovered in the database, the target object can access the relevant interface through the operation entry point set within the database management system and click the corresponding button to initiate a data recovery request. In some embodiments, the target object can also trigger a data recovery request by sending relevant instructions to the database management system. This application does not limit the form and content of such instructions; they can be flexibly set according to actual needs.

[0141] In step 430, when a data recovery request initiated by the target object is received, the data recovery request can be parsed to determine the deletion operation specified by the target object for data recovery. In this embodiment, the deletion operation specified by the target object for data recovery is denoted as the target deletion operation. How the target deletion operation is determined depends on the specific content of the target object's data recovery request. For example, in some embodiments, the target object may know which deletion operation(s) were performed by mistake. For instance, if the target object is a database administrator and discovers that a deletion operation was performed by mistake, a data recovery request can be initiated for that deletion operation. In this case, the deletion operation to be undone is already specified when the data recovery request is initiated, so the deletion operation specified by the target object can be directly determined as the target deletion operation based on the information obtained from parsing the data recovery request.

[0142] In other embodiments, the target object may not know which deletion operation(s) were performed by mistake. For example, in some embodiments, the target object discovers a database anomaly in which some business data has been mistakenly deleted, but it is unclear which deletion operation caused the anomaly. In this case, the target object can directly specify the desired database recovery point and then initiate a data recovery request. All deletion operations between the recovery point specified by the target object and the current time point can then be identified as the target deletion operations.

[0143] It should be noted that although the technical solutions in this application are intended to achieve data recovery processing after a database has been accidentally deleted, this does not mean that the target deletion operation is equivalent to the accidental deletion operation. For example, for a data recovery request where the target object specifies that the database should be restored to a certain point in time, among all deletion operations between the specified time point to which the database should be restored and the current time point, there may be not only accidental deletion operations but also normal data deletion operations. These normal deletion operations can also be identified as the target deletion operation, and this application does not impose any restrictions on this.
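The two ways of determining the target deletion operations (explicitly named operations, or everything since a recovery point) can be sketched as follows. The `DeleteOp` record, the `history` list, and the request parameters are hypothetical; how deletion operations are tracked is not specified in the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class DeleteOp:
    op_id: int        # hypothetical identifier of the deletion operation
    executed_at: int  # time the deletion ran (epoch seconds)
    statement: str    # the deletion statement as issued


def select_target_deletions(
    history: List[DeleteOp],
    op_ids: Optional[Iterable[int]] = None,
    recovery_point: Optional[int] = None,
) -> List[DeleteOp]:
    """Return the deletions to undo for a recovery request."""
    if op_ids is not None:
        # The requester named the mistaken deletion(s) explicitly.
        wanted = set(op_ids)
        return [op for op in history if op.op_id in wanted]
    if recovery_point is not None:
        # Otherwise take every deletion since the recovery point; the history
        # only contains past operations, so "now" is the implicit upper bound.
        return [op for op in history if op.executed_at >= recovery_point]
    raise ValueError("the request must name deletions or a recovery point")


history = [
    DeleteOp(1, 1000, "drop measure cpu where time>0 and time <500"),
    DeleteOp(2, 2000, "drop measure mem where time>0 and time <500"),
    DeleteOp(3, 3000, "delete series from cpu where host=a and time>0 and time <900"),
]
explicit = select_target_deletions(history, op_ids=[2])
since_point = select_target_deletions(history, recovery_point=1500)
```

With a recovery point, both intended and accidental deletions after that point are selected, which matches the note above that target deletion operations are not necessarily accidental ones.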

[0144] In step 440, after determining the target deletion operation, corresponding recovery query information can be generated based on the target deletion operation. In this embodiment, the recovery query information can be used to match the first index data in subsequent processes to facilitate the determination of which specific first log entries need to be recovered.

[0145] Specifically, in this embodiment, when determining the recovery query information, the target deletion operation can be parsed to determine the relevant information of the source data deleted by the target deletion operation, and this relevant information can then be extracted as the recovery query information. Subsequently, when searching for the source data to be recovered, it can be quickly located based on the recovery query information. The relevant information of the deleted source data may differ slightly depending on the type of deletion operation. For example, in some embodiments, the target deletion operation may indicate the deletion of a data table; in this case, the target deletion operation will carry the table name of the data table to be deleted. In other embodiments, the target deletion operation may indicate the deletion of part of the data in a data table. Generally, the data in the data table can be indexed by tag pairs; in this case, the target deletion operation will carry the tag pairs of the data to be deleted. In this embodiment, the above-mentioned table name and tag pairs can be recorded as the attribute information of the data table. The recovery query information may include one or more items of attribute information, such as the table name or the tag pairs.

[0146] Furthermore, in some embodiments, the target deletion operation may specify which time range of data to delete; that is, the target deletion operation will correspond to time range information, for example, the time range information may be from 12:38:21 on January 1st to 13:38:21 on January 1st. In the embodiments of this application, there is no limitation on the length of the time range corresponding to the target deletion operation.

[0147] The recovery query information may include at least one of attribute information or time range information. For example, in some embodiments, the target deletion operation instructs the deletion of a certain data table, and the corresponding generated recovery query information may include the table name; in some embodiments, the target deletion operation instructs the deletion of data in a certain data table that is within a specified time range, and the corresponding generated recovery query information may include the table name and time range information; in some embodiments, the target deletion operation instructs the deletion of all data in the database that matches a certain tag pair and is within a specified time range, and the corresponding generated recovery query information may include the tag pair and time range information. This application does not impose any limitations on this.

[0148] In step 450, after determining the recovery query information, a matching target log entry can be determined in the first log entry based on the recovery query information and the first index data. Specifically, in this embodiment, the recovery query information may include at least one of the attribute information or time range information of the data table corresponding to the target deletion operation. Based on the information contained therein, matching log entry index entries can be searched in the first index data, that is, it is determined which first log sequence numbers can be matched, and then the first log entry corresponding to the matched first log sequence number is determined as the target log entry.

[0149] For example, in some embodiments, if the current recovery query information includes the table name of a data table, for a certain first log item, if its corresponding first log sequence number belongs to a sub-index item of the index item corresponding to the table name, then the first log item can be determined as the target log item; if its corresponding first log sequence number does not belong to a sub-index item of the index item corresponding to the table name, then the first log item can be determined as not belonging to the target log item.

[0150] In some embodiments, if the current recovery query information includes a tag pair of a data table, then for a given first log item, if its corresponding first log sequence number is a sub-index item of the index item corresponding to the tag value in the tag pair, the first log item can be identified as a target log item; similarly, if its corresponding first log sequence number is not a sub-index item of the index item corresponding to the tag value in the tag pair, the first log item can be identified as not being a target log item.

[0151] In some embodiments, if the current recovery query information includes the table name and time range information of the data table, then in addition to determining whether the table name matches, it is also necessary to determine whether the sub-index item (i.e., the first timestamp) of the first log sequence number corresponding to the first log item is within the time range identified by the time range information. If it is within the time range, the first log item can be identified as the target log item.
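The three matching cases above (table name, tag pair, and an additional time range) can be sketched as a single lookup over a nested index. The index layout (table name → tag key → tag value → LSN → timestamps), the function name `match_target_lsns`, and the inclusive time bounds are assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

# table name -> tag key -> tag value -> LSN -> timestamps (assumed layout)
Index = Dict[str, Dict[str, Dict[str, Dict[int, List[int]]]]]


def match_target_lsns(
    index: Index,
    table_name: Optional[str] = None,
    tag_pair: Optional[Tuple[str, str]] = None,
    time_range: Optional[Tuple[int, int]] = None,
) -> List[int]:
    """Return the first log sequence numbers matching the recovery query."""
    matched: Set[int] = set()
    for table, tag_keys in index.items():
        if table_name is not None and table != table_name:
            continue
        for key, tag_values in tag_keys.items():
            for value, lsns in tag_values.items():
                if tag_pair is not None and (key, value) != tag_pair:
                    continue
                for lsn, timestamps in lsns.items():
                    # Without a time range every LSN under the entry matches;
                    # otherwise at least one timestamp must fall inside it.
                    if time_range is None or any(
                        time_range[0] <= ts <= time_range[1] for ts in timestamps
                    ):
                        matched.add(lsn)
    return sorted(matched)


index: Index = {
    "table_name_1": {
        "tag_key_b": {
            "tag_value_b_1": {1: [100]},
            "tag_value_b_2": {2: [200, 210], 3: [900]},
        }
    },
    "table_name_2": {"tag_key_a": {"tag_value_a_1": {4: [205]}}},
}
by_table = match_target_lsns(index, table_name="table_name_1")
by_tag_and_time = match_target_lsns(
    index, tag_pair=("tag_key_b", "tag_value_b_2"), time_range=(150, 300)
)
```

The log entries identified by the returned sequence numbers are the target log entries. This sketch scans the whole index for clarity; a real implementation would descend directly into the matching index entries.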

[0152] In step 460, after the target log entry is identified, data recovery processing can be performed on the database based on the source data contained in the target log entry.

[0153] Specifically, in this embodiment, the source data contained in the target log item can be extracted and imported into the database that needs data recovery processing. This enables rapid recovery of accidentally deleted data without switching database instances, making it seamless for database users and reducing interference with database-related business operations.
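The import step can be sketched as replaying the source data of the target log entries into the live database. The in-memory `Database` dictionary, the `Point` tuple layout, and the policy of not overwriting data that currently exists are all assumptions of this example; the text does not specify how conflicts with existing data are handled.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins: a "database" keyed by (table, series, timestamp),
# and log entries keyed by LSN, each holding the points it originally wrote.
Database = Dict[Tuple[str, str, int], float]
Point = Tuple[str, str, int, float]


def restore(
    db: Database, log_entries: Dict[int, List[Point]], target_lsns: List[int]
) -> int:
    """Re-import source data from the target log entries; return points restored."""
    # Replay in log order so that, for the same key, the later write wins.
    latest: Dict[Tuple[str, str, int], float] = {}
    for lsn in sorted(target_lsns):
        for table, series, ts, value in log_entries[lsn]:
            latest[(table, series, ts)] = value
    restored = 0
    for key, value in latest.items():
        # Assumed policy: never overwrite data that exists in the database now.
        if key not in db:
            db[key] = value
            restored += 1
    return restored


db: Database = {("temperature", "lab", 300): 23.0}
log_entries = {
    7: [("temperature", "lab", 100, 21.5), ("temperature", "lab", 200, 22.0)],
    9: [("temperature", "lab", 200, 22.4)],
}
count = restore(db, log_entries, target_lsns=[9, 7])
```

Because the data is written back into the running database, no instance switch is needed, which is the point made in the paragraph above.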

[0154] In this embodiment, the mistakenly deleted data that needs to be recovered can be directly located through the target log item. Compared with the related technology that uses the full amount of log data for data recovery processing, this embodiment can greatly reduce the amount of data to be processed, which is conducive to saving implementation costs and improving processing efficiency.

[0155] It is understood that the database data recovery method provided in this application embodiment parses the database log stream to determine the first log item containing the source data of the database. Then, it sequentially establishes first index data using the table name of the data table to which the source data belongs, the tag pair, the first log sequence number of the first log item, and the first timestamp corresponding to the source data contained in the first log item as different index levels. Next, it determines the target deletion operation to be recovered based on the data recovery request of the target object, and then generates corresponding recovery query information. Based on the recovery query information, it performs a matching query in the first index data, which can easily determine the matching target log item. Therefore, the source data in the target log item can be used to perform data recovery processing on the database. This method establishes first index data based on the database log stream, which facilitates finding the mistakenly deleted source data according to the target deletion operation specified by the target object during subsequent data recovery processing. This allows for accurate data recovery processing, reduces the amount of data to be processed, and improves processing efficiency. Furthermore, performing data recovery processing locally on the database reduces the probability of database anomalies and improves the user experience.

[0156] Specifically, in some embodiments, determining recovery query information based on the target deletion operation includes:

[0157] Detect the deletion type of the target deletion operation;

[0158] If the deletion type of the target deletion operation is table deletion, the first recovery query information is generated based on the table name and time range information of the data table corresponding to the target deletion operation;

[0159] If the deletion type of the target deletion operation is timeline deletion, the second recovery query information is generated based on the tag pairs and time range information of the data table corresponding to the target deletion operation.

[0160] In this embodiment of the application, when determining recovery query information based on the target deletion operation, the deletion type of the target deletion operation can be detected in some scenarios. For example, for time-series databases, commonly used deletion operations can be divided into two categories: table deletion and timeline deletion. These two deletion types will be described and explained below.

[0161] In time-series databases, a table deletion operation can be used to delete data within a specified time range from a given table. For example, a time-series database might be used for data monitoring. When monitoring data from a particular data source is no longer needed, a table deletion operation can be performed on a table belonging to that data source. Specifically, the information corresponding to a table deletion operation can be represented as follows:

[0162] drop measure xxx where time>t1 and time <t2

[0163] The above information means that data in a data table with a time range between t1 and t2 should be deleted.

[0164] Timeline deletion operations can be used to delete data within a specified time range from the database that matches the corresponding tag pairs. Unlike table deletion operations, timeline deletion operations are more flexible and can pinpoint unwanted data points based on actual needs. Specifically, the information corresponding to a timeline deletion operation can be represented as follows:

[0165] delete series from xxx where tagkey1=tagvalue1 and tagkey2=tagvalue2 and

[0166] time>t3 and time <t4

[0167] The above information means that data matching tag pairs (tagkey1, tagvalue1) and tag pairs (tagkey2, tagvalue2) in the database, and whose time range is between t3 and t4, will be deleted.

[0168] Based on the above description of deletion types, in this embodiment, when determining recovery query information, corresponding information can be extracted as recovery query information according to the deletion type of the target deletion operation. For example, if the deletion type of the target deletion operation is table deletion, recovery query information can be generated based on the table name and time range information of the data table corresponding to the target deletion operation. In this embodiment, this is recorded as the first recovery query information. Similarly, if the deletion type of the target deletion operation is timeline deletion, recovery query information can be generated based on the tag pairs and time range information of the data table corresponding to the target deletion operation. In this embodiment, this is recorded as the second recovery query information.
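Detecting the deletion type and extracting the corresponding recovery query information can be sketched with two regular expressions that accept the statement forms shown above. The parser below, the function name `build_recovery_query`, the dictionary layout of the result, and the use of integer timestamps are assumptions for illustration; it is not a full query-language parser.

```python
import re

TABLE_DELETE = re.compile(
    r"^drop\s+measure\s+(?P<table>\w+)\s+where\s+(?P<cond>.+)$", re.IGNORECASE
)
SERIES_DELETE = re.compile(
    r"^delete\s+series\s+from\s+(?P<table>\w+)\s+where\s+(?P<cond>.+)$", re.IGNORECASE
)
CONDITION = re.compile(r"(\w+)\s*(=|>|<)\s*(\w+)")


def build_recovery_query(statement: str) -> dict:
    """Detect the deletion type and extract the recovery query information."""
    statement = statement.strip()
    table_match = TABLE_DELETE.match(statement)
    m = table_match or SERIES_DELETE.match(statement)
    if m is None:
        raise ValueError(f"unsupported deletion statement: {statement!r}")
    tags, start, end = {}, None, None
    for name, op, value in CONDITION.findall(m["cond"]):
        if name == "time" and op == ">":
            start = int(value)
        elif name == "time" and op == "<":
            end = int(value)
        elif op == "=":
            tags[name] = value
    if table_match:
        # Table deletion: first recovery query information.
        return {"type": "table", "table_name": m["table"], "time_range": (start, end)}
    # Timeline deletion: second recovery query information.
    return {"type": "timeline", "tag_pairs": tags, "time_range": (start, end)}


q1 = build_recovery_query("drop measure cpu where time>100 and time <200")
q2 = build_recovery_query(
    "delete series from cpu where host=a and region=eu and time>300 and time <400"
)
```

Following the description, the second recovery query information carries the tag pairs and time range only; the name after `from` is parsed but not included.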

[0169] It is understood that, in this embodiment of the application, detecting the deletion type of the target deletion operation helps to quickly locate the information within the target deletion operation that specifies what was deleted, so that the corresponding recovery query information can be generated efficiently, which improves the efficiency and accuracy of data recovery.

[0170] In some embodiments, determining a matching target log entry in a first log entry based on recovery query information and first index data includes:

[0171] Using the table name and time range information in the first recovery query information as index conditions, determine the first target sequence number matching in the first log sequence number of the first index data; or, using the tag pair and time range information in the second recovery query information as index conditions, determine the first target sequence number matching in the first log sequence number of the first index data.

[0172] The first log entry corresponding to the first target sequence number is identified as the matching target log entry.

[0173] In this embodiment of the application, when determining the matching target log item from the first log item based on the recovery query information and the first index data, specifically, the matching index item can be found in the first index data based on the information in the recovery query information, and the first log sequence number of these index items at the log item index level can be determined, thereby determining the matching target log item.

[0174] Specifically, as mentioned in the previous embodiments, the recovery query information can be the first recovery query information, which includes the table name and time range information. In this embodiment, the table name and time range information can be used as index conditions to index the first index data. For example, the relevant table name can be indexed in the first index data first, based on the table name, to determine the first index data matching at the table name index level. Then, based on the time range information, it can be determined whether the first timestamp of each first log sequence number's index level in the matching first index data is within the time range specified by the time range information. If it is, the first log sequence number can be determined as the matching log sequence number. In this embodiment, it can be recorded as the first target sequence number.

[0175] In other embodiments, the recovery query information can also be the second recovery query information, which includes tag pairs and time range information. In this embodiment, the tag pairs and time range information can be used as index conditions to look up the first index data. Similarly, the relevant tag pairs can first be looked up in the first index data to determine the first index data that matches at the tag pair index level. Then, based on the time range information, it can be determined whether the first timestamp under each first log sequence number in the matched first index data falls within the specified time range. If it does, that first log sequence number can be determined as the matched log sequence number, i.e., the first target sequence number.

[0176] After determining the first target sequence number, the first log entry corresponding to the first target sequence number can be identified as the matching target log entry. In this embodiment, there is no limitation on the number of target log entries or the actual matching method corresponding to each target log entry.
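
The two lookups described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the first index data is modeled as nested dictionaries (table name, then tag pair, then first log sequence number, then a list of first timestamps), the time bounds are treated as exclusive, and a log entry is treated as a candidate for a timeline deletion only when it is indexed under every requested tag pair. None of these choices is prescribed by this application.

```python
from typing import Dict, List, Set, Tuple

TagPair = Tuple[str, str]
# Assumed shape: table name -> tag pair -> first log sequence number -> first timestamps
FirstIndex = Dict[str, Dict[TagPair, Dict[int, List[int]]]]


def match_by_table(index: FirstIndex, table: str, start: int, end: int) -> Set[int]:
    """First recovery query information: table name + time range."""
    hits: Set[int] = set()
    for lsn_level in index.get(table, {}).values():
        for lsn, timestamps in lsn_level.items():
            if any(start < ts < end for ts in timestamps):
                hits.add(lsn)
    return hits


def match_by_tags(index: FirstIndex, tags: Set[TagPair], start: int, end: int) -> Set[int]:
    """Second recovery query information: tag pairs + time range."""
    result = None
    for tag_pair in tags:
        hits: Set[int] = set()
        for tag_level in index.values():
            for lsn, timestamps in tag_level.get(tag_pair, {}).items():
                if any(start < ts < end for ts in timestamps):
                    hits.add(lsn)
        # Keep only sequence numbers that matched every tag pair so far.
        result = hits if result is None else result & hits
    return result or set()
```

The returned sets play the role of the first target sequence numbers; the log entries carrying those sequence numbers would then be read as the target log entries.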

[0177] Specifically, in some embodiments, the database data recovery method provided in this application further includes:

[0178] Parse the database log stream to determine the second log entry corresponding to the historical deletion operation;

[0179] Based on the second log entry, a second index data is created; wherein, the index level of the second index data is, in order, the second log sequence number of the second log entry and the second timestamp corresponding to the historical deletion operation in the second log entry.

[0180] In this embodiment of the application, in order to facilitate the data recovery of the database, in some scenarios, index data for each deletion operation can also be established, and these index data can be recorded as the second index data.

[0181] Specifically, in this embodiment, when parsing the database log stream, the log entries corresponding to previous deletion operations can be determined. In this embodiment, deletion operations already performed in the database are recorded as historical deletion operations, and the log entries corresponding to these historical deletion operations are recorded as second log entries. After determining each second log entry, second index data can be built based on the second log entries. In this embodiment, the second index data may include two index levels. The higher index level is the log entry index level, used to index the corresponding log entries. For the second log entry, its corresponding log sequence number can be used as the information of the log entry index level in the second index data. In this embodiment, it is recorded as the second log sequence number. The lower index level is the timestamp corresponding to the historical deletion operation. In this embodiment, it is recorded as the second timestamp.

[0182] In summary, in this embodiment of the application, the second index data can be composed of the second log sequence number corresponding to each second log item and the second timestamp corresponding to the historical deletion operation recorded therein. The index hierarchy can be represented as: second log sequence number → second timestamp.
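
As a minimal sketch of the hierarchy second log sequence number → second timestamp, the following Python code builds second index data from parsed log entries. The `LogEntry` type, its field names, and the `"delete"` kind label are hypothetical and only stand in for whatever the log parser actually produces.

```python
from typing import Dict, Iterable, NamedTuple


class LogEntry(NamedTuple):
    """Hypothetical parsed log entry; field names are assumptions."""
    lsn: int        # log sequence number of the entry
    kind: str       # e.g. "delete", "source_data", "metadata", "empty"
    timestamp: int  # time of the operation recorded in the entry


def build_second_index(entries: Iterable[LogEntry]) -> Dict[int, int]:
    """Map second log sequence number -> second timestamp for deletion entries only."""
    return {e.lsn: e.timestamp for e in entries if e.kind == "delete"}
```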

[0183] Specifically, in some embodiments, in response to a data recovery request from a target object for a database, determining the target deletion operation specified by the target object for data recovery includes:

[0184] In response to a data recovery request from the target object for the database, determine the historical time period specified by the target object for data recovery;

[0185] Query the second timestamp that belongs to the historical time period in the second index data, and determine the second target sequence number by the second log sequence number corresponding to the second timestamp that belongs to the historical time period.

[0186] The historical deletion operation corresponding to the second target sequence number is identified as the target deletion operation for data recovery.

[0187] In this embodiment of the application, after the second index data is established, when determining the target deletion operation based on the target object's data recovery request for the database, the matching determination can be performed based on the second index data.

[0188] Specifically, as described above, in some embodiments, the target object discovers an anomaly in the database, with some business data being mistakenly deleted, but it is unclear which deletion operation caused the anomaly. In this case, the target object can directly specify the desired database recovery point in time and then initiate a data recovery request. In this embodiment, for this situation, the historical time period requiring data recovery can be determined based on the target object's specified recovery point in time. This historical time period is the time interval between the target object's specified database recovery point in time and the current time.

[0189] Referring to Figure 6, which shows a schematic diagram of determining a target deletion operation provided in an embodiment of this application. In this embodiment, after the historical time period specified in the data recovery request is determined, it can be matched against the established second index data, and the second timestamps belonging to the historical time period are queried from the second index data. The second log sequence numbers corresponding to those second timestamps can then be determined as the matching sequence numbers, which in this embodiment are recorded as the second target sequence numbers.

[0190] It is understood that the historical deletion operation in the second log entry identified by the second target sequence number belongs to the target deletion operation within the historical time period. Therefore, in this embodiment, the historical deletion operation corresponding to the second target sequence number can be determined as the target deletion operation for data recovery specified by the target object. Here, the number of determined second target sequence numbers can be one or more, and this application does not limit this.

[0191] It is understood that, in this embodiment of the application, by establishing second index data, it is convenient to respond to data recovery requests initiated by the target object, quickly determine the target deletion operation specified by the target object, and improve the processing efficiency of data recovery.
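
The query against the second index data can be illustrated with the short Python sketch below. It assumes the second index data is a mapping from second log sequence number to second timestamp and that the historical time period is the closed interval from the specified recovery point to the current time; the inclusive bounds and the function name are assumptions made for this example.

```python
from typing import Dict, List


def find_target_deletions(second_index: Dict[int, int],
                          recovery_point: int, now: int) -> List[int]:
    """Return the second target sequence numbers whose second timestamp
    lies within the historical time period [recovery_point, now]."""
    return sorted(
        lsn for lsn, ts in second_index.items()
        if recovery_point <= ts <= now
    )
```

Each returned sequence number identifies one historical deletion operation to be treated as a target deletion operation; the list may contain one or more entries, or none if no deletion falls within the period.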

[0192] In some embodiments, after the step of building the first index data based on the first log entry, the method further includes:

[0193] The frequency of occurrence of each field in the first index data is detected, and the fields whose frequency of occurrence is greater than a preset threshold are identified as target fields.

[0194] Different identifier information is assigned to each target field, and the association between the target field and the identifier information is stored; wherein, the number of characters in the identifier information is less than the number of characters in the target field;

[0195] The target field is replaced by the identification information to obtain the updated first index data.

[0196] In some scenarios, the first index data can be encoded using dictionary-based encoding. Specifically, it is understood that since the first index data needs to store a large amount of information such as table names and tag pairs, storing it as usual may consume a lot of storage resources.

[0197] In this embodiment, the frequency of occurrence of each field in the first index data can be detected, and a threshold for the frequency of occurrence can be preset. For each field, if its frequency of occurrence is greater than the preset threshold, it can be identified as a target field; conversely, if its frequency of occurrence is less than or equal to the preset threshold, it can be excluded from being identified as a target field.

[0198] Next, different identifier information can be assigned to each target field. This identifier information can be abbreviated characters or numbers, without actual meaning, and the number of characters must be less than the number of characters in the target field. This application does not restrict its specific content and form. Then, the association between each target field and the assigned identifier information can be stored. After storing the association, the target fields can be replaced using the identifier information to obtain the updated first index data. In this way, fields that appear frequently in the first index data can be dictionary-encoded, and the original content can be replaced with shorter identifier information.

[0199] It is understood that, in this embodiment of the application, by performing dictionary-based encoding on the first index data, the amount of first index data that needs to be stored can be effectively reduced without affecting the normal use of the first index data, thereby reducing the cost of storage resources.
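
The dictionary-based encoding described above can be sketched as follows. This is an illustrative Python example, not the encoding used by this application: the identifier scheme (`#0`, `#1`, ...) is hypothetical, and the sketch assumes that no original field value already has that form, since otherwise decoding would be ambiguous.

```python
from collections import Counter
from typing import Dict, List, Tuple


def dictionary_encode(fields: List[str], threshold: int) -> Tuple[List[str], Dict[str, str]]:
    """Replace fields occurring more than `threshold` times with shorter identifiers."""
    counts = Counter(fields)
    mapping: Dict[str, str] = {}  # target field -> identifier information
    for field, count in counts.items():
        if count <= threshold:
            continue  # not frequent enough to be a target field
        identifier = f"#{len(mapping)}"  # hypothetical identifier scheme
        # Encode only when the identifier is strictly shorter than the field.
        if len(identifier) < len(field):
            mapping[field] = identifier
    return [mapping.get(f, f) for f in fields], mapping


def dictionary_decode(encoded: List[str], mapping: Dict[str, str]) -> List[str]:
    """Restore the original fields from the stored association."""
    reverse = {ident: field for field, ident in mapping.items()}
    return [reverse.get(f, f) for f in encoded]
```

Storing the mapping alongside the encoded index is what keeps the index usable: a lookup first translates the query's table name or tag pair through the same mapping and then compares identifiers.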

[0200] Specifically, in some embodiments, data recovery processing of the database is performed based on the source data contained in the target log item, including:

[0201] The source data contained in the target log item is merged to obtain the target data;

[0202] Based on the target data, perform data recovery processing on the database.

[0203] In this embodiment of the application, when using source data in the target log entries to perform data recovery processing on the database, in some cases, each target log entry may be determined from different log files. The source data may be recorded repeatedly in different log files. Therefore, before importing into the database, the source data contained in the target log entries can be merged to obtain merged source data, which is recorded as the target data. Then, the target data is used to perform data recovery processing on the database.

[0204] Specifically, in some embodiments, the source data contained in the target log item is merged, including:

[0205] Detect the table name, tag pair, indicator value tag, and first timestamp of the data table to which each source data belongs;

[0206] If multiple pieces of first data have the same table name, tag pair, indicator value tag, and first timestamp, detect the first log sequence number of the target log item in which each piece of first data is located; wherein the first data can be any source data;

[0207] Compare the first log sequence numbers corresponding to the pieces of first data;

[0208] The first data corresponding to the largest first log sequence number is retained, and the other first data are discarded.

[0209] In this embodiment of the application, when merging the source data contained in the target log item, the table name, tag pair, indicator value tag and first timestamp of the data table to which each source data belongs can be detected. If these contents are all the same, it means that they are the same source data and they can be deduplicated.

[0210] Specifically, during deduplication, data whose table name, tag pair, indicator value tag, and first timestamp are all the same can be recorded as first data. The first log sequence number of the target log item to which each piece of first data belongs is determined. These first log sequence numbers are compared, and only the first data with the largest first log sequence number is retained. The other first data are deleted.

[0211] It should be noted that, in this embodiment, identical first data can be grouped together. When processing the first data in each group, only the first data with the largest corresponding first log sequence number is retained, and other first data within the group are deleted. This effectively improves the efficiency of data import and reduces redundant processing operations.
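
The grouping and deduplication described above can be sketched in Python as follows. The `Row` type and its field names are assumptions introduced for illustration; the only behavior taken from the description is that rows sharing table name, tag pair, indicator value tag, and first timestamp form one group, and that the row with the largest first log sequence number in each group is retained.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Row(NamedTuple):
    """Hypothetical source data record extracted from a target log entry."""
    table: str
    tags: Tuple[Tuple[str, str], ...]  # tag pairs, in a canonical (sorted) order
    field: str                         # indicator value tag
    timestamp: int                     # first timestamp
    value: float
    lsn: int                           # first log sequence number of the containing entry


def merge_rows(rows: Iterable[Row]) -> List[Row]:
    """Keep, for each identical key, only the row with the largest log sequence number."""
    latest: Dict[tuple, Row] = {}
    for row in rows:
        key = (row.table, row.tags, row.field, row.timestamp)
        kept = latest.get(key)
        if kept is None or row.lsn > kept.lsn:
            latest[key] = row
    return list(latest.values())
```

A single pass with a dictionary keyed on the four identifying attributes performs the grouping and the comparison together, so no separate sort of the source data is needed.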

[0212] Specifically, in some embodiments, data recovery processing of the database is performed based on the source data contained in the target log item, including:

[0213] Based on the recovery query information, a query is performed in the corresponding cold backup database to obtain the second data;

[0214] The source data and the second data are merged to obtain the target data;

[0215] Based on the target data, perform data recovery processing on the database.

[0216] In this embodiment, in some scenarios, the data involved in accidental deletion operations may include not only the data in the log stream but also the data in the cold backup. In this case, when performing data recovery on the database, using only the log stream data still presents the problem of missing data. To address this, in this embodiment, during data recovery, a query can be performed in the corresponding cold backup based on the determined recovery query information to obtain the accidentally deleted data existing in the cold backup. In this embodiment, this is recorded as the second data. Specifically, the information in the recovery query information can be used as the query content to search for corresponding data in the cold backup, thereby determining the second data. Next, the source data and the second data can be merged to obtain the target data. The specific implementation method for merging can be referred to the aforementioned embodiments and will not be elaborated here.

[0217] It is understood that, in this embodiment of the application, the comprehensive use of log stream and data from cold backup for data recovery processing can reduce the probability of data loss problems, facilitate the accurate and complete recovery of accidentally deleted data, and improve the stability and reliability of database operation.
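
One possible way to combine the two data sources is sketched below in Python. The description does not state how a conflict between the cold backup and the log stream is resolved; this sketch assumes, purely for illustration, that data recovered from the log stream is newer than the cold backup and therefore takes precedence. The key layout and function name are likewise hypothetical.

```python
from typing import Dict, Tuple

# Assumed key: (table name, tag pairs, indicator value tag, timestamp)
Key = Tuple[str, Tuple[Tuple[str, str], ...], str, int]


def merge_with_cold_backup(
    log_rows: Dict[Key, Tuple[float, int]],  # key -> (value, first log sequence number)
    backup_rows: Dict[Key, float],           # key -> value found in the cold backup
) -> Dict[Key, float]:
    """Merge source data from the log stream with second data from the cold backup."""
    target = dict(backup_rows)
    for key, (value, _lsn) in log_rows.items():
        # Assumption: the log stream reflects writes made after the backup was taken.
        target[key] = value
    return target
```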

[0218] In some embodiments, data recovery processing of the database includes:

[0219] Obtain the time interval information corresponding to each data shard in the database;

[0220] Based on the time interval information, the target data is divided to obtain the data subsets corresponding to each data shard;

[0221] According to the corresponding time interval information, each data subset is imported into the data shards of the database.

[0222] In this embodiment, the data in the database is generally stored in shards. During data recovery, the target data can be imported into each data shard.

[0223] Specifically, each data shard has its own corresponding time interval information. Different data shards have different time interval information, and the combined time interval information of all data shards is generally continuous. When importing target data into a data shard, the target data can be divided according to the first timestamp of each source data point. Based on the time interval to which the first timestamp belongs, each source data point is divided into a data subset corresponding to the data shard. Then, each data subset is imported into the database's data shard, thereby achieving data recovery.

[0224] Specifically, it should be noted that in some embodiments, different data shards may use memory tables with different formats, which may prevent direct writing during import. Therefore, in this embodiment, the format information of the memory table used by the data shard can be queried, and the format of the data subset corresponding to the data shard can be converted according to the format information. Then, the format-converted data subset can be imported into the corresponding data shard. In this way, data format adaptation can be achieved, making data recovery convenient, efficient, and fast.
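
The division of target data by shard time interval can be illustrated with the following Python sketch. It assumes that each data shard covers a half-open interval [start, end), that the intervals are sorted and non-overlapping, and that a record is a `(timestamp, payload)` pair; these conventions and the function name are assumptions of this example. Format conversion to each shard's memory table is omitted.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Record = Tuple[int, object]  # (first timestamp, payload)


def split_by_shard(rows: List[Record],
                   shards: List[Tuple[int, int]]) -> Dict[int, List[Record]]:
    """Assign each record to the shard whose [start, end) interval holds its timestamp."""
    starts = [start for start, _ in shards]
    subsets: Dict[int, List[Record]] = {i: [] for i in range(len(shards))}
    for row in rows:
        ts = row[0]
        i = bisect_right(starts, ts) - 1  # last shard starting at or before ts
        if i >= 0 and ts < shards[i][1]:
            subsets[i].append(row)
        else:
            raise ValueError(f"no shard covers timestamp {ts}")
    return subsets
```

Because the shard intervals are generally continuous, a binary search over the interval start times locates the owning shard without scanning every shard for every record.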

[0225] The following describes and explains a database data recovery method provided in this application embodiment, with reference to specific application examples.

[0226] Referring to Figure 7, which shows a database architecture provided in an embodiment of this application, in which the data recovery method provided in the embodiments of this application is applied. As shown in Figure 7, the architecture mainly includes a log export module, a data recovery module, and a data import module. The log export module comprises two parts: a log summary generation module and a log stream import module. The log summary generation module generates log summaries, which make it possible to quickly filter data and identify the target data to be recovered during data recovery. The log stream import module is primarily responsible for importing the log streams generated by the database into object storage. The object storage holds the database's cold backups, log streams, and log summaries.

[0227] In Figure 7, the data recovery module responds to data recovery requests based on the cold backups, log streams, and log summaries. It determines the target data to be recovered by executing the method described in this embodiment, and generates data subsets in the format of the corresponding in-memory tables for the different data shards in the database. These data subsets are then forwarded to the data import module, which is responsible for executing the data recovery and import tasks according to the in-memory table format of each data shard. The entire process is transparent to database users, and the recovery is efficient and accurate.

[0228] Referring to Figure 8, which illustrates the principle of the log summary generation module provided in an embodiment of this application. In this embodiment, the log summary generation module is primarily responsible for parsing the log stream and generating the corresponding log summary. As shown in Figure 8, the database log stream contains multiple log files, such as log file 1, log file 2, ..., log file N (N is a positive integer). Each log file includes a batch of log entries, which may be of the following types: deletion log entries, source data log entries, metadata operation log entries, empty log entries, etc. In this embodiment, the log summary generation module mainly focuses on deletion log entries and source data log entries.

[0229] To minimize implementation costs, this embodiment generates a log summary of the log stream using the log summary generation module. The log summary comprises three parts: a dictionary, an inverted index (i.e., the first index data), and a deletion index (i.e., the second index data). The inverted index is generated from the source data log entries and includes the table name of the data table to which the source data belongs, the tag pair, the log sequence number of the source data log entry (i.e., the first log sequence number), and the timestamp corresponding to the source data contained in the source data log entry (i.e., the first timestamp). Its structure is shown in Figure 5 and is not elaborated here. The deletion index includes the log sequence number of the deletion log entry (i.e., the second log sequence number) and the timestamp corresponding to the deletion operation (i.e., the second timestamp). Furthermore, the inverted index is dictionary-encoded, with the dictionary storing the association between the encoded fields and their identifier information, thus reducing storage costs.

[0230] When the database architecture shown in Figure 7 is used for data recovery processing, the historical time period of the requested recovery can be obtained. For each log file in the database, the deletion operations falling within that time period can first be found in the deletion index, and the recovery query information is determined from the relevant information in those deletion operations. Then, the first log sequence numbers for the corresponding time period are found in the inverted index to determine the first log entries. For the first log entries found in each log file, a corresponding data recovery table can be built in memory based on the source data they contain. Referring to Figure 9, which illustrates a data recovery table provided in an embodiment of this application: in the data recovery table, data is stored in the format of table name, tag pairs, data values, and timestamps, and "null" in the table represents an empty value.

[0231] In this embodiment, a cold backup can also be used as a data source and queried to determine whether it contains any mistakenly deleted data. This data is combined with the data determined from the log stream to obtain the target data. The target data can then be divided among the different data shards according to timestamps. Referring to Figure 10, which shows a schematic diagram of importing data into different data shards provided in an embodiment of this application. As shown in Figure 10, for data in the same data table, the data shard to which each data point belongs can be determined from its timestamp, and then the import operation can be performed. In this way, the data can be accurately restored.

[0232] In this embodiment, the data import module can import the target data into the memory table corresponding to each data shard. Generally, the storage engine used for data shards needs to write data during memory table generation. In this embodiment, a storage engine that supports direct insertion into the memory table can be selected. Referring to Figure 11, which shows a schematic diagram of importing data into a memory table provided in an embodiment of this application. With a storage engine that supports direct insertion into memory tables, the data can be processed directly into a memory table and inserted into the memory table list corresponding to the data shard, thereby achieving rapid data recovery. When a memory table is full, it is persisted as a disk file.

[0233] Referring to Figure 12, an embodiment of this application also provides a database data recovery system, which includes:

[0234] The parsing unit 1210 is used to parse the log stream of the database and determine the first log item containing the source data of the database.

[0235] Establishment unit 1220 is used to establish first index data based on the first log item; wherein, the index hierarchy of the first index data is, in order, the table name of the data table to which the source data belongs, the tag pair, the first log sequence number of the first log item, and the first timestamp corresponding to the source data contained in the first log item;

[0236] The response unit 1230 is used to respond to the data recovery request for the database from the target object and determine the target deletion operation specified by the target object for data recovery;

[0237] The execution unit 1240 is used to determine recovery query information based on the target deletion operation; wherein, the recovery query information includes at least one of the attribute information or time range information of the data table corresponding to the target deletion operation, and the attribute information includes table name or label pair;

[0238] Matching unit 1250 is used to determine the target log entry to be matched in the first log entry based on the recovery query information and the first index data.

[0239] The processing unit 1260 is used to perform data recovery processing on the database based on the source data contained in the target log item.

[0240] Optionally, in some embodiments, the execution unit is specifically used for:

[0241] Detect the deletion type of the target deletion operation;

[0242] If the deletion type of the target deletion operation is table deletion, the first recovery query information is generated based on the table name and time range information of the data table corresponding to the target deletion operation;

[0243] If the deletion type of the target deletion operation is timeline deletion, the second recovery query information is generated based on the tag pairs and time range information of the data table corresponding to the target deletion operation.

[0244] Optionally, in some embodiments, the matching unit is specifically used for:

[0245] Using the table name and time range information in the first recovery query information as index conditions, determine the first target sequence number matching in the first log sequence number of the first index data; or, using the tag pair and time range information in the second recovery query information as index conditions, determine the first target sequence number matching in the first log sequence number of the first index data.

[0246] The first log entry corresponding to the first target sequence number is identified as the matching target log entry.

[0247] Optionally, in some embodiments, the database data recovery system further includes a second establishment unit, which is specifically used for:

[0248] Parse the database log stream to determine the second log entry corresponding to the historical deletion operation;

[0249] Based on the second log entry, a second index data is created; wherein, the index level of the second index data is, in order, the second log sequence number of the second log entry and the second timestamp corresponding to the historical deletion operation in the second log entry.

[0250] Optionally, in some embodiments, the response unit is specifically used for:

[0251] In response to a data recovery request from the target object for the database, determine the historical time period specified by the target object for data recovery;

[0252] Query the second timestamp that belongs to the historical time period in the second index data, and determine the second target sequence number by the second log sequence number corresponding to the second timestamp that belongs to the historical time period.

[0253] The historical deletion operation corresponding to the second target sequence number is identified as the target deletion operation for data recovery.

[0254] Optionally, in some embodiments, the database data recovery system further includes an encoding unit, which is specifically used for:

[0255] The frequency of occurrence of each field in the first index data is detected, and the fields whose frequency of occurrence is greater than a preset threshold are identified as target fields.

[0256] Different identifier information is assigned to each target field, and the association between the target field and the identifier information is stored; wherein, the number of characters in the identifier information is less than the number of characters in the target field;

[0257] The target field is replaced by the identification information to obtain the updated first index data.

[0258] Optionally, in some embodiments, the processing unit is specifically used for:

[0259] The source data contained in the target log item is merged to obtain the target data;

[0260] Based on the target data, perform data recovery processing on the database.

[0261] Optionally, in some embodiments, the processing unit is specifically used for:

[0262] Detect the table name, tag pair, indicator value tag, and first timestamp of the data table to which each source data belongs;

[0263] If multiple pieces of first data have the same table name, tag pair, indicator value tag, and first timestamp, detect the first log sequence number of the target log item in which each piece of first data is located; wherein the first data can be any source data;

[0264] Compare the first log sequence numbers corresponding to the pieces of first data;

[0265] The first data corresponding to the largest first log sequence number is retained, and the other first data are discarded.

[0266] Optionally, in some embodiments, the processing unit is specifically used for:

[0267] Based on the recovery query information, a query is performed in the corresponding cold backup database to obtain the second data;

[0268] The source data and the second data are merged to obtain the target data;

[0269] Based on the target data, perform data recovery processing on the database.

[0270] Optionally, in some embodiments, the processing unit is specifically used for:

[0271] Obtain the time interval information corresponding to each data shard in the database;

[0272] Based on the time interval information, the target data is divided to obtain the data subsets corresponding to each data shard;

[0273] According to the corresponding time interval information, each data subset is imported into the data shards of the database.

[0274] Optionally, in some embodiments, the processing unit is specifically used for:

[0275] Query the format information of the memory table used by each data shard;

[0276] Based on the format information, the data subsets corresponding to the data shards are converted in format;

[0277] Import the data subset obtained after format conversion into the corresponding data shard.

[0278] It can be understood that the content of the data recovery method embodiment shown in Figure 4 is applicable to this data recovery system embodiment. The specific functions implemented by this data recovery system embodiment are the same as those of the data recovery method embodiment shown in Figure 4, and the beneficial effects achieved are also the same as those achieved by the data recovery method embodiment shown in Figure 4.

[0279] This application also discloses an electronic device, including:

[0280] At least one processor;

[0281] At least one memory for storing at least one program;

[0282] When the at least one program is executed by the at least one processor, the at least one processor implements the database data recovery method shown in Figure 4.

[0283] The electronic device in the embodiments of this application may be a terminal device, a computer device, or a server device.

[0284] For example, refer to Figure 13, which is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. Taking a terminal device as an example, in Figure 13 the electronic device 1300 may include an RF (Radio Frequency) circuit 1310, a memory 1320 including one or more computer-readable storage media, an input unit 1330, a display unit 1340, a sensor 1350, an audio circuit 1360, a short-range wireless transmission module 1370, a processor 1380 including one or more processing cores, and a power supply 1390, among other components. Those skilled in the art will understand that the device structure shown in Figure 13 does not constitute a limitation on the terminal device, which may include more or fewer components than shown, combine certain components, or have a different component arrangement.

[0285] RF circuit 1310 can be used for receiving and transmitting signals during information transmission or calls. Specifically, it receives downlink information from the base station and hands it over to one or more processors 1380 for processing; additionally, it transmits uplink data to the base station. Typically, RF circuit 1310 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a SIM card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, etc. Furthermore, RF circuit 1310 can also communicate wirelessly with networks and other devices. Wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Messaging Service), etc.

[0286] Memory 1320 can be used to store software programs and modules (or units). Processor 1380 executes various functional applications and data processing by running the software programs and modules (or units) stored in memory 1320. Memory 1320 may primarily include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required for at least one function (such as a sound playback function, an image playback function, etc.); the data storage area may store data created based on the use of electronic device 1300 (such as audio data, a telephone directory, etc.). Furthermore, memory 1320 may include high-speed random access memory and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, memory 1320 may also include a memory controller to provide access to memory 1320 for processor 1380 and input unit 1330. Although Figure 13 shows the RF circuit 1310, it is understood that the RF circuit 1310 is not a necessary component of the electronic device 1300 and can be omitted as needed without changing the nature of the invention.

[0287] The input unit 1330 can be used to receive input digital or character information, and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to object settings and function control. Specifically, the input unit 1330 may include a touch-sensitive surface 1331 and other input devices 1332. The touch-sensitive surface 1331, also known as a touch display screen or touchpad, can collect touch operations on or near the object (such as operations performed by the object using a finger, stylus, or any suitable object or accessory on or near the touch-sensitive surface 1331), and drive the corresponding connection device according to a pre-set program. Optionally, the touch-sensitive surface 1331 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the object and the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, sends it to the processor 1380, and can receive and execute instructions from the processor 1380. In addition, the touch-sensitive surface 1331 can be implemented using various types such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch-sensitive surface 1331, the input unit 1330 may also include other input devices 1332. Specifically, other input devices 1332 may include, but are not limited to, one or more of the following: a physical keyboard, function keys (such as volume control buttons, power buttons, etc.), a trackball, a mouse, and a joystick.

[0288] Display unit 1340 can be used to display information input by an object or information provided to an object, as well as various graphical object interfaces for controlling electronic device 1300. These graphical object interfaces can be composed of graphics, text, icons, video, and any combination thereof. Display unit 1340 may include display panel 1341, optionally configured as an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, etc. Further, touch-sensitive surface 1331 may cover display panel 1341. When touch-sensitive surface 1331 detects a touch operation on or near it, it transmits the information to processor 1380 to determine the type of touch event. Subsequently, processor 1380 provides corresponding visual output on display panel 1341 according to the type of touch event. Although in Figure 13 the touch-sensitive surface 1331 and the display panel 1341 are implemented as two separate components to realize the input and output functions, in some embodiments the touch-sensitive surface 1331 and the display panel 1341 can be integrated to realize the input and output functions.

[0289] The electronic device 1300 may also include at least one sensor 1350, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 1341 according to the ambient light level, and the proximity sensor can turn off the display panel 1341 or the backlight when the electronic device 1300 is moved to the ear. As a type of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in various directions (generally three axes). When stationary, it can detect the magnitude and direction of gravity and can be used for applications that recognize the phone's posture (such as landscape / portrait switching, related games, magnetometer posture calibration), vibration recognition-related functions (such as pedometers, taps), etc. Other sensors that the electronic device 1300 may be equipped with, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, will not be described in detail here.

[0290] Audio circuitry 1360, speaker 1361, and microphone 1362 provide an audio interface between the object and electronic device 1300. Audio circuitry 1360 converts received audio data into electrical signals and transmits them to speaker 1361, which converts them into sound signals for output. Conversely, microphone 1362 converts collected sound signals into electrical signals, which are received by audio circuitry 1360 and converted into audio data. The audio data is processed by processor 1380 and then either transmitted via RF circuitry 1310 to another electronic device or output to memory 1320 for further processing. Audio circuitry 1360 may also include an earphone jack to facilitate communication between external headphones and electronic device 1300.

[0291] The short-range wireless transmission module 1370 can be a WIFI (wireless fidelity) module, Bluetooth module, or infrared module, etc. The electronic device 1300 can transmit information with wireless transmission modules on other devices via the short-range wireless transmission module 1370.

[0292] Processor 1380 is the control center of electronic device 1300. It connects various parts of the device via various interfaces and lines, and performs various functions and processes data of electronic device 1300 by running or executing software programs or modules stored in memory 1320 and calling data stored in memory 1320, thereby providing overall control of the device. Optionally, processor 1380 may include one or more processing cores; optionally, processor 1380 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the aforementioned modem processor may also not be integrated into processor 1380.

[0293] Electronic device 1300 also includes a power supply 1390 (such as a battery) for supplying power to various components. Optionally, the power supply 1390 can be logically connected to the processor 1380 through a power management system, thereby enabling functions such as managing charging, discharging, and power consumption through the power management system. The power supply 1390 may also include one or more DC or AC power supplies, recharging systems, power fault detection circuits, power converters or inverters, power status indicators, and other arbitrary components.

[0294] Although not shown, the electronic device 1300 may also include a camera, Bluetooth module, etc., which will not be described in detail here.

[0295] This application also discloses a computer-readable storage medium storing a processor-executable program, which, when executed by a processor, is used to implement the database data recovery method shown in Figure 4.

[0296] It is understandable that the content of the data recovery method embodiments shown in Figure 4 is applicable to the embodiments of this computer-readable storage medium. The specific functions implemented by the embodiments of this computer-readable storage medium are the same as those of the data recovery method embodiments shown in Figure 4, and the beneficial effects achieved are also the same as those achieved by the data recovery method embodiments shown in Figure 4.

[0297] This application also discloses a computer program product or computer program, which includes computer instructions stored in the aforementioned computer-readable storage medium. The processor of the electronic device shown in Figure 13 can read the computer instructions from the aforementioned computer-readable storage medium and execute them, causing the electronic device to perform the database data recovery method shown in Figure 4.

[0298] It is understandable that the content of the data recovery method embodiments shown in Figure 4 is applicable to the embodiments of this computer program product or computer program. The specific functions implemented by the embodiments of this computer program product or computer program are the same as those of the data recovery method embodiments shown in Figure 4, and the beneficial effects achieved are also the same as those achieved by the data recovery method embodiments shown in Figure 4.

[0299] In some alternative embodiments, the functions / operations mentioned in the block diagrams may not occur in the order shown in the operation diagrams. For example, depending on the functions / operations involved, two consecutively shown blocks may actually be executed substantially simultaneously, or the blocks may sometimes be executed in reverse order. Furthermore, the embodiments presented and described in the flowcharts of this application are provided by way of example to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and sub-operations described as part of a larger operation are executed independently.

[0300] Furthermore, although this application is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the functions and / or features may be integrated into a single physical device and / or software module, or one or more functions and / or features may be implemented in a separate physical device or software module. It is also understood that a detailed discussion of the actual implementation of each module is unnecessary for understanding this application. Rather, given the properties, functions, and internal relationships of the various functional modules in the apparatus disclosed herein, the actual implementation of the module will be understood within the scope of conventional technology for an engineer. Therefore, those skilled in the art can implement the application set forth in the claims using ordinary techniques without excessive experimentation. It is also understood that the specific concepts disclosed are merely illustrative and not intended to limit the scope of this application, which is determined by the full scope of the appended claims and their equivalents.

[0301] If a function is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0302] In this application embodiment, the terms "module" or "unit" refer to a computer program or part of a computer program that has a predetermined function and works with other related parts to achieve a predetermined goal, and can be implemented wholly or partially using software, hardware (such as processing circuitry or memory), or a combination thereof. Similarly, a processor (or multiple processors or memory) can be used to implement one or more modules or units. Furthermore, each module or unit can be part of an overall module or unit that includes the functionality of that module or unit.

[0303] The logic and / or steps represented in the flowchart or otherwise described herein can, for example, be considered a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable storage medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable storage medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device.

[0304] It should be understood that various parts of this application can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0305] In the foregoing description of this specification, the references to terms such as "one embodiment," "another embodiment," or "some embodiments," etc., indicate that a specific feature, structure, material, or characteristic described in connection with an embodiment or example is included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0306] Although embodiments of this application have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of this application, the scope of which is defined by the claims and their equivalents.

[0307] The above is a detailed description of the preferred embodiments of this application, but this application is not limited to the embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of this application, and these equivalent modifications or substitutions are all included within the scope defined by the claims of this application.

Claims

1. A database data recovery method, characterized in that the method includes: parsing the log stream of the database to determine the first log entry containing the source data of the database; establishing first index data based on the first log entry, wherein the index hierarchy of the first index data is, in order, the table name of the data table to which the source data belongs, the tag pair, the first log sequence number of the first log entry, and the first timestamp corresponding to the source data contained in the first log entry; in response to a data recovery request from a target object for the database, determining the target deletion operation specified by the target object for data recovery; determining recovery query information based on the target deletion operation, wherein the recovery query information includes at least one of the attribute information or time range information of the data table corresponding to the target deletion operation, and the attribute information includes the table name or the tag pair; determining, based on the recovery query information and the first index data, a matching target log entry in the first log entry; and performing data recovery processing on the database based on the source data contained in the target log entry.

2. The database data recovery method according to claim 1, characterized in that the step of determining the recovery query information based on the target deletion operation includes: detecting the deletion type of the target deletion operation; if the deletion type of the target deletion operation is table deletion, generating first recovery query information based on the table name and time range information of the data table corresponding to the target deletion operation; if the deletion type of the target deletion operation is timeline deletion, generating second recovery query information based on the tag pair and time range information of the data table corresponding to the target deletion operation.

3. The database data recovery method according to claim 2, characterized in that the step of determining the matching target log entry in the first log entry based on the recovery query information and the first index data includes: using the table name and time range information in the first recovery query information as index conditions, determining the matching first target sequence number among the first log sequence numbers of the first index data; or, using the tag pair and time range information in the second recovery query information as index conditions, determining the matching first target sequence number among the first log sequence numbers of the first index data; and identifying the first log entry corresponding to the first target sequence number as the matching target log entry.

4. The database data recovery method according to claim 1, characterized in that the method further includes: parsing the log stream of the database to determine the second log entry corresponding to a historical deletion operation; and establishing second index data based on the second log entry, wherein the index hierarchy of the second index data is, in order, the second log sequence number of the second log entry and the second timestamp corresponding to the historical deletion operation in the second log entry.

5. The database data recovery method according to claim 4, characterized in that the step of determining, in response to a data recovery request from a target object for the database, the target deletion operation specified by the target object for data recovery includes: in response to the data recovery request from the target object for the database, determining the historical time period for which the target object is to perform data recovery; querying the second index data for the second timestamp that belongs to the historical time period, and determining the second log sequence number corresponding to that second timestamp as the second target sequence number; and identifying the historical deletion operation corresponding to the second target sequence number as the target deletion operation specified by the target object for data recovery.

6. The database data recovery method according to claim 1, characterized in that, after the step of establishing the first index data based on the first log entry, the method further includes: detecting the frequency of occurrence of each field in the first index data, and identifying the fields whose frequency of occurrence is greater than a preset threshold as target fields; assigning different identification information to each of the target fields, and storing the association between the target fields and the identification information, wherein the number of characters in the identification information is less than the number of characters in the target field; and replacing the target field with the identification information to obtain the updated first index data.

7. The database data recovery method according to claim 1, characterized in that the step of performing data recovery processing on the database based on the source data contained in the target log entry includes: merging the source data contained in the target log entry to obtain the target data; and performing data recovery processing on the database based on the target data.

8. The database data recovery method according to claim 7, characterized in that the merging of the source data contained in the target log entry includes: detecting the table name, tag pair, indicator value label, and first timestamp of the data table to which each piece of source data belongs; if multiple pieces of first data have the same table name, tag pair, indicator value label, and first timestamp, detecting the first log sequence number of the target log entry in which each piece of first data is located, wherein the first data is any of the source data; comparing the first log sequence numbers corresponding to the pieces of first data; and retaining the first data corresponding to the largest first log sequence number and discarding the other first data.

9. The database data recovery method according to claim 1, characterized in that the step of performing data recovery processing on the database based on the source data contained in the target log entry includes: performing, based on the recovery query information, a query in the cold backup corresponding to the database to obtain the second data; merging the source data and the second data to obtain the target data; and performing data recovery processing on the database based on the target data.

10. The database data recovery method according to any one of claims 7 to 9, characterized in that performing data recovery processing on the database includes: obtaining the time interval information corresponding to each data shard in the database; dividing the target data based on the time interval information to obtain a data subset corresponding to each data shard; and importing each data subset into the data shards of the database according to the corresponding time interval information.

11. The database data recovery method according to claim 10, characterized in that importing each of the data subsets into the data shards of the database includes: querying the format information of the memory table used by each of the data shards; performing, based on the format information, format conversion on the data subset corresponding to the data shard; and importing the data subset obtained after format conversion into the corresponding data shard.

12. A database data recovery system, characterized in that the system includes: a parsing unit, used to parse the log stream of the database and determine the first log entry containing the source data of the database; an establishment unit, used to establish first index data based on the first log entry, wherein the index hierarchy of the first index data is, in order, the table name of the data table to which the source data belongs, the tag pair, the first log sequence number of the first log entry, and the first timestamp corresponding to the source data contained in the first log entry; a response unit, used to determine, in response to a data recovery request from a target object for the database, the target deletion operation specified by the target object for data recovery; an execution unit, used to determine recovery query information based on the target deletion operation, wherein the recovery query information includes at least one of attribute information or time range information of the data table corresponding to the target deletion operation, and the attribute information includes the table name or the tag pair; a matching unit, used to determine the matching target log entry in the first log entry based on the recovery query information and the first index data; and a processing unit, used to perform data recovery processing on the database based on the source data contained in the target log entry.

13. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the computer program, it implements the database data recovery method according to any one of claims 1 to 11.

14. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by a processor, it implements the database data recovery method according to any one of claims 1 to 11.

15. A computer program product, comprising a computer program, characterized in that, When the computer program is executed by a processor, it implements the database data recovery method according to any one of claims 1 to 11.