Data consistency mechanism for hybrid data processing

By employing a global snapshot isolation mechanism and the Quorum, Gossip, and LSN mechanisms, the data consistency problem between OLTP and OLAP components in the HTAP system is resolved, achieving strong data consistency and flexible modular design, thereby improving the real-time performance and consistency of data processing.

CN117677943BActive Publication Date: 2026-06-19FACE CUTE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
FACE CUTE CO LTD
Filing Date
2022-08-02
Publication Date
2026-06-19

Smart Images

  • Figure CN117677943B_ABST
    Figure CN117677943B_ABST
Patent Text Reader

Abstract

This disclosure describes a technique for providing data consistency for hybrid transactional and analytical processing. A logical log and a Log Sequence Number (LSN) associated with the logical log can be generated based on data captured by a first processing engine. The logical log and LSN can be propagated to a storage subsystem configured to communicate with both the first and second processing engines. The LSN and information indicating the LSN mode version can be stored and distributed by a metadata service. The first processing engine, the second processing engine, the storage subsystem, and the metadata service are modularized and support an LSN mechanism for maintaining data consistency.
Need to check novelty before this filing date? Find Prior Art

Description

[0001] Cross-references to related applications

[0002] This application claims priority to U.S. Application No. 17 / 462,938, filed August 31, 2021, entitled “DATA CONSISTENCY MECHANISM FOR HYBRID DATA PROCESSING,” the entire contents of which are incorporated herein by reference. Background Technology

[0003] Data processing refers to the process of performing specific operations on a collection of data or a database. A database is an organized collection of facts and information, such as records about inventory, customers, etc. Various forms of data processing exist and serve different applications within a business environment. As databases are increasingly used to store large amounts of complex data, improvements in data processing technologies are expected. Attached Figure Description

[0004] The following detailed description will be better understood when read in conjunction with the accompanying drawings. For illustrative purposes, exemplary embodiments of various aspects of this disclosure are shown in the drawings; however, the invention is not limited to the specific methods and tools disclosed.

[0005] Figure 1 An example system including cloud services is shown.

[0006] Figure 2 An example system including more than one cloud service is shown.

[0007] Figure 3 An example system for hybrid transactional and analytical processing is shown.

[0008] Figure 4 An example architecture is shown to ensure data consistency between replicas during read and write processes.

[0009] Figure 5 Example snapshots of the incremental repository and the base repository are shown.

[0010] Figure 6A Example snapshots of the incremental repository and the base repository are shown.

[0011] Figure 6B It shows Figure 6A Another example snapshot of the incremental and base repositories.

[0012] Figure 7A Example snapshots of the incremental repository and the base repository are shown.

[0013] Figure 7B It shows Figure 7AAnother example snapshot of the incremental and base repositories.

[0014] Figure 8 An example process for ensuring strong data consistency in a hybrid transactional and analytical processing system is shown.

[0015] Figure 9 An example process for ensuring strong data consistency in a hybrid transactional and analytical processing system is shown.

[0016] Figure 10 An example process for ensuring strong data consistency in a hybrid transactional and analytical processing system is shown.

[0017] Figure 11 An example computing device is shown that can be used to perform any of the techniques disclosed herein. Detailed Implementation

[0018] Online Analytical Processing (OLAP) systems enable users to interactively analyze multidimensional data from multiple perspectives. Multidimensional data includes datasets with three or more dimensions. OLAP systems allow users to analyze information from multiple database systems simultaneously. OLAP systems enable analysts to extract and view data, such as business data, from different perspectives. Analysts often need to group, aggregate, and join data. These OLAP operations in data mining are resource-intensive. OLAP allows for pre-computation and pre-aggregation of data, making analysis faster. However, traditionally, OLAP systems typically load large amounts of data periodically in batches. This makes OLAP systems susceptible to outdated data.

[0019] OLAP is typically contrasted with OLTP (Online Transaction Processing) systems. OLTP systems capture, store, and process data from transactions. OLTP systems are generally characterized by lower query complexity and higher query volume, and are used for transaction processing rather than business intelligence or reporting purposes. Traditional OLTP systems support Data Manipulation Language (DML), a computer programming language used to add (insert), delete, and modify (update) data in the database. Traditional OLTP systems can also efficiently support point lookup queries.

[0020] Significant differences exist between OLAP and OLTP systems. For example, OLTP systems typically lack massively parallel query engines (such as those found in OLAP systems) to support efficient and complex query processing of large amounts of data. As another example, OLAP systems are primarily optimized for read-only scenarios and may not support other types of queries, while OLTP systems handle all types of queries (read, insert, update, and delete). Yet another example, OLTP systems are associated with short atomic transactions, while OLAP systems allow for more flexible distribution patterns and higher scalability, but with increased latency and no guaranteed upper limit on processing time.

[0021] Many popular data platforms / systems focus on only one of these workloads (e.g., OLAP or OLTP). However, many scenarios (e.g., business scenarios) require both complex OLAP-like analyses of newly imported data and the transaction support and strong data consistency offered by OLTP systems. Hybrid Transactional / Analytical Processing (HTAP) systems are best suited for such scenarios.

[0022] HTAP systems offer several unique advantages. In an HTAP system, OLAP and OLTP workloads are unified within a single system. By consolidating OLAP and OLTP workloads into a single system, the complexity and cost of deployment and maintenance can be significantly reduced. This versatile system can significantly reduce the staleness of query results (which is typically introduced by time-consuming and expensive ETL processes from the operational database to the data warehouse). Such systems also have the ability to perform complex analyses of real-time data, thus addressing modern business models that often require efficient responses to transient opportunities that must be utilized in real time.

[0023] However, many existing HTAP systems have drawbacks. Many existing products simply build HTAP systems by gluing OLTP and OLAP systems together. For example, many existing products simply use ETL technology to glue OLTP and OLAP systems together to load data from OLTP storage devices to OLAP storage devices. By merely gluing OLTP and OLAP systems together, HTAP systems lack desired characteristics, such as strong data consistency across the OLTP and OLAP components of the HTAP system. Strong data consistency across the OLTP and OLAP components of the HTAP system is crucial for many business scenarios. Therefore, an HTAP system that addresses these shortcomings is desired.

[0024] This paper describes an HTAP system with strong data consistency across the OLTP and OLAP components of an HTAP system. Unlike existing HTAP systems, the HTAP system described herein is configured to provide global snapshot isolation across the OLTP and OLAP components and is configured to support OLAP queries that read the latest updates in the current system—thus ensuring strong data consistency. For example, mechanisms including, but not limited to, Quorum and Gossip protocols and Log Sequence Numbers (LSNs) can be used to provide global snapshot isolation across the OLTP and OLAP components. This mechanism provides access to a consistent snapshot of the dataset to all queries in the HTAP system. Dirty, corrupted, and / or outdated data is not provided to users. In some embodiments, a flexible interface can be provided to users of the HTAP system described herein to select the desired level of data consistency.

[0025] HTAP systems with strong data consistency across OLTP and OLAP components can have a flexible, modular design. A modular HTAP system can consist of several decoupled main components: an OLTP engine, an OLAP engine, decoupled storage serving both the OLTP and OLAP engines, a metadata service, and an intelligent agent. The decoupled storage components can include an OLTP row repository and an OLAP repository, which have two parts: an in-memory incremental repository that keeps things up-to-date and an on-disk basic repository that holds large data blocks. Data from the incremental repository can be periodically flushed to the basic repository as new data blocks. The LSN mechanism used by the HTAP system to ensure strong data consistency can be a collaborative effort of all these decoupled components, without being coupled to any particular component. Each component can be unplugged and replaced with a replacement component, provided that the replacement component is configured to extend LSN support through a public API.

[0026] HTAP systems, such as the improved HTAP systems described above, can be used by a variety of different systems or entities. Figure 1 The illustration shows an example system 100 including an HTAP system. System 100 may include a cloud network 102 and multiple client devices 104a-d. The cloud network 102 and the multiple client devices 104a-d can communicate with each other via one or more networks 120.

[0027] Cloud network 102 may be located in a data center (such as a single location) or distributed across different geographical locations (e.g., in several locations). Cloud network 102 may provide services via one or more networks 120. Network 120 includes various network devices such as routers, switches, multiplexers, hubs, modems, bridges, repeaters, firewalls, proxy devices, etc. Network 120 may include physical links such as coaxial cable links, twisted-pair cable links, fiber optic links, and combinations thereof. Network 120 may include wireless links such as cellular links, satellite links, Wi-Fi links, etc.

[0028] Cloud network 102 may include multiple computing nodes 118 hosting various services. In one embodiment, node 118 hosts service 112. Service 112 may include content streaming services, such as Internet Protocol (IP) video streaming services. Service 112 may be configured to distribute content via various transport technologies. Service 112 is configured to provide content, such as video, audio, text data, combinations thereof, etc. Content may include content streams (e.g., video streams, audio streams, information streams), content files (e.g., video files, audio files, text files), and / or other data. Content may be stored in a database. For example, service 112 may include video sharing services, video hosting platforms, content distribution platforms, collaborative gaming platforms, etc. Service 112 may include any other type of service besides or in lieu of content streaming services.

[0029] In one embodiment, service 112 may be provided to client device 104 via network 120. If service 112 is a content streaming service, then content may be output to different client devices 104 via network 120. Content may be streamed to client devices 104. The content stream may be a stream of short videos received from service 112. Multiple client devices 104 may be configured to access content from service 112. In one embodiment, client device 104 may include an application. The application outputs (e.g., displays, renders, presents) content to a user associated with client device 104. The content may include video, audio, comments, text data, etc.

[0030] In one embodiment, a user can use an application on client device 104 to create content and upload short videos to cloud network 102. Client device 104 can access an interface of the application. This interface may include input elements. For example, the input elements may be configured to allow the user to create content. To create content, the user can grant the application permission to access an image capture device, such as the camera or microphone of client device 104. After the user creates the content, the user can use the application to upload the content to cloud network 102 and / or save the content locally to user device 104. Service 112 may store the uploaded content and any metadata associated with that content in one or more databases.

[0031] Multiple client devices 104 can include any type of computing device, such as mobile devices, tablets, laptops, desktop computers, smart TVs or other smart devices (e.g., smartwatches, smart speakers, smart glasses, smart helmets), gaming devices, set-top boxes, digital streaming devices, robots, etc. Multiple client devices 104 can be associated with one or more users. A single user can use one or more of the multiple client devices 104 to access the cloud network 102. Multiple client devices 104 can travel to various locations and use different networks to access the cloud network 102.

[0032] Multiple compute nodes 118 can handle tasks associated with service 112. Multiple compute nodes 118 can be implemented as one or more compute devices, one or more processors, one or more virtual compute instances, or combinations thereof. Multiple compute nodes 118 can be implemented by one or more compute devices. One or more compute devices can include virtualized compute instances. Virtualized compute instances can include virtual machines, such as simulations of computer systems, operating systems, servers, etc. Virtual machines can be loaded by compute devices based on virtual images and / or other data defining specific software (e.g., operating systems, dedicated applications, servers) used for simulation. When the demand for different types of processing services changes, different virtual machines can be loaded and / or terminated on one or more compute devices. A hypervisor can be implemented to manage the use of different virtual machines on the same compute device.

[0033] In one embodiment, service 112 includes HTAP system 110. HTAP system 110 may include multiple different components (e.g., subsystems). For example, HTAP system 110 may include one or more of a transactional OLTP engine, an analytics OLAP engine, an underlying decoupled storage device, a metadata service, and / or an intelligent agent. References are provided below. Figure 3 Figure 7 discusses the architecture of the HTAP system 110 in more detail. The following also discusses... Figure 3Figure 7 discusses additional details about each subsystem.

[0034] HTAP system 110 can have an architecture that supports heterogeneous query engines. This architecture can handle both transactional OLTP workloads and complex analytical (OLAP) workloads. The architecture can follow a modular design, and its main components can be fully decoupled, providing flexibility and ease of expansion. For example, components of HTAP system 110 can be easily changed to similar, established subsystems. This architecture can eliminate interference between OLTP and OLAP workloads by having separate query processing engines and different data replicas.

[0035] HTAP system 110 can store user data in different formats for OLTP and OLAP engines. For example, HTAP system 110 can store user data in row format for OLTP engines and in column format for OLAP engines for efficient query processing. This architecture can have a single point of truth for metadata and can use a separate metadata service to provide up-to-date metadata to certain components of HTAP system 110. The architecture of HTAP system 110 can include an intelligent agent that dispatches queries to OLTP and OLAP subsystems based on the nature of the query (and thus hides internal details from the user / client). Users / clients can utilize HTAP system 110 with a single unified interface. For example, users / clients can utilize HTAP system 110 with an interface for client device 104. This architecture can support various APIs (e.g., ANSI SQL, JDBC, ODBC, etc.) based on user requirements.

[0036] The architecture of the HTAP system 110 is capable of handling large-scale data. This is a result of the fact that the compute and storage components in the HTAP system 110 can be decoupled. The decoupled storage system can persistently store the large amounts of data we use because it does not assume that the data can fit into memory. The compute and storage resources in the HTAP system 110 can also be scaled, and therefore can flexibly handle large amounts of data and large-scale (OLTP and OLAP) workloads.

[0037] The architecture of HTAP System 110 enables efficient and real-time data processing. DML queries can be efficiently processed by the OLTP engine and written efficiently to the underlying storage in row format. The architecture of HTAP System 110 can include an OLAP query engine with distributed query processing capabilities (high parallelism, better resource utilization) to efficiently handle complex OLAP queries, including joins and aggregations. Efficient and real-time processing is facilitated by the architecture of HTAP System 110, which promotes storing different copies of data on the OLTP and OLAP sides, thus minimizing interference between OLTP and OLAP workloads. OLTP and OLAP data formats can be individually optimized to suit their workloads. A single source of data change can exist through HTAP System 110 (from the OLTP side), simplifying consistency models and concurrent processing across OLTP and OLAP components.

[0038] The architecture of HTAP System 110 provides fresh / real-time data changes for OLAP queries. Logical logs of DML operations are propagated from the OLTP component to the OLAP component immediately after commit. These logs can be dispatched to distributed partitions and continuously applied to the in-memory incremental repository via in-memory operations, which is typically very fast. Data changes carried by the logical logs are immediately available for OLAP queries after being applied to the in-memory incremental repository. The architecture of HTAP System 110 leverages unified versioning across HTAP System 110, ensuring strong data consistency. The OLTP component of HTAP System 110 supports snapshot isolation and other (weaker) consistency models, similar to most transactional OLTP database engines.

[0039] Although Figure 1 System 100 presents HTAP system 110 as being provided by a single cloud network 102, but various components / subsystems of HTAP system 110 may instead be provided by multiple different cloud networks. Figure 2 An example system 200 is illustrated, comprising an HTAP system with components / subsystems spanning multiple cloud networks. System 200 may include cloud networks 202a-b and multiple client devices 204a-d. Cloud networks 202a-b and multiple client devices 204a-d may communicate with each other via one or more networks 220.

[0040] Each of the cloud networks 202a-b can be similar to the one described above. Figure 1The cloud network 102 described herein. Each of the cloud networks 202a-b may be located in a data center, such as a single location, or distributed across different geographical locations (e.g., in several locations). The cloud network 202a-b may provide services(s) via one or more networks 220. The cloud network 202a-b includes various network devices, such as routers, switches, multiplexers, hubs, modems, bridges, repeaters, firewalls, proxy devices, etc. The cloud network 202a-b may include physical links, such as coaxial cable links, twisted-pair cable links, fiber optic links, combinations thereof, etc. The cloud network 202a-b may include wireless links, such as cellular links, satellite links, Wi-Fi links, etc.

[0041] Each of cloud networks 202a-b may include multiple compute nodes hosting various services. In one embodiment, there is a node hosting service 212a associated with cloud network 202a, and a node hosting service 212b associated with cloud network 202b. Services 212a-b may include any type of service, such as those mentioned above. Figure 1 The content streaming service described.

[0042] Multiple client devices 204 can include any type of computing device, such as mobile devices, tablets, laptops, desktop computers, smart TVs or other smart devices (e.g., smartwatches, smart speakers, smart glasses, smart helmets), gaming devices, set-top boxes, digital streaming devices, robots, etc. Multiple client devices 104 can be associated with one or more users. A single user can use one or more of the multiple client devices 104 to access at least one of the cloud networks 202a-b. Multiple client devices 104 can travel to various locations and use different networks to access the cloud networks 202a-b.

[0043] In one embodiment, each of services 212a-b includes one or more components / subsystems of an HTAP system (e.g., HTAP system 110). HTAP system 110 may include multiple different components (e.g., subsystems). For example, the HTAP system may include one or more of a transactional OLTP engine, an analytics OLAP engine, an underlying decoupled storage device, a metadata service, and / or an intelligent agent. Service 212a may include one or more (but not all) components of the HTAP system. Service 212b may include the remaining components of the HTAP system. Although... Figure 2 Two cloud networks and their respective services are shown in the document, but it should be recognized that any number of cloud networks / services can be used to implement the HTAP system described herein.

[0044] Figure 3An exemplary architecture 300 for an HTAP system (e.g., HTAP system 110) according to this disclosure is shown. Architecture 300 includes multiple different components (e.g., subsystems). These subsystems include an OLTP engine 304, an OLAP engine 308, a data storage system 310, a metadata service 306, and / or a proxy 302.

[0045] As described above, OLTP engine 304 can receive user data, such as data from user devices 301a-n. OLTP engine 304 can utilize a model that separates the computing engine from the underlying shared / cloud storage device (similar to AWS Aurora). OLTP engine 304 can provide functionality such as ACID transaction support, row storage, write-ahead log (WAL), and log replication. For example, OLTP engine 304 can capture user data in real-time or near real-time.

[0046] The OLTP engine 304 can be configured to store and / or process at least some of the received user data in a specific format (e.g., a row-based format). For example, upon receiving an OLTP query, the OLTP engine 304 can be configured to process at least some of the received user data in response to that query. By way of example and not limitation, an OLTP query may include a query to retrieve a specific data item, a query to filter the received data to find a specific data item / description of a data item, and / or a query to filter the received data to identify a specific subset of the received data. For example, the query may be a DML query and / or a point lookup query.

[0047] OLTP engine 304 can be configured to be coupled to and / or decoupled from a larger HTAP system 300. If OLTP engine 304 is decoupled from the larger HTAP system 300, then OLTP engine 304 can be replaced with an alternative OLTP engine (e.g., a similar established OLTP engine). For example, OLTP engine 304 can be replaced with an alternative OLTP engine configured to extend LSN support via a public API. For example, OLTP engine 304 could be MySQL, but it can be easily swapped to any other established OLTP engine, such as PostgreSQL.

[0048] User equipment 301a-n can include any type of computing device, such as mobile devices, tablets, laptops, desktop computers, smart TVs or other smart devices (e.g., smartwatches, smart speakers, smart glasses, smart helmets), gaming devices, set-top boxes, digital streaming devices, robots, etc. User equipment 301a-n can be associated with one or more users. A single user can use one or more user equipment 301a-n to access the cloud network including OLTP engine 304. User equipment 301a-n can travel to various locations and use different networks to access the cloud network including OLTP engine 304.

[0049] As also stated above, OLAP engine 308 can receive data captured by OLTP engines (such as OLTP engine 304). The data received by OLAP engine 308 can be in a specific format readable by OLAP engine 308 (e.g., a mixed row + column format). OLAP engine 308 can perform analysis on at least a portion of the data in the specific format in response to receiving an OLAP query.

[0050] For example, the OLAP engine 308 can perform analytical operations on at least a portion of the data. OLAP consists of three basic analytical operations: merging (summarizing), drill-down, and slicing and dicing. Merging involves aggregating data that can be accumulated and calculated across one or more dimensions. For example, summarizing all sales offices into a sales department or sales division to predict sales trends. In contrast, drill-down is a technique that allows users to browse details. For example, a user can view the sales figures of individual products that make up the sales figures for a particular region. Slicing and dicing are features that allow users to extract (slice) specific datasets from an OLAP cube and view (dic) slices from different perspectives. These perspectives are sometimes referred to as dimensions (such as viewing the same sales by salesperson, or by date, or by customer, or by product, or by region, etc.). The results of the processing can be sent or forwarded to client device 301.

[0051] OLAP engine 308 can be any OLAP engine capable of efficiently handling complex analytical queries, such as established open-source query engines (e.g., FlinkSQL). OLAP engine 308 can be configured to be coupled to and / or decoupled from the larger HTAP system 300. If OLAP engine 308 is decoupled from the larger HTAP system 300, then OLAP engine 308 can be replaced with an alternative OLAP engine (e.g., a similar established OLAP engine). For example, OLAP engine 308 can be replaced with an alternative OLAP engine configured to extend LSN support via a public API. For example, OLAP engine 308 can be FlinkSQL, but it can be easily swapped to any other established OLAP engine, such as Presto or SparkSQL.

[0052] Storage subsystem 310 can be decoupled from OLTP engine 304 and OLAP engine 308. Also as described above, data storage system 310 can persistently store user data in one format (e.g., row format) for use by OLTP engines (such as OLTP engine 304), while persistently storing the same user data in a different format (e.g., mixed row + column format) for use by OLAP engines (such as OLAP engine 308). This can significantly reduce interference between OLAP and OLTP workloads. Multiple data replicas (e.g., three data replicas) can be maintained for both formats to achieve high availability.

[0053] Data storage system 310 can be used as a unified storage layer. However, the architecture of data storage system 310 can be divided into two parts: TP part and AP part. Transaction DML and DDL received by the OLTP engine can be presented as physical / redo logs (with information about the underlying storage device) and logical logs in log repository 312. These logs can be persistently stored in the TP part of the storage device. The physical logs can then be replicated and distributed to other storage devices by log distributor 314 and replayed to construct data pages. Data in the pages can be organized in row format and stored in row data repository 316. Data stored in row data repository 316 can be used by the OLTP engine for simple queries including point lookup queries. Logical logs can also be replicated and distributed to the AP part of the storage device by log distributor 314.

[0054] Each user table in the AP portion of the storage device can be partitioned based on a partitioning scheme defined at table creation time. Each partition 317 can reside on a physical storage node (e.g., a server) and can maintain several replicas of the data for high availability. Each partition 317 can also be divided into an in-memory incremental repository 318 and an on-disk basic repository 320. The logical log of committed DML transactions (such as a MySQL binary log) can be continuously dispatched from the OLTP engine to the user table partitions on each AP storage node. As an example, updates can be modeled as deletes followed by inserts. Therefore, there can be only inserts and deletes in the logical log, without updates.

[0055] Logical logs arriving at each AP storage node can be sorted, persisted, and then applied sequentially to the in-memory incremental repository 318 of each partition 317. The incremental repository 318 can store data in a row-based format, sorted by its log sequence number (LSN, a sequence number / version indicating the order of data operations). For example, when the size of the data in the incremental repository 318 grows beyond a predefined threshold or after a fixed time interval, the data in the incremental repository 318 can be periodically flushed to the base repository 320. After flushing, the memory occupied by the flushed data can be garbage collected.

[0056] Data in the base repository 320 can be organized in a columnar format for better performance of analytical workloads and is persistently stored in the local file system (using the current implementation). However, it should be recognized that this architecture works with any underlying storage method. Data in the base repository 320 may not be updated in-place, so a deletion bitmap can be maintained to mark rows that have been deleted. As more and more data units are deleted and overlapped with refreshes, compressing and rearranging internal data can benefit storage and potential query performance, so AP data units can be compressed periodically and redundant data cleaned up. Files in the base repository 320 can be versioned, which is the largest LSN in the file when the file is refreshed. The LSN may not be retained in the base repository 320, resulting in all files in the base repository 320 having the same version (i.e., the LSN from the previous refresh). This can provide several advantages. For example, it can save storage space, make deletion more efficient by using a deletion bitmap, and make scans faster without comparing LSNs.

[0057] Storage subsystem 310 can be configured to be coupled to and / or decoupled from the larger HTAP system 300, such as Figure 3As shown in the diagram. If storage subsystem 310 is decoupled from the larger HTAP system 300, then storage subsystem 310 can be replaced with an alternative storage subsystem (e.g., a similar established storage subsystem). For example, storage subsystem 310 can be replaced with an alternative storage subsystem configured to extend support for LSNs via a public API.

[0058] Metadata service 306 can be configured to align the metadata of OLTP engine 304 with that of OLAP engine 308. Metadata service 306 can extract metadata from events (e.g., DDL) generated by certain resources (e.g., user input via OLTP engine 308), generate metadata versions, align them with the order of the DML, make them globally available, and persist them. Metadata service 306 can generate replicas of metadata versions for high availability. Metadata service 306 can extract metadata from DDL received by the OLTP engine (with versions aligned to the same LSN system used for DML), persist it to a dedicated database, and push / pull it from FlinkSQL and AP storage servers.

[0059] Metadata service 306 can be configured to be coupled to and / or decoupled from the larger HTAP system 300. If metadata service 306 is decoupled from the larger HTAP system 300, then metadata service 306 can be replaced by an alternative metadata service (e.g., a similarly established metadata service). For example, metadata service 306 can be replaced by an alternative metadata service configured to extend LSN support via a public API.

[0060] Agent 302 can be configured to connect user devices 301a-n to OLTP engine 304 and / or OLAP engine 308 through a common proxy layer. Agent 302 can be a smart agent. Agent 302 can provide a single, unified API to users / clients (default: ANSL SQL plus some common OLAP extensions), meaning that the underlying system details are transparent to the client if the client connects through the agent. Various APIs (e.g., ANSI SQL, JDBC, ODBC, etc.) can be used based on user requirements. Using a single, unified API reduces the workload for users interacting with the system. Agent 302 can have the ability to automatically dispatch different client requests / queries to different engines (e.g., OLTP engine 304 or OLAP engine 308) based on the nature of the request. For example, complex OLAP queries will be routed to OLAP engine 308, while DML, DDL, and point lookup queries will be routed to OLTP engine 304.

[0061] Agent 302 can be configured to be coupled to and / or decoupled from the larger HTAP system 300. If agent 302 is decoupled from the larger HTAP system 300, then agent 302 can be replaced with an alternative public agent (e.g., a similar established agent). For example, agent 302 can be replaced with an alternative agent configured to extend LSN support via a public API.

[0062] Each of the OLTP engine 304, OLAP engine 308, storage subsystem 310, and metadata service 306 can utilize LSNs to ensure strong data consistency in the HTAP system. The OLTP engine handles OLTP transactions and generates LSNs. The LSN used for each data update is generated using the Write-Ahead Log (WAL) during the transaction commit process. For example, the OLAP engine 304 can coordinate OLAP queries by requesting a global read LSN from the metadata service 306 and initiating a distributed scan task to use this read LSN request data. The LSN, along with the log, is propagated from the TP to the AP component of the decoupled storage device 310 and can be applied to the incremental repository of storage device 310. Storage device 310 can use multiple LSNs to indicate the progress / status of various operations. The metadata service 306 can store the progress of persistent LSNs on the storage server and use it as a read_LSN for OLAP queries. The metadata service 306 can also store schema LSNs to support DDL. By default, the intelligent agent 302 is unaware of any LSNs. In READ_YOUR_WRITE consistency mode, agent 302 may need to collect the LSN returned by the query and use that LSN to issue the next query.

[0063] As discussed above, to ensure strong data consistency in HTAP systems, the Quorum and / or Gossip protocols can be used to provide global snapshot isolation across OLTP and OLAP engines. The Quorum protocol and Gossip framework can extend single-partition consistent reads and writes to a globally distributed cluster. Figure 4 An example architecture 400 is shown to ensure data consistency between replicas during read and write processes. Architecture 400 utilizes the Quorum and Gossip protocols as building blocks to ensure strong data consistency in the HTAP system. Data in replicated partitions can be stored and / or accessed across multiple data nodes 402, 404, and 406 in cluster 401. For a given partition, multiple replicas can exist on different nodes, thus preserving the same data for fault tolerance.

[0064] The Quorum protocol can be applied to ensure data consistency among these replicas during read and write processes. Quorum is a type of technology that can be used to perform constant functions or operations in a "distributed system." It uses distributed transactions, obtaining the least-vote agreement from the distributed system. Leveraging Quorum-based data consistency, read and write requests can always be serviced by multiple healthy nodes (e.g., two or more nodes in clusters 402, 404, and 406), and data across cluster 401 can remain in a consistent state. Any failed, corrupted, or slow replicas will not expose any data correctness issues to external clients, such as stale, dirty, or corrupted data.

[0065] For example, architecture 400 utilizes a 2+2 Quorum setup. Each partition is served by three replicas / nodes (nodes 402, 404, and 406). Framework 400 sends write requests to all three replicas / nodes 402, 404, and 406, and the write operation succeeds only after two of them have succeeded. Figure 4 As shown, write requests sent to nodes 402 and 404 succeeded, while a write request sent to node 406 failed. However, since the writes succeeded for two of these nodes, the writes are still considered successful. During a read, frame 400 can query at least two nodes to request data status and select the node with data that satisfies the read consistency requirements. Because there are at least two nodes that keep the latest writes, we can tolerate the failure of one node while still ensuring successful and consistent reads. Figure 4 As shown, node 406 failed, but other nodes 402 and 404 succeeded. Therefore, despite the failure of node 406, the reads were still successful and consistent.

[0066] Besides the Quorum algorithm, the Gossip protocol can also be used to remedy any temporary inconsistencies between replicas 402, 404, and 406. The Gossip protocol is a program or process of peer-to-peer communication between computers based on the propagation of epidemics. Some distributed systems use peer-to-peer Gossip to ensure that data is distributed to all members of the group. For example, if any replica 402, 404, or 406 falls behind other replicas due to an unexpected failure such as disk or network failure, then using the Gossip protocol, the replica will catch up with its peer during message exchange between nodes. For example, node 406 can catch up with LSN 110 by itself due to the Gossip protocol.

[0067] Quorum and Gossip protocols can be fully utilized to keep data in a consistent state across distributed partitions in the cluster. Furthermore, an LSN mechanism can be used to ensure local consistency of data reads and writes on each partition. When a query is compiled in the coordinator, a snapshot of the query's data can be calculated in the form of a global LSN. This global LSN can be distributed to each node that maintains the desired partition, and the local data portion of the snapshot LSN can be retrieved. Therefore, even if data is updated immediately after the compilation point in the coordinator, no more or less data will be accessed on each node.

[0068] On a single partition, a Log Sequence Number (LSN) can be used. An LSN is an incrementing integer indicating the local storage state. Each partition maintains its own local LSN, and the LSN can also represent a global and logical snapshot of the storage state across all partitions. For example, when a storage device has LSN 100, and after a new data insertion arrives, the storage device LSN will move to 101. A global read_LSN can be initially assigned to queries from the metadata service by the coordinator. The global read_LSN can be an LSN that all storage partitions have been holding at compiling time. This global read_LSN can then be sent to each storage server to determine how much data should be read locally on each partition. For example, a global read LSN of 100 would retrieve data on every partition up to LSN 100.

[0069] The LSN used in OLAP storage devices is described in more detail below. However, it should be recognized that OLTP storage devices in HTAP systems can be disposed of in a similar manner, and queries in our HTAP system can access both OLAP and OLTP storage devices in a consistent manner using a single read LSN.

[0070] As mentioned above Figure 3 The foundation of the storage system discussed (each replica of each partition of the table) consists of an in-memory incremental repository and an on-disk base repository. The incremental repository accepts new data as a logical log, which is then converted into two lists: an insert incremental list and a delete incremental list. After a predetermined amount of time and / or if capacity criteria are met, the current incremental list can be flushed to the base repository. Multiple LSNs can be used, each indicating multiple storage states simultaneously.

[0071] Data in a storage system can change continuously due to multiple operations. These operations can include, for example, append / delete, scan, flush, compaction, and / or garbage collection (GC). During an append / delete operation, data can be added to the insert and delete lists of the incremental repository within memory. During a scan operation, current data can be read from the incremental and basic repositories relative to the read LSN. During a flush operation, data can be flushed from memory to disk. For example, current data can be moved from the incremental repository to the basic repository. During a compaction operation, basic repository data blocks can be reorganized. During a garbage collection operation, old and unused data can be removed from the incremental and / or basic repositories.

[0072] Multiple LSNs can be applied to indicate the different states of these operations. These LSNs allow data scanning, updating, and maintenance operations to work together concurrently and consistently without interfering with each other. Multiple LSNs can include the read LSN for each query to indicate a data snapshot. This LSN ensures that scans will retrieve data across all partitions up to the read LSN, assembling the final global result for the data up to the read LSN. Multiple LSNs can additionally include one or more of the following: an LSN to preserve the minimum read LSN of the active query, an LSN to preserve the end point of the previous refresh, and an LSN to indicate the end point of the next refresh. These LSNs guarantee that scans on each partition will run concurrently with refresh operations and still produce consistent results regarding the read LSNs of the queries. Multiple LSNs can additionally include one or more of the following: an LSN to calculate the point where incremental repository data can be safely truncated and an LSN to indicate which basic repository blocks can be safely truncated. These LSNs ensure that maintenance operations (such as GC) will run concurrently with scans.

[0073] For example, on the incremental repository, the following LSNs can be utilized: min_read_LSN, last_flush_LSN, next_flush_LSN, and / or delta_truncate_LSN. min_read_LSN indicates the minimum read LSN for active queries. All active read LSNs from each query are monitored / tracked, allowing the minimum of all read_LSNs to be retrieved efficiently to determine how much data can be flushed. To ensure that flush operations do not affect the current data scan, only data with LSNs less than min_read_LSN can be flushed from the incremental repository to the base repository. last_flush_LSN indicates the maximum LSN from the previous flush. After the next flush, it can be reset to next_flush_LSN. A scan can start from last_flush_LSN up to its read LSN, plus all flushed data in the base repository. While the last_flush_LSN of the incremental repository changes with each flush, a snapshot of last_flush_LSN (where the scanned data starts) for each scan remains unchanged. `next_flush_LSN` indicates the maximum flush LSN for the next flush. You can choose an LSN between `min_read_LSN` and `last_flush_LSN` as `next_flush_LSN`, indicating that rows in [`last_flush_LSN`, `next_flush_LSN`] will be flushed. `delta_truncate_LSN` indicates that data with an LSN less than `delta_truncate_LSN` can be truncated in a background GC thread, where `delta_truncate_LSN = MIN(last_flush_LSN of all real-time queries, last_flush_LSN of the incremental repository)`.

[0074] An example of these concurrent operations is... Figure 5 As shown in the image. Figure 5An exemplary snapshot 500 depicts the incremental and base repositories. Snapshot 500 indicates that the previous flush of the incremental repository moved data associated with LSN < 100 to the base repository. Therefore, last_flush_LSN is 100. Each active query holds a read LSN, which will read data from last_flush_LSN to its read_LSN on the incremental repository, as well as data in the base repository up to last_flush_LSN. For example, query 1 with LSN 110 will read [0, 100) from the base repository and [100, 110] from the incremental repository. Query 2 will also read [0, 100) from the base repository and [100, 111] from the incremental repository. Since the minimum read LSN of all active queries is min_read_LSN, 110 (from query 1) is min_read_LSN. This min_read_LSN is continuously updated as old queries complete and new queries arrive. If query 1 completes before query 2, then min_read_LSN will be updated to 111.

[0075] The next refresh can be triggered based on time or incremental repository capacity criteria. Figure 6A Figure B shows example snapshots 600 and 601 of the incremental and base repositories. Snapshot 600 depicts the incremental and base repositories before the next refresh, and snapshot 601 depicts the incremental and base repositories after the next refresh. To determine how much data to refresh in the next refresh, a next_flush_LSN less than the current min_read_LSN is chosen. For example, next_flush_LSN could be 110, which is less than the current min_read_LSN 111. The incremental repository is then refreshed with data between [last_flush_LSN, next_flush_LSN) (i.e., [100, 110)). At the point where the refresh is complete, last_flush_LSN is updated to next_flush_LSN. The refresh does not remove any data from the incremental repository, and active queries that read data from their own last_flush_LSN (a snapshot for each query) do not update this LSN with the refresh. After the refresh is complete, newly arriving queries will begin using the new last_flush_LSN (110). For example, query 4 will start using the new last_flush_LSN(110). In this way, flushing can work concurrently with scanning without affecting the query results.

[0076] During garbage collection operations, old and unused data can be removed from the incremental repository and / or the basic repository. Figure 7AFigure B illustrates example snapshots 700 and 701 of the incremental and basic repositories during a garbage collection (GC) operation. Snapshot 700 depicts the incremental and basic repositories before the GC operation, and snapshot 701 depicts the incremental and basic repositories after the GC operation. The GC operation on the incremental repository will select delta_truncate_LSN as the MIN (last_flush_LSN of all live queries, last_flush_LSN of the incremental repository), which is 110 in this example. This means that all live and future queries do not need to scan the incremental repository before LSN 110, and that data can be safely removed from the incremental repository concurrently with live scans without affecting the scan results of any queries.

[0077] For the base repository, there are two possible implementation options. The first is a base repository with versioned data. Each row in the base repository will retain its LSN. Scanning the base repository can use reading the LSN, and it's possible to enable time travel for the scan. The second alternative implementation is a base repository without explicit versions. Scanning always reads all data in the base repository. A non-versioned base repository can be more efficient in terms of scanning and deletion, and can save storage space.

[0078] Similar to the incremental repository, the base repository can also have maintenance operations such as compaction and garbage collection (GC). These operations work on the base repository data blocks and therefore do not affect the incremental repository. To enable the scanning and maintenance operations to work in parallel, at the start of a scan, the query takes a snapshot of all live data blocks from that base repository. Old blocks, if removed, are made invisible at the end point and marked with `base_truncate_LSN`, which is the maximum `read_LSN` of the live query at the end point. Those old blocks in the GC thread can be removed after all active queries with `read_LSN <= base_truncate_LSN` have completed. In summary, we use multiple concurrent operations, such as data insertion, query scans, flushing, compaction, and GC, on both our incremental and base repositories, and leverage LSNs to ensure these operations can work concurrently while still delivering consistent results to query clients.

[0079] The above methods ensure consistency during read, write, and maintenance operations. Additionally, a centralized metadata service ensures consistent metadata and Data Definition Language (DDL) operations. The HTAP system described herein handles distributed DDL operations by using a metadata service with storage schema version LSNs and distributing them to all storage nodes. The metadata service can be used to store storage schema version LSNs and distribute them to all storage nodes. For example, a centralized metadata service would store and persist HTAP tables and partition schemas, along with the latest DDL LSN for each table to represent its metadata version. On each HTAP storage server, managed partitions will also maintain their own metadata version.

[0080] DDL updates can be accomplished through the following steps. DDL operations from OLTP (such as table creation or schema updates) can issue a DDL update log with LSNs to the HTAP storage device. The metadata service can periodically pull DDL information (e.g., the latest DDL LSN) from OLTP. If changes exist, the metadata service can first persist the latest LSN as a metadata version, and then it can use that LSN to pull the latest catalog information, such as the schema, from the OLTP catalog. After updating the catalog information, the metadata service can update the corresponding entries in the HTAP partition information and statistics tables in storage and persistence. Subsequently, the DDL log can be broadcast to all partitions and applied by the log applicator when it reaches the incremental repository. For partitions of tables(multiple) affected by the DDL, their schemas can be modified accordingly. For other partitions, they can simply update their metadata version (updating to the DDL LSN).

[0081] During the compilation of a scan query, the query in the centralized coordinator is able to obtain metadata information consistent with read_LSN. Therefore, the actual scan task on each partition can compare its read_LSN with the metadata version LSN of that partition to determine whether the current metadata version of that partition is consistent with the read LSN of the query.

[0082] For example, if the query's read_LSN is 1000 and the metadata LSN is 900, this means the query was compiled using the latest schema and has good reputation. On the other hand, if the metadata is updated and promoted to 1100 after the query is compiled, the partition version LSN is updated accordingly before the scan arrives. In this case, read_LSN can be 1000, while metadata is 1100, indicating that the partition's schema has been updated. The scan can be recompiled with the updated read_LSN to ensure that the scan always reads data in a schema consistent with the schema at the time of compilation.

[0083] The HTAP system described in this paper provides: a scanning function to generate globally consistent scan results, which is enabled by a centralized compilation service (coordinator) to retrieve read LSNs from a metadata service; a distributed query execution environment that initiates scan tasks to retrieve data related to read LSNs on partitions; and a storage system capable of supporting concurrent insert, update, delete, refresh, compaction, and GC operations while still providing scan results consistent with read LSNs.

[0084] A table scan can be distributed as a scan task per partition, using the same read LSN across all tasks. The steps can be as follows to ensure the scan is an aggregation of subtask results. Each task can retrieve the exact data for a partition up to the query-specific read LSN. First, after retrieving the DDL information and read LSN from the metadata service, the query can be compiled in the coordinator. Then, the query can create distributed scan tasks, each reading one or more partitions.

[0085] For example, if the read LSN is 1000, then each query subtask can initiate a read client. The client can initiate a round trip to collect LSNs from replicas of that partition, and then select a replica with an LSN greater than or equal to the read LSN or the largest LSN. This can follow the quorum protocol discussed above. For example, if three replicas hold LSNs of 1100, 1100, and 900, then either of the first two replicas can be selected. If three replicas hold LSNs of 900, 900, and 800, meaning the latest update is not on a replica of this partition, then we select the largest replica, i.e., 900. If a node fails and only two replicas return results, then we will select the replica with the larger LSN.

[0086] Clients can send scan requests to the server managing the desired replicas. Scanning on the server can begin until all logs in the partition with a value less than or equal to Read_LSN (1000) have been applied to the incremental repository in storage, even if this requires a short wait. The Gossip protocol can be invoked to fill "holes" in the LSN. The scan can check its read_LSN against the partition's metadata version LSN. If the metadata LSN is greater than read_LSN, the query can fail and be retried. If the metadata version is 1100, this indicates that the query was compiled using an older version of the metadata and the query may need to be recompiled.

[0087] The scan can read the incremental repository from the last_flush_LSN snapshot down to its read LSN (1000). It may be necessary to examine the deletion hash to filter out subsequently deleted rows so they shouldn't appear in the snapshot. For example, if a row has an insert LSN of 800 and a delete LSN of 900, both < 1000, so that row may not appear in the results.

[0088] The scan can read refreshed data from the base repository. Predicate filters and projection columns can be applied to reduce the amount of data required for the slave disk scan. Additionally, the delete bitmap within the base repository can be examined to exclude deleted rows. The results of the base repository scan also need to exclude deleted rows from the delete graph of the incremental repository based on row primary key matching and whether the read_LSN of the scanned row is greater than or equal to the delete_LSN of the deleted row. For example, if a row's delete LSN is 900 (which is less than 1000), then that row from the base repository can be excluded. The combined scan results can be returned to the client. If the returned data volume is large, multiple round trips of continuous scans can be used.

[0089] While the techniques described above provide strong data consistency by default for mixed transactional and analytical data processing, they can additionally or alternatively offer flexibility to users who wish to adjust the consistency level based on each use case. If QUERY_DEFAULT is selected, the read LSN is the checkpoint_LSN from the log scheduler. Any LSN preceding this LSN is guaranteed to be received and persisted by the HTAP storage device. The metadata server can periodically retrieve the checkpoint_LSN from the log scheduler and cache it. The scan task will theoretically experience latency if the log up to the checkpoint_LSN has been persisted but not yet applied to the incremental repository in storage. This can be the default mode and the most common use case for users who want strong consistency with minimal wait time.

[0090] If QUERY_LATEST is selected, the LSN read is the most recently committed LSN in the OLTP log repository. Compared to QUERY_DEFAULT, this mode offers strong consistency and better data freshness, but it comes with additional latency because the OLAP query coordinator must wait until checkpoint_LSN becomes greater than or equal to that LSN before executing the query to ensure the HTAP repository has the logs up to that LSN. The theoretical latency in QUERY_DEFAULT also applies to this mode.

[0091] If QUERY_DIRTY_READ is selected, users can choose to relax data consistency requirements and perform reads on the latest available data on each partition. No read LSN is used during the scan, so there is no waiting time during the scan. Scans in this mode will experience best-effort data freshness and latency; however, data may be read early or late without consistency guarantees.

[0092] If READ_YOUR_WRITE is selected, the commit LSN of each transaction can be propagated back to the client session that issued them. The client session can then use these LSNs as the read_LSN for its subsequent OLAP queries. This provides stronger data consistency and freshness for individual clients than other modes, but introduces additional latency because the OLAP query coordinator must wait until the checkpoint_LSN becomes greater than or equal to that LSN before executing the query to ensure that the HTAP repository already has the logs persistently saved to that LSN. The theoretical latency in QUERY_DEFAULT also applies to this mode.

[0093] Figure 8 The diagram illustrates what can be achieved by an HTAP system (e.g., such as...) Figure 3 The example process 800 performed by the HTAP system 300 shown is illustrated. The HTAP system can execute process 800 to ensure strong data consistency within the HTAP system. Although in Figure 8 The operations are depicted as a sequence of operations, but those skilled in the art will recognize that various embodiments may add, remove, reorder, or modify the depicted operations.

[0094] Data can be captured in real-time (or near real-time) by a first computing system. The first computing system can be, for example, an OLTP engine (e.g., OLTP engine 304). At 802, logical logs and associated log sequence numbers (LSNs) can be generated based on the data captured by the first processing engine. An LSN is an incrementing integer indicating the storage state. For example, an LSN associated with a specific logical log can indicate the storage state of that logical log.

[0095] At position 804, the logical log and LSN can be propagated to the storage subsystem. The storage subsystem can be configured to communicate with both the first and second processing engines. The second processing engine can, for example, be configured to perform Online Analytical Processing (OLAP). The architecture of the HTAP system described in this paper can provide fresh / real-time data changes for OLAP queries. The logical log of DML operations can be propagated from the OLTP component to the OLAP component immediately after commit. These logs can be dispatched to distributed partitions and continuously applied to the in-memory incremental repository via in-memory operations, which is typically very fast. The data changes carried by the logical log are immediately available for OLAP queries after being applied to the in-memory incremental repository. The architecture of the HTAP system can leverage unified versioning across the HTAP system to ensure strong data consistency. The OLTP component of the HTAP system can additionally support snapshot isolation and other (weaker) consistency models.

[0096] At point 806, LSNs and information indicating LSN schema versions can be stored and distributed by the metadata service. The metadata service can store the progress of persistently stored LSNs on the storage server and use it as the `read_LSN` for OLAP queries. The metadata service can also store schema LSNs to support DDL. The first processing engine, second processing engine, storage subsystem, and metadata service can each be modularized and can each support LSN mechanisms for maintaining data consistency. For example, each of the first processing engine, second processing engine, storage subsystem, and metadata service can be configured to be decoupled from the larger HTAP system and replaced with alternative components configured to extend LSN support through a public API.

[0097] Figure 9 The diagram illustrates what can be achieved by an HTAP system (e.g., such as...) Figure 3 The example process 900 performed by the HTAP system 300 shown is illustrated. The HTAP system can execute process 900 to ensure strong data consistency within the HTAP system. Although in Figure 9 The operations are depicted as a sequence of operations, but those skilled in the art will recognize that various embodiments may add, remove, reorder, or modify the depicted operations.

[0098] As discussed above, to ensure strong data consistency in HTAP systems, the Quorum and / or Gossip protocols can be used to provide global snapshot isolation across OLTP and OLAP engines. Leveraging the Quorum protocol and the Gossip framework extends single-partition consistent reads and writes to a globally distributed cluster. Data in replicated partitions can be stored and / or accessed across multiple data nodes in the cluster. For a given partition, multiple replicas can exist on different nodes, thus preserving identical data for fault tolerance.

[0099] At position 902, the Quorum protocol can be applied to replicas of the logical log in each partition of the storage subsystem. The Quorum protocol can be applied to ensure data consistency between these replicas during read and write processes. Quorum is a type of technology that can be used to perform constant functions or operations in a "distributed system." It uses distributed transactions, obtaining the least-vote agreement from the distributed system. Leveraging Quorum-based data consistency, read and write requests can always be serviced by multiple healthy nodes (e.g., two or more nodes in a cluster), and data across the cluster can remain in a consistent state. Any failed, corrupted, or slow replicas will not expose any data correctness issues to external clients, such as stale, dirty, or corrupted data.

[0100] For example, the above about Figure 4 The described architecture 400 utilizes a 2+2 Quorum configuration. Each partition is served by three replicas / nodes (nodes 402, 404, and 406). The framework 400 sends write requests to all three replicas / nodes 402, 404, and 406; a write operation succeeds only after two of the three replicas / nodes have succeeded. Figure 4 As shown, write requests sent to nodes 402 and 404 succeeded, while a write request sent to node 406 failed. However, since the writes succeeded for two of these nodes, the writes are still considered successful. During a read, frame 400 can query at least two nodes to request data status and select the node with data that satisfies the read consistency requirements. Because there are at least two nodes that keep the latest writes, we can tolerate the failure of one node while still ensuring successful and consistent reads. Figure 4 As shown, node 406 failed, but other nodes 402 and 404 succeeded. Therefore, despite the failure of node 406, the reads were still successful and consistent.

[0101] Besides the Quorum algorithm, the Gossip protocol can also be used to remedy any temporary inconsistencies between replicas. At position 904, the Gossip protocol can be applied to replicas of the logical log in each partition of the storage subsystem. The Gossip protocol is a program or process of peer-to-peer communication between computers based on the propagation of epidemics. Some distributed systems use peer-to-peer Gossip to ensure that data is distributed to all members of the group. For example, if any replica falls behind other replicas due to an unexpected failure such as disk or network failure, then using the Gossip protocol, the replica will catch up with its peers during message exchange between nodes.

[0102] Quorum and Gossip protocols can be used to maintain data consistency across distributed partitions in a cluster. Furthermore, an LSN mechanism can be used to ensure local consistency of data reads and writes on each partition. When a query is compiled in the coordinator, a snapshot of the query's data can be calculated as a global LSN. This global LSN can be distributed to each node that maintains the desired partition, and the local data portion of the snapshot LSN can be retrieved. Therefore, even if data is updated immediately after the compilation point in the coordinator, no more or less data will be accessed on each node.

[0103] On a single partition, a Log Sequence Number (LSN) can be used. An LSN is an incrementing integer indicating the local storage state. In 906, the read LSN of a query can be retrieved from the metadata service by a centralized compilation service. Each partition maintains its own local LSN, and the LSN can also represent a global and logical snapshot of the storage state across all partitions. For example, when a storage device has LSN 100, and after a new insertion of data arrives, the storage device LSN will move to 101.

[0104] The coordinator can first assign a global read_LSN from the metadata service to the query. At 908, the read LSN can be assigned to the query. The global read_LSN can be an LSN that all storage partitions have been holding at compiling time. This global read_LSN can then be sent to each storage server to determine how much data should be read locally on each partition. For example, global read LSN 100 will retrieve data exactly on each partition up to LSN 100.

[0105] At point 910, a scan task can be initiated to retrieve data from partitions of the storage subsystem based on the read LSN. For example, an OLAP engine can coordinate OLAP queries by requesting a global read LSN from the metadata service and initiating a distributed scan task to use these read LSN request data. The read LSN ensures that the scan will retrieve data across all partitions up to the read LSN, assembling the final global result for the data up to the read LSN. Multiple LSNs may additionally include one or more of the following: an LSN for preserving the minimum read LSN for active queries, an LSN for preserving the end point of the previous refresh, and an LSN for indicating the end point of the next refresh. These LSNs guarantee that scans on each partition will be performed concurrently with refresh operations and will still produce results consistent with the read LSNs of the queries. Multiple LSNs may additionally include one or more of the following: an LSN for calculating the point where incremental repository data can be safely truncated and an LSN for indicating which basic repository blocks can be safely truncated. These LSNs further ensure that maintenance operations (such as GC) will be performed concurrently with the scans.

[0106] Figure 10 The diagram illustrates what can be achieved by an HTAP system (e.g., such as...) Figure 3 The example process 1000 executed by the HTAP system 300 shown is illustrated. The HTAP system can execute process 1000 to ensure strong data consistency within the HTAP system. Although in Figure 10 The operations are depicted as a sequence of operations, but those skilled in the art will recognize that various embodiments may add, remove, reorder, or modify the depicted operations.

[0107] Data can be captured in real time (or near real time) by a first computing system. The first computing system can be, for example, an OLTP engine (e.g., OLTP engine 304). At 1002, logical logs and log sequence numbers (LSNs) associated with the logical logs can be generated based on the data captured by the first processing engine.

[0108] At position 1004, logical logs and LSNs can be propagated to the storage subsystem. An LSN is an incrementing integer indicating the storage state. For example, the LSN associated with a specific logical log can indicate the storage state of that logical log. The storage subsystem can be configured to communicate with a first processing engine and a second processing engine. The second processing engine can, for example, be configured to perform online analytical processing (OLAP).

[0109] Logical logs and LSNs can be propagated to the storage subsystem. The storage subsystem can be configured to communicate with both a first and a second processing engine. The second processing engine can, for example, be configured to perform Online Analytical Processing (OLAP). The architecture of the HTAP system described in this paper can provide fresh / real-time data changes for OLAP queries. Logical logs of DML operations can be propagated from the OLTP component to the OLAP component immediately after commit. These logs can be dispatched to distributed partitions and continuously applied to the in-memory incremental repository via in-memory operations, which is typically very fast. Data changes carried by the logical logs are immediately available for OLAP queries after being applied to the in-memory incremental repository. The architecture of the HTAP system can leverage unified versioning across the HTAP system to ensure strong data consistency. The OLTP component of the HTAP system can additionally support snapshot isolation and other (weaker) consistency models.

[0110] At position 1006, the LSN and information indicating the LSN schema version can be stored by the metadata service. The metadata service can store the progress of the persistently saved LSN on the storage server and use it as the read_LSN for OLAP queries. The metadata service can also store schema LSNs to support DDL.

[0111] The LSN mechanism can be used to ensure local consistency of data reads and writes on each partition. When a query is compiled in the coordinator, a snapshot of the data used for the query can be calculated in the form of a global LSN. The global LSN can be distributed to each node that maintains the desired partition, and the local data portion of the query snapshot LSN can be retrieved. Therefore, even if the data is updated immediately after the compilation point in the coordinator, no more or less data will be accessed on each node.

[0112] On a single partition, a Log Sequence Number (LSN) can be used. An LSN is an incrementing integer indicating the local storage state. The read LSN used for queries can be retrieved from the metadata service by a centralized compilation service. Each partition maintains its own local LSN, and the LSN can also represent a global and logical snapshot of the storage state across all partitions. For example, when a storage device has LSN 100, and after a new insertion of data arrives, the storage device LSN will move to 101.

[0113] The coordinator can first assign a global read_LSN from the metadata service to the query. At 1008, the read LSN can be assigned to the query. The global read_LSN can be an LSN that all storage partitions have been saving at compiling the query. This global read_LSN can then be sent to each storage server to determine how much data should be read locally on each partition. For example, global read LSN 100 will retrieve data on exactly each partition up to LSN 100.

[0114] Data in a storage system can change continuously due to multiple operations. These operations can include, for example, append / delete, scan, flush, compaction, and / or garbage collection (GC). During an append / delete operation, data can be added to the insert and delete lists of the incremental repository within memory. During a scan operation, current data can be read from the incremental and basic repositories relative to the read LSN. During a flush operation, data can be flushed from memory to disk. For example, current data can be moved from the incremental repository to the basic repository. During a compaction operation, basic repository data blocks can be reorganized. During a garbage collection operation, old and unused data can be removed from the incremental and / or basic repositories.

[0115] Multiple LSNs can be applied to indicate the different states of these operations. At 1010, multiple LSNs indicating the states of multiple operations can be applied, including data scanning, data insertion, data deletion, data flushing, compaction, and garbage collection. Using these LSNs, data scanning, updating, and maintenance operations can work together concurrently and consistently without interfering with each other. Multiple LSNs can include the read LSN for each query to indicate a data snapshot. This LSN ensures that scans will retrieve data across all partitions up to the read LSN, assembling the final global result for the data up to the read LSN. Multiple LSNs can additionally include one or more of the following: an LSN for preserving the minimum read LSN of the active query, an LSN for preserving the end point of the previous flush, and an LSN for indicating the end point of the next flush. These LSNs guarantee that scans on each partition will be performed concurrently with flush operations and still produce consistent results regarding the read LSNs of the queries. Multiple LSNs can additionally include one or more of the following: an LSN for calculating the point where incremental repository data can be safely truncated and an LSN for indicating which basic repository blocks can be safely truncated. These LSNs ensure that maintenance operations (such as GC) will be performed concurrently with scans.

[0116] Additional or alternative flexibility can be provided to users who wish to adjust the consistency level based on each use case. At 1012, multiple modes can be provided. These modes can be selected by the user. Modes may include, for example, QUERY_DEFAULT, QUERY_LATEST, QUERY_DIRTY_READ, READ_YOUR_WRITE, and / or any other mode through which the user can select the desired consistency level.

[0117] If QUERY_DEFAULT is selected, the read LSN is the checkpoint_LSN from the log scheduler. Any LSNs preceding this LSN are guaranteed to be received and persisted by the HTAP storage device. The metadata server can periodically retrieve the checkpoint_LSN from the log scheduler and cache it. If the log up to the checkpoint_LSN has been persisted but not yet applied to the incremental repository in storage, the scan task will theoretically experience latency. This can be the default mode and the most common use case for users who want strong consistency with minimal wait time.

[0118] If QUERY_LATEST is selected, the LSN read is the most recently committed LSN in the OLTP log repository. Compared to QUERY_DEFAULT, this mode offers strong consistency and better data freshness, but it comes with additional latency because the OLAP query coordinator must wait until checkpoint_LSN becomes greater than or equal to that LSN before executing the query to ensure the HTAP repository has logs up to that LSN. The theoretical latency in QUERY_DEFAULT also applies to this mode.

[0119] If QUERY_DIRTY_READ is selected, users can choose to relax data consistency requirements and perform reads on the latest available data on each partition. No read LSN is used during the scan, so there is no waiting time during the scan. Scans in this mode will experience best-effort data freshness and latency; however, early or late data may be read in this mode without consistency guarantees.

[0120] If READ_YOUR_WRITE is selected, the commit LSN of each transaction can be propagated back to the client session that issued them. The client session can then use these LSNs as the read_LSN for its subsequent OLAP queries. This provides stronger data consistency and freshness for individual clients than other modes, but introduces additional latency because the OLAP query coordinator must wait until the checkpoint_LSN becomes greater than or equal to that LSN before executing the query to ensure that the HTAP repository has a log persistently saved up to that LSN. The theoretical latency in QUERY_DEFAULT also applies to this mode.

[0121] Figure 11 The illustration shows computing devices that can be used in various aspects, such as Figure 1 The services, networks, modules, and / or devices described herein. About Figure 1 In this example architecture, cloud network 102, network 120, client devices 104a-d, service 112, HTAP system 110, and / or node 118 can each operate independently. Figure 11 This is achieved through one or more instances of the computing device 1100. Figure 11 The computer architecture shown illustrates a conventional server computer, workstation, desktop computer, laptop computer, tablet computer, network appliance, PDA, e-reader, digital cellular phone, or other computing node, and can be used to perform any aspect of the computer described herein, such as for implementing the methods described herein.

[0122] The computing device 1100 may include a substrate or “motherboard,” which is a printed circuit board to which multiple components or devices may be connected via a system bus or other electrical communication paths. One or more central processing units (CPUs) 1104 may operate in conjunction with a chipset 1106. The CPUs 1104 may be standard programmable processors that perform the arithmetic and logic operations required to perform the operations of the computing device 1100.

[0123] Multiple CPUs 1104 can perform necessary operations by manipulating switching elements to transition from one discrete physical state to the next, these switching elements distinguishing and changing these states. Switching elements typically include electronic circuitry, such as flip-flops, that maintains one of two binary states, and electronic circuitry, such as logic gates, that provides an output state based on a logical combination of the states of one or more other switching elements. These basic switching elements can be combined to create more complex logic circuits, including registers, adder-subtractor units, arithmetic logic units, floating-point units, etc.

[0124] The (multiple) CPUs 1104 can be enhanced or replaced by other processing units such as (multiple) GPUs 1105. The (multiple) GPUs 1105 may include processing units specifically designed for, but not necessarily limited to, highly parallel computing (such as graphics and other visualization-related processing).

[0125] Chipset 1106 provides an interface between CPU(s) 1104 and the remaining components and devices on the substrate. Chipset 1106 provides an interface to random access memory (RAM) 1108, which serves as the main memory in computing device 1100. Chipset 1106 may also provide an interface to computer-readable storage media, such as read-only memory (ROM) 1120 or non-volatile RAM (NVRAM) (not shown), for storing basic routines that help boot computing device 1100 and transfer information between various components and devices. ROM 1120 or NVRAM may also store other software components required for the operation of computing device 1100 according to the aspects described herein.

[0126] Computing device 1100 can operate in a networked environment using a logical connection via a local area network (LAN) to remote computing nodes and computer systems. Chipset 1106 may include functionality for providing network connectivity via a network interface controller (NIC) 1122, such as a Gigabit Ethernet adapter. NIC 1122 may be able to connect computing device 1100 to other computing nodes via network 1116. It should be understood that multiple NICs 1122 may exist in computing device 1100, connecting the computing device to other types of networks and remote computer systems.

[0127] Computing device 1100 can be connected to mass storage device 1128, which provides non-volatile storage for the computer. Mass storage device 1128 can store system programs, application programs, other program modules, and data, which have been described in more detail herein. Mass storage device 1128 can be connected to computing device 1100 via storage controller 1124, which is connected to chipset 1106. Mass storage device 1128 can consist of one or more physical storage units. Mass storage device 1128 may include management unit 1010. Storage controller 1124 can interface with physical storage units via a Serial Amount SCSI (SAS) interface, a Serial Advanced Technology (SATA) interface, a Fibre Channel (FC) interface, or other types of interfaces used for physical connection and data transfer between the computer and physical storage units.

[0128] The computing device 1100 can store data on the mass storage device 1128 by changing the physical state of the physical storage units to reflect the information being stored. The specific changes in the physical state can depend on various factors and the different implementations described herein. Examples of these factors may include, but are not limited to, the technology used to implement the physical storage units and whether the mass storage device 1128 is characterized as a primary storage device or a secondary storage device.

[0129] For example, computing device 1100 can store information in mass storage device 1128 by issuing instructions via storage controller 1124 to change the magnetic properties of a specific location within a disk drive unit, the reflection or refraction properties of a specific location in an optical storage unit, or the electrical properties of a specific capacitor, transistor, or other discrete component in a solid-state storage unit. Other transformations of the physical medium are possible without departing from the scope and spirit of this description; the foregoing examples are provided merely for the purpose of illustration. Computing device 1100 can also read information from mass storage device 1128 by detecting the physical state or characteristics of one or more specific locations within a physical storage unit.

[0130] In addition to the aforementioned high-capacity storage device 1128, the computing device 1100 may also access other computer-readable storage media to store and retrieve information, such as program modules, data structures, or other data. Those skilled in the art will recognize that the computer-readable storage medium can be any available medium that provides storage for non-transitory data and can be accessed by the computing device 1100.

[0131] By way of example and not limitation, computer-readable storage media may include volatile and non-volatile, transient and non-transitory computer-readable storage media implemented in any method or technology, as well as removable and non-removable media. Computer-readable storage media include, but are not limited to, RAM, ROM, erasable programmable ROM (“EPROM”), electrically erasable programmable ROM (“EEPROM”), flash memory or other solid-state storage technologies, optical disc ROM (“CD-ROM”), digital multifunction disc (“DVD”), high-definition DVD (“HD-DVD”), BLU-RAY or other optical storage devices, magnetic cartridges, magnetic tapes, disk storage devices, other magnetic storage devices, or any other medium that can be used to store desired information in a non-transitory manner.

[0132] Such as Figure 11Mass storage devices such as the mass storage device 1128 depicted herein can store an operating system used to control the operation of the computing device 1100. The operating system may include a version of the Linux operating system. The operating system may include a version of the Windows Server operating system from Microsoft. Depending on another aspect, the operating system may include a version of the UNIX operating system. Various mobile phone operating systems, such as iOS and Android, may also be utilized. It should be recognized that other operating systems may also be used. The mass storage device 1128 may store other systems, applications, and data used by the computing device 1100.

[0133] Mass storage device 1128 or other computer-readable storage media may also be encoded with computer-executable instructions that, when loaded into computing device 1100, transform the computing device from a general-purpose computing system into a special-purpose computer capable of implementing the aspects described herein. These computer-executable instructions transform computing device 1100 by specifying how CPU(s)1104 transition between states, as described above. Computing device 1100 can access computer-readable storage media storing computer-executable instructions that, when executed by computing device 1100, can perform the methods described herein.

[0134] Such as Figure 11 Computing devices such as the computing device 1100 depicted may also include an input / output controller 1132 for receiving and processing input from multiple input devices such as a keyboard, mouse, touchpad, touchscreen, electronic stylus, or other types of input devices. Similarly, the input / output controller 1132 may provide output to a display such as a computer monitor, flat panel display, digital projector, printer, plotter, or other types of output devices. It will be appreciated that computing device 1100 may not include... Figure 11 All components shown may include Figure 11 Other components not explicitly shown in the document, or those that can be used with Figure 11 The system architecture shown is completely different from the system architecture shown.

[0135] As described herein, a computing device can be a physical computing device, such as... Figure 11 The computing device 1100. A computing node may also include virtual machine host processes and one or more virtual machine instances. Computer-executable instructions may be indirectly executed by the physical hardware of the computing device by interpreting and / or executing instructions stored and executed in the virtual machine environment.

[0136] It should be understood that the methods and systems are not limited to any particular method, specific components, or particular implementation. It should also be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.

[0137] As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural indicators unless the context clearly indicates otherwise. A range may be expressed herein as from “about” one particular value and / or to “about” another particular value. When such a range is expressed, another embodiment includes from one particular value and / or to another particular value. Similarly, when a value is expressed as an approximation using the antecedent “about,” it will be understood that the particular value forms another embodiment. It should also be understood that the endpoints of each range are significant both relative to and independent of the other endpoint.

[0138] "Optional" or "optionally" means that the event or situation described below may or may not occur, and the description includes instances where the event or situation occurs and instances where it does not occur.

[0139] Throughout the description and claims of this specification, the word "comprise" and variations thereof, such as "comprising" and "comprises," mean "including but not limited to" and are not intended to exclude, for example, other components, integers, or steps. "Exemplary" means "an example of..." and is not intended to convey indications of preferred or ideal embodiments. "Like" is not used in a limiting sense but for interpretive purposes.

[0140] Components that can be used to perform the described methods and systems are described. When describing combinations, subsets, interactions, groupings, etc., of these components, it should be understood that while specific references to each of the various individual and collective combinations and permutations of these may not be explicitly described, each is specifically considered and described herein for all methods and systems. This applies to all aspects of this application, including but not limited to operations in the described methods. Therefore, if multiple additional operations are available, it should be understood that each of these additional operations can be performed using any specific embodiment or combination of embodiments of the described methods.

[0141] The method and system can be more easily understood by referring to the following detailed description of preferred embodiments and examples included therein, as well as the figures and their descriptions.

[0142] As those skilled in the art will recognize, the methods and systems may take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the methods and systems may take the form of a computer program product on a computer-readable storage medium having computer-readable program instructions (e.g., computer software) implemented therein. More particularly, the methods and systems may take the form of computer software implemented on the web. Any suitable computer-readable storage medium may be utilized, including hard disks, CD-ROMs, optical storage devices, or magnetic storage devices.

[0143] Embodiments of methods and systems are described below with reference to block diagrams and flowcharts of methods, systems, apparatuses, and computer program products. It should be understood that each block in the block diagrams and flowcharts, as well as combinations of blocks in the block diagrams and flowcharts, can be implemented by computer program instructions. These computer program instructions can be loaded onto a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that the instructions, which execute on the computer or other programmable data processing apparatus, create components for implementing the functions specified in one or more flowchart blocks.

[0144] These computer program instructions may also be stored in a computer-readable storage medium, which may direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of writing comprising computer-readable instructions for implementing the functions specified in one or more flowchart blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus, thereby producing a computer-implemented process, such that the instructions, which execute on the computer or other programmable apparatus, provide steps for implementing the functions specified in one or more flowchart blocks.

[0145] The various features and processes described above can be used independently of each other or combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of this disclosure. Additionally, certain method or process blocks may be omitted in some embodiments. The methods and processes described herein are not limited to any particular order, and the blocks or states associated with them may be performed in other suitable orders. For example, the described blocks or states may be performed in a different order than specifically described, or multiple blocks or states may be combined in a single block or state. Example blocks or states may be performed serially, in parallel, or in some other manner. Blocks or states may be added to or removed from the described example embodiments. The example systems and components described herein may be configured differently from those described. For example, elements may be added, removed, or rearranged compared to the described example embodiments.

[0146] It should also be recognized that various items are shown to be stored in memory or on storage devices during use, and these items, or portions thereof, may be transferred between memory and other storage devices for memory management and data integrity purposes. Alternatively, in other embodiments, some or all of the software modules and / or system may be executed in memory on another device and communicate with the illustrated computing system via inter-computer communication. Furthermore, in some embodiments, some or all of the system and / or modules may be implemented or provided in other ways, such as at least in part as firmware and / or hardware, including, but not limited to, one or more application-specific integrated circuits (“ASICs”), standard integrated circuits, controllers (e.g., by executing appropriate instructions, and including microcontrollers and / or embedded controllers), field-programmable gate arrays (“FPGAs”), complex programmable logic devices (“CPLDs”), etc. Some or all of the modules, systems, and data structures may also be stored (e.g., as software instructions or structured data) on computer-readable media, such as hard disks, memory, networks, or portable media articles read by appropriate devices or via appropriate connections. The system, modules, and data structures can also be transmitted as generated data signals (e.g., as part of a carrier wave or other analog or digital propagation signal) over various computer-readable transmission media (including wireless-based and wired / cable-based media), and can take various forms (e.g., as part of a single or multiplexed analog signal, or as multiple discrete digital data packets or frames). In other embodiments, such computer program products can also take other forms. Therefore, the invention can be practiced with other computer system configurations.

[0147] While methods and systems have been described in conjunction with preferred embodiments and specific examples, they are not intended to limit the scope to the particular embodiments illustrated, as the embodiments herein are intended in all respects to be illustrative rather than restrictive.

[0148] Unless otherwise expressly stated, no method described herein is intended to be construed as requiring its operations to be performed in a particular order. Therefore, where a method claim does not actually describe the order in which its operations are followed, or where the claims or description do not otherwise specify that the operations will be limited to a particular order, it is by no means that the order can be inferred from any aspect. This applies to any possible non-explicit basis of interpretation, including: logical matters relating to the arrangement of steps or operational flows; the simple meaning derived from grammatical organization or punctuation; and the number or type of embodiments described in the specification.

[0149] It will be apparent to those skilled in the art that various modifications and variations can be made without departing from the scope or spirit of this disclosure. Other embodiments will become apparent to those skilled in the art upon consideration of the specification and practice described herein. The specification and example figures are to be considered exemplary only, and the true scope and spirit are indicated by the appended claims.

Claims

1. A system comprising: At least one processor; as well as At least one memory, communicatively coupled to the at least one processor, and including instructions that, when executed by the at least one processor, cause the system to perform operations including: Based on data captured by a first processing engine, a logical log and a log sequence number (LSN) associated with the logical log are generated. The first processing engine is configured to perform online transaction processing, wherein the LSN includes a DDL LSN associated with a Data Definition Language (DDL) operation. The logical log and the LSN are propagated to a storage device configured to communicate with a first processing engine and a second processing engine, the second processing engine being configured to perform online analysis processing; The metadata service stores and distributes the LSN and information indicating the LSN mode version; and The first processing engine, the second processing engine, the storage device, and the metadata service are modularized, configured to be decoupled from each other, and support the LSN for maintaining data consistency across the system.

2. The system according to claim 1, wherein the operation further includes: A quorum protocol is applied to replicas of the logical log in each partition of the storage device, wherein the quorum protocol maintains data consistency between the replicas during read and write processes.

3. The system according to claim 1, wherein the operation further includes: The gossip protocol is applied to replicas of the logical log in each partition of the storage device, wherein the gossip protocol resolves temporary data inconsistencies between the replicas.

4. The system of claim 1, wherein LSN is an incrementing integer indicating the storage state.

5. The system according to claim 4, wherein the operation further includes: The read LSN is assigned to a query from the metadata service, wherein the read LSN corresponds to the LSN of each partition of the storage device that is saved up to the time the query is compiled.

6. The system according to claim 5, wherein the operation further comprises: Initiate a scan task to retrieve data from the partitions of the storage device based on the read LSN.

7. The system according to claim 4, wherein the operation further comprises: The application indicates the state of multiple LSNs for multiple operations, wherein the multiple LSNs enable the multiple operations to work together in a concurrent and coherent manner, and wherein the multiple operations include data scanning, data insertion, data deletion, data refresh, compression, and garbage collection.

8. The system of claim 7, wherein the plurality of LSNs comprises: The minimum read LSN is used to concurrently perform data scanning and flush data from the incremental repository to the base repository in the storage device, wherein only data with an LSN smaller than the minimum read LSN is flushed from the incremental repository to the base repository. The previous refresh of LSN indicates the largest LSN refreshed previously; The next refresh LSN indicates the maximum LSN for the next refresh, which is selected between the minimum read LSN and the previous refresh LSN; and A truncated LSN is associated with the operation of removing old data from the storage device, wherein only data with an LSN smaller than the truncated LSN is removed from the storage device.

9. The system according to claim 1, further comprising: A centralized compilation service is configured to retrieve the read LSN for any specific query from the metadata service.

10. The system of claim 1, wherein the system provides multiple modes selectable by a user, the multiple modes corresponding to different levels of data consistency.

11. A method comprising: Based on data captured by a first processing engine, a logical log and a log sequence number (LSN) associated with the logical log are generated. The first processing engine is configured to perform online transaction processing, wherein the LSN includes a DDL LSN associated with a Data Definition Language (DDL) operation. The logical log and the LSN are propagated to a storage device configured to communicate with a first processing engine and a second processing engine, the second processing engine being configured to perform online analysis processing; The metadata service stores and distributes the LSN and information indicating the LSN mode version; as well as The first processing engine, the second processing engine, the storage device, and the metadata service are modularized, configured to be decoupled from each other, and support the LSN for maintaining data consistency across the first processing engine and the second processing engine.

12. The method of claim 11, further comprising: A quorum protocol is applied to replicas of the logical log in each partition of the storage device, wherein the quorum protocol maintains data consistency between the replicas during read and write processes.

13. The method of claim 11, further comprising: The gossip protocol is applied to replicas of the logical logs in each partition of the storage device, wherein the gossip protocol resolves temporary data inconsistencies between the replicas.

14. The method of claim 11, wherein LSN is an incrementing integer indicating the storage state.

15. The method of claim 14, further comprising: The read LSN is assigned to a query from the metadata service, wherein the read LSN corresponds to the LSN that has been saved for each partition of the storage device up to the time the query was compiled.

16. The method of claim 15, further comprising: Initiate a scan task to retrieve data from the partitions of the storage device based on the read LSN.

17. The method of claim 14, further comprising: The application indicates the state of multiple LSNs for multiple operations, wherein the multiple LSNs enable the multiple operations to work together in a concurrent and coherent manner, and wherein the multiple operations include data scanning, data insertion, data deletion, data refresh, compression, and garbage collection.

18. The method of claim 17, wherein the plurality of LSNs comprises: The minimum read LSN is used to concurrently perform data scanning and flush data from the incremental repository to the base repository in the storage device, wherein only data with an LSN less than the minimum read LSN is flushed from the incremental repository to the base repository. The previous refresh of LSN indicates the largest LSN refreshed previously; The next refresh LSN indicates the maximum LSN for the next refresh, which is selected between the minimum read LSN and the previous refresh LSN; and A truncated LSN is associated with the operation of removing old data from the storage device, wherein only data with an LSN smaller than the truncated LSN is removed from the storage device.

19. The method of claim 11, further comprising: Multiple modes are provided for users to choose from, and these modes correspond to different levels of data consistency.

20. A non-transitory computer-readable storage medium comprising computer-readable instructions, which, when executed by a system, cause the system to perform operations including: Based on data captured by a first processing engine, a logical log and a log sequence number (LSN) associated with the logical log are generated. The first processing engine is configured to perform online transaction processing, wherein the LSN includes a DDL LSN associated with a Data Definition Language (DDL) operation. The logical log and the LSN are propagated to a storage device configured to communicate with a first processing engine and a second processing engine, the second processing engine being configured to perform online analysis processing; The metadata service stores and distributes the LSN and information indicating the LSN mode version; as well as The first processing engine, the second processing engine, the storage device, and the metadata service are modularized, configured to be decoupled from each other, and support the LSN for maintaining data consistency across the first processing engine and the second processing engine.