Method, device and readable storage medium for rolling release based on Flink-CDC

By adopting a rolling release and shared binlog parsing architecture, the problems of data interruption and source database load caused by Flink-CDC task updates are solved, achieving efficient and lossless data synchronization and consistency processing, and improving system availability and operational efficiency.

CN121636623B · Active · Publication Date: 2026-06-19 · 云筑信息科技(成都)有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
云筑信息科技(成都)有限公司
Filing Date
2026-02-05
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing data synchronization solutions based on Flink-CDC suffer from problems such as opaque task updates leading to data interruptions and redundant resource consumption, affecting the real-time performance of data synchronization and the load on the source database.

Method used

A rolling release approach is adopted, which enables lossless task updates through state preservation and custom health monitoring. Combined with a shared binlog parsing architecture and intelligent scheduling, the task configuration change process is optimized to ensure the continuity and consistency of data synchronization.

Benefits of technology

It achieves zero-interruption data synchronization updates, reduces the load on the source database, improves the intelligence of system operation and maintenance, ensures data order and consistency, and provides flexible point-in-time recovery capabilities.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN121636623B_ABST
Patent Text Reader

Abstract

This invention discloses a method, apparatus, and readable storage medium for rolling deployment based on Flink-CDC, belonging to the field of big data processing technology. The method includes rolling deployment: in response to a change request, triggering a running first Flink-CDC task to save its task state information; creating a second Flink-CDC task based on the task state information and the changed content; and, after confirming that the second task is stable by monitoring its custom health metric, stopping the first Flink-CDC task, achieving zero-interruption updates. The method also includes shared Binlog parsing: creating a parent task to uniformly monitor changes in multiple tables and deliver them to a message middleware according to primary-key-consistent routing rules; and creating sub-tasks to consume and filter the data before writing it to the target systems, achieving one read and multiple writes. The apparatus includes a control end and a synchronization end that work together. This invention solves the problems of task updates requiring interruption and of high source database pressure during multi-target synchronization.

Description

Technical Field

[0001] This invention relates to the field of big data processing and data synchronization technology, specifically to a method, apparatus, and readable storage medium for rolling deployment based on Flink-CDC.

Background Technology

[0002] Change data capture (CDC) technology is the core of real-time data synchronization. Flink-CDC, as a CDC implementation based on Apache Flink, can efficiently capture changes in the database's binary log (binlog).

[0003] However, existing data synchronization solutions based on Flink-CDC have the following shortcomings:

1. Opaque task updates: When adding or removing monitored data tables in a running Flink-CDC task, it is usually necessary to stop the current task, modify the configuration, and restart it. This process causes data flow interruption for several minutes, and the restart speed is negatively correlated with the number of monitored tables, seriously affecting the real-time performance and service availability of data synchronization.

2. Redundant resource consumption: If multiple downstream systems require changed data from the same data source, a separate Flink-CDC task needs to be deployed for each downstream system. These tasks each establish a connection to the source database and pull binlogs, causing repeated connection pressure, network I/O, and computational resource consumption on the source database, which may lead to database performance issues.

[0004] Therefore, there is an urgent need for a solution that can achieve lossless task updates and share binlog parsing to reduce pressure on the source end.

Summary of the Invention

[0005] The present invention aims to provide a method, apparatus and readable storage medium for rolling deployment based on Flink-CDC, so as to solve the technical problems of data interruption caused by task updates and increased load on the source database due to multi-task synchronization in the existing solution.

[0006] To achieve the above objectives, the technical solution adopted by the present invention is as follows:

[0007] A rolling deployment method based on Flink-CDC, including the rolling deployment process:

[0008] In response to a change request for the running first Flink-CDC task, the first Flink-CDC task is triggered to perform a state saving operation by calling the resource orchestrator application programming interface of the compute cluster, so as to obtain and persistently save the task state information.

[0009] Based on the task status information and the configuration content indicated by the change request, create and start the second Flink-CDC task;

[0010] By querying the metrics system, the custom health metric reported by the second Flink-CDC task is monitored. The custom health metric is initially set to the first state value when the second Flink-CDC task starts, and is automatically updated to the second state value after a preset delay following startup.

[0011] When the custom health metric is detected to be in the second state value, it is confirmed that the second Flink-CDC task has entered a stable running state. The first Flink-CDC task is stopped, and the second Flink-CDC task is used as the updated task of the first Flink-CDC task.

[0012] Furthermore, the configuration content indicated in the change request is at least one of the following: adding a data table to be monitored, removing a data table that has already been monitored, or modifying the data monitoring configuration parameters.

[0013] Furthermore, it also includes the process of sharing binlog parsing:

[0014] Create a parent task and configure it to uniformly monitor change events of multiple data tables in the source database and parse binary logs;

[0015] The parent task delivers the monitored change events to the message middleware. The delivery process follows a predetermined routing rule: change events with the same routing key are delivered to the same partition of the message middleware. The routing key is determined based on the primary key field value or the table name of the changed data in the change event.

[0016] Create at least one subtask and configure it to consume data from the message middleware, filter the data as needed, and write it to the target storage system.

[0017] Furthermore, shared binlog parsing also includes order guarantees:

[0018] When the parent task delivers a change event to the message middleware, it carries the determined routing key in the message header corresponding to the change event;

[0019] In the process of consuming data from the message middleware, the subtask first performs key partitioning based on the routing key extracted from the message header, then filters the partitioned data according to requirements, and finally writes it to the target storage system.

[0020] Furthermore, shared binlog parsing also includes intelligent scheduling:

[0021] When a request to synchronize a specified data table to a specified target system is received, the metadata database is queried to determine whether the parent task of the database to which the specified data table belongs and the sub-task of writing to the specified target system exist.

[0022] Based on the query results, automatically select and execute one of the following publishing strategies:

[0023] a. If both the parent task and the child task exist, then the rolling release process is executed, and the updated task is the updated child task;

[0024] b. If the parent task exists but the child task does not exist, a new child task is created. A unique consumer group identifier is generated for the new child task, and the new child task is configured to start consuming from the latest available offset of the message middleware. The rolling release process is also performed on the parent task, so that the updated task is the updated parent task.

[0025] c. If the parent task does not exist, create a new parent task and child task.

[0026] Furthermore, when executing the release strategy, the release order is controlled as follows: the creation and startup of subtasks are completed first, and the creation and startup of parent tasks are carried out only after the custom health indicators of the subtasks are monitored and it is confirmed that they have entered a stable running state.

[0027] A rolling deployment device based on Flink-CDC includes a control end and a synchronization end that work together:

[0028] The control end executes the above method;

[0029] The synchronization endpoint, deployed on the computing cluster, runs data synchronization tasks created and scheduled by the management endpoint. These data synchronization tasks include:

[0030] At least one parent task is used to perform unified listening and delivery of change events;

[0031] At least one subtask is used to consume data from the message middleware and write it to the target system.

[0032] Furthermore, the control end includes:

[0033] The configuration management module is used to store the configuration content indicated by the change request;

[0034] Cluster interaction module, used for:

[0035] In response to the change request, trigger the first Flink-CDC task to perform a state saving operation by calling the resource orchestrator application programming interface of the computing cluster, so as to obtain and persistently save the task state information.

[0036] Based on the storage path of the acquired task status information and the configuration content indicated by the change request obtained from the configuration management module, control the creation and start of the second Flink-CDC task;

[0037] The metrics monitoring module is used to monitor the custom health metrics reported by the second Flink-CDC task by querying the metrics system.

[0038] The cluster interaction module is also used for:

[0039] When the custom health metric is detected as the second state value by the metric monitoring module, it is confirmed that the second Flink-CDC task has entered a stable running state, the first Flink-CDC task is stopped, and the second Flink-CDC task is used as the updated task of the first Flink-CDC task.

[0040] A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, an electronic device containing the processor acts as a control terminal to implement the above-described method.

[0041] Compared with the prior art, the present invention has the following beneficial effects:

[0042] 1. Achieve true zero-disruption data synchronization updates: Through a lossless rolling release mechanism based on savepoint-based state synchronization and custom health monitoring, a smooth switch between old and new tasks can be achieved when updating task configurations (such as adding or removing listener tables). Businesses no longer need to tolerate several minutes of data stream interruption caused by task restarts in traditional solutions, significantly improving the availability and continuity of data synchronization services and meeting the stringent requirements of high-availability businesses for real-time data streams.

[0043] 2. Significantly reduces source database load: Employing a "one-read-many-write" shared Binlog parsing architecture, only one parent task needs to connect to the source database and pull the binary log (Binlog). Data can then be provided to multiple downstream targets (sub-tasks) via a message middleware. Compared to existing technologies that independently deploy CDC tasks for each of N downstream targets, this invention reduces the number of connections, the network I/O, and the log parsing computation imposed on the source database from N copies to one, effectively avoiding performance bottlenecks and stability risks caused by excessive synchronization connections to the source database.

[0044] 3. Strictly guaranteeing data ordering and consistency within a decoupled architecture: An innovative design incorporates consistent routing rules based on primary keys (or table names) and a cross-task message header passing mechanism. This mechanism ensures that changes to the same data entity are always routed to the same processing path throughout the entire chain—from parent task processing and message middleware delivery to child task consumption—thus strictly guaranteeing the consistency of the final state. Even with multiple millisecond-level updates to the same entity, downstream processes can obtain the correct final state, solving the common data out-of-order problem encountered after introducing message queues for decoupling.

[0045] 4. Enhance the intelligence and automation of system operation and maintenance: Through the intelligent scheduling module at the management end, the system can automatically determine the status of tasks based on metadata and intelligently select the optimal release strategy (such as rolling updates, creating subtasks and updating parent tasks, or creating all new tasks). Simultaneously, it enforces a collaborative release sequence of "starting and confirming the stability of consumers (subtasks) first, then starting producers (parent tasks)," eliminating the risk of data loss from a process perspective. This greatly simplifies operation and maintenance in complex synchronous scenarios involving multiple tasks and objectives, reduces the risk of human error, and improves operational efficiency.

[0046] 5. Provides flexible point-in-time recovery and rollback capabilities: Using Savepoints that are saved periodically and retained long term, the system not only supports real-time synchronization but also allows rollback to any previously saved point in time (such as any half-hour point within the past 7 days). This provides robust support for data correction, fault recovery, and historical data analysis, enhancing the flexibility of data management.

Attached Figure Description

[0047] Figure 1 is a flowchart of the rolling release process of this invention.

[0048] Figure 2 is a flowchart of the shared binlog parsing process of this invention.

Detailed Implementation

[0049] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of this invention, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0050] In the description of this invention, it should be noted that the terms "first," "second," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance.

[0051] This invention provides a data synchronization device based on Flink-CDC, employing an architecture that separates the management and control end from the synchronization end, and decoupling the data production and consumption processes through a message middleware. The Flink-CDC tasks on the synchronization end can be deployed on various resource management platforms, including but not limited to Kubernetes (K8s) and Apache YARN. The management and control end achieves unified control of the tasks by adapting to the management interfaces provided by different platforms. This design ensures the broad applicability and future scalability of the technical solution.

[0052] To enable those skilled in the art to better understand and implement this invention, the following description uses a mainstream containerized deployment solution in the industry—based on a Kubernetes (K8s) cluster—and deploys and manages flink-cdc tasks through the Flink Kubernetes Operator (hereinafter referred to as Flink-Operator)—as a detailed example environment. It should be understood that this example is for illustrative purposes only and is not intended to limit the scope of this invention.

[0053] In one embodiment of the present invention, the control terminal includes:

[0054] The configuration management module is used to store the configuration content indicated by the change request;

[0055] Cluster interaction module, used for:

[0056] In response to the change request, trigger the first Flink-CDC task to perform a state saving operation by calling the resource orchestrator application programming interface of the computing cluster, so as to obtain and persistently save the task state information.

[0057] Based on the storage path of the acquired task status information and the configuration content indicated by the change request obtained from the configuration management module, control the creation and start of the second Flink-CDC task;

[0058] The metrics monitoring module integrates with third-party metrics systems such as Prometheus to monitor custom health metrics reported by the second Flink-CDC task by querying the metrics system.

[0059] The cluster interaction module is also used for:

[0060] When the custom health metric is detected as the second state value by the metric monitoring module, it is confirmed that the second Flink-CDC task has entered a stable running state, and the first Flink-CDC task is stopped.

[0061] In this Kubernetes environment, the cluster interaction module encapsulates calls to the Kubernetes API Server. It enables fine-grained control over synchronization tasks by creating, updating, or deleting custom resources such as FlinkDeployment or FlinkSessionJob defined by Flink-Operator. For example, it triggers a savepoint by updating the `spec.job.savepointTriggerNonce` field of the resource, and sets the task recovery point by modifying `spec.job.initialSavepointPath`.
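As a minimal illustrative sketch of the two field updates named above, the request bodies could be built as plain dictionaries. The helper names `savepoint_trigger_patch` and `restore_spec` are hypothetical and do not come from the patent; only the two field paths are taken from the text.

```python
import json


def savepoint_trigger_patch(nonce: int) -> dict:
    """Build a merge-patch body that changes spec.job.savepointTriggerNonce.

    Changing the nonce to a new value is what signals that a savepoint
    should be taken; the value itself carries no other meaning.
    """
    return {"spec": {"job": {"savepointTriggerNonce": nonce}}}


def restore_spec(base_spec: dict, savepoint_path: str) -> dict:
    """Copy a FlinkDeployment spec and set spec.job.initialSavepointPath.

    The input is left unmodified so the old task's definition stays intact.
    """
    spec = json.loads(json.dumps(base_spec))  # deep copy of JSON-like data
    spec.setdefault("job", {})["initialSavepointPath"] = savepoint_path
    return spec
```

In a real control end these bodies would presumably be submitted through a Kubernetes client against the FlinkDeployment custom resource; the sketch stops short of the network call so that it stays self-contained.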

[0062] In one embodiment of the present invention, the synchronization endpoint is specifically implemented in a Kubernetes cluster. The creation and management process of the synchronization endpoint (i.e., Flink job instances) by the Flink-Operator is as follows: The management endpoint creates or updates corresponding custom resources (such as FlinkDeployment) through the Kubernetes API. The Flink Kubernetes Operator deployed in the cluster continuously monitors changes to such resources. When it detects a custom resource created by the management endpoint, the Operator is responsible for creating and managing the corresponding Flink job instances (including JobManager and TaskManager Pods) according to the resource definition. The collection of these running Flink job instances constitutes the synchronization endpoint.

[0063] Flink job instances are divided into two categories:

[0064] Parent task: This refers to the parent task for binlog parsing. Each parent task is an independent FlinkDeployment resource, whose Pod runs a flink-cdc job responsible for uniformly monitoring change events of multiple tables in the source database (such as MySQL) and reading binary logs.

[0065] Subtasks: These are data consumption subtasks. Each subtask is also an independent Flink job (which can be a FlinkDeployment or a FlinkSessionJob). Its Pod consumes data from a message middleware (such as Kafka), filters and processes it, and then writes it to the target system.

[0066] Message middleware: A dedicated Apache Kafka cluster is used. The parent task writes data to different topics, and the child tasks subscribe to these topics to consume the data.

[0067] As shown in Figure 1, this invention provides a rolling deployment method based on Flink-CDC, including the rolling deployment process:

[0068] In response to a change request for the running first Flink-CDC task, the first Flink-CDC task is triggered to perform a state saving operation by calling the resource orchestrator application programming interface of the compute cluster, so as to obtain and persistently save the task state information.

[0069] Based on the task status information and the configuration content indicated by the change request, create and start the second Flink-CDC task;

[0070] By querying the metrics system, the custom health metric reported by the second Flink-CDC task is monitored. The custom health metric is initially set to the first state value when the second Flink-CDC task starts, and is automatically updated to the second state value after a preset delay following startup.

[0071] When the custom health metric is detected to be in the second state value, it is confirmed that the second Flink-CDC task has entered a stable running state. The first Flink-CDC task is stopped, and the second Flink-CDC task is used as the updated task of the first Flink-CDC task.
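The four steps above can be sketched as a single control loop. This is an illustrative outline only: the four callables are hypothetical stand-ins for the cluster interaction and metric monitoring modules, the second state value is assumed to be 1, and the timeout behavior (leaving the first task running if the second never becomes healthy) is an assumption not stated in the text.

```python
import time
from typing import Callable


def rolling_release(
    trigger_savepoint: Callable[[], str],
    start_second_task: Callable[[str], None],
    read_health_metric: Callable[[], int],
    stop_first_task: Callable[[], None],
    poll_seconds: float = 1.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run the rolling release steps in order.

    Returns True once the first task has been stopped, or False if the
    second task never reported the second state value within the timeout.
    """
    savepoint_path = trigger_savepoint()      # step 1: save the task state
    start_second_task(savepoint_path)         # step 2: start the new task from it
    waited = 0.0
    while waited <= timeout_seconds:          # step 3: watch the health metric
        if read_health_metric() == 1:         # assumed second state value
            stop_first_task()                 # step 4: retire the old task
            return True
        sleep(poll_seconds)
        waited += poll_seconds
    return False                              # assumption: old task keeps running
```

Because the first task is stopped only after the second reports healthy, both tasks overlap briefly, which is what avoids the gap in the data stream.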

[0072] This rolling deployment process is the core of this invention's zero-disruption task update capability. Traditional Flink-CDC task updates require a "stop-modify-restart" process, inevitably leading to several minutes of data stream interruption. The innovation of this invention lies in creatively applying the rolling update concept to data synchronization tasks. Through the core technology of state preservation and recovery, combined with a precise health status determination mechanism, it achieves a smooth switch between old and new task instances.

[0073] In one embodiment of the present invention, it is necessary to configure the state persistence path (Savepoint and Checkpoint) for the task, i.e., the storage path, and set the automatic save period to ensure that a relatively recent state snapshot is available. For example, the savepoint and checkpoint period and the save location to be enabled by the task can be defined by parameters such as state.savepoints.dir, state.checkpoints.dir, kubernetes.operator.periodic.savepoint.interval, kubernetes.operator.checkpoint.trigger.grace-period.
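For illustration, the four parameters named above could be collected into the string-to-string map that a FlinkDeployment carries under `spec.flinkConfiguration`. The storage paths and durations below are placeholder values chosen for the example, not values given in the patent.

```python
# Hypothetical values; only the four keys come from the text above.
flink_configuration = {
    # Where savepoints and checkpoints are persisted.
    "state.savepoints.dir": "s3://example-bucket/flink/savepoints",
    "state.checkpoints.dir": "s3://example-bucket/flink/checkpoints",
    # How often the operator takes a savepoint automatically.
    "kubernetes.operator.periodic.savepoint.interval": "30 min",
    # Grace period the operator applies around triggered checkpoints.
    "kubernetes.operator.checkpoint.trigger.grace-period": "1 min",
}


def with_state_config(spec: dict) -> dict:
    """Return a copy of a FlinkDeployment spec with the state settings merged in."""
    merged = dict(spec)
    merged["flinkConfiguration"] = {
        **spec.get("flinkConfiguration", {}),
        **flink_configuration,
    }
    return merged
```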

[0074] In one embodiment of this invention, the state saving operation is implemented through Flink's Savepoint mechanism. Savepoint is a special, complete state snapshot mechanism built upon Flink's standard Checkpoint fault tolerance, but specifically designed for planned operations rather than fault recovery. By calling the resource orchestrator interface to trigger a Savepoint, this operation generates a persistent snapshot containing complete state information based on the task's latest Checkpoint. This snapshot is then used to save and restore table structure information, thus addressing the problem of new task startup failures due to table structure changes.

[0075] In one embodiment of the present invention, it is assumed that the running first Flink-CDC task corresponds to a FlinkDeployment named parent-job-old in the Kubernetes cluster. The cluster interaction module on the management end calls the Kubernetes API to modify the value of spec.job.savepointTriggerNonce in the FlinkDeployment/parent-job-old resource definition. When this change is applied, the Flink Kubernetes Operator detects it and instructs the first Flink-CDC task (i.e., the parent-job-old instance) to perform its state saving operation (Savepoint), persisting the complete state of the current task to the configured storage path. This state information includes at least the binary log position (GTID) that parent-job-old has consumed from the source and the table structure (schema) information of the monitored data table (db1.users). Saving the table structure information is crucial: even if the db1.users table undergoes a structural change (ALTER TABLE) during this period, the new task can still correctly parse the data at historical positions.

[0076] In one embodiment of the present invention, because the Savepoint operation is asynchronous and typically takes from tens of seconds to several minutes, the cluster interaction module on the management end polls the status of the Savepoint resource in Kubernetes at a fixed interval (e.g., every 10 seconds) until its status becomes "COMPLETED", and then obtains the storage path of the task state information.
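A minimal sketch of this polling step follows. The `fetch_status` callable is a hypothetical stand-in for the Kubernetes query and is assumed to return a `(state, location)` pair; the upper bound on the number of polls is also an assumption added so the loop cannot run forever.

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_savepoint(
    fetch_status: Callable[[], Tuple[str, Optional[str]]],
    interval_seconds: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the savepoint reports COMPLETED and return its storage path."""
    for _ in range(max_polls):
        state, location = fetch_status()
        if state == "COMPLETED" and location:
            return location
        sleep(interval_seconds)  # wait before asking again
    raise TimeoutError("savepoint did not complete within the polling budget")
```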

[0077] In one embodiment of the present invention, the management terminal, based on the acquired storage path (including task status information) and the configuration content indicated by the change request (including adding a data table to be monitored, removing a data table already being monitored, and modifying data monitoring configuration parameters), instructs the cluster interaction module to create a new FlinkDeployment resource (such as parent-job-new), i.e., the second flink-cdc task. During creation, the save path is written to the spec.job.initialSavepointPath field. The Flink-Operator will create a new Pod group to start the task.

[0078] In one embodiment of the invention, after the parent-job-new Pod starts, its Source operator executes initialization code: it immediately reports the custom health metric isStart=0 and starts a delayed thread that updates the metric to isStart=1 after 10 seconds. The metric monitoring module on the management side polls this metric by querying the Prometheus API. When the metric monitoring module finds isStart=1, the task scheduling module confirms that the new task has entered a stable running state. Subsequently, the cluster interaction module calls the Kubernetes API to delete the FlinkDeployment/parent-job-old resource, and Flink-Operator gracefully stops all Pods belonging to that resource, thereby stopping the first Flink-CDC task. At this point, the rolling deployment is complete.
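The delayed health metric can be illustrated with a small gauge that starts at 0 and flips to 1 after a timer fires. This is a plain Python sketch of the idea, not the Source operator code itself; the class name `StartupHealthGauge` is hypothetical, and exporting the value to a metrics system is left out.

```python
import threading
from typing import Optional


class StartupHealthGauge:
    """Holds the isStart value: 0 at startup, 1 after a preset delay."""

    def __init__(self, delay_seconds: float = 10.0) -> None:
        self._value = 0
        self._lock = threading.Lock()
        # The timer runs _mark_started once, delay_seconds after start().
        self._timer = threading.Timer(delay_seconds, self._mark_started)
        self._timer.daemon = True

    def start(self) -> None:
        """Begin the countdown; called from the task's initialization code."""
        self._timer.start()

    def _mark_started(self) -> None:
        with self._lock:
            self._value = 1

    def value(self) -> int:
        """Current metric value, as a metrics reporter would read it."""
        with self._lock:
            return self._value

    def join(self, timeout: Optional[float] = None) -> None:
        """Wait for the timer thread to finish (useful in tests)."""
        self._timer.join(timeout)
```

If the task crashes during the delay window, the gauge never reaches 1, so the control end never stops the first task on the strength of a task that failed shortly after startup.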

[0079] As shown in Figure 2, the rolling deployment method based on Flink-CDC provided by this invention also includes a shared binlog parsing process:

[0080] Create a parent task and configure it to uniformly monitor change events of multiple data tables in the source database and parse binary logs.

[0081] The parent task delivers the monitored change events to the message broker. The delivery process follows predetermined routing rules: change events with the same routing key are delivered to the same partition within the message broker. The routing key is determined based on the primary key field value of the changed data in the change event; if the changed data has no primary key, the routing key is determined based on the name of its corresponding table. The delivered change event includes the data before the change (before), the data after the change (after), and table structure information (schema), enabling subtasks to perform complete data processing as needed.
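The routing rule can be sketched as two small functions. The event layout (a `table` field next to `before` and `after`) and the use of CRC32 for partition selection are assumptions made for this example; a real message middleware client typically applies its own hash to the message key.

```python
import zlib
from typing import Any, Dict, Sequence


def routing_key(event: Dict[str, Any], primary_key_fields: Sequence[str]) -> str:
    """Derive the routing key from the primary key, else from the table name."""
    if primary_key_fields:
        # Deletes carry only "before"; inserts carry only "after".
        row = event.get("after") or event.get("before") or {}
        return "|".join(str(row[field]) for field in primary_key_fields)
    return event["table"]


def partition_for(key: str, num_partitions: int) -> int:
    """Map a routing key to a partition deterministically."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

Because the key depends only on the primary key value, an insert, an update, and a delete of the same row all map to the same partition and therefore keep their relative order.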

[0082] Create at least one subtask and configure it to consume data from the message middleware, filter the data as needed, and write it to the target storage system.

[0083] This shared binlog parsing method is the core architectural innovation of this invention in addressing the pressure on the source database in multi-target synchronization scenarios. Traditionally, deploying a separate Flink-CDC task for each downstream target (such as Elasticsearch or Doris) leads to repeated fetching and parsing of the source database's binary logs, resulting in N times the consumption of connections, network I/O, and computing resources. This invention introduces a "one-read, multiple-write" pipeline architecture, decoupling the single data capture from the multiple consumers through a message middleware.

[0084] This shared binlog parsing method explicitly defines the processing order as "partition first, then filter." This is a key design feature to ensure sequentiality, preventing filtering operations from disrupting the original partitioning logic. This order guarantee mechanism is specifically designed to address data out-of-order issues that may arise from the decoupling caused by the introduction of message middleware. Its core lies in passing the processing state (routing key) in the message, enabling downstream systems to accurately reproduce the upstream data partitioning logic. This ensures that update operations on the same entity, even after multi-stage processing in a distributed system, are executed sequentially, with the final state strictly consistent with the source database.

[0085] Taking a typical application scenario as an example, when it is necessary to synchronize data from the same database db1 to three different target systems (Elasticsearch (ES) for search, Doris for analysis, and another Kafka cluster for stream processing), this invention will:

[0086] Create a parent task to monitor all relevant tables in db1 and post changed data to topic_db1.

[0087] Create three subtasks:

[0088] Subtask 1: Consume topic_db1, filter data from the orders table, and write it to Elasticsearch.

[0089] Subtask 2: Consume topic_db1, filter data from the users table, and write it to Doris.

[0090] Subtask 3: Consume topic_db1 and write all data to another Kafka cluster for downstream consumption.

[0091] Before writing data to Kafka, perform unified data formatting and lightweight cleaning to ensure that the data received by downstream systems is consistent in structure and timing.

[0092] In one embodiment of the present invention, the management terminal, according to the configuration, instructs the cluster interaction module to create a FlinkDeployment resource parent-job-db1, i.e., the parent task, whose image contains the configured parent task code. This task uniformly monitors the users and orders tables of the db1 database.

[0093] In one embodiment of the present invention, after the parent-job-db1 Pod runs, it executes a predetermined routing rule for each changed data: using the data primary key (or table name) as the routing key, it performs a Flink keyBy operation, and ensures that data with the same routing key is written to the same partition of Kafka, while storing the routing key in the Kafka message header.

[0094] In one embodiment of the present invention, the management end creates three sub-task FlinkDeployment resources: child-job-db1-to-es (write to ES), child-job-db1-to-doris (write to Doris), and child-job-db1-to-kafka2 (write to another Kafka cluster).

[0095] In one embodiment of the present invention, after the subtask Pod starts, it consumes data from Kafka. First, it extracts the routing key from the message header and performs a keyBy operation, then filters out the data from the required tables, and finally writes it to its respective target.
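The "key first, then filter" order on the subtask side can be imitated in plain Python. This is an emulation of the idea rather than Flink code: the header name `routing-key` and the payload's `table` field are hypothetical, and grouping into per-key lists stands in for the keyBy operation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Each message is (headers, payload); header values are raw bytes.
Message = Tuple[Dict[str, bytes], Dict[str, object]]


def key_then_filter(
    messages: Iterable[Message],
    wanted_tables: Set[str],
    header_name: str = "routing-key",
) -> Dict[str, List[Dict[str, object]]]:
    """Group messages by the routing key from the header, then filter by table."""
    keyed: Dict[str, List[Dict[str, object]]] = defaultdict(list)
    for headers, payload in messages:
        key = headers[header_name].decode("utf-8")
        keyed[key].append(payload)  # arrival order is preserved per key
    result: Dict[str, List[Dict[str, object]]] = {}
    for key, payloads in keyed.items():
        kept = [p for p in payloads if p["table"] in wanted_tables]
        if kept:
            result[key] = kept
    return result
```

Filtering after grouping means the grouping uses exactly the key chosen by the parent task, so the subtask reproduces the upstream partitioning regardless of which tables it keeps.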

[0096] The shared binlog parsing of this invention also includes intelligent scheduling. The following example of synchronizing the products table (as an example of a newly added data table) in the db1 database to ES illustrates the intelligent scheduling process.

[0097] First, the control end queries the metadata database in the configuration management module to determine whether the parent task for listening to db1 and the child task for writing to ES exist.

[0098] Scenario A (both parent and child tasks exist): Update the existing child task that writes to ES through the lossless rolling release process described above: first trigger its Savepoint, then create a new instance that reuses the original consumer group identifier and offset and that adds listening for db1.products.

[0099] Scenario B (parent task exists, child task does not exist): Create a new child task (generate a unique consumer group identifier based on timestamp and random number, starting from the latest offset), and update the parent task through the rolling publish process described above.

[0100] Scenario C (Parent task does not exist): Create a new parent task and child task.

[0101] Finally, in any release that includes subtasks, the subtasks must start successfully and be confirmed stable by their health metrics before the parent task is released or updated.

[0102] In scenarios B and C, before starting a new subtask, the system checks if the corresponding message middleware topic exists. If it does not exist, the topic is created automatically.
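The scheduling decision described in the preceding paragraphs can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the invention: the step names are hypothetical labels, and the two boolean inputs stand in for the result of the metadata query.

```python
from enum import Enum


class Scenario(Enum):
    A = "parent and child exist"
    B = "parent exists, child missing"
    C = "parent missing"


def classify(parent_exists, child_exists):
    """Map the metadata query result to one of the three scenarios."""
    if not parent_exists:
        return Scenario.C
    return Scenario.A if child_exists else Scenario.B


def plan(scenario):
    """Return the ordered release steps (hypothetical step labels)."""
    if scenario is Scenario.A:
        return ["rolling_update_child"]
    # Scenarios B and C: check the topic, then start the child first.
    steps = ["ensure_topic", "create_child", "wait_child_stable"]
    if scenario is Scenario.B:
        steps.append("rolling_update_parent")
    else:
        steps.append("create_parent")
    return steps


# Example: parent for db1 exists, child writing to ES does not.
steps_for_b = plan(classify(parent_exists=True, child_exists=False))
```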

[0103] Each subtask is assigned a unique consumer group identifier upon creation. This identifier is persistently stored in the metadata database built into the configuration management module. When a subtask is updated, exactly the same consumer group identifier is reused, so that the Kafka server sees the same consumer group and the updated subtask inherits the original consumption progress and permissions.
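For illustration, identifier generation and reuse might look like the sketch below. The identifier format and the in-memory `MetadataStore` class are assumptions of this example; the in-memory dict merely stands in for the persistent metadata storage.

```python
import secrets
import time


def new_group_id(subtask_name):
    """Build an identifier from a millisecond timestamp and random bytes."""
    return f"{subtask_name}-{int(time.time() * 1000)}-{secrets.token_hex(4)}"


class MetadataStore:
    """In-memory stand-in for the persistent metadata storage."""

    def __init__(self):
        self._group_ids = {}

    def group_id_for(self, subtask_name):
        # Generate once at creation, then return the stored value on
        # every later update so the consumer group never changes.
        if subtask_name not in self._group_ids:
            self._group_ids[subtask_name] = new_group_id(subtask_name)
        return self._group_ids[subtask_name]


store = MetadataStore()
created = store.group_id_for("child-job-db1-to-es")
updated = store.group_id_for("child-job-db1-to-es")
```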

[0104] Before triggering a subtask update, the management end performs a state save operation for the subtask, similar to the savepoint taken for the parent task. This state save records the subtask's exact consumption offset within the Kafka topic. The newly created subtask begins consuming from this saved offset, so that processed data is not consumed again and unconsumed data is not lost.
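The effect of resuming from the saved offset can be sketched as follows. In this example the topic is modeled as a dict of lists, and a saved offset is taken to mean the offset of the next record to read in each partition; both are assumptions of the illustration.

```python
def snapshot_offsets(positions):
    """Copy the next-offset-to-read of every partition at save time."""
    return dict(positions)


def consume_from(log, start_offsets):
    """Yield (partition, offset, value) starting at the saved offsets."""
    for partition in sorted(log):
        start = start_offsets.get(partition, 0)
        for offset in range(start, len(log[partition])):
            yield partition, offset, log[partition][offset]


# The old subtask has processed offsets 0..2 of partition 0 and 0 of partition 1.
log = {0: ["a", "b", "c", "d"], 1: ["x", "y"]}
old_positions = {0: 3, 1: 1}
saved = snapshot_offsets(old_positions)

# The new subtask resumes exactly where the old one stopped.
resumed = list(consume_from(log, saved))
```

Every record is read by exactly one of the two instances: the old one handled the offsets below the saved positions, and the new one handles the rest.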

[0105] When the deployment strategy is executed, a strict "child-before-parent" startup order is followed. In the shared binlog architecture, the parent task is the data producer and the child tasks are the data consumers. The management end must therefore first confirm that the child tasks (consumers) have started, reached a stable state, and are able to consume data from the message middleware, and only then start the parent task (producer) so that it begins producing data. This ordering prevents data from accumulating or being lost in the middleware because consumers are not ready, and it is a key design feature for ensuring the integrity and reliability of the entire data stream.
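The startup order can be sketched as follows, as a non-limiting illustration. The callables `start` and `read_health`, the stable value `1`, and the timeout and polling interval are all assumptions of this example; `read_health` stands in for a query of the custom health metric.

```python
import time


def wait_until_stable(read_health, stable_value=1, timeout_s=5.0, poll_s=0.01):
    """Poll the health metric until it reports the stable value."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read_health() == stable_value:
            return True
        time.sleep(poll_s)
    return False


def release(children, parent, start, read_health):
    """Start every child and confirm it is stable before starting the parent."""
    for child in children:
        start(child)
        if not wait_until_stable(lambda: read_health(child)):
            raise RuntimeError(f"{child} did not become stable")
    start(parent)  # the producer starts only after all consumers are ready


# Fake cluster for demonstration: a task reports stable on its second poll.
started, polls = [], {}


def fake_start(name):
    started.append(name)
    polls[name] = 0


def fake_health(name):
    polls[name] += 1
    return 1 if polls[name] >= 2 else 0


release(["child-job-db1-to-es", "child-job-db1-to-doris"],
        "parent-job-db1", fake_start, fake_health)
```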

[0106] The present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein, when the computer program is executed by a processor, the electronic device containing the processor acts as the management end and implements the methods described above.

[0107] Finally, it should be noted that the above embodiments are merely preferred embodiments used to illustrate the technical solutions of the present invention; they are not intended to limit the invention or its patent scope. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in those embodiments can still be modified, and that some or all of their technical features can be replaced by equivalents. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention. That is, any change or refinement that is not substantive with respect to the main design concept and spirit of the present invention, and that addresses the same technical problem as the present invention, should be included within the protection scope of the present invention. In addition, the direct or indirect application of the technical solutions of the present invention to other related technical fields is likewise included within the patent protection scope of the present invention.

Claims

1. A rolling deployment method based on Flink-CDC, characterized in that it comprises a rolling release process:
in response to a change request for a running first Flink-CDC task, triggering the first Flink-CDC task to perform a state saving operation by calling the resource orchestrator application programming interface of the computing cluster, so as to obtain and persistently save task state information;
creating and starting a second Flink-CDC task based on the task state information and the configuration content indicated by the change request;
monitoring, by querying the metrics system, a custom health metric reported by the second Flink-CDC task, wherein the custom health metric is initially set to a first state value when the second Flink-CDC task starts and is automatically updated to a second state value after a preset delay following startup;
when the custom health metric is detected to be the second state value, confirming that the second Flink-CDC task has entered a stable running state, stopping the first Flink-CDC task, and using the second Flink-CDC task as the updated task of the first Flink-CDC task;
and in that it further comprises a shared binlog parsing process:
creating a parent task and configuring it to uniformly monitor change events of multiple data tables in the source database and parse the binary log;
delivering, by the parent task, the monitored change events to the message middleware according to a predetermined routing rule, wherein change events with the same routing key are delivered to the same partition of the message middleware, and the routing key is determined based on the primary key field value or the table name of the changed data in the change event;
creating at least one subtask and configuring it to consume data from the message middleware, filter the data as needed, and write it to the target storage system.

2. The rolling deployment method based on Flink-CDC according to claim 1, characterized in that the configuration content indicated by the change request is at least one of the following: adding a data table to be monitored, removing a data table that is being monitored, or modifying the data monitoring configuration parameters.

3. The rolling deployment method based on Flink-CDC according to claim 1, characterized in that the shared binlog parsing further comprises an order guarantee:
when the parent task delivers a change event to the message middleware, it carries the determined routing key in the message header corresponding to the change event;
when processing data consumed from the message middleware, the subtask performs key partitioning based on the routing key extracted from the message header, filters the partitioned data as required, and writes it to the target storage system.

4. The rolling deployment method based on Flink-CDC according to claim 1, characterized in that the shared binlog parsing further comprises intelligent scheduling:
when a request to synchronize a specified data table to a specified target system is received, querying the metadata database to determine whether the parent task for the database to which the specified data table belongs and the subtask that writes to the specified target system exist;
based on the query result, automatically selecting and executing one of the following release strategies:
a. if both the parent task and the subtask exist, executing the rolling release process, wherein the updated task is the updated subtask;
b. if the parent task exists but the subtask does not exist, creating a new subtask, generating a unique consumer group identifier for the new subtask, and configuring the new subtask to start consuming from the latest available offset of the message middleware, while also executing the rolling release process, wherein the updated task is the updated child task;
c. if the parent task does not exist, creating a new parent task and a new subtask.

5. The rolling deployment method based on Flink-CDC according to claim 4, characterized in that, when the release strategy is executed, the release order is controlled as follows: the creation and startup of the subtasks are completed first, and the creation and startup of the parent task are carried out only after the custom health indicators of the subtasks have been monitored and the subtasks are confirmed to have entered a stable running state.

6. A rolling deployment apparatus based on Flink-CDC, characterized in that it comprises a management end and a synchronization end that work together:
the management end executes the method according to any one of claims 1 to 5;
the synchronization end is deployed on the computing cluster and runs data synchronization tasks created and scheduled by the management end, the data synchronization tasks including:
at least one parent task for unified listening to and delivery of change events;
at least one subtask for consuming data from the message middleware and writing it to the target system.

7. The apparatus according to claim 6, characterized in that the management end comprises:
a configuration management module, used to store the configuration content indicated by the change request;
a cluster interaction module, used to: in response to the change request, trigger the first Flink-CDC task to perform the state saving operation by calling the resource orchestrator application programming interface of the computing cluster, so as to obtain and persistently save the task state information; and, based on the storage path of the acquired task state information and the configuration content indicated by the change request obtained from the configuration management module, control the creation and startup of the second Flink-CDC task;
a metrics monitoring module, used to monitor the custom health metric reported by the second Flink-CDC task by querying the metrics system;
wherein the cluster interaction module is further used to: when the metrics monitoring module detects that the custom health metric is the second state value, confirm that the second Flink-CDC task has entered a stable running state, stop the first Flink-CDC task, and use the second Flink-CDC task as the updated task of the first Flink-CDC task.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it causes an electronic device containing the processor to act as the management end and implement the method according to any one of claims 1 to 5.
