Late binding and data-aware compilation release approach for declarative database change

What is AI technical title?
AI technical title is built by Patsnap AI team. It summarizes the technical point description of the patent document.
By using a late binding and data-aware compilation and release system, the problems of database change conflicts and high-risk operations in parallel development are solved, the automation and security of database changes are realized, and the efficiency and controllability of the release process are improved.

CN122309477APending Publication Date: 2026-06-30JIUZHANG ARITHMETIC (ZHEJIANG) TECH CO LTD

View PDF 0 Cites 0 Cited by

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: JIUZHANG ARITHMETIC (ZHEJIANG) TECH CO LTD
Filing Date: 2026-02-04
Publication Date: 2026-06-30

Application Information

Patent Timeline

04 Feb 2026

Application

30 Jun 2026

Publication

CN122309477A

IPC: G06F16/21; G06F16/2458; G06F8/41

AI Tagging

Technology Topics

Execution plan Theoretical computer science

Explore More Agents

Novelty Search
Search existing technologies and assess novelty
↗
FTO
Analyze whether a product may infringe others' patents
↗
Design FTO
Check prior-design risk for exterior design
↗
Drafting
Draft patent application text based on a technical solution
↗
Find Solutions with TRIZ
Generate feasible solution to solve your technical challenge
↗

Similar Technology Patents

Get free access to AI patent search and analysis

Check patentability, review prior art and ask IP Agent with full patent context.

AI Technical Summary

Technical Problem

Existing technologies suffer from merging conflicts and release blocking issues in database change scripts in parallel development environments. Furthermore, the expert experience of database administrators is difficult to productize and automate, making it difficult to execute high-risk operations in production environments safely and efficiently.

Method used

The system employs a late binding and data-aware compilation and release system. The late binding version management module parses and aggregates atomic change intentions, uses the compiler to generate an optimized physical execution plan, combines a context collector to obtain static and dynamic contexts, and a rule engine to perform conflict detection and optimization, generating a physical execution plan containing four-tuple execution units.

Benefits of technology

It eliminates script order dependencies, reduces manual coordination costs, automates security measures, improves the flexibility and observability of the release process, reduces production risks, and minimizes the impact on database performance.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure CN122309477A_ABST

Patent Text Reader

Abstract

This invention provides a late-binding and data-aware compilation and release method for declarative database changes. Through the "late-binding" and "target-state aggregation" models, it eliminates script order dependencies at the architectural level. Developers can independently submit change intentions without worrying about the execution order, significantly reducing manual coordination costs. The "data-aware compiler" distills DBA expert experience into configurable policy rules, automatically selecting safe execution strategies and configuring safety parameters for high-risk operations, transforming manual review into automated decision-making, and significantly reducing the probability of production accidents caused by human error. Through instruction fusion and resource parameterization, the system can generate optimized execution plans, reducing the total duration of changes and the impact on database performance. The late-binding model makes the organization of releases extremely flexible. The generated physical plan includes built-in checks and rollback steps, making the entire change process clear, controllable, observable, and auditable.

Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention relates to the field of database technology, and in particular to a late binding and data-aware compilation and deployment system for declarative database changes, as well as a late binding and data-aware compilation and deployment device, electronic device, and computer-readable storage medium for declarative database changes. Background Technology

[0002] In modern software development practice, the evolution of database schemas is closely related to the iteration of application code. To manage database changes, the industry has developed a series of tools and methodologies, but existing technical solutions generally suffer from the following technical pain points when dealing with large-scale, high-concurrency parallel development environments:

[0003] Existing technical solution 1: Management model based on migration scripts and its limitations

[0004] Traditional tools like Flyway and Liquibase operate on the core principle of maintaining a linear, version-incrementing sequence of change scripts. Developers encapsulate database changes (such as ALTER TABLE statements) in versioned SQL scripts, which are executed sequentially during deployment. This model works well in monolithic applications or small teams, but its drawbacks become increasingly apparent in microservice architectures and multi-branch parallel development. Developers on different functional branches may simultaneously modify the same database object, leading to severe merge conflicts (Merge Hells) when merging scripts into the main branch. Resolving these conflicts requires significant manual coordination, and the strictly locked script execution order lacks flexibility, severely hindering the development and release process. While these tools introduce mechanisms like contexts or cherry-pick merging to attempt to filter scripts, this doesn't fundamentally solve the problem; the essence remains "selecting which scripts to execute," rather than intelligently aggregating and rearranging change intentions.

[0005] Existing technical solution two: Limitations of declarative / target state comparison tools

[0006] Next-generation tools, exemplified by Atlas, have introduced a declarative approach, allowing developers to describe the "target state" of the database. The tool then automatically compares the current state with the target state (diff) and generates a migration plan. This addresses the issue of script order dependencies to some extent. However, existing declarative tools primarily focus on static schema difference analysis, lacking sufficient awareness of the dynamic context of the production environment when generating execution plans. For example, they typically do not automatically select the optimal execution operator (such as native DDL, pt-online-schema-change, or gh-ost) or dynamically adjust execution parameters (such as batch size and rate limiting thresholds) based on dynamic metrics such as the actual table size, current database query load (QPS), and master-slave replication latency. This results in compromises in the security and efficiency of the generated plans in complex production environments.

[0007] In summary, existing technical solutions generally suffer from the following core problems:

[0008] Significant risks associated with production deployments: Database changes are high-risk operations. A seemingly simple ALTER TABLE statement, when executed on a large table in a production environment, can lead to prolonged table locking and service outages. To mitigate these risks, enterprises typically rely on experienced database administrators (DBAs) for manual review and manual execution of changes using online change tools (such as pt-online-schema-change or gh-ost). This makes DBAs a bottleneck for deployments, placing an extremely heavy review burden on them.

[0009] Loss and inconsistency of expert knowledge: Knowledge of security changes relies heavily on the unstructured expert experience of a few senior DBAs (often referred to as 'tribal knowledge'), which is difficult to pass on and scale, resulting in inconsistent security standards among different teams and a high risk of knowledge loss.

[0010] The efficiency gap between development and operations: Developers pursue rapid iteration, while the security requirements of database changes make the operations process lengthy and cautious. In DevOps practices, database changes often become the slowest and most manually-intervention-intensive part of the automated pipeline, creating a significant efficiency gap.

[0011] Therefore, there is an urgent need in this field for a new technical solution that can fundamentally solve the conflict problem in parallel development, productize and automate the expert experience of DBAs, and be able to sense the dynamic changes in the production environment, thereby generating a safe and optimized physical execution plan and achieving true automation of database changes. Summary of the Invention

[0012] In order to solve the following technical problems existing in the prior art:

[0013] How to eliminate or reduce merge conflicts and release blocking caused by linear dependencies of database change scripts in parallel development mode;

[0014] How to formalize, productize, and automate the expert decision-making knowledge of DBAs in high-risk production environments (such as selecting online change tools and setting safety parameters) to reduce release risks and free up manpower;

[0015] How can database change tools be made aware of the static context (such as table size and database version) and dynamic context (such as real-time load and replication latency) of the production environment, and adaptively optimize change execution strategies accordingly?

[0016] How can we flexibly organize independent, atomic change intentions into different versions of release content, and perform intelligent compilation at the release point, instead of simply executing preset scripts sequentially?

[0017] This invention provides the following technical solution:

[0018] On the one hand, a late-binding and data-aware build-and-deploy system for declarative database changes is provided, including:

[0019] The late-binding version management module is configured to parse the version list and aggregate atomic change intents from one or more independent stores.

[0020] The compiler front end is configured to generate the target pattern based on the database baseline pattern, generate an intermediate representation (IR) containing difference information, and perform semantic conflict detection in the process;

[0021] The context collector is configured to collect both static and dynamic contexts.

[0022] The rules engine is configured to evaluate matching rules in groups based on explicit priorities, adopting only the highest priority non-exclusive actions in each group, and stopping and reporting conflicts when encountering mutually exclusive actions of the same priority.

[0023] The optimizer is configured to perform at least one of the following on the IR: operator selection, instruction fusion, and resource parameterization.

[0024] The compiler backend is configured to perform topological sorting of execution steps based on dependencies and output a physical execution plan containing quadruple execution units;

[0025] The executor is configured to perform a consistency check on the baseline hash before execution, abort and generate a failure report and recommended rollback plan if any step fails.

[0026] Preferably, the physical execution plan includes multiple execution steps ordered by topology, each step consisting of a quadruple execution unit composed of a pre-check instruction, a core execution instruction, a post-check instruction, and a rollback instruction.

[0027] Preferably, the baseline hash is obtained by normalizing and serializing the database schema object and calculating a one-way hash, and the executor is authorized to execute only when the real-time hash matches the hash stored in the plan.

[0028] Preferably, the IR includes at least operation type, object identifier, attribute change and dependency metadata, used to drive optimization and topology sorting.

[0029] Preferably, the rule engine groups matching rules according to action type and only adopts the highest priority rule in each group; when there are mutually exclusive actions with the same priority in the same group, compilation is stopped and a conflict list is output.

[0030] Preferably, the instruction fusion is limited to multiple changes on the same data table that can be safely merged in a single physical operation, identifying and excluding operations that are exclusive or destructive from participating in the fusion, wherein the operations include renaming, partitioning changes, or engine migration.

[0031] Preferably, the resource parameterization calculates execution parameters based on thresholds for replication latency, query load, and table row count, and automatically matches a preset low-priority fallback rule when a dynamic context is missing, employing conservative parameters and low-risk operators. The preset low-priority fallback rule is as follows:

[0032] When the context collector is unable to acquire dynamic data, the system will not fail. Instead, it will automatically apply a set of predefined, highly conservative security default parameters to ensure the safety of the change execution.

[0033] Preferably, after the fusion operation is completed, the actuator performs post-verification on all sub-changes one by one. If any one of them is not satisfied, it is considered a failure and the process is terminated. A suggested rollback plan is then generated in reverse order of the successful steps.

[0034] Preferably, the system includes a human-machine collaborative rule generation module, which receives natural language policies and generates structured rule drafts, which are then manually reviewed and confirmed before being stored in the database and their versions and audit information are recorded.

[0035] On the other hand, an automated compilation method for database changes is provided, including:

[0036] Analyze the version list and aggregate atomic change intentions;

[0037] Construct baseline and target models, and perform intermediate representation containing difference information: IR generation;

[0038] Collect static and dynamic context;

[0039] The rule engine performs at least one optimization based on operator selection, instruction fusion and resource parameterization, wherein the rule evaluation method in the rule engine is as follows: a grouping and priority selection mechanism is adopted, and when a mutually exclusive action of the same priority is encountered, the operation is stopped and a conflict list is output.

[0040] The execution steps are topologically sorted based on dependencies, and a physical execution plan containing four-tuple execution units is generated.

[0041] Before execution, a baseline hash consistency check is performed, and if any step fails, the process is aborted and a failure report and recommended rollback plan are output.

[0042] On the other hand, an electronic device is provided, comprising: a processor; and a memory storing computer-readable instructions, wherein when executed by the processor, the computer-readable instructions implement the late binding and data-aware compile-and-deploy method for declarative database changes as described above.

[0043] On the other hand, a computer-readable storage medium is provided, wherein at least one instruction is stored in the storage medium, the at least one instruction being loaded and executed by a processor to implement the above-described late binding and data-aware compile-and-deploy method for declarative database changes.

[0044] The beneficial effects of the technical solutions provided in the embodiments of the present invention include at least the following:

[0045] By using the "late binding" and "target state aggregation" models, the script order dependency is eliminated from the architecture. Developers can submit change intentions independently without worrying about the execution order, which greatly reduces the cost of manual coordination.

[0046] Automated safety assurance reduces production risks: The "data-aware compiler" transforms the DBA's expert experience into configurable policy rules, automatically selects safe execution strategies and configures safety parameters for high-risk operations, and transforms manual review into automated decision-making, significantly reducing the probability of production accidents caused by human negligence.

[0047] Reduce the mental burden on DBAs: Through instruction fusion and resource parameterization, the system can generate optimized execution plans, reducing the total duration of changes and the impact on database performance.

[0048] High flexibility and observability: The late binding model makes it extremely flexible for organizations to release content (such as emergency hotfixes). The generated physical plan comes with built-in checks and rollback steps, making the entire change process clear, controllable, observable, and auditable. Attached Figure Description

[0049] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0050] Figure 1 System overall architecture: late binding version management module, compiler (front-end / middle / back-end), context collector, executor and data flow;

[0051] Figure 2 Late binding and intent aggregation process: manifest parsing → intent assembly → semantic conflict detection → target state generation;

[0052] Figure 3 Front-end difference and conflict detection process: baseline / target graph construction, namespace / type / lifecycle conflict examples;

[0053] Figure 4 Rule evaluation and optimization process: rule matching → grouping by action → priority selection → failure due to ambiguity → multiple optimization iterations;

[0054] Figure 5 Physical execution plan data structure: quadruple execution unit and step dependency DAG topology sorting.

[0055] Figure 6 Executor process: Baseline hash verification → Sequential execution → Abort on failure → Generation of recommended rollback plan;

[0056] Figure 7 Baseline hash generation and verification: Normalized serialization → Hash → Consistency comparison before execution;

[0057] Figure 8 Instruction fusion example: Merging the same ADD COLUMN / INDEX into a single online table reconstruction;

[0058] Figure 9 End-to-end sequence diagram of a complete change release ( Figure 9 The flowchart for the development-release planning-CI / CL triggering phase; the flowcharts for the compilation phase, execution and feedback phases are as described in the example): It depicts the complete end-to-end interactive process from the developer submitting atomic change intent, to the release manager planning the version list, to the CI / CD pipeline triggering compilation, and finally to the executor applying the physical execution plan on the target database. It clearly shows the collaborative relationship between different roles (developers, release managers) and system components (compilers, executors) on the timeline. Detailed Implementation

[0059] The technical solution of the present invention will now be described with reference to the accompanying drawings.

[0060] To facilitate a clear understanding of this embodiment, a glossary of terms is provided below to aid in comprehending the technical principles of this invention:

[0061] I. Core Model and Concept

[0062]

[0063] II. System Components and Modules

[0064]

[0065] III. Data Structures and Products

[0066]

[0067] IV. Core Processes and Mechanisms

[0068] the term English / Alternative Names Definitions and Explanations Conflict detection ConflictDetection This occurs when the front-end aggregates intents. It is divided into hard conflicts (logically incompatible and must be stopped) and soft conflicts (potential risks, issuing warnings). Operator selection OperatorSelection / InstructionSelection The optimizer selects the most suitable physical execution tool (operator) for each logical change based on policy rules and context, such as native DDL, pt-online-schema-change, gh-ost, etc. Instruction Fusion InstructionFusion The optimizer combines multiple independent change operations for the same database object into a single, more efficient physical operation to reduce total execution time and database load. Resource parameterization Resource Parametrization The optimizer dynamically calculates and sets safe runtime parameters for the selected execution operators based on the real-time dynamic context, such as batch size (chunk-size) and maximum load threshold (max-load). Baseline drift detection Baseline DriftDetection Before executing the plan, the executor recalculates the real-time hash of the current database and compares it with the baseline hash stored in the plan. If they do not match, execution is aborted to prevent operation in an unknown state. Normalized serialization CanonicalSerialization To ensure the stability of the baseline hash, the schema information is uniformly processed before hash calculation, including sorting objects by name, format normalization, ignoring comments and unstructured metadata, etc. Safety safety net mechanism Safety Fallback Mechanism When the context collector is unable to acquire dynamic data, the system will not fail, but will automatically apply a set of predefined, most conservative security default parameters to ensure the safety of the change execution. Human-machine collaborative rollback Human-in-the-Loop Rollback When execution fails, the system will not automatically roll back. Instead, it will generate a clear "recommended rollback plan" which will be reviewed by the DBA or operations engineer to determine whether and how to execute it. Topological sorting TopologicalSort A key compiler backend algorithm. It constructs a directed acyclic graph (DAG) by analyzing the dependencies between execution steps (such as the creation / modification order of tables, columns, indexes, and foreign keys), and generates a linear sequence that ensures all prerequisites are executed before subsequent operations, thereby ensuring the logical correctness of the final physical execution plan.

[0069] The technical solution, principle, and technical effects of the present invention will be described in detail below.

[0070] The core of this invention is the introduction of a true "compilation" concept, while existing technologies essentially remain at the level of "configuration execution" or "static generation." This application will illustrate this crucial difference through a detailed comparison table below.

[0071] This invention discloses a late binding and data-aware compilation and deployment system and method for declarative database changes. The specific steps of the method are as follows:

[0072] Receiving Atomic Change Intents: Developers submit atomic change intents in a structured, execution-independent format (such as YAML). Each intent describes a minimal logical change, such as "add an email field to the user table".

[0073] Late Binding Version Inventory: The release manager creates an application version inventory that "binds" one or more independent atomic change intentions to a specific application version through references. This binding operation occurs during the release planning phase, decoupling the creation and release of changes.

[0074] Target State Aggregation and Generation: Upon release, the system aggregates all relevant atomic change intentions based on the version manifest. The system first loads the baseline schema of the current database, and then applies all aggregated intentions in memory to generate a final, consistent target schema state.

[0075] Conflict detection and semantic analysis: During the generation of the target state, the system automatically performs conflict detection, such as detecting contradictory modifications to the same field or attempts to add an index to a deleted table, which are hard conflicts.

[0076] Difference analysis and intermediate representation generation: The system performs a deep comparison between the baseline schema and the target schema, generating a platform-independent, structured list of differences as the compiler's intermediate representation (IR).

[0077] Data-aware optimization: The compiler performs a series of optimizations on this IR. This process actively collects the static context (such as table row count, storage engine, index definitions) and dynamic context (such as real-time QPS, replication latency) of the target database. The optimization process includes, but is not limited to:

[0078] Instruction Selection: Based on a pre-defined policy rule base and the collection context, the optimal physical execution operator is selected for each difference operation. For example, for large tables with more than one million rows, an online change tool (such as pt-online-schema-change) is automatically selected, while for small tables, the native ALTER TABLE command may be selected.

[0079] Instruction fusion combines multiple independent change operations targeting the same table (or other object) into a single, more efficient physical operation. For example, it can merge the intents "add column" and "add index" into multiple ALTER clauses within a single pt-online-schema-change command.

[0080] Resource Parameterization: Based on the collected dynamic context, dynamically calculate and set safe execution parameters for the selected execution operator, such as batch size (chunk-size) and maximum load threshold (max-load).

[0081] Physical Execution Plan Generation: After optimization, the compiler backend "lowers" the optimized IR into a concrete, executable physical plan. This plan is a linear sequence of steps, each containing explicit pre-checks, execution commands, post-checks, and rollback commands, and is topologically ordered based on the dependencies between objects.

[0082] Output and Execution: Finally, the system outputs the physical execution plan (usually in JSON format), which is then executed by external executors such as CI / CD pipelines.

[0083] To more clearly and deeply illustrate the working principle and practical application effects of this invention, the following will cover a complete scenario through a series of specific embodiments, from a complete change release process and LLM-based intelligent rule generation to various boundary conditions (such as context collection degradation, cloud environment adaptation, and instruction fusion verification) and security mechanisms (such as baseline drift detection and failure rollback). These embodiments together constitute a comprehensive demonstration of the technical solution of this invention; please refer to the detailed descriptions of the subsequent embodiments for specific details.

[0084] 1. System Overall Architecture and Workflow

[0085] The system proposed in this invention is designed as a key new stage in modern software continuous integration and continuous deployment (CI / CD) pipelines, specifically responsible for the automated compilation and release of database changes. It is typically deployed after code merging and before the application is deployed to the target environment.

[0086] The core modules of the system are composed (see appendix). Figure 1 System overall architecture):

[0087] Late Binding Version Management Module (Source Manager): As the system's input front end, it is responsible for managing all changed "source code". It consists of two main parts:

[0088] Changes Repository: Responsible for storing independent, atomic change intentions submitted by developers in the form of structured files (such as YAML) within a version control system (such as Git).

[0089] Manifests Repository: Also located within the version control system, it stores "manifest" files that define the binding relationships between specific application versions and a set of change intentions.

[0090] Data-Aware Change Compiler: This is the core processing engine of this invention, responsible for transforming a set of discrete change intentions into a concrete, executable physical plan. It consists of three sub-modules:

[0091] The compiler frontend is responsible for parsing the version manifest, aggregation intents, loading the database baseline state, generating the target state, and performing conflict detection and semantic analysis in the process. Its output is a platform-independent intermediate representation (IR).

[0092] The compiler middleware / optimizer is responsible for performing multiple optimizations on the instruction recursion (IR), including context-based operator selection, instruction fusion, and resource parameterization.

[0093] The compiler backend is responsible for "lowering" the optimized IR into a final physical execution plan containing explicit steps.

[0094] Policy Engine: The set of rules upon which the storage optimizer bases its decisions. These rules formalize the expert experience of senior DBAs, such as "For tables with more than 1 million rows, an online DDL tool must be used." This module may optionally include a rule intelligent generation module that uses a large language model (LLM) to assist DBAs in generating structured rules from natural language.

[0095] Context Collector: As an independent probe or agent, it is responsible for real-time interaction with the target database and its monitoring system, providing the compiler with the static context (such as table size, index structure) and dynamic context (such as real-time QPS, master-slave replication latency) required for decision-making. To achieve accurate decision-making across data engines, the collector of this invention has a defined minimum complete information set, and its specific collection content dynamically adapts according to the target database type, for example, but not limited to:

[0096] For MySQL: Query information_schema and performance_schema to obtain table metadata and performance metrics; analyze replication topology (such as GTID mode, binlog_format) and monitor latency; determine the support matrix for INSTANT / INPLACE DDL algorithms based on the version; and detect long transactions and metadata locks in real time.

[0097] For PostgreSQL: Check the availability of features such as CREATE INDEX CONCURRENTLY; monitor activity and replication latency through system views such as pg_stat_activity; and assess the potential risks of configurations such as hot_standby_feedback to change operations.

[0098] End-to-end workflow (see appendix) Figure 9 (End-to-end sequence diagram of a complete change release)

[0099] Development phase: Developers create one or more atomic change intent files (such as add-user-email.yml) describing business requirements in the feature branch and commit them to the version control system.

[0100] Release planning phase: The release manager or team leader creates a version manifest file (e.g., release-v2.5.yml) that references all the atomic change intent files that need to be included in this release.

[0101] CI / CD Triggering Phase: When a CI / CD pipeline (such as Jenkins, GitLab CI) is triggered to deploy version 2.5, it calls the system of this invention via API, passing the version number v2.5 and the target environment (such as production) as parameters.

[0102] Compilation stage:

[0103] Once the system receives an API call, the compiler immediately begins working.

[0104] The front end finds the corresponding list based on the version number, aggregates all change intentions, obtains the baseline schema from the target database (through the context collector), and finally generates the intermediate representation of differences (IR).

[0105] The optimizer loads applicable rules from the policy rule base and obtains detailed dynamic and static context data again through the context collector, performing three rounds of optimization on the IR.

[0106] The backend translates the optimized IR into the final physical execution plan in JSON format.

[0107] Execution and Feedback Phase:

[0108] The compiler returns the generated physical execution plan to the CI / CD pipeline.

[0109] In a CI / CD pipeline, the executor is responsible for executing each step in the plan (such as execute_command). During execution, the executor continuously feeds back key metrics (such as execution progress and replication latency) to the monitoring and logging system for real-time monitoring, pausing or triggering rollback, and post-event auditing.

[0110] Through the above architecture and process, this invention transforms the database change process, which originally required a lot of manual intervention and was full of risks, into a highly automated, predictable, observable and secure compilation and release process.

[0111] 2. Core Module 1: Late Binding Version Management Module

[0112] One of the core innovations of this invention is the proposal of a "late-binding version management module," which aims to solve the problems of parallel development conflicts and inflexible releases caused by traditional linear migration scripts. This model decouples the "creation" and "release" phases of changes, enabling non-linear, combinatorial management of change intentions.

[0113] 2.1 Representation and Storage of Atomic Modification Intent

[0114] Representation: In this invention, a database change is no longer represented as a procedural SQL script, but is defined as a structured, declarative "Atomic Change Intent." This intent is described using a standard format (such as YAML or JSON), ensuring good machine readability and scalability.

[0115] Core Fields: Each atomic change intent object must contain at least the following fields: * id: A globally unique identifier (such as a UUID) to ensure that each intent can be accurately referenced; * author: The creator of the change, used for traceability; * description: A natural language description of the business purpose of the change; * type: The type of change, such as SCHEMA (schema change) or DATA (data migration); * intent: A structured object or array of objects used to precisely describe the core operation of the change. For example, an intent to add a column can be represented as [{ "op": "ADD_COLUMN", "table": "users", "column_def": {...}}]. This declarative description defines "what to do," rather than "how to do it."

[0116] Storage method: Each atomic change intention is stored as a separate file in a dedicated directory (such as changes / ) within the version control system (such as Git). This "change as code" management approach ensures that each minimal change has its own independent lifecycle, review history, and version record.

[0117] 2.2 Structure and Function of the Application Version List

[0118] Structure: The Application Version Manifest is a configuration file defined using a structured format (such as YAML). Its core data structure is a mapping that associates a unique "application version number" with the file paths of one or more "atomic change intentions".

[0119] Purpose: The release manifest is a concrete implementation of the "late binding" concept. It acts as a release blueprint, explicitly stating "what specific changes are included in a given application version." Release managers flexibly organize the content of each release by editing this manifest—that is, adding or removing references to atomic change intent files. For example, a release-v2.5.yml (release version 2.5) manifest might contain references to two intents: changes / add-user-email.yml (adding an email field to the user table) and changes / add-user-index.yml (adding an index to the user table).

[0120] 2.3 Decoupling mechanism for change creation and release

[0121] Decoupling Core: The management model of this invention, through the separation of intent and list mentioned above, successfully decouples the activities of the development phase, namely "change creation," from the activities of the operation and maintenance phase, namely "change release."

[0122] Working Mechanism: Developers can independently create atomic change intentions within their respective development branches, without needing to worry about when or in what order these changes are released. Once the change intention files are merged into the main branch, they enter a "change pool." Only during the release planning phase does the release manager "select" and "bind" the necessary changes for the release from this pool by editing the version list.

[0123] Beneficial effects: This decoupling mechanism brings significant engineering advantages:

[0124] Eliminate merge conflicts: Since developers no longer edit the same linear version number file or script sequence, merge conflicts caused by script version number or sequence dependencies are avoided.

[0125] Increased release flexibility: Different types of releases can be easily organized. For example, creating a Hotfix release with only a single urgent fix intent, or a master release with multiple new feature intents, is simply a matter of editing the manifest file.

[0126] Enhanced parallel development capabilities: Multiple functional teams can develop in parallel and submit their respective intents to the "change pool" without blocking each other. The system will automatically handle the aggregation and dependencies of all intents during the compilation phase at release time.

[0127] 3. Core Module Two: Data-Aware Change Compiler

[0128] The data-aware variant compiler is the core technology engine of this invention. Its responsibility is to transform a set of discrete, high-level business intents provided by the late-binding version control module into a specific, secure, and highly optimized physical execution plan. This compiler productizes and automates expert knowledge by simulating the decision-making process of a seasoned database administrator.

[0129] 3.1 Compiler Front End: From Intent to Intermediate Representation (IR)

[0130] The compiler front-end is responsible for parsing, validating, and standardizing the input, providing a clean and consistent input for subsequent optimization steps. Its workflow (see Appendix) Figure 3 The front-end difference and conflict detection process is as follows:

[0131] 1) Aggregation: After receiving an application version number, the compiler first parses the corresponding version manifest file, obtains the file paths of all atomic change intents referenced in it, and loads the contents of these files.

[0132] 2) Baseline / Target Graph Generation: This step aims to transform the current and desired states of the database into an accurate, structured in-memory model, laying the foundation for subsequent automated analysis.

[0133] Baseline Graph Construction: The system connects to the target database via a context collector and performs a deep scan of its metadata (such as information_schema). It not only reads basic information such as tables, columns, and indexes, but more importantly, it parses the inherent relationships between them, such as primary and foreign key constraints, columns covered by indexes, and view dependencies. This information is constructed into a baseline graph in memory. In this graph, database objects (such as tables, columns, indexes, and constraints) are treated as nodes, while their dependencies or permission relationships (such as foreign key references or columns belonging to a particular table) are treated as edges. This graph is a complete and normalized snapshot of the current database physical schema.

[0134] Target Graph Generation: The system first creates a complete copy of the baseline graph in memory. Then, it iterates through all aggregated atomic change intentions, translating each intention into a series of transformations applied to the graph copy. For example, an `ADD_COLUMN` intention adds a column node to the graph and creates an edge pointing to its associated table node; while a `DROP_TABLE` intention removes the corresponding table node and all its associated column nodes and edges. After all intentions are applied, the resulting graph is the target graph, which precisely describes the expected state of the database after the changes.

[0135] 3) Conflict Detection & Semantic Analysis: During the target graph generation process, the front end performs in-depth conflict detection and semantic analysis to ensure that all combinations of change intentions are logically consistent and executable. This invention categorizes conflicts into two main types:

[0136] Hard conflicts refer to combinations of changes that are logically or physically incompatible, leading to an invalid or inconsistent final database state. Upon detecting a hard conflict, the compilation process immediately halts and reports detailed error information to the user. The core detection logic is as follows: when a change is intended to be applied to the "target graph" in memory, the system checks whether the preconditions for the operation are met and whether the result of the operation will contradict the existing state in the graph.

[0137] Soft conflicts (or warnings) refer to combinations of changes that are logically feasible but may not perform as expected in practice, pose potential risks, or have better implementation methods. These conflicts typically do not halt compilation but instead generate warnings to alert the publisher or suggest manual verification. For example, adding a non-nullable column with no default value to a table containing massive amounts of data might cause write failures during the change process.

[0138] The following table details some of the key hard collision rules detected by this invention:

[0139] Change operation A Change operation B Conflict Types Conflict Description Example DROP_TABLE ADD_COLUMN (applies to the same table) Lifecycle conflict Unable to add a column to a table that has been deleted. meaning Figure 1 Delete the users table. Figure 2 Add an age field to the users table. DROP_TABLE ADD_INDEX (applies to the same table) Lifecycle conflict Unable to add an index to a table that has been deleted. meaning Figure 1 Delete the products table. Figure 2 Add an index to the sku field of the products table. DROP_COLUMN ADD_INDEX (index contains this column) Dependency conflict Unable to create an index for a column that has been deleted. meaning Figure 1 Delete the email field from the users table. Figure 2 Create a unique index on the email field of the users table. ADD_COLUMN ADD_COLUMN (same name) Namespace conflict A table cannot have two columns with the same name. meaning Figure 1 Add a `status` field (varchar) to the `users` table. Figure 2 Add a status field (int) to the users table. MODIFY_COLUMN MODIFY_COLUMN (Same column, conflicting attributes) Definition Conflict There are contradictory definitions for modifying the same field. meaning Figure 1 Change the data type of the users.name field to Text. Figure 2 Change the data type of the users.name field to VARCHAR(100). RENAME_TABLE ADD_COLUMN (applies to the old table name) Lifecycle conflict When a table is renamed, the original table name no longer exists, and it cannot be operated on. meaning Figure 1 Rename the users table to accounts. Figure 2 Add a last_login field to the users table. ADD_TABLE ADD_TABLE (same name) Namespace conflict You cannot create two tables with the same name. meaning Figure 1 Create the `logs` table. Figure 2 Also create the logs table (even if the definition is different).

[0140] This conflict detection mechanism ensures that all aggregated change intentions have formed a logically correct and consistent target state before entering the optimization phase.

[0141] Intermediate Representation (IR) Generation: By deeply comparing the structural differences between the baseline and target maps, the front end generates a platform-independent, structured list of differences, known as the Intermediate Representation (IR). The IR is a precise description of "which logical changes need to be performed," such as [{"op": "ADD_COLUMN", ...}, {"op": "ADD_INDEX", ...}], which will serve as input to the optimizer.

[0142] 3.2 Compiler Mid-Levels (Optimizer): Context Awareness and Multi-Pass Optimization

[0143] The compiler optimizer is the core embodiment of the "data-aware" capability of this invention. It receives the IR generated by the front end and processes and enriches it through a series of optimization passes (see Appendix). Figure 4 (Rule evaluation and optimization process).

[0144] Context Collection: Before optimization begins, the optimizer instructs the context collector to obtain two types of key information needed for decision-making from the target database and the associated monitoring system:

[0145] Static context: Relatively stable or slowly changing data, such as database engine type and version, exact number of rows in a table, physical storage size, index definitions, partition information, etc.

[0146] Dynamic context: Real-time changing data, such as the current database QPS (queries per second), number of active connections, concurrency of write operations, presence of long-running locks, and replication lag.

[0147] Multi-Pass Optimization: The optimizer holds a set of configurable optimization rules (derived from a policy rule base) and processes the IR (Inverse Relationship).

[0148] First pass: Instruction Selection: This is the most critical optimization step. The optimizer employs a deterministic decision-making algorithm to select the most suitable physical execution operator for each logical change in IR. This process can be formally described as a decision function D(F), where F is a set of facts obtained from the context collector containing multi-dimensional information (such as table row count, peak and off-peak periods, replication latency, engine version, current lock wait, instance resource load, cloud product limitations, etc.). The decision-making process follows these stages:

[0149] Unavailable scenario exclusion filtering: First, the system applies a series of "exclusion rules" to exclude all technically infeasible options from the candidate operator list (such as native, instant, inplace, pt-osc, gh-ost, etc.) based on the hard constraints of the environment described in the fact set F. For example, if the target environment is detected as a cloud database (RDS) that does not support installing triggers, then pt-online-schema-change is excluded; if binlog_format is not ROW, then gh-ost is excluded.

[0150] Priority and Threshold Decisions: Among the filtered available operators, the system selects based on a preset, configurable priority and fallback order (e.g., instant > inplace > online schema change tool). The specific switching decision is determined by a threshold rule based on fact F. For example, "If table.rows < 1,000,000, select the inplace algorithm; otherwise, fall back to pt-online-schema-change." These thresholds are configurable and can even be learned and adaptively adjusted using historical data for smarter decision-making.

[0151] The second pass: Instruction Fusion: This optimization pass aims to merge multiple independent change operations targeting the same database object into a single, more efficient physical operation, significantly reducing overall execution time and database load. This process is not simply instruction concatenation, but follows a rigorous, rule-based analysis and transformation workflow:

[0152] Group by Target: The optimizer first iterates through the entire IR list and groups all change operations according to the target object they affect (e.g., the db.users table).

[0153] Compatibility & Safety Analysis: Within each group, the system performs compatibility checks based on the "merging rules" defined in the policy rule base. These rules specify which types of operations can be safely merged. For example:

[0154] Combineable operations: For the same table, multiple operations such as ADD_COLUMN, DROP_COLUMN, and ADD_INDEX are usually combined.

[0155] Fusion Barriers: Some operations are destructive or exclusive and cannot be merged with other operations, thus forming a "barrier". For example, operations such as RENAME_TABLE (rename a table) or ALTER_TABLE_ENGINE (modify the storage engine) must be executed independently.

[0156] Operator Consistency Check: A set of compatible change operations can only be merged if they were assigned the same physical execution operator (such as pt-online-schema-change) during the "Operator Selection" phase.

[0157] IR Transformation: For change operation groups that pass all the above checks, the optimizer removes them from the IR and replaces them with a completely new, single FUSED_ALTER node. The details field of this node aggregates the ALTER clauses of all the original operations. For example, an intent to add a column and an intent to add an index will ultimately generate a single pt-online-schema-change command containing two clauses: ADD COLUMN ... and ADD INDEX ..., thus optimizing two separate, expensive online table rebuild processes into one.

[0158] The third pass: Resource Parameterization: After determining the physical execution operators, this optimization pass is responsible for "tailoring" a set of safe and efficient runtime parameters for them. This process is not a simple estimation, but is driven by parameterization rules explicitly defined in the policy rule base:

[0159] Expression-based dynamic computation: The optimizer searches for parameterized rules that match the current operation and context. At the heart of these rules are executable value expressions or decision matrices. These expressions take dynamically acquired context (such as `replication.lag_seconds`, `qps.current`) as input variables to dynamically generate parameter values. For example, a rule could define the `max-load` parameter as: `IF(qps.current > 5000, 'Threads_running=20', 'Threads_running=80')`, thus automatically reducing the invasiveness of change tools when the database is under high load.

[0160] Graceful Degradation & Conservative Defaults: This invention incorporates a crucial safety fallback mechanism. If, during parameterization, the context collector is unable to provide real-time dynamic data for any reason (e.g., monitoring system latency), the system will not fail. Instead, it will automatically apply a set of predefined, highly conservative safety defaults (e.g., extremely small chunk-size and extremely low max-load threshold). This ensures that even with incomplete information, changes are always executed in the safest possible state, avoiding the introduction of new risks due to reliance on external monitoring.

[0161] 3.3 Compiler Backend: From IR to Physical Execution Plan

[0162] The compiler backend is responsible for transforming the fully optimized IR into a final output that can be directly executed by the machine.

[0163] Step Generation: The backend traverses the optimized IR, translating each IR node into one or more specific execution steps. Each step is a structured object containing a "quadruple" (see appendix). Figure 5 The physical execution plan data structure includes: pre_check_sql: a pre-execution check script used to verify whether the preconditions for execution are met (such as whether the table exists or whether the columns have not yet been added).

[0164] execute_command: The core execution command, which may be a piece of SQL or a shell command (such as calling pt-online-schema-change).

[0165] post_check_sql: A post-execution check script used to verify whether the operation was completed successfully.

[0166] rollback_command: A rollback command used to undo the changes made in this step.

[0167] Dependency Analysis & Topological Sort: Database operations have a strict logical order; executing them in an arbitrary order is dangerous. Therefore, one of the core tasks of the compiler backend is to automatically parse and correctly sort all execution steps.

[0168] Dependency Graph Construction (DAG): The backend first treats each generated execution step (such as "Create Table A", "Add Column B to Table A", "Create Index C for Column B") as a node. Then, it analyzes the inherent dependencies between these nodes and constructs a Directed Acyclic Graph (DAG) based on this. For example, "Create Index C for Column B" depends on "Add Column B to Table A", while "Add Column B to Table A" depends on "Create Table A". These dependencies are represented in the graph as directed edges from the dependent item to the dependent item (Create Table A -> Add Column B -> Create Index C).

[0169] Linearization via Topological Sort: After constructing the DAG, the backend performs a topological sorting algorithm on the graph. This algorithm outputs a linear sequence of nodes, ensuring that for every directed edge U -> V in the graph, node U appears before node V in the sequence. This guarantees that all operations are executed only after their dependencies are satisfied. This final ordered sequence is the final order of the steps array (Section 5.4) in the physical execution plan, guaranteeing the logical correctness and executability of the plan.

[0170] Output: Finally, the backend outputs a JSON file containing all sorted, detailed execution steps; this is the Physical Execution Plan. This plan is self-contained, deterministic, and can be consumed by any CI / CD pipeline or automated executor.

[0171] 3.4 Failure Handling and Rollback Mechanism

[0172] When designing the execution and rollback of the physical execution plan, this invention follows the core principle of "safety first, human decision-making priority" to deal with abnormal situations in complex production environments.

[0173] 1. Fail-Safe Halt

[0174] The physical execution plan executor is designed to immediately abort the entire execution process if it encounters a failure at any stage of any step (pre_check_sql, execute_command, post_check_sql). The system will not continue executing subsequent steps, thereby avoiding further unpredictable modifications to the database under abnormal conditions and ensuring fault isolation.

[0175] 2. Human-in-the-Loop Rollback

[0176] This invention recognizes that fully automated rollback may introduce greater risks in complex database changes. Therefore, the system adopts a "human-machine collaborative" rollback strategy:

[0177] [Optional] Generate a suggested rollback plan: The system will not automatically execute a rollback. Instead, based on the "successful steps list," it will extract the rollback_command corresponding to each step in reverse order to generate a clear "suggested rollback plan" that can be reviewed by the operator. The rollback command can be predefined by the developer when submitting the atomic change intent, or the compiler backend can automatically generate the reverse operation instruction based on the semantic logic of the forward operation (e.g., mapping ADD COLUMN to DROPCOLUMN).

[0178] Empowering human decision-making: After receiving a failure report and a suggested rollback plan, database administrators (DBAs) or operations engineers can make decisions based on the business context. They can choose to execute all rollback commands to restore the database to its initial state, or they can try to continue from the failed steps after fixing external issues.

[0179] 3. Considerations for Fused Operations during Rollback

[0180] For a single physical operation comprised of multiple logical changes (such as adding columns and indexes simultaneously using pt-online-schema-change), its atomicity and failure recovery depend primarily on the robustness of the selected execution operator (such as the pt-online-schema-change tool). The post_check_sql step of this invention can verify the eventual consistency of the merged operation. However, if partial success is detected (e.g., columns have been added but index creation failed), the system will by default treat it as a serious failure and immediately abort, forcing manual intervention for state recovery. (That is, if the tool itself has problems with multiple consecutive steps, causing partial success and partial failure, this application, as the caller, can only verify the final result; this application has no ability to correct intermediate errors caused by the tool.)

[0181] 4. Baseline Drift Detection

[0182] To address the "baseline drift" problem caused by out-of-band operations (such as other pipelines or manual modifications by DBAs, which are not captured by this system) that may alter the target database between compilation and execution, this invention introduces a pre-consistency verification mechanism:

[0183] Baseline hash generation and storage: During the compilation phase, after reading and constructing the "baseline map," the compiler frontend performs a deterministic hash calculation (such as SHA-256) on it to generate a unique "fingerprint" representing the baseline state. This hash value will be stored as metadata in the final generated physical execution plan.

[0184] Pre-execution hash verification: During the execution phase, before performing any change steps, the executor's primary task is to connect to the target database and recalculate the real-time hash value of the current database schema using the exact same algorithm as the compiler.

[0185] Decision and Abort: The executor rigorously compares the hash calculated in real-time with the baseline hash stored in the plan. If they match perfectly, it proves that the baseline has not drifted, and the plan can be executed safely. If they do not match, it proves that the database state has changed, and the original plan has become invalid. In this case, the executor will refuse to execute and immediately abort the process, while reporting a "baseline drift" error. This mechanism forces the user to re-initiate a compilation based on the new database state, thereby ensuring that all changes are made in the latest and most accurate context.

[0186] Through the above mechanism, this invention not only provides powerful automated change capabilities, but also establishes a mature and reliable security system by empowering human decision-making in critical anomaly handling stages.

[0187] Modules 1 and 2, though seemingly long and detailed, are essentially about the process of compiler principles: translating the initial intent step by step into specific executable commands. Atlas follows a similar general process. The strength of various compilers, besides basic engineering development bug quality and the number of features, primarily depends on the optimizer's algorithmic capabilities—that is, Module 3, how compilation optimization is performed.

[0188] 4. Core Module Three: Policy Rule Base and Hybrid Intelligent Generation

[0189] To drive accurate and safe decision-making in the compiler's optimizer, this invention introduces a "policy rule base" as its decision-making basis. The core function of this module is to transform the implicit "tribal knowledge" and best practices of experienced database administrators (DBAs) into explicit, structured rules that can be computed and executed by machines.

[0190] 4.1 Structure and Function of the Policy Rule Base

[0191] Function: The policy rule base is the cornerstone of this invention's "data awareness" and "automated security." It is not a simple static checker or lint tool, but rather directly participates in the dynamic generation of the physical execution plan. Every decision in the optimizer, including which execution operator to choose, whether to merge instructions, and how to set runtime parameters, is driven by one or more rules from this library.

[0192] Structure: The rule base consists of a series of structured rule objects. Each rule object typically contains two core parts and a priority field for conflict resolution.

[0193] Conditions: Defines the prerequisites for the rule to take effect. The conditions section can combine multiple "facts" for logical judgment. These "facts" originate from the dynamic and static context obtained by the context collector, such as db.engine == 'mysql', table.rows > 1000000, or replication.lag_seconds > 5. Rules can define complex logic, such as "when all (ALL) conditions are met" or "when any (ANY) condition is met".

[0194] Actions: Define the operations that the compiler should perform when a condition is met. The types of actions correspond one-to-one with compiler optimizations, for example:

[0195] CHOOSE_OPERATOR: Instructs the compiler to select a specific execution operator (such as gh-ost).

[0196] PARAMETRIZE: Instructs the compiler to set specific parameters for an operator, the value of which can even be an expression based on contextual facts (IF(condition, value_if_true, value_if_false), such as max-lag-millis = IF(time.hour > 2 && time.hour < 5, 1500, 500)).

[0197] ENFORCE_FUSION: Forces or disables instruction fusion for certain operations.

[0198] FLAG_FOR_REVIEW: Marks a change as requiring manual approval (achieving "graceful degradation").

[0199] 4.2 LLM-based intelligent rule generation module

[0200] The problem to be solved: Before the LLM era, formalizing expert experience into structured rules was a difficult problem in the field of knowledge engineering. It required experts (such as DBAs) to have the ability to write complex configurations or code, which was a high barrier to entry, inefficient, and difficult to maintain. To solve this core challenge, this invention introduces an innovative "rule intelligent generation module".

[0201] The interaction process of the rule intelligent generation module based on LLM is as follows: The DBA inputs the strategy described in natural language through the user interface. The system combines it with preset prompt words and sends it to the Large Language Model (LLM). The LLM returns a structured rule draft. Finally, the DBA reviews, fine-tunes and confirms it on the human-computer collaborative interface, completing the transformation process from expert knowledge to machine-executable rules.

[0202] Natural Language Input: DBAs do not need to learn specific rule syntax; instead, they can directly describe a security policy or best practice using natural language within a user interface. For example, input: "For production tables exceeding 500GB, gh-ost must be used, and the master-slave latency threshold can be relaxed to 1500ms during off-peak hours (2-5 AM), and must be 500ms at other times."

[0203] LLM Translation: The system sends this natural language text, along with a pre-defined system prompt that describes the target rule's JSON structure (i.e., JSON Schema), to a Large Language Model (LLM) API.

[0204] Structured draft generation: LLM understands entities, intentions, and logical relationships in natural language and "translates" them into a structured rule object (such as JSON format) draft based on the format requirements of system prompts.

[0205] Human-in-the-Loop Collaborative Review: The system presents the rule draft generated by the LLM to the DBA in a user-friendly, visual way (or directly displays the formatted code). The DBA can review its accuracy and may fine-tune it through further natural language dialogue (e.g., "To add, this rule only works for MySQL versions 8.0 and above"). The LLM will incrementally modify the generated draft based on supplementary instructions.

[0206] Activation and inclusion in the database: Once the DBA has confirmed that there are no errors, the verified rule will be officially included in the policy rule base and will immediately take effect on the subsequent compilation process.

[0207] Beneficial Effects: This "hybrid intelligence" working mode simplifies a complex knowledge engineering task into an efficient human-computer dialogue. It significantly lowers the barrier to productizing expert knowledge, making the construction, iteration, and maintenance of the strategy library extremely efficient and intuitive, thereby ensuring that the core "data-aware compilation" capability of this invention possesses strong practical feasibility and scalability.

[0208] It could even be said that without the introduction of LLM, the core optimization rules of this invention would be almost impossible to implement in engineering, and would have to rely entirely on a large number of programmers to write code for implementation.

[0209] 4.3 Rule Priority and Conflict Resolution Mechanism (See Example 5.5)

[0210] To ensure that the behavior of the rule engine is deterministic and predictable when faced with multiple rules that may apply simultaneously, this invention introduces a clear priority and conflict resolution mechanism.

[0211] 1. Explicit Priority Definition

[0212] The data structure of policy rules includes an integer field called `priority`, which is explicitly set by the user (such as a DBA). This field defines the priority of the rule (e.g., the higher the value, the higher the priority), making the priority order of rules clear and controllable, and avoiding the uncertainty caused by implicit behaviors such as relying on file order.

[0213] 2. Deterministic Assessment Process

[0214] When the compiler optimizes for a change, its rule evaluation process is as follows:

[0215] Matching: First, the system filters out all rules that match the current change context (such as table size, database version, etc.).

[0216] Grouping: Then, the matched rules are grouped according to their action type (actions.type). For example, all rules used for the "choose operator" (CHOOSE_OPERATOR) are grouped together.

[0217] Selection: Within each group, the system only considers the rule with the highest priority.

[0218] Conflict resolution:

[0219] If the highest priority rule is not unique within the group (i.e., multiple rules with the same priority exist), and their actions are mutually exclusive (e.g., simultaneously requiring the selection of gh-ost and pt-online-schema-change), the system will consider it a fatal conflict. The compilation process will immediately abort, and the user will be reported to a list of conflicting rules, forcing the user to resolve the ambiguity of the rules.

[0220] If the highest priority rule is unique, or their actions can be safely combined (e.g., by setting different parameters for the same operator), the system will apply its action.

[0221] Beneficial effect: This "failure to resolve ambiguity" design is a security feature. DBAs can easily build a layered, maintainable rule system, for example, using low-priority rules to define general policies and high-priority rules to define exceptions for specific tables or scenarios.

[0222] 5. Definition of key data structures (see Example 1 for details)

[0223] To clearly describe the information flow and processing in this invention, this section will define in detail the core data structures used by the system. These data structures all adopt industry-standard, human- and machine-readable formats (such as YAML or JSON).

[0224] It should be noted that the above data format is only a practice adopted in the current embodiment for ease of explanation and does not constitute a limitation of the present invention; any serialization or interface form that can express the data structure and its semantics in an equivalent manner can be used to replace it and achieve essentially the same technical effect, such as but not limited to: XML, Protocol Buffers, Avro, Thrift, custom DSL, structured fragments based on SQL / DDL, and structured payloads transmitted through API / message bus, etc., all of which should fall within the protection scope of the present invention.

[0225] To improve readability, this application adds line breaks in some long texts, but this does not affect the text's compliance with YAML format requirements.

[0226] 5.1 Atomic Change Intent

[0227] This data structure is used to define a minimal, independent logical change. Below is an example intent for "Adding an email field to the user table" (changes / add-user-email.yml):

[0228] id: "uuid-for-email-column" author: "developer-A" description: "Add an email field to the user table and add a unique index" type: SCHEMA intent:

[0229] - op: ADD_COLUMN

[0230] table: users

[0231] column_def:

[0232] name: email

[0233] type: varchar(255)

[0234] nullable: true

[0235] - op: ADD_INDEX

[0236] table: users

[0237] index_def:

[0238] name: idx_email

[0239] columns: [email]

[0240] unique: true.

[0241] 5.2 Application Version Manifest

[0242] This data structure is used to "bind" multiple atomic change intentions to a single application version at release time. Below is an example version manifest (manifest / fusion-test.yml) that aggregates two independent change intentions:

[0243] version: "fusion-test-v1.0" changes:

[0244] - "changes / add-user-status.yml" # Meaning Figure 1 Add a status field

[0245] - "changes / add-user-name-index.yml" # Meaning Figure 2 Add an index to the name field.

[0246] 5.3 Intermediate Representation (IR)

[0247] IR (Input Reference) is a platform-independent list of logical differences generated by the compiler frontend after comparing the baseline graph with the target graph. For example, for the fusion-test.yml manifest, the generated IR might look like this:

[0248] [{"op": "ADD_COLUMN", "table": "users",

[0249] "details": {"name": "status", "type": "tinyint(4)", ...}},

[0250] {"op": "ADD_INDEX", "table": "users",

[0251] "details": {"name": "idx_users_name", "columns": ["name"],...}}].

[0252] 5.4 Physical Execution Plan

[0253] This is the final output of the system of this invention, a set of instructions containing complete security checks that can be directly executed by an automated pipeline. Its format is JSON, and its structure is as follows:

[0254] / / plan.json{ "version": "fusion-test-v1.0","database": "appdb","baseline_hash": "sha256:a1b2c3d4e5f6...", "steps": [{ "id": "users__pt-osc__fused_0",

[0255] "pre_check_sql": [

[0256] "SELECT COUNT(1) FROM information_schema.tables

[0257] WHERE table_schema='appdb' AND table_name='users'",

[0258] "SELECT IF(COUNT(1) = 0, 1, 0) FROM information_schema.columns

[0259] WHERE table_schema='appdb' AND table_name='users' AND column_name='status'" ],

[0260] "execute_command":

[0261] "pt-online-schema-change --alter 'ADD COLUMN status tinyint(4)

[0262] NOT NULL DEFAULT 0, ADD INDEX idx_users_name (name)'

[0263] D=appdb,t=users --execute --chunk-size=20000

[0264] --max-load Threads_running=80",

[0265] "post_check_sql": [

[0266] "SELECT IF(COUNT(1) = 1, 1, 0) FROM information_schema.columns

[0267] WHERE table_schema='appdb' AND table_name='users' AND column_name='status'" ],

[0268] "rollback_command":

[0269] "pt-online-schema-change --alter 'DROP COLUMN status,

[0270] DROP INDEX idx_users_name' D=appdb,t=users --execute"}]}.

[0271] The steps array in this structure contains one or more execution steps. Each step contains a quadruple of pre_check_sql, execute_command, post_check_sql, and rollback_command, which together form a complete and highly reliable execution unit.

[0272] 5.5 Policy Rule

[0273] The rule objects in the policy rule base define the logic by which the optimizer makes decisions. Below is an example rule that specifies the use of gh-ost for large tables and dynamically sets the latency parameter:

[0274] {"rule_id": "RULE-GH-OST-LARGE-TABLE-POLICY",

[0275] "priority": 100,

[0276] "description": "General large table strategy: For tables exceeding 500GB, use gh-ost",

[0277] "conditions": { "ALL": [{ "fact": "table.size_gb", "operator": "greaterThan", "value": 500} ]},

[0278] "actions": [{ "type": "CHOOSE_OPERATOR", "operator": "gh-ost"},{

[0279] "type": "PARAMETRIZE",

[0280] "params": [{"name": "max-lag-millis","value_expression": "IF(time.hour >= 2 && time.hour < 5, 1500, 500)"}]} ]}.

[0281] 6. Example 1: A complete change release process

[0282] This embodiment will take a typical example of late binding and data-aware compilation and release for a declarative database change involving multiple operations to illustrate the complete workflow of the system of the present invention from receiving intent to generating the final plan, and will focus on demonstrating its instruction fusion and data-aware optimization capabilities.

[0283] Scenario: Suppose a release named fusion-test-v1.0 requires two separate modifications to the users table in the production database appdb: 1) Add a status field; 2) Add an index to the existing name field. The users table is a large table with over 5 million rows.

[0284] Step 1: Defining the atomic change intent and late binding

[0285] The development team created two separate atomic change intent files: changes / add-user-status.yml and changes / add-user-name-index.yml.

[0286] The release manager created the version manifest / fusion-test.yml and "bound" these two changes to the fusion-test-v1.0 version by referencing the two files mentioned above.

[0287] Step Two: Compiler Triggering and Front-End Processing

[0288] The CI / CD pipeline calls the compiler of this system and passes in the version number fusion-test-v1.0.

[0289] The compiler frontend has started working:

[0290] Intent aggregation and graph generation: The front-end parses the manifest and aggregates two intents. The baseline schema of the `users` table is obtained through a context collector, and a baseline graph is built in memory. Subsequently, two change intents are applied to a copy of the baseline graph to generate the target graph.

[0291] Conflict detection: During the generation of the target graph, the system performs semantic analysis to detect any hard conflicts (such as lifecycle conflicts and namespace conflicts). In this scenario, adding columns and adding indexes are compatible, and no conflicts were found.

[0292] IR and Baseline Hash Generation: By comparing the baseline map and the target map, the front end generates an IR containing two operations: ADD_COLUMN and ADD_INDEX. Simultaneously, the front end performs normalized hash calculation on the baseline map to generate a baseline hash fingerprint for subsequent drift detection.

[0293] Step 3: Processing in the compiler's intermediate stages (optimizer)

[0294] Context Collection: The optimizer instructs the context collector to retrieve static and dynamic information about the users table. The collector returns facts such as: table.rows = 5,000,000, db.engine = 'mysql', etc.

[0295] First pass: Operator selection:

[0296] The optimizer loads the policy rule base and performs rule matching based on the current context (table.rows > 100,000, db.engine = 'mysql'). Suppose a high-priority rule states: "For MySQL tables with more than 100,000 rows, all ADD_COLUMN and ADD_INDEX operations should use the pt-online-schema-change operator."

[0297] Based on this rule, the optimizer marks the execution operators of both operations in the IR as pt-osc.

[0298] Second pass: Command fusion:

[0299] The optimizer enters instruction fusion pass. Another rule states: "When multiple pt-osc operators operate on the same table, they should be merged into a single FUSED_ALTER operation."

[0300] The optimizer detected that both IR operations targeted the users table and used the same operators, so it merged them into a single FUSED_ALTER node. The details field of this node will contain two ALTER clauses: "ADDCOLUMN status ..." and "ADD INDEX idx_users_name ...".

[0301] Third pass: Resource parameterization:

[0302] Finally, the optimizer applies parameterized rules. For example, a rule matching the current context might be defined as: "If table.rows > 1,000,000, then set chunk-size to 20000 and max-load to Threads_running=80". Based on this rule, the optimizer appends the calculated parameters to the merged IR nodes.

[0303] Step 4: Compiler Backend Processing

[0304] The compiler backend receives the optimized IR containing a single FUSED_ALTER operation.

[0305] It translates the IR into an execution step containing a complete quadruple and performs a topological sort (in this case, only one step).

[0306] Step 5: Final Product Output

[0307] The system ultimately outputs a plan.json file as shown in Section 5.4. This plan contains a single, fully optimized and parameterized pt-online-schema-change command that performs two independent table structure changes in a single operation. The plan also provides precise pre- and post-check scripts and commands for one-click rollback, and includes the baseline hash generated in the front-end steps for the executor to perform baseline drift detection.

[0308] Through this embodiment, the present invention demonstrates how it can automatically compile dispersed, high-level business intents into a specific, secure, and far more efficient physical execution plan than manual assembly, thereby achieving secure, efficient, and automated database changes.

[0309] 7. Example 2: Intelligent Rule Generation Based on LLM

[0310] This embodiment will describe in detail how a senior DBA adds a new security policy with dynamic conditions to the production environment, in order to illustrate the specific working method of the "rule intelligent generation module" in this invention.

[0311] Scenario: Senior DBA Ms. Li wants to solidify a proven "tribal knowledge" into the system to ensure that all changes to large data tables follow the safest practices.

[0312] Step 1: DBA uses natural language input strategy

[0313] In the "Rule Intelligent Generation Module" interface of this system, Ms. Li entered her strategy requirements in a simple text box, the content of which resembled her explaining work requirements to a new colleague:

[0314] "For the MySQL 8.0 cluster in the production environment of this application, any ALTER operation on tables exceeding 500GB must not be executed directly. gh-ost must be used. Furthermore, the master-slave latency threshold (max-lag-millis) of gh-ost should be dynamically set based on whether it is a peak business period. If it is between 2-5 AM, it can be set to 1500ms; at other times, it must be strictly controlled within 500ms. In addition, such operations are absolutely prohibited after 4 PM on Fridays. This rule is very important; set its priority to 200 to ensure it covers all general rules."

[0315] Step Two: System Interaction with LLM

[0316] The system received the aforementioned natural language text.

[0317] The system combines this text with a preset "System Prompt" in the background. This prompt contains a detailed description of the target JSON rule structure (i.e., JSON Schema) to guide the LLM to produce correctly formatted and structured output.

[0318] The system sends the combined content to the API of a large language model (such as GPT-5 or other models with natural language understanding capabilities).

[0319] Step 3: LLM generates structured rule drafts

[0320] LLM understands the logic, conditions, and parameters in natural language, and generates the following JSON-formatted rule draft based on the schema provided by the system prompts, as well as technologies such as RAG, workflow composition, and related knowledge bases:

[0321] {

[0322] "rule_id": "RULE-GH-OST-LARGE-TABLE-POLICY-1704067200",

[0323] "priority": 200,

[0324] "description": "Security policy for online changes to large tables in a production MySQL 8.0 cluster",

[0325] "conditions": { "ALL": [{ "fact": "db.engine", "operator": "equal","value": "mysql"},

[0326] { "fact": "db.version", "operator": "greaterThanOrEqual", "value": "8.0"},

[0327] { "fact": "table.size_gb", "operator": "greaterThan", "value":500},

[0328] { "fact": "operation.is_schema_alter", "operator": "equal", "value": true} ]

[0329] },"actions": [{"type": "CHOOSE_OPERATOR", "operator": "gh-ost"},

[0330] {

[0331] "type": "PARAMETRIZE", "target_operator": "gh-ost","params": [{

[0332] "name": "max-lag-millis",

[0333] "value_expression": "IF(time.hour >= 2 && time.hour < 5,1500, 500)"}]},{

[0334] "type": "ENFORCE_TIME_WINDOW","allowed": [],"denied": [{ "day_of_week": "Friday", "start_hour": 16, "end_hour": 23} ]} ]}.

[0335] For GPT4 / 5 series models, generating specific JSON and performing syntax and semantic checks to ensure accuracy is a relatively common approach. How to employ techniques such as RAG is beyond the scope of this article. Similar work has been presented in previous patent applications, so it will not be repeated here.

[0336] Step 4: DBA Review and Confirmation (Human-in-the-Loop)

[0337] The system presents the JSON draft returned by the LLM to Ms. Li in a user-friendly, visual way (or directly displays the formatted JSON).

[0338] Ms. Li reviewed every detail of the rule:

[0339] The priority field was correctly set to 200, ensuring its high priority.

[0340] The conditions section accurately reflects all prerequisites such as "MySQL 8.0" and "greater than 500GB".

[0341] The actions list clearly lists three actions:

[0342] First, gh-ost was correctly selected as the execution operator.

[0343] Second, the max-lag-millis parameter is dynamically set through an expression, perfectly embodying the logic of "relaxing in the early morning and tightening at other times".

[0344] Third, the time window of "prohibition after 4 pm on Friday" was accurately set.

[0345] Ms. Li confirmed that the rules were correct and clicked the "Save and Activate" button.

[0346] Step 5: Rule Activation and Enforcement

[0347] This verified JSON rule has been officially stored in the system's "Policy Rule Base". From this moment on, any database changes processed by the compiler that meet the above conditions will automatically apply this rule, thereby selecting the gh-ost tool and dynamically calculating the safest parameters for execution based on the execution time.

[0348] Beneficial effects:

[0349] This "hybrid intelligence" working mode simplifies a process that originally required senior DBAs to manually write complex configuration files or code into a simple natural language conversation. It helps lower the barrier to productizing expert knowledge, making the construction and iteration of the strategy library more efficient, thereby ensuring the feasibility and scalability of the "data-aware compilation" of this invention. This knowledge base accumulation will also be used for subsequent RAG generation.

[0350] 8. Example 3: Context Acquisition Degradation and Security Layoffs

[0351] This embodiment aims to illustrate how the present invention ensures the security of changes through a degradation strategy when faced with anomalies in external dependencies (such as monitoring systems).

[0352] Scenario: The compiler needs to generate a plan for the ADD_COLUMN operation on a users table (2 million rows). According to the policy rules, it needs to determine the `--max-load` and `--max-lag` parameters of `pt-online-schema-change` based on real-time QPS and replication latency. However, at compile time, the company's monitoring system API malfunctions, causing the context collector to be unable to acquire dynamic data.

[0353] Workflow:

[0354] Context collection attempt: The optimizer requests the context collector to obtain the dynamic context of the users table. The collector attempts to call the monitoring API, but the request times out or returns an error.

[0355] Degradation mechanism triggered: When the collector returns the fact set to the optimizer, it explicitly marks the status of the two facts, qps.current and replication.lag_seconds, as UNAVAILABLE, and attaches a timestamp and the reason for failure.

[0356] Rule evaluation and rollback: The optimizer loads the policy rule base.

[0357] High-priority parameterized rules that rely on dynamic data (such as IF(qps.current > 5000, ...)) are automatically skipped because the fact qps.current in their conditions is unavailable.

[0358] The system matched a low-priority "fallback rule" with the condition qps.current ISUNAVAILABLE.

[0359] Applying a conservative strategy: The action of this fallback rule is:

[0360] CHOOSE_OPERATOR: Still select pt-online-schema-change.

[0361] PARAMETRIZE: Applies a predefined set of highly conservative safety parameters, such as max-load = Threads_running = 10 and max-lag-millis = 200. This set of parameters is extremely low-intrusive, and although it may increase the time required for changes, it maximizes safety.

[0362] Plan generation and auditing: The final physical execution plan can include a comment or log indicating that the plan was generated using a conservative degradation strategy in the absence of dynamic context, which facilitates manual auditing.

[0363] Beneficial effects: Through this mechanism, the present invention avoids compilation failure or the generation of risky execution plans due to external monitoring system failures, achieving a "safety fallback" and ensuring high availability and high security of the system in imperfect environments.

[0364] 9. Example 4: Operator Exclusivity Selection in Cloud Database Environment

[0365] This embodiment illustrates how the present invention automatically excludes incompatible execution operators based on the permissions and characteristics of the target environment.

[0366] Scenario: An application is deployed on a managed MySQL service (RDS) in a public cloud. For security reasons, users are not granted SUPER privileges in this environment, thus triggers cannot be created. A release includes an ADD_INDEX change to a large table named orders.

[0367] Workflow:

[0368] Context Collection: When collecting static context data, the compiler checks not only table structure but also global database permissions, version, and features. The context collector identifies whether the current user lacks TRIGGER creation privileges by querying information_schema or executing commands such as SHOW GRANTS.

[0369] Operator Exclusivity Filtering: In the "Operator Selection" optimization pass, the system first applies the "exclusivity rule".

[0370] A built-in exclusion rule was matched: "Exclude the pt-online-schema-change operator if context.user_privileges does not contain TRIGGER".

[0371] Another rule was matched: "Exclude the gh-ost operator if db.binlog_format is not equal to ROW". Assuming binlog_format is ROW in this example, gh-ost is still a candidate.

[0372] Priority decision: After excluding pt-online-schema-change, the optimizer selects from the remaining available operators (such as native, inplace, and gh-ost). Based on another policy rule (e.g., "for large tables in an RDS environment, prioritize gh-ost"), the system ultimately selects gh-ost as the execution operator for this change.

[0373] Plan generation: The final generated physical execution plan will use the gh-ost command to perform online index addition.

[0374] Beneficial effects: This invention can adapt to various constrained environments, including cloud databases, select technically feasible execution solutions, avoid execution failures caused by environment incompatibility, and improve the universality of the solution.

[0375] 10. Example 5: Boundary Conditions and Verification of Instruction Fusion

[0376] This embodiment aims to clarify the applicable boundaries of instruction fusion and how the system handles fusionable and non-fusionable operation combinations.

[0377] Scenario: A single release contains three atomic change intents targeting the products table:

[0378] add-product-sku.yml: Adds the sku field (ADD_COLUMN).

[0379] rename-products-to-items.yml: Renames the products table to items (RENAME_TABLE).

[0380] add-index-on-sku.yml: Adds an index (ADD_INDEX) to the sku field.

[0381] Workflow:

[0382] Front-end processing: The compiler front-end aggregates three intents to generate an initial IR containing three operations: ADD_COLUMN, RENAME_TABLE, and ADD_INDEX.

[0383] Instruction Fusion Analysis: In the "Instruction Fusion" optimization pass (PASS), the optimizer groups the IRs according to the target object.

[0384] The system identified RENAME_TABLE as a "Fusion Barrier". The policy rule base explicitly states that the renaming operation changes the identifier of the object, must be executed independently, and cannot be merged with any other operation on that table.

[0385] The ADD_COLUMN and ADD_INDEX operations are merging according to the rules.

[0386] IR Transformation and Ranking:

[0387] The optimizer merges ADD_COLUMN and ADD_INDEX into a single FUSED_ALTER node.

[0388] RENAME_TABLE is kept as an independent node.

[0389] When performing topological sorting, the compiler backend prioritizes the FUSED_ALTER operation over RENAME_TABLE based on dependencies (tables must exist before renaming).

[0390] Plan Generation: The final generated physical execution plan will contain two steps, instead of one or three:

[0391] Step 1: Execute a merged pt-online-schema-change command to complete the ADDCOLUMN sku and ADD INDEX idx_sku in one go.

[0392] Step 2: Execute a native RENAME TABLE products TO items command.

[0393] Post-validation: The `post_check_sql` statement in step one contains two independent SELECT statements, one to verify the existence of the `sku` column and the other to verify the existence of the `idx_sku` index. If either validation fails, the entire step will be considered a failure.

[0394] Beneficial effects: By precisely controlling the fusion boundary, this invention can maximize the efficiency of merging compatible operations and ensure the correct execution order of destructive or exclusive operations, thus guaranteeing the logical correctness of complex change combinations.

[0395] 11. Example 6: Baseline Drift Detection

[0396] This example demonstrates how a baseline hash check mechanism can prevent changes from being made to an unexpected database state.

[0397] Scene:

[0398] At compile time: The CI / CD pipeline triggered a compilation of release-v3.0. The compiler connected to the production database, read its schema, and calculated its normalized hash value as HASH_A. The generated plan.json file recorded baseline_hash: "HASH_A". This plan includes an operation to add an age field to the users table.

[0399] Intermediate state: During the hours scheduled for approval, a DBA, in an urgent troubleshooting effort, manually connected to the production database and added a temporary index, idx_temp_for_debug, to the users table. This operation was performed outside of this system.

[0400] During execution: Once approved, the CI / CD pipeline starts the executor to apply plan.json.

[0401] Workflow:

[0402] Pre-execution verification: The executor's primary task is to connect to the target database and calculate the hash value of the current schema in real time using the same normalization algorithm as the compiler. Due to the existence of `idx_temp_for_debug`, the calculated real-time hash value is `HASH_B`.

[0403] Hash comparison failed: The executor compares the live hash HASH_B with the baseline hash HASH_A stored in the plan.

[0404] Abort and report: Because HASH_A ≠ HASH_B, the executor determines that a "baseline drift" has occurred. It will refuse to execute any steps in the plan and immediately abort the process. Simultaneously, it will report a clear "baseline drift" error to the CI / CD pipeline and monitoring system, indicating the difference between the expected hash and the actual hash value.

[0405] Manual intervention: After receiving the alert, the person in charge of the release needs to reassess the current situation. They can choose to remove the temporary index and then re-execute the original plan, or (more recommended) re-initiate a compilation based on the changed database state to generate a completely new execution plan that reflects the latest state.

[0406] Beneficial effects: The baseline drift detection mechanism is a key security defense in this invention. It ensures that changes are always performed on a fully expected database state that has been analyzed at compile time, effectively avoiding the risk of data inconsistency or change failure caused by "out-of-band operations".

[0407] 12. Example 7: Execution Failure and Recommendation to Roll Back

[0408] This embodiment illustrates the system's safe abort and auxiliary recovery process when a multi-step physical execution plan fails midway through execution.

[0409] Scenario: A physical execution plan consists of two steps:

[0410] Step 1: Create a new table audit_logs.

[0411] Step 2: Add a last_modified_by field to the users table (using gh-ost).

[0412] Workflow:

[0413] Step 1 successful: The executor successfully executed pre_check, execute_command(CREATE TABLE ...), and post_check in Step 1. The audit_logs table was successfully created. The executor logged that Step 1 was successfully completed.

[0414] Step 2 failed: The executor began executing Step 2. After pre_check passed, it started the gh-ost command. During the gh-ost process, due to a sudden spike in database load, the master-slave replication latency exceeded the gh-ost --max-lag-millis threshold, and gh-ost automatically terminated and exited with a non-zero exit code.

[0415] Safe abort: The executor catches a failure of execute_command. It immediately aborts the entire execution flow without attempting any subsequent steps.

[0416] Report Generation: The executor generates a detailed failure report, including:

[0417] Failure point: The execute_command phase of step two (ID: users__gh-ost__add_last_modified_by).

[0418] Logs: gh-ost outputs detailed error logs ("Replication lag is too high...").

[0419] Successful steps: [Step 1 (ID: create_table_audit_logs)].

[0420] Generate a suggested rollback plan: Based on the list of successful steps, the system extracts the rollback_commands corresponding to these steps in reverse order, generates a "suggested rollback plan", and presents it to the operator.

[0421] Human decision-making: DBAs receive alerts and suggested rollback plans. They can assess the situation:

[0422] Option 1: Execute the recommended rollback command to restore the database to its state before the change began.

[0423] Option 2: Analyze the cause of failure (such as temporary high load), and after the problem is resolved, try to execute step 2 again from the point of failure.

[0424] Beneficial effects: The "failure-to-abstain" and "human-machine collaborative rollback" strategies of this invention not only ensure fault isolation and prevent the problem from escalating, but also greatly empower operation and maintenance personnel by providing accurate rollback suggestions, enabling them to make fault recovery decisions quickly and safely.

[0425] 13. Example 8: Normalized Serialization of Baseline Hash

[0426] This embodiment details the normalized serialization rules used to ensure the stability and determinism of the baseline hash.

[0427] Scenario: To ensure that two independent schema reads can generate completely consistent hash values as long as the logical structure is exactly the same, the schema information must be normalized.

[0428] Normalization rules: Before calculating the hash, the system converts the database schema into a JSON object in memory and applies the following normalization rules to it:

[0429] Stable sorting:

[0430] All lists of objects (such as all tables in a database) must be sorted alphabetically by object names.

[0431] All objects within a table (such as columns, indexes, and foreign keys) must also be sorted alphabetically by their names.

[0432] Standardize the format:

[0433] All identifiers (table names, column names, index names) are uniformly converted to lowercase.

[0434] Extra spaces in data type definitions are removed and standardized to a standard format (e.g., varchar(255) and VARCHAR(255) will be normalized to varchar(255)).

[0435] Unnecessary quotes are uniformly removed from the string of DEFAULT values.

[0436] Ignore unstructured metadata:

[0437] All COMMENT fields (table comments, column comments) are ignored and removed.

[0438] Statistics returned by the database engine that are unrelated to the logical structure (such as the current value of AUTO_INCREMENT, the physical size of the table, and the estimated number of rows) are ignored.

[0439] Storage engine-specific non-functional parameters (such as ROW_FORMAT) are ignored.

[0440] Example: Suppose the database returns two tables defined as follows:

[0441] Original definition A:

[0442] CREATE TABLE `Users` (

[0443] `ID` int(11) NOT NULL AUTO_INCREMENT,

[0444] `email` varchar(255) DEFAULT NULL COMMENT 'email address',

[0445] PRIMARY KEY (`ID`)) ENGINE=InnoDB COMMENT='User Table';

[0446] Original definition B (logically equivalent, but with different format and comments):

[0447] CREATE TABLE `users` (

[0448] `email` VARCHAR( 255 ) DEFAULT null,

[0449] `id` INT(11) NOT NULL,

[0450] PRIMARY KEY (`id`)) ENGINE=InnoDB;

[0451] Normalized JSON (illustrated):

[0452] {"tables": [{"name": "users",

[0453] "columns": [{"name": "email", "type": "varchar(255)", "nullable":true, "default": null},{"name": "id","type": "int(11)", "nullable": false} ]"primary_key": ["id"]}]}.

[0454] Regardless of whether the input is definition A or definition B, the above normalization process will generate the exact same JSON structure. The system then uses the SHA-256 algorithm on this JSON string to obtain a stable and reproducible baseline hash.

[0455] Beneficial effects: Through rigorous normalization, this invention ensures the reliability of baseline hashes, enabling them to accurately reflect changes in the logical structure of the database without being affected by differences in format, comments, or unstructured metadata. This is the cornerstone for the accurate operation of the baseline drift detection function.

[0456] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A late-binding and data-aware build-and-deploy system for declarative database changes, characterized in that, The system includes: The late-binding version management module is configured to parse the version list and aggregate atomic change intents from one or more independent stores. The compiler front end is configured to generate the target pattern based on the database baseline pattern, generate an intermediate representation (IR) containing difference information, and perform semantic conflict detection in the process; The context collector is configured to collect both static and dynamic contexts. The rules engine is configured to evaluate matching rules in groups based on explicit priorities, adopting only the highest priority non-exclusive actions in each group, and stopping and reporting conflicts when encountering mutually exclusive actions of the same priority. The optimizer is configured to perform at least one of the following on the IR: operator selection, instruction fusion, and resource parameterization. The compiler backend is configured to perform topological sorting of execution steps based on dependencies and output a physical execution plan containing quadruple execution units; The executor is configured to perform a consistency check on the baseline hash before execution, abort and generate a failure report and recommended rollback plan if any step fails.

2. The system according to claim 1, characterized in that, The physical execution plan includes multiple execution steps ordered by topology. Each step consists of a quadruple execution unit composed of a pre-check instruction, a core execution instruction, a post-check instruction, and a rollback instruction. After the core execution instruction is completed, the executor uses the post-check instruction to verify each item. If any item is not satisfied, it is considered a failure and the execution is terminated. The executor then uses the rollback instruction to generate a suggested rollback plan in reverse order of the successful steps.

3. The system according to claim 1, characterized in that, The baseline hash is obtained by normalizing and serializing the database schema object and calculating a one-way hash. The executor is authorized to execute only when the real-time hash matches the hash stored in the plan.

4. The system according to claim 1, characterized in that, The IR includes at least operation type, object identifier, attribute change and dependency metadata, which is used to drive optimization and topology sorting.

5. The system according to claim 1, characterized in that, The instruction fusion is limited to multiple changes on the same data table that can be safely merged in a single physical operation. It identifies and excludes operations that are exclusive or destructive from participating in the fusion, including renaming, partitioning changes, or engine migration.

6. The system according to claim 1, characterized in that, The resource parameterization calculates execution parameters based on thresholds for replication latency, query load, and table row count. In the absence of a dynamic context, it automatically matches a preset low-priority fallback rule, employing conservative parameters and low-risk operators. The preset low-priority fallback rule is as follows: When the context collector is unable to acquire dynamic data, the system will not fail. Instead, it will automatically apply a set of predefined, highly conservative security default parameters to ensure the safety of the change execution.

7. The system according to claim 1, characterized in that, The system includes a human-machine collaborative rule generation module, which receives natural language policies and generates structured rule drafts. After manual review and confirmation, the drafts are stored in the database and their version and audit information are recorded.

8. A late binding and data-aware compilation and deployment method for declarative database changes, characterized in that, include: Analyze the version list and aggregate atomic change intentions; Construct baseline and target models, and perform intermediate representation containing difference information: IR generation; Collect static and dynamic context; The rule engine performs at least one optimization based on operator selection, instruction fusion and resource parameterization, wherein the rule evaluation method in the rule engine is as follows: a grouping and priority selection mechanism is adopted, and when a mutually exclusive action of the same priority is encountered, the operation is stopped and a conflict list is output. The execution steps are topologically sorted based on dependencies, and a physical execution plan containing four-tuple execution units is generated. Before execution, a baseline hash consistency check is performed, and if any step fails, the process is aborted and a failure report and recommended rollback plan are output.

9. An electronic device, characterized in that, The electronic device includes: One or more processors; Memory, on which computer-executable instructions are stored; The processor is configured to execute the instructions to implement the method as described in claim 8.

10. A computer-readable storage medium, characterized in that, The computer-readable storage medium contains program code that can be invoked by a processor to execute the method as described in claim 8.