A database migration method
By generating syntax trees for multiple databases and merging them into a migration syntax tree, the problem of low migration efficiency between different databases is solved, realizing automated migration across databases and reducing costs.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- DIGITAL GUANGDONG NETWORK CONSTR CO LTD
- Filing Date
- 2026-03-17
- Publication Date
- 2026-06-19
Smart Images

Figure CN122240589A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to database migration technology, and more particularly to a database migration method. Background Technology
[0002] As businesses grow, the amount of data generated by applications increases, and the functions required by the database become more and more numerous. The source database may not be able to support a large number of business operations, or the functions of the source database may not be able to meet the needs of business development. This necessitates migrating the data from the source database to the target database to support business growth, which is known as database migration.
[0003] Most database vendors provide migration tools for migrating mainstream foreign databases and open-source databases to their own databases. The main principle is based on the characteristics of the SQL (Structured Query Language) syntax of mainstream foreign databases, and uses JDBC (Java Database Connectivity) to migrate from different types of source databases to the target database.
[0004] Because different databases have their own unique syntax, and there are many different types of database migrations, relying solely on migration tools provided by a single database vendor is inefficient and costly when migrating between different databases, especially when the target database has multiple types. Summary of the Invention
[0005] This invention provides a database migration method to improve database migration efficiency and reduce migration costs.
[0006] In a first aspect, the present invention provides a database migration method, comprising: For each of the multiple databases, retrieve the data table from the database; Generate the syntax tree for the data table; The syntax trees of the same data tables in the multiple databases are merged to obtain the migration syntax tree of the same data table; Obtain the data tables to be migrated from the source database and generate the source table creation statements for the data tables to be migrated; Based on the migration syntax tree, the source table creation statement is converted into the target table creation statement of the target database; The target table is created in the target database using the target table creation statement. Import the data from the data table to be migrated into the target table.
[0007] Optionally, generating a syntax tree for the data table includes: The statement that creates the data table; The table creation statement is parsed to determine the syntax elements of the table creation statement and the syntax for connecting the syntax elements; A syntax tree for the data table is constructed based on the syntax elements and the syntax that connects the syntax elements.
[0008] Optionally, the syntax trees of the same data tables in the multiple databases are merged to obtain the migration syntax tree of the same data table, including: Merge identical syntax branches in syntax trees of the same data tables from multiple databases, retain different syntax branches, and identify the database type from which the syntax branches originate, to obtain a migration syntax tree for the same data table.
[0009] Optionally, identical syntax branches from syntax trees of the same data tables from multiple databases are merged, while different syntax branches are retained. The database type from which the syntax branches originate is identified, resulting in a migration syntax tree for the same data table, including: For a syntax tree of the same data table from multiple databases, start from the root node of the syntax tree and select the syntax branch as the first target branch; Determine whether the first target branch is the same in each syntax tree; If so, merge the first target branch; If not, then retain different first target branches and identify the database type from which the first target branch originates; Determine if all syntax branches have been traversed; If so, the merged syntax tree will be used as the migration syntax tree for the same data table; If not, select the next syntax branch as the first target branch and return to the step of determining whether the first target branches in each syntax tree are the same.
[0010] Optionally, the process of merging the syntax trees of identical data tables from the multiple databases also includes: Different types of databases are coded, and each type of database has a unique database code; When merging syntax trees of the same data tables, the database encoding of the database from which the syntax tree originated is preserved.
[0011] Optionally, after merging the syntax trees of the same data tables from the multiple databases to obtain the same data table's migration syntax tree, the method further includes: For each syntax branch in the migration syntax tree, generate the table rollback clause corresponding to the syntax branch to obtain the table rollback statement.
[0012] Optionally, converting the source table creation statement into the target table creation statement for the target database based on the migration syntax tree includes: Determine the source syntax tree of the source table creation statement, and the target transition syntax tree corresponding to the source table creation statement; The source syntax tree retains the same syntax branches as the target migration syntax tree, and the different syntax branches are replaced with the syntax branches in the target database in the target migration syntax tree to obtain the target syntax tree; The target table creation statement for the target database is generated based on the target syntax tree.
[0013] Optionally, the source syntax tree retains the same syntax branches as the target migration syntax tree, and replaces the different syntax branches with the syntax branches from the target database in the target migration syntax tree to obtain the target syntax tree, including: For the source syntax tree and the target transfer syntax tree, starting from the root node of the syntax tree, a syntax branch is selected as the second target branch; Determine whether the second target branches in two syntax trees are the same; If so, then retain the second target branch in the source syntax tree; If not, then select a second target branch from the target database from the target migration syntax tree to replace the second target branch in the source syntax tree; Determine if all syntax branches have been traversed; If so, the syntax tree after the replacement is used as the target syntax tree; If not, select the next syntax branch as the second target branch and return to the step of determining whether the second target branches in the two syntax trees are the same.
[0014] Optionally, the target table is created in the target database using the target table creation statement, including: The clauses in the target table creation statement are executed sequentially to create the table. When the clause is executed successfully, the table rollback clause of the syntax branch corresponding to the clause is deleted; When the clause fails to execute, the table rollback clause of the syntax branch corresponding to the clause is executed, and the clause is re-executed until the clause is executed successfully, or the clause is skipped after being executed a preset number of times and the next clause is executed.
[0015] Optionally, importing data from the data table to be migrated into the target table includes: The system generates data import statements and data rollback statements for the data table to be migrated. The data import statements include multiple import clauses, and the data rollback statements include multiple data rollback clauses corresponding to the multiple import clauses. The import clauses are executed sequentially in the order described above, and the data corresponding to the import clauses in the data table to be migrated are sequentially imported into the target table. When the import clause is executed successfully, the corresponding data rollback clause is deleted; If the import clause fails to execute, the data rollback clause corresponding to the import clause is executed, and the import clause is re-executed until the import clause is executed successfully, or the import clause is skipped after being executed a preset number of times, and the next import clause is executed.
[0016] Secondly, the present invention also provides a database migration apparatus, comprising: The table retrieval module is used to retrieve a data table from each of the multiple databases. A syntax tree generation module is used to generate a syntax tree for the data table; The syntax tree merging module is used to merge the syntax trees of the same data tables in the multiple databases to obtain the migration syntax tree of the same data table; The table creation statement generation module is used to obtain the data tables to be migrated from the source database and generate the source table creation statements for the data tables to be migrated. The table creation statement conversion module is used to convert the source table creation statement into the target table creation statement of the target database based on the migration syntax tree. The table creation module is used to create a target table in the target database using the target table creation statement; The data import module is used to import data from the data table to be migrated into the target table.
[0017] Thirdly, the present invention also provides an electronic device, comprising: One or more processors; Storage device for storing one or more programs; When the one or more programs are executed by the one or more processors, the one or more processors implement the database migration method as described in the first aspect of the present invention.
[0018] Thirdly, the present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the database migration method as described in the first aspect of the present invention.
[0019] The database migration method provided by this invention obtains data tables for multiple databases and generates their respective syntax trees. The syntax trees for identical tables are then merged into a unified migration syntax tree, enabling automatic translation of table creation statements between different databases. The migration syntax tree is used to perform structural parsing and target syntax rewriting of the source table creation statements, generating target table creation statements adapted to the target database. These target table creation statements are then used to create the target table in the target database, and the source data is imported into the target table. Ultimately, this achieves universal migration capabilities across multiple databases, avoiding dependence on specific migration tools. The unified migration syntax tree improves the accuracy and efficiency of table creation statement conversion, supports automated migration across heterogeneous databases, significantly improves database migration efficiency, and reduces migration costs.
[0020] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description. Attached Figure Description
[0021] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0022] Figure 1 A flowchart of a database migration method provided by the present invention; Figure 2 A schematic diagram of the structure of a database migration device provided by the present invention; Figure 3 This is a schematic diagram of the structure of an electronic device provided by the present invention.
[0023] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments. Detailed Implementation
[0024] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0025] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0026] Figure 1 The flowchart illustrates a database migration method provided by this invention. This embodiment supports migration between multiple databases without requiring the development of dedicated migration tools for each target database. The method can be executed by the database migration device provided by this invention, which can be implemented in software and / or hardware, and is typically configured in an electronic device, such as... Figure 1 As shown, this database migration method includes the following steps: S101. For each of the multiple databases, retrieve the data table from the database.
[0027] In this embodiment of the invention, "multiple databases" refers to at least two database systems that differ in SQL syntax, data type definitions, constraint expression methods, or DDL statement structures, such as MySQL, PostgreSQL, Oracle, DM, Kingbase, and openGauss. A data table refers to a structured data object in a database organized in the form of a relational table, possessing explicit metadata information such as table name, field name, field type, primary / foreign key constraints, and index definitions. The purpose of this step is to provide raw structural samples for subsequent syntax modeling, ensuring that the constructed syntax tree covers all DDL semantic features of the target database type.
[0028] In an alternative embodiment, database instances can be connected via a database driver to execute standard metadata query statements (such as the 'show table' command) to retrieve all data tables in the database.
[0029] S102, Generate the syntax tree for the data table.
[0030] In this embodiment of the invention, the syntax tree refers to an Abstract Syntax Tree (AST) that represents the syntax of a table creation statement in a tree structure. Its root node corresponds to the overall structure of the table creation statement, internal nodes correspond to syntactic elements in the statement (such as the `CREATE TABLE` keyword, table name, field definition list, constraint clauses, index clauses, etc.), and leaf nodes correspond to specific identifiers, literals, or operators. This syntax tree does not depend on a specific database execution engine; it only reflects the structured semantic hierarchy of SQL statements. The purpose of this step is to transform the unstructured table creation statement into a comparable, mergeable, and mappable intermediate representation, providing a foundation for subsequent cross-database syntax conversion.
[0031] In one optional embodiment, lexical and syntactic analysis is performed on the table creation statement to identify keywords, identifiers, delimiters, and nested bracket relationships. An AST (Abstract Syntax Tree) and its parent-child relationships are then constructed based on preset SQL syntax rules. In another optional embodiment, a combination of regular expression matching and recursive descent parsing can be used to customize the parsing logic for the DDL extended syntax specific to different databases, generating a more compatible syntax tree.
[0032] For example, in a specific embodiment of the present invention, the process of generating the syntax tree of the data table is as follows: S1021, Table creation statement for generating data tables.
[0033] A table creation statement is a SQL Data Definition Language (DDL) statement used to define the structure of data tables in a database. Its standard syntax typically begins with the `CREATE TABLE` keyword, followed by the table name, a list of field definitions (including field names, data types, constraints, etc.), and optional table-level constraints. This statement fully expresses the logical structure of the data table, reflecting the data semantics, storage format, and integrity constraints of each field. In this embodiment, the table creation statement serves as the original input text for syntax tree construction, providing a unified and standardized language carrier for subsequent lexical and syntactic parsing.
[0034] In an alternative embodiment, this can be achieved by sending `SHOW CREATE TABLE` to the database.<table_name> The command retrieves the table creation statement for the data table.
[0035] S1022. Parse the table creation statement to determine the syntax elements of the table creation statement and the syntax elements of the connection syntax elements.
[0036] In this context, grammatical elements refer to basic language units in a table creation statement that have independent grammatical functions or semantic implications. These include, but are not limited to, keywords (such as `CREATE`, `TABLE`, `INT`, `VARCHAR`, `PRIMARY KEY`), identifiers (such as table names, field names), literals (such as string constants, numeric constants), and symbols (such as commas, parentheses, and equal signs). The syntax connecting these grammatical elements refers to the structured dependency relationships formed between them according to SQL syntax rules. This includes, but is not limited to, hierarchical relationships (such as field definitions belonging to table definitions), sequential grouping relationships (such as field names immediately following data types), nested inclusion relationships (such as constraint definitions nested within field definitions), and semantic limitation relationships (such as `NOT NULL` limiting the field's value range). In this embodiment, the parsing process does not depend on the runtime environment of a specific database. Instead, it performs static analysis of the table creation statement based on a preset SQL syntax specification. The output is a set of terms and their inter-terminal grammatical connections, forming the structured intermediate representation required for syntax tree construction.
[0037] In an optional embodiment, a recursive descent parser based on context-free grammar can be used to identify the grammatical components in the table creation statement word by word and construct grammatical connections. Alternatively, lexical analysis can be performed using regular expressions combined with a finite state machine, and then the grammatical structure can be deduced by driving the LR(1) parsing table, and the table creation statement can be parsed. After parsing, the parsed grammatical elements and their grammatical connections are mapped to tree nodes and parent-child edge relationships. With `CREATETABLE` as the root, a subtree structure is generated from top to bottom according to the grammatical rules. Alternatively, the parsing results can be organized into an intermediate graph structure, and then the graph can be topologically sorted and pruned according to the grammatical priority and associativity rules to generate a syntax tree that satisfies the left associativity and nesting depth constraints.
[0038] S1023. A syntax tree for constructing data tables based on syntax elements and connecting syntax elements.
[0039] A syntax tree is an abstract representation of syntax organized in a tree structure. Its nodes correspond to grammatical elements in a table creation statement, edges correspond to grammatical connections between words, and grammatical branches formed by edges and connecting grammatical elements correspond to clauses in the table creation statement. The root node is typically the abstract node for the `CREATE TABLE` operation, and its child nodes expand sequentially into table name nodes, field definition list nodes, constraint definition nodes, etc.; field definition nodes are further expanded into field name child nodes, data type child nodes, NOT NULL constraint child nodes, etc. In this embodiment, this syntax tree serves as the basic data structure for subsequent syntax tree merging and table creation statement transformation. Its nodes can be uniquely identified and support semantic comparison and branch replacement operations across database types.
[0040] Furthermore, while generating the syntax tree, database type identifier and version identifier fields can be attached to each syntax node, so that the syntax trees generated by the same table creation statement in different database contexts are distinguishable and traceable.
[0041] S103. Merge the syntax trees of the same data tables in multiple databases to obtain the migration syntax tree of the same data tables.
[0042] In this embodiment of the invention, the same data table refers to a data table with the same table name, consistent field set semantics (i.e., field name and field type mapping relationships can be aligned), and logically equivalent primary keys and constraints. The migration syntax tree refers to a unified syntax representation formed by merging the syntax trees of multiple database tables with the same name. It retains both common syntax branches (such as general field definition formats) and differing syntax branches (such as MySQL's `AUTO_INCREMENT` and PostgreSQL's `SERIAL`), and marks the source database type on each differing branch. The purpose of this step is to construct an intermediate syntax model adaptable to multiple target databases, allowing the table creation logic of the same table to be expanded separately in different database systems, thereby supporting subsequent automatic statement-level conversion.
[0043] In an optional embodiment, the merging method may be as follows: using the field definition subtree as the basic comparison unit, perform structural equivalence judgment on the subtrees at corresponding positions in multiple syntax trees. If the structures are completely consistent, they are merged into a single branch. If there are differences in syntax elements (such as different ways of writing auto-increment identifiers), each version of the subtree is kept as a sibling node under the same parent node, and a database type label is attached to each subtree.
[0044] For example, in a specific embodiment of the present invention, identical syntax branches in the syntax trees of the same data tables from multiple databases are merged, different syntax branches are retained, and the database type from which the syntax branches originate is identified, resulting in a migration syntax tree of the same data table.
[0045] In a syntax tree, a syntax branch is a substructure path consisting of several consecutive syntactic units, originating from a certain node. Its starting node is the parent node, and it terminates at a leaf node or the root node of the subtree. In the SQL table creation statement syntax tree, typical syntax branches include column definition branches, primary key constraint branches, foreign key constraint branches, index definition branches, and storage engine / tablespace declaration branches. This syntax branch carries the specific syntactic expression used by a particular database for the same logical DDL semantics.
[0046] The same syntactic branch refers to a syntactic subtree path that is identical in terms of structural form, node type sequence, and terminal lexical unit. For example, both MySQL and PostgreSQL use `INT PRIMARY KEY` to define integer primary key columns, thus forming the same syntactic branch at the column definition level. This characteristic serves as the criterion for the merge operation in this step, ensuring that only branches with completely identical semantics and structure are merged, avoiding syntactic mismatches caused by superficial similarity but semantic discrepancies.
[0047] Different syntactic branches refer to syntactic subtree paths that differ in at least one node type, number of child nodes, terminal lexical units, or connection relationships. For example, in Oracle, auto-incrementing primary keys are implemented using `NUMBER GENERATED BYDEFAULT AS IDENTITY`, while in MySQL the corresponding syntax is `INT AUTO_INCREMENT PRIMARY KEY`. Although both express the same semantics (auto-incrementing primary key), their syntactic structures and keywords are completely different, constituting different syntactic branches. This feature is explicitly preserved rather than ignored or replaced to maintain the integrity of the syntactic features of each database.
[0048] The database type identifier for the source of a syntax branch refers to attaching a unique, identifiable database type marker to the root node or metadata field of the different retained syntax branches. This marker can be the database vendor name (such as "MySQL", "Oracle", "DM"), standardized abbreviations (such as "MYS", "ORA", "DM8"), or a preset encoded value (such as "01", "02", "03"). This identifier is used in the subsequent table creation statement conversion stage to accurately locate and select the appropriate syntax branch based on the target database type, thereby ensuring that the generated target table creation statement conforms to the syntax specifications and execution capabilities of the target database.
[0049] This invention merges identical syntax branches into a unified path while retaining all differentiated syntax branches and labeling them with their source database types. This allows the migration syntax tree to possess both cross-database commonality and fully encompass the unique syntax of each database. Based on this, when subsequent source table creation statements are converted using this migration syntax tree, the system can dynamically select the appropriate syntax branch from the retained sets of syntax branches according to the target database type. This ensures that the generated target table creation statement can be correctly parsed and executed by the target database without sacrificing syntax compatibility.
[0050] In a specific embodiment of the present invention, the above merging process includes the following steps: 1. For syntax trees of the same data tables from multiple databases, start from the root node of the syntax tree and select the syntax branch as the first target branch.
[0051] In this embodiment, for syntax trees of identical data tables from multiple databases, a syntax branch is selected as the first target branch, starting from the root node of the syntax tree. The syntax branch serves as the basic unit of operation for syntax tree comparison and merging; its selection order determines the traversal path of the merging process, ensuring that all syntax levels are covered by the system.
[0052] 2. Determine whether the first target branch is the same in each syntax tree.
[0053] In this embodiment of the invention, it is determined whether the first target branches in each syntax tree are the same. Sameness means that the two syntax branches are completely identical in three dimensions: structural form, node type sequence, and semantic relationships between child nodes. Specifically, the syntax node names at corresponding positions, the connections between nodes, the number of child nodes, and the recursive nesting structure are all consistent. This determination does not depend on the specific content of the node values (such as column names or data type strings), but focuses on the topological equivalence of the syntax skeleton. In this embodiment, this determination provides a deterministic basis for subsequent merging or retention decisions and is a prerequisite for ensuring the semantic integrity and distinguishability of the transfer syntax tree.
[0054] 3. If so, merge the first target branch.
[0055] If the first target branches in different syntax trees are the same, then the first target branches are merged. Merging refers to integrating syntax branches with the same structure from multiple syntax trees into a single syntax branch node, and attaching a database type set identifier to this node to indicate which database types have a consistent implementation of the syntax structure. This set identifier can be a list of strings, bitmap encoding, or a combination of database type enumeration values. In this embodiment, this operation compresses redundant syntax tree structures, improves syntax matching efficiency, and preserves common semantics across databases, providing a unified syntax anchor for subsequent table creation statement conversions.
[0056] 4. If not, retain different first target branches and identify the database type from which the first target branch originates.
[0057] If the first target branches in different syntax trees are different, then all different first target branches are retained. It should be noted that there may be some identical first target branches and others different first target branches. In this case, the identical first target branches are merged, and only the different first target branches are retained, with the database type from which the first target branch originates identified. Identifying the database type from which the first target branch originates involves attaching a metadata tag to each retained differentiated syntax branch node. This tag explicitly records the original database type to which the branch belongs, such as "MySQL", "PostgreSQL", "DM", "Renmin University Kingbase", or "openGauss". This tag can be implemented as a string field, attribute key-value pairs, or node extended attributes. In this embodiment, this identification enables the migration syntax tree to have explicit difference tracing capabilities, ensuring that the syntax path adapted to the target database can be accurately selected during the subsequent generation of target table creation statements, avoiding execution failures caused by syntax misuse.
[0058] 5. Determine if all syntax branches have been traversed.
[0059] The process determines whether all non-leaf nodes and their downward subtrees in all participating syntax trees have been accessed and processed level by level. The traversal strategy can be depth-first or breadth-first, as long as each syntax path is selected at least once as the first target branch. In this embodiment, this determination controls the termination condition of the merging process, ensuring that the migrated syntax trees cover all possible syntaxes without structural omissions.
[0060] 6. If so, the merged syntax tree will be used as the migration syntax tree for the same data table.
[0061] If all syntax branches have been traversed, the merged syntax tree is used as the migration syntax tree for the same data table. The merged syntax tree refers to the composite syntax tree formed after the aforementioned steps, containing both common branch merging results and individual branch structures. Its nodes contain both common syntax structures shared by multiple databases and differentiated syntax structures distinguished by database type. This syntax tree no longer corresponds to a specific implementation of any single database, but rather serves as a semantic intermediate representation built for multi-database migration scenarios. In this embodiment, this syntax tree constitutes the core knowledge carrier for subsequent table creation statement transformations, supporting the accurate mapping from source syntax to target syntax.
[0062] 7. If not, select the next syntax branch as the first target branch and return to the step of determining whether the first target branches in each syntax tree are the same.
[0063] If not all syntax branches have been traversed, the next syntax branch is selected as the first target branch, and the process returns to determine whether the first target branches in each syntax tree are the same. The next syntax branch refers to a syntax subtree that has not yet been processed, determined according to a preset traversal order (e.g., sorted by node depth priority, by syntactic semantic importance, or by node creation timestamp). This selection process does not change the original structure of the syntax tree, but only updates the current processing focus. In this embodiment, this loop mechanism ensures that the merging process is deterministic, repeatable, and comprehensive, freeing the construction process of the transfer syntax tree from dependence on human experience and providing a basis for engineering implementation.
[0064] In some embodiments of the present invention, during the process of merging syntax trees of the same data tables in multiple databases, different types of databases can be encoded, each type of database having a unique database code, and when merging syntax trees of the same data tables, the database code of the database from which the syntax tree originated is retained.
[0065] Database encoding refers to a string or numeric code used to identify the type of database. In this field, it is typically used to distinguish the syntactic features and semantic constraints of different Database Management Systems (DBMS). This database encoding does not participate in the construction of the syntax tree structure; it is only appended as meta-information to the syntax tree nodes for identifying the original adaptation environment of syntactic branches during subsequent migration. In this embodiment, the database encoding is assigned to the root node of the initially constructed syntax tree for each database and propagates downwards to all child nodes. For example, the encoding for MySQL is "DB01", PostgreSQL is "DB02", Oracle is "DB03", DM is "DB04", and Kingbase is "DB05". Each encoding is globally unique within the system and cannot be reused. This encoding mechanism allows for rapid differentiation of the source of the same syntactic structure when it appears in different databases, avoiding mismatches caused by similar syntactic appearances but semantic differences.
[0066] Preserving the database encoding of the source database of the syntax tree means that during the syntax tree merging operation, the database encoding already attached to the syntax branches is not deleted, overwritten, or normalized, but rather inherited as an inherent attribute of that syntax branch. When multiple database syntax trees contain the same syntax branch (e.g., all containing the `PRIMARY KEY` definition clause) and are merged into a shared branch, multiple database encodings will be attached to this shared branch, forming an encoding set. When branches differ (e.g., MySQL supports `AUTO_INCREMENT` while Oracle uses `SEQUENCE`), each independent branch retains its original database encoding. This database encoding is used as the basis for selecting the target database syntax path in the subsequent table creation statement conversion stage: the system finds its corresponding encoding based on the target database type, and then locates all syntax branches carrying this encoding in the migration syntax tree, ensuring that the generated target table creation statements strictly conform to the syntax specifications and execution capabilities of the target database.
[0067] This embodiment achieves explicit marking and persistent binding of syntax branch origins by assigning a database code to each database and fully preserving this code information during the syntax tree merging process. Based on this, the migration engine can accurately select the appropriate syntax path according to the target database type during the table creation statement conversion stage, avoiding cross-database syntax mixing. Furthermore, this encoding mechanism provides a structured metadata foundation for subsequent extensions of advanced functions such as multi-version compatibility analysis, syntax compatibility scoring, and abnormal branch attribution diagnosis, thereby improving the accuracy, controllability, and maintainability of the entire database migration process.
[0068] In some embodiments of the present invention, after merging the syntax trees of the same data tables from multiple databases to obtain the migration syntax tree of the same data table, the method may further include: For each syntax branch in the migration syntax tree, generate the corresponding table rollback clause to obtain the table rollback statement.
[0069] Syntax branches refer to sub-structural units in the migration syntax tree that are divided based on a specific syntax function and have independent semantics and executable capabilities. Examples include "primary key constraint definition branch", "foreign key constraint definition branch", "column data type declaration branch", or "default value setting branch". In the table creation statement, each branch corresponds to a clause that can be executed or undone independently.
[0070] A table rollback clause is an SQL statement fragment that is semantically reversed and functionally symmetrical to a given syntax branch. It is used to undo the table creation operation represented by that syntax branch in the target database, ensuring statement-level reversibility of the table creation process. In this embodiment, the generation of the table rollback clause is based on syntax branches. According to the table creation semantics represented by that branch, a preset semantic mapping rule is matched to generate the corresponding target database-compatible reverse operation statement. For example, when a syntax branch represents "ADD PRIMARY KEY (id)", its corresponding table rollback clause is "DROP CONSTRAINT".<constraint_name> The constraint name can be dynamically derived based on the parsing result of the source table creation statement or generated according to the naming convention. When a certain syntax branch represents "ADD COLUMN name VARCHAR(50) NOT NULL", its corresponding table rollback clause is "DROP COLUMN name". In an optional implementation, the predefined rollback template library can be matched based on the node type and attribute label of the syntax branch. The placeholders in the template are replaced with the context information such as the table name, column name, and constraint name actually involved in the current branch, so as to obtain the table rollback clause that can be executed in the target database. Furthermore, it is also possible to determine whether the syntax branch has a pre-dependency based on its hierarchical position in the migration syntax tree and the parent-child dependency relationship. If it does (such as a column being referenced by the primary key), the pre-cleanup logic is inserted synchronously when generating the rollback clause to ensure that the rollback operation can be executed independently and safely.
[0071] This application pre-generates semantically matching and database-compatible table rollback clauses for each grammar branch in the migration syntax tree and binds and stores them with the corresponding branches. This enables the subsequent table creation execution phase to respond to failure events at the clause granularity. Based on this, when the target database fails to execute a table creation clause, the system can immediately call its associated table rollback clause for partial undoing, avoiding the interruption or full rollback of the entire table creation process due to a single point of failure. This significantly improves the robustness, debuggability, and operational efficiency of the table creation process.
[0072] S104. Obtain the data table to be migrated from the source database and generate the source table creation statement for the data table to be migrated.
[0073] In this embodiment of the invention, the source database to be migrated refers to a specific database instance that actually participates in this migration task, and its type belongs to one of the aforementioned "multiple databases". The source table creation statement refers to the standard DDL statement used in the database to create the data table to be migrated, which contains a complete table structure definition. The purpose of this step is to determine the specific input object of this migration and formalize its structure into a standard input that can be processed by the migration syntax tree.
[0074] In an alternative embodiment, a database driver can be used to connect to the source database, execute standard metadata query statements (such as the 'show table' command), retrieve all data tables in the database, select the data tables to be migrated, and send a 'SHOW CREATE TABLE' command to the source database.<table_name> The command retrieves the table creation statement for the data table.
[0075] S105. Based on the migration syntax tree, convert the source table creation statement into the target table creation statement of the target database.
[0076] In this embodiment of the invention, the target database refers to the receiving database system specified in the migration task, and its type also belongs to one of the aforementioned database types. The target table creation statement refers to a DDL statement that is functionally equivalent to the source table creation statement but whose syntax conforms to the target database specification. The purpose of this step is to utilize the commonalities and differences in syntax modeled in the migration syntax tree to complete the structural semantic mapping and syntax rewriting from the source to the target, ensuring that the table creation logic can be correctly executed in the target environment.
[0077] In one alternative embodiment, the source table creation statement can first be parsed into a source syntax tree, then the syntax branch paths in the migration syntax tree that match the target database type can be traversed, the parts of the source syntax tree that are consistent with the path can be retained, and the inconsistent parts can be replaced with the syntax nodes in the corresponding branches of the target database. Finally, the replaced syntax tree can be serialized into the target table creation statement.
[0078] Specifically, the above conversion process includes the following sub-steps: S1051. Determine the source syntax tree of the source table creation statement and the target transition syntax tree corresponding to the source table creation statement.
[0079] The source syntax tree (RSB) is an abstract syntax tree constructed by performing lexical and syntactic analysis on the table creation statements obtained from the source database to be migrated. Its construction process can be referred to in the aforementioned embodiments, and will not be repeated here. The target migration syntax tree (TRB) is a unified syntax tree generated in the aforementioned embodiments, covering multiple databases and targeting the same data tables. Internally, it has merged identical syntax branches and retained differentiated syntax branches between different databases, with each differentiated branch identifying the source database type. In this embodiment, the target migration syntax tree is retrieved from a pre-constructed migration syntax tree set by matching the data table name and field semantic features to which the source table creation statement belongs, ensuring that it corresponds to the source table creation statement at the logical table structure level.
[0080] S1052. Retain the grammar branches in the source syntax tree that are the same as those in the target migration syntax tree, and replace the different grammar branches with the grammar branches in the target database in the target migration syntax tree to obtain the target syntax tree.
[0081] In this embodiment of the invention, identical syntax branches refer to subtree paths in the source syntax tree and the target migration syntax tree that have the same node type, the same child node structure, the same semantic constraints, and are located at the same syntax level. These branches reflect common SQL syntax features across multiple databases, such as `PRIMARY KEY` definitions, `NOT NULL` constraints, field names, and declarations of basic data types (such as `INT` and `VARCHAR`). Their syntax is consistent or highly compatible across different databases and can be reused in the target database without adjustment. Different syntax branches refer to subtree paths that exist in the source syntax tree but have no structural match at the corresponding position in the target migration syntax tree, or have nodes but exhibit database-specific differences in substructure / attribute values. These branches embody database-specific syntax, such as `ENGINE=InnoDB` and `AUTO_INCREMENT` in MySQL, `GENERATED ALWAYS AS IDENTITY` in Oracle, and `SERIAL` or tablespace specification clauses in PostgreSQL. These branches need to be replaced.
[0082] In an optional embodiment, all syntax branches marked with the target database type identifier in the target migration syntax tree can be traversed, and branches that are semantically equivalent to the branch to be replaced in the source syntax tree (e.g., both are used to express the primary key auto-increment mechanism) can be selected, and their syntax structure can be mapped to the location of the source branch.
[0083] In a specific embodiment of the present invention, step S1052 may include the following sub-steps: 1. For the source syntax tree and the target transfer syntax tree, starting from the root node of the syntax tree, select the syntax branch as the second target branch.
[0084] In this embodiment of the invention, for the source syntax tree and the target migration syntax tree, a syntax branch is selected as the second target branch, starting from the root node of the syntax tree. The syntax branch serves as the basic unit of granularity for syntax tree comparison and replacement operations, ensuring that the mapping relationship between the semantic integrity of each component of the table creation statement and the compatibility with the target database can be identified and processed item by item.
[0085] 2. Determine whether the second target branches in the two syntax trees are the same.
[0086] In this embodiment of the invention, it is determined whether the second target branches in the two syntax trees are identical. Identity means that the two syntax branches are consistent in three aspects: syntax structure, node type sequence, and key syntax predicate values. The syntax structure reflects the parent-child and sibling relationship topology between nodes; the node type sequence refers to the set of syntax categories of nodes traversed on the path from the root node to each leaf node; and the key syntax predicate values refer to syntax parameter values that have a substantial impact on database behavior, such as `InnoDB` in `ENGINE=InnoDB`, `utf8mb4` in `CHARACTER SET utf8mb4`, and `CASCADE` in `ON DELETECASCADE`. In this embodiment, this determination is used to distinguish between directly reusable parts of the source database syntax and parts that must be adapted to the target database, serving as a logical premise for subsequent retention or replacement decisions.
[0087] 3. If so, retain the second target branch in the source syntax tree.
[0088] If the second target branch is the same in both syntax trees, the second target branch in the source syntax tree is retained. This retention can be done by directly referencing the memory node pointer of the corresponding branch in the source syntax tree, without performing a copy.
[0089] 4. If not, select the second target branch from the target database from the target migration syntax tree and replace the second target branch in the source syntax tree.
[0090] If the second target branches in the two syntax trees are different, the second target branch from the target database is selected from the target migration syntax tree to replace the second target branch in the source syntax tree. The second target branch from the target database refers to a syntax subtree in the target migration syntax tree that belongs to the target database type encoding and has the same syntax level and semantic role as the current second target branch. This branch has undergone multi-database syntax normalization and differential annotation during the migration syntax tree construction phase, and its nodes carry a database type identifier field. In this embodiment, this replacement action ensures that the final generated target syntax tree only contains syntax components natively supported by the target database, avoiding table creation failure due to syntax incompatibility. For example, when the source is PostgreSQL and the target is MySQL, the branch defining columns of type `SERIAL` will be replaced with the `INT AUTO_INCREMENT` branch supported by MySQL.
[0091] 5. Determine if all syntax branches have been traversed.
[0092] In this embodiment of the invention, it is determined whether all syntax nodes and their derived subtrees in the source syntax tree have been accessed and processed, covering all reachable paths originating from the root node. This determination is based on the node traversal order of the syntax tree (such as depth-first or breadth-first) and the marked status of the processed nodes. In this embodiment, this determination action constitutes a loop control condition, ensuring the completeness of the syntax tree transformation process and preventing the omission of any syntax components.
[0093] 6. If so, the syntax tree after the replacement is used as the target syntax tree.
[0094] If all syntax branches have been traversed, the resulting syntax tree after replacement is used as the target syntax tree. The target syntax tree is the new syntax tree formed after the aforementioned branch-by-branch comparison and replacement steps. Its overall structure maintains topological isomorphism with the source syntax tree, but the content of each syntax branch has been adapted to the semantic requirements of the target database. This syntax tree no longer contains syntax components specific to the source database, nor does it lack the necessary syntax elements for the target database. In this embodiment, the target syntax tree serves as an intermediate representation, providing a structured and executable syntactic foundation for subsequently generating table creation statements that conform to the target database specifications.
[0095] 7. If not, select the next syntax branch as the second target branch and return to the step of determining whether the second target branches in the two syntax trees are the same.
[0096] If not all syntax branches have been traversed, the next syntax branch is selected as the second target branch, and the process returns to determine whether the second target branches in the two syntax trees are the same. The next syntax branch refers to the syntax subtree immediately following the current second target branch in the preset traversal order. In this embodiment, this order can be a top-down, left-to-right depth-first traversal, or a priority traversal ordered by the logical importance of syntax clauses (e.g., processing column definitions first, then constraints, and finally storage parameters). This mechanism ensures that all syntax components are included in the comparison process, and the processing order does not affect the semantic correctness of the final target syntax tree.
[0097] This embodiment achieves a full-coverage semantic mapping of the syntactic components of the table creation statement by selecting the second target branch layer by layer from the root node, synchronously comparing the consistency of corresponding branches in the source syntax tree and the target transfer syntax tree, selecting to retain or replace based on the comparison results, and outputting the complete target syntax tree after traversal. This process inherits the integrity of the table creation intent of the source database while strictly adhering to the mandatory requirements of the target database's syntax specification. On this basis, the syntactic differences between different databases are explicitly modeled in the transfer syntax tree and are precisely located and controllably replaced during the transformation process, thereby ensuring the uniformity of the generated target table creation statement in three dimensions: syntactic validity, semantic equivalence, and execution reliability.
[0098] S1053. Generate the target table creation statement for the target database based on the target syntax tree.
[0099] After obtaining the target syntax tree, the target table creation statements for the target database are generated based on it. The target table creation statements are DDL statements that conform to the target database's SQL syntax specifications, can be parsed and executed by the database, and correctly create logically equivalent target tables. The generation process involves traversing the target syntax tree from top to bottom, sequentially serializing each syntax unit according to node type and edge relationships to form linear SQL text. During serialization, all nodes and their attributes in the target syntax tree after the above reconstruction are retained, including general syntax units and target database-specific syntax units, while redundant or incompatible nodes that have been replaced in the source syntax tree are ignored. This generation process does not introduce new syntax elements; it only converts the structural information of the target syntax tree into a string form that the target database can recognize.
[0100] This embodiment parses the source table creation statement into a source syntax tree, and uses this as a benchmark to identify and retain common syntax branches and precisely replace proprietary syntax branches in the target migration syntax tree, ultimately generating a semantically equivalent and grammatically compliant target syntax tree. Based on this, the target syntax tree is structured and serialized to output the target table creation statement adapted to the target database. This process achieves automatic semantic-level conversion of table creation statements from the source database to the target database, avoiding the non-portability issues of migration scripts caused by hard-coded syntax, and ensuring the executability and structural consistency of the conversion result in the target database, thereby supporting the stable and reliable operation of the cross-database migration process.
[0101] S106. Use the target table creation statement to create the target table in the target database.
[0102] After generating the target table creation statement, the target table is created in the target database using this statement. The target table is a newly created relational table in the target database that is structurally and functionally equivalent to the table in the source database that is being migrated. This step serves to prepare for the database structure migration, providing a container for subsequent data import. For example, the target table creation statement is submitted to the target database execution engine, awaiting the execution result. If a success status code is returned, the target table creation is confirmed to be complete. The execution engine breaks down the target table creation statement into multiple clauses (such as table definition, primary key constraints, index definitions, etc.), executing them sequentially according to their dependencies. Each successful execution advances to the next step; failures terminate the process and return an error message.
[0103] In a specific embodiment of the present invention, the process of creating the target table is as follows: S1061. Execute the clauses in the target table creation statement in the order they appear to create the table.
[0104] In this embodiment of the invention, the clauses in the target table creation statement are executed sequentially according to their order to create the table. Each clause in the target table creation statement refers to a syntactic unit constituting the target table creation statement, including but not limited to field definition clauses, primary key constraint clauses, foreign key constraint clauses, index definition clauses, comment declaration clauses, and storage parameter clauses. In this embodiment, the term specifically refers to the logical execution unit formed by decomposing the complete table creation statement, which conforms to the target database syntax specification and is derived from the target syntax tree generated by the conversion process of this application. Its order reflects the syntactic constraints of the target database on the execution dependencies of DDL statements; for example, field definitions must precede primary key constraints, primary key constraints must precede foreign key constraints, and index definitions are usually placed after constraint definitions. This order ensures the satisfaction of object dependencies during table creation and avoids syntax errors or object conflicts caused by improper execution order.
[0105] S1062. When a clause is executed successfully, delete the table rollback clause of the syntax branch corresponding to the clause.
[0106] When a clause is executed successfully, the table rollback clause of the corresponding syntax branch is deleted. The corresponding syntax branch refers to the syntax tree substructure in the migration syntax tree of this application that is semantically equivalent to the table creation clause and has a consistent positional mapping. The table rollback clause refers to the SQL statement generated in this application used to undo the execution effect of the clause. For example, the rollback clause for the corresponding field definition clause is `ALTER TABLE ... DROP COLUMN ...`, the rollback clause for the corresponding primary key constraint clause is `ALTER TABLE ... DROP PRIMARY KEY`, and the rollback clause for the corresponding index definition clause is `DROP INDEX ... IF EXISTS`. This rollback clause is statically bound to the corresponding syntax branch node during the migration syntax tree construction phase, possessing a one-to-one semantic reverse operation capability with the table creation clause. Its function is to restore the target database to a consistent state before the clause execution when table creation fails. The deletion action can be to remove the corresponding entry of the rollback clause from the rollback clause mapping table maintained in memory after the target database successfully returns a confirmation response that the clause has been executed successfully, or to mark the rollback clause as "executed and skipped" and record the execution timestamp and status of its associated clause in the log. This invention does not limit this.
[0107] This embodiment establishes an explicit binding relationship between the rollback clause and the table creation clause at the syntax tree level, and immediately unbinds them after the clause is successfully executed. This ensures that rollback resources are only retained within the risk range corresponding to the incomplete table creation action, thereby improving system resource utilization and the determinism of state management.
[0108] S1063. When a clause fails to be executed, execute the table rollback clause of the syntax branch corresponding to the clause, re-execute the clause until the clause is executed successfully, or skip the clause after repeating the execution a preset number of times and execute the next clause.
[0109] When a clause fails to execute, the table rollback clause of the corresponding syntax branch is executed, and the clause is re-executed until it succeeds. Alternatively, after repeating the clause a preset number of times, the clause is skipped, and the next clause is executed. Execution failure refers to any situation where the target database returns a non-zero execution code, an SQL exception status, or a timeout interruption, indicating that the clause has not achieved the expected table creation effect. The preset number of retry attempts is a positive integer threshold that can be set in the configuration file or runtime parameters, used to limit the maximum number of retries for a single clause and prevent infinite loops. Skipping means terminating further retries for that clause, marking it as a "skipped item," and continuing the execution flow of subsequent clauses without interrupting the overall table creation task.
[0110] This embodiment embeds the table creation failure handling mechanism into the clause execution closed loop, enabling the table creation process to have self-recovery capabilities. With the help of preset number constraints and skip strategies, it ensures the continuity of the critical path while taking into account the fault tolerance boundary of unrepairable anomalies. Thus, under the premise of uncertain heterogeneous database syntax compatibility, it maintains the overall robustness of the migration process and the friendliness of manual intervention.
[0111] This embodiment decouples table creation statements into clause units that can be executed sequentially, verified independently, and rolled back precisely. It leverages pre-bound table rollback clauses in the migration syntax tree to achieve targeted clearing of failure states and reset of reconstruction conditions, allowing the table creation process to break free from the rigid constraints of traditional full transactions. Furthermore, it incorporates a three-tiered control strategy: releasing rollback resources upon successful execution, triggering closed-loop recovery upon failure, and automatically downgrading and skipping attempts when retry limits are exceeded. This ultimately enables breakpoint-based resume capabilities for table creation tasks, significantly reducing the overall migration failure rate caused by single syntax incompatibility or momentary resource shortages, and improving the maintainability and delivery stability of cross-database migration projects.
[0112] S107. Import the data from the data table to be migrated into the target table.
[0113] After creating the target table in the target database, import the data from the table to be migrated into the target table. Data import refers to writing the actual business data contained in the table in the source database into the target table in the target database in batches, according to field mapping relationships and type compatibility rules. The purpose of this step is to complete the migration at the database content level and realize the overall transfer of data assets. For example, data can be exported from the source database to an intermediate format (such as CSV, JSON, or binary stream), and then written into the target table through the batch import interface provided by the target database (such as `COPY`, `LOAD DATAINFILE`, `INSERT ... VALUES` batch processing).
[0114] In a specific embodiment of the present invention, importing data from the data table to be migrated into the target table includes the following sub-steps: S1071. Generate data import statements and data rollback statements for the data table to be migrated. The data import statement includes multiple import clauses, and the data rollback statement includes multiple data rollback clauses corresponding to the multiple import clauses.
[0115] In this embodiment of the invention, a data import statement refers to a structured query language statement used to write data to be migrated from the source database into the corresponding table of the target database, and its syntax conforms to the SQL specification of the target database. An import clause is the smallest independently executable unit constituting the data import statement. It can be a single `INSERT INTO ... VALUES(...)` statement, a batch `INSERT` statement, or a data extraction-insertion combination statement with `WHERE` conditions after being split by the primary key / shard key. A data rollback statement is an SQL statement that is semantically equivalent to but has the opposite function to the data import statement, used to clear incomplete data that has already been written when the import fails. A data rollback clause is an independently executable rollback operation unit that corresponds one-to-one with each import clause. Its content can reflect the key identifying characteristics of the imported data, such as limiting the scope of influence through primary keys, timestamps, or hash values to achieve precise undoing.
[0116] For example, based on the syntax tree structure of the target database, the field types, constraints, and primary key information defined in the target table creation statement can be parsed. Combining this with the data volume and distribution characteristics of the data table to be migrated, the entire dataset can be divided into multiple logical data blocks, and an import clause and its corresponding rollback clause can be generated for each data block. Each import clause can contain no more than a preset number of data records, and each rollback clause can contain a `DELETE` or `TRUNCATE` conditional expression that matches the primary key set of the data records in the import clause.
[0117] S1072. Execute the import clauses in the order of the import clauses, and import the data corresponding to the import clauses in the data table to be migrated into the target table in sequence.
[0118] In this embodiment of the invention, the execution operations are carried out according to the order of the import clauses in the data import statement to ensure that the data writing process satisfies logical dependencies and business consistency constraints. Sequential import emphasizes the explicit execution order between the import clauses, with the completion status of the previous import clause serving as one of the prerequisites for starting the execution of the next import clause.
[0119] S1073. When the import clause is executed successfully, delete the data rollback clause corresponding to the import clause.
[0120] When an import clause is executed successfully, the corresponding rollback clause is deleted. Successful execution means the target database returns a completion signal, the number of affected rows is non-negative, no SQL exception is thrown, and the transaction has been committed. Deleting the corresponding rollback clause means removing the storage instance of the rollback clause from the memory cache or persistent rollback statement queue, preventing it from participating in any subsequent rollback operations. This action is irreversible and does not trigger additional database interactions.
[0121] This embodiment achieves dynamic release of rollback resources and precise tracking of execution status by immediately clearing the corresponding data rollback clause after the import clause is executed successfully. This ensures that the state space of the entire data migration process always maintains the minimum necessary size, which reduces system memory usage and avoids the risk of misoperation caused by redundant rollback statement residues.
[0122] S1074. When the import clause fails to execute, execute the data rollback clause corresponding to the import clause and re-execute the import clause until the import clause is executed successfully, or skip the import clause after repeating the execution a preset number of times and execute the next import clause.
[0123] In this embodiment of the invention, execution failure refers to any abnormal situation such as the target database returning an SQL error code, timeout interruption, connection loss, constraint conflict, or transaction rollback. Executing the data rollback clause corresponding to the import clause means calling the generated data rollback clause that strictly matches the currently failed import clause, performing its defined reverse operation in the target database to restore the target table to a consistent state before the import clause was executed. Repeating the preset number of times refers to setting a maximum retry threshold for the same import clause. This threshold can be configured as a fixed integer, such as 3 or 5 times; the retry process terminates after exceeding this threshold. Skipping the import clause means abandoning the execution of the current import clause, marking it as "skipped," and continuing to the next import clause. This behavior does not affect the normal execution flow of the remaining clauses.
[0124] This embodiment constructs a data migration execution model with statement-level fault tolerance by binding each import clause with its dedicated data rollback clause and setting a limited-retry mechanism. Based on this, by leveraging preset retry limits and skip strategies, it effectively avoids infinite loops or overall blocking caused by individual data anomalies (such as dirty data, type mismatches, or unique key conflicts), significantly improving the robustness and controllability of large-scale heterogeneous database migration tasks.
[0125] This embodiment achieves fine-grained control over the data import process by generating structured, granular data import statements and semantically equivalent data rollback statements, combined with sequential execution, success clearing, failure rollback, and limited retry mechanisms. Furthermore, each import clause and its corresponding data rollback clause constitute a verifiable, undone, and reentrant operation unit, allowing the migration process to be interrupted and resumed at any clause level without clearing the target database or restarting the entire process. This solves the technical problem of having to completely re-import data due to interruption during large-scale data import, improving the stability, maintainability, and resource utilization efficiency of database migration.
[0126] The database migration method provided by this invention obtains data tables for multiple databases and generates their respective syntax trees. The syntax trees for identical tables are then merged into a unified migration syntax tree, enabling automatic translation of table creation statements between different databases. The migration syntax tree is used to perform structural parsing and target syntax rewriting of the source table creation statements, generating target table creation statements adapted to the target database. These target table creation statements are then used to create the target table in the target database, and the source data is imported into the target table. Ultimately, this achieves universal migration capabilities across multiple databases, avoiding dependence on specific migration tools. The unified migration syntax tree improves the accuracy and efficiency of table creation statement conversion, supports automated migration across heterogeneous databases, significantly improves database migration efficiency, and reduces migration costs.
[0127] Figure 2 A schematic diagram of the structure of a database migration device provided by the present invention is shown below. Figure 2 As shown, the database migration device includes: The table acquisition module 201 is used to acquire a data table from each of the multiple databases; Syntax tree generation module 202 is used to generate a syntax tree for the data table; Syntax tree merging module 203 is used to merge the syntax trees of the same data tables in the multiple databases to obtain the migration syntax tree of the same data table; The table creation statement generation module 204 is used to obtain the data tables to be migrated from the source database to be migrated, and generate the source table creation statements for the data tables to be migrated. The table creation statement conversion module 205 is used to convert the source table creation statement into the target table creation statement of the target database based on the migration syntax tree. Table creation module 206 is used to create a target table in the target database using the target table creation statement; The data import module 207 is used to import data from the data table to be migrated into the target table.
[0128] In some embodiments of the present invention, the syntax tree generation module 202 includes: The table creation statement generation submodule is used to generate the table creation statements for the data table; The parsing submodule is used to parse the table creation statement and determine the syntax elements of the table creation statement and the syntax elements connecting the syntax elements; The syntax tree construction submodule is used to construct the syntax tree of the data table based on the syntax elements and the syntax connecting the syntax elements.
[0129] In some embodiments of the present invention, the syntax tree merging module 203 includes: The syntax tree merging submodule is used to merge identical syntax branches from syntax trees of the same data tables from multiple databases. Different syntax branches are retained, and the database type from which the syntax branches originate is identified, resulting in a migration syntax tree of the same data table.
[0130] In some embodiments of the present invention, the syntax tree merging submodule includes: The first branch determination unit is used to select a syntax branch as the first target branch, starting from the root node of the syntax tree, for a syntax tree of the same data table from multiple databases. The first decision unit is used to determine whether the first target branch in each syntax tree is the same; A branch merge unit, used to merge the first target branch if the condition is met; The first branch retention unit is used to retain different first target branches simultaneously if not, and to identify the database type from which the first target branch originates; The second judgment unit is used to determine whether all syntax branches have been traversed; The migration syntax tree determination unit is used to determine if the merged syntax tree is the same as the migration syntax tree of the data table. The first return execution unit is used to select the next syntax branch as the first target branch if not, and return to the execution step of determining whether the first target branches in each syntax tree are the same.
[0131] In some embodiments of the present invention, the database migration apparatus further includes: The encoding module is used to encode different types of databases during the process of merging the syntax trees of the same data tables in the multiple databases, with each type of database having a unique database code; The encoding preservation module is used to preserve the database encoding of the source database when merging syntax trees of the same data tables.
[0132] In some embodiments of the present invention, the database migration apparatus further includes: The table rollback statement generation module is used to generate a table rollback clause corresponding to each syntax branch in the migration syntax tree after merging the syntax trees of the same data tables in the multiple databases to obtain the same data table.
[0133] In some embodiments of the present invention, the table creation statement conversion module 205 includes: The syntax tree determination submodule is used to determine the source syntax tree of the source table creation statement and the target migration syntax tree corresponding to the source table creation statement; The branch replacement submodule is used to retain the same syntax branches in the source syntax tree as those in the target migration syntax tree, and replace the different syntax branches with the syntax branches in the target database in the target migration syntax tree to obtain the target syntax tree; The target table creation statement generation submodule is used to generate the target table creation statement for the target database based on the target syntax tree.
[0134] In some embodiments of the present invention, the branch replacement submodule includes: The second branch selection unit is used to select a syntax branch as the second target branch, starting from the root node of the syntax tree, for the source syntax tree and the target transfer syntax tree. The third decision unit is used to determine whether the second target branch in the two syntax trees is the same; The second branch retention unit is used to retain the second target branch in the source syntax tree if the condition is met. A branch replacement unit is used to, if not, select a second target branch from the target database in the target migration syntax tree to replace the second target branch in the source syntax tree; The fourth judgment unit is used to determine whether all syntax branches have been traversed; The target syntax tree determination unit is used to determine the target syntax tree if the replacement syntax tree is used. The second return execution unit is used to select the next syntax branch as the second target branch if not, and return to the execution step of determining whether the second target branches in the two syntax trees are the same.
[0135] In some embodiments of the present invention, the table creation module 206 includes: The table creation clause execution submodule is used to execute the clauses in the target table creation statement in the order of their respective clauses to create the table. The first deletion submodule is used to delete the table rollback clause of the syntax branch corresponding to the clause when the clause is executed successfully; The table rollback submodule is used to execute the table rollback clause of the syntax branch corresponding to the clause when the clause fails to be executed, and re-execute the clause until the clause is executed successfully, or skip the clause after repeating the execution a preset number of times and execute the next clause.
[0136] In some embodiments of the present invention, the data import module 207 includes: The import rollback statement generation submodule is used to generate data import statements and data rollback statements for the data table to be migrated. The data import statement includes multiple import clauses, and the data rollback statement includes multiple data rollback clauses corresponding to the multiple import clauses. The import clause execution submodule is used to execute the import clauses sequentially according to their order, and to import the data corresponding to the import clauses in the data table to be migrated into the target table in sequence; The second deletion submodule is used to delete the data rollback clause corresponding to the import clause when the import clause is executed successfully; The data rollback submodule is used to execute the data rollback clause corresponding to the import clause when the import clause fails to execute, and re-execute the import clause until the import clause is executed successfully, or to skip the import clause after repeating it a preset number of times and execute the next import clause.
[0137] The aforementioned database migration apparatus can execute the database migration method provided in the foregoing embodiments of the present invention, and has the corresponding functional modules and beneficial effects for executing the database migration method.
[0138] Figure 3 This is a schematic diagram of an electronic device provided by the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices (such as helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the invention described and / or claimed herein.
[0139] like Figure 3 As shown, the electronic device includes at least one processor 11 and a memory, such as a read-only memory (ROM) 12 or a random access memory (RAM) 13, communicatively connected to the at least one processor 11. The memory stores computer programs executable by the at least one processor. The processor 11 can perform various appropriate actions and processes based on the computer programs stored in the ROM 12 or loaded from storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data required for the operation of the electronic device. The processor 11, ROM 12, and RAM 13 are interconnected via a bus 14. An input / output (I / O) interface 15 is also connected to the bus 14.
[0140] Multiple components in the electronic device are connected to the I / O interface 15, including: an input unit 16, such as a keyboard, mouse, etc.; an output unit 17, such as various types of displays, speakers, etc.; a storage unit 18, such as a disk, optical disk, etc.; and a communication unit 19, such as a network card, modem, wireless transceiver, etc. The communication unit 19 allows the electronic device to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.
[0141] Processor 11 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. Processor 11 performs the various methods and processes described above, such as database migration methods.
[0142] In some embodiments, the database migration method may be implemented as a computer program tangibly contained in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and / or installed on an electronic device via ROM 12 and / or communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the database migration method described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the database migration method by any other suitable means (e.g., by means of firmware).
[0143] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), payload-programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.
[0144] Computer programs used to implement the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, such that when executed by the processor, the computer programs cause the functions / operations specified in the flowcharts and / or block diagrams to be performed. The computer programs may be executed entirely on a machine, partially on a machine, or as a standalone software package, partially on a machine and partially on a remote machine, or entirely on a remote machine or server.
[0145] In the context of this invention, a computer-readable storage medium can be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination thereof. Alternatively, a computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
[0146] To provide interaction with a user, the systems and techniques described herein can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the electronic device. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).
[0147] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or middleware components (e.g., application servers), or frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
[0148] A computing system can include clients and servers. Clients and servers are generally located far apart and typically interact through communication networks. The client-server relationship is created by computer programs running on the respective computers and having a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a hosting product within the cloud computing service system to address the shortcomings of traditional physical hosts and VPS services, such as high management difficulty and weak business scalability.
[0149] This invention also provides a computer program product, including a computer program that, when executed by a processor, implements the database migration method provided in any embodiment of this application.
[0150] In implementing the computer program product, computer program code for performing the operations of this invention can be written in one or more programming languages or a combination thereof. Programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as C or similar languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN)—or can be connected to an external computer (e.g., via the Internet using an Internet service provider).
[0151] It should be understood that the various forms of processes shown above can be used, with steps reordered, added, or deleted. For example, the steps described in this invention can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution of this invention can be achieved, and this is not limited herein.
[0152] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.
Claims
1. A database migration method, characterized in that, include: For each of the multiple databases, retrieve the data table from the database; Generate the syntax tree for the data table; The syntax trees of the same data tables in the multiple databases are merged to obtain the migration syntax tree of the same data table; Obtain the data tables to be migrated from the source database and generate the source table creation statements for the data tables to be migrated; Based on the migration syntax tree, the source table creation statement is converted into the target table creation statement of the target database; The target table is created in the target database using the target table creation statement. Import the data from the data table to be migrated into the target table.
2. The database migration method according to claim 1, characterized in that, Generating the syntax tree for the data table includes: The statement that creates the data table; The table creation statement is parsed to determine the syntax elements of the table creation statement and the syntax for connecting the syntax elements; A syntax tree for the data table is constructed based on the syntax elements and the syntax that connects the syntax elements.
3. The database migration method according to claim 1, characterized in that, Merging the syntax trees of identical data tables from the various databases yields a migration syntax tree for the same data table, including: Merge identical syntax branches in syntax trees of the same data tables from multiple databases, retain different syntax branches, and identify the database type from which the syntax branches originate, to obtain a migration syntax tree for the same data table.
4. The database migration method according to claim 3, characterized in that, Merge identical syntax branches from syntax trees of the same data table from multiple databases, retaining all different syntax branches, and identify the database type from which the syntax branches originate, resulting in a migration syntax tree for the same data table, including: For a syntax tree of the same data table from multiple databases, start from the root node of the syntax tree and select the syntax branch as the first target branch; Determine whether the first target branch is the same in each syntax tree; If so, merge the first target branch; If not, then retain different first target branches and identify the database type from which the first target branch originates; Determine if all syntax branches have been traversed; If so, the merged syntax tree will be used as the migration syntax tree for the same data table; If not, select the next syntax branch as the first target branch and return to the step of determining whether the first target branches in each syntax tree are the same.
5. The database migration method according to claim 2, characterized in that, The process of merging the syntax trees of identical data tables from the multiple databases also includes: Different types of databases are coded, and each type of database has a unique database code; When merging syntax trees of the same data tables, the database encoding of the database from which the syntax tree originated is preserved.
6. The database migration method according to any one of claims 1-5, characterized in that, After merging the syntax trees of the same data tables from the various databases to obtain the migration syntax tree for the same data table, the process further includes: For each syntax branch in the migration syntax tree, generate the table rollback clause corresponding to the syntax branch to obtain the table rollback statement.
7. The database migration method according to any one of claims 1-5, characterized in that, Based on the migration syntax tree, the source table creation statement is converted into the target table creation statement for the target database, including: Determine the source syntax tree of the source table creation statement, and the target transition syntax tree corresponding to the source table creation statement; The source syntax tree retains the same syntax branches as the target migration syntax tree, and the different syntax branches are replaced with the syntax branches in the target database in the target migration syntax tree to obtain the target syntax tree; The target table creation statement for the target database is generated based on the target syntax tree.
8. The database migration method according to claim 7, characterized in that, The source syntax tree retains the syntax branches that are identical to those in the target migration syntax tree, and replaces the different syntax branches with the syntax branches from the target database in the target migration syntax tree, thus obtaining the target syntax tree, including: For the source syntax tree and the target transfer syntax tree, starting from the root node of the syntax tree, a syntax branch is selected as the second target branch; Determine whether the second target branches in two syntax trees are the same; If so, then retain the second target branch in the source syntax tree; If not, then select a second target branch from the target database from the target migration syntax tree to replace the second target branch in the source syntax tree; Determine if all syntax branches have been traversed; If so, the syntax tree after the replacement is used as the target syntax tree; If not, select the next syntax branch as the second target branch and return to the step of determining whether the second target branches in the two syntax trees are the same.
9. The database migration method according to claim 6, characterized in that, The target table is created in the target database using the target table creation statement, including: The clauses in the target table creation statement are executed sequentially to create the table. When the clause is executed successfully, the table rollback clause of the syntax branch corresponding to the clause is deleted; When the clause fails to execute, the table rollback clause of the syntax branch corresponding to the clause is executed, and the clause is re-executed until the clause is executed successfully, or the clause is skipped after being executed a preset number of times and the next clause is executed.
10. The database migration method according to any one of claims 1-5, characterized in that, Importing data from the data table to be migrated into the target table includes: The system generates data import statements and data rollback statements for the data table to be migrated. The data import statements include multiple import clauses, and the data rollback statements include multiple data rollback clauses corresponding to the multiple import clauses. The import clauses are executed sequentially in the order described above, and the data corresponding to the import clauses in the data table to be migrated are sequentially imported into the target table. When the import clause is executed successfully, the corresponding data rollback clause is deleted; If the import clause fails to execute, the data rollback clause corresponding to the import clause is executed, and the import clause is re-executed until the import clause is executed successfully, or the import clause is skipped after being executed a preset number of times, and the next import clause is executed.