Data processing method and device, electronic equipment and storage medium
By stopping the Hive MetaStore service, upgrading the Meta database during the Hive upgrade process, and linking the original Hive MetaStore service to the new database, the method resolves the data synchronization problem that arises during a Hive upgrade, keeping metadata synchronized and reducing upgrade risk.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- DUXIAOMAN TECH (BEIJING) CO LTD
- Filing Date
- 2023-02-24
- Publication Date
- 2026-06-19
AI Technical Summary
During the Hive upgrade process, the data inconsistency between the old and new Meta databases caused data modifications made by users through different Hive MetaStore services to fail to synchronize, increasing operational risks and the possibility of data loss.
In response to a version upgrade command, the first Hive MetaStore service is stopped, the first Meta database is upgraded to a second Meta database, and the first Hive MetaStore service is linked to the second Meta database, so that both Hive MetaStore services manage the metadata in the second Meta database and the data stays synchronized.
After the Hive upgrade, metadata remains synchronized, which simplifies the data synchronization process, reduces labor costs, and decreases the risk of data loss.
Figure CN116150126B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of data processing, and specifically to a data processing method, apparatus, electronic device, and storage medium.
Background Technology
[0002] Hive is a data warehouse tool based on Hadoop, used for data extraction, transformation, and loading. Hive MetaStore comprises the Hive MetaStore service and the Meta database. The Meta database is where Hive metadata is stored, and the Hive MetaStore service manages the metadata in the Meta database, allowing upper-layer services to build computing frameworks on structured database and table information instead of dealing with raw file data.
[0003] However, in the related art, there is a period during a Hive upgrade in which the new and old Hive MetaStore services run concurrently, which leads to data inconsistency between the new and old Meta databases.
Summary of the Invention
[0004] This application provides a data processing method, apparatus, electronic device, and storage medium to solve the problem of data inconsistency between old and new Meta databases.
[0005] One aspect of this application provides a data processing method, comprising: stopping a first Hive MetaStore service in response to a version upgrade instruction, wherein the first Hive MetaStore service is linked to a first Meta database for managing metadata in the first Meta database; upgrading the first Meta database to a second Meta database according to preset upgrade information; linking the first Hive MetaStore service to the second Meta database so that the first Hive MetaStore service can manage metadata in the second Meta database; and enabling the first Hive MetaStore service and a second Hive MetaStore service, wherein the second Hive MetaStore service is used to manage metadata in the second Meta database.
[0006] Another aspect of this application provides a data processing apparatus, including: a service stopping unit, configured to stop a first Hive MetaStore service in response to a version upgrade instruction, wherein the first Hive MetaStore service is linked to a first Meta database for managing metadata in the first Meta database; an upgrade unit, configured to upgrade the first Meta database to a second Meta database according to preset upgrade information; a linking unit, configured to link the first Hive MetaStore service to the second Meta database so that the first Hive MetaStore service can manage metadata in the second Meta database; and a service enabling unit, configured to enable the first Hive MetaStore service and the second Hive MetaStore service, wherein the second Hive MetaStore service is used to manage metadata in the second Meta database.
[0007] Another aspect of this application provides an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method described in any of the above embodiments by executing the executable instructions.
[0008] Another aspect of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described in any of the above embodiments.
[0009] Another aspect of this application provides a computer program product, including a computer program, characterized in that the computer program, when executed by a processor, implements the method as described in any of the above embodiments.
[0010] In the data processing method, apparatus, electronic device, and storage medium provided in the embodiments of this application, the first Hive MetaStore service is stopped in response to a version upgrade command, wherein the first Hive MetaStore service is linked to a first Meta database for managing metadata in the first Meta database; the first Meta database is upgraded to a second Meta database according to preset upgrade information; the first Hive MetaStore service is linked to the second Meta database so that it can manage metadata in the second Meta database; and the first Hive MetaStore service and a second Hive MetaStore service, which manages metadata in the second Meta database, are enabled. As a result, after the Hive upgrade, any metadata modifications that users make while the first and second Hive MetaStore services run in parallel are stored in the second Meta database, and the results of a task run on either Hive MetaStore service can be queried through the other, achieving data synchronization.
Attached Figure Description
[0011] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0012] Figure 1 is a schematic structural diagram of data migration to which the embodiments of this application apply;
[0013] Figure 2 is a flowchart of the data processing method provided in the embodiments of this application;
[0014] Figure 3 is a schematic structural diagram of data backup and rollback provided in an embodiment of this application;
[0015] Figure 4 is a schematic structural diagram of the data processing apparatus provided in the embodiments of this application;
[0016] Figure 5 is a schematic structural diagram of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0017] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0018] First, the technical terms used in this application will be explained.
[0019] Hive: Hive is a data warehouse tool based on Hadoop, used for data extraction, transformation, and loading. It is a mechanism that can store, query, and analyze large-scale data stored in Hadoop.
[0020] Hive MetaStore: The Hive MetaStore is where Hive metadata is stored. Definitions of Hive databases, tables, functions, and so on are stored in the MetaStore. Depending on the system configuration, statistics and authorization records may also be stored here. Hive and other execution engines use this data at runtime to determine how to parse the data.
[0021] Meta: Metadata, also known as intermediary data or relay data, is data about data. It mainly describes the properties of the data and is used to support functions such as indicating storage location, historical data, resource lookup, and file records.
[0022] MySQL: MySQL is a relational database management system. Relational databases store data in different tables.
[0023] Binlog: MySQL Binlog is a binary log file used to record data updates in MySQL. It is typically used in scenarios such as data synchronization.
[0024] In the related art, upgrading Hive mainly involves the following steps:
1. Stop the current Hive MetaStore service in the old cluster.
2. (Optional) Back up the database in which the current metadata resides.
3. Based on the version information of the old and new clusters, execute the open-source community's database upgrade scripts one by one to upgrade the old version of the Meta database to the new version.
4. Start the Hive MetaStore service in the new cluster and test whether it behaves as expected.
[0025] However, with the related-art upgrade, the results of tasks run through the old cluster's Hive MetaStore service cannot be synchronized to the new cluster's Hive MetaStore service. This is especially problematic in enterprise-level offline clusters, where tens of thousands of tasks may run. When Hive is upgraded, users need a period of parallel running to verify whether the generated data meets expectations. During this period, the old and new clusters' Hive MetaStore services run in parallel: data modifications made through the old cluster's Hive MetaStore service are stored in the old version of the Meta database, while those made through the new cluster's Hive MetaStore service are stored in the new version. The two Meta databases are not synchronized, so data modified through either Hive MetaStore service cannot be queried through the other.
[0026] To keep the data synchronized, users must manually synchronize metadata between the Hive MetaStore services of the two clusters. For example, a table created through the old cluster's Hive MetaStore must also be created through the new cluster's Hive MetaStore, and vice versa. This manual operation is risky and may miss some metadata modifications.
[0027] Additionally, if the upgraded Hive MetaStore service encounters functional issues that cannot be resolved and must be migrated back to the old cluster, the only recourse is to restore the previous version of the metadata from backups. Any table-structure changes that users made in the new cluster are then lost, and users must reproduce them manually.
[0028] To address at least one of the aforementioned problems, embodiments of this application provide a data processing method, apparatus, electronic device, and storage medium. In response to a version upgrade command, a first Hive MetaStore service is stopped, wherein the first Hive MetaStore service is linked to a first Meta database for managing metadata in the first Meta database; the first Meta database is upgraded to a second Meta database according to preset upgrade information; the first Hive MetaStore service is linked to the second Meta database so that the first Hive MetaStore service can manage metadata in the second Meta database; the first Hive MetaStore service and the second Hive MetaStore service are then enabled, with the second Hive MetaStore service managing metadata in the second Meta database. This ensures that after a Hive upgrade, any modifications to metadata generated by the user during the concurrent operation of the first and second Hive MetaStore services can be stored in the second Meta database. When a task is running in either Hive MetaStore service, the resulting data can be queried in the other Hive MetaStore service, achieving data synchronization.
[0029] The embodiments of this application will be described in detail below with reference to the accompanying drawings. It should be noted that the order of description of the embodiments below is not intended to limit the priority of the embodiments.
[0030] Figure 1 is a schematic structural diagram of data migration to which the embodiments of this application apply, and Figure 2 is a flowchart of the data processing method provided in the embodiments of this application. Referring to Figure 1 and Figure 2, an embodiment of this application provides a data processing method 100, which includes the following steps S110 to S140.
[0031] Step S110: In response to the version upgrade instruction, stop the first Hive MetaStore service, wherein the first Hive MetaStore service is linked to the first Meta database for managing the metadata in the first Meta database.
[0032] Step S120: Upgrade the first Meta database to the second Meta database according to the preset upgrade information.
[0033] Step S130: Link the first Hive MetaStore service to the second Meta database so that the first Hive MetaStore service can manage the metadata in the second Meta database.
[0034] Step S140: Enable the first Hive MetaStore service and the second Hive MetaStore service. The second Hive MetaStore service is used to manage the metadata in the second Meta database.
[0035] The version upgrade command can be a command that the user manually triggers to start the upgrade after receiving the official Hive version upgrade prompt, or it can be an upgrade command that the server automatically triggers after receiving the official Hive version upgrade prompt.
[0036] It is understood that upgrading Hive requires upgrading multiple sub-services of Hive. This application embodiment only involves upgrading the metadata within Hive. Before the upgrade, the Hive MetaStore includes a first Hive MetaStore service and a first Meta database. The first Meta database is where Hive metadata is stored. The first Hive MetaStore service is linked to the first Meta database, allowing it to manage the metadata within the first Meta database. This enables upper-layer services to no longer deal with raw file data, but instead to build a computing framework based on structured database and table information.
[0037] During the upgrade, the first Hive MetaStore service needs to be stopped first to ensure that no data changes occur during the upgrade and to prevent users' modifications to metadata from being lost during the upgrade process.
[0038] In step S120, the preset upgrade information can be an official Hive version upgrade file. This file allows for the upgrade of the Hive MetaStore service and the Meta database, enabling Hive MetaStore to add new features. Upgrading the first Meta database to the second Meta database based on the preset upgrade information may include executing relevant official upgrade scripts according to the upgrade version gap information. This upgrade may include expanding field lengths, etc., to facilitate the addition of new features.
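As a rough illustration of executing upgrade scripts "according to the upgrade version gap information", the sketch below chains upgrade steps from the current schema version to the target version. The `upgrade-<from>-to-<to>.<dbType>.sql` naming is modeled on the scripts bundled with Hive's metastore, but the function `plan_upgrade_scripts` and the example version list are hypothetical assumptions, not part of the described method.

```python
def plan_upgrade_scripts(order, current, target, db_type="mysql"):
    """Return the upgrade scripts that take the schema from `current` to `target`.

    `order` is an ordered list of "<from>-to-<to>" steps, oldest first
    (hypothetical input; the script naming is an assumption modeled on Hive).
    """
    scripts = []
    version = current
    for step in order:
        src, dst = step.split("-to-")
        # Only take a step that starts at the version reached so far.
        if src == version and version != target:
            scripts.append(f"upgrade-{step}.{db_type}.sql")
            version = dst
    if version != target:
        raise ValueError(f"no upgrade path from {current} to {target}")
    return scripts
```

For example, with the illustrative order `["2.1.0-to-2.2.0", "2.2.0-to-2.3.0", "2.3.0-to-3.0.0", "3.0.0-to-3.1.0"]`, upgrading from 2.2.0 to 3.1.0 yields three scripts applied in sequence. In practice, Hive's `schematool -upgradeSchema` performs a comparable selection of the official scripts.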
[0039] In step S130, the code of the first Hive MetaStore service can be modified so that the first Hive MetaStore service links to the second Meta database, that is, the first Hive MetaStore service is compatible with the second Meta database.
[0040] Understandably, before the upgrade, the first Hive MetaStore service was linked to the first Meta database. In this step, the link of the first Hive MetaStore service can be changed from the first Meta database to the second Meta database, allowing the first Hive MetaStore service to manage metadata in the second Meta database. At this point, the first Hive MetaStore service will no longer be linked to the first Meta database.
[0041] In step S140, the first Hive MetaStore service, which was stopped in step S110, is restarted, and the second Hive MetaStore service is started so that the upgraded functionality can be verified. The second Hive MetaStore service is used to manage metadata in the second Meta database; it is the new service provided by the upgrade file.
[0042] It is understandable that, during the metadata upgrade process, linking the first Hive MetaStore service to the second Meta database means that both the first and second Hive MetaStore services connect to the second Meta database. The underlying metadata is therefore shared, and modifications that users make to metadata through either service are stored in the second Meta database. When a task runs on either Hive MetaStore service, the resulting data can be queried through the other, achieving data synchronization without manual synchronization by users, reducing labor costs, and avoiding the risk of data loss.
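The effect of steps S110 to S140 can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions: `MetaDatabase`, `MetaStoreService`, and `upgrade` are hypothetical stand-ins rather than Hive classes, and the schema upgrade is modeled simply as copying the tables into a database tagged with the new version. It shows why metadata stays synchronized once both services point at the same second Meta database.

```python
from dataclasses import dataclass, field


@dataclass
class MetaDatabase:
    """Hypothetical in-memory stand-in for a Meta database (table -> definition)."""
    schema_version: str
    tables: dict = field(default_factory=dict)


@dataclass
class MetaStoreService:
    """Hypothetical stand-in for a Hive MetaStore service linked to one Meta DB."""
    name: str
    db: MetaDatabase
    running: bool = True

    def create_table(self, table: str, definition: str) -> None:
        if not self.running:
            raise RuntimeError(f"{self.name} is stopped")
        self.db.tables[table] = definition

    def get_table(self, table: str):
        return self.db.tables.get(table)


def upgrade(first: MetaStoreService, target_version: str) -> MetaStoreService:
    """Apply steps S110 to S140 and return the newly enabled second service."""
    first.running = False  # S110: stop the first service
    # S120: upgrade, modeled here as a copy tagged with the new schema version
    second_db = MetaDatabase(target_version, dict(first.db.tables))
    first.db = second_db  # S130: relink the first service to the second Meta DB
    first.running = True  # S140: enable both services on the shared database
    return MetaStoreService("second", second_db)
```

After `second = upgrade(first, "3.1.0")`, a table created through `first` is immediately visible through `second` and vice versa, because both services hold a reference to the same `MetaDatabase`.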
[0043] In some embodiments, before the first Hive MetaStore service and the second Hive MetaStore service are enabled in step S140, method 100 further includes: stopping the version verification function of the first Hive MetaStore service; and modifying the enumeration names in the first Hive MetaStore service so that they are consistent with the corresponding enumeration names in the second Hive MetaStore service.
[0044] One way to stop the version verification function of the first Hive MetaStore service is to set the hive.metastore.schema.verification.record.version parameter in the configuration information of the first Hive MetaStore service to false when modifying the code of the first Hive MetaStore service. This can prevent the first Hive MetaStore service from mistakenly modifying the version in the metadata when it starts up.
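Where the first Hive MetaStore service reads its settings from hive-site.xml, the relinking of step S130 and the version-recording switch above can be expressed as property overrides. The sketch below is an assumption-laden illustration: `javax.jdo.option.ConnectionURL` is the standard Hive property for the metastore database JDBC URL and `hive.metastore.schema.verification.record.version` is the property named above, while the helper functions and the example JDBC URL are hypothetical. It does not cover the code changes the embodiment also describes.

```python
import xml.etree.ElementTree as ET


def render_hive_site(properties):
    """Render a {name: value} mapping as a hive-site.xml <configuration> string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def first_service_overrides(second_db_url):
    """Hypothetical overrides for the first Hive MetaStore service."""
    return {
        # Step S130: point the first service at the second Meta database.
        "javax.jdo.option.ConnectionURL": second_db_url,
        # Keep the first service from recording its own version on startup.
        "hive.metastore.schema.verification.record.version": "false",
    }
```

For example, `render_hive_site(first_service_overrides("jdbc:mysql://meta-new:3306/hive_meta"))` produces two `<property>` entries; the host and database name in that URL are placeholders.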
[0045] When modifying the code of the first Hive MetaStore service, the enumeration names used by the first Hive MetaStore service can also be modified to resolve compatibility issues caused by changes to metadata attribute enumerations between the old and new versions. For example, the metadata upgrade fixes the misspelled `colelction.delim` enumeration (corrected to `collection.delim`); the first Hive MetaStore service must use the corrected name so that the upper-layer engine parses data correctly during execution.
[0046] It is understandable that the name of the same enumeration may change during the metadata upgrade. It is necessary to modify the enumeration name to ensure that its name is consistent in the first Hive MetaStore service and the second Hive MetaStore service, so as to ensure that the first Hive MetaStore service can run normally.
[0047] The above modifications to the version verification function and enumeration naming can be performed simultaneously when modifying the code in step S130.
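The embodiment renames the enumeration in the first service's code. As a swapped-in illustration of the same compatibility concern, the sketch below shows a lookup that accepts both the old misspelled key and the corrected key in a table's serde parameters. The function names are hypothetical, and treating serde parameters as a plain dictionary is an assumption made for the example.

```python
# Hypothetical compatibility helpers; not Hive code.
OLD_KEY = "colelction.delim"  # misspelled key used by older versions
NEW_KEY = "collection.delim"  # corrected key used after the upgrade


def read_collection_delim(serde_params, default=None):
    """Return the collection delimiter, accepting either spelling of the key."""
    if NEW_KEY in serde_params:
        return serde_params[NEW_KEY]
    return serde_params.get(OLD_KEY, default)


def normalize_serde_params(serde_params):
    """Return a copy that uses only the corrected key (it wins on conflict)."""
    params = dict(serde_params)
    if OLD_KEY in params:
        value = params.pop(OLD_KEY)
        params.setdefault(NEW_KEY, value)
    return params
```

Preferring the corrected key when both are present mirrors the idea that, after the upgrade, the second Meta database is the source of truth.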
[0048] In some embodiments, upgrading the first Meta database to the second Meta database according to preset upgrade information in step S120 may include the following steps: creating an intermediate Meta database; migrating data from the first Meta database to the intermediate Meta database; upgrading the intermediate Meta database according to preset upgrade information, and using the upgraded intermediate Meta database as the second Meta database.
[0049] It is understood that in this embodiment, the first Meta database and the second Meta database can be two independent databases, for example, they can be located on two different servers.
[0050] During the upgrade from the first Meta database to the second Meta database, an intermediate Meta database can be created first. Then, all data from the first Meta database is migrated to the intermediate Meta database, at which point the data in both databases is identical. Next, the intermediate Meta database can be upgraded according to preset information to obtain the upgraded intermediate Meta database, which is the second Meta database. In this embodiment, after obtaining the upgraded second Meta database, the first Meta database can be discarded or used as a backup database.
[0051] In other embodiments, step S120, which upgrades the first Meta database to the second Meta database according to preset upgrade information, may include upgrading the first Meta database according to the preset upgrade information and using the upgraded first Meta database as the second Meta database.
[0052] It is understood that in this embodiment, the first Meta database and the second Meta database can be located on the same server, and the second Meta database is obtained by upgrading the first Meta database, that is, upgrading the first Meta database according to preset upgrade information, and then using the upgraded first Meta database as the second Meta database.
[0053] Both of the above methods can be used to upgrade from the first Meta database to the second Meta database; the choice can be made based on the user's needs.
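Both upgrade modes can be sketched with Python's built-in `sqlite3` module, used here only as a stand-in for the real metadata store. This is a minimal sketch under stated assumptions: `upgrade_meta_db` is a hypothetical helper, `Connection.backup` plays the role of "migrating data into the intermediate Meta database", and the example `ALTER TABLE ... ADD COLUMN` statement is an illustrative upgrade step rather than an official Hive script.

```python
import sqlite3


def upgrade_meta_db(first, statements, use_intermediate=True):
    """Return the second Meta database produced from `first` (hypothetical helper).

    use_intermediate=True: copy `first` into a new intermediate database and
    upgrade the copy, leaving `first` untouched so it can serve as a backup.
    use_intermediate=False: upgrade `first` in place and return it.
    """
    if use_intermediate:
        target = sqlite3.connect(":memory:")
        first.backup(target)  # migrate all data into the intermediate database
    else:
        target = first
    with target:  # commit the upgrade statements when the block succeeds
        for stmt in statements:
            target.execute(stmt)
    return target
```

For example, starting from a `TBLS (TBL_ID, TBL_NAME)` table, running `upgrade_meta_db(first, ["ALTER TABLE TBLS ADD COLUMN OWNER_TYPE TEXT"])` returns a new connection with three columns while `first` keeps two; passing `use_intermediate=False` modifies `first` itself.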
[0054] Figure 3 is a schematic structural diagram of data backup and rollback provided in the embodiments of this application. Referring to Figure 3: the preceding paragraphs describe improvements to the metadata upgrade process. The following describes improvements to metadata backup and rollback, which include a real-time solution and a batch processing solution.
[0055] First, the real-time solution. In some embodiments, before the first Hive MetaStore service and the second Hive MetaStore service are enabled in step S140, the method further includes: expanding the field lengths of a backup database according to the preset upgrade information, where the backup database is either the first Meta database or a database used to back up the first Meta database; and obtaining the data change log of the second Meta database in real time and, based on the data change log, synchronizing the metadata in the second Meta database that relates to the first Meta database to the backup database.
[0056] After the first Hive MetaStore service and the second Hive MetaStore service are enabled in step S140, method 100 may further include: stopping the second Hive MetaStore service in response to a data rollback instruction; and linking the first Hive MetaStore service to the backup database.
[0057] It is understandable that the first Meta database can be backed up during the metadata upgrade process to produce a backup database. Of the two methods described above for upgrading the first Meta database to the second Meta database, if the first and second Meta databases are independent, the first Meta database, which is deprecated after the upgrade, can serve as the backup database; that is, the first Meta database is the backup database. If the second Meta database is obtained by upgrading the first Meta database in place, then before step S120, method 100 may further include: creating a backup database based on the first Meta database, where the backup database is used to back up the first Meta database and contains the same data as the first Meta database.
[0058] Furthermore, because field lengths are expanded when the first Meta database is upgraded to support new features, the field lengths of the second Meta database differ from those of the first Meta database. To synchronously back up users' post-upgrade metadata modifications, the field lengths of the backup database must first be expanded to match those of the second Meta database; otherwise, values that fit in the second Meta database may exceed the backup database's field lengths and cause synchronization to fail. The expansion can follow the field lengths specified in the preset upgrade information, ensuring that the backup database and the second Meta database have consistent field lengths.
[0059] In some embodiments, the data change log includes the Binlog of the second Meta database, and method 100 may further include: disabling the foreign key constraint checks of the backup database. That is, before data synchronization, in addition to expanding the field lengths of the backup database, its foreign key check setting, foreign_key_checks, can be turned off to prevent synchronization failures caused by foreign keys in the metadata.
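The embodiment takes the new field lengths from the preset upgrade information. As an alternative way to see which columns need widening, the sketch below compares column lengths between the backup database and the second Meta database and emits MySQL `ALTER TABLE ... MODIFY` statements. The helper name and the example lengths are hypothetical; note also that `MODIFY` redefines the whole column, so a real statement would have to restate attributes such as `NOT NULL`, defaults, and character set, which this sketch omits.

```python
def widen_statements(backup_cols, second_cols):
    """Return MySQL statements that widen VARCHAR columns in the backup DB.

    Both arguments map (table, column) -> VARCHAR length (hypothetical input).
    Columns that exist only in the second Meta database are ignored, because
    the backup database keeps the pre-upgrade schema.
    """
    stmts = []
    for (table, column), backup_len in sorted(backup_cols.items()):
        new_len = second_cols.get((table, column))
        if new_len is not None and new_len > backup_len:
            # Sketch only: a real MODIFY must also restate NOT NULL, defaults, etc.
            stmts.append(f"ALTER TABLE `{table}` MODIFY `{column}` VARCHAR({new_len})")
    return stmts
```

With illustrative lengths where `TBLS.TBL_NAME` grows from 128 to 256 and other shared columns are unchanged, the function returns a single statement for `TBL_NAME`.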
[0060] In this embodiment, when implementing real-time data synchronization between the backup database and the second Meta database, the data change log of the second Meta database can be monitored. The data change log records changes to the data in the second Meta database, and it can be a Binlog log or an inspection log, etc. Then, based on the data change log, the changes to the data in the second Meta database are synchronously written to the backup database, thereby achieving data synchronization.
[0061] After the data change log is obtained, changes to non-core tables, such as transaction-related tables, can be filtered out; only changes to metadata related to the first Meta database are retained, converted into SQL, and executed in the backup database.
[0062] It's understandable that before the upgrade, the first Hive MetaStore service had function A, and the first Meta database stored metadata related to A. After the upgrade, the second Hive MetaStore service added function B, meaning it can simultaneously implement functions A and B. The second Meta database stores metadata related to both A and B. The metadata related to the first Meta database can refer to the metadata related to the function before the upgrade, i.e., the metadata related to function A.
[0063] Because the Hive MetaStore service reverts to its pre-upgrade functionality upon rollback, only function A will be available and function B will not. Therefore, the metadata related to the newly added function B does not need to be backed up; it is only necessary to ensure that data modifications related to the original function A have been backed up.
[0064] Additionally, if issues are discovered during the parallel-running, verification, or usage phases, the Hive MetaStore can be rolled back with a data rollback command, which developers can trigger upon discovering a problem. When a data rollback command is detected, the second Hive MetaStore service is stopped first. Then the code of the first Hive MetaStore service is modified to change its connection from the second Meta database to the backup database; that is, the first Hive MetaStore service is disconnected from the second Meta database and linked to the backup database, as indicated by the dotted line in Figure 3. Finally, it can be verified whether the first Hive MetaStore service is functioning correctly.
[0065] The above method synchronizes data changes in the second Meta database to the backup database in real time. Once a rollback is confirmed, it is enough to point the link of the first Hive MetaStore service to the backup database; the first Hive MetaStore service then manages the metadata in the backup database, completing the rollback.
[0066] The following describes the batch processing solution. In some embodiments, method 100 may further include: stopping the second Hive MetaStore service in response to a data rollback instruction; creating a backup database and initializing it from an upgrade snapshot of the first Meta database; expanding the field lengths of the backup database according to the preset upgrade information; exporting the rollback data related to the first Meta database from the second Meta database and importing it into the backup database; and linking the first Hive MetaStore service to the backup database.
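The filtering and SQL conversion described in paragraphs [0061] to [0063] can be sketched as follows. The event shape (`table`, `type`, `values`, and `before` for updates) is a hypothetical simplification of row-format Binlog events, and `events_to_sql`, `backup_schema`, and `key_columns` are illustrative names; the sketch also assumes full row images and single-column primary keys. Events on tables absent from the first Meta database (for example, transaction tables or tables added for function B) are skipped, and columns added by the upgrade are dropped.

```python
def events_to_sql(events, backup_schema, key_columns):
    """Turn row-change events into (sql, params) pairs for the backup database.

    events: dicts with "table", "type" ("insert"/"update"/"delete"), "values",
            and "before" for updates (hypothetical shape).
    backup_schema: table -> columns that exist in the first Meta database.
    key_columns: table -> single primary-key column used in WHERE clauses.
    """
    # Disable foreign key checks first, as described for the backup database.
    statements = [("SET foreign_key_checks = 0", ())]
    for ev in events:
        table = ev["table"]
        if table not in backup_schema:  # non-core or new-feature table: skip
            continue
        row = ev["values"]
        cols = [c for c in backup_schema[table] if c in row]  # drop new columns
        key = key_columns[table]
        if ev["type"] == "insert":
            col_list = ", ".join(f"`{c}`" for c in cols)
            marks = ", ".join(["%s"] * len(cols))
            sql = f"INSERT INTO `{table}` ({col_list}) VALUES ({marks})"
            params = tuple(row[c] for c in cols)
        elif ev["type"] == "update":
            assignments = ", ".join(f"`{c}` = %s" for c in cols)
            sql = f"UPDATE `{table}` SET {assignments} WHERE `{key}` = %s"
            params = tuple(row[c] for c in cols) + (ev["before"][key],)
        elif ev["type"] == "delete":
            sql = f"DELETE FROM `{table}` WHERE `{key}` = %s"
            params = (row[key],)
        else:
            raise ValueError(f"unknown event type: {ev['type']}")
        statements.append((sql, params))
    return statements
```

Each `(sql, params)` pair uses the `%s` placeholder style of common Python MySQL drivers and could be passed to a DB-API cursor's `execute()` against the backup database.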
[0067] The methods for stopping the second Hive MetaStore service in response to a data rollback command, expanding the field length of the backup database according to preset upgrade information, and linking the first Hive MetaStore service to the backup database are the same as those in the above embodiments and will not be described again.
[0068] In this embodiment, the backup database can be created after the data rollback instruction is received. During the metadata upgrade process, an upgrade snapshot of the first Meta database is taken before the upgrade. When the backup database is created, it can be initialized from this upgrade snapshot. All data in the backup database except the configuration table is then deleted, leaving an otherwise empty database into which the data of the second Meta database can be backed up.
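Emptying the snapshot-initialized backup database while keeping the configuration table might look like the sketch below. Which table counts as "the configuration table" is not specified above; the default `("VERSION",)` is an assumption based on the Hive metastore table that records the schema version, and `clear_statements` is a hypothetical helper that only generates MySQL statements.

```python
def clear_statements(tables, keep=("VERSION",)):
    """Return MySQL statements that empty every table except the kept ones.

    `keep` defaults to the assumed configuration table; adjust as needed.
    Foreign key checks are disabled so tables can be cleared in any order.
    """
    stmts = ["SET foreign_key_checks = 0"]
    stmts += [f"DELETE FROM `{t}`" for t in tables if t not in keep]
    stmts.append("SET foreign_key_checks = 1")
    return stmts
```

For example, for the tables `DBS`, `TBLS`, `VERSION`, and `SDS`, the result deletes from the three data tables and leaves `VERSION` intact.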
[0069] Next, the rollback data related to the first Meta database in the second Meta database can be exported. It can be understood that before the upgrade, the first Hive MetaStore service had function A, and the first Meta database stored metadata related to A. After the upgrade, the second Hive MetaStore service added function B, meaning it can simultaneously implement functions A and B. The second Meta database stores metadata related to both A and B. Rollback data refers to all metadata in the second Meta database related to the functions before the upgrade, i.e., all metadata related to function A.
[0070] Data rollback export can be performed by executing SQL to export all metadata related to the first Meta database in the second Meta database, and only export the tables and fields that exist in the original first Meta database.
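Restricting the export to "the tables and fields that exist in the original first Meta database" amounts to intersecting the two schemas before building the export queries. The sketch below is a hypothetical helper that generates MySQL `SELECT` statements from two schema descriptions; the table and column names in the usage note are illustrative.

```python
def export_queries(first_schema, second_schema):
    """Build SELECTs that export only tables/columns present in the first Meta DB.

    Both schemas map table -> ordered list of column names (hypothetical input).
    Tables and columns added by the upgrade are excluded.
    """
    queries = {}
    for table, old_cols in first_schema.items():
        new_cols = set(second_schema.get(table, ()))
        cols = [c for c in old_cols if c in new_cols]
        if cols:
            col_list = ", ".join(f"`{c}`" for c in cols)
            queries[table] = f"SELECT {col_list} FROM `{table}`"
    return queries
```

For example, if the upgrade added a `CTLG_NAME` column to `DBS` and a new `CTLGS` table, neither appears in the generated queries, so the exported rows fit the pre-upgrade schema of the backup database.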
[0071] The rollback data is then imported into the standby database to back up the data. In this embodiment, when a problem is discovered, i.e., after it is determined that a rollback is necessary, a standby database is created, and the data is fully synchronized to the standby database in a batch process.
[0072] Understandably, both the real-time and batch processing solutions back up metadata, so that when problems that cannot be resolved in the short term are discovered, the current metadata can be promptly downgraded to the old database. This adds a fallback mechanism to the entire data migration process. Furthermore, data modifications made after the upgrade are synchronized to the backup database, so no data is lost on rollback and users do not need to reproduce them manually.
[0073] In some embodiments, both the real-time and batch processing solutions can have the following prerequisites before data synchronization: using MySQL as the metadata storage medium, enabling the database's Binlog, adjusting the Binlog format to row format, and preparing an upgrade script for the backup database. The upgrade script content can be SQL statements extracted from preset upgrade information to expand the field lengths of the first Meta database. Therefore, expanding the field lengths of the backup database according to the preset upgrade information in the above steps can include expanding the field lengths of the backup database according to the upgrade script for the backup database.
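Before synchronization starts, the Binlog prerequisites can be checked against the server's variables. This minimal sketch assumes the variables were already fetched, for example with `SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format')`, and converted into a dictionary; `check_prerequisites` is a hypothetical helper.

```python
def check_prerequisites(variables):
    """Return a list of problems found in MySQL server variables.

    `variables` maps variable name -> value, e.g. built from SHOW VARIABLES
    (the fetching step is assumed and not shown here).
    """
    problems = []
    if str(variables.get("log_bin", "OFF")).upper() != "ON":
        problems.append("binary logging (log_bin) is not enabled")
    if str(variables.get("binlog_format", "")).upper() != "ROW":
        problems.append("binlog_format must be ROW")
    return problems
```

An empty list means both conditions hold; any returned messages indicate what to fix before the real-time solution is started.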
[0074] In summary, the method 100 provided in this embodiment has a complete Hive upgrade mechanism and multiple rollback mechanisms. It achieves smooth Hive upgrade functionality, greatly improves the Hive development experience, and reduces the risks associated with upgrades.
[0075] In some embodiments, the storage medium used by the first Meta database to store metadata includes Derby, Mssql, MySQL, Oracle, or Postgres.
[0076] It's understandable that, besides MySQL's storage medium, metadata can also be stored in the database using Derby, MSSQL, Oracle, or Postgres, depending on the specific needs. All of these storage media can enable database management. Furthermore, it's understood that the first Meta database, the second Meta database, and the backup database use the same data storage media.
[0077] Figure 4 is a schematic diagram of the data processing apparatus provided in the embodiments of this application. Referring to Figure 4, this application provides a data processing apparatus 30, which includes the following units.
[0078] Service stop unit 31 is used to stop the first Hive MetaStore service in response to a version upgrade command. The first Hive MetaStore service is linked to the first Meta database for managing metadata in the first Meta database.
[0079] Upgrade unit 32 is used to upgrade the first Meta database to the second Meta database according to preset upgrade information.
[0080] Link unit 33 is used to link the first Hive MetaStore service to the second Meta database, so that the first Hive MetaStore service can manage the metadata in the second Meta database.
[0081] Service activation unit 34 is used to activate the first Hive MetaStore service and the second Hive MetaStore service. The second Hive MetaStore service is used to manage metadata in the second Meta database.
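The sketch below gives a rough illustration of how units 31 to 34 cooperate, modelling databases and services as plain Python objects. `MetaDatabase`, `MetaStoreService` and `upgrade_and_relink` are hypothetical names. The "upgrade" is reduced to copying the tables and stamping a new schema version. Following claim 1, both services end up linked to the second Meta database.

```python
from dataclasses import dataclass, field


@dataclass
class MetaDatabase:
    """Hypothetical stand-in for a Meta database: a name, a schema version, tables."""
    name: str
    schema_version: str
    tables: dict = field(default_factory=dict)


@dataclass
class MetaStoreService:
    """Hypothetical stand-in for one Hive MetaStore service instance."""
    hive_version: str
    database: MetaDatabase
    running: bool = True


def upgrade_and_relink(first_svc: MetaStoreService, second_svc: MetaStoreService,
                       first_db: MetaDatabase, target_version: str) -> MetaDatabase:
    # Service stop unit 31: stop the old-version service before touching its database.
    first_svc.running = False
    # Upgrade unit 32: produce the second Meta database, modelled here as a copy
    # of the tables stamped with the new schema version.
    second_db = MetaDatabase(f"{first_db.name}_v{target_version}", target_version,
                             dict(first_db.tables))
    # Link unit 33: point the old-version service at the upgraded database.
    first_svc.database = second_db
    # Service activation unit 34: both services now share one Meta database.
    second_svc.database = second_db
    first_svc.running = True
    second_svc.running = True
    return second_db
```

Because both services read and write the same database afterwards, a change made through either one is immediately visible to the other. This is the synchronization property the method aims for.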
[0082] In some embodiments, before the first Hive MetaStore service and the second Hive MetaStore service are enabled, the service activation unit 34 is further configured to: disable the version verification function of the first Hive MetaStore service; and modify the enumeration name of the first Hive MetaStore service so that it is consistent with the name of the second Hive MetaStore service.
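In stock Hive, the metastore's schema-version check is controlled by the `hive.metastore.schema.verification` property in `hive-site.xml`. Assuming that this is the "version verification function" referred to above, the sketch below switches the check off for the old-version service. This is an assumption: the text does not name the setting. The enumeration-name change is not modelled, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

PROP = "hive.metastore.schema.verification"


def disable_schema_verification(hive_site_xml: str) -> str:
    """Return hive-site.xml content with hive.metastore.schema.verification=false."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == PROP:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = "false"
            break
    else:
        # Property absent: append it so the old-version MetaStore skips the check.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PROP
        ET.SubElement(prop, "value").text = "false"
    return ET.tostring(root, encoding="unicode")
```

Without this change, the old-version service would find a newer schema version recorded in the upgraded database and could refuse to start.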
[0083] In some embodiments, when upgrading the first Meta database to the second Meta database according to preset upgrade information, the upgrade unit 32 is further configured to: create an intermediate Meta database; migrate data from the first Meta database to the intermediate Meta database; upgrade the intermediate Meta database according to the preset upgrade information, and use the upgraded intermediate Meta database as the second Meta database; or, upgrade the first Meta database according to the preset upgrade information, and use the upgraded first Meta database as the second Meta database.
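The two alternatives above can be contrasted with a small sketch. Here a Meta database is a dict of table name to rows, and the preset upgrade information is a list of functions applied in order. Both representations, and the names `upgrade_meta_db` and `add_owner_type`, are hypothetical.

```python
import copy
from typing import Callable

# Hypothetical model: table name -> list of row dicts.
MetaDB = dict[str, list[dict]]
UpgradeStep = Callable[[MetaDB], None]


def upgrade_meta_db(first_db: MetaDB, steps: list[UpgradeStep],
                    use_intermediate: bool) -> MetaDB:
    if use_intermediate:
        # Option 1: migrate the data into an intermediate database and upgrade
        # that copy; the first Meta database itself is left untouched.
        target = copy.deepcopy(first_db)
    else:
        # Option 2: upgrade the first Meta database in place.
        target = first_db
    for step in steps:
        step(target)
    return target  # used as the second Meta database


def add_owner_type(db: MetaDB) -> None:
    """Example upgrade step (hypothetical): add a column with a default value."""
    for row in db.get("TBLS", []):
        row.setdefault("OWNER_TYPE", "USER")
```

The intermediate option costs a full copy, but it leaves the original data unchanged, which a later rollback can rely on. The in-place option avoids the copy, but then needs a separate backup.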
[0084] In some embodiments, before the first Hive MetaStore service and the second Hive MetaStore service are enabled, the service activation unit 34 is further configured to perform two steps. First, it expands the field lengths of the standby database according to the preset upgrade information, where the standby database is either the first Meta database or a database used to back up the first Meta database. Second, it obtains the data change log of the second Meta database in real time and, based on that log, synchronizes the metadata in the second Meta database that relates to the first Meta database to the standby database. The apparatus further includes a rollback unit. After the first Hive MetaStore service and the second Hive MetaStore service are enabled, the rollback unit is configured to stop the second Hive MetaStore service in response to a data rollback instruction and link the first Hive MetaStore service to the standby database.
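The real-time synchronization step can be sketched as replaying decoded row-format change events onto the standby database. The sketch keeps only tables and columns that exist in the first Meta database's schema. The event structure (`RowEvent`), the dict-based standby model, and this reading of "metadata related to the first Meta database" are assumptions for illustration. Decoding the actual Binlog is out of scope.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowEvent:
    """Hypothetical decoded row-format change event from the second Meta database."""
    table: str
    kind: str                    # "insert", "update" or "delete"
    pk: int
    row: Optional[dict] = None   # after-image; None for deletes


def apply_to_standby(events: list[RowEvent], standby: dict,
                     old_schema: dict[str, set[str]]) -> dict:
    """Replay events onto the standby database, keeping only old-schema data.

    standby:    table -> {pk: row}
    old_schema: table -> column names present in the first Meta database
    """
    for ev in events:
        columns = old_schema.get(ev.table)
        if columns is None:
            continue  # table exists only in the new schema; not replicated
        rows = standby.setdefault(ev.table, {})
        if ev.kind == "delete":
            rows.pop(ev.pk, None)
        else:
            # Insert or update: drop columns the old schema does not have.
            rows[ev.pk] = {k: v for k, v in ev.row.items() if k in columns}
    return standby
```

Since the standby database is kept current continuously, a rollback only has to stop one service and relink the other, with no bulk export at rollback time.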
[0085] In some embodiments, the data change log includes the Binlog of the second Meta database, and the apparatus further includes a pre-processing unit for disabling the foreign key constraints of the standby database.
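In MySQL, foreign key enforcement can be switched off for a session with the `foreign_key_checks` system variable. The text does not state why it is needed. A plausible motivation is that filtered replay may write a row before the row it references exists on the standby database. The hypothetical helper below wraps a batch of replay statements accordingly. All statements must run on the same connection, because the setting is session-scoped.

```python
def wrap_without_fk_checks(statements: list[str]) -> list[str]:
    """Surround replay statements with MySQL's session-level foreign key switch.

    The caller must execute the returned list, in order, on a single
    connection to the standby database.
    """
    return ["SET FOREIGN_KEY_CHECKS = 0;", *statements, "SET FOREIGN_KEY_CHECKS = 1;"]
```

Re-enabling the checks at the end does not re-validate rows written while they were off. Any consistency check of the standby therefore has to be done separately.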
[0086] In some embodiments, the apparatus further includes: a rollback unit, configured to stop the second Hive MetaStore service in response to a data rollback command; create a standby database and initialize the standby database based on an upgrade snapshot of the first Meta database; expand the field length of the standby database according to preset upgrade information; export rollback data related to the first Meta database from the second Meta database and import the rollback data into the standby database; and link the first Hive MetaStore service to the standby database.
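The batch rollback path can be sketched with the same dict-based model as the real-time path. For simplicity, this sketch re-exports every old-schema table in full. A production tool would more likely export only rows changed since the upgrade snapshot. `Service`, `batch_rollback` and the `expand_fields` hook are hypothetical names; the hook stands in for running the field-length expansion script.

```python
import copy
from dataclasses import dataclass
from typing import Callable


@dataclass
class Service:
    """Hypothetical Hive MetaStore service handle."""
    running: bool
    database: dict


def batch_rollback(first_svc: Service, second_svc: Service, snapshot: dict,
                   second_db: dict, old_schema: dict[str, set[str]],
                   expand_fields: Callable[[dict], None]) -> dict:
    # Stop the second (new-version) Hive MetaStore service.
    second_svc.running = False
    # Create the standby database and initialise it from the upgrade snapshot.
    standby = copy.deepcopy(snapshot)
    # Expand field lengths so values written after the upgrade still fit.
    expand_fields(standby)
    # Export data related to the first Meta database from the second one and
    # import it, projecting each row onto the old schema's columns.
    for table, columns in old_schema.items():
        exported = second_db.get(table, {})
        standby[table] = {pk: {c: v for c, v in row.items() if c in columns}
                          for pk, row in exported.items()}
    # Link the first (old-version) Hive MetaStore service to the standby database.
    first_svc.database = standby
    return standby
```

Compared with the real-time path, this variant does no work until a rollback is requested. The trade-off is that the export happens during the outage.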
[0087] In some embodiments, the storage medium used by the first Meta database to store metadata includes Derby, Mssql, MySQL, Oracle, or Postgres.
[0088] In some embodiments, this application also provides an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the methods implemented in any of the above embodiments by executing the executable instructions.
[0089] Figure 5 is a schematic structural diagram of an electronic device provided in an embodiment of this application. As shown in Figure 5, the electronic device 40 may include: a communication interface 401, a memory 402, a processor 403, and a communication bus 404. The communication interface 401, memory 402, and processor 403 communicate with each other via the communication bus 404. The communication interface 401 is used for data communication between the electronic device 40 and external devices. The memory 402 can be used to store software programs and modules, and the processor 403 runs the software programs and modules stored in the memory 402, such as the software programs for the corresponding operations in the above method embodiments.
[0090] In some embodiments, the processor 403 may invoke software programs and modules stored in the memory 402 to perform the following operations: in response to a version upgrade instruction, stop the first Hive MetaStore service, wherein the first Hive MetaStore service is linked to the first Meta database for managing metadata in the first Meta database; upgrade the first Meta database to a second Meta database according to preset upgrade information; link the first Hive MetaStore service to the second Meta database so that the first Hive MetaStore service can manage metadata in the second Meta database; and enable the first Hive MetaStore service and the second Hive MetaStore service, wherein the second Hive MetaStore service is used to manage metadata in the second Meta database.
[0091] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the methods of any of the above embodiments. For brevity, further details are omitted here.
[0092] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the methods in any of the above embodiments. For brevity, further details are omitted here.
[0093] This application also provides a computer program that includes computer instructions stored in a computer-readable storage medium. The processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the electronic device to perform the corresponding processes in the methods described in this application. For brevity, these details are not elaborated here.
[0094] It should be understood that the processor in the embodiments of this application may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method embodiments can be completed by integrated logic circuits in the processor's hardware or by instructions in software form. The processor described above can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules can be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. The storage medium is located in memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method.
[0095] It is understood that the memory in the embodiments of this application can be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced Synchronous DRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM). It should be noted that the memory used in the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
[0096] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0097] Those skilled in the art will clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, apparatuses, and units described above; details are not repeated here.
[0098] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.
[0099] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0100] In addition, the functional units in the embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0101] If a function is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer or a server) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, ROM, RAM, magnetic disks, or optical disks.
[0102] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A data processing method, characterized in that it comprises: in response to a version upgrade instruction, stopping the first Hive MetaStore service, wherein the first Hive MetaStore service is linked to the first Meta database for managing the metadata in the first Meta database; upgrading the first Meta database to the second Meta database according to preset upgrade information; linking the first Hive MetaStore service to the second Meta database so that the first Hive MetaStore service can manage the metadata in the second Meta database; and enabling the first Hive MetaStore service and the second Hive MetaStore service, wherein the second Hive MetaStore service is used to manage the metadata in the second Meta database; wherein upgrading the first Meta database to the second Meta database according to the preset upgrade information comprises: creating an intermediate Meta database; migrating the data in the first Meta database to the intermediate Meta database; upgrading the intermediate Meta database according to the preset upgrade information, and using the upgraded intermediate Meta database as the second Meta database; or upgrading the first Meta database according to the preset upgrade information, and using the upgraded first Meta database as the second Meta database.
2. The method of claim 1, wherein, before enabling the first Hive MetaStore service and the second Hive MetaStore service, the method further comprises: disabling the version verification function of the first Hive MetaStore service; and modifying the enumeration name of the first Hive MetaStore service to make it consistent with the name of the second Hive MetaStore service.
3. The method according to claim 1, characterized in that, before enabling the first Hive MetaStore service and the second Hive MetaStore service, the method further comprises: expanding the field lengths of the backup database according to the preset upgrade information, wherein the backup database is the first Meta database or a database used to back up the first Meta database; and acquiring the data change log of the second Meta database in real time and, based on the data change log, synchronizing the metadata in the second Meta database that relates to the first Meta database to the backup database; and, after enabling the first Hive MetaStore service and the second Hive MetaStore service, the method further comprises: in response to a data rollback command, stopping the second Hive MetaStore service; and linking the first Hive MetaStore service to the backup database.
4. The method of claim 3, wherein the data change log includes the Binlog of the second Meta database, and the method further comprises: disabling the foreign key constraints of the backup database.
5. The method of claim 1, wherein the method further comprises: in response to a data rollback command, stopping the second Hive MetaStore service; creating a backup database and initializing the backup database based on the upgrade snapshot of the first Meta database; expanding the field lengths of the backup database according to the preset upgrade information; exporting the rollback data related to the first Meta database from the second Meta database, and importing the rollback data into the backup database; and linking the first Hive MetaStore service to the backup database.
6. The method according to claim 1, characterized in that, The first Meta database uses Derby, Mssql, MySQL, Oracle, or Postgres as its storage medium for storing metadata.
7. A data processing apparatus, characterized in that it comprises: a service stop unit, configured to stop the first Hive MetaStore service in response to a version upgrade command, wherein the first Hive MetaStore service is linked to the first Meta database for managing metadata in the first Meta database; an upgrade unit, configured to upgrade the first Meta database to the second Meta database according to preset upgrade information; a linking unit, configured to link the first Hive MetaStore service to the second Meta database, so that the first Hive MetaStore service can manage the metadata in the second Meta database; and a service activation unit, configured to activate the first Hive MetaStore service and the second Hive MetaStore service, wherein the second Hive MetaStore service is used to manage metadata in the second Meta database; wherein the upgrade unit is further configured to: create an intermediate Meta database; migrate data from the first Meta database to the intermediate Meta database; upgrade the intermediate Meta database according to the preset upgrade information, and use the upgraded intermediate Meta database as the second Meta database; or upgrade the first Meta database according to the preset upgrade information, and use the upgraded first Meta database as the second Meta database.
8. An electronic device, comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any one of claims 1-6 by executing the executable instructions.
9. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by a processor, implements the method according to any one of claims 1-6.