A method and apparatus for managing hive metadata

Archiving and restoring partition metadata in the Hive Metastore addresses excessive memory consumption at Impala engine startup and metadata caches that are too large or invalidated too often, thus improving Impala's running efficiency.

CN117009293B · Active · Publication Date: 2026-06-19 · TUYOO GAMES +1

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: TUYOO GAMES
Filing Date: 2023-08-08
Publication Date: 2026-06-19

Application Information

Patent Timeline

08 Aug 2023: Application
19 Jun 2026: Publication (CN117009293B)

IPC: G06F16/11; G06F16/178; G06F16/16; G06F16/172; G06F16/242; G06F11/14; G06F11/1446

AI Tagging

Application Domain

File system administration; File/folder operations


AI Technical Summary

Technical Problem

The Impala engine suffers from excessive memory consumption due to loading a large amount of metadata at startup, and performance degradation caused by excessively large or frequently invalidated metadata caches.

Method used

This patent provides a method for managing Hive metadata. It obtains target partition information from the Hive Metastore, archives it to a backup storage space, loads the remaining metadata in the Impala engine, and restores the archived information from the backup storage space when the required metadata is found to be missing from the cache.

Benefits of technology

It reduces the memory pressure on Impala and Hive Metastore, improves Impala's execution efficiency, and avoids problems such as excessively large or frequently invalidated metadata caches.

✦ Generated by Eureka AI based on patent content.


Figure CN117009293B_ABST


Abstract

The application discloses a Hive metadata management method and apparatus, a computing device, and a computer-readable storage medium. The method operates directly on metadata in the Hive Metastore and first archives part of it; during synchronization, Impala then only needs to synchronize the remaining metadata in the Hive Metastore, which reduces memory pressure and improves Impala's execution efficiency without deleting the original data. Correspondingly, when the Impala engine needs the archived metadata, that metadata is restored from the backup storage space to the Hive Metastore, and Impala's metadata is then refreshed. The embodiments of the application avoid operating on the original data, efficiently archive part of the metadata, and substantially improve Impala's operation.


Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a Hive metadata management method, apparatus, computing device, and computer-readable storage medium.

Background Technology

[0002] Impala is a query engine capable of querying petabyte-scale data, enabling efficient queries in business systems. Impala interacts with the Hive Metastore to obtain database metadata and holds it in memory. For example, Impala loads all metadata into memory at startup; if the metadata is too large, it can consume significant memory and may even prevent Impala from starting. Furthermore, Impala's performance degrades when processing queries over a large number of small files.

[0003] In the prior art, Impala has a built-in on-demand metadata access mode, which alleviates the caching problem of the Impala execution module but cannot solve the problem of excessively large metadata caches. Impala also has an automatic metadata invalidation function, which invalidates metadata that is used infrequently or is temporarily unneeded; however, frequent invalidation can degrade performance. A metadata management method for the Impala engine is therefore urgently needed to solve the above problems.

Summary of the Invention

[0004] In view of this, embodiments of this application provide a Hive metadata management method, apparatus, computing device, and computer-readable storage medium to address the technical deficiencies existing in the prior art.

[0005] According to a first aspect of the embodiments of this application, a Hive metadata archiving method is provided, including:

[0006] Retrieve target partition information from Hive Metastore;

[0007] Archive the target partition information to the spare storage space;

[0008] Remove the target partition information from the Hive Metastore.

[0009] According to a second aspect of the embodiments of this application, a method for loading metadata in Impala is provided, comprising:

[0010] Within the Impala engine, load the remaining metadata in the Hive Metastore after it has been archived according to the Hive metadata archiving method.

[0011] According to a third aspect of the embodiments of this application, a method for recovering metadata in Impala is provided, comprising:

[0012] Check the Impala metadata cache; if the target metadata is not present, retrieve the partition information to be restored from the backup storage space;

[0013] The restored partition information is copied to the Hive Metastore, and the Impala metadata cache is refreshed.

[0014] According to a fourth aspect of the embodiments of this application, a Hive metadata archiving apparatus is provided, comprising:

[0015] The information acquisition module is used to obtain target partition information from the Hive Metastore;

[0016] The archiving module is used to archive the target partition information to the spare storage space;

[0017] The cleanup module is used to delete the target partition information from the Hive Metastore.

[0018] According to a fifth aspect of the embodiments of this application, an apparatus for loading metadata in Impala is provided, comprising:

[0019] The loading module is used within the Impala engine to load the remaining metadata in the Hive Metastore after it has been archived according to the Hive metadata archiving method.

[0020] According to a sixth aspect of the embodiments of this application, an apparatus for recovering metadata in Impala is provided, comprising:

[0021] The detection module is used to detect whether the target metadata exists in the Impala metadata cache;

[0022] The information acquisition module is used to obtain the partition information to be restored from the backup storage space when the target metadata is not present in the Impala metadata cache.

[0023] The recovery module is used to copy the recovered partition information to the Hive Metastore and refresh the Impala metadata cache.

[0024] According to a seventh aspect of the present application, a computer-readable storage medium is provided that stores computer instructions which, when executed by a processor, implement the steps of the aforementioned method.

[0025] According to an eighth aspect of the present application, a computing device is provided, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor executes the instructions to implement the steps of the method.

[0026] In this embodiment, based on Impala's mechanism for synchronizing metadata with the Hive Metastore, operations are performed directly on the metadata in the Hive Metastore before synchronization, quickly archiving some of the metadata. During synchronization, Impala only needs to synchronize the remaining metadata in the Hive Metastore, thus reducing the memory pressure on Impala and the Hive Metastore without deleting the original data, and improving Impala's execution efficiency. Correspondingly, when the Impala engine needs the relevant metadata, this metadata is restored from the backup storage space to the Hive Metastore, and then Impala's metadata is refreshed to complete the synchronization.

Attached Figure Description

[0027] Figure 1 is a structural block diagram of a computing device provided in an embodiment of this application;

[0028] Figure 2 is a diagram illustrating the metadata loading process in the Impala engine;

[0029] Figure 3 is a flowchart of a Hive metadata archiving method provided in an embodiment of this application;

[0030] Figure 4 is a schematic structural diagram of a spare storage space provided in an embodiment of this application;

[0031] Figure 5 is a flowchart of a method for recovering metadata in Impala provided in an embodiment of this application.

Detailed Implementation

[0032] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to specific embodiments and the accompanying drawings. It should be understood that these descriptions are merely exemplary and not intended to limit the scope of the invention. Furthermore, descriptions of well-known structures and techniques are omitted in the following description to avoid unnecessarily obscuring the concept of the invention.

[0033] Many specific details are set forth in the following description to provide a full understanding of this application. However, this application can be implemented in many other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of this application; therefore, this application is not limited to the specific embodiments disclosed below.

[0034] The terminology used in one or more embodiments of this application is for the purpose of describing particular embodiments only and is not intended to limit the scope of one or more embodiments of this application. The singular forms “a,” “an,” and “the” used in one or more embodiments of this application and in the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and/or” used in one or more embodiments of this application refers to and includes any or all possible combinations of one or more associated listed items.

[0035] It should be understood that although the terms first, second, etc., may be used to describe various information in one or more embodiments of this application, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, first may also be referred to as second without departing from the scope of one or more embodiments of this application, and similarly, second may also be referred to as first. Depending on the context, the word "if" as used herein may be interpreted as "in response to a determination".

[0036] Figure 1 shows a structural block diagram of a computing device 100 according to an embodiment of this application. The components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120. The processor 120 is connected to the memory 110 via a bus 130, and a database 150 is used to store data.

[0037] The computing device 100 also includes an access device 140, which enables the computing device 100 to communicate via one or more networks 160. Examples of these networks include a Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 140 may include one or more of any type of wired or wireless network interface (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Wi-MAX interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so on.

[0038] In one embodiment of this application, the aforementioned components of the computing device 100 and other components not shown in Figure 1 may also be connected to each other, for example, via a bus. It should be understood that the block diagram of the computing device shown in Figure 1 is for illustrative purposes only and is not intended to limit the scope of this application. Those skilled in the art can add or replace other components as needed.

[0039] The computing device 100 can be any type of stationary or mobile computing device, including mobile computers or mobile computing devices (e.g., tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile phones (e.g., smartphones), wearable computing devices (e.g., smartwatches, smart glasses, etc.) or other types of mobile devices, or stationary computing devices such as desktop computers or PCs. The computing device 100 can also be a mobile or stationary server.

[0040] In the prior art, the Impala engine is a query engine capable of querying petabyte-scale data, enabling efficient querying in business systems. The operating mode of the Impala engine is shown in Figure 2.

[0041] Since Impala is a stateless system, its metadata is obtained from external systems. In this embodiment, Impala relies on Hive and uses the Hive Metastore to store its metadata, which includes information about the databases, tables, columns, and data types of the original cluster data. When Impala starts, its Catalog Server (hereinafter CatalogD) requests metadata from the Hive Metastore and caches the returned metadata in memory. CatalogD then broadcasts the cached metadata through the Statestore Server (hereinafter StatestoreD) to the Impala execution module ImpalaD. When executing SQL tasks, ImpalaD only needs to use the cached metadata and does not need to interact with the external Hive database again, which improves task execution speed.
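The startup flow just described (CatalogD loads everything, StatestoreD fans it out, each ImpalaD answers from its local copy) can be illustrated with a toy Python model. This is a sketch under stated assumptions, not Impala code: the classes `Metastore`, `Catalog`, `Statestore`, and `Executor` are hypothetical stand-ins, and real Impala daemons are separate processes that exchange metadata over RPC.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Toy model only: all names are hypothetical, and real Impala daemons are
# separate processes that exchange metadata over RPC, not Python objects.


@dataclass
class Metastore:
    # (database, table) -> partition names stored in the metastore
    tables: dict[tuple[str, str], list[str]]


@dataclass
class Executor:
    """Stands in for an ImpalaD: plans queries from its local cache only."""
    name: str
    cache: dict[tuple[str, str], list[str]] = field(default_factory=dict)

    def on_update(self, snapshot: dict[tuple[str, str], list[str]]) -> None:
        # Replace the local cache with the snapshot pushed by the broadcaster.
        self.cache = {key: list(parts) for key, parts in snapshot.items()}

    def partitions_of(self, db: str, table: str) -> list[str]:
        # No metastore round trip: answer from the cached copy.
        return self.cache.get((db, table), [])


@dataclass
class Catalog:
    """Stands in for CatalogD: loads metadata from the metastore."""
    metastore: Metastore
    cache: dict[tuple[str, str], list[str]] = field(default_factory=dict)

    def load_all(self) -> dict[tuple[str, str], list[str]]:
        # Everything is copied into memory at startup, so memory use grows
        # with the total number of tables and partitions in the metastore.
        self.cache = {key: list(parts) for key, parts in self.metastore.tables.items()}
        return self.cache


class Statestore:
    """Stands in for StatestoreD: fans catalog snapshots out to executors."""

    def __init__(self) -> None:
        self.subscribers: list[Executor] = []

    def subscribe(self, executor: Executor) -> None:
        self.subscribers.append(executor)

    def broadcast(self, snapshot: dict[tuple[str, str], list[str]]) -> None:
        for executor in self.subscribers:
            executor.on_update(snapshot)


def start_cluster(metastore: Metastore, executor_names: list[str]) -> tuple[Catalog, list[Executor]]:
    catalog = Catalog(metastore)
    statestore = Statestore()
    executors = [Executor(name) for name in executor_names]
    for executor in executors:
        statestore.subscribe(executor)
    statestore.broadcast(catalog.load_all())
    return catalog, executors
```

For example, `start_cluster(Metastore({("game", "events"): ["dt=2023-08-01"]}), ["impalad-1"])` yields an executor whose `partitions_of("game", "events")` returns `["dt=2023-08-01"]`. In this model every metastore partition is copied into the catalog and into each executor, which is why removing rarely used partitions from the metastore shrinks every cache at once.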

[0042] However, because Impala loads all metadata into memory at startup, a large number of tables and partitions can consume significant amounts of memory, potentially causing Impala to fail to start. Additionally, Impala's performance degrades when processing queries containing a large number of small files.

[0043] To address the aforementioned issues, Impala incorporates a method for on-demand metadata retrieval, alleviating caching problems in the Impala execution module. However, this doesn't solve the problem of excessively large metadata caches in CatalogD. Impala also features automatic expiration of Catalog metadata, automatically invalidating infrequently used or temporarily unnecessary data. However, frequent expiration of metadata for large tables can negatively impact performance. To reduce the size of Impala metadata storage and improve table metadata loading speed, common solutions include table partitioning and data cleanup, inevitably requiring the movement of original files. Since these original files are extremely complex, direct manipulation is very difficult; furthermore, modifying the original files requires additional backups, consuming significant space.

[0044] Therefore, this application provides a Hive metadata management method that can directly archive Hive Metastore metadata, thereby reducing the memory pressure of Impala and Hive Metastore loading metadata without deleting tables or original data.

[0045] For ease of explanation, this application first illustrates the database table structure in Hive Metastore. It should be understood that the Hive Metastore database table structure described in this solution is provided solely for the purpose of illustrating the implementation details, and the solution itself is not limited by the Hive Metastore table structure. The topological relationship of the Hive Metastore database table structure can be described as follows:

[0046] The DBS (databases) table, with DB_ID as its primary key, stores metadata about databases. Each row represents a database, and the stored information includes the database name, description, creation time, and database location.

[0047] The TBLS (tables) table, with TBL_ID as its primary key, stores metadata about tables. Each row represents a table, and the stored information includes the table name (TBL_NAME), the database ID (DB_ID), the table type (such as regular table, external table, or view), creation time, modification time, and other information.

[0048] The PARTITIONS table, with PART_ID as its primary key, stores metadata about table partitions. Each row represents a table partition, and the stored information includes the table ID (TBL_ID), partition value, partition name, creation time, and last access time. A table partition is one of the parts into which the data of a single table is divided in big data storage.

[0049] The PARTITION_KEY_VALS table stores metadata about partition key values. Each row represents a partition key value, and the stored information includes the partition ID (PART_ID) and the partition key value.

[0050] The PARTITION_PARAMS table stores metadata about partition parameters. Each row represents a partition parameter, and the stored information includes the partition ID (PART_ID), the parameter key, and the parameter value.
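To make these relationships concrete, the following Python sketch builds a heavily simplified, in-memory SQLite version of the five tables and joins them along the DB_ID, TBL_ID, PART_ID chain. The column subset, the foreign keys, and the sample rows (database `game`, table `events`) are illustrative assumptions; a production Hive Metastore usually runs on MySQL or PostgreSQL and contains many more tables and columns.

```python
import sqlite3

# Heavily simplified, hypothetical subset of the Hive Metastore schema:
# only the tables and columns named in the text, plus foreign keys that
# express the DBS -> TBLS -> PARTITIONS hierarchy.
SCHEMA = """
CREATE TABLE DBS (
    DB_ID INTEGER PRIMARY KEY,
    NAME TEXT NOT NULL,
    "DESC" TEXT,
    DB_LOCATION_URI TEXT,
    CREATE_TIME INTEGER
);
CREATE TABLE TBLS (
    TBL_ID INTEGER PRIMARY KEY,
    DB_ID INTEGER NOT NULL REFERENCES DBS (DB_ID),
    TBL_NAME TEXT NOT NULL,
    TBL_TYPE TEXT,
    CREATE_TIME INTEGER
);
CREATE TABLE PARTITIONS (
    PART_ID INTEGER PRIMARY KEY,
    TBL_ID INTEGER NOT NULL REFERENCES TBLS (TBL_ID),
    PART_NAME TEXT NOT NULL,
    CREATE_TIME INTEGER,
    LAST_ACCESS_TIME INTEGER,
    UNIQUE (TBL_ID, PART_NAME)  -- partition names are unique per table
);
CREATE TABLE PARTITION_KEY_VALS (
    PART_ID INTEGER NOT NULL REFERENCES PARTITIONS (PART_ID),
    PART_KEY_VAL TEXT,
    INTEGER_IDX INTEGER NOT NULL,  -- position of the key in the partition spec
    PRIMARY KEY (PART_ID, INTEGER_IDX)
);
CREATE TABLE PARTITION_PARAMS (
    PART_ID INTEGER NOT NULL REFERENCES PARTITIONS (PART_ID),
    PARAM_KEY TEXT NOT NULL,
    PARAM_VALUE TEXT,
    PRIMARY KEY (PART_ID, PARAM_KEY)
);
"""

# Hypothetical sample data: one database, one table, two daily partitions.
SAMPLE_ROWS = """
INSERT INTO DBS VALUES (1, 'game', 'demo database', '/warehouse/game.db', 0);
INSERT INTO TBLS VALUES (10, 1, 'events', 'MANAGED_TABLE', 0);
INSERT INTO PARTITIONS VALUES (100, 10, 'dt=2023-08-01', 0, 0), (101, 10, 'dt=2023-08-02', 0, 0);
INSERT INTO PARTITION_KEY_VALS VALUES (100, '2023-08-01', 0), (101, '2023-08-02', 0);
INSERT INTO PARTITION_PARAMS VALUES (100, 'numRows', '42'), (101, 'numRows', '7');
"""


def create_metastore() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA + SAMPLE_ROWS)
    return conn


def list_partitions(conn: sqlite3.Connection) -> list[tuple]:
    # Walk the hierarchy: partition -> table -> database, plus the key value.
    return conn.execute(
        """
        SELECT d.NAME, t.TBL_NAME, p.PART_NAME, k.PART_KEY_VAL
        FROM PARTITIONS p
        JOIN TBLS t ON t.TBL_ID = p.TBL_ID
        JOIN DBS d ON d.DB_ID = t.DB_ID
        JOIN PARTITION_KEY_VALS k ON k.PART_ID = p.PART_ID
        ORDER BY p.PART_ID, k.INTEGER_IDX
        """
    ).fetchall()
```

`list_partitions(create_metastore())` returns one row per partition key value, for example `("game", "events", "dt=2023-08-01", "2023-08-01")`. Because PARTITION_KEY_VALS and PARTITION_PARAMS hang off PART_ID alone, a set of PART_IDs is enough to address every partition-level row.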

[0051] This application provides a Hive metadata archiving method, including steps 301 to 303, as shown in Figure 3:

[0052] Step 301. Obtain the target partition information from the Hive Metastore;

[0053] Connect to the Hive Metastore database and retrieve all partition information that needs to be archived.

[0054] In one feasible implementation, archiving is required when the metadata is too complex, resulting in an excessively large metadata cache and high memory consumption. In this case, the archiving process should be carried out on partitions that are used less frequently or will not be accessed for the time being.

[0055] Specifically, this operation can be achieved using SQL statements. Example statements for retrieving partition information are as follows:

[0056] 1. select DB_ID from hivemetadata.dbs where NAME='xxx'

[0057] 2. select TBL_ID from hivemetadata.tbls where DB_ID=xx and TBL_NAME='xxx'

[0058] 3. select PART_ID from hivemetadata.partitions where TBL_ID=xxx and PART_NAME in ('xxx')

[0059] In the Hive Metastore database structure described above, the primary key of the partition table is PART_ID, the partition ID, so this value is needed to obtain partition information. Because partitions divide the data of a single table, a partition name is only unique within its table; in addition to the partition name, the table identifier TBL_ID is therefore needed, and obtaining TBL_ID in turn depends on the database identifier DB_ID. The SQL statements therefore need to be executed in sequence.

[0060] The first statement retrieves the database identifier DB_ID of the target partition. The second statement retrieves the TBL_ID using the database identifier DB_ID and the name of the target table. The third statement retrieves the target partition identifier PART_ID using the TBL_ID and the partition name, and stores the retrieved target partition identifier in the first set PART_ID_DS.
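The three sequential statements can be wrapped in one helper. The sketch below uses Python's `sqlite3` module against a toy schema; the function `resolve_part_ids` and its `partitions_table` parameter are hypothetical, and parameter binding (`?`) replaces the `'xxx'` placeholders. The `partitions_table` parameter is there so the same lookup could be pointed at the backup copy during recovery. Table names are interpolated directly into the SQL, so they must come from trusted code, never from user input.

```python
import sqlite3


def resolve_part_ids(conn, db_name, tbl_name, part_names, partitions_table="PARTITIONS"):
    """Return the PART_IDs (the first set, PART_ID_DS) of the named partitions,
    following the DB_ID -> TBL_ID -> PART_ID chain one query at a time."""
    # Statement 1: database name -> DB_ID.
    row = conn.execute("SELECT DB_ID FROM DBS WHERE NAME = ?", (db_name,)).fetchone()
    if row is None:
        raise LookupError(f"database not found: {db_name}")
    db_id = row[0]
    # Statement 2: (DB_ID, table name) -> TBL_ID.
    row = conn.execute(
        "SELECT TBL_ID FROM TBLS WHERE DB_ID = ? AND TBL_NAME = ?", (db_id, tbl_name)
    ).fetchone()
    if row is None:
        raise LookupError(f"table not found: {db_name}.{tbl_name}")
    tbl_id = row[0]
    if not part_names:
        return set()
    # Statement 3: (TBL_ID, partition names) -> PART_IDs; unknown names are skipped.
    marks = ", ".join("?" for _ in part_names)
    rows = conn.execute(
        f"SELECT PART_ID FROM {partitions_table} "
        f"WHERE TBL_ID = ? AND PART_NAME IN ({marks})",
        (tbl_id, *part_names),
    ).fetchall()
    return {r[0] for r in rows}


def create_demo_metastore() -> sqlite3.Connection:
    # Minimal hypothetical schema and data, just enough for the lookup chain.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
        CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, PART_NAME TEXT);
        INSERT INTO DBS VALUES (1, 'game');
        INSERT INTO TBLS VALUES (10, 1, 'events');
        INSERT INTO PARTITIONS VALUES (100, 10, 'dt=2023-08-01'), (101, 10, 'dt=2023-08-02');
        """
    )
    return conn
```

With the demo data, `resolve_part_ids(conn, "game", "events", ["dt=2023-08-01"])` returns `{100}`. Raising `LookupError` for an unknown database or table, instead of returning an empty set, keeps a typo in the archive request from silently archiving nothing.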

[0061] It should be understood that the specific method for obtaining partition information described above is only one example of an implementable method, and the method described in this invention is not limited to the specific steps or computer statements described above.

[0062] In this step, partition information is chosen as the content to be archived because Impala loads the databases and tables into its loading sequence at startup; to ensure that Impala can correctly obtain this information at startup, no archiving is performed at the database or table level.

[0063] Step 302. Archive the target partition information to the spare storage space;

[0064] In this step, a backup storage space, HiveMetastore_Backup, is created, and the target partition information selected in the previous step is archived into the backup storage space.

[0065] In one specific embodiment, the spare storage space has the same partition information topology as the Hive Metastore, so data can be moved quickly and accurately with SQL statements; for example, the partition information topology of the spare storage space also includes the three tables PARTITION_PARAMS, PARTITION_KEY_VALS, and PARTITIONS, as shown in Figure 4.

[0066] Optionally, the spare storage space is stored in the same database as the Hive Metastore.

[0067] In one specific implementation, this operation can be achieved using SQL statements.

[0068] An example of an archiving statement is as follows:

[0069] 1. insert into hivemetadata_backup.partition_key_vals select * from hivemetadata.partition_key_vals where PART_ID in (PART_ID_DS)

[0070] 2. insert into hivemetadata_backup.partition_params select * from hivemetadata.partition_params where PART_ID in (PART_ID_DS)

[0071] 3. insert into hivemetadata_backup.partitions select * from hivemetadata.partitions where PART_ID in (PART_ID_DS)

[0072] In this step, all rows of the original PARTITION_PARAMS, PARTITION_KEY_VALS, and PARTITIONS tables whose PART_ID is in the first set PART_ID_DS are copied to the spare storage space. Because all three tables contain the PART_ID column, the PART_IDs obtained in step 301 are sufficient to locate and archive all corresponding rows.
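Step 302 can be sketched the same way. Here an attached in-memory SQLite database named `hivemetadata_backup` stands in for the spare storage space, and its three tables are created with `CREATE TABLE ... AS SELECT ... WHERE 0`, which copies the columns of the originals but no rows. These are assumptions made for a runnable example; the function names are hypothetical, and because the copies carry no foreign keys, the order of the three copies does not matter here.

```python
import sqlite3

# The three partition tables, in the order of the archiving statements.
PARTITION_TABLES = ("PARTITION_KEY_VALS", "PARTITION_PARAMS", "PARTITIONS")


def create_demo_metastore() -> sqlite3.Connection:
    # Toy stand-in for the live metastore tables (hypothetical data).
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, PART_NAME TEXT);
        CREATE TABLE PARTITION_KEY_VALS (PART_ID INTEGER, PART_KEY_VAL TEXT, INTEGER_IDX INTEGER);
        CREATE TABLE PARTITION_PARAMS (PART_ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT);
        INSERT INTO PARTITIONS VALUES (100, 10, 'dt=2023-08-01'), (101, 10, 'dt=2023-08-02');
        INSERT INTO PARTITION_KEY_VALS VALUES (100, '2023-08-01', 0), (101, '2023-08-02', 0);
        INSERT INTO PARTITION_PARAMS VALUES (100, 'numRows', '42'), (101, 'numRows', '7');
        """
    )
    return conn


def attach_backup(conn: sqlite3.Connection, alias: str = "hivemetadata_backup") -> None:
    # Spare storage space with the same partition-table topology: an attached
    # database whose tables copy the originals' columns (no rows, no constraints).
    conn.execute(f"ATTACH DATABASE ':memory:' AS {alias}")
    for table in PARTITION_TABLES:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {alias}.{table} AS "
            f"SELECT * FROM main.{table} WHERE 0"
        )


def archive_partitions(conn: sqlite3.Connection, part_ids, alias: str = "hivemetadata_backup") -> None:
    """Copy every row whose PART_ID is in part_ids (PART_ID_DS) into the backup."""
    ids = sorted(part_ids)
    if not ids:
        return
    marks = ", ".join("?" for _ in ids)
    with conn:  # one transaction for all three copies
        for table in PARTITION_TABLES:
            conn.execute(
                f"INSERT INTO {alias}.{table} "
                f"SELECT * FROM main.{table} WHERE PART_ID IN ({marks})",
                ids,
            )
```

After `attach_backup(conn)` and `archive_partitions(conn, {100})`, the backup holds partition 100's rows while the live tables still hold both partitions; removing them from the live tables is the separate step that follows.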

[0073] Step 303. Delete the target partition information from the Hive Metastore;

[0074] After the data is archived, it needs to be deleted from the Hive Metastore and the partition needs to be refreshed.

[0075] In one specific implementation, this operation can be achieved using SQL statements.

[0076] An example statement for deleting metadata is as follows:

[0077] 1. delete from hivemetadata.partition_key_vals where PART_ID in (PART_ID_DS)

[0078] 2. delete from hivemetadata.partition_params where PART_ID in (PART_ID_DS)

[0079] 3. delete from hivemetadata.partitions where PART_ID in (PART_ID_DS)

[0080] In the SQL statements above, the corresponding partition information in the original Hive Metastore is deleted based on the target partition identifiers PART_ID in the first set PART_ID_DS, thus reducing the size of the initial metadata.
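The order of the three DELETE statements matters when the metastore enforces foreign keys: child rows in PARTITION_KEY_VALS and PARTITION_PARAMS must be removed before their parent row in PARTITIONS. The following SQLite sketch (hypothetical helper `delete_partitions`, toy schema with foreign keys switched on) deletes in that order inside one transaction, so a failure rolls back all three statements.

```python
import sqlite3

# Children first, parent last: the same order as the three DELETE statements.
DELETE_ORDER = ("PARTITION_KEY_VALS", "PARTITION_PARAMS", "PARTITIONS")


def create_demo_metastore() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(
        """
        CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, PART_NAME TEXT);
        CREATE TABLE PARTITION_KEY_VALS (
            PART_ID INTEGER REFERENCES PARTITIONS (PART_ID), PART_KEY_VAL TEXT, INTEGER_IDX INTEGER);
        CREATE TABLE PARTITION_PARAMS (
            PART_ID INTEGER REFERENCES PARTITIONS (PART_ID), PARAM_KEY TEXT, PARAM_VALUE TEXT);
        INSERT INTO PARTITIONS VALUES (100, 10, 'dt=2023-08-01'), (101, 10, 'dt=2023-08-02');
        INSERT INTO PARTITION_KEY_VALS VALUES (100, '2023-08-01', 0), (101, '2023-08-02', 0);
        INSERT INTO PARTITION_PARAMS VALUES (100, 'numRows', '42'), (101, 'numRows', '7');
        """
    )
    return conn


def delete_partitions(conn: sqlite3.Connection, part_ids) -> None:
    """Remove archived partitions (PART_ID_DS) from the live metastore tables."""
    ids = sorted(part_ids)
    if not ids:
        return
    marks = ", ".join("?" for _ in ids)
    with conn:  # commit all three deletes together, or roll all of them back
        for table in DELETE_ORDER:
            conn.execute(f"DELETE FROM {table} WHERE PART_ID IN ({marks})", ids)
```

Deleting a PARTITIONS row first while its key values still exist raises `sqlite3.IntegrityError` in this toy schema. In a real deployment it may also be worth running the archive copy and these deletes in a single transaction, so that a failure cannot leave a partition deleted but not archived; the source does not specify this.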

[0081] According to another aspect of this application, a method for loading metadata in Impala is provided, comprising:

[0082] Within the Impala engine, the remaining metadata in the Hive Metastore, after being archived according to the Hive metadata management methods described above, is loaded. For example:

[0083] When the Impala engine starts, its Catalog Server requests metadata from the Hive Metastore, obtains the remaining metadata, and caches it in memory.

[0084] Optionally, a specified partition can be refreshed in Impala so that modifications to the Hive Metastore are reflected in the Impala cache. This applies when Impala is not loading the metadata for the first time.
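For this optional refresh, Impala's `REFRESH` statement accepts a `PARTITION (col=value, ...)` clause. The helpers below are a hypothetical sketch that turns a Hive `PART_NAME` such as `dt=2023-08-01/hour=03` into that clause. They assume Hive's path-style escaping, in which special characters inside names and values are percent-encoded, and they emit every value taken from a `PART_NAME` as a string literal, so numeric partition columns may need their values passed as Python integers instead.

```python
from urllib.parse import unquote


def spec_from_part_name(part_name: str) -> dict[str, str]:
    """Split a Hive PART_NAME like 'dt=2023-08-01/hour=03' into an ordered spec.

    Assumes '/', '=' and similar characters inside names or values appear
    percent-encoded (e.g. %2F, %3D), so splitting on them is safe.
    """
    spec = {}
    for component in part_name.split("/"):
        key, _, value = component.partition("=")
        spec[unquote(key)] = unquote(value)
    return spec


def refresh_partition_sql(db: str, table: str, spec: dict) -> str:
    """Build an Impala REFRESH statement for one partition (hypothetical helper)."""
    if not spec:
        return f"REFRESH {db}.{table}"
    clauses = []
    for column, value in spec.items():
        if isinstance(value, int) and not isinstance(value, bool):
            literal = str(value)  # numeric partition value, left unquoted
        else:
            # String literal: escape backslashes and single quotes.
            escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
            literal = f"'{escaped}'"
        clauses.append(f"{column}={literal}")
    return f"REFRESH {db}.{table} PARTITION ({', '.join(clauses)})"
```

For example, `refresh_partition_sql("game", "events", spec_from_part_name("dt=2023-08-01"))` returns `"REFRESH game.events PARTITION (dt='2023-08-01')"`, which a client would then submit to Impala.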

[0085] By directly manipulating the metadata in Hive, the metadata that is not currently in use can be quickly and efficiently archived and deleted from the Hive Metastore. This eliminates the need for the Impala engine to load all the metadata, significantly optimizing its operating efficiency and reducing the memory pressure on the Impala server.

[0086] According to another aspect of this application, a method for recovering metadata in Impala is provided, as shown in Figure 5, including:

[0087] Step 501. Check the Impala metadata cache. If the target metadata does not exist, proceed to step 502.

[0088] Step 502. Obtain the recovery partition information from the spare storage space;

[0089] In this step, the partition information that needs to be restored is obtained from the spare storage space.

[0090] In one specific implementation, this operation can be achieved using SQL statements.

[0091] Example statements for retrieving partition IDs are as follows:

[0092] 1. select DB_ID from hivemetadata.dbs where NAME='xxx'

[0093] 2. select TBL_ID from hivemetadata.tbls where DB_ID=xx and TBL_NAME='xxx'

[0094] 3. select PART_ID from hivemetadata_backup.partitions where TBL_ID=xxx and PART_NAME in ('xxx')

[0095] The first statement retrieves the database identifier DB_ID of the restored partition. The second statement retrieves the TBL_ID using the database identifier DB_ID and the name of the restored table. The third statement retrieves the restored partition identifier PART_ID using the TBL_ID and the partition name, and stores the retrieved restored partition identifier in the second set PART_ID_DS2.

[0096] Step 503. Copy the restored partition information to the Hive Metastore and refresh the Impala metadata cache.

[0097] Copy the partition information for the partition identifiers in the second set PART_ID_DS2 from the spare storage space to the Hive Metastore.

[0098] In one specific implementation, this operation can be achieved using SQL statements, and an example of a recovery statement is as follows:

[0099] 1. insert into hivemetadata.partitions select * from hivemetadata_backup.partitions where PART_ID in (PART_ID_DS2)

[0100] 2. insert into hivemetadata.partition_key_vals select * from hivemetadata_backup.partition_key_vals where PART_ID in (PART_ID_DS2)

[0101] 3. insert into hivemetadata.partition_params select * from hivemetadata_backup.partition_params where PART_ID in (PART_ID_DS2)

[0102] In the SQL statements above, the partition information in the spare storage space is restored to the corresponding partition tables in the Hive Metastore based on the restored partition identifiers PART_ID in the second set PART_ID_DS2. The PARTITIONS rows are restored first, the reverse of the deletion order in step 303, because in the standard Hive Metastore schema PARTITION_KEY_VALS and PARTITION_PARAMS reference PARTITIONS.PART_ID through foreign keys.

[0103] After the restoration, since the changes to the Metastore have not yet been updated in the Impala cache, it is necessary to refresh the specified Impala partition so that the modifications to the Hive Metastore are updated in the Impala cache.
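Steps 501 to 503 can be combined into a single recovery routine. The sketch below is a toy under stated assumptions: the Impala cache is modeled as a Python dict of partition-name sets, the refresh is a caller-supplied callback (for example, one that issues a `REFRESH ... PARTITION` statement), the backup is an attached SQLite database called `hivemetadata_backup`, and rows are copied back parent-first so the live tables' foreign keys are satisfied. All function names are hypothetical.

```python
import sqlite3

# Same column layout for the live tables and their backup copies.
TABLES = {
    "PARTITIONS": "PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, PART_NAME TEXT",
    "PARTITION_KEY_VALS": "PART_ID INTEGER REFERENCES PARTITIONS (PART_ID), PART_KEY_VAL TEXT, INTEGER_IDX INTEGER",
    "PARTITION_PARAMS": "PART_ID INTEGER REFERENCES PARTITIONS (PART_ID), PARAM_KEY TEXT, PARAM_VALUE TEXT",
}
# Parent first, so the REFERENCES clauses are satisfied while restoring.
RESTORE_ORDER = ("PARTITIONS", "PARTITION_KEY_VALS", "PARTITION_PARAMS")
BACKUP = "hivemetadata_backup"  # hypothetical name for the spare storage space


def create_demo_metastore() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(f"ATTACH DATABASE ':memory:' AS {BACKUP}")
    for schema in ("main", BACKUP):
        for name, columns in TABLES.items():
            conn.execute(f"CREATE TABLE {schema}.{name} ({columns})")
    return conn


def recover_partitions(conn, cache, table_key, tbl_id, part_names, refresh) -> set:
    """Steps 501-503 for one table.

    cache: dict mapping (db, table) -> set of partition names Impala has cached.
    refresh: callback invoked as refresh(table_key, part_name) after restoring.
    Returns the PART_IDs that were restored (the second set, PART_ID_DS2).
    """
    # Step 501: only partitions absent from the cache need recovery.
    cached = cache.get(table_key, set())
    missing = [name for name in part_names if name not in cached]
    if not missing:
        return set()
    # Step 502: look the missing partitions up in the backup copy.
    marks = ", ".join("?" for _ in missing)
    found = conn.execute(
        f"SELECT PART_ID, PART_NAME FROM {BACKUP}.PARTITIONS "
        f"WHERE TBL_ID = ? AND PART_NAME IN ({marks})",
        (tbl_id, *missing),
    ).fetchall()
    if not found:
        return set()
    ids = sorted(part_id for part_id, _ in found)
    id_marks = ", ".join("?" for _ in ids)
    # Step 503: copy the rows back in one transaction, then refresh Impala.
    with conn:
        for table in RESTORE_ORDER:
            conn.execute(
                f"INSERT INTO main.{table} SELECT * FROM {BACKUP}.{table} "
                f"WHERE PART_ID IN ({id_marks})",
                ids,
            )
    for _, part_name in sorted(found):
        refresh(table_key, part_name)
    return set(ids)
```

The sketch leaves the archived rows in the backup after restoring them, because the source does not say whether they are removed; archiving the same PART_ID again later would then require clearing the old backup rows first.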

[0104] In the embodiments of this application, to improve the operating efficiency of the Impala engine and reduce its memory pressure, based on Impala's mechanism for synchronizing metadata with the Hive Metastore, operations are performed directly on the metadata in the Hive Metastore before synchronization, quickly archiving some of the metadata. During synchronization, Impala only needs to synchronize the remaining metadata in the Hive Metastore, thereby reducing the memory pressure on Impala and the Hive Metastore without deleting the original data, and also improving Impala's execution efficiency. Furthermore, during metadata archiving, partition information is selected as the archived metadata based on Impala's startup characteristics, without affecting Impala's operation. Correspondingly, when the Impala engine needs the relevant metadata, this metadata is restored from the backup storage space to the Hive Metastore, and then Impala's metadata is refreshed to complete the synchronization.

[0105] Corresponding to the above-described Hive metadata management method embodiments, this application also provides a Hive metadata management device, which includes:

[0106] The information acquisition module is used to obtain target partition information from the Hive Metastore;

[0107] The archiving module is used to archive the target partition information to the spare storage space;

[0108] The cleanup module is used to delete the target partition information from the Hive Metastore.

[0109] Corresponding to the above-described method embodiment for loading metadata in Impala, one embodiment of this application also provides an apparatus for loading metadata in Impala, the apparatus comprising:

[0110] The loading module is used to load the remaining metadata in the Hive Metastore after it has been archived according to the Hive metadata management method described above.

[0111] Corresponding to the above-described method embodiments for recovering metadata in Impala, one embodiment of this application also provides an apparatus for recovering metadata in Impala, comprising:

[0112] The detection module is used to detect whether the target metadata exists in the Impala metadata cache;

[0113] The information acquisition module is used to obtain the partition information to be restored from the backup storage space when the target metadata is not present in the Impala metadata cache.

[0114] The recovery module is used to copy the recovered partition information to the Hive Metastore and refresh the Impala metadata cache.

[0115] The above is an illustrative scheme of the apparatus in this embodiment. It should be noted that the technical solution of the apparatus and the technical solution of the method belong to the same concept; for any details not described in the apparatus solution, refer to the description of the method solution.

[0116] In one embodiment of this application, a computing device is also provided, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor executes the instructions to implement the steps of the aforementioned method.

[0117] An embodiment of this application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the method as described above.

[0118] The above is an illustrative scheme of a computer-readable storage medium according to this embodiment. It should be noted that the technical solution of this storage medium and the technical solution of the above method belong to the same concept, and all details not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the above method.

[0119] The foregoing has described specific embodiments of this application. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired results. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0120] The computer instructions include computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording media, USB flash drives, portable hard drives, magnetic disks, optical disks, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, etc. It should be noted that the content included in the computer-readable medium may be appropriately expanded or narrowed according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.

[0121] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some steps may be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to this application.

[0122] In the above embodiments, each description emphasizes different aspects. For parts not described in detail in one embodiment, refer to the relevant descriptions of the other embodiments.

[0123] The preferred embodiments disclosed above are merely illustrative of this application. The optional embodiments do not exhaustively describe all details, nor do they limit the invention to the specific implementations described. Clearly, many modifications and variations can be made based on the content of this application. These embodiments are selected and specifically described in this application to better explain the principles and practical applications of this application, thereby enabling those skilled in the art to better understand and utilize this application. This application is limited only by the claims and their full scope and equivalents.

Claims

1. A method for loading metadata in Impala, characterized in that it comprises the following steps: retrieving target partition information from the Hive Metastore; archiving the target partition information to a backup storage space; deleting the target partition information from the Hive Metastore; loading, within the Impala engine, the remaining metadata in the Hive Metastore; and checking the Impala metadata cache and, if the target metadata does not exist there, obtaining the partition information to be restored from the backup storage space, copying the restored partition information to the Hive Metastore, and refreshing the Impala metadata cache.
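The overall flow of claim 1 can be illustrated with a minimal in-memory sketch. It assumes three plain dictionaries standing in for the Hive Metastore, the backup storage space and the Impala metadata cache, keyed by hypothetical `(database, table, partition)` tuples; the class and method names are illustrative only and do not correspond to any Hive or Impala API. In a real deployment the cache refresh would more likely be a targeted Impala `REFRESH` (or `INVALIDATE METADATA`) on the affected table rather than the full reload used here for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Toy stand-in (hypothetical) for the metastore, backup space and cache.

    Partitions are keyed by (database, table, partition) tuples.
    """
    metastore: dict = field(default_factory=dict)  # live Hive Metastore entries
    backup: dict = field(default_factory=dict)     # backup storage space
    cache: dict = field(default_factory=dict)      # Impala metadata cache

    def archive(self, targets):
        """Retrieve each target partition, archive it, then remove it."""
        for key in targets:
            info = self.metastore[key]   # retrieve from the metastore
            self.backup[key] = info      # archive to the backup space
            del self.metastore[key]      # delete from the metastore

    def load(self):
        """The engine loads only the metadata remaining in the metastore."""
        self.cache = dict(self.metastore)

    def lookup(self, key):
        """On a cache miss, restore from the backup space and refresh."""
        if key not in self.cache:
            info = self.backup.get(key)
            if info is None:
                raise KeyError(f"partition {key} not found in metastore or backup")
            self.metastore[key] = info   # copy back to the metastore
            self.load()                  # refresh the metadata cache (full reload)
        return self.cache[key]
```

Because archiving happens before `load()`, the engine's startup footprint covers only the partitions still in the metastore; archived partitions cost nothing until a `lookup` misses and triggers a restore.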

2. The method of claim 1, wherein the target partition information includes information on partitions that are used less frequently.

3. The method of claim 1, wherein retrieving the target partition information from the Hive Metastore includes: connecting to the Hive Metastore database; obtaining the database identifier of the target partition; obtaining the table identifier from the database identifier and the name of the target table; obtaining the target partition identifier from the table identifier and the partition name; and storing the obtained target partition identifier in a first set.
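The identifier chain of claim 3 (database, then table, then partition) can be sketched against a relational store. This sketch assumes an in-memory SQLite database whose tables borrow the names and key columns of the Hive Metastore's backing schema (`DBS.DB_ID`/`NAME`, `TBLS.TBL_ID`/`DB_ID`/`TBL_NAME`, `PARTITIONS.PART_ID`/`TBL_ID`/`PART_NAME`), cut down to just those columns; a production metastore usually runs on MySQL or PostgreSQL with many more columns. The function names and sample rows are hypothetical.

```python
import sqlite3


def create_toy_metastore() -> sqlite3.Connection:
    """Build an in-memory database with a reduced, metastore-like schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
        CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER,
                                 PART_NAME TEXT);
        INSERT INTO DBS VALUES (1, 'db1');
        INSERT INTO TBLS VALUES (10, 1, 'events');
        INSERT INTO PARTITIONS VALUES (100, 10, 'dt=2023-01-01'),
                                      (101, 10, 'dt=2023-01-02'),
                                      (102, 10, 'dt=2023-08-01');
    """)
    return conn


def find_partition_ids(conn, db_name, table_name, partition_names) -> set[int]:
    """Resolve database ID -> table ID -> partition IDs; return the first set."""
    row = conn.execute("SELECT DB_ID FROM DBS WHERE NAME = ?", (db_name,)).fetchone()
    if row is None:
        return set()
    db_id = row[0]
    row = conn.execute(
        "SELECT TBL_ID FROM TBLS WHERE DB_ID = ? AND TBL_NAME = ?",
        (db_id, table_name),
    ).fetchone()
    if row is None:
        return set()
    tbl_id = row[0]
    first_set = set()
    for name in partition_names:
        row = conn.execute(
            "SELECT PART_ID FROM PARTITIONS WHERE TBL_ID = ? AND PART_NAME = ?",
            (tbl_id, name),
        ).fetchone()
        if row is not None:  # silently skip names that do not exist
            first_set.add(row[0])
    return first_set
```

For example, `find_partition_ids(create_toy_metastore(), "db1", "events", ["dt=2023-01-01"])` resolves to the single identifier `100`.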

4. The method according to claim 1, wherein archiving the target partition information to the backup storage space includes: copying, from the original partition table, all information on the target partition identifiers in the first set to the backup storage space, the backup storage space having the same partition-information topology as the Hive Metastore storage space.
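One way to give the backup storage space "the same partition information topology" is to create identically structured tables in a separate schema and copy rows across by partition identifier. The sketch below assumes a second in-memory SQLite database attached as `backup`, and uses reduced versions of the metastore's `PARTITIONS` and `PARTITION_PARAMS` tables to stand for "all information" about a partition. Function names and the schema subset are hypothetical.

```python
import sqlite3

# Reduced schema, created identically in "main" and "backup" (same topology).
SCHEMA = """
    CREATE TABLE {s}.PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER,
                                 PART_NAME TEXT);
    CREATE TABLE {s}.PARTITION_PARAMS (PART_ID INTEGER, PARAM_KEY TEXT,
                                       PARAM_VALUE TEXT);
"""
PARTITION_TABLES = ("PARTITIONS", "PARTITION_PARAMS")


def open_with_backup() -> sqlite3.Connection:
    """Open a toy metastore plus an attached backup space with the same schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS backup")
    for schema in ("main", "backup"):
        conn.executescript(SCHEMA.format(s=schema))
    return conn


def archive_partitions(conn, first_set) -> None:
    """Copy every row belonging to the partition IDs in first_set to backup."""
    if not first_set:
        return
    ids = sorted(first_set)
    placeholders = ",".join("?" * len(ids))
    with conn:  # one transaction across all partition-related tables
        for table in PARTITION_TABLES:
            conn.execute(
                f"INSERT INTO backup.{table} SELECT * FROM main.{table} "
                f"WHERE PART_ID IN ({placeholders})",
                ids,
            )
```

This step only copies; removing the archived rows from `main`, the separate cleanup step of claim 1, would be a matching `DELETE ... WHERE PART_ID IN (...)` over the same tables.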

5. The method of claim 1, wherein copying the restored partition information to the Hive Metastore includes: obtaining the identifier of the database in which the restored partition is located; obtaining the table identifier from the database identifier and the name of the restored table; obtaining the restored partition identifier from the table identifier and the partition name; storing the obtained restored partition identifier in a second set; and then restoring the partition information for the partition identifiers in the second set from the backup storage space to the partition table in the Hive Metastore.
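The restore path of claim 5 mirrors the lookup of claim 3 and then copies rows in the opposite direction. This sketch uses the same hypothetical SQLite stand-in, with `DBS` and `TBLS` in the main schema and a `PARTITIONS` table in both `main` and an attached `backup` schema. One assumption goes beyond the claim text: because an archived partition has already been deleted from the metastore, its identifier is resolved here from the backup copy of `PARTITIONS`. `INSERT OR IGNORE` is also a design choice of this sketch, so a repeated restore does not fail on an existing primary key.

```python
import sqlite3


def open_toy_metastore() -> sqlite3.Connection:
    """Main schema holds DBS/TBLS/PARTITIONS; 'backup' mirrors PARTITIONS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS backup")
    conn.executescript("""
        CREATE TABLE main.DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE main.TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER,
                                TBL_NAME TEXT);
        CREATE TABLE main.PARTITIONS (PART_ID INTEGER PRIMARY KEY,
                                      TBL_ID INTEGER, PART_NAME TEXT);
        CREATE TABLE backup.PARTITIONS (PART_ID INTEGER PRIMARY KEY,
                                        TBL_ID INTEGER, PART_NAME TEXT);
    """)
    return conn


def restore_partitions(conn, db_name, table_name, partition_names) -> set[int]:
    """Resolve the second set of partition IDs and copy them back to main."""
    db_row = conn.execute(
        "SELECT DB_ID FROM main.DBS WHERE NAME = ?", (db_name,)
    ).fetchone()
    if db_row is None:
        return set()
    tbl_row = conn.execute(
        "SELECT TBL_ID FROM main.TBLS WHERE DB_ID = ? AND TBL_NAME = ?",
        (db_row[0], table_name),
    ).fetchone()
    if tbl_row is None:
        return set()
    second_set = set()
    for name in partition_names:
        # Assumption: archived partitions are gone from main, so look in backup.
        row = conn.execute(
            "SELECT PART_ID FROM backup.PARTITIONS WHERE TBL_ID = ? AND PART_NAME = ?",
            (tbl_row[0], name),
        ).fetchone()
        if row is not None:
            second_set.add(row[0])
    if not second_set:
        return second_set
    ids = sorted(second_set)
    placeholders = ",".join("?" * len(ids))
    with conn:
        # INSERT OR IGNORE skips partitions already present in main.
        conn.execute(
            "INSERT OR IGNORE INTO main.PARTITIONS SELECT * FROM backup.PARTITIONS "
            f"WHERE PART_ID IN ({placeholders})",
            ids,
        )
    return second_set
```

After a successful restore, the Impala metadata cache for the affected table still has to be refreshed, as claim 1 requires; that step is outside this sketch.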

6. An apparatus for loading metadata in Impala, the apparatus comprising: an information acquisition module, configured to obtain target partition information from the Hive Metastore; an archiving module, configured to archive the target partition information to a backup storage space; a cleanup module, configured to delete the target partition information from the Hive Metastore; a loading module, configured to load, within the Impala engine, the remaining metadata in the Hive Metastore; and a detection module, configured to detect whether the target metadata exists in the Impala metadata cache; wherein the information acquisition module is further configured to obtain the partition information to be restored from the backup storage space when the target metadata is not present in the Impala metadata cache; and the apparatus further comprises a recovery module, configured to copy the restored partition information to the Hive Metastore and refresh the Impala metadata cache.

7. A computer-readable storage medium storing computer instructions, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-5.

8. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method according to any one of claims 1-5.