Cluster deployment method and apparatus, computer device, and storage medium
By isolating and deploying management node groups and data storage node groups in a data lake cluster, and using a MySQL database to store metadata information, the memory pressure problem of the management node group is solved, the stability and availability of the cluster are improved, and production risks are reduced.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA PING AN PROPERTY INSURANCE CO LTD
- Filing Date
- 2023-10-10
- Publication Date
- 2026-06-19
Description
Technical Field
[0001] This application relates to the fields of big data technology and fintech, and in particular to a cluster deployment method and apparatus, a computer device, and a storage medium.
Background Technology
[0002] In recent years, data lake technology has emerged in the big data field, with lake-warehouse integration and batch-stream integration becoming the mainstream direction of development. As a result, data lake technology is increasingly widely used in fintech companies such as insurance companies and banks. Currently, the deployment scheme for ordinary production clusters adopted by fintech companies typically co-deploys the storage node group of the data lake cluster with the storage node group of the original production cluster, and the data lake cluster and the original production cluster are managed by the same management node group. However, when too much streaming data flows into the data lake, a large number of small files are generated, and the metadata information of each file block is loaded into the physical machine memory of that shared management node group. The large number of small files places heavy operational pressure on the management node group and degrades its stability; for example, the management node group may have to be restarted for maintenance or may crash. At the same time, if too much of the management node group's memory is occupied by the metadata information of small files, it cannot respond to and process requests from other computing nodes in the cluster in a timely manner, and the availability of the cluster also decreases.
[0003] Therefore, existing cluster deployment solutions have high production risks, and the availability and stability of cluster production are poor.
Summary of the Invention
[0004] The purpose of this application is to provide a cluster deployment method, apparatus, computer equipment, and storage medium to solve the technical problems of high production risk and poor availability and stability of existing cluster deployment schemes.
[0005] To address the aforementioned technical problems, this application provides a cluster deployment method, employing the following technical solution:
[0006] Set up the first management node group of the pre-defined data lake cluster;
[0007] Establish the first data storage node group of the data lake cluster;
[0008] Deploy the pre-defined MySQL database;
[0009] Configure a remote connection service corresponding to the MySQL database on the physical machine of the second data storage node group of the original production cluster, and store the first metadata information in the second data storage node group into the MySQL database;
[0010] Identify the first MetaStore service corresponding to all first node groups of the original production cluster, and establish a connection between the first MetaStore service and the MySQL database; wherein, the first node group includes the second data storage node group and the second management node group of the original production cluster;
[0011] Configure a remote connection service corresponding to the MySQL database on the physical machine of the first data storage node group of the data lake cluster, and store the second metadata information in the first data storage node group into the MySQL database;
[0012] Identify the second MetaStore service corresponding to all second node groups of the data lake cluster, and establish a connection between the second MetaStore service and the MySQL database; wherein, the second node group includes the first data storage node group and the first management node group.
[0013] Furthermore, the step of building the first management node group of the preset data lake cluster specifically includes:
[0014] Obtain the first performance threshold corresponding to the preset first computing performance requirement;
[0015] Identify the first physical unit whose computing performance is greater than the first performance threshold;
[0016] The first management node group of the data lake cluster is built based on the first physical unit.
[0017] Furthermore, the step of building the first data storage node group of the data lake cluster specifically includes:
[0018] Obtain the second performance threshold corresponding to the preset second computing performance requirement;
[0019] Acquire a second physical unit whose computing performance is less than the second performance threshold;
[0020] The first data storage node group of the data lake cluster is built based on the second physical unit.
[0021] Furthermore, the step of determining the first MetaStore service corresponding to all first node groups of the original production cluster specifically includes:
[0022] Obtain the deployment component version corresponding to the first node group of the original production cluster;
[0023] Evaluate the deployment component version to obtain an evaluation result;
[0024] Based on the evaluation result, determine the corresponding designated MetaStore service;
[0025] Use the designated MetaStore service as the first MetaStore service.
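The version evaluation in the preceding steps can be sketched as follows. This is only an illustration under stated assumptions: the application does not specify how the deployment component version is evaluated, so the comparison against a cutoff, the cutoff value, the function names, and the returned labels are all hypothetical.

```python
# Hypothetical sketch: pick a MetaStore service tier from the deployed
# component version of a node group. Cutoff and labels are assumptions.

def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '2.7.3' into (2, 7, 3)."""
    return tuple(int(part) for part in text.split("."))


def choose_metastore(component_version: str, cutoff: str = "3.0.0") -> str:
    """Evaluate the component version and return the designated MetaStore tier."""
    if parse_version(component_version) >= parse_version(cutoff):
        return "high-version MetaStore"
    return "low-version MetaStore"


print(choose_metastore("2.7.3"))
print(choose_metastore("3.1.2"))
```

Comparing integer tuples rather than raw strings avoids ordering "2.10.0" before "2.9.9".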
[0026] Furthermore, after the step of building the first data storage node group of the data lake cluster, the method further includes:
[0027] Obtain the preset mapping service construction strategy;
[0028] Based on the mapping service construction strategy, a mapping service is constructed between the first management node group and the first data storage node group.
[0029] Furthermore, after the steps of determining the second MetaStore service corresponding to all second node groups of the data lake cluster and establishing a connection between the second MetaStore service and the MySQL database, the method further includes:
[0030] Obtain the remote access link number corresponding to the first MetaStore service;
[0031] Based on that remote access link number, set, in the MySQL database, the remote access link of the first MetaStore service corresponding to all first node groups of the original production cluster;
[0032] Obtain the remote access link number corresponding to the second MetaStore service;
[0033] Based on that remote access link number, set, in the MySQL database, the remote access link of the second MetaStore service corresponding to all second node groups of the data lake cluster.
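One possible reading of the link-setting steps above is sketched below. It assumes that the "remote access link number" is a TCP port and that the link takes a Thrift-style URI form; neither is stated in this application. The host names, port values, and node group labels are hypothetical.

```python
# Hypothetical sketch: derive a remote access link for each MetaStore service
# from its link number, which is assumed here to be a TCP port.

def build_access_link(host: str, link_number: int) -> str:
    """Return a Thrift-style URI; the thrift:// scheme is an assumption."""
    if not 1 <= link_number <= 65535:
        raise ValueError(f"invalid port: {link_number}")
    return f"thrift://{host}:{link_number}"


def assign_links(node_groups: list, link: str) -> dict:
    """Point every node group of one cluster at the same MetaStore link."""
    return {group: link for group in node_groups}


first_link = build_access_link("metastore-old.example.internal", 9083)
second_link = build_access_link("metastore-lake.example.internal", 9084)

links = {}
links.update(assign_links(["prod-management", "prod-storage"], first_link))
links.update(assign_links(["lake-management", "lake-storage"], second_link))
for group, link in links.items():
    print(group, "->", link)
```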
[0034] Furthermore, after the step of setting the remote access link for the second MetaStore service corresponding to all second node groups of the data lake cluster in the MySQL database based on the remote access link number of the second MetaStore service, the method further includes:
[0035] Determine whether a data calculation request triggered by a user corresponding to a remote access link to a target MetaStore service has been received; wherein, the remote access link to a target MetaStore service includes either the first remote access link to a MetaStore service or the second remote access link to a MetaStore service, and the data calculation request carries a query data identifier;
[0036] If so, extract the query data identifier from the data calculation request;
[0037] Retrieve the target data table data corresponding to the query data identifier from the MySQL database;
[0038] The target data table data is loaded into a preset memory for calculation and processing to obtain the corresponding calculation results;
[0039] The calculation results are then pushed to the user.
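The request-handling steps above can be illustrated with a short sketch. An in-memory SQLite database stands in for the MySQL database purely so that the example runs without a server; the table name, column names, request fields, and the choice of a sum as the "calculation" are all hypothetical.

```python
import sqlite3

# Stand-in for the shared MySQL database: in-memory SQLite is used only so the
# sketch runs without a server. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_data (query_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO table_data VALUES (?, ?)",
    [("policy_2023", 120.0), ("policy_2023", 80.0), ("claims_2023", 45.5)],
)


def handle_request(request: dict) -> dict:
    """Extract the identifier, load matching rows into memory, compute a total."""
    query_id = request["query_data_id"]
    rows = conn.execute(
        "SELECT amount FROM table_data WHERE query_id = ?", (query_id,)
    ).fetchall()
    in_memory = [amount for (amount,) in rows]  # target data held in memory
    return {"query_data_id": query_id, "rows": len(in_memory), "total": sum(in_memory)}


result = handle_request({"user": "alice", "query_data_id": "policy_2023"})
print(result)  # pushing the result to the user is reduced to printing here
```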
[0040] To address the aforementioned technical problems, this application also provides a cluster deployment device, which employs the following technical solution:
[0041] The first setup module is used to set up the first management node group of the preset data lake cluster;
[0042] The second construction module is used to build the first data storage node group of the data lake cluster.
[0043] The deployment module is used to deploy a pre-defined MySQL database;
[0044] The first processing module is used to configure a remote connection service corresponding to the MySQL database on the physical machine of the second data storage node group of the original production cluster, and to store the first metadata information in the second data storage node group into the MySQL database.
[0045] The second processing module is used to determine the first MetaStore service corresponding to all first node groups of the original production cluster, and to establish a connection between the first MetaStore service and the MySQL database; wherein, the first node group includes the second data storage node group and the second management node group of the original production cluster;
[0046] The third processing module is used to configure a remote connection service corresponding to the MySQL database on the physical machine of the first data storage node group of the data lake cluster, and to store the second metadata information in the first data storage node group into the MySQL database.
[0047] The fourth processing module is used to determine the second MetaStore service corresponding to all second node groups of the data lake cluster, and to establish a connection between the second MetaStore service and the MySQL database; wherein, the second node group includes the first data storage node group and the first management node group.
[0048] To address the aforementioned technical problems, this application also provides a computer device that employs the following technical solution:
[0049] Set up the first management node group of the pre-defined data lake cluster;
[0050] Establish the first data storage node group of the data lake cluster;
[0051] Deploy the pre-defined MySQL database;
[0052] Configure a remote connection service corresponding to the MySQL database on the physical machine of the second data storage node group of the original production cluster, and store the first metadata information in the second data storage node group into the MySQL database;
[0053] Identify the first MetaStore service corresponding to all first node groups of the original production cluster, and establish a connection between the first MetaStore service and the MySQL database; wherein, the first node group includes the second data storage node group and the second management node group of the original production cluster;
[0054] Configure a remote connection service corresponding to the MySQL database on the physical machine of the first data storage node group of the data lake cluster, and store the second metadata information in the first data storage node group into the MySQL database;
[0055] Identify the second MetaStore service corresponding to all second node groups of the data lake cluster, and establish a connection between the second MetaStore service and the MySQL database; wherein, the second node group includes the first data storage node group and the first management node group.
[0056] To address the aforementioned technical problems, this application also provides a computer-readable storage medium, employing the technical solution described below:
[0057] Set up the first management node group of the pre-defined data lake cluster;
[0058] Establish the first data storage node group of the data lake cluster;
[0059] Deploy the pre-defined MySQL database;
[0060] Configure a remote connection service corresponding to the MySQL database on the physical machine of the second data storage node group of the original production cluster, and store the first metadata information in the second data storage node group into the MySQL database;
[0061] Identify the first MetaStore service corresponding to all first node groups of the original production cluster, and establish a connection between the first MetaStore service and the MySQL database; wherein, the first node group includes the second data storage node group and the second management node group of the original production cluster;
[0062] Configure a remote connection service corresponding to the MySQL database on the physical machine of the first data storage node group of the data lake cluster, and store the second metadata information in the first data storage node group into the MySQL database;
[0063] Identify the second MetaStore service corresponding to all second node groups of the data lake cluster, and establish a connection between the second MetaStore service and the MySQL database; wherein, the second node group includes the first data storage node group and the first management node group.
[0064] Compared with the prior art, the embodiments of this application have the following main advantages:
[0065] This application embodiment first establishes a first management node group of a preset data lake cluster; and then establishes a first data storage node group of the data lake cluster; then deploys a preset MySQL database; subsequently, a remote connection service corresponding to the MySQL database is configured on the physical machines of the second data storage node group of the original production cluster, and the first metadata information in the second data storage node group is stored in the MySQL database; next, a first MetaStore service corresponding to all first node groups of the original production cluster is determined, and a connection is established between the first MetaStore service and the MySQL database; further, a remote connection service corresponding to the MySQL database is configured on the physical machines of the first data storage node group of the data lake cluster, and the second metadata information in the first data storage node group is stored in the MySQL database; finally, a second MetaStore service corresponding to all second node groups of the data lake cluster is determined, and a connection is established between the second MetaStore service and the MySQL database. This application embodiment constructs a first management node group and a first data storage node group for the data lake cluster based on the concept of storage-computing classification. This achieves isolated deployment of the management node group and data storage node group, effectively addressing the risks posed by small files in the data lake cluster and reducing the memory pressure on the management node group caused by a large number of small files, thus improving the stability of the production cluster. Furthermore, by using a pre-configured MySQL database, the metadata services of each cluster are deployed simultaneously, enabling metadata information exchange between the data lake cluster and the existing production cluster. 
This achieves node isolation and independence while maintaining shared metadata, confining the production risks of the data lake cluster to a small scope and effectively ensuring the availability and stability of the cluster in production.
Attached Figure Description
[0066] To more clearly illustrate the solutions in this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0067] Figure 1 is an exemplary system architecture diagram to which this application can be applied;
[0068] Figure 2 is a flowchart of an embodiment of the cluster deployment method according to this application;
[0069] Figure 3 is a schematic structural diagram of an embodiment of the cluster deployment apparatus according to this application;
[0070] Figure 4 is a schematic structural diagram of an embodiment of the computer device according to this application.
Detailed Implementation
[0071] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains; the terminology used herein in the specification of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having," and any variations thereof, in the specification, claims, and foregoing drawings of this application, are intended to cover non-exclusive inclusion. The terms "first," "second," etc., in the specification, claims, or foregoing drawings of this application are used to distinguish different objects, not to describe a particular order.
[0072] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0073] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
[0074] As shown in Figure 1, system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. Network 104 serves as the medium for providing communication links between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables.
[0075] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social media platform software, etc.
[0076] Terminal devices 101, 102, and 103 can be various electronic devices with displays and support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptops, and desktop computers, etc.
[0077] Server 105 can be a server that provides various services, such as a backend server that supports the pages displayed on terminal devices 101, 102, and 103.
[0078] It should be noted that the cluster deployment method provided in this application embodiment is generally executed by a server / terminal device, and correspondingly, the cluster deployment device is generally set in the server / terminal device.
[0079] It should be understood that the number of terminal devices, networks, and servers shown in Figure 1 is merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.
[0080] Continuing to refer to Figure 2, a flowchart of an embodiment of the cluster deployment method according to this application is shown. The order of steps in the flowchart can be changed, and some steps can be omitted, depending on different requirements. The cluster deployment method provided by this application embodiment can be applied to any scenario requiring cluster deployment, and thus can be applied to products in these scenarios, such as cluster deployment in the financial insurance field. The cluster deployment method includes the following steps:
[0081] Step S201: Build the first management node group of the preset data lake cluster.
[0082] In this embodiment, the electronic device on which the cluster deployment method runs (e.g., the server / terminal device shown in Figure 1) can access the data lake cluster via wired or wireless connections. It should be noted that the aforementioned wireless connection methods include, but are not limited to, 3G / 4G / 5G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra-wideband) connections, and other currently known or future-developed wireless connection methods. A data lake is a storage system whose underlying layer includes different file formats and lake table formats and which is capable of storing large amounts of unstructured and semi-structured raw data. Data consumers can access this data for data analysis, including BI, reporting, and machine learning model training. With a data lake, data becomes increasingly usable. Specifically, the aforementioned data lake cluster is a pre-built storage system applied in fintech companies, such as insurance companies and banks. In the fintech field, the data lake cluster can be used to store financial data, such as business data, transaction data, payment data, purchase data, and financial product data. The specific implementation process of building the first management node group of the preset data lake cluster is described in detail in subsequent specific embodiments of this application and is not elaborated upon here.
[0083] Step S202: Build the first data storage node group of the data lake cluster.
[0084] In this embodiment, the specific implementation process of building the first data storage node group of the data lake cluster will be further described in detail in subsequent specific embodiments of this application, and will not be elaborated on here.
[0085] Step S203: Deploy the preset MySQL database.
[0086] In this embodiment, MySQL is a relational database management system. Relational databases store data in different tables, rather than storing all data in one large repository, thus increasing speed and improving flexibility. The SQL language used by MySQL is the most commonly used standardized language for accessing databases. MySQL software employs a dual-licensing policy, offering both a community edition and a commercial edition. Due to its small size, high speed, low total cost of ownership, and especially its open-source nature, MySQL is generally chosen as the database for the development of small, medium, and large websites. This application introduces a MySQL database to persistently store metadata information for the data tables of two clusters: the data lake cluster and the original production cluster, achieving interoperability of metadata information between the two clusters. Furthermore, after deploying the MySQL database, a remote connection service for the MySQL database is initiated.
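As a minimal sketch of what opening the MySQL database to remote connections might involve, the function below generates the kind of SQL a deployer could run so that physical machines of both clusters can reach one shared database. The database name, account name, and host pattern are hypothetical, and the password is left as a placeholder; the application itself does not specify these details.

```python
# Hypothetical sketch: SQL statements that create one shared database and an
# account allowed to connect from remote hosts. All names are assumptions.

def remote_access_statements(database: str, user: str, host_pattern: str) -> list:
    """Build MySQL statements; the password is deliberately a placeholder."""
    account = f"'{user}'@'{host_pattern}'"
    return [
        f"CREATE DATABASE IF NOT EXISTS `{database}`;",
        f"CREATE USER IF NOT EXISTS {account} IDENTIFIED BY '<password>';",
        f"GRANT ALL PRIVILEGES ON `{database}`.* TO {account};",
    ]


for statement in remote_access_statements("metastore", "metastore_svc", "10.0.%"):
    print(statement)
```

Restricting the host pattern to the clusters' subnet, rather than using `%`, would limit which machines can connect; the string interpolation here is for illustration only and is not suitable for untrusted input.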
[0087] Step S204: Configure a remote connection service corresponding to the MySQL database on the physical machine of the second data storage node group of the original production cluster, and store the first metadata information in the second data storage node group into the MySQL database.
[0088] In this embodiment, the second data storage node group and the second management node group of the original production cluster remain unchanged. Specifically, the first metadata information in the second data storage node group can be persistently stored in the MySQL database by configuring remote connection services corresponding to the MySQL database on all physical machines of the second data storage node group of the original production cluster in sequence.
[0089] Step S205: Determine the first MetaStore service corresponding to all first node groups of the original production cluster, and establish a connection between the first MetaStore service and the MySQL database; wherein, the first node group includes the second data storage node group and the second management node group of the original production cluster.
[0090] In this embodiment, the specific implementation process of determining the first MetaStore service corresponding to all first node groups of the original production cluster is described in detail in subsequent specific embodiments of this application and is not elaborated on here. Specifically, after the first MetaStore service is determined, the connection between the first MetaStore service and the MySQL database can be established by deploying the first MetaStore service so that it connects to the MySQL database.
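Assuming the MetaStore service is a Hive MetaStore (the application does not name the implementation), connecting it to the MySQL database typically comes down to a few JDBC properties in its site configuration. The sketch below generates such a fragment; the JDBC URL, user name, and the choice of the Connector/J 8 driver class are assumptions, and the password property is omitted on purpose.

```python
import xml.etree.ElementTree as ET


def metastore_site_xml(jdbc_url: str, user: str) -> str:
    """Render a Hive-style configuration fragment pointing at a MySQL backend."""
    properties = {
        "javax.jdo.option.ConnectionURL": jdbc_url,
        # Driver class for MySQL Connector/J 8.x; older drivers use another name.
        "javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical endpoint: both MetaStore versions would share this one URL.
shared_url = "jdbc:mysql://mysql.example.internal:3306/metastore"
first_xml = metastore_site_xml(shared_url, "metastore_svc")
print(first_xml)
```

Pointing the first and second MetaStore services at the same JDBC URL is what would let the two clusters see each other's table metadata.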
[0091] Step S206: Configure a remote connection service corresponding to the MySQL database on the physical machine of the first data storage node group of the data lake cluster, and store the second metadata information in the first data storage node group into the MySQL database;
[0092] In this embodiment, by configuring remote connection services corresponding to the MySQL database on all physical machines of the first data storage node group in the data lake cluster, the second metadata information in the first data storage node group is persistently stored in the MySQL database. Specifically, by deploying two different versions of the MetaStore service, the metadata information of the data tables in the two clusters stored in the MySQL database is accessed. Users access the MySQL database through the two different versions of the MetaStore service, thereby achieving information exchange between the data table metadata of the two clusters.
[0093] Step S207: Determine the second MetaStore service corresponding to all second node groups of the data lake cluster, and establish a connection between the second MetaStore service and the MySQL database; wherein, the second node group includes the first data storage node group and the first management node group.
[0094] In this embodiment, because emerging data lake technologies have high version requirements for open-source components, a high-version MetaStore service is used as the second MetaStore service to ensure compatibility between the second MetaStore service and the data lake cluster. After the second MetaStore service is determined, it can be deployed so that it connects to the MySQL database, which completes the connection between the second MetaStore service and the MySQL database. The cluster deployment scheme proposed in this application supports deploying new data lake technologies within the scope of the old cluster. Without changing the original production cluster architecture, it deploys high-version services to ensure compatibility with emerging data lake technologies, making use of the company's existing physical hardware assets and effectively reducing the company's cost of change when adopting emerging technologies.
[0095] This application first establishes a first management node group for a pre-defined data lake cluster; and then establishes a first data storage node group for the same data lake cluster; next, it deploys a pre-defined MySQL database; then, it configures a remote connection service corresponding to the MySQL database on the physical machines of the second data storage node group of the original production cluster, and stores the first metadata information from the second data storage node group into the MySQL database; subsequently, it determines the first MetaStore service corresponding to all the first node groups of the original production cluster, and establishes a connection between the first MetaStore service and the MySQL database; further, it configures a remote connection service corresponding to the MySQL database on the physical machines of the first data storage node group of the data lake cluster, and stores the second metadata information from the first data storage node group into the MySQL database; finally, it determines the second MetaStore service corresponding to all the second node groups of the data lake cluster, and establishes a connection between the second MetaStore service and the MySQL database. This application constructs a first management node group and a first data storage node group for a data lake cluster based on the concept of storage-computing classification. This achieves isolated deployment of the management node group and data storage node group, effectively addressing the risks posed by small files in the data lake cluster and reducing the memory pressure on the management node group caused by a large number of small files, thus improving the stability of the production cluster. Furthermore, by using a pre-configured MySQL database, metadata services are deployed simultaneously, enabling metadata information exchange between the data lake cluster and the existing production cluster. 
This achieves node isolation and independence while maintaining shared metadata, confining production risks within a small scope and effectively ensuring the availability and stability of the cluster in production.
[0096] In some alternative implementations, step S201 includes the following steps:
[0097] Obtain the first performance threshold corresponding to the preset first computing performance requirement.
[0098] In this embodiment, the first computing performance requirement is high computing performance, and the first performance threshold is a value pre-constructed based on the actual computing performance classification corresponding to high computing performance. The value of the first performance threshold is not specifically limited.
[0099] The first physical unit whose computing performance is greater than the first performance threshold is selected.
[0100] In this embodiment, the computational performance of all physical units in a preset set of physical units can be analyzed to select a first physical unit whose computational performance exceeds the first performance threshold. The first physical unit is a physical unit with a high-performance computing configuration.
[0101] The first management node group of the data lake cluster is built based on the first physical unit.
[0102] In this embodiment, since the management nodes require a high CPU and memory configuration to complete complex calculations, the first physical unit with a high-performance computing configuration is used to deploy the first management node group of the data lake cluster, so that it can bear the computing pressure of the many small files and the high load in the data lake. This effectively addresses the operational pressure that small files in the data lake place on the cluster management nodes and improves the stability of the cluster management nodes.
[0103] This application obtains a first performance threshold corresponding to a preset first computing performance requirement; then obtains a first physical unit with computing performance greater than the first performance threshold; subsequently, it builds a first management node group for the data lake cluster based on the first physical unit. By using a first physical unit with high-performance computing configuration to deploy the first management node group of the data lake cluster, this application effectively addresses the computational pressure caused by the large number of small files and high load in the data lake, thereby mitigating the operational and maintenance pressure risks brought by small data lake files to the management nodes of the original production cluster and improving the stability of the management nodes of the original production cluster.
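The selection of the first physical unit can be sketched as a simple filter over a pool of machines. The application leaves both the performance metric and the threshold value open, so the scoring formula, the threshold of 200.0, and the machine names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalMachine:
    name: str
    cpu_cores: int
    memory_gb: int


def performance_score(machine: PhysicalMachine) -> float:
    """Hypothetical metric: weight CPU cores and memory into one number."""
    return machine.cpu_cores * 2.0 + machine.memory_gb * 0.5


def select_above(pool: list, threshold: float) -> list:
    """Keep machines whose score is strictly greater than the threshold."""
    return [m for m in pool if performance_score(m) > threshold]


pool = [
    PhysicalMachine("pm-01", 64, 512),  # score 384.0
    PhysicalMachine("pm-02", 16, 64),   # score 64.0
    PhysicalMachine("pm-03", 48, 384),  # score 288.0
]
management_group = select_above(pool, threshold=200.0)
print([m.name for m in management_group])  # ['pm-01', 'pm-03']
```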
[0104] In some optional implementations of this embodiment, step S202 includes the following steps:
[0105] Obtain the second performance threshold corresponding to the preset second computing performance requirement.
[0106] In this embodiment, the second computing performance requirement is ordinary computing performance, and the second performance threshold is a value pre-constructed based on the actual computing performance classification corresponding to ordinary computing performance. There is no specific limitation on the value of the second performance threshold.
[0107] Obtain a second physical unit whose computing performance is less than the second performance threshold.
[0108] In this embodiment, the computational performance of all physical units in a preset set of physical units can be analyzed to select a second physical unit whose computational performance is less than the second performance threshold. The second physical unit is a physical unit with an ordinary-performance computing configuration.
[0109] The first data storage node group of the data lake cluster is built based on the second physical unit.
[0110] In this embodiment, since the data storage nodes do not require high CPU and memory configurations to complete complex calculations, but only need to perform data storage processing, the data storage node group of the data lake cluster will be intelligently deployed using a second physical unit with ordinary performance computing configuration. This can effectively reduce the deployment cost of the data storage node group and improve the deployment intelligence of the data storage node group.
[0111] This application obtains a second performance threshold corresponding to a preset second computing performance requirement; then obtains a second physical unit whose computing performance is less than the second performance threshold; subsequently, it builds the first data storage node group of the data lake cluster based on the second physical unit. By using a second physical unit with an ordinary-performance computing configuration to deploy the data storage node group of the data lake cluster, this application effectively reduces the deployment cost of the data storage node group and improves the intelligence of its deployment.
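Taken together, the two thresholds divide a pool of machines into management candidates, storage candidates, and machines that match neither requirement. The sketch below is one possible reading, assuming a single numeric score per host and a second threshold that does not exceed the first; the function name, scores, and threshold values are hypothetical.

```python
def partition_units(scores, first_threshold, second_threshold):
    """Split hosts by score into (management, storage, unassigned) lists.

    Assumption: second_threshold <= first_threshold, so no host can
    qualify for both node groups at once.
    """
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    management = sorted(h for h, s in scores.items() if s > first_threshold)
    storage = sorted(h for h, s in scores.items() if s < second_threshold)
    assigned = set(management) | set(storage)
    unassigned = sorted(h for h in scores if h not in assigned)
    return management, storage, unassigned


scores = {"host-a": 95.0, "host-b": 40.0, "host-c": 88.0, "host-d": 35.0, "host-e": 60.0}
management, storage, unassigned = partition_units(scores, 80.0, 50.0)
print(management, storage, unassigned)
```

Hosts scoring between the two thresholds are left unassigned here; the source does not say how such machines should be treated.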
[0112] In some optional implementations, the step S205 of determining the first MetaStore service corresponding to all first node groups of the original production cluster includes the following steps:
[0113] Obtain the deployment component version corresponding to the first node group of the original production cluster.
[0114] In this embodiment, the first node group includes the second data storage node group and the second management node group of the original production cluster. The corresponding deployment component versions can be obtained by querying the deployment component versions of the second data storage node group and the second management node group of the original production cluster.
[0115] The evaluation results are obtained by evaluating the version of the deployment component.
[0116] In this embodiment, evaluating the deployment component version means determining whether the deployment component can undergo a large-scale version change. The evaluation result is either that it can or that it cannot. For an existing production cluster, to ensure its operational stability, the evaluation result is usually that the deployment component cannot undergo a large-scale version change.
[0117] The corresponding designated MetaStore service is determined based on the evaluation results.
[0118] In this embodiment, if the evaluation result indicates that the deployment component cannot undergo large-scale version updates, then the specified MetaStore service is a MetaStore service of a compatible version corresponding to the deployment component version of the first node group.
[0119] The specified MetaStore service is designated as the first MetaStore service.
[0120] This application obtains the deployment component version corresponding to the first node group of the original production cluster; then evaluates the deployment component version to obtain the corresponding evaluation result; subsequently, it determines the corresponding designated MetaStore service based on the evaluation result; and then uses the designated MetaStore service as the first MetaStore service. This application improves the adaptability of the generated first MetaStore service to all first node groups of the original production cluster by evaluating the deployment component version corresponding to the first node group of the original production cluster and then accurately determining the first MetaStore service based on the obtained evaluation result.
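The version-driven choice of MetaStore service might be sketched as follows. The compatibility table, the version labels, and the rule that a live production cluster never accepts a large-scale change are assumptions for illustration. The source only specifies the branch in which the change is not possible, so the `preferred` argument used in the other branch is also hypothetical.

```python
# Hypothetical compatibility table: deployed component version -> compatible MetaStore version.
COMPATIBLE_METASTORE = {
    "components-2": "metastore-2",
    "components-3": "metastore-3",
}


def evaluate_version(is_live_production):
    """Return True if a large-scale version change is acceptable.

    Assumption: a live production cluster is never upgraded in place.
    """
    return not is_live_production


def choose_metastore(component_version, can_change, preferred=None):
    """Pick the designated MetaStore service for a node group."""
    if not can_change:
        # Branch described in the source: stay on the compatible version.
        return COMPATIBLE_METASTORE[component_version]
    # Branch not specified by the source: fall back to a caller-supplied preference.
    return preferred or COMPATIBLE_METASTORE[component_version]


can_change = evaluate_version(is_live_production=True)
first_metastore = choose_metastore("components-2", can_change)
print(first_metastore)
```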
[0121] In some alternative implementations, after step S202, the electronic device may further perform the following steps:
[0122] Obtain the preset mapping service construction strategy.
[0123] In this embodiment, the mapping service construction strategy is a pre-written construction strategy for mapping services between node groups used to build a cluster.
[0124] Based on the mapping service construction strategy, a mapping service is constructed between the first management node group and the first data storage node group.
[0125] In this embodiment, by executing the mapping service construction strategy, a mapping service between the first management node group and the first data storage node group is constructed. After the mapping service is constructed, the first management node group and the first data storage node group of the data lake cluster can perform data interaction, such as request and response services.
[0126] This application obtains a preset mapping service construction strategy; subsequently, based on the mapping service construction strategy, it constructs a mapping service between the first management node group and the first data storage node group. After completing the construction of the first management node group and the first data storage node group of the data lake cluster, this application will also intelligently construct a mapping service between the first management node group and the first data storage node group based on the preset mapping service construction strategy, so that normal data interaction between the first management node group and the first data storage node group can be realized based on this mapping service, thereby ensuring the smooth deployment of the data lake cluster.
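The source does not define what a mapping service construction strategy contains. As one possible reading, the sketch below treats the strategy as a function that decides which storage nodes each management node may reach; the all-to-all rule and every name in the block are hypothetical.

```python
def all_to_all_strategy(management_hosts, storage_hosts):
    """Hypothetical strategy: every management node may reach every storage node."""
    return {m: list(storage_hosts) for m in management_hosts}


def build_mapping_service(strategy, management_hosts, storage_hosts):
    """Apply the strategy once and return a lookup function over the result."""
    mapping = strategy(management_hosts, storage_hosts)

    def storage_nodes_for(management_host):
        return mapping[management_host]

    return storage_nodes_for


lookup = build_mapping_service(
    all_to_all_strategy, ["lake-mgmt-1"], ["lake-store-1", "lake-store-2"]
)
print(lookup("lake-mgmt-1"))
```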
[0127] In some optional implementations of this embodiment, after step S207, the electronic device may further perform the following steps:
[0128] Obtain the remote access link number of the first MetaStore service corresponding to the first MetaStore service.
[0129] In this embodiment, after determining the first MetaStore service corresponding to all first node groups of the original production cluster and establishing a connection between the first MetaStore service and the MySQL database, a first MetaStore service remote access link number corresponding to the first MetaStore service is further generated and stored. The first MetaStore service remote access link number has a mapping relationship with the first MetaStore service and is a unique identifier. For example, the first MetaStore service remote access link number is MetaStoreService-1.
[0130] Based on the remote access link number of the first MetaStore service, set the remote access link of the first MetaStore service corresponding to all first node groups of the original production cluster in the MySQL database.
[0131] In this embodiment, according to a preset remote access link generation strategy, the remote access link for the first MetaStore service in the MySQL database for all first node groups of the original production cluster can be set based on the remote access link number of the first MetaStore service. The remote access link generation strategy is a strategy constructed based on actual remote access link generation requirements. For example, the remote access link generation strategy includes: the remote access link and the remote access link number are the same. For instance, if the remote access link number for the first MetaStore service is MetaStoreService-1, then the remote access link for the first MetaStore service can be set to MetaStoreService-1.
[0132] Obtain the remote access link number of the second MetaStore service corresponding to the second MetaStore service.
[0133] In this embodiment, after determining the second MetaStore service corresponding to all second node groups in the data lake cluster and establishing a connection between the second MetaStore service and the MySQL database, a second MetaStore service remote access link number corresponding to the second MetaStore service is further generated and stored. The second MetaStore service remote access link number has a mapping relationship with the second MetaStore service and is a unique identifier. For example, the second MetaStore service remote access link number is MetaStoreService-2.
[0134] Based on the remote access link number of the second MetaStore service, set the remote access link of the second MetaStore service for all second node groups in the data lake cluster in the MySQL database.
[0135] In this embodiment, the method for setting the remote access link of the second MetaStore service can refer to the aforementioned remote access link of the first MetaStore service, and will not be elaborated further here.
[0136] This application obtains the first MetaStore service remote access link number corresponding to the first MetaStore service; then, based on that number, sets the first MetaStore service remote access link for all first node groups of the original production cluster in the MySQL database; then obtains the second MetaStore service remote access link number corresponding to the second MetaStore service; and subsequently, based on that number, sets the second MetaStore service remote access link for all second node groups of the data lake cluster in the MySQL database. In this way, two different MetaStore services, a first MetaStore service and a second MetaStore service, are deployed for all first node groups of the original production cluster and all second node groups of the data lake cluster respectively, and a distinct remote access link can be set up quickly for each cluster. The first and second MetaStore services corresponding to the two remote access links can subsequently be used against the same MySQL database, so that data table metadata information is interoperable between the two clusters. The nodes of the two clusters thus remain isolated and independent while their metadata is shared and interoperable, which effectively ensures the availability and stability of the original production cluster.
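The relationship between link numbers, links, and node groups can be sketched with a small in-memory registry. The `LinkRegistry` class and its fields are hypothetical; only the `MetaStoreService-N` naming and the example strategy (the link is identical to its number) come from the text above.

```python
import itertools


def identity_strategy(link_number):
    """Example strategy from the text: the link is the same string as its number."""
    return link_number


class LinkRegistry:
    """Hypothetical store of remote access links keyed by link number."""

    def __init__(self, strategy=identity_strategy):
        self._strategy = strategy
        self._counter = itertools.count(1)
        self.links = {}

    def register(self, service, node_groups):
        """Generate a unique link number and set the link for the given node groups."""
        number = f"MetaStoreService-{next(self._counter)}"
        self.links[number] = {
            "service": service,
            "link": self._strategy(number),
            "node_groups": list(node_groups),
        }
        return number


registry = LinkRegistry()
first_number = registry.register(
    "first MetaStore service",
    ["second data storage node group", "second management node group"],
)
second_number = registry.register(
    "second MetaStore service",
    ["first data storage node group", "first management node group"],
)
print(first_number, second_number)
```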
[0137] In some optional implementations of this embodiment, after the step of setting the remote access links for the second MetaStore service of all second node groups in the data lake cluster in the MySQL database based on the second MetaStore service remote access link number, the electronic device may further perform the following steps:
[0138] Determine whether a data calculation request triggered by a user and corresponding to the remote access link of the target MetaStore service has been received.
[0139] In this embodiment, the target MetaStore service remote access link includes either the first MetaStore service remote access link or the second MetaStore service remote access link, and the data calculation request carries a query data identifier.
[0140] If so, extract the query data identifier from the data calculation request.
[0141] In this embodiment, the query data identifier can be extracted from the data calculation request by extracting information from the data calculation request.
[0142] Retrieve the target data table data corresponding to the query data identifier from the MySQL database.
[0143] In this embodiment, the target metadata in the MySQL database can be accessed via a remote access link using the target MetaStore service, and then the target metadata can be queried based on the query data identifier to obtain the target data table data mentioned above.
[0144] The target data table data is loaded into a preset memory for calculation and processing to obtain the corresponding calculation results.
[0145] In this embodiment, the aforementioned memory may refer to the program memory within the electronic device. By loading the target data table data into a preset memory for computation, the computation speed can be improved, thereby quickly obtaining the corresponding computation results.
[0146] The calculation results are then pushed to the user.
[0147] This application determines whether a user-triggered data calculation request corresponding to a remote access link to a target MetaStore service has been received. If so, it extracts the query data identifier from the data calculation request; then, it retrieves the target data table data corresponding to the query data identifier from the MySQL database; subsequently, it loads the target data table data into a preset memory for calculation processing to obtain the corresponding calculation result; and finally, it pushes the calculation result to the user. Upon receiving a user-triggered data calculation request corresponding to a remote access link to a target MetaStore service, this application intelligently retrieves the target data table data corresponding to the query data identifier carried in the data calculation request from the MySQL database, loads the target data table data into a preset memory for calculation processing, and pushes the obtained calculation result to the user. This approach, based on the use of the MySQL database and memory, rapidly completes the processing of data calculation requests, improving the processing efficiency and intelligence of data calculation requests.
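The request-handling flow above can be sketched end to end. To keep the example runnable without a database server, Python's built-in `sqlite3` stands in for the MySQL database; the table layout, the request fields, and the sum/count calculation are hypothetical, and returning the result stands in for pushing it to the user.

```python
import sqlite3

# Link names follow the examples in the text; the set itself is an assumption.
VALID_LINKS = {"MetaStoreService-1", "MetaStoreService-2"}


def handle_request(conn, request):
    """Process one data calculation request; return None if it targets no known link."""
    if request.get("link") not in VALID_LINKS:
        return None
    # Extract the query data identifier carried by the request.
    query_id = request["query_data_id"]
    # Retrieve the matching table data from the database.
    rows = conn.execute(
        "SELECT value FROM table_data WHERE data_id = ?", (query_id,)
    ).fetchall()
    # Load the data into process memory and compute on it there.
    values = [row[0] for row in rows]
    return {"data_id": query_id, "sum": sum(values), "count": len(values)}


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute("CREATE TABLE table_data (data_id TEXT, value REAL)")
conn.executemany(
    "INSERT INTO table_data VALUES (?, ?)",
    [("orders", 10.0), ("orders", 5.5), ("claims", 2.0)],
)
result = handle_request(conn, {"link": "MetaStoreService-2", "query_data_id": "orders"})
print(result)
```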
[0148] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
[0149] It should be emphasized that, to further ensure the privacy and security of the aforementioned metadata information, the metadata information can also be stored in a node of a blockchain.
[0150] The blockchain referred to in this application is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. Essentially, a blockchain is a decentralized database, a chain of data blocks linked together using cryptographic methods. Each data block contains information about a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and generate the next block. A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
[0151] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
[0152] Foundational technologies for artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly encompass computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning / deep learning.
[0153] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing related hardware with computer-readable instructions. These computer-readable instructions can be stored in a computer-readable storage medium. When executed, the program can include the processes of the embodiments of the above methods. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, optical disk, or read-only memory (ROM), or random access memory (RAM).
[0154] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.
[0155] With further reference to Figure 3, as an implementation of the method shown in Figure 2 above, this application provides an embodiment of a cluster deployment device. This device embodiment corresponds to the method embodiment shown in Figure 2, and the device can be applied to various electronic devices.
[0156] As shown in Figure 3, the cluster deployment device 300 described in this embodiment includes: a first construction module 301, a second construction module 302, a deployment module 303, a first processing module 304, a second processing module 305, a third processing module 306, and a fourth processing module 307. Wherein:
[0157] The first construction module 301 is used to set up the first management node group of the preset data lake cluster;
[0158] The second construction module 302 is used to construct the first data storage node group of the data lake cluster.
[0159] Deployment module 303 is used to deploy a pre-defined MySQL database;
[0160] The first processing module 304 is used to configure a remote connection service corresponding to the MySQL database on the physical machine of the second data storage node group of the original production cluster, and to store the first metadata information in the second data storage node group into the MySQL database.
[0161] The second processing module 305 is used to determine the first MetaStore service corresponding to all first node groups of the original production cluster, and to establish a connection between the first MetaStore service and the MySQL database; wherein, the first node group includes the second data storage node group and the second management node group of the original production cluster;
[0162] The third processing module 306 is used to configure a remote connection service corresponding to the MySQL database on the physical machine of the first data storage node group of the data lake cluster, and to store the second metadata information in the first data storage node group into the MySQL database.
[0163] The fourth processing module 307 is used to determine the second MetaStore service corresponding to all second node groups of the data lake cluster, and to establish a connection between the second MetaStore service and the MySQL database; wherein, the second node group includes the first data storage node group and the first management node group.
[0164] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the cluster deployment method in the aforementioned implementation method, and will not be repeated here.
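The source does not name a specific MetaStore implementation. Assuming it is a Hive Metastore, the arrangement handled by processing modules 305 and 307 (two separate MetaStore services backed by one shared MySQL database) could be configured roughly as below. The property keys are standard Hive configuration names; the host names, database name, and helper function are hypothetical.

```python
def metastore_properties(mysql_host, database, thrift_host, thrift_port=9083):
    """Build Hive-style settings for one MetaStore service (assumes Hive Metastore)."""
    return {
        # JDBC connection to the shared MySQL database that holds the metadata.
        "javax.jdo.option.ConnectionURL": f"jdbc:mysql://{mysql_host}:3306/{database}",
        "javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
        # Address at which clients of this cluster reach the MetaStore service.
        "hive.metastore.uris": f"thrift://{thrift_host}:{thrift_port}",
    }


# Hypothetical hosts: one MetaStore service per cluster, one shared database.
first_service = metastore_properties(
    "mysql.example.internal", "metastore", "prod-metastore.example.internal"
)
second_service = metastore_properties(
    "mysql.example.internal", "metastore", "lake-metastore.example.internal"
)
for key, value in second_service.items():
    print(f"{key} = {value}")
```

Both services point at the same connection URL but expose different service addresses, which mirrors the idea of isolated nodes with shared metadata.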
[0165] In some optional implementations of this embodiment, the first construction module 301 includes:
[0166] The first acquisition submodule is used to acquire the first performance threshold corresponding to the preset first computing performance requirement;
[0167] The second acquisition submodule is used to acquire the first physical unit whose computing performance is greater than the first performance threshold.
[0168] The first construction submodule is used to build the first management node group of the data lake cluster based on the first physical unit.
[0169] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the cluster deployment method in the aforementioned implementation method, and will not be repeated here.
[0170] In some optional implementations of this embodiment, the second construction module 302 includes:
[0171] The third acquisition submodule is used to acquire the second performance threshold corresponding to the preset second computing performance requirement;
[0172] The fourth acquisition submodule is used to acquire the second physical unit whose computing performance is less than the second performance threshold;
[0173] The second construction submodule is used to build the first data storage node group of the data lake cluster based on the second physical unit.
[0174] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the cluster deployment method in the aforementioned implementation method, and will not be repeated here.
[0175] In some optional implementations of this embodiment, the second processing module 305 includes:
[0176] The fifth acquisition submodule is used to acquire the deployment component version corresponding to the first node group of the original production cluster;
[0177] The evaluation submodule is used to evaluate the version of the deployed component and obtain the corresponding evaluation result;
[0178] The first determining submodule is used to determine the corresponding designated MetaStore service based on the evaluation results;
[0179] The second determining submodule is used to designate the specified MetaStore service as the first MetaStore service.
[0180] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the cluster deployment method in the aforementioned implementation method, and will not be repeated here.
[0181] In some optional implementations of this embodiment, the cluster deployment apparatus further includes:
[0182] The first acquisition module is used to acquire the preset mapping service construction strategy;
[0183] The construction module is used to construct a mapping service between the first management node group and the first data storage node group based on the mapping service construction strategy.
[0184] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the cluster deployment method in the aforementioned implementation method, and will not be repeated here.
[0185] In some optional implementations of this embodiment, the cluster deployment apparatus further includes:
[0186] The second acquisition module is used to acquire the remote access link number of the first MetaStore service corresponding to the first MetaStore service;
[0187] The first setting module is used to set the remote access link of the first MetaStore service for all first node groups of the original production cluster in the MySQL database based on the remote access link number of the first MetaStore service.
[0188] The third acquisition module is used to acquire the remote access link number of the second MetaStore service corresponding to the second MetaStore service;
[0189] The second setting module is used to set the remote access link of the second MetaStore service for all second node groups of the data lake cluster in the MySQL database based on the remote access link number of the second MetaStore service.
[0190] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the cluster deployment method in the aforementioned implementation method, and will not be repeated here.
[0191] In some optional implementations of this embodiment, the cluster deployment apparatus further includes:
[0192] The judgment module is used to determine whether a user-triggered data calculation request corresponding to a target MetaStore service remote access link has been received; wherein the target MetaStore service remote access link includes the first MetaStore service remote access link or the second MetaStore service remote access link, and the data calculation request carries a query data identifier;
[0193] The extraction module is used to extract the query data identifier from the data calculation request if such a request has been received;
[0194] The fourth acquisition module is used to acquire target data table data corresponding to the query data identifier from the MySQL database;
[0195] The calculation module is used to load the target data table data into a preset memory for calculation processing to obtain the corresponding calculation results;
[0196] The push module is used to push the calculation results to the user.
[0197] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the cluster deployment method in the aforementioned implementation method, and will not be repeated here.
[0198] To address the aforementioned technical problems, embodiments of this application also provide a computer device. Please refer to Figure 4, which is a basic structural block diagram of the computer device in this embodiment.
[0199] The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are interconnected via a system bus. It should be noted that only the computer device 4 with components 41-43 is shown in the figure; however, it should be understood that it is not required to implement all the shown components, and more or fewer components can be implemented alternatively. Those skilled in the art will understand that the computer device described here is a device capable of automatically performing numerical calculations and / or information processing according to pre-set or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.
[0200] The computer device can be a desktop computer, laptop, handheld computer, or cloud server, etc. The computer device can interact with the user via a keyboard, mouse, remote control, touchpad, or voice control.
[0201] The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as the hard disk or memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the computer device 4. Of course, the memory 41 may also include both the internal storage unit and its external storage device of the computer device 4. In this embodiment, the memory 41 is typically used to store the operating system and various application software installed on the computer device 4, such as computer-readable instructions for cluster deployment methods. In addition, the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
[0202] In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or other data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is used to execute computer-readable instructions stored in the memory 41 or to process data, for example, to execute computer-readable instructions for the cluster deployment method.
[0203] The network interface 43 may include a wireless network interface or a wired network interface, which is typically used to establish communication connections between the computer device 4 and other electronic devices.
[0204] Compared with the prior art, the embodiments of this application have the following main advantages:
[0205] In this embodiment, a first management node group and a first data storage node group are constructed based on the concept of storage-computing classification. This achieves isolated deployment of the management node group and data storage node group of the data lake cluster, effectively addressing the risks posed by small files in the data lake cluster and reducing the memory pressure on the management node group caused by a large number of small files, thereby improving the stability of the production cluster. Furthermore, by using a pre-configured MySQL database, metadata services are deployed simultaneously, enabling metadata information exchange between the data lake cluster and the existing production cluster. This achieves node isolation and independence, but with shared and interoperable metadata, thus isolating the production risks of the data lake cluster within a small scope and effectively ensuring the availability and stability of the cluster production.
[0206] This application also provides another embodiment, namely, providing a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor to cause the at least one processor to perform the steps of the cluster deployment method described above.
[0207] Compared with the prior art, the embodiments of this application have the following main advantages:
[0208] In this embodiment, a first management node group and a first data storage node group are constructed based on the concept of storage-computing classification. This achieves isolated deployment of the management node group and data storage node group of the data lake cluster, effectively addressing the risks posed by small files in the data lake cluster and reducing the memory pressure on the management node group caused by a large number of small files, thereby improving the stability of the production cluster. Furthermore, by using a pre-configured MySQL database, metadata services are deployed simultaneously, enabling metadata information exchange between the data lake cluster and the existing production cluster. This achieves node isolation and independence, but with shared and interoperable metadata, thus isolating the production risks of the data lake cluster within a small scope and effectively ensuring the availability and stability of the cluster production.
[0209] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk), and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of this application.
[0210] Obviously, the embodiments described above are only some of the embodiments of this application, not all of them. The accompanying drawings show preferred embodiments of this application but do not limit its patent scope. This application can be implemented in many different forms; these embodiments are provided so that the disclosure of this application can be understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent substitutions for some of the technical features. Any equivalent structure made using the content of the specification and drawings of this application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.
Claims
1. A cluster deployment method, characterized in that it comprises the following steps:
building a first management node group of a preset data lake cluster;
building a first data storage node group of the data lake cluster;
deploying a preset MySQL database;
configuring a remote connection service corresponding to the MySQL database on the physical machine of a second data storage node group of an original production cluster, and storing first metadata information in the second data storage node group into the MySQL database;
determining a first MetaStore service corresponding to all first node groups of the original production cluster, and establishing a connection between the first MetaStore service and the MySQL database, wherein the first node group includes the second data storage node group and a second management node group of the original production cluster;
configuring a remote connection service corresponding to the MySQL database on the physical machine of the first data storage node group of the data lake cluster, and storing second metadata information in the first data storage node group into the MySQL database;
determining a second MetaStore service corresponding to all second node groups of the data lake cluster, and establishing a connection between the second MetaStore service and the MySQL database, wherein the second node group includes the first data storage node group and the first management node group;
wherein the step of building the first management node group of the preset data lake cluster specifically comprises:
obtaining a first performance threshold corresponding to a preset first computing performance requirement;
obtaining a first physical unit whose computing performance is greater than the first performance threshold; and
building the first management node group of the data lake cluster based on the first physical unit.
2. The cluster deployment method according to claim 1, characterized in that the step of building the first data storage node group of the data lake cluster specifically comprises:
obtaining a second performance threshold corresponding to a preset second computing performance requirement;
obtaining a second physical unit whose computing performance is less than the second performance threshold; and
building the first data storage node group of the data lake cluster based on the second physical unit.
3. The cluster deployment method according to claim 1, characterized in that the step of determining the first MetaStore service corresponding to all first node groups of the original production cluster specifically comprises:
obtaining a deployment component version corresponding to the first node group of the original production cluster;
evaluating the deployment component version to obtain an evaluation result;
determining a corresponding designated MetaStore service based on the evaluation result; and
taking the designated MetaStore service as the first MetaStore service.
4. The cluster deployment method according to claim 1, characterized in that, after the step of building the first data storage node group of the data lake cluster, the method further comprises:
obtaining a preset mapping service construction strategy; and
constructing, based on the mapping service construction strategy, a mapping service between the first management node group and the first data storage node group.
5. The cluster deployment method according to claim 1, characterized in that, after the step of determining the second MetaStore service corresponding to all second node groups of the data lake cluster and establishing a connection between the second MetaStore service and the MySQL database, the method further comprises:
obtaining a first MetaStore service remote access link number corresponding to the first MetaStore service;
setting, in the MySQL database and based on the first MetaStore service remote access link number, a first MetaStore service remote access link for all first node groups of the original production cluster;
obtaining a second MetaStore service remote access link number corresponding to the second MetaStore service; and
setting, in the MySQL database and based on the second MetaStore service remote access link number, a second MetaStore service remote access link for all second node groups of the data lake cluster.
6. The cluster deployment method according to claim 5, characterized in that, after the step of setting, in the MySQL database and based on the second MetaStore service remote access link number, the second MetaStore service remote access link for all second node groups of the data lake cluster, the method further comprises:
determining whether a data calculation request triggered by a user through a target MetaStore service remote access link has been received, wherein the target MetaStore service remote access link is either the first MetaStore service remote access link or the second MetaStore service remote access link, and the data calculation request carries a query data identifier;
if so, extracting the query data identifier from the data calculation request;
retrieving target data table data corresponding to the query data identifier from the MySQL database;
loading the target data table data into a preset memory for calculation and processing to obtain a corresponding calculation result; and
pushing the calculation result to the user.
7. A cluster deployment device, characterized in that it comprises:
a first construction module, used to build a first management node group of a preset data lake cluster;
a second construction module, used to build a first data storage node group of the data lake cluster;
a deployment module, used to deploy a preset MySQL database;
a first processing module, used to configure a remote connection service corresponding to the MySQL database on the physical machine of a second data storage node group of an original production cluster, and to store first metadata information in the second data storage node group into the MySQL database;
a second processing module, used to determine a first MetaStore service corresponding to all first node groups of the original production cluster, and to establish a connection between the first MetaStore service and the MySQL database, wherein the first node group includes the second data storage node group and a second management node group of the original production cluster;
a third processing module, used to configure a remote connection service corresponding to the MySQL database on the physical machine of the first data storage node group of the data lake cluster, and to store second metadata information in the first data storage node group into the MySQL database; and
a fourth processing module, used to determine a second MetaStore service corresponding to all second node groups of the data lake cluster, and to establish a connection between the second MetaStore service and the MySQL database, wherein the second node group includes the first data storage node group and the first management node group;
wherein the first construction module comprises:
a first acquisition submodule, used to obtain a first performance threshold corresponding to a preset first computing performance requirement;
a second acquisition submodule, used to obtain a first physical unit whose computing performance is greater than the first performance threshold; and
a first construction submodule, used to build the first management node group of the data lake cluster based on the first physical unit.
8. A computer device comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the steps of the cluster deployment method as described in any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the cluster deployment method as described in any one of claims 1 to 6.