Backup method and apparatus, computer device, readable storage medium, and program product
By configuring three disaster recovery groups and an arbitration node in the vBRAS system, master control switching is achieved, which solves the problem of large-scale network failures caused by CP failures in traditional vBRAS backup methods, and improves the reliability of user access and the stability of the system.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- CHINA TELECOM CORP LTD TECHNOLOGY INNOVATION CENTER
- Filing Date
- 2025-10-16
- Publication Date
- 2026-07-02
AI Technical Summary
In traditional vBRAS backup methods, the reliability of the CP is guaranteed by a 1:1 hot standby method in different data centers. This results in the primary and backup CPs being deployed on different core nodes. If one CP fails, the other CP will have problems, causing a large-scale network failure and low reliability of user access.
A backup system is adopted, with three disaster recovery groups configured. Each disaster recovery group uses one of the three CPs as the primary CP and the other two as backup CPs. When the primary CP fails, the arbitration node performs a master switch and selects a new primary CP to carry the UP pool, thereby achieving fault domain isolation and user load balancing.
Normal user access is maintained in extreme failure scenarios, which improves CP availability and the reliability of user access and limits the scope of a failure.
Abstract
Description
Backup methods, devices, computer equipment, readable storage media, and program products
[0001] Related applications
[0002] This application claims priority to Chinese patent application filed on December 23, 2024, with application number 202411905824.3, entitled "Backup Method, Apparatus, Computer Equipment, Readable Storage Medium and Program Product", the entire contents of which are incorporated herein by reference.
Technical Field
[0003] This application relates to the field of data communication technology, and in particular to a backup method, apparatus, computer equipment, computer-readable storage medium, and computer program product.
Background Technology
[0004] The virtual broadband remote access server (vBRAS) with separate control and forwarding is an important service access network element in the metropolitan area network. The vBRAS control plane (vBRAS-CP) is responsible for user access and centralized management and control, while the vBRAS user plane (vBRAS-UP) is responsible for traffic forwarding.
[0005] In traditional backup scenarios, two disaster recovery groups are typically configured. Disaster recovery group 1 contains CP1 (primary) and CP2 (backup), and manages UP pool 1. Disaster recovery group 2 contains CP1 (backup) and CP2 (primary), and manages UP pool 2. However, in this configuration, the reliability of the CPs is guaranteed by a 1:1 hot standby method across different data centers, with the primary and backup CPs generally deployed on different core nodes. Since the number of users supported by each CP is typically in the millions, if one CP fails, the other will also experience problems, leading to widespread network failures and low reliability for user access.
Summary of the Invention
[0006] Therefore, it is necessary to provide a backup method, apparatus, computer equipment, computer-readable storage medium, and computer program product to address the technical problems of the above methods, which are prone to causing large-scale network failures and have low reliability of user access.
[0007] This application provides a backup method in a first aspect, applied to a backup system. The backup system includes three CPs and three UP pools, configured with three disaster recovery groups. Each disaster recovery group has one CP as the primary CP and the other two as backup CPs. The primary CP in each disaster recovery group corresponds to one UP pool. The method includes: in the event of a failure of the current primary CP in a target disaster recovery group, receiving a master control switch request sent by at least one backup CP in the target disaster recovery group, wherein the target disaster recovery group is any one of the three disaster recovery groups; determining a new primary CP from the at least one backup CP according to the master control switch request; and sending the determination result to the new primary CP so that the new primary CP carries the UP pool corresponding to the target disaster recovery group.
[0008] In one embodiment, each CP deploys three user management virtual machines, and each user management virtual machine establishes a binding relationship with three UP pools respectively; each UP pool is actually carried by a user management virtual machine with a binding relationship in the primary CP of the corresponding disaster recovery group; the method further includes: sending a determination result to a target user management virtual machine in the new primary CP, so that the target user management virtual machine carries the UP pool corresponding to the target disaster recovery group; the target user management virtual machine is a user management virtual machine in the new primary CP that has established a binding relationship with the UP pool corresponding to the target disaster recovery group.
[0009] In one embodiment, the three CPs corresponding to each disaster recovery group are connected by a heartbeat; the method further includes: when at least one backup CP in the target disaster recovery group detects that the interruption duration of the heartbeat connection with the current primary CP exceeds a threshold, the current primary CP is determined to be faulty.
[0010] In one embodiment, the method further includes: upon receiving a master control switching request sent by two backup CPs in the target disaster recovery group, obtaining the current network status of each backup CP; and determining a new master CP from the two backup CPs based on the current network status of each backup CP.
[0011] In one embodiment, the method further includes: acquiring historical fault data of each backup CP; and determining a new primary CP from the two backup CPs based on the current network status of each backup CP and the historical fault data.
[0012] In one embodiment, the method is applied to an arbitration node in the backup system, the arbitration node being IP-connected to the three UP pools and the three CPs.
[0013] In a second aspect, this application also provides a backup device applied to a backup system, the backup system including three CPs and three UP pools, configured with three disaster recovery groups, each disaster recovery group having one CP as the primary CP and the other two CPs as backup CPs, the primary CP in each disaster recovery group corresponding to one UP pool, the device including: a receiving module, used to receive a master control switch request sent by at least one backup CP in the target disaster recovery group in the event of a failure of the current primary CP in the target disaster recovery group; the target disaster recovery group is any one of the three disaster recovery groups; a determining module, used to determine a new primary CP from the at least one backup CP according to the master control switch request; and a sending module, used to send the determination result to the new primary CP, so that the new primary CP carries the UP pool corresponding to the target disaster recovery group.
[0014] In one embodiment, each CP deploys three user management virtual machines, and each user management virtual machine establishes a binding relationship with three UP pools respectively; each UP pool is actually carried by a user management virtual machine with a binding relationship in the primary CP of the corresponding disaster recovery group; the sending module is further configured to: send a determination result to a target user management virtual machine in the new primary CP, so that the target user management virtual machine carries the UP pool corresponding to the target disaster recovery group, wherein the target user management virtual machine is a user management virtual machine in the new primary CP that has established a binding relationship with the UP pool corresponding to the target disaster recovery group.
[0015] In one embodiment, the three CPs corresponding to each disaster recovery group are connected by a heartbeat; the device also includes a detection module for determining that the current primary CP is faulty when at least one backup CP in the target disaster recovery group detects that the interruption duration of the heartbeat connection with the current primary CP exceeds a threshold.
[0016] In one embodiment, the determining module is further configured to: upon receiving a master control switching request sent by two backup CPs in the target disaster recovery group, obtain the current network status of each backup CP; and determine a new master CP from the two backup CPs based on the current network status of each backup CP.
[0017] In one embodiment, the determining module is further configured to: acquire historical fault data of each backup CP; and determine a new primary CP from the two backup CPs based on the current network status of each backup CP and the historical fault data.
[0018] In one embodiment, the above-described device is applied to an arbitration node in the backup system, the arbitration node being IP-connected to the three UP pools and the three CPs.
[0019] In a third aspect, this application also provides a computer device. The computer device includes a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, performs the following steps: in the event of a failure of the current primary CP in a target disaster recovery group, receiving a master control switch request sent by at least one backup CP in the target disaster recovery group, wherein the target disaster recovery group is any one of three disaster recovery groups; determining a new primary CP from the at least one backup CP according to the master control switch request; and sending the determination result to the new primary CP, so that the new primary CP carries the UP pool corresponding to the target disaster recovery group.
[0020] This application also provides a non-volatile computer-readable storage medium in a fourth aspect. The computer-readable storage medium stores a computer program thereon, which, when executed by a processor, causes the processor to perform the following steps: in the event of a failure of the current primary CP in a target disaster recovery group, receiving a master control switch request sent by at least one backup CP in the target disaster recovery group, wherein the target disaster recovery group is any one of three disaster recovery groups; determining a new primary CP from the at least one backup CP according to the master control switch request; and sending the determination result to the new primary CP, so that the new primary CP carries the UP pool corresponding to the target disaster recovery group.
[0021] This application also provides a computer program product in a fifth aspect. The computer program product includes a computer program that, when executed by a processor, causes the processor to perform the following steps: in the event of a failure of the current primary CP in a target disaster recovery group, receiving a master control switch request sent by at least one backup CP in the target disaster recovery group, wherein the target disaster recovery group is any one of three disaster recovery groups; determining a new primary CP from the at least one backup CP according to the master control switch request; and sending the determination result to the new primary CP, so that the new primary CP carries the UP pool corresponding to the target disaster recovery group.
[0022] The aforementioned backup method, apparatus, computer equipment, storage medium, and computer program product, in the event of a failure of the current primary CP in the target disaster recovery group, receive a master control switch request sent by at least one backup CP in the target disaster recovery group, wherein the target disaster recovery group is any one of three disaster recovery groups; based on the master control switch request, determine a new primary CP from the at least one backup CP, and send the determination result to the new primary CP, so that the new primary CP carries the UP pool corresponding to the target disaster recovery group. This method, in the event of a failure of the current primary CP in the target disaster recovery group, ensures normal user access even if two CPs fail, by enabling available backup CPs to send master control switch requests to select a new primary CP, greatly improving the availability of CPs in extreme failure scenarios and enhancing the reliability of user access.
Attached Figure Description
[0023] To more clearly illustrate the technical solutions in the embodiments of this application or related technologies, the accompanying drawings used in the description of the embodiments of this application or related technologies will be briefly introduced below. Obviously, the drawings described below are merely some embodiments of this application. For those skilled in the art, other related drawings can be obtained based on these drawings without any creative effort.
[0024] Figure 1 is a schematic diagram of the backup method in the prior art.
[0025] Figure 2 is a flowchart illustrating the backup method in one embodiment of this application.
[0026] Figure 3 is a schematic diagram of the improved backup method in one embodiment of this application.
[0027] Figure 4 is a structural block diagram of the backup device in one embodiment of this application.
[0028] Figure 5 is an internal structural diagram of a computer device according to an embodiment of this application.
Detailed Implementation
[0029] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0030] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in sequences other than those illustrated or described herein.
[0031] vBRAS is an important service access network element in metropolitan area networks, consisting of a control plane and a forwarding plane, and is deployed using a separation of control and forwarding. The control plane is the vBRAS Control Plane, abbreviated as CP; the forwarding plane is the vBRAS User Plane, abbreviated as UP.
[0032] In traditional scenarios, the division and connection relationship of CP disaster recovery groups are shown in Figure 1. Two disaster recovery groups are configured: Disaster Recovery Group 1 contains CP1 (primary) and CP2 (backup), and manages UP pool 1; Disaster Recovery Group 2 contains CP1 (backup) and CP2 (primary), and manages UP pool 2. In this configuration mode, each UP pool corresponds to only one disaster recovery group, i.e., only one CP primary-backup relationship, ensuring that users of a single UP pool are not distributed across different CPs, avoiding problems such as data fragmentation and management difficulties. The backup CP backs up the user information of the primary CP in real time.
[0033] However, in this approach, the reliability of the CP is guaranteed by a 1:1 hot standby configuration across different data centers, with the primary and backup CPs typically deployed on different core nodes. Each CP typically supports millions of users. In the extreme scenario where both core nodes fail (dual CP failure), all user data would be lost, requiring all users to redial and causing a widespread network outage.
[0034] Therefore, to reduce the impact of failures in extreme scenarios, this application proposes a control plane (CP) multi-active scheme. By dividing the virtual machines within the CP, it achieves fault domain isolation of the UP pools and user load balancing. It also combines data nodes and an arbitration node to realize primary / backup arbitration and data synchronization between CP nodes in the event of a failure, which greatly improves the reliability of user access in the extreme scenario where two data centers (DCs) fail.
[0035] In one embodiment, as shown in Figure 2, a backup method is provided for a backup system. The backup system includes three CPs and three UP pools, and is configured with three disaster recovery groups. Each disaster recovery group uses one of the three CPs as the primary CP and the other two as backup CPs. The primary CP in each disaster recovery group corresponds to one UP pool. In this embodiment, the method includes the following steps S210 to S230.
[0036] Step S210: In the event of a failure of the current primary CP in the target disaster recovery group, receive a master control switch request sent by at least one backup CP in the target disaster recovery group.
[0037] The target disaster recovery group is any one of the three disaster recovery groups.
[0038] In the specific implementation, the backup system includes three CPs and three UP pools, which can form three disaster recovery groups. For each disaster recovery group, the three CPs are configured as one primary CP and two backup CPs, and each disaster recovery group corresponds to one UP pool.
[0039] For example, let the three CPs be CP1, CP2, and CP3, and the three UP pools be UP pool 1, UP pool 2, and UP pool 3, then we have:
[0040] Within Disaster Recovery Group 1, CP1 is the primary CP, while CP2 and CP3 are backup CPs. The primary CP1 carries UP pool 1.
[0041] Within Disaster Recovery Group 2, CP2 is the primary CP, while CP1 and CP3 are backup CPs. The primary CP2 carries UP pool 2.
[0042] Within Disaster Recovery Group 3, CP3 is the primary CP, while CP1 and CP2 are backup CPs. The primary CP3 carries UP pool 3.
[0043] Under normal circumstances, within disaster recovery group 1, primary CP1 sends real-time hot standby user data to backup CP2 and backup CP3 via the disaster recovery group channel. Within disaster recovery group 2, primary CP2 sends real-time hot standby user data to backup CP1 and backup CP3 via the disaster recovery group channel. Within disaster recovery group 3, primary CP3 sends real-time hot standby user data to backup CP1 and backup CP2 via the disaster recovery group channel.
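The three-group layout in the example above can be written down as a small data structure. This is a minimal sketch for illustration only: the class name `DisasterRecoveryGroup` and the helpers `build_groups` and `replication_targets` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

CPS = ("CP1", "CP2", "CP3")


@dataclass(frozen=True)
class DisasterRecoveryGroup:
    group_id: int
    primary: str               # CP that carries the group's UP pool
    backups: Tuple[str, ...]   # the other two CPs
    up_pool: str


def build_groups(cps: Tuple[str, ...] = CPS) -> List[DisasterRecoveryGroup]:
    """Group i uses the i-th CP as primary and the remaining CPs as backups."""
    groups = []
    for i, primary in enumerate(cps, start=1):
        backups = tuple(cp for cp in cps if cp != primary)
        groups.append(DisasterRecoveryGroup(i, primary, backups, f"UP pool {i}"))
    return groups


def replication_targets(groups: List[DisasterRecoveryGroup]) -> Dict[str, Tuple[str, ...]]:
    """Map each primary CP to the backup CPs that receive its hot-standby user data."""
    return {g.primary: g.backups for g in groups}


GROUPS = build_groups()
```

With this layout every CP is primary for exactly one group and backup for the other two, which is what spreads the user load across the three CPs.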
[0044] Within each disaster recovery group, each CP can monitor the status of the other CPs. When any backup CP in a disaster recovery group detects a failure of the primary CP, it can send a master control switchover request to the arbitration node of the backup system, allowing the arbitration node to arbitrate and determine a new primary CP. For example, in CP disaster recovery group 1, if backup CP2 and backup CP3 detect a failure of the primary CP, they will send a master control switchover request to the arbitration node of the backup system.
[0045] Step S220: Based on the master control switch request, determine the new primary CP from the at least one backup CP.
[0046] In practice, after the arbitration node of the backup system receives a master control switch request from at least one backup CP, it can determine a new master CP from at least one backup CP through preset rules, or arbitrarily select a backup CP as the new master CP.
[0047] It is understood that in some embodiments, when the primary CP and one of the backup CPs in the target disaster recovery group both fail, the arbitration node receives a master control switch request from only the remaining backup CP. In this case, that backup CP can be directly determined as the new primary CP. That is, when two CPs fail, the remaining CP is directly determined as the primary CP to take over all users in the corresponding UP pool.
[0048] Step S230: Send the determination result to the new primary CP so that the new primary CP can carry the UP pool corresponding to the target disaster recovery group.
[0049] In practice, after a new primary CP is determined, the determination result can be sent to the new primary CP to notify it to perform a primary / backup switch and carry the users of the UP pool corresponding to the target disaster recovery group.
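Steps S210 to S230 can be sketched from the arbitration node's point of view as follows. This is an illustrative sketch under stated assumptions, not the application's implementation: the class `ArbitrationNode`, its `pick` callback, and the `sent` list standing in for the message to the new primary CP are all hypothetical names, and the default selection rule (lowest CP name) is only a placeholder for the "preset rules" mentioned above.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ArbitrationNode:
    def __init__(self, primaries: Dict[int, str],
                 pick: Optional[Callable[[List[str]], str]] = None):
        self.primaries = dict(primaries)  # group id -> current primary CP
        # Placeholder preset rule (assumption): choose the lowest CP name.
        self.pick = pick or (lambda candidates: sorted(candidates)[0])
        self.sent: List[Tuple[str, int]] = []  # determination results "sent"

    def handle_switch_requests(self, group_id: int, requesters: List[str]) -> str:
        # S210: requests come from at least one backup CP of the target group.
        failed = self.primaries[group_id]
        candidates = sorted({cp for cp in requesters if cp != failed})
        if not candidates:
            raise ValueError("no backup CP requested a master control switch")
        # S220: a single requester is chosen directly (two-CP failure case);
        # otherwise the preset rule selects among the requesters.
        new_primary = candidates[0] if len(candidates) == 1 else self.pick(candidates)
        # S230: record the result and notify the new primary CP.
        self.primaries[group_id] = new_primary
        self.sent.append((new_primary, group_id))
        return new_primary
```

For example, with primaries `{1: "CP1", 2: "CP2", 3: "CP3"}`, requests from CP2 and CP3 for group 1 make CP2 the new primary under the placeholder rule, while a lone request from CP1 for group 3 promotes CP1 directly.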
[0050] In the aforementioned backup method, in the event of a failure of the current primary CP in the target disaster recovery group, a master control switch request is received from at least one backup CP in the target disaster recovery group, where the target disaster recovery group can be any one of the three disaster recovery groups. Based on the master control switch request, a new primary CP is determined from at least one backup CP, and the determination result is sent to the new primary CP so that the new primary CP can carry the UP pool corresponding to the target disaster recovery group. This method, in the event of a failure of the current primary CP in the target disaster recovery group, ensures normal user access even if two CPs fail, by enabling available backup CPs to send master control switch requests to select a new primary CP. This significantly improves the availability of CPs in extreme failure scenarios and enhances the reliability of user access.
[0051] In one exemplary embodiment, each CP is equipped with three user management virtual machines, and each user management virtual machine is bound to one of the three UP pools; each UP pool is actually carried by the bound user management virtual machine in the primary CP of the corresponding disaster recovery group.
[0052] The method further includes: sending the determination result to the target user management virtual machine in the new primary CP, so that the target user management virtual machine carries the UP pool corresponding to the target disaster recovery group; the target user management virtual machine is a user management virtual machine in the new primary CP that has established a binding relationship with the UP pool corresponding to the target disaster recovery group.
[0053] In some embodiments, each user management virtual machine is configured as one primary and one backup.
[0054] Specifically, as shown in Figure 3, each CP deploys three user management virtual machines (VMs), and each VM establishes a binding relationship with one of the three UP pools. Note that establishing a binding relationship does not mean that the VM actually carries the UP pool. The binding relationship only determines which VM is selected to carry the pool during a primary / backup switch. While a CP acts as a backup CP for a disaster recovery group, its bound VM does not carry that group's UP pool. In other words, each UP pool is actually carried by the bound VM within the primary CP of the corresponding disaster recovery group.
[0055] For example, suppose that within each CP, user management virtual machine 1, user management virtual machine 2, and user management virtual machine 3 are bound to UP pool 1, UP pool 2, and UP pool 3 respectively, then:
[0056] If CP1 is the primary CP in disaster recovery group 1, then virtual machine 1 within primary CP1 will host users of UP pool 1.
[0057] If CP2 is the primary CP in disaster recovery group 2, then virtual machine 2 within primary CP2 will host users of UP pool 2.
[0058] If CP3 is the primary CP in disaster recovery group 3, then virtual machine 3 within primary CP3 will host users of UP pool 3.
[0059] When the primary CP1 in disaster recovery group 1 fails and the backup CP2 switches to become the primary CP, virtual machine 1 within the new primary CP2 will host the users of UP pool 1. Similarly, when the primary CP3 in disaster recovery group 3 fails and the backup CP1 switches to become the primary CP, virtual machine 3 within the new primary CP1 will host the users of UP pool 3. In other words, each CP has a one-to-one binding relationship between its three virtual machines and three UP pools. The CP will use the corresponding virtual machine to host the corresponding UP pool depending on which disaster recovery group it serves as the primary CP for. For example, as the primary CP of disaster recovery group 1, virtual machine 1 will host UP pool 1; as the primary CP of disaster recovery group 2, virtual machine 2 will host UP pool 2; and as the primary CP of disaster recovery group 3, virtual machine 3 will host UP pool 3.
[0060] Therefore, the determination result is sent to the new primary CP, specifically to the target user management virtual machine in the new primary CP that has established a binding relationship with the UP pool corresponding to the target disaster recovery group, so that the target user management virtual machine can carry that UP pool.
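The binding rule described above (a binding exists in every CP, but only the bound virtual machine inside the current primary CP actually carries the pool) can be illustrated with a lookup table. This is a sketch under assumptions: the names `BINDINGS`, `carrying_vm`, and `active_assignments`, and the labels `VM1` to `VM3`, are hypothetical.

```python
from typing import Dict, Tuple

# In every CP, user management VM i is bound to UP pool i (binding only, no load).
BINDINGS: Dict[str, Dict[str, str]] = {
    cp: {f"UP pool {i}": f"VM{i}" for i in (1, 2, 3)}
    for cp in ("CP1", "CP2", "CP3")
}


def carrying_vm(primary_cp: str, up_pool: str,
                bindings: Dict[str, Dict[str, str]] = BINDINGS) -> Tuple[str, str]:
    """Return (CP, VM) that actually carries up_pool, given its group's primary CP."""
    return primary_cp, bindings[primary_cp][up_pool]


def active_assignments(primaries: Dict[str, str],
                       bindings: Dict[str, Dict[str, str]] = BINDINGS
                       ) -> Dict[str, Tuple[str, str]]:
    """primaries maps each UP pool to the primary CP of its disaster recovery group."""
    return {pool: carrying_vm(cp, pool, bindings) for pool, cp in primaries.items()}
```

After CP1 fails and CP2 becomes primary for disaster recovery group 1, calling `active_assignments({"UP pool 1": "CP2", "UP pool 2": "CP2", "UP pool 3": "CP3"})` places UP pool 1 on VM1 of CP2 while UP pool 2 stays on VM2 of CP2, so the two pools remain in separate virtual machines.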
[0061] In this embodiment, each CP deploys three user management virtual machines, and each user management virtual machine is bound to one of the three UP pools. Each UP pool is actually carried by the bound user management virtual machine in the primary CP of the corresponding disaster recovery group. Dividing the virtual machines within the CP achieves fault domain isolation of the UP pools, which limits the scope of a fault when one occurs, improves the utilization of each CP, and balances user load.
[0062] In an exemplary embodiment, the three CPs corresponding to each disaster recovery group are connected by a heartbeat; the method further includes: when at least one backup CP in the target disaster recovery group detects that the interruption duration of the heartbeat connection with the current primary CP exceeds a threshold, the current primary CP is determined to be faulty.
[0063] In the specific implementation, as shown in Figure 3, a heartbeat connection is established between each CP for real-time synchronization of CP working status. When the backup CP detects that the interruption duration of the heartbeat connection with the current primary CP exceeds the threshold, it can determine that the current primary CP is faulty and send a master control switchover request to the arbitration node of the backup system. The arbitration node then decides to promote the backup CP to the primary CP.
[0064] In this embodiment, a heartbeat connection is established between each pair of CPs, and heartbeat signals are sent periodically to keep their working status synchronized. This provides timely feedback on whether each CP is online. Through timely fault detection and automatic switching, the system can maintain normal operation when the primary CP fails, thereby improving the reliability of user access.
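The heartbeat-timeout check that a backup CP performs can be sketched as below. The application does not specify the threshold value or the heartbeat format, so the class `HeartbeatMonitor`, its `threshold_s` parameter, and the injectable `clock` are assumptions made for illustration.

```python
import time
from typing import Callable, Dict


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer CP."""

    def __init__(self, threshold_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold_s = threshold_s  # allowed interruption duration (assumed unit: seconds)
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def record(self, peer: str) -> None:
        """Call whenever a heartbeat from peer arrives."""
        self.last_seen[peer] = self.clock()

    def is_faulty(self, peer: str) -> bool:
        """True once the heartbeat interruption exceeds the threshold."""
        last = self.last_seen.get(peer)
        if last is None:
            return False  # assumption: a peer never heard from is not judged here
        return self.clock() - last > self.threshold_s
```

A backup CP would call `is_faulty` for the current primary CP and, when it returns true, send a master control switch request to the arbitration node. A monotonic clock is used so that wall-clock adjustments do not trigger false timeouts.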
[0065] In an exemplary embodiment, the method further includes: upon receiving a master control switching request sent by two backup CPs in the target disaster recovery group, obtaining the current network status of each backup CP; and determining a new master CP from the two backup CPs based on the current network status of each backup CP.
[0066] In practice, when master control switch requests are received from two backup CPs, a selection needs to be made between the two backup CPs. The selection criterion can be the current network status of each backup CP. The current network status can include the status of multiple evaluation indicators, such as the current network link status and CPU status. The network link status can include connectivity with other network devices, bandwidth utilization, packet loss rate, etc.; the CPU status can include CPU utilization, load, etc.
[0067] Specifically, a weight can be assigned to each evaluation indicator, and the scores of each evaluation indicator can be weighted to obtain a score for each backup CP. The backup CP with the highest score can then be selected as the new primary CP.
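The weighted scoring described above might look like the following sketch. The indicator names, the weights, and the convention that every indicator is pre-normalized to the range 0 to 1 with higher meaning better are all assumptions; the application only states that each indicator has a weight and that the highest-scoring backup CP is selected.

```python
from typing import Dict

# Hypothetical weights; each indicator is assumed normalized to [0, 1], higher is better.
WEIGHTS: Dict[str, float] = {
    "connectivity": 0.4,
    "bandwidth_headroom": 0.2,  # 1 - bandwidth utilization
    "packet_delivery": 0.2,     # 1 - packet loss rate
    "cpu_headroom": 0.2,        # 1 - CPU utilization
}


def network_score(indicators: Dict[str, float],
                  weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the evaluation indicators for one backup CP."""
    return sum(weights[name] * indicators[name] for name in weights)


def pick_new_primary(status_by_cp: Dict[str, Dict[str, float]],
                     weights: Dict[str, float] = WEIGHTS) -> str:
    """Select the backup CP with the highest score; ties go to the lowest CP name."""
    return max(sorted(status_by_cp),
               key=lambda cp: network_score(status_by_cp[cp], weights))
```

Converting "lower is better" metrics such as packet loss into headroom values keeps the weighted sum monotonic, so a larger score always means a healthier candidate.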
[0068] In this embodiment, the status of multiple backup CPs is evaluated by network status to select the primary CP, which can reduce the risk of secondary failures caused by improper selection, reduce the possibility of users being affected, and improve the security and stability of the system.
[0069] In an exemplary embodiment, the method further includes: acquiring historical fault data of each backup CP; and determining a new primary CP from the two backup CPs based on the current network status and historical fault data of each backup CP.
[0070] In practice, when selecting a primary CP from two backup CPs, in addition to the current network status of the two backup CPs, their historical fault data can also be considered. Historical fault data may include runtime, number of failures, and fault recovery time. The selection of a new primary CP is made by combining the current network status and historical fault data.
[0071] Specifically, a stability score can be determined for each backup CP based on historical failure data, and a performance score can be determined for each backup CP based on the current network status. A total score for each backup CP is obtained by weighting the stability and performance scores, and the backup CP with the higher total score is selected as the new primary CP.
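One way to combine a stability score with a performance score is sketched below. The application does not give a formula, so `stability_score` (which penalizes failures per 1000 hours of runtime and mean recovery time), the equal default weights in `total_score`, and all function names are hypothetical.

```python
from typing import Dict


def stability_score(uptime_hours: float, failures: int,
                    mean_recovery_minutes: float) -> float:
    """Hypothetical formula: 1.0 for a CP with no failures, lower as faults accumulate."""
    failures_per_1000h = failures / max(uptime_hours, 1.0) * 1000.0
    return 1.0 / (1.0 + failures_per_1000h + mean_recovery_minutes / 60.0)


def total_score(stability: float, performance: float,
                w_stability: float = 0.5, w_performance: float = 0.5) -> float:
    """Weighted combination of the stability and performance scores."""
    return w_stability * stability + w_performance * performance


def pick_by_total(candidates: Dict[str, Dict[str, float]]) -> str:
    """candidates maps CP name -> {'stability': ..., 'performance': ...}."""
    return max(sorted(candidates),
               key=lambda cp: total_score(candidates[cp]["stability"],
                                          candidates[cp]["performance"]))
```

Under these assumptions, a backup CP with a slightly lower performance score can still be selected if its fault history is much cleaner, which is the trade-off the combined score is meant to express.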
[0072] In this embodiment, by combining the current network status and historical fault data, the reliability and performance of the backup CP can be evaluated more comprehensively, thereby improving the accuracy of selecting a new primary CP.
[0073] In one embodiment, this application proposes a control plane (CP) multi-active scheme in which three CPs form a CP multi-active backup group, with each CP containing one data node. An arbitration node is configured within the CP multi-active backup group. During normal operation, user data is distributed to different CPs at the granularity of UP pools. Each CP deploys three user management virtual machines, each managing one UP pool. Under normal circumstances, UP pool 1 establishes CU channel connections with all three CPs. UP pool 1 is managed by user management virtual machines within the three CPs, and where it is actually hosted is determined by the initial device configuration. Meanwhile, the arbitration node determines the primary / backup status of the CPs within the multi-active backup group in real time. The data node within each CP ensures that the user data replicas on the backup CPs are consistent with those on the primary CP.
[0074] The arbitration node is connected to the three UP pools and the three CPs via IP, with the underlying network IP-reachable. When the user management virtual machine of the primary CP in the multi-active backup group fails, the user management virtual machine of a backup CP sends a master control switch request to the arbitration node. The arbitration node decides on the new primary based on the current network link status, CPU status, and other information of the backup CP, and returns the decision result to the user management virtual machine of the backup CP. The user management virtual machine in the backup CP then takes over the UP pool according to the decision result.
[0075] To help those skilled in the art understand the embodiments of this application, a further explanation is given below with reference to the example in Figure 3. As shown in Figure 3, there are three CPs and three UP pools. Each CP contains three user management virtual machines (typically, the virtual machines themselves run in a one-primary, one-backup configuration). UP pool 1 belongs to disaster recovery group 1 and is managed by virtual machine 1 in disaster recovery group 1; CP1 is the primary device, and CP2 and CP3 are backup devices. UP pool 2 belongs to disaster recovery group 2 and is managed by virtual machine 2 in disaster recovery group 2; CP2 is the primary device, and CP1 and CP3 are backup devices. UP pool 3 belongs to disaster recovery group 3 and is managed by virtual machine 3 in disaster recovery group 3; CP3 is the primary device, and CP1 and CP2 are backup devices.
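The Figure 3 layout can be written down as a small data structure. The sketch below is only one possible representation; the dictionary keys and the function name `build_groups` are assumptions for illustration.

```python
from typing import Dict, Sequence

CPS = ("CP1", "CP2", "CP3")


def build_groups(cps: Sequence[str] = CPS) -> Dict[int, dict]:
    """Build one disaster recovery group per CP, as in the Figure 3 example.

    Group i carries UP pool i on virtual machine i, with the i-th CP as
    primary and the remaining CPs as backups.
    """
    groups = {}
    for i, primary in enumerate(cps, start=1):
        groups[i] = {
            "up_pool": i,
            "vm": i,
            "primary": primary,
            "backups": [cp for cp in cps if cp != primary],
        }
    return groups
```

Because each CP is primary for exactly one group, the user load of the three UP pools is spread evenly across the three CPs during normal operation.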
[0076] Taking UP pool 1 as an example: all UPs in the pool establish CU channels with the three CPs. The CU channels carry control and user information between the CPs and the UPs. The CPs establish disaster recovery channels with each other, and the backup CPs synchronize user information from the primary CP in real time through these channels, so that the data replicas in the user data nodes of the three CPs remain consistent. CP2 and CP3 act as disaster recovery backups for CP1 and establish a disaster recovery group connection with CP1 (the disaster recovery group connection differs between disaster recovery groups, and it requires specifying the IP address of the peer device). The CPs configure heartbeat channels, establish heartbeat connections, and perform heartbeat detection to synchronize CP working status in real time. When the heartbeat connections between the primary CP and both backup CPs time out, a third-party arbitration node determines which backup CP should become the primary CP.
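The heartbeat timeout check can be sketched as below. This is a simplified model under stated assumptions: timestamps are passed in explicitly, a peer that has never been heard from is not treated as failed, and the primary is considered failed only when every backup monitor has timed out (matching the scenario in which both backup CPs lose the heartbeat). The class and function names are hypothetical.

```python
from typing import Dict, Iterable


class HeartbeatMonitor:
    """Tracks the last heartbeat that one CP has seen from each peer."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def record(self, peer: str, now: float) -> None:
        """Note that a heartbeat from `peer` arrived at time `now`."""
        self.last_seen[peer] = now

    def timed_out(self, peer: str, now: float) -> bool:
        """True if the last heartbeat from `peer` is older than the timeout."""
        last = self.last_seen.get(peer)
        if last is None:
            return False  # assumption: no heartbeat yet is not a failure
        return now - last > self.timeout_s


def primary_failed(backup_monitors: Iterable[HeartbeatMonitor],
                   primary: str, now: float) -> bool:
    """True when every backup CP's monitor has timed out on the primary."""
    monitors = list(backup_monitors)
    return bool(monitors) and all(m.timed_out(primary, now) for m in monitors)
```

Requiring agreement from both backups reduces the chance that a single broken link between one backup and the primary triggers an unnecessary switchover.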
[0077] In the event of a primary CP failure within disaster recovery group 1 (single CP failure scenario), if backup CP2 and backup CP3 detect an interruption in their heartbeat connection with the primary CP1, then the primary CP is confirmed to have failed. The arbitration node determines whether backup CP2 or backup CP3 will become the new primary CP (assuming backup CP2 is promoted to primary).
[0078] If CP2 fails after being promoted to the primary CP (a dual CP failure scenario), the remaining CP3 is directly promoted to the primary CP, and the users of UP pool 1 carried by CP2 are switched to virtual machine 1 of CP3 so that users are not dropped.
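The single and dual CP failure scenarios in the two paragraphs above can be traced with a short simulation. This is a sketch under assumptions: the class `DisasterRecoveryGroup` and the `choose` callback (standing in for the arbitration node's decision) are hypothetical, and the default callback simply takes the first candidate.

```python
from typing import Callable, List, Optional


class DisasterRecoveryGroup:
    """One disaster recovery group: a UP pool, its primary CP, and its backups."""

    def __init__(self, up_pool: int, primary: str, backups: List[str]) -> None:
        self.up_pool = up_pool
        self.primary: Optional[str] = primary
        self.backups = list(backups)

    def fail_primary(
        self, choose: Callable[[List[str]], str] = lambda cands: cands[0]
    ) -> Optional[str]:
        """Handle a failure of the current primary and return the new primary.

        With two backups left, `choose` plays the role of the arbitration
        node. With one backup left, that CP is promoted directly. With
        none left, the group has no primary and None is returned.
        """
        if not self.backups:
            self.primary = None
            return None
        if len(self.backups) > 1:
            new_primary = choose(self.backups)
        else:
            new_primary = self.backups[0]
        self.backups.remove(new_primary)
        self.primary = new_primary
        return new_primary
```

For disaster recovery group 1, the first call corresponds to CP1 failing and CP2 being selected, and the second call corresponds to CP2 failing and CP3 taking over UP pool 1.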
[0079] The proposed solution has the following advantages: (1) it balances the user load across the multi-active backup group and improves the utilization of each CP; (2) it improves CP availability in extreme failure scenarios and can cope with dual CP failures and dual DC (data center) power failures; (3) it limits the scope of the fault domain when a failure occurs.
[0080] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.
[0081] Based on the same inventive concept, this application also provides a backup device for implementing the backup method described above. The solution provided by this device is similar to the solution described in the above method; therefore, the specific limitations in one or more backup device embodiments provided below can be found in the limitations of the backup method described above, and will not be repeated here.
[0082] In one embodiment, as shown in Figure 4, a backup device is provided for use in a backup system. The backup system includes three CPs and three UP pools, and is configured with three disaster recovery groups. Each disaster recovery group uses one of the three CPs as the primary CP and the other two as backup CPs. The primary CP in each disaster recovery group corresponds to one UP pool. The device includes: a receiving module 410, a determining module 420, and a sending module 430, wherein:
[0083] The receiving module 410 is used to receive a master control switching request sent by at least one backup CP in the target disaster recovery group in the event of a failure of the current primary CP in the target disaster recovery group; the target disaster recovery group is any one of the three disaster recovery groups;
[0084] The determining module 420 is used to determine a new primary CP from the at least one backup CP based on the master control switching request; and
[0085] The sending module 430 is used to send the determination result to the new primary CP so that the new primary CP can carry the UP pool corresponding to the target disaster recovery group.
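The three modules can be pictured as three methods on one object. The sketch below is an assumption-laden illustration rather than the device of this embodiment: the `policy` and `transport` callbacks, the message format, and the class name `BackupDevice` are all hypothetical.

```python
from typing import Callable, Dict, List


class BackupDevice:
    """Sketch of the receiving (410), determining (420), and sending (430) modules."""

    def __init__(self,
                 policy: Callable[[List[str]], str],
                 transport: Callable[[str, dict], None]) -> None:
        self._policy = policy        # hypothetical: picks one CP among several candidates
        self._transport = transport  # hypothetical: delivers a message to a CP
        self._pending: Dict[int, List[str]] = {}

    def receive(self, group_id: int, backup_cp: str) -> None:
        """Receiving module: record a switching request from a backup CP."""
        self._pending.setdefault(group_id, []).append(backup_cp)

    def determine(self, group_id: int) -> str:
        """Determining module: choose the new primary CP among the requesters."""
        candidates = self._pending.pop(group_id, [])
        if not candidates:
            raise LookupError(f"no switching request for group {group_id}")
        if len(candidates) == 1:
            return candidates[0]
        return self._policy(candidates)

    def send(self, group_id: int, new_primary: str) -> None:
        """Sending module: deliver the determination result to the new primary CP."""
        self._transport(new_primary, {"group": group_id, "role": "primary"})
```

Keeping the selection policy and the transport as injected callbacks lets the same skeleton be exercised with a simple rule in tests and a richer rule, such as one based on network status, in deployment.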
[0086] In one embodiment, each CP deploys three user management virtual machines, and the three user management virtual machines establish binding relationships with the three UP pools respectively; each UP pool is actually carried by the user management virtual machine bound to it in the primary CP of the corresponding disaster recovery group; the sending module 430 is also used to send the determination result to the target user management virtual machine in the new primary CP, so that the target user management virtual machine carries the UP pool corresponding to the target disaster recovery group; the target user management virtual machine is the user management virtual machine in the new primary CP that has established a binding relationship with the UP pool corresponding to the target disaster recovery group.
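Locating the target user management virtual machine amounts to a lookup keyed by CP and UP pool. The sketch below assumes a naming scheme (`"CP3-MU1"`) and function names that do not appear in this application and are used only for illustration.

```python
from typing import Dict, Sequence, Tuple


def build_bindings(cps: Sequence[str],
                   pools: Sequence[int]) -> Dict[Tuple[str, int], str]:
    """Bind one user management VM in every CP to every UP pool.

    The VM naming scheme "<CP>-MU<pool>" is hypothetical.
    """
    return {(cp, pool): f"{cp}-MU{pool}" for cp in cps for pool in pools}


def target_vm(bindings: Dict[Tuple[str, int], str],
              new_primary: str, up_pool: int) -> str:
    """Return the VM in the new primary CP that is bound to the given UP pool."""
    return bindings[(new_primary, up_pool)]
```

Because the binding exists in every CP before any failure occurs, a switchover only changes which bound virtual machine actually carries the pool.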
[0087] In one embodiment, the three CPs corresponding to each disaster recovery group are connected by a heartbeat connection; the device further includes a detection module for determining that the current primary CP is faulty when at least one backup CP in the target disaster recovery group detects that the interruption duration of the heartbeat connection with the current primary CP exceeds a threshold.
[0088] In one embodiment, the determining module 420 is further configured to, upon receiving a master control switching request sent by two backup CPs in the target disaster recovery group, obtain the current network status of each backup CP; and determine a new master CP from the two backup CPs based on the current network status of each backup CP.
[0089] In one embodiment, the determining module 420 is further configured to acquire historical fault data of each backup CP; and determine a new primary CP from the two backup CPs based on the current network status and historical fault data of each backup CP.
[0090] In one embodiment, the device is applied to an arbitration node in the backup system, and the arbitration node is IP-connected to the three UP pools and the three CPs.
[0091] Each module in the aforementioned backup device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in the processor of a computer device in hardware form or independent of it, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
[0092] In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in Figure 5. The computer device includes a processor, memory, communication interface, display screen, and input device connected via a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with external terminals. Wireless communication can be achieved through Wi-Fi, mobile cellular networks, near-field communication (NFC), or other technologies. When the computer program is executed by the processor, it implements a backup method. The display screen of the computer device may be an LCD screen or an e-ink display screen. The input device of the computer device may be a touch layer covering the display screen, or buttons, a trackball, or a touchpad located on the casing of the computer device, or an external keyboard, touchpad, or mouse, etc.
[0093] Those skilled in the art will understand that the structure shown in Figure 5 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or may combine certain components, or may have different component arrangements.
[0094] In one embodiment of this application, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above-described method embodiments.
[0095] In one embodiment of this application, a non-volatile computer-readable storage medium is also provided, on which a computer program is stored, which, when executed by a processor, causes the processor to perform the steps in the above-described method embodiments.
[0096] In one embodiment of this application, a computer program product is also provided, including a computer program that, when executed by a processor, causes the processor to perform the steps in the above-described method embodiments.
[0097] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data shall comply with the relevant laws, regulations and standards of the relevant countries and regions.
[0098] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.
[0099] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0100] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
Claims
1. A backup method applied to a backup system, the backup system comprising three CPs and three UP pools and being configured with three disaster recovery groups, each disaster recovery group having one of the three CPs as a primary CP and the other two CPs as backup CPs, the primary CP in each disaster recovery group corresponding to one UP pool, the method comprising: in the event of a failure of the current primary CP in a target disaster recovery group, receiving a master control switching request sent by at least one backup CP in the target disaster recovery group, wherein the target disaster recovery group is any one of the three disaster recovery groups; determining, based on the master control switching request, a new primary CP from the at least one backup CP; and sending a determination result to the new primary CP, so that the new primary CP carries the UP pool corresponding to the target disaster recovery group.
2. The method according to claim 1, wherein each CP is deployed with three user management virtual machines, the three user management virtual machines being bound to the three UP pools respectively; each UP pool is actually carried by the user management virtual machine bound to it in the primary CP of the corresponding disaster recovery group; and the method further comprises: sending the determination result to a target user management virtual machine within the new primary CP, so that the target user management virtual machine carries the UP pool corresponding to the target disaster recovery group, wherein the target user management virtual machine is the user management virtual machine within the new primary CP that has a binding relationship with the UP pool corresponding to the target disaster recovery group.
3. The method according to claim 1, wherein heartbeat connections are established between the three CPs corresponding to each disaster recovery group, and the method further comprises: when at least one backup CP in the target disaster recovery group detects that the interruption duration of the heartbeat connection with the current primary CP exceeds a threshold, determining that the current primary CP is faulty.
4. The method according to claim 1, further comprising: upon receiving master control switching requests sent by two backup CPs in the target disaster recovery group, obtaining the current network status of each backup CP; and determining the new primary CP from the two backup CPs based on the current network status of each backup CP.
5. The method according to claim 4, further comprising: obtaining historical fault data of each backup CP; and determining the new primary CP from the two backup CPs based on the current network status and the historical fault data of each backup CP.
6. The method according to any one of claims 1-5, wherein the method is applied to an arbitration node in the backup system, and the arbitration node is IP-connected to the three UP pools and the three CPs.
7. A backup apparatus applied to a backup system, the backup system comprising three CPs and three UP pools and being configured with three disaster recovery groups, each disaster recovery group having one of the three CPs as a primary CP and the other two CPs as backup CPs, the primary CP in each disaster recovery group corresponding to one UP pool, the apparatus comprising: a receiving module, configured to receive a master control switching request sent by at least one backup CP in a target disaster recovery group in the event of a failure of the current primary CP in the target disaster recovery group, wherein the target disaster recovery group is any one of the three disaster recovery groups; a determining module, configured to determine a new primary CP from the at least one backup CP based on the master control switching request; and a sending module, configured to send a determination result to the new primary CP, so that the new primary CP carries the UP pool corresponding to the target disaster recovery group.
8. The apparatus according to claim 7, wherein each CP is deployed with three user management virtual machines, the three user management virtual machines being bound to the three UP pools respectively; each UP pool is actually carried by the user management virtual machine bound to it in the primary CP of the corresponding disaster recovery group; and the sending module is further configured to send the determination result to a target user management virtual machine within the new primary CP, so that the target user management virtual machine carries the UP pool corresponding to the target disaster recovery group, wherein the target user management virtual machine is the user management virtual machine within the new primary CP that has established a binding relationship with the UP pool corresponding to the target disaster recovery group.
9. The apparatus according to claim 7, wherein the three CPs corresponding to each disaster recovery group are connected by heartbeat connections, and the apparatus further comprises a detection module configured to determine that the current primary CP is faulty when at least one backup CP in the target disaster recovery group detects that the interruption duration of the heartbeat connection with the current primary CP exceeds a threshold.
10. The apparatus according to claim 7, wherein the determining module is further configured to: upon receiving a master control switching request sent by two backup CPs in the target disaster recovery group, obtain the current network status of each backup CP; and determine a new master CP from the two backup CPs based on the current network status of each backup CP.
11. The apparatus of claim 10, wherein the determining module is further configured to: acquire historical fault data of each backup CP; and determine a new primary CP from the two backup CPs based on the current network status of each backup CP and the historical fault data.
12. The apparatus according to any one of claims 7-11, wherein the apparatus is applied to an arbitration node in the backup system, and the arbitration node is IP-connected to the three UP pools and the three CPs.
13. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the backup method of any one of claims 1 to 6.
14. A non-volatile computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, causes the processor to implement the backup method of any one of claims 1 to 6.
15. A computer program product comprising a computer program, wherein when executed by a processor, the computer program causes the processor to implement the backup method of any one of claims 1 to 6.