Guarantee method and system for high availability of virtual instance of electric power system

Virtual instance and power system technology, applied in transmission systems, digital transmission systems, electrical components, etc. It addresses problems such as the lack of a basis for fault judgment, hidden risks to business stability, and the inability to ensure uninterrupted operation of virtual machine services, and it achieves real-time monitoring of operating status and a reduced inspection workload.

Active Publication Date: 2018-01-05
INFORMATION COMM COMPANY STATE GRID SHANDONG ELECTRIC POWER +1
6 Cites 5 Cited by

AI-Extracted Technical Summary

Problems solved by technology

[0004] (1) Low degree of automation in physical fault discovery and judgment
[0005] At present, the discovery of physical faults relies on manual inspection or on decentralized monitoring platforms, and there is no unified hardware fault discovery and management for cloud computing environments. Even when a hardware fault is found, there is no complete basis for judging it, and its impact on the virtual machines that carry business systems on the cloud platform cannot be clearly analyzed. Whether a hard disk or network failure will cause a catastrophic event, and whether virtual machine migration or recovery is required, often depends on manual experience.
[0006] (2) Lack of effective virtual machine automatic migration and recovery mechanism
[0007] When a fault o...

Method used

(3) automatic migration recovery mechanism ensures high availability of virtual instances
Step 21: Use IPMI, Ping or SNMP to initiate environmental monitoring scans of the network environment and computing environment of the cloud computing platform, achieving real-time monitoring of the physical operating status. Scans run at second-level intervals, and the time interval between two scans can be set;
[0174] (2) I...

Abstract

The invention discloses a method and system for guaranteeing high availability of a virtual instance of an electric power system. The method comprises the following steps: step 1, initial configuration; step 2, real-time environment monitoring; step 3, streaming message communication; step 4, quintuple model analysis and judgment; and step 5, virtual instance migration. The system comprises an initial configuration module, a real-time environment monitoring module, a streaming message communication module, a quintuple model analysis and judgment module and a virtual instance migration module. According to the method and the system, continuous front-end access to an application system in a cloud environment is guaranteed. The system scans actively; when a fault of a physical device is discovered, it analyzes the fault according to the quintuple model and judges whether the fault has a fatal impact on system operation; and when it judges that the fault can affect the operation of the virtual instance of the application system, it starts the virtual machine migration and recovery mechanism, so that continuous front-end access to the business virtual instance is guaranteed.

Application Domain

Data switching networks

Technology Topic

Continuous operation, Virtual machine +6

Examples

Example Embodiment

[0085] Example 1
[0086] As shown in Figure 1, the method for ensuring high availability of a virtual instance of a power system provided by this embodiment includes the following steps:
[0087] Step 1. Initial configuration: enter the server's IPMI management port information, server role information, and communication network information, and establish a basic database from the entered information. The database stores and manages the core configuration information of all virtual machines and physical machines in the cloud computing environment, and it is synchronized whenever that information changes;
[0088] Step 2. Real-time environment monitoring: monitor the network and computing status information in the cloud computing environment in real time;
[0089] Step 3. Streaming message communication: the status information collected by the real-time monitoring module is processed and transmitted as stream messages; the data is cached locally, and the local cache is deleted after the message is processed;
[0090] Step 4. Five-tuple model analysis and judgment: use the five-tuple model to judge the real-time monitoring status data, analyze whether a serious failure of a compute node server has occurred in the cloud computing environment, decide whether virtual instance migration is required, and, if so, pass the migration trigger instruction to the migration module;
[0091] Step 5. Virtual instance migration: complete the migration of the virtual instance by shutting down the faulty device, reading and writing back asset management module information, creating the virtual instance, mounting shared storage, and configuring and recovering the virtual instance.
[0092] As a possible implementation of this embodiment, the specific process of step 1 is: enter the basic environment information of the cloud platform into the configuration file. This information includes the IPMI IP, user name, and password of each server to be monitored, the role of the server in the cloud platform, and the VLAN and network card information of the networks in the cloud platform. The communication link is then configured so that every monitored network IP is reachable.
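The configuration described for step 1 can be pictured as a small typed record. The following is a minimal sketch only: the field names, the role strings "compute" and "storage", the VLAN numbers, and the function `load_config` are all hypothetical, since the source lists what information is entered but not its format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServerEntry:
    """One monitored server (field names are assumed, not from the source)."""
    ipmi_ip: str
    ipmi_user: str
    ipmi_password: str
    role: str                      # assumed values: "compute" or "storage"
    nics: List[str] = field(default_factory=list)


@dataclass
class PlatformConfig:
    """Basic environment information of the cloud platform."""
    vlans: Dict[str, int]          # network name -> VLAN id
    servers: List[ServerEntry]

    def compute_nodes(self) -> List[ServerEntry]:
        # Only compute nodes are later fed to the five-tuple analysis.
        return [s for s in self.servers if s.role == "compute"]


def load_config(raw: dict) -> PlatformConfig:
    """Build the typed configuration from an already parsed config file."""
    servers = [ServerEntry(**entry) for entry in raw["servers"]]
    return PlatformConfig(vlans=dict(raw["vlans"]), servers=servers)


# Illustrative values only (documentation address range, placeholder credentials).
raw_config = {
    "vlans": {"management": 10, "production": 20, "storage": 30},
    "servers": [
        {"ipmi_ip": "192.0.2.11", "ipmi_user": "admin",
         "ipmi_password": "changeme", "role": "compute", "nics": ["eth0", "eth1"]},
        {"ipmi_ip": "192.0.2.12", "ipmi_user": "admin",
         "ipmi_password": "changeme", "role": "storage", "nics": ["eth0"]},
    ],
}
config = load_config(raw_config)
print([s.ipmi_ip for s in config.compute_nodes()])
```

Keeping the role on each entry makes it straightforward to filter out non-compute nodes before analysis, which the later filtering step requires.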
[0093] As a possible implementation of this embodiment, the specific process of step 2 includes the following steps:
[0094] Step 21: Use IPMI, Ping or SNMP to initiate environmental monitoring scans of the network environment and computing environment of the cloud computing platform, achieving real-time monitoring of the physical operating status. Scans run at second-level intervals, and the time interval between two scans can be set;
[0095] Step 22: While the network environment and computing environment are being scanned, the following parameters are obtained:
[0096] (1) The power status of the compute node server,
[0097] (2) The operating status of the storage network accessed by the virtual machines,
[0098] (3) The operating status of the management network,
[0099] (4) The operating status of the production network,
[0100] (5) Whether the compute node also serves as a storage node;
[0101] Step 23: Push the collected status data to the next step.
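Steps 21 to 23 amount to a polling loop that probes each node and pushes one status record per node. The sketch below illustrates that shape under stated assumptions: the probe functions are injected stand-ins (a real deployment would wrap IPMI, Ping or SNMP calls), and the names `scan_once`, `run_collector` and the record keys are hypothetical. Mapping a failed probe to UNKNOWN is also an assumption of this sketch.

```python
import time
from typing import Callable, Dict, Iterable, List

ON, OFF, UNKNOWN = "ON", "OFF", "UNKNOWN"
Probe = Callable[[str], str]


def safe_probe(probe: Probe, node: str) -> str:
    """Run one probe; any error or unexpected answer is reported as UNKNOWN."""
    try:
        result = probe(node)
    except Exception:
        return UNKNOWN
    return result if result in (ON, OFF) else UNKNOWN


def scan_once(nodes: Iterable[str], probes: Dict[str, Probe],
              push: Callable[[dict], None]) -> None:
    """Collect the five parameters for every node and push each record onward."""
    for node in nodes:
        record = {"node": node}
        for name, probe in probes.items():
            record[name] = safe_probe(probe, node)
        push(record)   # the collector itself keeps no buffer


def run_collector(nodes, probes, push, interval_s, rounds, sleep=time.sleep):
    """Poll for a fixed number of rounds with a configurable interval."""
    for _ in range(rounds):
        scan_once(nodes, probes, push)
        sleep(interval_s)


def broken_probe(node: str) -> str:
    raise TimeoutError("no answer from " + node)


# Stand-in probes; real ones would query IPMI, Ping or SNMP.
probes = {
    "power": lambda node: ON,
    "storage_net": lambda node: ON,
    "mgmt_net": lambda node: OFF if node == "node-2" else ON,
    "prod_net": broken_probe,
    "is_storage_node": lambda node: OFF,
}
pushed: List[dict] = []
run_collector(["node-1", "node-2"], probes, pushed.append, interval_s=0, rounds=1)
print(pushed)
```

Injecting the probes keeps the loop testable without hardware and leaves the choice of protocol to the caller.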
[0102] As a possible implementation of this embodiment, the specific process of step 3 includes the following steps:
[0103] Step 31: Receive the collected status data as streaming messages, and aggregate and classify the messages in the message channel to prepare them for subsequent processing and analysis;
[0104] Step 32: The transmission channel caches the data on the local disk, and the cached data is deleted from the local disk after it has been processed.
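The cache-then-delete behaviour of step 32 can be sketched as a directory of message files that are removed only after a handler has processed them. This is an illustrative stand-in, not the message system used by the authors (the source does not name one); the class `DiskBackedChannel` and its methods are hypothetical.

```python
import json
import os
import tempfile


class DiskBackedChannel:
    """Message channel that caches each message as a file on local disk."""

    def __init__(self, directory: str):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        self._seq = 0

    def put(self, message: dict) -> str:
        """Persist a message before it is processed, so it survives a crash."""
        self._seq += 1
        path = os.path.join(self.directory, f"{self._seq:012d}.json")
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(message, fh)
        os.replace(tmp, path)      # atomic rename: readers never see partial files
        return path

    def pending(self):
        """File names of cached messages, oldest first."""
        return sorted(n for n in os.listdir(self.directory) if n.endswith(".json"))

    def process_all(self, handler) -> int:
        """Hand each cached message to the handler, then delete its file."""
        count = 0
        for name in self.pending():
            path = os.path.join(self.directory, name)
            with open(path, encoding="utf-8") as fh:
                message = json.load(fh)
            handler(message)       # if this raises, the file stays on disk
            os.remove(path)
            count += 1
        return count


channel = DiskBackedChannel(tempfile.mkdtemp(prefix="status-channel-"))
channel.put({"node": "node-1", "power": "ON"})
channel.put({"node": "node-2", "power": "OFF"})
received = []
processed = channel.process_all(received.append)
print(processed, channel.pending())
```

Deleting the file only after the handler returns means an unprocessed message is never lost, while processed messages do not accumulate on disk.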
[0105] As a possible implementation of this embodiment, the specific process of step 4 includes the following steps:
[0106] Step 41: Filter the status information, remove the status of all non-compute nodes, check the integrity of the data, and then start the quintuple analysis model;
[0107] Step 42: Perform the quintuple model analysis on the status information. Reading and analyzing the data with the quintuple model includes the following steps:
[0108] 1) Read the power status detected by the IPMI tools; if the power status is off, skip directly to step 5), otherwise continue with step 2);
[0109] 2) Read the status of the management network (status information obtained by pinging the management network IP);
[0110] 3) Read the status of the production network (status information obtained by executing a command after connecting to the target machine over ssh);
[0111] 4) Read the status of the virtual machine's access to the back-end storage network;
[0112] 5) Aggregate the five-tuple status data read in the steps above (IPMI power status, status of virtual machine access to the back-end storage network, management network status, production network status, and whether the node also serves as a storage node); each item of status data has three possible values: ON, OFF and UNKNOWN;
[0113] 6) From all the readout results, obtain the overall status statistics and import them into the five-tuple model for analysis. If the total number of failures for a detection index exceeds the threshold, write a log entry containing the multi-tuple status of the entire system and perform no processing; otherwise continue;
[0114] 7) When the five-tuple model analysis finishes, it outputs a result. If the state is normal, the process returns and waits for the next round of analysis and judgment; if the state is abnormal, the node is checked again and the new result is compared with the current one to prevent misjudgment;
[0115] 8) If the second check result is consistent with the first and shows the same abnormal state, the node is marked as abnormal and migration proceeds;
[0116] Step 43: If migration is required, relevant information is packaged to generate a migration message, and migration is triggered.
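The read order, the three-valued states and the double check of step 42 can be sketched as follows. The source does not disclose the actual decision table of the five-tuple model, so the rule in `judge` is purely an assumption made for illustration (power OFF is abnormal, all three networks OFF is abnormal, any UNKNOWN means no action); likewise, recording skipped network reads as UNKNOWN and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

ON, OFF, UNKNOWN = "ON", "OFF", "UNKNOWN"
NORMAL, ABNORMAL, CANNOT_JUDGE = "NORMAL", "ABNORMAL", "CANNOT_JUDGE"
Reader = Callable[[str, str], str]   # (node, field) -> ON / OFF / UNKNOWN


@dataclass(frozen=True)
class FiveTuple:
    power: str
    storage_net: str
    mgmt_net: str
    prod_net: str
    is_storage_node: str


def read_five_tuple(node: str, read: Reader) -> FiveTuple:
    power = read(node, "power")                     # step 1)
    if power == OFF:
        # Power is off: skip the network reads and go straight to aggregation.
        # Recording the skipped fields as UNKNOWN is an assumption of this sketch.
        return FiveTuple(power, UNKNOWN, UNKNOWN, UNKNOWN,
                         read(node, "is_storage_node"))
    return FiveTuple(
        power=power,
        mgmt_net=read(node, "mgmt_net"),            # step 2)
        prod_net=read(node, "prod_net"),            # step 3)
        storage_net=read(node, "storage_net"),      # step 4)
        is_storage_node=read(node, "is_storage_node"),
    )


def judge(state: FiveTuple) -> str:
    """Hypothetical decision rule; the real model's table is not in the source."""
    if state.power == OFF:
        return ABNORMAL
    networks = (state.storage_net, state.mgmt_net, state.prod_net)
    if state.power == UNKNOWN or UNKNOWN in networks:
        return CANNOT_JUDGE        # prefer doing nothing when the state is unclear
    if all(value == OFF for value in networks):
        return ABNORMAL
    return NORMAL


def should_migrate(node: str, read: Reader) -> bool:
    """Steps 7) and 8): migrate only if two consecutive reads agree on a fault."""
    first = read_five_tuple(node, read)
    if judge(first) != ABNORMAL:
        return False
    second = read_five_tuple(node, read)
    return second == first


table = {
    "node-1": {"power": ON, "mgmt_net": ON, "prod_net": ON,
               "storage_net": ON, "is_storage_node": OFF},
    "node-2": {"power": OFF, "is_storage_node": OFF},
    "node-3": {"power": ON, "mgmt_net": OFF, "prod_net": UNKNOWN,
               "storage_net": OFF, "is_storage_node": OFF},
}


def read(node: str, field: str) -> str:
    return table[node].get(field, UNKNOWN)


decisions = {node: should_migrate(node, read) for node in table}
print(decisions)
```

In this sketch the `is_storage_node` flag is carried in the tuple but not used by the rule, because the source does not say how it influences the judgment.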
[0117] As a possible implementation of this embodiment, the specific process of step 5 includes the following steps:
[0118] Step 51: Perform shutdown processing on the failed physical device through the IPMI network;
[0119] Step 52: Query and read the virtual instance information from the MySQL asset database according to the physical machine information passed by the migration message;
[0120] Step 53: Automatically rebuild the virtual instance according to the virtual instance information;
[0121] Step 54: After the virtual instance is built, according to the virtual instance information, use distributed shared storage technology to implement the mounting of related resources through script execution, and complete the network configuration work, thereby completing the migration of the virtual instance;
[0122] Step 55: After the virtual instance migration is completed, a log record is generated.
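The ordering of steps 51 to 55 matters: the faulty host is shut down before anything is rebuilt. A minimal sketch of that sequence follows, with every external action injected as a callable. The function `migrate_instances`, its parameters, and the in-memory dictionary standing in for the MySQL asset database are hypothetical; real implementations would call IPMI, the database and the cloud platform.

```python
from typing import Callable, Dict, List


def migrate_instances(failed_host: str,
                      asset_db: Dict[str, List[dict]],
                      power_off: Callable[[str], None],
                      rebuild: Callable[[dict], str],
                      mount_and_configure: Callable[[str, dict], None]) -> List[str]:
    """Run the migration steps in order and return the log records."""
    log = []
    power_off(failed_host)                          # step 51: shut down first
    log.append(f"powered off {failed_host}")
    for info in asset_db.get(failed_host, []):      # step 52: look up instances
        new_id = rebuild(info)                      # step 53: rebuild the instance
        mount_and_configure(new_id, info)           # step 54: storage and network
        log.append(f"migrated {info['name']} as {new_id}")   # step 55: log
    return log


# Fakes that only record the order of calls.
calls = []
asset_db = {"node-2": [{"name": "vm-a", "ip": "198.51.100.7", "volume": "vol-a"}]}


def fake_power_off(host: str) -> None:
    calls.append(("power_off", host))


def fake_rebuild(info: dict) -> str:
    calls.append(("rebuild", info["name"]))
    return info["name"] + "-restored"


def fake_mount(new_id: str, info: dict) -> None:
    calls.append(("mount", new_id, info["volume"]))


log = migrate_instances("node-2", asset_db, fake_power_off, fake_rebuild, fake_mount)
print(calls)
print(log)
```

Shutting the host down before the rebuild avoids having two instances with the same identity attached to the shared storage at once.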
[0123] In the cloud computing data center environment, when a server hardware failure or damage occurs, the technical solution of this embodiment can automatically detect the relevant information, transmit it, start the model-based automatic judgment mechanism, and, according to the judgment result, start the automatic migration of the virtual instance. The whole process effectively ensures the high availability of the virtual instance and thereby the uninterrupted front-end access response of the business application system.

Example Embodiment

[0124] Example 2
[0125] As shown in Figure 2, the system for ensuring high availability of power system virtual instances provided by this embodiment includes:
[0126] The initial configuration module, used to enter the server's IPMI management port information, server role information and communication network information, and to establish a basic database from the entered information. The basic database stores and manages the core configuration information of all virtual machines and physical machines in the cloud computing environment, and it is synchronized whenever that information changes;
[0127] The real-time environment monitoring module, used for real-time monitoring of the network and computing status information in the cloud computing environment;
[0128] The streaming message communication module, used to process and transmit the status information collected by the real-time monitoring module as stream messages, to cache the data locally, and to delete the local cache after the message is processed;
[0129] The quintuple model analysis and judgment module, used to judge the real-time monitoring status data with the quintuple model, analyze whether a serious failure of a compute node server has occurred in the cloud computing environment, decide whether virtual instance migration is required, and, if so, pass the migration trigger instruction to the migration module;
[0130] The virtual instance migration module, used to complete the migration of the virtual instance by shutting down the faulty device, reading and writing back asset management module information, creating the virtual instance, mounting shared storage, and configuring and recovering the virtual instance.
[0131] As a possible implementation of this embodiment, the real-time environmental monitoring module includes:
[0132] The environmental monitoring scanning module, used to initiate environmental monitoring scans with IPMI, Ping or SNMP, scanning the network environment and computing environment of the cloud computing platform to achieve real-time monitoring of the physical operating status. Scans run at second-level intervals, and the time interval between two scans can be set;
[0133] The parameter acquisition module, used to acquire the following parameters while the network environment and computing environment are scanned: (1) the power status of the compute node server, (2) the operating status of the storage network accessed by the virtual machines, (3) the operating status of the management network, (4) the operating status of the production network, (5) whether the compute node also serves as a storage node;
[0134] The data push module is used to push the collected status data to the streaming message communication module.
[0135] As a possible implementation of this embodiment, the quintuple model analysis and judgment module includes:
[0136] Information filtering module, used to filter status information, remove all non-computing node status, and make judgments on the integrity of data;
[0137] The information analysis module is used to analyze the status information with a five-tuple model;
[0138] The migration trigger module is used to package related information that needs to be migrated to generate a migration message, and trigger the virtual instance migration module to perform migration work.
[0139] As a possible implementation of this embodiment, the virtual instance migration module includes:
[0140] Shutdown processing module, used to perform shutdown processing on faulty physical devices through the IPMI network;
[0141] The query and read module is used to query and read virtual instance information from the MySQL asset database according to the physical machine information passed by the migration message;
[0142] The reconstruction module is used to reconstruct the virtual instance according to the virtual instance information;
[0143] The mounting module is used to implement the mounting of related resources through script execution based on the virtual instance information, using distributed shared storage technology, and complete the network configuration work to complete the migration of the virtual instance after the virtual instance is built;
[0144] The log generation module is used to generate log records after the virtual instance migration is completed.
[0145] The technical solutions of the embodiments of the present invention ensure that front-end access to the application system runs uninterrupted in the cloud environment. The system scans actively. When a physical device failure is found, the five-tuple model of the present invention is used to analyze and judge whether the failure has a fatal impact on system operation. When the failure is judged to affect the operation of the virtual instance of the application system, the virtual machine migration and recovery mechanism is activated to ensure uninterrupted front-end access to the business virtual instance.

Example Embodiment

[0146] Example 3
[0147] As shown in Figures 3 and 4, as a specific application combining Embodiment 1 and Embodiment 2, this embodiment provides a system and method for ensuring high availability of a virtual instance of a power system.
[0148] 1. The structure of the guarantee system
[0149] As shown in Figure 3, in the cloud computing data center environment, when a server hardware failure or damage occurs, the guarantee system can automatically detect the relevant information, transmit it, start the model-based automatic judgment mechanism, and, according to the judgment result, initiate the automatic migration of virtual instances. The entire process effectively ensures the high availability of virtual instances and thereby the uninterrupted front-end access response of the business application system. The cloud computing data center environment includes a network resource pool, a computing resource pool, and a shared storage resource pool. The devices in the computing resource pool host the virtual machines (VMs).
[0150] The specific structure of the guarantee system includes:
[0151] (1) Asset management module
[0152] The initial configuration of the asset management module is filled in manually by the operation and maintenance personnel and includes the server's IPMI management port information, server role information, communication network information, etc. From the entered information, the asset management module establishes a basic database that stores and manages the core configuration information of all virtual machines and physical machines in the cloud computing environment. When the information changes, each system synchronizes the change to the asset management module.
[0153] (2) Real-time monitoring module
[0154] The real-time monitoring module runs in the background as a resident service and monitors the network and computing status information in the cloud computing environment in real time. It mainly uses IPMI (Intelligent Platform Management Interface), Ping, SNMP (Simple Network Management Protocol) and other methods to initiate environmental monitoring scans. Scans run at second-level intervals, and the time interval between two scans is configurable. The collector pushes the collected data to the message channel module.
[0155] (3) Message channel module
[0156] The message channel module processes and transmits the status information collected by the real-time monitoring module in the manner of stream messages. To prevent data loss during unprocessed periods, the data is cached locally, and the local cache is deleted after the message is processed. All the computing environment and network environment state data related to the quintuple analysis and judgment are transmitted to the quintuple judgment module through the message channel.
[0157] (4) Five-tuple judgment module
[0158] The five-tuple judgment module uses the five-tuple model to judge the real-time monitoring status data, analyzes whether a serious failure of a compute node server has occurred in the cloud computing environment, decides whether virtual instance migration is required, and, if so, passes the migration trigger instruction to the migration module. The five-tuple judgment is the key module of this invention. Testing showed that if the design is unsound and an environment failure is misjudged, an instance identical to the original virtual instance is started and competes with it for shared storage, causing data inconsistency. The design principle of this module is therefore to prefer taking no action when the state of the environment cannot be determined.
[0159] (5) Virtual instance migration module
[0160] The virtual instance migration module is responsible for automating the entire process from accepting the migration instruction to completing the virtual instance migration. Specifically, it includes: shutdown of faulty equipment, asset management module information reading and writing, virtual instance creation, shared storage mounting, virtual instance configuration and recovery, etc.
[0161] 2. The specific process of the guarantee method is as follows:
[0162] Step 1: Real-time environmental monitoring
[0163] (1) Initially, the operation and maintenance personnel manually enter the basic environment information of the cloud platform into the configuration file, including the IPMI IP, user name, and password of each server to be monitored, the role of the server in the cloud platform, and the VLAN information, network card information, etc. of the networks in the cloud platform. They also configure communication between the controller and the cloud platform so that every monitored network IP is reachable.
[0164] (2) The controller uses IPMI, Ping, SNMP, etc. to initiate environmental monitoring scans of the network environment and computing environment of the cloud computing platform, achieving real-time monitoring of the physical operating status. Scans run at second-level intervals, and the collector on the controller side can set the time interval between two scans. This module uses polling, so it is recommended to increase the time interval when the cluster is particularly large.
[0165] (3) When scanning the network environment and computing environment, the controller's collector automatically obtains the following types of parameters:
[0166] Power status of the compute node server
[0167] Operating status of the storage network accessed by the virtual machines
[0168] Operating status of the management network
[0169] Operating status of the production network
[0170] Whether the compute node also serves as a storage node
[0171] (4) The collector pushes the collected status data to the message channel. The collector does not buffer the data; the data is pushed to the message channel in the form of messages.
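The recommendation above to lengthen the scan interval for large clusters follows from simple arithmetic: with polling, one round cannot finish faster than the sum of its probes. The sketch below makes that estimate explicit. It assumes probes run one after another, and the function name, the per-probe time and the headroom factor are hypothetical illustration values, not figures from the source.

```python
def min_scan_interval(node_count: int, probes_per_node: int,
                      probe_time_s: float, headroom: float = 1.5) -> float:
    """Lower bound on the scan interval for sequential polling.

    One round takes node_count * probes_per_node * probe_time_s seconds;
    the headroom factor leaves slack so rounds do not overlap.
    """
    if node_count < 0 or probes_per_node < 0 or probe_time_s < 0:
        raise ValueError("inputs must be non-negative")
    round_time = node_count * probes_per_node * probe_time_s
    return round_time * headroom


# 40 nodes, 4 probes each, 0.25 s per probe: a round takes 40 s,
# so the interval should be at least 60 s with 1.5x headroom.
interval = min_scan_interval(node_count=40, probes_per_node=4, probe_time_s=0.25)
print(interval)
```

If probes were issued concurrently the bound would be lower, but the interval would still need to grow with cluster size once the controller's capacity is reached.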
[0172] Step 2: Streaming message communication
[0173] (1) Receive the data pushed by the collector on the controller side as streaming messages, and aggregate and classify the messages in the message channel to prepare them for subsequent processing and analysis.
[0174] (2) To prevent data loss during transmission, the transmission channel can buffer the data. Caching on the local disk effectively prevents a message from being lost at some link in the transmission process. The data cached on the local disk is deleted once the quintuple analysis and judgment module has obtained it, so that the cache does not occupy a large amount of disk or storage space.
[0175] Step 3: Analyze and judge the five-tuple model
[0176] (1) The status information is passed to the quintuple analysis model through the message channel. First, the status information is filtered, all non-computing node statuses are removed, and the integrity of the data is judged, and then the analysis model is started.
[0177] (2) Perform five-tuple model analysis, and the data reading analysis process is as follows:
[0178] 1) Read the power status detected by the IPMI tools; if the power status is off, skip to step 5), otherwise continue with step 2);
[0179] 2) Read the management network status (status information obtained by pinging the management network IP);
[0180] 3) Read the status of the production network (status information obtained by executing a command after connecting to the target machine over ssh);
[0181] 4) Read the status of the virtual machine's access to the back-end storage network;
[0182] 5) Aggregate the five-tuple status data read in the steps above (IPMI power status, status of virtual machine access to the back-end storage network, management network status, production network status, and whether the node also serves as a storage node); each item of status data has three possible values: ON, OFF and UNKNOWN.
[0183] 6) From all the readout results, obtain the overall status statistics and import them into the five-tuple model for analysis. If the total number of failures for a detection index exceeds the threshold, write a log entry containing the multi-tuple status of the entire system and perform no processing; otherwise, continue.
[0184] 7) When the five-tuple model analysis finishes, it outputs a result. If the state is normal, the process returns and waits for the next round of analysis and judgment; if the state is abnormal, the node is tested again and the new result is compared with the current one to prevent misjudgment.
[0185] 8) If the second detection result is consistent with the first and shows the same abnormal state, the node is marked as abnormal and migration is performed.
[0186] If migration is needed, the relevant information is packaged into a migration message, which is passed to the virtual instance migration module to trigger the migration. In addition, the judgment module includes a safeguard against large-scale failures: the number of servers that have failed at the same time is compared with the total number of monitored servers. If the ratio or the number exceeds the threshold, the event is judged to be a failure of the entire cloud platform or the entire network. Because migration would have no effect in that case, no operation is performed and only an abnormal alarm is issued.
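The comparison of simultaneously failed servers against the total can be sketched as a small guard in front of the migration trigger. The threshold values (a ratio of 0.3 and an optional absolute count) and the function names are assumptions for illustration; the source states only that a ratio or a number is compared with a threshold.

```python
from typing import List, Optional, Tuple


def is_platform_wide_failure(total_servers: int, failed_servers: int,
                             max_ratio: float = 0.3,
                             max_count: Optional[int] = None) -> bool:
    """True when so many servers fail at once that migration cannot help."""
    if total_servers <= 0:
        raise ValueError("total_servers must be positive")
    if not 0 <= failed_servers <= total_servers:
        raise ValueError("failed_servers must be between 0 and total_servers")
    if failed_servers / total_servers > max_ratio:
        return True
    return max_count is not None and failed_servers > max_count


def plan_action(total_servers: int, failed_nodes: List[str]) -> Tuple[str, List[str]]:
    """Either raise an alarm only, or hand the failed nodes to migration."""
    if is_platform_wide_failure(total_servers, len(failed_nodes)):
        return ("alarm", [])           # no operation besides the abnormal alarm
    return ("migrate", list(failed_nodes))


print(plan_action(20, ["node-2"]))
print(plan_action(20, [f"node-{i}" for i in range(10)]))
```

With these assumed thresholds, one failed server out of twenty leads to migration, while ten out of twenty is treated as a platform or network failure and only raises an alarm.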
[0187] Step 4: Virtual instance migration
[0188] According to the analysis and judgment result of the five-tuple model, if it is determined that the virtual instance migration is required, the virtual instance migration process is started. The specific process is described as follows:
[0189] (1) The system shuts down the faulty physical device through the IPMI network. Only one virtual instance with a given IP and identity may exist in the entire operating environment; otherwise conflicts arise and cause further problems. Shutting down the faulty physical device is therefore the first step of virtual instance migration.
[0190] (2) Invoke the virtual instance asset management module, and query and read the virtual instance related information from the MySQL asset database based on the physical machine information passed by the migration message.
[0191] (3) Start the virtual instance recovery module, and automatically rebuild the virtual instance according to the virtual instance information read from the asset management module.
[0192] (4) After the virtual instance is built, based on the virtual instance information read from the asset management module, distributed shared storage technology is used to mount the related resources through script execution and to complete network configuration and other tasks, thereby finally completing the virtual instance migration.
[0193] (5) After the virtual instance migration is completed, a log record will be generated and related information will be written back to the asset management module.
[0194] Compared with the prior art, the present invention mainly achieves the following beneficial effects:
[0195] (1) Real-time monitoring of physical status
[0196] The invention realizes real-time monitoring of physical equipment operating status, automatic collection, and unified aggregation of fault information from multiple resources, which reduces the inspection workload of operation and maintenance personnel.
[0197] (2) Intelligent fault judgment
[0198] When a physical equipment failure occurs, logical analysis is carried out through the five-tuple model and the impact of the failure is judged promptly. Serious failures that affect the operation of virtual machine business trigger a timely early warning, and the virtual machine migration and recovery mechanism is activated.
[0199] (3) Automatic migration and recovery mechanism to ensure high availability of virtual instances
[0200] A complete virtual machine migration and recovery mechanism is designed. When a physical failure is confirmed, the mechanism is activated to automatically migrate and recover virtual machines from the failed device to other physical devices, thereby ensuring the high availability of the virtual instance and, in turn, uninterrupted front-end access to business virtual instances.
[0201] (4) High availability of the detection service itself
[0202] A highly available service architecture is designed, and the detection service is deployed within it. The detection service itself can switch between several servers, so that the normal operation of the virtual instance is not affected when a single physical server fails.
