Methods for service recovery and performance enhancement, and operation and maintenance management system

An operation and maintenance management technique for handling failed and low-performance nodes, applied in the field of distributed systems. It addresses problems such as service stoppage and service performance deterioration, and thereby reduces economic losses.

Active Publication Date: 2017-09-19
ALIBABA (CHINA) CO LTD

AI Technical Summary

Problems solved by technology

That is, when the Paxos protocol is used to provide metadata service redundancy, the entire service stops if a majority of the metadata nodes stop serving, even though some nodes are still normal.
In addition, if the performance of at least half of the nodes deteriorates, the performance of the entire service also deteriorates.
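
The availability problem described above can be illustrated with a minimal sketch. It assumes the usual majority rule of Paxos-style protocols (more than half of the nodes must be reachable); the function names `majority_quorum` and `service_available` are hypothetical and do not come from the patent text.

```python
def majority_quorum(cluster_size: int) -> int:
    """Smallest node count that forms a strict majority of the cluster."""
    return cluster_size // 2 + 1


def service_available(cluster_size: int, normal_nodes: int) -> bool:
    """Assumption: the service runs only while a majority of nodes is normal."""
    return normal_nodes >= majority_quorum(cluster_size)


if __name__ == "__main__":
    # Walk a 5-node cluster down from 5 normal nodes to 1.
    for normal in range(5, 0, -1):
        print(f"normal nodes={normal} available={service_available(5, normal)}")
```

In a 5-node cluster the service stays up with 3 normal nodes but stops with 2, even though those 2 nodes are still healthy.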

Examples

Embodiment 1

[0035] This embodiment takes as an example a cluster of metadata nodes running the Paxos protocol or one of its derivative protocols in a distributed storage system. For ease of description, the metadata nodes in this embodiment are also referred to simply as nodes. The network architecture of this embodiment is shown in Figure 1 and includes a metadata node cluster, a configuration center, an operation and maintenance management system, and a client. The figure shows a metadata node cluster of 3 nodes as an example; the metadata nodes are divided into a master node and slave nodes.

[0036] Master node: the node that provides external read and write services, converts write operations into modification logs, and synchronizes the logs to all slave nodes;

[0037] Slave node: accepts the log synchronized by the master node and determines, according to the protocol, whether the log can be accepted. If it can be accepted, the slave node returns an acceptance success and applies the ...
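
The master and slave roles can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the classes `MasterNode` and `SlaveNode`, the `healthy` flag that stands in for the protocol's acceptance check, and the choice to count the master itself toward the threshold `s` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SlaveNode:
    name: str
    healthy: bool = True  # stand-in for the protocol's real acceptance check
    log: list = field(default_factory=list)

    def accept(self, entry: str) -> bool:
        """Accept and apply a synchronized log entry, or reject it."""
        if not self.healthy:
            return False
        self.log.append(entry)
        return True


@dataclass
class MasterNode:
    slaves: list
    s: int  # minimum number of nodes that must hold an entry
    log: list = field(default_factory=list)

    def write(self, operation: str) -> bool:
        """Convert a write into a log entry and synchronize it to all slaves."""
        entry = f"log:{operation}"
        acks = 1  # assumption: the master counts as one successful node
        acks += sum(1 for slave in self.slaves if slave.accept(entry))
        if acks >= self.s:
            self.log.append(entry)
            return True
        # Simplification: slaves that already accepted are not rolled back.
        return False


if __name__ == "__main__":
    master = MasterNode(slaves=[SlaveNode("a"), SlaveNode("b", healthy=False)], s=2)
    print(master.write("set x=1"))
```

With one healthy slave and `s=2` the write succeeds; if both slaves reject the entry, the master alone cannot reach the threshold and the write fails.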

Embodiment 2

[0095] This embodiment also relates to a cluster of nodes running a state synchronization protocol, and again takes as an example a metadata node cluster running the Paxos protocol or one of its derivative protocols in a distributed storage system. Its network architecture is as shown in Figure 1 and is not described again.

[0096] This embodiment is a performance improvement method for the case in which the node cluster contains many low-performance nodes, which degrades the service performance of the entire cluster. As shown in Figure 4, the method includes:

[0097] Step 210: Determine low-performance nodes in the node cluster running the state synchronization protocol;

[0098] In this step, low-performance nodes can be determined according to configured indicators such as node response speed; for example, they can be determined by the administrator.
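
One way to apply a response-speed indicator in Step 210 is sketched below. The patent does not specify the indicator or how it is evaluated, so the median response time, the threshold value, and the function name `find_low_performance_nodes` are assumptions made for illustration only.

```python
from statistics import median


def find_low_performance_nodes(latencies_ms: dict, threshold_ms: float) -> list:
    """Return the nodes whose median response time exceeds the threshold.

    latencies_ms maps a node name to its recent response times in milliseconds.
    Nodes without samples are skipped rather than guessed at.
    """
    return sorted(
        node
        for node, samples in latencies_ms.items()
        if samples and median(samples) > threshold_ms
    )


if __name__ == "__main__":
    samples = {
        "n1": [5, 6, 7],
        "n2": [50, 80, 65],
        "n3": [4, 200, 6],  # one slow outlier, but the median is still low
    }
    print(find_low_performance_nodes(samples, threshold_ms=20))
```

The median is used here so that a single slow response does not mark an otherwise healthy node as low-performance; an administrator could equally set the list by hand, as the paragraph above notes.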

[0099] In step 220, when at least one low-performance node is successfully synchronized and the synchronization proc...

Abstract

The invention relates to methods for service recovery and performance enhancement, and to an operation and maintenance management system. The service recovery method comprises: detecting state changes of the nodes in a node cluster running a state synchronization protocol and determining the number NN of normal nodes, where NN is an integer; and, if NN changes from being greater than or equal to S0 to being less than S0, carrying out emergency processing to recover normal service. When the normal nodes include a master node, the emergency processing changes the value of a parameter S, stored by the configuration center and the normal nodes, to an integer less than or equal to NN. The parameter S is the minimum number of successfully synchronized nodes that the node cluster requires in order to provide normal service, and S0 is the value of S determined by the state synchronization protocol. The method thereby addresses the service unavailability caused by hardware errors occurring on multiple nodes at the same time.
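
The emergency processing in the abstract can be sketched as follows, covering only the case in which the normal nodes include a master node. The names `ClusterConfig` and `handle_state_change` are hypothetical, and setting S exactly to NN is an assumption: the abstract only requires an integer less than or equal to NN.

```python
from dataclasses import dataclass


@dataclass
class ClusterConfig:
    s0: int  # value of S determined by the state synchronization protocol
    s: int   # current minimum number of successfully synchronized nodes


def handle_state_change(config: ClusterConfig, previous_nn: int,
                        current_nn: int, master_is_normal: bool) -> bool:
    """Lower S when NN drops from >= S0 to < S0 and a master is still normal.

    Returns True if emergency processing was carried out.
    """
    crossed_threshold = previous_nn >= config.s0 > current_nn
    if crossed_threshold and master_is_normal and current_nn >= 1:
        # Assumption: choose the largest permitted value, S = NN.
        config.s = current_nn
        return True
    return False


if __name__ == "__main__":
    config = ClusterConfig(s0=3, s=3)  # e.g. a 5-node cluster
    print(handle_state_change(config, previous_nn=3, current_nn=2, master_is_normal=True))
    print(config.s)
```

In a real deployment the new value would have to be written both to the configuration center and to every normal node, as the abstract states; this sketch updates a single in-memory object only.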

Description

Technical field

[0001] The present invention relates to distributed systems, and more specifically to methods for service recovery and performance improvement of a distributed system and to an operation and maintenance management system.

Background technique

[0002] At present, most large-scale distributed storage systems adopt centralized metadata management in order to achieve centralized permission authentication and quota control; that is, the metadata of all data in the entire storage system is stored on a small number of nodes. In this architecture, the availability of the metadata nodes (also called metadata servers) is directly related to the availability of the entire system. Many distributed systems therefore use redundancy to increase the availability of the metadata service.

[0003] The redundancy approach introduces multiple nodes, and a state synchronization protocol must be used between the nodes to ensure that the decisions made at any time ar...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L12/24
CPC: H04L41/0681, H04L41/0661
Inventors: 姚文辉, 刘俊峰, 黄硕, 朱家稷
Owner: ALIBABA (CHINA) CO LTD