Monitoring alarm method, device and equipment and computer readable storage medium
By splitting the objects to be monitored, monitoring them and generating alarm information with a preset monitoring and alarm system, and writing the indicator data to the corresponding tenant, the problems of data backlog and alarm delay in traditional monitoring methods are solved, and monitoring and alarming without delay are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INSPUR TIANYUAN COMM INFORMATION SYST CO LTD
- Filing Date
- 2022-12-05
- Publication Date
- 2026-06-26
AI Technical Summary
In traditional monitoring methods, when the consumer processes the collected data more slowly than the data is collected, messages back up, resulting in data backlog and untimely alarms.
The system splits the objects to be monitored based on their number, uses a pre-set monitoring and alarm system to monitor them, generates alarm information, writes the indicator data to the corresponding tenant, and writes the data to the target cluster via a remote write address, thus achieving horizontal scaling and zero-latency alarms.
It enables monitoring of multiple objects without alarm delay as the number of monitored objects increases, thus improving monitoring efficiency and accuracy.
Figure: abstract drawing of CN115981950B
Description
Technical Field
[0001] This invention relates to the field of computer operation and maintenance monitoring, and in particular to a monitoring and alarm method, apparatus, device, and computer-readable storage medium.
Background Technology
[0002] Operation and maintenance monitoring uses certain technical means to enable operation and maintenance personnel to understand the operating status of applications and hardware in a timely manner, identify bottlenecks affecting operation, promptly notify operation and maintenance personnel of operational failures, and assist operation and maintenance personnel in analyzing the causes of failures.
[0003] Traditional monitoring methods involve personnel collecting monitoring metric data and storing it in a database or message queue. The alarm module then reads and consumes this data, determining whether an alarm is needed based on metric thresholds. This method suffers from data backlog and alarm delays. When the rate at which the consumer processes the collected data is slower than the rate at which the data is collected, messages accumulate in the database or message queue, resulting in untimely alarms.
Summary of the Invention
[0004] This invention provides a monitoring and alarm method, apparatus, device, and computer-readable storage medium to solve the technical problem of existing monitoring and alarm delays.
[0005] This invention provides a monitoring and alarm method, comprising:
[0006] The objects to be monitored are divided into monitoring classes based on the number of objects to be monitored.
[0007] Each of the monitoring classes is monitored through a preset monitoring and alarm system to obtain the indicator data of the monitoring classes.
[0008] When the preset monitoring and alarm system generates alarm information, the indicator data is written into the tenant corresponding to the monitoring class to obtain the monitoring result.
[0009] According to a monitoring and alarm method provided by the present invention, the monitoring objects are divided into monitoring classes based on the number of objects to be monitored, including:
[0010] Based on the number of objects to be monitored, the number of indicators, and the collection frequency, the objects to be monitored are divided into monitoring classes.
[0011] According to a monitoring and alarm method provided by the present invention, after monitoring each monitoring category through a preset monitoring and alarm system and obtaining the indicator data of the monitoring category, the method includes:
[0012] Deploy an alarm service and retrieve the resource identifier information of the monitoring object corresponding to the alarm service from the database;
[0013] Configure the resource identification information into the first file corresponding to the alarm service;
[0014] Configure the alarm template bound to the monitored object into the second file corresponding to the alarm service;
[0015] Alarm information is generated by a preset monitoring and alarm system and preset rules.
[0016] According to a monitoring and alarm method provided by the present invention, the step of generating alarm information through a preset monitoring and alarm system and preset rules includes:
[0017] If the time of the alarm information sent by the preset monitoring and alarm system remains unchanged, it is determined that the alarm service has been restored, and the alarm information is cleared.
[0018] According to a monitoring and alarm method provided by the present invention, when the preset monitoring and alarm system generates alarm information, writing the indicator data into the tenant corresponding to the monitoring class to obtain the monitoring result includes:
[0019] Configure a remote write address when the preset monitoring and alarm system generates alarm information;
[0020] The indicator data is written to the target cluster via the remote write address to obtain the monitoring results.
[0021] According to a monitoring and alarm method provided by the present invention, the step of writing the indicator data to the target cluster through the remote write address to obtain the monitoring result includes:
[0022] The indicator data is written to the same tenant in the target cluster, and the monitoring class of each indicator data is the same.
[0023] The metric data is written to multiple tenants in the target cluster, and the monitoring classes of each metric data are different.
[0024] The present invention also provides a monitoring and alarm device, comprising:
[0025] The monitoring object splitting module is used to split the monitoring objects based on the number of monitoring objects to obtain monitoring classes;
[0026] The indicator data determination module is used to monitor each of the monitoring categories through a preset monitoring and alarm system to obtain the indicator data of the monitoring categories.
[0027] The monitoring result determination module is used to write the indicator data into the tenant corresponding to the monitoring class when the preset monitoring alarm system generates alarm information, thereby obtaining the monitoring result.
[0028] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the monitoring and alarm method as described above.
[0029] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the monitoring and alarm method as described above.
[0030] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the monitoring and alarm method as described above.
[0031] The monitoring and alarm method, apparatus, device, and computer-readable storage medium provided by this invention first divide the objects to be monitored into monitoring classes based on the number of objects to be monitored. Then, each monitoring class is monitored through a preset monitoring and alarm system to obtain indicator data for each monitoring class. Finally, when the preset monitoring and alarm system generates alarm information, the indicator data is written into the tenant corresponding to the monitoring class to obtain the monitoring result. This enables the monitoring of multiple objects and can also be horizontally expanded as the number of monitored objects increases, with no alarm delay.
Attached Figure Description
[0032] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0033] Figure 1 is the first flowchart of the monitoring and alarm method provided by the present invention;
[0034] Figure 2 is the second flowchart of the monitoring and alarm method provided by the present invention;
[0035] Figure 3 is a schematic diagram of the monitoring and alarm device provided by the present invention;
[0036] Figure 4 is a schematic diagram of the structure of the electronic device provided by the present invention.
Detailed Implementation
[0037] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0038] The monitoring and alarm method of the present invention is described below with reference to Figures 1-2.
[0039] Referring to Figure 1, this invention provides a monitoring and alarm method, comprising:
[0040] Step 100: Based on the number of objects to be monitored, the objects to be monitored are divided into monitoring classes;
[0041] Specifically, multi-cloud platform monitoring requires monitoring cloud platforms, physical machines, virtual machines, network devices, security devices, databases, and applications. Not only are there many monitoring types, but the number of objects monitored for each type is also substantial. First, the objects to be monitored are categorized by monitoring type. If the number M of objects to be monitored in a category exceeds a threshold N, the objects in that category are split into M / N parts, each of which is monitored separately as a monitoring class. The value of N depends on the number of metrics and the collection frequency of the objects to be monitored. For example, if 10,000 data points can be retrieved per second, each machine exposes approximately 200 metrics, and the collection interval is 10 seconds, then each machine contributes approximately 200 / 10 = 20 data points per second, so 10,000 / 20 = 500 machines can be handled. The objects of a monitoring type are therefore split whenever their number exceeds 500.
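The threshold arithmetic and the split described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `split_threshold` and `split_objects` are hypothetical, and rounding M / N up to a whole number of parts is an assumption, since the source only says "M / N parts".

```python
import math


def split_threshold(capacity_per_second, metrics_per_object, interval_seconds):
    """Return N, the largest number of objects one monitoring instance can handle.

    Each object contributes metrics_per_object / interval_seconds data points
    per second; N is the capacity divided by that per-object rate.
    """
    per_object_rate = metrics_per_object / interval_seconds
    return int(capacity_per_second // per_object_rate)


def split_objects(objects, threshold):
    """Split the objects of one monitoring type into parts of at most `threshold`.

    Assumption: M / N is rounded up, so 1,200 objects with N = 500 give 3 parts.
    A list with no more than `threshold` objects stays in a single part.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return [objects[i:i + threshold] for i in range(0, len(objects), threshold)]


if __name__ == "__main__":
    n = split_threshold(10_000, 200, 10)
    machines = [f"machine-{i}" for i in range(1200)]
    parts = split_objects(machines, n)
    print(n, len(parts), math.ceil(len(machines) / n))
```

With the figures from the example (10,000 data points per second, 200 metrics, 10-second interval), the threshold evaluates to 500, matching the text.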
[0042] Step 200: Monitor each of the monitoring categories through a preset monitoring and alarm system to obtain the indicator data of the monitoring categories;
[0043] Specifically, for each class of objects to be monitored after splitting, a Prometheus system is used for monitoring. Prometheus is an open-source monitoring and alarm system, and it is the preset monitoring and alarm system in this embodiment. Prometheus features include: 1. A multi-dimensional data model; 2. A flexible query language; 3. Direct local deployment without relying on other distributed storage; 4. Discovery of target service objects through service discovery or static configuration; 5. Multiple visual graphical interfaces; 6. High-efficiency storage; 7. High availability, allowing for off-site data backup.
[0044] Step 300: When the preset monitoring and alarm system generates alarm information, the indicator data is written into the tenant corresponding to the monitoring class to obtain the monitoring result.
[0045] Specifically, in addition to generating alerts, each Prometheus instance must be configured with a remote write address to write collected metric data to a VictoriaMetrics cluster (a highly available, cost-effective, and scalable monitoring solution and time-series database). Data collected from the same object type is written to the same VictoriaMetrics tenant, while data collected from different types is written to different tenants.
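As a rough illustration of the remote write configuration, the sketch below builds a per-tenant write URL and the matching `remote_write` fragment for a Prometheus configuration file. The URL pattern `/insert/<accountID>/prometheus/api/v1/write` on port 8480 follows the VictoriaMetrics cluster documentation as understood here and should be verified against the deployed version. The host name and tenant numbers are hypothetical, and the source does not state how tenant IDs are chosen.

```python
def remote_write_url(vminsert_host, tenant_id, port=8480):
    """Build the remote write URL for one VictoriaMetrics tenant.

    Assumption: the cluster's vminsert component listens on `port` and
    identifies the tenant by a numeric account ID in the URL path.
    """
    return f"http://{vminsert_host}:{port}/insert/{tenant_id}/prometheus/api/v1/write"


def remote_write_fragment(url):
    """Render the remote_write section of a Prometheus config as YAML text."""
    return f"remote_write:\n  - url: {url}\n"


if __name__ == "__main__":
    # Hypothetical host and tenant: tenant 1 holds virtual machine metrics.
    url = remote_write_url("vminsert.example.internal", 1)
    print(remote_write_fragment(url))
```

Every Prometheus instance that monitors a part of the same object type would use the same tenant ID, so its data lands in the same tenant.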
[0046] In this embodiment, firstly, the objects to be monitored are divided into monitoring classes based on the number of objects to be monitored. Then, each monitoring class is monitored through a preset monitoring and alarm system to obtain the indicator data of each monitoring class. Finally, when the preset monitoring and alarm system generates alarm information, the indicator data is written into the tenant corresponding to the monitoring class to obtain the monitoring results. This realizes the monitoring of multiple objects, and can also achieve horizontal expansion as the number of monitored objects increases, with no alarm delay.
[0047] In one embodiment, the monitoring and alarm method provided in this application may further include:
[0048] Step 110: Based on the number of objects to be monitored, the number of indicators, and the collection frequency, the objects to be monitored are divided into monitoring classes.
[0049] Specifically, multi-cloud platform monitoring requires monitoring cloud platforms, physical machines, virtual machines, network devices, security devices, databases, and applications. Not only are there many monitoring types, but the number of objects monitored for each type is also substantial. First, the objects to be monitored are categorized by monitoring type. If the number M of objects to be monitored in a category exceeds a threshold N, the objects in that category are split into M / N parts, each of which is monitored separately as a monitoring class. The value of N depends on the number of metrics of the objects to be monitored (i.e., the number of indicators in this embodiment) and on how often they are collected (i.e., the collection frequency in this embodiment). For example, if 10,000 data points can be retrieved per second, approximately 200 metrics are collected per machine, and the collection interval is 10 seconds, then each machine contributes approximately 200 / 10 = 20 data points per second, so 10,000 / 20 = 500 machines can be handled. The objects of a monitoring type are therefore split whenever their number exceeds 500.
[0050] This embodiment divides the objects to be monitored into monitoring categories by the number of objects, the number of indicators, and the collection frequency.
[0051] Referring to Figure 2, in one embodiment, the monitoring and alarm method provided in this application may further include:
[0052] Step 210: Deploy the alarm service and retrieve the resource identifier information of the monitoring object corresponding to the alarm service from the database;
[0053] Step 220: Configure the resource identification information into the first file corresponding to the alarm service;
[0054] Step 230: Configure the alarm template bound to the monitored object into the second file corresponding to the alarm service;
[0055] Step 240: Generate alarm information through a preset monitoring and alarm system and preset rules.
[0056] Specifically, for each class of objects to be monitored after splitting, a Prometheus instance is used for monitoring, and an alarm service (i.e., the alarm service in this embodiment) is deployed alongside it. The alarm service retrieves the resource identifier and resource type configuration of the objects to be monitored from the database and configures them in the prometheus.yml file (i.e., the first file in this embodiment). The alarm template bound to the type of object to be monitored is configured in the Prometheus rules.yml file (i.e., the second file in this embodiment). The alarm service also receives the alarm messages sent by Prometheus and, using the resource identifier and resource type carried in the alarm's labels, looks up the corresponding resource information in the CMDB (Configuration Management Database) resource repository to bind the alarm to the resource. The alarm is then stored in the database, and an alarm notification is sent according to the notification rules (i.e., the preset rules in this embodiment).
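The two responsibilities of the alarm service described above can be sketched as follows: building the scrape entry that carries the resource identifier and type as labels, and binding an incoming alarm to a CMDB record by those labels. This is a minimal sketch under assumptions: the label names `resource_id` and `resource_type`, the in-memory `cmdb` dictionary, and both function names are hypothetical. Only the `job_name` / `static_configs` / `targets` / `labels` layout is taken from the standard Prometheus scrape configuration.

```python
def build_scrape_job(job_name, resources):
    """Build one scrape job (as a dict) for the Prometheus configuration file.

    `resources` is a list of dicts with the keys 'address', 'resource_id'
    and 'resource_type', as they might be read from the database.
    """
    return {
        "job_name": job_name,
        "static_configs": [
            {
                "targets": [r["address"]],
                # Labels are attached to every sample and alarm of this target.
                "labels": {
                    "resource_id": r["resource_id"],
                    "resource_type": r["resource_type"],
                },
            }
            for r in resources
        ],
    }


def bind_alert(alert, cmdb):
    """Attach the matching CMDB record to an alarm received from Prometheus.

    `cmdb` maps (resource_type, resource_id) to resource information.
    The 'resource' field is None when no record matches.
    """
    labels = alert.get("labels", {})
    key = (labels.get("resource_type"), labels.get("resource_id"))
    return {"alert": alert, "resource": cmdb.get(key)}
```

Serializing the job dictionary to YAML and reloading Prometheus are left out; how the source performs those steps is not described.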
[0057] This embodiment solves the problem of alarm delay by generating alarm information through a preset monitoring and alarm system and preset rules.
[0058] In one embodiment, the monitoring and alarm method provided in this application may further include:
[0059] Step 250: If the time of the alarm information sent by the preset monitoring and alarm system remains unchanged, determine that the alarm service has been restored and clear the alarm information.
[0060] Specifically, when the alarm time sent by the preset monitoring and alarm system is the same as the time of the last alarm, it is determined that the alarm has been resolved and the alarm is cleared.
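The clearing rule above can be sketched as a small tracker that compares each reported alarm time with the previous one for the same alarm. This is an illustrative sketch only: the source does not say which timestamp field is compared or how alarms are keyed, so `alarm_time`, the `key` argument, and the class name `AlarmTracker` are all assumptions.

```python
class AlarmTracker:
    """Track alarms and clear one when its reported time stops changing."""

    def __init__(self):
        self._last_time = {}  # alarm key -> most recently reported alarm time
        self.active = set()   # keys of alarms currently considered active

    def observe(self, key, alarm_time):
        """Record one alarm message and return 'active' or 'cleared'.

        If the reported time equals the time of the previous message for the
        same key, the alarm is treated as restored and is cleared.
        """
        if key in self._last_time and self._last_time[key] == alarm_time:
            self.active.discard(key)
            return "cleared"
        self._last_time[key] = alarm_time
        self.active.add(key)
        return "active"


if __name__ == "__main__":
    tracker = AlarmTracker()
    for t in (100, 160, 160):
        print(tracker.observe("vm-001/HighCPU", t))
```

The last reported time is kept after clearing, so a repeated message with the same time does not reactivate the alarm, while a later time does.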
[0061] This embodiment uses a preset monitoring and alarm system to determine whether alarm information has been restored and to clear the alarm information.
[0062] In one embodiment, the monitoring and alarm method provided in this application may further include:
[0063] Step 310: When the preset monitoring and alarm system generates alarm information, configure the remote write address;
[0064] Step 320: Write the indicator data to the target cluster via the remote write address to obtain the monitoring results.
[0065] Specifically, in addition to generating alarms, each Prometheus instance must be configured with a remote write address to write the collected metric data to the VictoriaMetrics cluster (i.e., the target cluster in this embodiment). Data collected from the same type of object is written to the same tenant within VictoriaMetrics, while data collected from different types is written to different tenants.
[0066] In this embodiment, the indicator data is written to the target cluster via a remote write address to obtain the monitoring results.
[0067] In one embodiment, the monitoring and alarm method provided in this application may further include:
[0068] Step 321: Write the indicator data into the same tenant in the target cluster, and the monitoring class of each indicator data is the same;
[0069] Step 322: Write the indicator data into multiple tenants in the target cluster, with each indicator data having a different monitoring class.
[0070] Specifically, metric data with the same monitoring class is written to the same tenant in the VictoriaMetrics cluster, while metric data with different monitoring classes is written to different tenants in the VictoriaMetrics cluster. The number of tenants equals the number of monitoring classes, and each tenant stores the metric data of exactly one monitoring class.
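The one-tenant-per-monitoring-class rule can be illustrated with a short routing sketch. The numbering scheme (consecutive integers in sorted order), the sample layout with a `monitoring_class` key, and the function names are assumptions made for the example. The source only requires that each class maps to exactly one tenant.

```python
def assign_tenants(monitoring_classes, first_tenant_id=1):
    """Map each distinct monitoring class to its own tenant ID.

    Assumption: tenant IDs are consecutive integers assigned in sorted order
    of the class names, so the mapping is deterministic.
    """
    unique = sorted(set(monitoring_classes))
    return {cls: first_tenant_id + i for i, cls in enumerate(unique)}


def route_samples(samples, tenants):
    """Group metric samples by the tenant of their monitoring class."""
    routed = {}
    for sample in samples:
        tenant_id = tenants[sample["monitoring_class"]]
        routed.setdefault(tenant_id, []).append(sample)
    return routed


if __name__ == "__main__":
    tenants = assign_tenants(["virtual_machine", "database", "virtual_machine"])
    samples = [
        {"monitoring_class": "virtual_machine", "metric": "cpu_usage", "value": 0.7},
        {"monitoring_class": "database", "metric": "connections", "value": 42},
    ]
    print(tenants)
    print(route_samples(samples, tenants))
```

Because the mapping is built from the set of classes, the number of tenants always equals the number of monitoring classes.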
[0071] In this embodiment, the indicator data is written to the tenant in the target cluster via a remote write address to obtain the monitoring results.
[0072] The monitoring and alarm device provided by the present invention is described below. The monitoring and alarm device described below can be referred to in correspondence with the monitoring and alarm method described above.
[0073] Referring to Figure 3, the present invention also provides a monitoring and alarm device, comprising:
[0074] The monitoring object splitting module 301 is used to split the monitoring objects based on the number of monitoring objects to obtain monitoring classes;
[0075] The indicator data determination module 302 is used to monitor each of the monitoring categories through a preset monitoring and alarm system to obtain the indicator data of the monitoring categories.
[0076] The monitoring result determination module 303 is used to write the indicator data into the tenant corresponding to the monitoring class when the preset monitoring alarm system generates alarm information, so as to obtain the monitoring result.
[0077] Optionally, the object-to-be-monitored splitting module includes:
[0078] The monitoring class determination unit is used to divide the objects to be monitored into monitoring classes based on the number of objects to be monitored, the number of indicators, and the collection frequency.
[0079] Optionally, the monitoring and alarm device includes:
[0080] The resource identification information acquisition module is used to deploy alarm services and obtain the resource identification information of the monitoring objects corresponding to the alarm services from the database.
[0081] The first configuration module is used to configure the resource identification information into the first file corresponding to the alarm service;
[0082] The second configuration module is used to configure the alarm template bound to the monitored object into the second file corresponding to the alarm service;
[0083] The alarm information generation module is used to generate alarm information based on preset monitoring and alarm systems and preset rules.
[0084] Optionally, the monitoring and alarm device includes:
[0085] The alarm information clearing module is used to determine that the alarm service has been restored and to clear the alarm information when the time of the alarm information sent by the preset monitoring and alarm system remains unchanged.
[0086] Optionally, the monitoring result determination module includes:
[0087] The remote write address configuration unit is used to configure the remote write address when the preset monitoring and alarm system generates alarm information;
[0088] The monitoring result determination unit is used to write the indicator data to the target cluster through the remote write address to obtain the monitoring result.
[0089] Optionally, the monitoring result determination unit includes:
[0090] The first indicator data writing unit is used to write the indicator data to the same tenant in the target cluster, and the monitoring class of each indicator data is the same.
[0091] The second indicator data writing unit is used to write the indicator data to multiple tenants in the target cluster, and the monitoring classes of each indicator data are different.
[0092] Figure 4 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Figure 4, the electronic device may include a processor 410, a communications interface 420, a memory 430, and a communication bus 440. The processor 410, communications interface 420, and memory 430 communicate with each other via the communication bus 440. The processor 410 can call logical instructions stored in the memory 430 to execute the monitoring and alarm method.
[0093] Furthermore, the logical instructions in the aforementioned memory 430 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0094] In another aspect, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is able to execute the monitoring and alarm method provided by the above embodiments.
[0095] In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the monitoring and alarm method provided by the above embodiments.
[0096] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0097] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0098] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A monitoring and alarm method, characterized by comprising: splitting the objects to be monitored into monitoring classes based on the number of objects to be monitored; monitoring each of the monitoring classes through a preset monitoring and alarm system to obtain the indicator data of the monitoring classes; and, when the preset monitoring and alarm system generates alarm information, writing the indicator data into the tenant corresponding to the monitoring class to obtain the monitoring result; wherein the step of monitoring each of the monitoring classes through the preset monitoring and alarm system to obtain the indicator data of the monitoring classes includes: deploying an alarm service and retrieving the resource identification information of the monitored object corresponding to the alarm service from the database; configuring the resource identification information into the first file corresponding to the alarm service; configuring the alarm template bound to the monitored object into the second file corresponding to the alarm service; and generating alarm information through the preset monitoring and alarm system and preset rules; and wherein, after generating alarm information through the preset monitoring and alarm system and preset rules, the method includes: if the time of the alarm information sent by the preset monitoring and alarm system remains unchanged, determining that the alarm service has been restored and clearing the alarm information.
2. The monitoring and alarm method according to claim 1, characterized in that splitting the objects to be monitored into monitoring classes based on the number of objects to be monitored includes: splitting the objects to be monitored into monitoring classes based on the number of objects to be monitored, the number of indicators, and the collection frequency.
3. The monitoring and alarm method according to claim 1, characterized in that writing the indicator data into the tenant corresponding to the monitoring class to obtain the monitoring result when the preset monitoring and alarm system generates alarm information includes: configuring a remote write address when the preset monitoring and alarm system generates alarm information; and writing the indicator data to the target cluster via the remote write address to obtain the monitoring result.
4. The monitoring and alarm method according to claim 3, characterized in that writing the indicator data to the target cluster via the remote write address to obtain the monitoring result includes: writing the indicator data to the same tenant in the target cluster, where the monitoring class of each indicator data is the same; and writing the indicator data to multiple tenants in the target cluster, where the monitoring classes of the indicator data are different.
5. A monitoring and alarm device, characterized by comprising: a monitoring object splitting module, used to split the objects to be monitored based on the number of objects to be monitored to obtain monitoring classes; an indicator data determination module, used to monitor each of the monitoring classes through a preset monitoring and alarm system to obtain the indicator data of the monitoring classes; and a monitoring result determination module, used to write the indicator data into the tenant corresponding to the monitoring class when the preset monitoring and alarm system generates alarm information, so as to obtain the monitoring result; wherein the monitoring and alarm device also includes: a resource identification information acquisition module, used to deploy an alarm service and obtain the resource identification information of the monitored object corresponding to the alarm service from the database; a first configuration module, used to configure the resource identification information into the first file corresponding to the alarm service; a second configuration module, used to configure the alarm template bound to the monitored object into the second file corresponding to the alarm service; and an alarm information generation module, used to generate alarm information through the preset monitoring and alarm system and preset rules; and wherein the monitoring and alarm device also includes: an alarm information clearing module, used to determine that the alarm service has been restored and to clear the alarm information when the time of the alarm information sent by the preset monitoring and alarm system remains unchanged.
6. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the monitoring and alarm method as described in any one of claims 1 to 4.
7. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the monitoring and alarm method as described in any one of claims 1 to 4.
8. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the monitoring and alarm method as described in any one of claims 1 to 4.