A service cluster capacity evaluation method and device, electronic equipment and storage medium

By training a capacity assessment model for service clusters, and assessing the capacity of service clusters based on historical data and preset thresholds, the problem of high transformation costs and impact on downstream services in existing technologies is solved, and flexible and low-cost capacity assessment is achieved.

CN117667613B · Active · Publication Date: 2026-06-12 · BEIJING VOLCANO ENGINE TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING VOLCANO ENGINE TECH CO LTD
Filing Date: 2022-08-22
Publication Date: 2026-06-12

Application Information

IPC: G06F11/34; G06F11/30

CPC: G06F11/3447; G06F11/3006; G06F11/3051; Y02D10/00

AI Tagging

Application Domain

Hardware monitoring; Energy efficient computing

AI Technical Summary

⚠Technical Problem

Existing methods for assessing service cluster capacity require modifying the service architecture, increasing costs, potentially impacting downstream services, and failing to respond to service changes in real time.

⚗Method used

By training a capacity assessment model based on historical data of service latency, processing resource utilization, and throughput, and setting a preset threshold, the capacity of the service cluster is assessed, thus avoiding modifications to the service cluster architecture and impacting downstream services.

🎯Benefits of technology

It achieves capacity assessment without modifying the service cluster architecture or affecting downstream services, enabling rapid response to service changes while balancing system utilization efficiency and user experience.

✦ Generated by Eureka AI based on patent content.

Figure CN117667613B_ABST

Abstract

The present disclosure provides a service cluster capacity evaluation method, device, electronic equipment and storage medium. The method comprises: obtaining a capacity evaluation model of a target service cluster; and determining a capacity value of the target service cluster based on a preset service delay threshold, a preset processing resource utilization threshold of the processing unit, and the capacity evaluation model, wherein the capacity value represents the maximum throughput of the target service cluster. The capacity evaluation model comprises a first capacity evaluation model and a second capacity evaluation model: the first is trained on historical data of the processing resource utilization of the processing unit and historical data of the throughput; the second is trained on historical data of the service delay, historical data of the processing resource utilization of the processing unit, and historical data of the throughput. With this model, the cluster capacity can be calculated from the service delay threshold and the processor utilization threshold set for the target cluster service. This reduces the cost of calculating the cluster capacity and the operational risk to the service, and allows the latest cluster capacity to be obtained promptly after a threshold is reconfigured or the service is upgraded.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a method, apparatus, electronic device, and storage medium for capacity assessment of service clusters.

Background Technology

[0002] In a clustered service architecture, cluster capacity is a crucial indicator for ensuring stable service delivery to users. Exceeding cluster capacity may lead to system congestion, resulting in degraded service quality. Conversely, operating services far below cluster capacity may lead to inefficient system operation and increased service costs. Therefore, service providers and developers need to understand the capacity of their service clusters to avoid system congestion or inefficiency. Current technologies typically assess service cluster capacity through stress testing. However, this method has several drawbacks. For example, it requires service modifications and resource configuration for stress testing, increasing costs; conducting stress tests in a production environment risks impacting downstream services; and service capacity changes with service iterations, necessitating retesting.

Summary of the Invention

[0003] According to a first aspect of this disclosure, a capacity assessment method for a service cluster is provided, comprising:

[0004] Obtaining a capacity assessment model of the target service cluster;

[0005] Determining the capacity value of the target service cluster based on a preset service latency threshold, a preset processing unit resource utilization threshold, and the capacity assessment model.

[0006] The target service cluster is configured with multiple service resources for providing services. Each service resource includes a processing unit that performs processing related to the target service cluster. The capacity value represents the maximum throughput of the target service cluster. The capacity assessment model is trained based on historical data of the target service cluster. The historical data includes historical data on service latency, historical data on the utilization rate of processing unit resources, and historical data on throughput. The capacity assessment model includes a first capacity assessment model and a second capacity assessment model. The first capacity assessment model is trained based on historical data on the utilization rate of processing unit resources and historical data on throughput. The second capacity assessment model is trained based on historical data on service latency, historical data on the utilization rate of processing unit resources, and historical data on throughput.

[0007] According to a second aspect of this disclosure, a capacity assessment apparatus for a service cluster is provided, comprising:

[0008] The acquisition module is used to obtain the capacity assessment model of the target service cluster;

[0009] The determination module is used to determine the capacity value of the target service cluster based on a preset service latency threshold, a preset processing resource utilization threshold, and the capacity assessment model.

[0010] The target service cluster is configured with multiple service resources for providing services, each service resource including processing resources for processing the target service cluster; the capacity value represents the maximum throughput of the target service cluster; the capacity assessment model is trained based on historical data of the target service cluster, including historical data of service latency, historical data of processing resource utilization, and historical data of throughput; the capacity assessment model includes a first capacity assessment model and a second capacity assessment model, the first capacity assessment model being trained based on historical data of processing resource utilization and historical data of throughput; the second capacity assessment model being trained based on historical data of service latency, historical data of processing resource utilization, and historical data of throughput.

[0011] According to a third aspect of this disclosure, an electronic device is provided, comprising:

[0012] A processor; and

[0013] A memory storing a program,

[0014] wherein the program includes instructions that, when executed by the processor, cause the processor to perform the method as described above.

[0015] According to a fourth aspect of this disclosure, a storage medium is provided storing computer instructions for causing the computer to perform the method as described above.

[0016] This disclosure provides a method, apparatus, electronic device, and storage medium for capacity assessment of a service cluster. Based on historical data of service latency, processing resource utilization, and throughput of the target service cluster, a capacity assessment model is trained. The capacity value is assessed based on the trained capacity assessment model according to pre-set service latency and processing resource utilization thresholds for the target service cluster. On the one hand, it does not require modification of the service cluster architecture or the request for additional resources. On the other hand, the assessment process is independent of the service cluster's business operations and will not affect downstream services. Furthermore, when services change or the service latency and processing resource utilization thresholds set for the service change, the capacity assessment result can be quickly obtained based on the trained capacity assessment model. Therefore, the technical solution provided by this disclosure can help service providers or developers understand the service capacity in a timely manner to ensure that service provision balances system utilization efficiency and user experience.

Attached Figure Description

[0017] Further details, features, and advantages of this disclosure are disclosed in the following description of exemplary embodiments in conjunction with the accompanying drawings, in which:

[0018] Figure 1 shows a schematic diagram of a target service cluster system provided in this disclosure;

[0019] Figure 2 shows a schematic diagram of a capacity assessment method provided in this disclosure;

[0020] Figure 3 shows a schematic diagram of a method for training the capacity assessment model according to this disclosure;

[0021] Figure 4 shows a schematic diagram of a capacity assessment apparatus provided in accordance with this disclosure;

[0022] Figure 5 shows a schematic structural block diagram of a chip according to this disclosure;

[0023] Figure 6 shows a schematic structural block diagram of an electronic device that can be used to implement this disclosure.

Detailed Implementation

[0024] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.

[0025] It should be understood that the steps described in the method embodiments of this disclosure may be performed in different orders and/or in parallel. Furthermore, the method embodiments may include additional steps and/or omit the steps shown. The scope of this disclosure is not limited in this respect.

[0026] The term "comprising" and its variations as used herein are open-ended, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the description below. It should be noted that the concepts of "first", "second", etc., used in this disclosure are only used to distinguish different devices, modules, or units, and are not intended to limit the order of functions performed by these devices, modules, or units or their interdependencies.

[0027] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0028] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

[0029] Before introducing the embodiments of this disclosure, the relevant terms involved in the embodiments of this disclosure are first defined as follows:

[0030] A server cluster is a group of multiple servers that collectively provide a specific service. These servers can process the service in parallel and respond to user service requests. The multiple servers can be physical servers or virtual servers. "Multiple" refers to two or more servers.

[0031] Service cluster: Services provided by a cluster of servers.

[0032] Capacity generally refers to the maximum throughput that a system can handle, which can be the system's maximum number of accesses, maximum traffic, or maximum load.

[0033] Throughput generally refers to the amount of data processed per unit of time. The specific processing actions will vary depending on the target. For example, for a port, it can be the amount of data sent and received; for a processor, it can be the number of calculations processed.

[0034] In a service cluster, multiple service resources are configured for the target service cluster to provide services requested by users. These service resources can be, for example, servers. In the network system 100 shown in Figure 1, terminal 101 obtains, through network 103, the target cluster service provided by a server cluster consisting of servers 102. The servers 102 can be physical servers or virtual servers, and each server 102 has processing resources for processing the services.

[0035] Deploying and providing service clusters by configuring server clusters enables service developers or providers to respond to large-scale user requests, expanding service capacity compared to services provided by a single service resource or server. From the perspective of service resource providers, it improves server cluster utilization, optimizes server cluster resource allocation, and reduces costs while supporting more high-capacity services. In some cases, the service developer or provider and the service resource configurer may be the same entity—for example, the service developer or provider provides service resources for its own services by configuring server clusters. Alternatively, they may be different entities—for example, the service developer or provider purchases or leases server cluster resources from a service resource provider to provide service resources for its own services.

[0036] Compared to the various problems and shortcomings of existing technologies, the solution proposed in this disclosure has the following advantages: Firstly, it does not require modification of the service cluster architecture or application for additional resources. Secondly, the evaluation process is independent of the service cluster's business and will not affect downstream services. Furthermore, when services change or when the service latency threshold and processing resource utilization threshold set for the service change, the capacity evaluation results can be quickly obtained based on the trained capacity evaluation model. Therefore, adopting the technical solution provided in this disclosure can help service providers or developers understand the service capacity in a timely manner to ensure that service provision can balance system utilization efficiency and user experience.

[0037] The scheme of this disclosure is described below with reference to the accompanying drawings. As shown in Figure 2, embodiments of this disclosure provide a capacity assessment method for service clusters, including:

[0038] S201, Obtain the capacity assessment model of the target service cluster.

[0039] S202, Determine the capacity value of the target service cluster based on the preset service latency threshold, the preset processing resource utilization threshold, and the capacity assessment model of the target service cluster.

[0040] The target service cluster is configured with multiple service resources for providing services, and each service resource includes processing resources for processing the target service cluster.

[0041] The service resource can be a server, such as a physical server itself or a virtual server within a server; the processing resource can be a central processing unit or a graphics processing unit, and correspondingly, the processing resource can be a physical processor or a virtual processor resource.

[0042] The capacity value represents the maximum throughput of the target service cluster. Throughput can be expressed as queries per second (QPS) or transactions per second (TPS), and the two may or may not be equal. In some embodiments, a transaction may include at least one request to the service, so one transaction may correspond to multiple queries. The maximum throughput can be defined using the maximum value of QPS or the maximum value of TPS.

[0043] When throughput is defined as transactions per second (TPS) for the target service cluster, the corresponding capacity value of the target service cluster is the maximum TPS value that conforms to the capacity assessment model.

[0044] In some embodiments, capacity is evaluated using the throughput on a single service resource; in other embodiments, capacity can be evaluated using the sum of the throughput on all service resources; in some specific embodiments, when multiple service resources configured for a target service cluster are identical and the load is balanced among the multiple service resources, capacity can be evaluated using the throughput of a single service resource, in which case the throughput on all service resources can be regarded as the throughput on a single service resource multiplied by the number of service resources.

[0045] In a service cluster, a single service resource can also be referred to as a cluster instance.

[0046] The capacity assessment model is trained based on historical data of the target service cluster, including historical data on service latency, processing resource utilization, and throughput. The capacity assessment model comprises a first capacity assessment model and a second capacity assessment model. The first capacity assessment model is trained based on the historical data of processing resource utilization and throughput; the second capacity assessment model is trained based on the historical data of service latency, processing resource utilization, and throughput. The capacity value represents the maximum throughput of the target service cluster.

[0047] For a target service cluster, service latency and processing resource utilization are two crucial service parameters. Service latency reflects the length of time it takes to respond to service requests, impacting user experience, while processing resource utilization reflects the efficiency of service resource utilization. Ideally, processing resource utilization is high while service latency is low. However, in some situations, a trade-off must be made. For example, high processing resource utilization often means a high processing load and a large number of service requests, which may result in a larger queue of service requests and consequently, longer service latency. Therefore, the capacity of the target service cluster, i.e., its maximum throughput, should meet the configuration requirements of the service developer or provider for these two key parameters: service latency and processor utilization.

[0048] In some embodiments, the first capacity assessment model is a linear function relating processing resource utilization and throughput, which can be expressed, for example, by Little's Law.

[0049] In some embodiments, the processing resource utilization and throughput are respectively the processing resource utilization and throughput of the cluster instance.

[0050] In some embodiments, throughput is defined in terms of TPS or QPS.

[0051] In some embodiments, the first capacity assessment model can be expressed using equation (1):

[0052] ρ=k·λ+C (1)

[0053] Where ρ represents the processing resource utilization rate of the cluster instance; λ represents the number of requests per second of the cluster instance; k represents the first linear coefficient; and C represents the first constant.

[0054] The second capacity assessment model is a functional relationship that relates the service latency of cluster instances of the target service cluster to their processing resource utilization and throughput; this functional relationship can be expressed, for example, through the M/G/1 queuing model.

[0055] Based on the M/G/1 queuing model, service latency is a linear weighted sum of processing resource latency and non-processing resource latency, which can be expressed by a second linear function. The second linear function is the sum of processing resource latency weighted by a second coefficient and non-processing resource latency weighted by a third coefficient, plus a second constant.

[0056] Wherein, the processing resource latency characterizes the latency generated by the processing resources processing the target service cluster; the non-processing resource latency characterizes the latency generated by the non-processing resources processing the target service cluster.

[0057] In some embodiments, the service latency, processing resource latency, and non-processing resource latency are respectively the service latency, processing resource latency, and non-processing resource latency of the cluster instance.

[0058] Non-processing resources may include at least one of the following: disk resources and network resources.

[0059] In some embodiments, processing resource latency can be expressed as an inverse proportional function of the product of the maximum processing capacity of the processing resource and the idle rate of the processing resource, wherein the sum of the idle rate of the processing resource and the utilization rate of the processing resource is 1; non-processing resource latency can be expressed as an inverse proportional function of the difference between the maximum processing capacity of the non-processing resource and the number of requests per second of the cluster instance; wherein the maximum processing capacity is used to characterize the number of requests per second of the target service cluster in the service resource when the resource utilization rate is 1.

[0060] In some embodiments, the second capacity assessment model can be further expressed by equation (2):

[0061] R = w1 · 1 / (μ1 · (1 − ρ)) + w2 · 1 / (μ2 − λ) + Base (2)

[0062] Where R represents the service latency of the cluster instance; w1 represents the second coefficient; μ1 represents the maximum processing capacity of the cluster instance's processing resources, that is, the number of requests per second for the target service cluster when the utilization of those resources is 1; w2 represents the third coefficient; μ2 represents the corresponding maximum processing capacity of the cluster instance's non-processing resources; ρ and λ are as defined for equation (1); and Base represents the second constant, such as the residual.

[0063] The second capacity assessment model can be used to obtain the functional relationship between service latency and processing resource utilization and throughput.

[0064] By using equations (1) and (2), the functional relationships between processing resource utilization and throughput and between service latency and processing resource utilization and throughput are found respectively; where k, C, w1, w2 and Base are coefficients or constants, which can be obtained by training equations (1) and (2) with a large amount of historical data.

[0065] In some embodiments, k and C can be obtained using the least squares method.
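As a minimal sketch of how the least squares step could look, the following fits the linear relation ρ = k·λ + C from paired samples using the closed-form ordinary least squares solution. The function name `fit_utilization_model` and the sample values are hypothetical and are not taken from this disclosure.

```python
from statistics import fmean


def fit_utilization_model(throughput, utilization):
    """Ordinary least squares fit of rho = k * lam + C.

    throughput:  requests per second of the cluster instance (lam)
    utilization: processing resource utilization in [0, 1] (rho)
    """
    if len(throughput) != len(utilization) or len(throughput) < 2:
        raise ValueError("need at least two paired samples")
    mean_x = fmean(throughput)
    mean_y = fmean(utilization)
    sxx = sum((x - mean_x) ** 2 for x in throughput)
    if sxx == 0:
        raise ValueError("throughput samples must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(throughput, utilization))
    k = sxy / sxx                # slope: utilization added per request/s
    c = mean_y - k * mean_x      # intercept: utilization at zero load
    return k, c


# Hypothetical history: requests per second and matching utilization.
lam_samples = [100.0, 200.0, 300.0, 400.0]
rho_samples = [0.15, 0.25, 0.35, 0.45]
k, C = fit_utilization_model(lam_samples, rho_samples)
```

With these made-up samples the points lie on a straight line, so the fit recovers a slope close to 0.001 and an intercept close to 0.05.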

[0066] In some embodiments, w1, w2, and Base can be obtained using gradient descent.
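The gradient descent step might be sketched as follows. The sketch assumes the latency form described in paragraph [0059], namely a processing term 1/(μ1·(1 − ρ)) and a non-processing term 1/(μ2 − λ), and it assumes that μ1 and μ2 are known inputs rather than trained values. The function names, the synthetic data, and the step-size rule (the reciprocal of a trace bound on the Hessian of the mean squared error) are illustrative choices, not details stated in this disclosure.

```python
def latency_features(rho, lam, mu1, mu2):
    """Return the two latency terms of the model for one sample."""
    cpu_term = 1.0 / (mu1 * (1.0 - rho))   # processing resource latency
    other_term = 1.0 / (mu2 - lam)         # non-processing resource latency
    return cpu_term, other_term


def mse(weights, rows, targets):
    """Mean squared error of R = w1*a + w2*b + Base over all samples."""
    w1, w2, base = weights
    return sum((w1 * a + w2 * b + base - r) ** 2
               for (a, b), r in zip(rows, targets)) / len(rows)


def fit_latency_model(rows, targets, steps=2000):
    """Batch gradient descent on the mean squared error."""
    n = len(rows)
    # Assumed step size: 1 / trace(Hessian). The trace bounds the largest
    # eigenvalue, so every step lowers the loss instead of diverging.
    trace = 2.0 * sum(a * a + b * b + 1.0 for a, b in rows) / n
    lr = 1.0 / trace
    w1 = w2 = base = 0.0
    for _ in range(steps):
        g1 = g2 = g0 = 0.0
        for (a, b), r in zip(rows, targets):
            err = w1 * a + w2 * b + base - r
            g1 += 2.0 * err * a / n
            g2 += 2.0 * err * b / n
            g0 += 2.0 * err / n
        w1 -= lr * g1
        w2 -= lr * g2
        base -= lr * g0
    return w1, w2, base


# Synthetic samples (rho, lam), with lam in thousands of requests per second.
MU1, MU2 = 1.0, 2.0
samples = [(0.2, 0.15), (0.4, 0.35), (0.6, 0.55), (0.8, 0.75)]
rows = [latency_features(rho, lam, MU1, MU2) for rho, lam in samples]
# Synthetic latencies generated from made-up weights w1=2, w2=3, Base=1.
targets = [2.0 * a + 3.0 * b + 1.0 for a, b in rows]
weights = fit_latency_model(rows, targets)
```

Because the two latency terms are strongly correlated in typical data, plain gradient descent can approach the individual weights slowly even when the predicted latency already fits well; comparing `mse` before and after training is a more reliable check than comparing weights.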

[0067] When given a preset processing resource utilization threshold and service latency threshold for the target service cluster, the limit value of throughput can be evaluated and determined by equations (1) and (2).

[0068] The disclosed solution assesses capacity based on a pre-obtained capacity assessment model and preset service latency and processing resource utilization thresholds for the target service cluster. It does not require additional load testing modifications to the service, nor does it affect downstream services. For service iteration and upgrades, repeated load testing is no longer necessary. Therefore, it has the advantages of low cost, no risk, and flexible updates.

[0069] For the scheme disclosed herein, the capacity assessment model includes a first capacity assessment model and a second capacity assessment model. In some cases, for a preset service latency threshold and a preset processing resource utilization threshold, a first capacity value and a second capacity value can be obtained respectively through the first capacity assessment model and the second capacity assessment model.

[0070] The capacity of the target service cluster can be selected from either the first capacity value or the second capacity value. In some cases, when the first capacity value and the second capacity value are not equal, the smaller value can be determined as the capacity value of the target service cluster.

[0071] By establishing two capacity assessment models for processing resource utilization and service latency, both service latency thresholds and processing resource utilization thresholds can be considered. When service providers set these thresholds, they may only consider user experience and resource utilization without understanding the relationship between the two. Therefore, it is possible that when the system throughput approaches either the processing resource utilization threshold or the service latency threshold, the other threshold may be exceeded. By determining different maximum throughput values through the two assessment models and selecting the smaller value as the capacity, it is possible to ensure that both processing resource utilization and service latency threshold parameters are considered, and to ensure that the system operates stably within the two pre-set thresholds.
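The selection of the smaller of the two capacity values can be illustrated with the following sketch. It assumes the linear relation ρ = k·λ + C for the first model and the latency form described in paragraph [0059] for the second, with k, w1, and w2 positive so that predicted latency rises with throughput; under that assumption the latency-bound capacity can be located by bisection. All function names and numeric values are hypothetical.

```python
import math


def capacity_from_utilization(k, c, rho_max):
    """Largest lam with k * lam + c <= rho_max (first model)."""
    return max(0.0, (rho_max - c) / k)


def predicted_latency(lam, k, c, w1, w2, base, mu1, mu2):
    """Latency of the second model, with rho substituted from the first."""
    rho = k * lam + c
    if rho >= 1.0 or lam >= mu2:
        return math.inf  # past saturation the model predicts unbounded latency
    return w1 / (mu1 * (1.0 - rho)) + w2 / (mu2 - lam) + base


def capacity_from_latency(latency_max, k, c, w1, w2, base, mu1, mu2, iters=100):
    """Largest lam whose predicted latency stays within latency_max."""
    params = (k, c, w1, w2, base, mu1, mu2)
    lo = 0.0
    hi = min((1.0 - c) / k, mu2)  # first point where a denominator hits zero
    if predicted_latency(lo, *params) > latency_max:
        return 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if predicted_latency(mid, *params) <= latency_max:
            lo = mid
        else:
            hi = mid
    return lo


def evaluate_capacity(rho_max, latency_max, k, c, w1, w2, base, mu1, mu2):
    """Return (first value, second value, capacity = the smaller one)."""
    first = capacity_from_utilization(k, c, rho_max)
    second = capacity_from_latency(latency_max, k, c, w1, w2, base, mu1, mu2)
    return first, second, min(first, second)


# Hypothetical trained parameters; latency in milliseconds, lam in requests/s.
model = dict(k=0.001, c=0.05, w1=1000.0, w2=1000.0, base=1.0,
             mu1=1000.0, mu2=2000.0)
cap_util, cap_latency, capacity = evaluate_capacity(
    rho_max=0.75, latency_max=5.0, **model)
```

With these made-up numbers the utilization threshold alone would allow about 700 requests per second, but the latency threshold is reached earlier, so the latency-bound value is the one selected as the capacity.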

[0072] This disclosure evaluates capacity based on two pre-set thresholds for two parameters. On the one hand, this ensures that the capacity value can simultaneously meet the configuration requirements of the key parameters of two different services. On the other hand, when the settings of the two thresholds change, the corresponding capacity can be updated and evaluated in a timely manner according to the capacity evaluation model.

[0073] The processing resource can be a central processing unit (CPU) or a graphics processing unit (GPU). The processing resource utilization rate represents the occupancy of the processor's processing resources, and may be related to the processing capacity of the processing resource itself and / or the load on the processing resource.

[0074] In some embodiments, multiple service resources, such as multiple servers, are configured for the target service cluster, wherein each service resource includes a corresponding processing resource to perform related processing, and the processing resource has its own processing parameters ρ_i, λ_i, R_i, μ1_i, and μ2_i. In the target service cluster, the service resource provider may configure the same parameters for each processing resource for the purpose of load balancing and resource standardization, and keep the load of each processor nearly balanced through scheduling algorithms or load balancing algorithms, so that the processing parameter values of each processing resource are nearly the same.

[0075] In a service cluster, when the service resource is a server, a single server is also referred to as a cluster instance.

[0076] In some embodiments, ρ, λ, R, μ1, and μ2 can be averaged over the service resources of the target service cluster.

[0077] For example, in Figure 1 the target service cluster is provided by three servers. Each server can be configured with one processor to execute the processing of the target cluster service. Each processor has its own set of parameters, namely {ρ', λ', R', μ1', μ2'}, {ρ'', λ'', R'', μ1'', μ2''}, and {ρ''', λ''', R''', μ1''', μ2'''}. In some embodiments, the above parameters are the average parameter values for all processors in the target service cluster. For example, in the embodiment of Figure 1, ρ = 1/3 · (ρ' + ρ'' + ρ'''), λ = 1/3 · (λ' + λ'' + λ'''), R = 1/3 · (R' + R'' + R'''), μ1 = 1/3 · (μ1' + μ1'' + μ1'''), and μ2 = 1/3 · (μ2' + μ2'' + μ2''').

[0078] Using average values makes it easier to train the model using historical data in the following methods, obtaining the values of k, C, w1, w2, and Base in the model.

[0079] In some embodiments, prior to step S201, the method further includes training the capacity assessment model, for example by the method shown in Figure 3.

[0080] In S201, the historical data of the target service cluster includes historical data on service latency, historical data on processing resource utilization, and historical data on throughput.

[0081] The method trains the capacity assessment model based on historical data of service latency, historical data of processing resource utilization, and historical data of throughput.

[0082] Training the capacity assessment model may further include:

[0083] S301, Collect historical data of the target service cluster;

[0084] In some embodiments, the acquisition action may be based on a preset trigger, which may include at least one of the following methods: timed trigger, periodic trigger, conditional trigger, and manual trigger.

[0085] For timed triggering, the acquisition action is triggered at a preset time, such as 8:00 AM every day. For periodic triggering, it is triggered at a preset period, such as every 96 hours. For conditional triggering, it is triggered when a preset condition is met, for example when at least one of the latency data, processor utilization data, and throughput data reaches its peak value within the last 96 hours. For manual triggering, a trigger control, such as an operation button, is provided, and the historical data is acquired in response to the user activating the control.
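The timed, periodic, and conditional triggers could be checked along the following lines. This is only a sketch: the function names are hypothetical, and the conditional trigger is interpreted here as "the newest sample is the highest value seen within the window", which is one possible reading of the peak condition.

```python
from datetime import datetime, timedelta


def timed_trigger(now, hour=8, minute=0):
    """True at the preset time of day, for example 8:00 AM."""
    return now.hour == hour and now.minute == minute


def periodic_trigger(last_run, now, period=timedelta(hours=96)):
    """True once the preset period has elapsed since the last acquisition."""
    return now - last_run >= period


def peak_trigger(samples, now, window=timedelta(hours=96)):
    """True when the newest sample is the maximum within the window.

    samples: list of (timestamp, value) pairs, oldest first.
    """
    recent = [value for stamp, value in samples if now - stamp <= window]
    return bool(recent) and recent[-1] == max(recent)


def should_collect(now, last_run, samples):
    """Acquire when at least one of the automatic triggers fires."""
    return (timed_trigger(now)
            or periodic_trigger(last_run, now)
            or peak_trigger(samples, now))
```

A manual trigger would bypass these checks and start the acquisition directly in response to the user's action.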

[0086] The historical data mentioned above can be taken as the average of all processing resources serving the target service cluster. Accordingly, the historical service latency data is the historical average service latency data of the multiple service resources; the historical processing resource utilization data is the historical average utilization data of the multiple processing resources; and the historical throughput data is the historical average throughput data of the multiple service resources.
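A minimal sketch of this averaging, assuming each cluster instance reports one value per sampling timestamp and all instances share the same timestamps (the function name and the values are hypothetical):

```python
from statistics import fmean


def average_across_instances(per_instance_series):
    """Element-wise mean of equally long series, one series per instance."""
    lengths = {len(series) for series in per_instance_series}
    if len(lengths) != 1:
        raise ValueError("need one or more series of equal length")
    # zip(*...) groups the values that share a timestamp across instances.
    return [fmean(values) for values in zip(*per_instance_series)]


# Hypothetical utilization history of three servers at two timestamps.
utilization_by_server = [
    [0.2, 0.4],
    [0.3, 0.5],
    [0.4, 0.6],
]
average_utilization = average_across_instances(utilization_by_server)
```

The same helper would apply unchanged to the service latency and throughput histories.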

[0087] The collection of historical data from the target service cluster can be specifically implemented as collecting all or part of the historical data up to the time of the acquisition action. For example, it can be historical data between the current acquisition action and the previous acquisition action, historical data within a specified time period, such as historical data from the first week of August, or historical data within a preset time period up to the time of the acquisition action.

[0088] S302, Preprocess the historical data;

[0089] In computer network services, factors such as noise, errors, interference, and abnormal events can lead to errors or inaccuracies in the monitored data; the data transmission process itself can also introduce some errors or inaccuracies. Therefore, performing the preprocessing in step S302 before training the capacity assessment model using the historical data can make the obtained capacity assessment model closer to the real situation.

[0090] The preprocessing may include data denoising, data smoothing, and data pruning steps; wherein, data denoising may employ the STL (Seasonal-Trend decomposition procedure based on Loess) method, data smoothing may employ the Savitzky-Golay filter, and data pruning may be performed in at least one of the following ways: removing 0 values and removing data without fluctuations.
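
For illustration, the smoothing and pruning steps might be sketched in plain Python as follows. The sketch uses the standard 5-point quadratic Savitzky-Golay coefficients (-3, 12, 17, 12, -3)/35 and leaves the two samples at each edge unchanged, which is an assumption. "Data without fluctuations" is interpreted here as a sample identical to the previously kept sample, which is also an assumption. STL denoising is omitted from the sketch. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# 5-point Savitzky-Golay smoothing coefficients (quadratic/cubic fit).
SG5 = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol5(values: Sequence[float]) -> List[float]:
    """Smooth a series with a 5-point Savitzky-Golay filter.

    Edge samples (first two and last two) are returned unchanged.
    """
    out = list(values)
    for i in range(2, len(values) - 2):
        acc = sum(c * values[i + j - 2] for j, c in enumerate(SG5))
        out[i] = acc / SG5_NORM
    return out


Row = Tuple[float, float, float]  # (latency, utilization, throughput)


def prune(rows: Sequence[Row], eps: float = 1e-9) -> List[Row]:
    """Drop rows containing a 0 value and rows that repeat the last kept row."""
    kept: List[Row] = []
    for row in rows:
        if any(abs(v) <= eps for v in row):
            continue  # remove 0 values
        if kept and all(abs(a - b) <= eps for a, b in zip(row, kept[-1])):
            continue  # remove data without fluctuation
        kept.append(row)
    return kept
```

A wider window or a different polynomial order would require different coefficients; the values above apply only to the 5-point case.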

[0091] Training the capacity assessment model on data samples that have undergone the above preprocessing operations can make the resulting capacity assessment model more accurate.

[0092] S303, Train the capacity assessment model based on the preprocessed historical data;

[0093] In step S303, the capacity assessment model is trained based on the historical data of the target service cluster, specifically by means of machine learning.

[0094] In some embodiments, k and C in the first capacity assessment model are determined using the least squares method; w1, w2, and Base in the second capacity assessment model are determined using the gradient descent method.
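
The two fitting procedures named above might be sketched as follows. This is a minimal illustration under stated assumptions: the first model is taken to be a line y = k*x + C fitted by closed-form ordinary least squares, and the second model is taken to be latency = w1*x1 + w2*x2 + Base, where x1 and x2 are the processing and non-processing resource latency terms computed beforehand. The function names, learning rate, and step count are hypothetical, and the features are assumed to be on comparable scales so that plain gradient descent converges.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = k*x + C; returns (k, C)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    return k, mean_y - k * mean_x


def fit_latency(x1: Sequence[float], x2: Sequence[float],
                latency: Sequence[float],
                lr: float = 0.05, steps: int = 5000
                ) -> Tuple[float, float, float]:
    """Batch gradient descent on mean squared error; returns (w1, w2, Base)."""
    w1 = w2 = base = 0.0
    n = len(latency)
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for a, b, y in zip(x1, x2, latency):
            err = w1 * a + w2 * b + base - y
            g1 += err * a
            g2 += err * b
            gb += err
        w1 -= lr * 2.0 * g1 / n
        w2 -= lr * 2.0 * g2 / n
        base -= lr * 2.0 * gb / n
    return w1, w2, base
```

In practice the learning rate would need tuning to the scale of the latency terms; too large a value makes the descent diverge.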

[0095] In some embodiments, to more accurately reflect the recent operating status of the system, the capacity assessment model can be updated periodically, for example, daily. The update training can be set in association with or not in association with the data acquisition action of S301. For example, training and updating can be based on historical data acquired each time, or the acquisition action and training action can each have their own trigger conditions.

[0096] S304, Save the trained capacity assessment model.

[0097] In step S304, the trained capacity assessment model is saved so that in step S201, the capacity assessment model can be obtained and used to calculate the capacity value.
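
As one possible illustration of saving and later obtaining the model, the trained parameters could be written to a JSON file. The file layout and the function names `save_model` and `load_model` are assumptions made for this sketch; the disclosure does not specify a storage format.

```python
import json
from pathlib import Path
from typing import Tuple

First = Tuple[float, float]           # (k, C)
Second = Tuple[float, float, float]   # (w1, w2, Base)


def save_model(path: str, first: First, second: Second) -> None:
    """Persist both sets of trained parameters (step S304)."""
    payload = {
        "first": {"k": first[0], "C": first[1]},
        "second": {"w1": second[0], "w2": second[1], "Base": second[2]},
    }
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_model(path: str) -> Tuple[First, Second]:
    """Read the parameters back so they can be used in step S201."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    first = (data["first"]["k"], data["first"]["C"])
    second = (data["second"]["w1"], data["second"]["w2"],
              data["second"]["Base"])
    return first, second
```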

[0098] The foregoing description introduces the solutions provided by the embodiments of this disclosure. It is understood that, in order to achieve the above functions, the server may include hardware structures and / or software modules corresponding to the execution of each function. Those skilled in the art should readily recognize that, based on the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein, this disclosure can be implemented in hardware or a combination of hardware and computer software. Whether a function is executed in hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this disclosure.

[0099] In the embodiments of this disclosure, the server can be divided into functional units according to the above method examples. For example, each function can be assigned to a separate functional module, or two or more functions can be integrated into one processing module. The integrated module can be implemented in hardware or as a software functional module. It should be noted that the module division in the embodiments of this disclosure is illustrative and represents only one logical functional division. In actual implementation, other divisions are possible.

[0100] Where each functional module is divided according to its corresponding function, an exemplary embodiment of this disclosure provides a capacity assessment apparatus for a service cluster, which can be a server or a chip applied to a server. Figure 4 shows a schematic block diagram of the functional modules of an apparatus according to an exemplary embodiment of the present disclosure. As shown in Figure 4, the apparatus 400 includes:

[0101] The acquisition module 401 is used to obtain the capacity assessment model of the target service cluster;

[0102] The determining module 402 is used to determine the capacity value of the target service cluster based on a preset service latency threshold, a preset processing resource utilization threshold, and the capacity assessment model.

[0103] The target service cluster is configured with multiple service resources for providing services, each service resource including processing resources for processing the target service cluster; the capacity value represents the maximum throughput of the target service cluster; the capacity assessment model is trained based on historical data of the target service cluster, including historical data of service latency, historical data of processing resource utilization, and historical data of throughput; the capacity assessment model includes a first capacity assessment model and a second capacity assessment model, the first capacity assessment model being trained based on historical data of processing resource utilization and historical data of throughput; the second capacity assessment model being trained based on historical data of service latency, historical data of processing resource utilization, and historical data of throughput.
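
The determination performed by the determining module 402 might be sketched as follows. This is an illustration under explicit assumptions, not the definitive computation: the first model is taken as utilization = k*throughput + C; the second model is taken as latency = w1/(p_max*(1 - utilization)) + w2/(n_max - throughput) + Base with w2 > 0; and the maximum processing capacities `p_max` and `n_max` are assumed to be known inputs. All function and parameter names are hypothetical.

```python
def first_capacity(k: float, c: float, util_threshold: float) -> float:
    """Throughput at which utilization = k*q + C reaches the threshold."""
    return (util_threshold - c) / k


def second_capacity(w1: float, w2: float, base: float,
                    p_max: float, n_max: float,
                    latency_threshold: float,
                    util_threshold: float) -> float:
    """Throughput at which the modeled latency reaches the threshold.

    Utilization is held at util_threshold, and the assumed latency model
    is solved for the throughput q.
    """
    proc_latency = 1.0 / (p_max * (1.0 - util_threshold))
    residual = latency_threshold - base - w1 * proc_latency
    if residual <= 0.0:
        # The latency budget is already used up before any non-processing
        # latency is added; returning 0 here is an assumption of the sketch.
        return 0.0
    return max(0.0, n_max - w2 / residual)


def capacity_value(first: float, second: float) -> float:
    """The capacity value is the smaller of the two capacity values."""
    return min(first, second)
```

For example, with k = 0.001, C = 0.05 and a utilization threshold of 0.75, the first capacity value in this sketch is about 700 requests per second; the overall capacity value is then whichever of the two values is smaller.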

[0104] Figure 5 shows a schematic block diagram of a chip according to an exemplary embodiment of the present disclosure. As shown in Figure 5, the chip 500 includes one or more (including two) processors 501 and a communication interface 502. The communication interface 502 can support the server in performing the data transmission and reception steps of the above-described capacity assessment method, and the processor 501 can support the server in performing the data processing steps of the above-described capacity assessment method.

[0105] Optionally, as shown in Figure 5, the chip 500 also includes a memory 503, which may include read-only memory and random access memory, and which provides operation instructions and data to the processor. A portion of the memory may also include non-volatile random access memory (NVRAM).

[0106] In some implementations, as shown in Figure 5, the processor 501 executes corresponding operations by calling operation instructions stored in the memory (which may be stored in the operating system). The processor 501 controls the processing operations of any terminal device; the processor can also be called a central processing unit (CPU). The memory 503 may include read-only memory and random access memory, and provides instructions and data to the processor 501. A portion of the memory 503 may also include NVRAM. In applications, the memory, the communication interface, and other components are coupled together via a bus system, which may include, in addition to a data bus, a power bus, a control bus, a status signal bus, and the like. However, for clarity, the various buses are all designated as bus system 504 in Figure 5.

[0107] The methods disclosed in the embodiments of this disclosure can be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above methods can be completed by integrated logic circuits in the processor's hardware or by instructions in software form. The processor can be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this disclosure. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this disclosure can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. This storage medium is located in memory; the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above methods.

[0108] Exemplary embodiments of this disclosure also provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to cause the electronic device to perform a method according to an embodiment of this disclosure.

[0109] Exemplary embodiments of this disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a computer's processor, is used to cause the computer to perform a method according to embodiments of this disclosure.

[0110] Exemplary embodiments of this disclosure also provide a computer program product, including a computer program, wherein, when executed by a processor of a computer, the computer program is used to cause the computer to perform a method according to an embodiment of this disclosure.

[0111] Referring to Figure 6, a structural block diagram of an electronic device 600 that can serve as a server or client of the present disclosure will now be described; it is an example of a hardware device that can be applied to various aspects of the present disclosure. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and / or claimed herein.

[0112] As shown in Figure 6, the electronic device 600 includes a computing unit 601, which can perform various appropriate actions and processes based on a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 may also store various programs and data required for the operation of the device 600. The computing unit 601, ROM 602, and RAM 603 are interconnected via a bus 604. An input / output (I / O) interface 605 is also connected to the bus 604.

[0113] Multiple components in the electronic device 600 are connected to the I / O interface 605, including: an input unit 606, an output unit 607, a storage unit 608, and a communication unit 609. The input unit 606 can be any type of device capable of inputting information to the electronic device 600. The input unit 606 can receive input digital or character information and generate key signal inputs related to user settings and / or function control of the electronic device. The output unit 607 can be any type of device capable of presenting information and may include, but is not limited to, a display, speaker, video / audio output terminal, vibrator, and / or printer. The storage unit 608 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 609 allows the electronic device 600 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and / or chipsets, such as Bluetooth™ devices, WiFi devices, WiMax devices, cellular communication devices, and / or the like.

[0114] The computing unit 601 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above. For example, the method for determining service capacity in some embodiments may be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and / or installed on the electronic device 600 via ROM 602 and / or communication unit 609. In some embodiments, the computing unit 601 may be configured to perform the method for determining service capacity by any other suitable means (e.g., by means of firmware).

[0115] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0116] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0117] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0118] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with embodiments of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0119] Computer systems can include clients and servers. Clients and servers are generally located far apart and typically interact through communication networks. Client-server relationships are created by computer programs running on the respective computers and having a client-server relationship with each other.

[0120] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented using software, it can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this disclosure are performed, in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, a terminal, a user equipment, or other programmable device. The computer program or instructions can be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another. For example, the computer program or instructions can be transferred from one website, computer, server, or data center to another website, computer, server, or data center via wired or wireless means. The computer-readable storage medium can be any available medium accessible to a computer or a data storage device such as a server or data center integrating one or more available media. The available medium can be a magnetic medium, such as a floppy disk, hard disk, or magnetic tape; it can also be an optical medium, such as a digital video disc (DVD); or it can be a semiconductor medium, such as a solid-state drive (SSD).

[0121] Although this disclosure has been described in conjunction with specific features and embodiments, it will be apparent that various modifications and combinations can be made therein without departing from the spirit and scope of this disclosure. Accordingly, this specification and drawings are merely exemplary illustrations of the disclosure as defined by the appended claims and are to be considered as covering any and all modifications, variations, combinations, or equivalents within the scope of this disclosure. It is obvious that those skilled in the art can make various alterations and modifications to this disclosure without departing from its spirit and scope. Thus, this disclosure is also intended to include any such modifications and variations that fall within the scope of the claims of this disclosure and their equivalents.

Claims

1. A capacity assessment method for a service cluster, comprising: obtaining a capacity assessment model of a target service cluster; and determining a capacity value of the target service cluster based on a preset service latency threshold, a preset processing resource utilization threshold, and the capacity assessment model; wherein the target service cluster is configured with multiple service resources for providing services, each service resource including processing resources for processing the target service cluster; the capacity value represents the maximum throughput of the target service cluster; the capacity assessment model is trained based on historical data of the target service cluster, the historical data including historical data of service latency, historical data of processing resource utilization, and historical data of throughput; the capacity assessment model includes a first capacity assessment model and a second capacity assessment model; the first capacity assessment model is trained based on the historical data of processing resource utilization and the historical data of throughput, and represents a first relationship between the throughput of a cluster instance of the target service cluster and the processing resource utilization of the cluster instance; the second capacity assessment model is trained based on the historical data of service latency, the historical data of processing resource utilization, and the historical data of throughput, and represents a third relationship between the service latency of the cluster instance and the processing resource utilization and throughput of the cluster instance; and wherein determining the capacity value of the target service cluster based on the preset service latency threshold, the preset processing resource utilization threshold, and the capacity assessment model includes: determining a first capacity value based on the preset processing resource utilization threshold and the first capacity assessment model; determining a second capacity value based on the preset service latency threshold, the preset processing resource utilization threshold, and the second capacity assessment model; and determining the smaller of the first capacity value and the second capacity value as the capacity value.

2. The method according to claim 1, wherein the service resource further includes non-processing resources; the first relationship is expressed by a first linear function having a first coefficient and a first constant; the second capacity assessment model characterizing the third relationship between the service latency of the cluster instance and the processing resource utilization and throughput of the cluster instance includes: the second capacity assessment model characterizing a second relationship between the service latency of the cluster instance and the processing resource latency and non-processing resource latency of the cluster instance; the second relationship is expressed by a second linear function having a second coefficient for the processing resource latency, a third coefficient for the non-processing resource latency, and a second constant; the processing resource latency characterizes the latency generated by the processing resources in processing the target service cluster, and is expressed as an inverse proportional function of the product of the maximum processing capacity of the processing resources and the idle rate of the processing resources, the sum of the idle rate of the processing resources and the processing resource utilization of the cluster instance being 1; the non-processing resource latency characterizes the latency generated by the non-processing resources in processing the target service cluster, and is expressed as an inverse proportional function of the difference between the maximum processing capacity of the non-processing resources and the throughput of the cluster instance; the non-processing resources include at least one of the following: disk resources and network resources; and the first coefficient, the first constant, the second coefficient, the third coefficient, and the second constant are obtained by training based on the historical data.

3. The method according to claim 2, wherein the maximum processing capacity characterizes the number of requests per second when the resource utilization rate is 1.

4. The method according to any one of claims 1-3, wherein the historical data of service latency is historical data of the average service latency of the multiple service resources; the historical data of processing resource utilization is historical data of the average utilization of the multiple processing resources; and the historical data of throughput is historical data of the average throughput of the multiple service resources.

5. The method according to any one of claims 1-3, further comprising, before obtaining the capacity assessment model of the target service cluster: collecting historical data of the target service cluster; preprocessing the historical data; training the capacity assessment model based on the preprocessed historical data; and saving the trained capacity assessment model.

6. The method according to claim 5, wherein preprocessing the historical data includes: performing noise reduction on the historical data; smoothing the noise-reduced historical data; and pruning the smoothed historical data.

7. A capacity assessment device for a service cluster, comprising: an acquisition module, used to obtain a capacity assessment model of a target service cluster; and a determination module, used to determine a capacity value of the target service cluster based on a preset service latency threshold, a preset processing resource utilization threshold, and the capacity assessment model; wherein the target service cluster is configured with multiple service resources for providing services, each service resource including processing resources for processing the target service cluster; the capacity value represents the maximum throughput of the target service cluster; the capacity assessment model is trained based on historical data of the target service cluster, the historical data including historical data of service latency, historical data of processing resource utilization, and historical data of throughput; the capacity assessment model includes a first capacity assessment model and a second capacity assessment model; the first capacity assessment model is trained based on the historical data of processing resource utilization and the historical data of throughput, and represents a first relationship between the throughput of a cluster instance of the target service cluster and the processing resource utilization of the cluster instance; the second capacity assessment model is trained based on the historical data of service latency, the historical data of processing resource utilization, and the historical data of throughput, and represents a third relationship between the service latency of the cluster instance and the processing resource utilization and throughput of the cluster instance; and wherein determining the capacity value of the target service cluster based on the preset service latency threshold, the preset processing resource utilization threshold, and the capacity assessment model includes: determining a first capacity value based on the preset processing resource utilization threshold and the first capacity assessment model; determining a second capacity value based on the preset service latency threshold, the preset processing resource utilization threshold, and the second capacity assessment model; and determining the smaller of the first capacity value and the second capacity value as the capacity value.

8. An electronic device, comprising: a processor; and a memory storing a program, wherein the program includes instructions that, when executed by the processor, cause the processor to perform the method according to any one of claims 1-6.

9. A storage medium storing computer instructions for causing a computer to perform the method according to any one of claims 1-6.