Big data computing power resource dynamic allocation and load balancing optimization device and method
By constructing a unified abstraction and a multi-dimensional dynamic profile of heterogeneous computing resources, and by combining an improved temporal fusion Transformer model with a deep reinforcement learning scheduling model, the invention addresses the technical challenges of computing power scheduling and load balancing in large-scale heterogeneous big data clusters. It enables efficient unified resource invocation, accurate pre-allocation, and multi-dimensional optimization, supporting business stability and rigid SLA guarantees.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- JIANGXI MODERN POLYTECHNIC COLLEGE
- Filing Date
- 2026-04-29
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies for large-scale heterogeneous big data clusters suffer from problems such as difficulty in unified management and control of heterogeneous computing power, inaccurate demand forecasting, unbalanced multi-objective optimization, imprecise load balancing, and the lack of a closed-loop scheduling system, so they cannot meet the computing power scheduling and load balancing requirements of mixed business load scenarios.
By constructing a unified abstraction of heterogeneous computing resources and a multi-dimensional dynamic computing power profile, the method uses an improved temporal fusion Transformer model to predict business computing power demand, combines it with a deep reinforcement learning scheduling model to achieve multi-objective optimization, and adopts a decentralized scheduling strategy for node-level load balancing, thus constructing a full-link closed-loop optimization mechanism.
It achieves efficient and unified access to heterogeneous computing power, precise resource pre-allocation, multi-dimensional optimization, and node-level load balancing, ensuring the rigid guarantee of core business SLAs and improving the utilization rate of heterogeneous computing power and business stability.
Figure CN122309173A_ABST
Description
Technical Field
[0001] This invention relates to the fields of big data computing power scheduling and distributed cluster resource management, and in particular to a device and method for dynamic allocation and load balancing optimization of big data computing power resources.
Background Technology
[0002] With the explosive growth of big data, artificial intelligence, and cloud computing technologies, the scale of enterprise-level big data clusters continues to expand. Heterogeneous computing power with multiple architectures such as CPU, GPU, NPU, and FPGA has become the mainstream deployment form for big data clusters. Business scenarios have also expanded from traditional offline batch processing to mixed load scenarios such as real-time stream computing, AI model training and inference, and interactive big data queries. Computing resource scheduling and load balancing technologies are core technologies that determine the service capabilities, resource utilization efficiency, operating costs, and business stability of big data clusters, directly determining the performance release capability of heterogeneous computing power and the quality of business services.
[0003] Current mainstream resource scheduling frameworks (such as Hadoop YARN, Kubernetes, and Spark Standalone) provide basic heterogeneous computing resource scheduling and load balancing capabilities, but they still have many unresolved technical shortcomings in practical engineering applications involving large-scale heterogeneous clusters and mixed business loads, as follows:
1. Existing frameworks only implement basic quota management for heterogeneous computing power, without deep unified abstraction and standardized encapsulation of computing resources across hardware architectures. They cannot effectively shield the differences in underlying hardware, so upper-layer businesses must carry out dedicated adaptation development for each hardware architecture, which is costly and time-consuming. At the same time, existing technology can only collect basic node load data and lacks multi-dimensional dynamic quantitative evaluation of computing power nodes. It cannot build a full-dimensional computing power profile covering computing power performance, load status, energy efficiency, network transmission, and operational stability. Scheduling decisions therefore lack accurate data support, resulting in serious mismatches of computing power resources and an inability to fully release the peak performance of heterogeneous computing power.
[0004] 2. Most existing technologies use traditional time-series models such as ARIMA and single LSTM for computing power demand prediction. These models cannot effectively separate and integrate the periodic, trend, and sudden characteristics of business demands. They have large prediction errors for computing power demand in scenarios such as e-commerce promotions, sudden AI training tasks, and traffic peaks. They cannot achieve advance pre-allocation of computing power resources and have always failed to solve the core contradiction of "insufficient resources during business peaks and idle resources during business troughs". At the same time, existing technologies lack a sound business priority hierarchical management system and have not configured dedicated resource guarantee mechanisms for core businesses with high SLA requirements. Core businesses are easily preempted by low-priority tasks, which cannot achieve rigid guarantee of core business SLA and is prone to business failures.
[0005] 3. Most existing computing power scheduling technologies take "maximizing resource utilization" as the single optimization objective. A few multi-objective scheduling schemes have not standardized and normalized the optimization objectives of different dimensions, making it impossible to achieve balanced optimization of multiple objectives. At the same time, existing schemes do not incorporate the SLA requirements of high-priority services as hard constraints into the optimization model, which can easily lead to the problem of sacrificing the service quality of core services in pursuit of resource utilization. In addition, most existing schemes do not take into account the fairness of tenant resource allocation and the optimization of cluster energy consumption, which can easily lead to uneven resource allocation among tenants and high energy consumption per unit of computing power in the cluster, making it impossible to achieve a multi-dimensional balance of performance, cost, and fairness.
[0006] 4. Most existing load balancing technologies adopt a centralized scheduling architecture, with all scheduling decisions being uniformly issued by the central management node. In large-scale cluster scenarios with thousands of nodes, the central node is prone to performance bottlenecks, resulting in high scheduling response latency and a serious risk of single point of failure. At the same time, most existing solutions adopt a global cross-node task migration strategy, which can easily cause overall cluster scheduling jitter and affect the stability of business operations. For different types of abnormal load nodes such as overload, idle, and faulty nodes, existing technologies only use general task migration strategies and do not develop targeted correction schemes. Faulty node switching is slow, which can easily lead to business interruption and cannot achieve fine-grained load balancing optimization.
[0007] 5. Most existing scheduling technologies adopt statically configured scheduling strategies and model parameters. After going live, they cannot adaptively adjust according to the actual operation data of the cluster. Faced with dynamic changes in business scenarios and cluster status, the scheduling effect will continue to decline. A few solutions with feedback optimization capabilities only perform local optimization for a single scheduling module. They do not achieve closed-loop feedback across the entire chain of computing power profiling, demand prediction, global scheduling, and load balancing. They cannot achieve continuous iterative optimization of the entire scheduling system, and their long-term adaptability and stability are insufficient.
[0008] In summary, existing technologies have significant shortcomings in areas such as unified management and control of heterogeneous computing power, accurate demand prediction, multi-objective global optimization, refined load balancing, and end-to-end closed-loop optimization. They cannot meet the core requirements of computing power scheduling and load balancing for large-scale heterogeneous big data clusters in mixed business load scenarios. There is an urgent need to develop a solution that can systematically solve the above-mentioned technical deficiencies.
Summary of the Invention
[0009] In order to overcome the shortcomings of the prior art, one of the objectives of this invention is to provide a device and method for dynamic allocation and load balancing optimization of big data computing resources.
[0010] One of the objectives of this invention is achieved through the following technical solution: a method for dynamic allocation and load balancing optimization of big data computing resources, including the following steps:
S1. Unified abstraction of heterogeneous computing power resources and construction of a multi-dimensional dynamic computing power profile: Distributed collection agents connect to the heterogeneous computing power nodes in the big data cluster and synchronously collect static attribute data and millisecond-level real-time running data of each node. A standardized resource description protocol uniformly abstracts and encapsulates heterogeneous computing power resources with different hardware architectures to shield the differences in underlying hardware. At the same time, based on the collected full-dimensional data, a dynamic computing power profile is constructed with five dimensions: computing power performance, resource load, energy efficiency, network transmission, and operation stability. The weight coefficients of each dimension feature are updated in real time based on the node's historical running data and business adaptability.
S2. Multi-feature fusion time-series prediction and priority hierarchical management of business computing power demand: Collect the full historical operation data and real-time task request data of the big data business, extract the periodic, bursty, and trend characteristics of business computing power demand, generate business computing power demand prediction results at multiple time granularities based on the improved temporal fusion Transformer model, and build a business hierarchical matrix model with business type, SLA level, and task urgency as core dimensions. Generate priority scores for business tasks through weighted calculation, divide priority levels according to the scores, and configure dedicated computing power resource redundancy pools for high-priority tasks.
S3. Multi-objective optimization of global computing power pre-allocation with hard constraints: Taking the maximization of computing power resource utilization, the minimization of business response latency, the minimization of energy consumption per unit of computing power in the cluster, and the fairness of tenant resource allocation as optimization objectives, a multi-objective optimization function with hard constraints on the SLA of high-priority business is constructed by combining the dynamic computing power profile, the computing power demand prediction results, and the business priorities. A deep reinforcement learning scheduling model based on a Markov decision process generates a global computing power resource pre-allocation scheme, and the corresponding computing power resources are mapped and allocated to the dedicated node groups of each business task. At the same time, a dynamically adjustable resource redundancy threshold is set for each node group.
S4. Distributed adaptive load balancing and precise anomaly correction within node groups: Using node groups as the smallest scheduling unit, the load status of each computing node in the group is monitored in real time through distributed probes. When a node's load index exceeds the preset upper limit threshold, dynamic migration of low-priority tasks and secondary resource scheduling within the group are triggered. At the same time, three types of load anomaly nodes (overloaded, idle, and faulty) are identified in real time, and targeted correction strategies are generated for each anomaly type to complete the real-time optimization of node-level load balancing.
S5. Closed-loop feedback and iterative optimization of the end-to-end scheduling effect: Collect all operational indicators of computing power resource allocation and load balancing at a preset period, weight each operational indicator using the entropy weight method, generate a comprehensive score and evaluation report of the scheduling effect, and perform incremental iterative optimization of the time-series prediction model and the deep reinforcement learning scheduling model based on the feedback data of the evaluation report. At the same time, update the feature weights of the dynamic computing power profile to complete the end-to-end closed-loop optimization of dynamic allocation and load balancing.
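The entropy weight method named in step S5 can be illustrated with a minimal Python sketch. The function name `entropy_weights`, the indicator layout, and the sample values are hypothetical. The source does not state how indicators are scaled before weighting, so min-max normalization with cost-type indicators flipped is assumed here.

```python
import math

def entropy_weights(rows, benefit):
    """Entropy weight method over a table of operational indicators.

    rows    -- one list of indicator values per evaluation period (assumed layout)
    benefit -- benefit[j] is True when a larger value of indicator j is better
    Returns (weights, scores): one weight per indicator, one score per period.
    """
    m, n = len(rows), len(rows[0])
    if m < 2:
        raise ValueError("at least two evaluation periods are required")
    # Min-max normalise each indicator to [0, 1]; cost-type indicators are flipped.
    norm = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        for i in range(m):
            if hi == lo:
                norm[i][j] = 1.0  # constant indicator carries no information
            elif benefit[j]:
                norm[i][j] = (rows[i][j] - lo) / (hi - lo)
            else:
                norm[i][j] = (hi - rows[i][j]) / (hi - lo)
    k = 1.0 / math.log(m)
    divergence = []
    for j in range(n):
        total = sum(norm[i][j] for i in range(m))
        entropy = 0.0
        for i in range(m):
            p = norm[i][j] / total
            if p > 0:
                entropy -= k * p * math.log(p)
        # Low entropy means the indicator discriminates well between periods.
        divergence.append(max(0.0, 1.0 - entropy))
    s = sum(divergence)
    weights = [d / s for d in divergence] if s > 0 else [1.0 / n] * n
    scores = [sum(w * v for w, v in zip(weights, row)) for row in norm]
    return weights, scores
```

An indicator that barely changes between periods receives a weight close to zero, so the comprehensive score is driven by the indicators that actually distinguish one scheduling period from another.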
[0011] As a further improvement to the above technical solution: In step S1, the static attribute data includes the hardware architecture of the computing node, the number of computing cores, memory capacity, storage bandwidth, network bandwidth, and peak computing performance; the real-time running data includes CPU utilization, memory usage, GPU / accelerator card utilization, video memory usage, disk I / O usage, network throughput, task queue length, node real-time power consumption, and task execution success rate. Each dimension of the dynamic computing power profile is set with a corresponding quantitative evaluation index. The update rule of the weight coefficient is: based on the task execution success rate, response latency compliance rate and energy consumption efficiency of the node under the corresponding business type, the weight ratio of each dimension is adjusted in real time through the sliding window weighted average algorithm.
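A sliding-window weighted average of the kind described above might look like the following Python sketch. The class name `SlidingWindowWeights`, the dimension keys, the linear recency weighting, and the idea of feeding one per-dimension signal per update are all assumptions; the source does not specify how the success rate, latency compliance rate, and energy efficiency are mapped onto the five dimensions.

```python
from collections import deque

# Assumed short names for the five profile dimensions in the text.
DIMENSIONS = ("performance", "load", "energy", "network", "stability")

class SlidingWindowWeights:
    """Keeps the most recent samples and derives normalised dimension weights."""

    def __init__(self, window=5):
        self.samples = deque(maxlen=window)  # oldest sample is dropped automatically

    def update(self, sample):
        """sample maps each dimension to a non-negative relevance signal."""
        self.samples.append(sample)
        # Linear recency weights: oldest sample counts 1, newest counts len(samples).
        recency = range(1, len(self.samples) + 1)
        denom = sum(recency)
        avg = {
            d: sum(r * s[d] for r, s in zip(recency, self.samples)) / denom
            for d in DIMENSIONS
        }
        total = sum(avg.values())
        if total == 0:
            # No signal at all: fall back to equal weights.
            return {d: 1.0 / len(DIMENSIONS) for d in DIMENSIONS}
        return {d: v / total for d, v in avg.items()}
```

Because the window is bounded, a dimension that mattered only in old samples loses its weight once those samples slide out, which is what lets the profile track changing business behavior.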
[0012] In step S2, the improved temporal fusion Transformer model introduces a multi-branch attention mechanism to extract the periodic, bursty, and trend characteristics of business computing power demand and fuse them by weighting. At the same time, it learns online from real-time task request data through an incremental sliding window mechanism to generate computing power demand prediction results at multiple time granularities, such as minute, hour, and day. In the business hierarchical matrix model, the highest weight coefficient is assigned to the SLA level. The resource quota of the dedicated computing power resource redundancy pool for high-priority tasks is not less than 5% of the total computing power of the cluster, and the pool may only be used to cover bursts in the computing power demand of high-priority tasks and to take over the tasks of faulty nodes.
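The weighted priority score and the redundancy pool floor can be sketched in Python as follows. The weight values, the level thresholds, the 0-to-1 scoring of each dimension, and all names are hypothetical; the source fixes only that the SLA level has the highest weight, that there are priority levels derived from the score, and that the pool is at least 5% of total computing power.

```python
from dataclasses import dataclass

# Assumed weights; the text only requires that the SLA level weighs the most.
WEIGHTS = {"business_type": 0.3, "sla_level": 0.5, "urgency": 0.2}
MIN_POOL_FRACTION = 0.05  # lower bound stated in the text

@dataclass
class BusinessTask:
    name: str
    business_type: float  # each dimension pre-scored in [0, 1] (assumption)
    sla_level: float
    urgency: float

def priority_score(task):
    """Weighted sum over the three dimensions of the hierarchical matrix."""
    return (WEIGHTS["business_type"] * task.business_type
            + WEIGHTS["sla_level"] * task.sla_level
            + WEIGHTS["urgency"] * task.urgency)

def priority_level(score):
    """Map a score to a level; the cut-offs 0.8 and 0.5 are illustrative only."""
    if score >= 0.8:
        return "P0"
    if score >= 0.5:
        return "P1"
    return "P2"

def redundancy_pool_quota(total_compute, fraction=MIN_POOL_FRACTION):
    """Compute the pool size, refusing any fraction below the 5% floor."""
    if fraction < MIN_POOL_FRACTION:
        raise ValueError("redundancy pool must be at least 5% of total compute")
    return total_compute * fraction
```

For example, a task scored 0.9 on business type, 1.0 on SLA level, and 0.9 on urgency lands in the top level, while a task with low scores on all three falls into the lowest one.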
[0013] In step S3, the multi-objective optimization function first converts each optimization objective into a 0-1 range index of the same dimension through minimum-maximum normalization processing, sets configurable weight coefficients for each optimization objective, and adds a hard constraint condition of 100% SLA compliance rate for high-priority business. The state space of the deep reinforcement learning scheduling model includes the global computing power resource status of the cluster, the business demand prediction results, the business priority distribution, the current cluster load distribution and node energy consumption data. The action space includes computing power resource quota allocation actions, business task and node group mapping actions, and resource redundancy threshold adjustment actions. The reward function is the weighted calculation result of a multi-objective optimization function, and a penalty item is set when the SLA of high-priority business fails to meet the standard.
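As an illustration of the normalization and reward shaping described above, the following Python sketch scales four objectives to the 0-1 range, flips the two that are minimized, and subtracts a penalty when high-priority SLA compliance falls below 100%. The metric names, bounds, penalty size, and function names are assumptions. The sketch also models the hard constraint only as a penalty term, which is one of the two mechanisms the text mentions.

```python
# Objectives to be minimised; all other metrics are treated as maximised.
MINIMIZE = {"latency_ms", "energy_per_unit"}

def min_max(value, lo, hi):
    """Min-max normalisation to [0, 1], clamped to the given bounds."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

def reward(metrics, bounds, weights, p0_sla_compliance, penalty=10.0):
    """Weighted multi-objective reward with an SLA penalty (penalty size assumed)."""
    total = 0.0
    for name, value in metrics.items():
        lo, hi = bounds[name]
        x = min_max(value, lo, hi)
        if name in MINIMIZE:
            x = 1.0 - x  # lower raw value should give higher reward
        total += weights[name] * x
    if p0_sla_compliance < 1.0:
        total -= penalty  # any miss on high-priority SLA dominates the reward
    return total
```

Choosing a penalty larger than the maximum achievable weighted sum (1.0 when the weights sum to 1) means no gain in utilization or energy can compensate for an SLA miss.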
[0014] In step S4, the distributed adaptive load balancing within the node group adopts a decentralized scheduling strategy. Each node group elects a master node to be responsible for summarizing the load data within the group and issuing scheduling instructions. When the comprehensive load index of the nodes within the group exceeds the preset upper limit threshold, the master node prioritizes migrating the stateless task with the lowest priority and shortest runtime on that node to an idle node within the group whose comprehensive load is lower than the preset lower limit threshold. The normal execution of business tasks is not interrupted during the migration process. The targeted correction strategy is as follows: for overloaded abnormal nodes, perform task migration and temporary resource quota increase; for idle abnormal nodes, perform idle resource reclamation and unified task reallocation in the cluster; for faulty abnormal nodes, perform real-time task isolation and seamless switching of backup nodes in the redundancy pool.
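The in-group migration rule can be sketched as below. The data classes, the threshold values, the convention that a larger priority number means a lower priority, and the choice of the least-loaded idle node as the target are assumptions made for illustration; the source only states which task is migrated first and that the target is an idle node below the lower threshold.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    priority: int      # assumed convention: 0 is highest priority, larger is lower
    runtime_s: float   # time the task has been running
    stateless: bool

@dataclass
class Node:
    name: str
    load: float        # comprehensive load index in [0, 1]
    tasks: list = field(default_factory=list)

def plan_migrations(group, upper=0.85, lower=0.30):
    """Return (task, source, target) triples for one node group.

    Thresholds are illustrative. Only stateless tasks are considered.
    """
    plans = []
    for node in group:
        if node.load <= upper:
            continue
        candidates = [t for t in node.tasks if t.stateless]
        targets = [n for n in group if n is not node and n.load < lower]
        if not candidates or not targets:
            continue
        # Lowest priority first, ties broken by the shortest runtime.
        task = max(candidates, key=lambda t: (t.priority, -t.runtime_s))
        # Assumption: prefer the least-loaded idle node in the group.
        target = min(targets, key=lambda n: n.load)
        plans.append((task.name, node.name, target.name))
    return plans
```

Restricting both the candidates and the targets to the same node group is what keeps the rebalancing local and avoids cluster-wide migration churn.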
[0015] A device for dynamic allocation and load balancing optimization of big data computing resources includes a computing power resource management module, a business demand analysis module, a global computing power scheduling module, a load balancing optimization module, and a closed-loop iterative optimization module.
The computing power resource management module is used to connect to heterogeneous computing power nodes in the big data cluster through distributed collection agents, synchronously collect static attribute data and millisecond-level real-time operation data of each computing power node, and uniformly abstract and encapsulate heterogeneous computing power resources with different hardware architectures through a standardized resource description protocol to shield the underlying hardware differences. At the same time, based on the collected full-dimensional data, it constructs a dynamic computing power profile with five dimensions: computing power performance, resource load, energy efficiency, network transmission, and operation stability. The weight coefficients of each dimension feature are updated in real time based on the node's historical operation data and business adaptability.
The business demand analysis module is used to collect the full historical operation data and real-time task request data of the big data business, extract the periodic, bursty, and trend characteristics of business computing power demand, generate business computing power demand prediction results at multiple time granularities based on the improved temporal fusion Transformer model, and construct a business hierarchical matrix model with business type, SLA level, and task urgency as core dimensions. The module generates priority scores for business tasks through weighted calculation, divides priority levels according to the scores, and configures a dedicated computing power resource redundancy pool for high-priority tasks.
The global computing power scheduling module takes maximizing computing power resource utilization, minimizing business response latency, minimizing energy consumption per unit of computing power in the cluster, and ensuring fairness in tenant resource allocation as its optimization objectives. It combines the dynamic computing power profile, the computing power demand prediction results, and the business priorities to construct a multi-objective optimization function with hard constraints on the SLA of high-priority business. Through a deep reinforcement learning scheduling model based on a Markov decision process, it generates a global computing power resource pre-allocation scheme, maps and allocates the corresponding computing power resources to the dedicated node groups of each business task, and sets a dynamically adjustable resource redundancy threshold for each node group.
The load balancing optimization module uses the node group as the smallest scheduling unit and monitors the load status of each computing node in the group in real time through distributed probes. When a node's load index exceeds the preset upper limit threshold, it triggers dynamic migration of low-priority tasks and secondary resource scheduling within the group. At the same time, it identifies three types of abnormal load nodes in real time (overloaded, idle, and faulty) and generates targeted correction strategies for each anomaly type to complete the real-time optimization of node-level load balancing.
The closed-loop iterative optimization module is used to collect all operational indicators of computing power resource allocation and load balancing at a preset period, weight each operational indicator using the entropy weight method, generate a comprehensive score and evaluation report of the scheduling effect, and perform incremental iterative optimization of the time-series prediction model and the deep reinforcement learning scheduling model based on the feedback data of the evaluation report, while updating the feature weights of the dynamic computing power profile to complete the full-link closed-loop optimization of dynamic allocation and load balancing.
[0016] As a further improvement to the above technical solution: The computing power resource management module includes a resource acquisition unit, a resource abstraction and encapsulation unit, and a computing power profile construction unit. The resource acquisition unit is used to synchronously acquire the static attribute data and millisecond-level real-time running data of the nodes through distributed collection agents deployed on each computing power node, and to upload the acquired data to the cluster management node after data cleaning and normalization preprocessing. The resource abstraction and encapsulation unit is used to uniformly abstract heterogeneous computing resources of different architectures such as CPU, GPU, NPU, and FPGA through a standardized resource description protocol, generate standardized computing resource call interfaces and computing power quantification values, and shield the scheduling differences of the underlying hardware. The computing power profile construction unit is used to construct a dynamic computing power profile in five dimensions, set corresponding quantitative evaluation indicators for each dimension, and adjust the weight coefficients of each dimension feature in real time through a sliding window weighted average algorithm, based on the node's task execution success rate, response latency compliance rate, and energy consumption efficiency under the corresponding business type.
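One way to picture the standardized resource description is a single record type shared by all architectures, from which a comparable computing power quantification value is derived. The Python sketch below is purely illustrative: the class `ResourceDescriptor`, its fields (taken from the static attributes listed in the text), and the use of peak TFLOPS relative to a baseline as the quantification value are all assumptions, since the source does not define the protocol.

```python
from dataclasses import dataclass
from enum import Enum

class Arch(Enum):
    CPU = "cpu"
    GPU = "gpu"
    NPU = "npu"
    FPGA = "fpga"

@dataclass(frozen=True)
class ResourceDescriptor:
    """Hypothetical uniform description of one computing power node."""
    node_id: str
    arch: Arch
    compute_cores: int
    memory_gb: float
    storage_gbps: float
    network_gbps: float
    peak_tflops: float

    def compute_units(self, baseline_tflops=1.0):
        """Quantified computing power relative to an assumed baseline."""
        return self.peak_tflops / baseline_tflops

def total_compute_units(descriptors, baseline_tflops=1.0):
    """Cluster-wide computing power, comparable across architectures."""
    return sum(d.compute_units(baseline_tflops) for d in descriptors)
```

A scheduler written against such a record only ever sees `compute_units`, so it does not need architecture-specific branches to compare a CPU node with an accelerator node.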
[0017] The business demand analysis module includes a demand feature extraction unit, a time-series prediction unit, and a business hierarchical management and control unit. The demand feature extraction unit is used to perform feature engineering on the full historical operation data and real-time task request data of the business, and to extract and separate the periodic, bursty, and trend features of the business computing power demand. The time-series prediction unit has the improved temporal fusion Transformer model built in. It fuses the separated multi-dimensional demand features by weighting through a multi-branch attention mechanism, and at the same time learns online from real-time task request data through an incremental sliding window mechanism, generating computing power demand prediction results at multiple time granularities such as minute, hour, and day. The business hierarchical management and control unit has the business hierarchical matrix model built in, which sets weight coefficients for the three dimensions of business type, SLA level, and task urgency. It generates a priority score for each business task through weighted calculation and divides the tasks into at least three priority levels according to the score, from high to low. At the same time, it configures and manages the dedicated computing power resource redundancy pool for high-priority tasks. The resource quota of the redundancy pool is not less than 5% of the total computing power of the cluster.
[0018] The global computing power scheduling module includes a multi-objective optimization modeling unit, a deep reinforcement learning scheduling unit, and a pre-allocation execution unit; The multi-objective optimization modeling unit is used to construct a multi-objective optimization function with hard constraints. First, it converts each optimization objective into an index of the same dimension in the 0-1 interval through minimum-maximum normalization processing, sets configurable weight coefficients for each optimization objective, and adds a hard constraint condition of 100% SLA compliance rate for high-priority business. The deep reinforcement learning scheduling unit incorporates a deep reinforcement learning scheduling model based on Markov decision processes. It uses the cluster's global computing power resource status, business demand prediction results, business priority distribution, current cluster load distribution, and node energy consumption data as the state space, computing power resource quota allocation actions, business task and node group mapping actions, and resource redundancy threshold adjustment actions as the action space, and the weighted calculation results of the multi-objective optimization function as the reward function. When the SLA of high-priority business fails to meet the standard, a penalty term is added, and the optimal global computing power resource pre-allocation scheme is output. The pre-allocation execution unit is used to map computing resources to business task node groups according to the generated global computing resource pre-allocation scheme, and set a dynamically adjustable resource redundancy threshold for each node group to cope with sudden fluctuations in business demand.
[0019] The load balancing optimization module includes a distributed load monitoring unit, an adaptive load balancing unit, and a targeted anomaly correction unit; the closed-loop iterative optimization module includes a scheduling effect evaluation unit and a model iterative optimization unit. The distributed load monitoring unit is used to collect load data of each computing node in the group in real time through distributed probes deployed in each node group, calculate the comprehensive load index of each node, and identify and report abnormal load data in real time. The adaptive load balancing unit adopts a decentralized scheduling strategy and elects a master node within the node group to be responsible for scheduling and management within the group. When the comprehensive load index of a node in the group exceeds the preset upper limit threshold, the stateless task with the lowest priority and shortest runtime on that node is migrated to an idle node in the group whose comprehensive load is lower than the preset lower limit threshold, so that load balancing is optimized without the business being aware of it. The targeted anomaly correction unit is used to classify and identify three types of load anomaly nodes: overloaded, idle, and faulty. For overloaded nodes, it performs task migration and a temporary resource quota increase; for idle nodes, it performs idle resource reclamation and unified task reallocation in the cluster; and for faulty nodes, it performs real-time task isolation and seamless switching to backup nodes in the redundancy pool. The scheduling effect evaluation unit is used to collect, at a preset period, the average utilization rate of computing resources, the average response time of business tasks, the SLA compliance rate, the energy consumption per unit of computing power of the cluster, the degree of load balance, and the tenant resource allocation fairness index. It weights each operating indicator using the entropy weight method to generate a comprehensive scheduling effect score and evaluation report. The model iterative optimization unit is used to perform incremental iterative optimization of the hyperparameters and network structure of the time-series prediction model and the deep reinforcement learning scheduling model based on the feedback data of the evaluation report, and at the same time to update the feature weights of the dynamic computing power profile to complete the closed-loop optimization of the scheduling strategy.
[0020] Compared with the prior art, the beneficial effects of the present invention are as follows:
1. This invention uses a standardized resource description protocol to uniformly abstract and encapsulate heterogeneous computing resources with different hardware architectures, completely shielding the differences in underlying hardware. Upper-layer services can achieve unified invocation of heterogeneous computing power without special adaptation development, significantly reducing business adaptation costs. At the same time, it constructs a dynamic computing power profile covering five dimensions: computing power performance, resource load, energy efficiency, network transmission, and operational stability. Through a sliding window weighted average algorithm, it updates feature weights in real time, accurately quantifies the full-dimensional computing power characteristics of nodes, and provides accurate data support for scheduling decisions. This fundamentally solves the problem of computing power resource mismatch and can improve the effective utilization rate of heterogeneous computing power by more than 40%.
[0021] 2. This invention uses an improved temporal fusion Transformer model with a multi-branch attention mechanism to extract and fuse the periodic, trend, and bursty features of business computing power demand. Combined with an incremental sliding window online learning mechanism, it keeps the prediction error of computing power demand at multiple time granularities within 8%, achieving advance pre-allocation of computing power resources and resolving the core contradiction of insufficient resources during business peaks and idle resources during business troughs. At the same time, through a business hierarchical matrix model, it achieves refined hierarchical management of business priorities, configures dedicated computing power resource redundancy pools for high-priority businesses, and achieves a rigid guarantee of a 100% SLA compliance rate for P0-level core businesses, avoiding the problem of low-priority tasks preempting core business resources.
[0022] 3. This invention constructs a multi-objective optimization function with the core objectives of maximizing computing resource utilization, minimizing business response latency, minimizing energy consumption per unit of computing power in the cluster, and ensuring fairness in tenant resource allocation. It eliminates the influence of different metrics through min-max normalization, supports user-defined weight configuration, and incorporates the SLA compliance rate of high-priority services as an insurmountable hard constraint into the optimization model, fundamentally avoiding the problem of sacrificing core business service quality in pursuit of a single metric. Combined with a deep reinforcement learning scheduling model based on the PPO algorithm, it can quickly solve the globally optimal pre-allocation scheme in large-scale cluster scenarios, achieving multi-dimensional balanced optimization of resource utilization, business performance, energy consumption costs, and tenant fairness while ensuring the SLA of core services.
[0023] 4. This invention adopts a decentralized scheduling strategy with the node group as the smallest scheduling unit, which removes the performance bottleneck and single point of failure of the central node in centralized scheduling. Localized scheduling within the group greatly reduces the cluster jitter caused by large-scale task migration across nodes. Node load status is quantified accurately through a multi-dimensional weighted comprehensive load index, and container hot migration technology is used so that load balancing is imperceptible to the business.
[0024] 5. This invention constructs a closed-loop optimization mechanism across the entire chain. It achieves quantitative and comprehensive evaluation of scheduling performance through the entropy weight method. Based on the evaluation feedback data, it performs incremental iterative optimization on the time-series prediction model and the deep reinforcement learning scheduling model. At the same time, it updates the feature weights of the dynamic computing power profile and the scheduling strategy parameters, realizing closed-loop optimization of the entire chain of computing power profile, demand prediction, global scheduling, and load balancing. This enables the entire scheduling system to adapt to the dynamic changes in business scenarios and cluster status, maintain optimal scheduling performance in the long term, and avoid the problem of declining performance of static scheduling strategies over a long period of time.
[0025] 6. This invention is fully compatible with mainstream big data and resource scheduling frameworks such as Hadoop, Spark, Flink, Kubernetes, and YARN. It can be quickly deployed without large-scale modifications to the existing cluster architecture. It supports elastic scaling of cluster size from tens of nodes to thousands of nodes and can be adapted to all types of big data business scenarios such as offline batch processing, real-time stream computing, AI model training and inference, and interactive querying. It has low deployment cost, strong adaptability, and extremely high engineering application value and market promotion value.
[0026] The above description is merely an overview of the technical solution of the present invention. So that the technical means of the present invention can be better understood and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the present invention become more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of the Drawings
[0027] Figure 1 is a flowchart of this embodiment.
Detailed Description
[0028] The present invention will now be further described in conjunction with the accompanying drawings and specific embodiments. It should be noted that, without conflict, the various embodiments or technical features described below can be arbitrarily combined to form new embodiments.
[0029] It should be noted that when a component is described as "fixed to" another component, it can be directly on the other component or may have a component in between. When a component is considered "connected to" another component, it can be directly connected to the other component or may have a component in between. When a component is considered "set on" another component, it can be directly set on the other component or may have a component in between. The terms "vertical," "horizontal," "left," "right," and similar expressions used in this document are for illustrative purposes only.
[0030] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and / or" as used herein includes any and all combinations of one or more of the associated listed items.
[0031] I. Applicable Scenarios and Preliminary Notes for the Embodiments
The method and apparatus for dynamic allocation and load balancing optimization of big data computing power resources disclosed in this embodiment are applied to big data distributed clusters containing heterogeneous computing power nodes. They are compatible with mainstream big data and resource scheduling frameworks such as Hadoop, Spark, Flink, Kubernetes, and YARN, and the cluster size supports elastic expansion from tens of nodes to thousands of nodes.
[0032] The underlying heterogeneous computing nodes of the cluster in this embodiment cover: x86 / ARM architecture CPU nodes, NVIDIA / AMD architecture GPU nodes, Ascend / Cambricon NPU nodes, and FPGA acceleration nodes. It can support multiple business scenarios such as offline batch processing, real-time stream computing, AI model training and inference, and big data interactive query, and fully covers all the technical features of the claims. Those skilled in the art can completely reproduce the technical solution of the present invention based on this embodiment.
[0033] II. Specific Embodiment of the Method for Dynamic Allocation and Load Balancing Optimization of Big Data Computing Resources
The complete execution flow of this method consists of five core steps. The implementation details, algorithm parameters, and execution rules for each step are as follows:
Step S1: Unified abstraction of heterogeneous computing resources and construction of a multi-dimensional dynamic computing power profile
Step S11 collects data through a lightweight distributed collection agent, a daemon process written in Go that is deployed on the host operating system of each heterogeneous computing node in the cluster.
Static attribute data is collected and reported once, when a node first joins the cluster. It includes: the hardware architecture of the computing node, the number of computing cores, memory capacity, storage bandwidth, network bandwidth, peak computing performance, and hardware firmware version.
Real-time running data is collected at a 500 ms interval. It includes: CPU utilization, memory usage, GPU / accelerator card utilization, video memory usage, disk I/O usage, network throughput, task queue length, node real-time power consumption, task execution success rate, and node heartbeat status. After collection, the data is first cleaned and filtered for outliers locally (to remove abnormal data caused by network jitter and instantaneous peaks) and then reported to the cluster management node through an encrypted gRPC channel.
[0034] Step S12 completes the unified abstraction and encapsulation of heterogeneous computing resources through a standardized resource description protocol based on Kubernetes CRD (Custom Resource Definition) extensions. Computing nodes with different hardware architectures are uniformly abstracted into a standardized "computing resource object" containing five core fields: computing power type, total computing power quota, available computing power quota, hardware feature tag, and scheduling policy tag. Computing power is quantified using TFLOPS (trillion floating-point operations per second) as the unified benchmark unit. The total TFLOPS of a CPU node is calculated from its single-core floating-point performance, while the total TFLOPS of a GPU / NPU / FPGA accelerated node is calculated from the hardware's nominal peak computing power and its actually available computing power. This shields the upper layers from the scheduling differences of the underlying heterogeneous hardware: upper-layer services do not need to be aware of the hardware architecture and only need to apply for the corresponding computing power quota through the standardized interface.
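The five-field computing resource object of Step S12 can be pictured with a small sketch. The class name, field names, and the request / release helpers below are hypothetical illustrations, not the CRD schema of the invention; only the five fields and the TFLOPS unit come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ComputeResourceObject:
    """Hypothetical in-memory mirror of the five core fields in Step S12."""

    power_type: str                      # computing power type, e.g. "CPU", "GPU", "NPU", "FPGA"
    total_tflops: float                  # total computing power quota, in TFLOPS
    available_tflops: float              # available computing power quota, in TFLOPS
    hardware_tags: list[str] = field(default_factory=list)    # hardware feature tags
    scheduling_tags: list[str] = field(default_factory=list)  # scheduling policy tags

    def request(self, tflops: float) -> bool:
        """Reserve quota if enough is available; report whether it succeeded."""
        if tflops <= 0 or tflops > self.available_tflops:
            return False
        self.available_tflops -= tflops
        return True

    def release(self, tflops: float) -> None:
        """Return quota to the node, never exceeding its total quota."""
        self.available_tflops = min(self.total_tflops, self.available_tflops + tflops)
```

An upper-layer service would call `request` with a TFLOPS amount and never inspect `power_type` unless it needs a specific hardware feature, which is the hardware-shielding behavior the step describes.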
[0035] Step S13, based on the collected full-dimensional data, constructs a dynamic computing power profile across five dimensions: computing power performance, resource load, energy efficiency, network transmission, and operational stability. The quantitative evaluation indicators for each dimension are as follows:
Step S14, based on the node's historical operational data and business adaptability, updates the weight coefficient of each dimension's features in real time using a 7-day sliding window weighted average algorithm. The sliding window step size is 1 hour, and the weight update formula is as follows:
Wn(t) = Wn(t-1)*α + Rn(t)*(1-α)
where Wn(t) is the weight coefficient of the nth dimension at the current time, Wn(t-1) is the weight coefficient at the previous time, α is the smoothing coefficient (fixed at 0.7), and Rn(t) is the business adaptability score of the nth dimension in the current sliding window (calculated as the normalized weighted value of task execution success rate, response latency compliance rate, and energy consumption efficiency). The weight coefficients of the five dimensions always sum to 1.
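As a minimal sketch of the Step S14 update, the function below applies the smoothing formula with α = 0.7 to all five dimensions at once. The text states that the weights always sum to 1 but does not say how this is enforced; the final renormalization step is therefore an assumption, as is the function name.

```python
ALPHA = 0.7  # smoothing coefficient from Step S14


def update_weights(prev_weights, adaptability_scores, alpha=ALPHA):
    """Apply Wn(t) = Wn(t-1)*alpha + Rn(t)*(1-alpha) to each dimension.

    Assumption: the result is renormalized so the weights sum to 1;
    the source only states that the sum is always 1.
    """
    if len(prev_weights) != len(adaptability_scores):
        raise ValueError("one adaptability score is required per dimension")
    smoothed = [
        alpha * w + (1 - alpha) * r
        for w, r in zip(prev_weights, adaptability_scores)
    ]
    total = sum(smoothed)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in smoothed]
```

With equal starting weights of 0.2 and scores of 0.9, 0.5, 0.5, 0.3, 0.3, the first dimension gains weight and the last two lose weight, while the total stays at 1.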
[0036] Step S2: Multi-feature fusion time-series prediction and priority-based management of business computing power demand
Step S21 collects the full historical operational data of the big data business (retained for at least 6 months) together with real-time task request data. Through feature engineering, it extracts and separates the periodic, trend, and burst characteristics of the business's computing power demand:
Periodic characteristics: the hourly, daily, and weekly fluctuation patterns of business computing power demand;
Trend characteristics: the long-term growth or decline trend in business computing power demand;
Burst characteristics: sudden surges in business traffic and sudden changes in computing power demand caused by temporary emergency tasks.
[0037] Step S22 generates multi-time-granularity business computing power demand prediction results based on the improved time-series fusion Transformer model. The specific implementation details of the model are as follows: Based on the traditional Transformer encoder, the model adds three parallel feature extraction branches: periodic feature branch, trend feature branch, and burst feature branch. Each branch adopts an independent multi-head attention mechanism. The periodic feature branch adopts a 7-day / 1-day / 1-hour multi-scale periodic attention mechanism, the trend feature branch adopts a causal convolution + long-distance attention mechanism, and the burst feature branch adopts a local sensitive attention mechanism. The features extracted from the three branches are weighted and fused by the fusion layer and then input into the decoder to generate prediction results. At the same time, an incremental sliding window mechanism is introduced, with a sliding window size of 1 hour. Incremental online learning is performed every hour based on newly added real-time task data, without the need for full data retraining. The model ultimately generates computing power demand prediction results at four time granularities: 5 minutes, 30 minutes, 1 hour, and 24 hours. During the pre-training phase, it uses 6 months of historical data from the cluster for training. After convergence, the prediction error can be controlled within 8%.
[0038] Step S23 utilizes a business hierarchy matrix model to manage business priorities. The specific implementation rules are as follows: The model uses business type, SLA level, and task urgency as core dimensions, with weight coefficients of 0.2, 0.5, and 0.3 respectively. Each dimension has a quantitative scoring standard of 0-100 points, and the weighted sum is used to obtain the total priority score of the business task. Priority ranking rules: A total score of 90 or above is P0 (highest priority), 70-89 is P1 (medium-high priority), and below 70 is P2 (normal priority). Configure a dedicated redundant computing resource pool for P0-level high-priority tasks. The resource quota of the redundant pool is fixed at 8% of the total computing power of the cluster. All nodes in the pool are highly available heterogeneous computing power nodes. They can only be used for sudden computing power supplementation for P0-level tasks and task takeover of faulty nodes. It is strictly forbidden to allocate them to low-priority tasks.
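The scoring rule in Step S23 can be sketched as follows. The weights and thresholds come from the text; the function names are hypothetical, and treating every score from 70 up to (but not including) 90 as P1 is an assumption, since the text does not say how fractional scores between 89 and 90 are handled.

```python
# Dimension weights from Step S23: business type, SLA level, task urgency.
W_BUSINESS_TYPE, W_SLA_LEVEL, W_URGENCY = 0.2, 0.5, 0.3


def priority_score(business_type, sla_level, urgency):
    """Weighted sum of three dimension scores, each on a 0-100 scale."""
    for value in (business_type, sla_level, urgency):
        if not 0 <= value <= 100:
            raise ValueError("dimension scores must lie in [0, 100]")
    return (W_BUSINESS_TYPE * business_type
            + W_SLA_LEVEL * sla_level
            + W_URGENCY * urgency)


def priority_level(score):
    """Map a total score to P0 / P1 / P2.

    Assumption: scores in [70, 90) are P1, which extends the stated
    70-89 band to fractional values.
    """
    if score >= 90:
        return "P0"
    if score >= 70:
        return "P1"
    return "P2"
```

Because the SLA level carries half of the total weight, a task cannot reach P0 with an SLA score below 80, whatever its other two scores are.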
[0039] Step S3: Multi-objective optimization with hard constraints and global computing power pre-allocation
Step S31 aims to maximize computing resource utilization, minimize service response latency, minimize energy consumption per unit of cluster computing power, and ensure fair allocation of tenant resources. It constructs a multi-objective optimization function with hard constraints. The four core optimization objectives are quantified as follows:
Computing resource utilization, f1: f1 = total actual computing power used by the cluster / total available computing power of the cluster; the optimization direction is max(f1);
Business response latency, f2: f2 = average response latency of all business tasks / preset standard response latency; the optimization direction is min(f2);
Energy consumption per unit of cluster computing power, f3: f3 = total real-time power consumption of the cluster / total actual output computing power of the cluster; the optimization direction is min(f3);
Fairness of tenant resource allocation, f4: f4 = 1 - Gini coefficient of resource allocation among tenants; the optimization direction is max(f4).
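The fairness objective f4 depends on a Gini coefficient, which the text does not spell out. The sketch below uses the standard closed form for sorted, non-negative data; the function names and the choice to return 0 for an empty or all-zero allocation are assumptions.

```python
def gini(allocations):
    """Gini coefficient of non-negative per-tenant allocations.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n with the
    values sorted in ascending order and i counted from 1.
    """
    values = sorted(allocations)
    if any(v < 0 for v in values):
        raise ValueError("allocations must be non-negative")
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # assumption: nothing allocated counts as perfectly equal
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def fairness(allocations):
    """Objective f4 = 1 - Gini coefficient; larger is fairer."""
    return 1 - gini(allocations)
```

Four tenants with equal shares give a Gini coefficient of 0 and f4 = 1; if one of the four tenants holds everything, the coefficient is 0.75 and f4 drops to 0.25.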
[0040] Normalization: The four objectives are converted into dimensionless indicators in the 0-1 range through min-max normalization. Maximization objectives and minimization objectives each use their own normalization formula, which removes the effect of differing units and scales.
[0041] The final expression of the multi-objective optimization function is:
F = w1*f'1 + w2*f'2 + w3*f'3 + w4*f'4
where w1, w2, w3, and w4 are the weight coefficients of the four optimization objectives and sum to 1. The default configuration is w1=0.3, w2=0.3, w3=0.2, and w4=0.2, and users can adjust the weights according to their cluster operation needs.
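A minimal sketch of how the normalized objectives might be combined into F. The text does not give the two normalization formulas; the inverted form used here for minimization objectives, the clamping to the 0-1 range, and the lo / hi bounds passed by the caller are all assumptions. Only the weighted sum and the default weights come from the text.

```python
DEFAULT_WEIGHTS = (0.3, 0.3, 0.2, 0.2)  # w1..w4 from the default configuration


def _clamp01(x):
    return max(0.0, min(1.0, x))


def normalize_max(value, lo, hi):
    """Min-max normalization for a larger-is-better objective (f1, f4)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return _clamp01((value - lo) / (hi - lo))


def normalize_min(value, lo, hi):
    """Assumed inverted form for a smaller-is-better objective (f2, f3),
    so that every normalized objective is larger-is-better."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return _clamp01((hi - value) / (hi - lo))


def objective(normalized, weights=DEFAULT_WEIGHTS):
    """F = w1*f'1 + w2*f'2 + w3*f'3 + w4*f'4 with weights summing to 1."""
    if len(normalized) != len(weights):
        raise ValueError("one weight is required per objective")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * f for w, f in zip(weights, normalized))
```

Under these assumptions F also lies in the 0-1 range, which makes it directly comparable across scheduling rounds.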
[0042] Hard constraints: It is mandatory that the SLA compliance rate of P0-level high-priority services must be 100%, and the single task response delay must not exceed 120% of the preset threshold. Pre-allocation schemes that do not meet this constraint are directly judged as invalid schemes.
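The hard-constraint check can be sketched as a simple predicate over a candidate pre-allocation scheme. The function and parameter names are hypothetical. The text does not state whether the 120% delay limit applies to P0 tasks only or to all tasks, so the caller decides which delays to pass in.

```python
def is_valid_scheme(p0_sla_compliance, task_delays_ms, delay_threshold_ms):
    """Return True only if a candidate scheme satisfies both hard constraints.

    p0_sla_compliance: fraction of P0 tasks meeting their SLA (1.0 = 100%).
    task_delays_ms: response delays of the individual tasks being checked.
    delay_threshold_ms: the preset delay threshold.
    """
    if p0_sla_compliance < 1.0:
        return False
    limit = 1.2 * delay_threshold_ms  # no task may exceed 120% of the threshold
    return all(delay <= limit for delay in task_delays_ms)
```

A scheme for which this predicate is false would be discarded as invalid before its objective value F is even compared with other candidates.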
[0043] Step S32 generates a global computing resource pre-allocation scheme using a deep reinforcement learning scheduling model based on the PPO (Proximal Policy Optimization) algorithm, with the scheduling problem formulated as a Markov decision process. The core configuration of the model is as follows:
State space S: a 256-dimensional feature vector containing six categories of features: cluster global computing power resource status, multi-granularity prediction results of business demand, business priority distribution, current cluster load distribution, node energy consumption data, and network status data.
Action space A: a continuous action space comprising three categories of actions: computing power resource quota allocation, mapping of business tasks to node groups, and resource redundancy threshold adjustment. The action output range is 0-1, corresponding to the resource quota allocation ratio and the threshold adjustment ratio.
Reward function R: R = F - λ*C, where F is the value of the multi-objective optimization function, C is the penalty term (C=1 when the SLA of P0-level business is not met, otherwise C=0), and λ is the penalty coefficient (fixed at 10). The model's optimization objective is to maximize the cumulative reward.
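The reward function is small enough to write out directly. This sketch covers only the per-step reward R = F - λ*C, not the PPO training loop; the function name is hypothetical.

```python
PENALTY_COEFFICIENT = 10.0  # lambda, fixed at 10 in Step S32


def reward(objective_value, p0_sla_met, penalty=PENALTY_COEFFICIENT):
    """R = F - lambda * C, where C = 1 if the P0 SLA is violated, else 0."""
    c = 0.0 if p0_sla_met else 1.0
    return objective_value - penalty * c
```

If F is confined to the 0-1 range, as it is when all four normalized objectives lie in 0-1 and the weights sum to 1, any P0 SLA violation yields a reward of at most -9, so the penalty dominates every gain available from the other objectives.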
[0044] Step S33: The model is trained through offline pre-training followed by online fine-tuning. Pre-training uses six months of historical data from the cluster until convergence. After going live, the model executes a global scheduling decision every 5 minutes and outputs the optimal pre-allocation scheme. Based on the scheme, computing resources are mapped and allocated to the dedicated node group corresponding to each business task (tasks of the same priority and business type are placed in the same node group). At the same time, a dynamic resource redundancy threshold is set for each node group: 20% by default for P0-level node groups, 15% for P1-level, and 10% for P2-level. These thresholds can be adjusted in real time according to the business prediction results.
[0045] Step S4: Distributed adaptive load balancing and targeted anomaly correction within node groups
Step S41 uses the node group as the smallest scheduling unit and adopts a decentralized scheduling strategy to achieve load balancing within the group. Each node group elects one master node through the Raft consensus protocol; the master is responsible for aggregating load data, issuing scheduling instructions, and identifying abnormal nodes within the group. When the master node fails, the remaining nodes in the group automatically elect a new master without intervention from the cluster management node, thus avoiding a single point of failure.
[0046] Step S42 uses distributed probes to monitor the load status of nodes within the group in real time and defines the comprehensive node load index L, calculated as follows:
L = 0.4 * CPU utilization + 0.3 * Memory usage + 0.2 * Accelerator card utilization + 0.1 * Normalized network throughput
The upper threshold for the comprehensive load index is 85%, and the lower threshold is 30%. When the L value of a node in the group exceeds the 85% upper threshold for three consecutive collections, load balancing scheduling is triggered: the master node selects the stateless tasks with the lowest priority and shortest runtime on that node first, generates a migration list, and migrates the tasks through container hot migration technology to idle nodes in the group whose L value is below 30%. The migration is uninterrupted and imperceptible to the business. After the migration is completed, the node's computing power profile and load status are updated in real time.
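The load index and the three-consecutive-collections trigger of Step S42 can be sketched as follows. All inputs are assumed to be fractions in the 0-1 range; the class and function names are hypothetical, and ordering candidate targets by ascending load is an assumption.

```python
from collections import deque

UPPER_THRESHOLD = 0.85   # trigger threshold from Step S42
LOWER_THRESHOLD = 0.30   # migration targets must be below this
CONSECUTIVE_SAMPLES = 3  # collections in a row above the upper threshold


def load_index(cpu, memory, accelerator, network):
    """Comprehensive load index L; every input is a fraction in [0, 1]."""
    return 0.4 * cpu + 0.3 * memory + 0.2 * accelerator + 0.1 * network


class OverloadDetector:
    """Tracks the most recent samples of one node."""

    def __init__(self):
        self._recent = deque(maxlen=CONSECUTIVE_SAMPLES)

    def observe(self, l_value):
        """Record a sample; return True when balancing should be triggered."""
        self._recent.append(l_value > UPPER_THRESHOLD)
        return len(self._recent) == CONSECUTIVE_SAMPLES and all(self._recent)


def migration_targets(group_loads):
    """Names of nodes with L below 30%, least loaded first (assumed order)."""
    idle = [name for name, l_value in group_loads.items() if l_value < LOWER_THRESHOLD]
    return sorted(idle, key=group_loads.get)
```

A single sample at or below 85% resets the streak, so short spikes do not trigger migration.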
[0047] Step S43 identifies three types of abnormally loaded nodes in real time (overloaded, idle, and faulty) and executes a targeted correction strategy for each type. The specific rules are as follows:
Overloaded nodes: a node whose L value exceeds 90% for 5 consecutive minutes. The correction strategy is to first migrate low-priority tasks within the group. If the load still does not decrease, the upper limit of the node's resource quota is temporarily raised, and at the same time a request is sent to the control node for supplementary computing power from the redundant pool.
Idle nodes: a node whose L value stays below 10% for 30 minutes. The correction strategy is to reclaim the node's idle resources and add them to the cluster's global resource pool. The global scheduling module then redistributes tasks to avoid resource idleness.
Faulty nodes: a node that experiences three consecutive heartbeat timeouts, raises a hardware failure alarm, or has a task execution success rate below 60% for 10 consecutive minutes. The correction strategy is to immediately isolate the node, prohibit new task scheduling on it, seamlessly switch all of its tasks to a backup node in the redundant pool with a failover time within 30 seconds, and notify maintenance personnel to investigate the fault.
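The three anomaly rules of Step S43 reduce to threshold checks over a node's recent history. In this sketch the record type, its field names, and the returned labels are hypothetical; checking the fault conditions before the load conditions is an assumption, since the text gives no precedence between the three types.

```python
from dataclasses import dataclass


@dataclass
class NodeHistory:
    """Hypothetical summary of one node's recent monitoring data."""

    minutes_l_above_90: float        # consecutive minutes with L > 90%
    minutes_l_below_10: float        # minutes with L < 10%
    heartbeat_timeouts: int          # consecutive heartbeat timeouts
    hardware_alarm: bool             # hardware failure alarm raised
    minutes_success_below_60: float  # consecutive minutes with success rate < 60%


def classify(node):
    """Return "fault", "overload", "idle", or "normal" for a node."""
    if (node.heartbeat_timeouts >= 3
            or node.hardware_alarm
            or node.minutes_success_below_60 >= 10):
        return "fault"      # isolate the node and fail over to the redundant pool
    if node.minutes_l_above_90 >= 5:
        return "overload"   # migrate low-priority tasks, then raise the quota
    if node.minutes_l_below_10 >= 30:
        return "idle"       # reclaim resources into the global pool
    return "normal"
```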
[0048] Step S5: Closed-loop feedback on end-to-end scheduling performance and iterative optimization of the models
Step S51 collects the full set of operational metrics for computing resource allocation and load balancing on a preset 12-hour cycle. Core metrics include: average utilization rate of computing resources, average response time of business tasks, overall SLA compliance rate, SLA compliance rate of P0-level business, energy consumption per unit of computing power in the cluster, cluster load balancing degree, and tenant resource allocation fairness index.
[0049] Step S52 uses the entropy weight method to weight each operational indicator and generate a comprehensive scheduling performance score (0-100 points). The process is as follows: first, each indicator is standardized to remove units; then the entropy value of each indicator is calculated. The smaller the entropy value, the greater the dispersion of the indicator, the more information it contains, and the higher its weight. Finally, the comprehensive score is obtained as a weighted sum using the weights derived from the entropy values. The higher the score, the better the scheduling performance.
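A minimal sketch of the entropy weight method as it is commonly defined, applied to a matrix whose rows are evaluation periods and whose columns are indicators already standardized to the 0-1 range with larger values being better. The function names, the handling of all-zero columns, and the scaling of the final score to 0-100 are assumptions.

```python
import math


def entropy_weights(matrix):
    """Weights per indicator (column) from at least two evaluation periods (rows)."""
    m, n = len(matrix), len(matrix[0])
    if m < 2:
        raise ValueError("at least two evaluation periods are required")
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0:
            divergences.append(0.0)  # assumption: an all-zero column carries no information
            continue
        entropy = 0.0
        for x in column:
            p = x / total
            if p > 0:
                entropy -= k * p * math.log(p)
        # Lower entropy means more dispersion, hence a larger weight.
        divergences.append(max(0.0, 1.0 - entropy))
    spread = sum(divergences)
    if spread == 0:
        return [1.0 / n] * n  # all indicators equally uninformative
    return [d / spread for d in divergences]


def composite_score(indicator_row, weights):
    """Weighted sum of standardized indicators, scaled to 0-100."""
    return 100.0 * sum(w * x for w, x in zip(weights, indicator_row))
```

An indicator that takes the same value in every period has maximal entropy and receives almost no weight, which matches the rule that smaller entropy implies higher weight.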
[0050] Step S53, based on the feedback data from the score and the evaluation report, performs the following closed-loop optimization operations:
For the temporal fusion Transformer model and the PPO deep reinforcement learning scheduling model, incremental iterative training is performed using the running data of the past 12 hours to adjust the model hyperparameters and network weights, thereby improving prediction accuracy and scheduling performance.
Based on node task execution performance data, the feature weight coefficients of the five dimensions of the dynamic computing power profile are updated to improve how well the profile matches the business.
Load balancing thresholds, task migration strategy parameters, and redundancy pool resource quotas are adjusted adaptively to complete the full-link closed-loop optimization, with the aim of steadily improving the overall scheduling performance score after each iteration.
[0051] III. Specific Embodiment of the Device for Dynamic Allocation and Load Balancing Optimization of Big Data Computing Resources
This device is deployed using a distributed microservice architecture, is compatible with x86 / ARM architecture servers, and can be deployed on physical machines, virtual machines, and container environments. It is divided into three layers: a management and control layer, a computing layer, and a data layer. The management and control layer is deployed on two highly available management and control nodes in the cluster (master-slave mode), the computing layer is deployed on all heterogeneous computing power nodes in the cluster, and the data layer uses the distributed time-series database InfluxDB and the relational database MySQL.
[0052] The core of the device includes a computing resource management module, a business demand analysis module, a global computing power scheduling module, a load balancing optimization module, and a closed-loop iterative optimization module. The modules exchange data and coordinate commands through RESTful APIs and gRPC interfaces. The specific implementation details of each module and its built-in units are as follows:
1. Computing resource management module
This module is deployed on the master node of the control layer and contains a resource acquisition unit, a resource abstraction and encapsulation unit, and a computing power profile construction unit, corresponding to all the functions of method step S1.
[0053] Resource acquisition unit: connects to the distributed collection agents of all computing nodes, receives the collected static attribute data and real-time running data, and completes data preprocessing, storage, and distribution. It supports on the order of 100,000 data writes per second and stores the data in the InfluxDB time-series database with a retention period of 6 months.
Resource abstraction and encapsulation unit: implements a standardized resource description protocol based on Kubernetes CRD, completes the unified abstraction and encapsulation of heterogeneous computing resources, generates standardized computing resource objects, and provides a unified interface for requesting, querying, and releasing computing resources, while remaining compatible with the resource request protocols of mainstream scheduling frameworks.
Computing power profile construction unit: constructs a five-dimensional dynamic computing power profile from the collected full-dimensional data. The profile feature data and weight coefficients are updated every hour. The generated computing power profile is stored in a MySQL database, providing core data support for global scheduling and load balancing.
[0054] 2. Business demand analysis module
This module is deployed on the master node of the control layer and contains a demand feature extraction unit, a time-series prediction unit, and a business hierarchy management unit, corresponding to all the functions of method step S2.
[0055] Demand feature extraction unit: connects to the task scheduling system of the big data platform, collects the full historical operational data and real-time task request data of the business, completes feature engineering, extracts and separates the periodic, burst, and trend features, and provides feature input for time-series prediction.
Time-series prediction unit: contains the improved temporal fusion Transformer model, uses TensorRT to accelerate model inference, and supports computing power demand prediction at multiple time granularities. The prediction results are synchronized to the global computing power scheduling module in real time, and incremental training of the model is performed once per hour.
Business hierarchy management unit: contains the business hierarchy matrix model and supports user-defined dimension weights and priority division rules. It automatically classifies the priority of business tasks, manages the dedicated computing power redundancy pool, monitors the pool's resource usage in real time, and ensures that its resources are used only for high-priority tasks.
[0056] 3. Global computing power scheduling module
This module is deployed on the primary and backup nodes of the management and control layer. When the primary node fails, it automatically switches to the backup node. It contains a multi-objective optimization modeling unit, a deep reinforcement learning scheduling unit, and a pre-allocation execution unit, corresponding to all the functions of method step S3.
[0057] Multi-objective optimization modeling unit: supports user-defined optimization objective weight coefficients and hard constraints, automatically constructs the multi-objective optimization function with hard constraints, and provides the optimization objective for the deep reinforcement learning scheduling model.
Deep reinforcement learning scheduling unit: contains the deep reinforcement learning scheduling model based on the PPO algorithm and uses GPUs to accelerate model inference. It executes a global scheduling decision every 5 minutes to generate the optimal computing power resource pre-allocation scheme and performs incremental model fine-tuning every 12 hours.
Pre-allocation execution unit: interfaces with the underlying container orchestration system and resource scheduling framework. According to the pre-allocation scheme, it allocates computing power resource quotas, creates node groups, maps tasks, sets the resource redundancy threshold of each node group, and monitors the execution status of the pre-allocation scheme in real time.
[0058] 4. Load balancing optimization module
This module is deployed in a distributed manner on the master node of each node group and contains a distributed load monitoring unit, an adaptive load balancing unit, and a targeted anomaly correction unit, corresponding to all the functions of method step S4.
[0059] Distributed load monitoring unit: through distributed probes within the group, collects the load data of each node in real time, calculates the comprehensive load index, reports the load data to the group's master node every 1 second, and identifies and raises alarms on abnormal load data in real time.
Adaptive load balancing unit: based on the decentralized scheduling strategy, performs dynamic task migration and secondary resource scheduling within the group. It uses container hot migration technology so that load balancing is imperceptible to the business, avoiding the cluster jitter caused by large-scale task migration across node groups.
Targeted anomaly correction unit: contains an anomaly classification and recognition model and automatically executes the corresponding targeted correction strategy for the three types of abnormal nodes: overloaded, idle, and faulty. The switching time for a faulty node is kept within 30 seconds to ensure high service availability.
[0060] 5. Closed-loop iterative optimization module
This module is deployed on the master node of the control layer and contains a scheduling effect evaluation unit and a model iteration optimization unit, corresponding to all the functions of method step S5.
[0061] Scheduling effect evaluation unit: performs a full scheduling effect evaluation every 12 hours, calculates the comprehensive score using the entropy weight method, generates a standardized evaluation report, and supports visualization of the trend and optimization effect of each indicator.
Model iteration optimization unit: based on the feedback data from the evaluation report, automatically performs incremental iterative training of the time-series prediction model and the deep reinforcement learning scheduling model, while updating the feature weights of the computing power profile and the scheduling strategy parameters to complete the closed-loop optimization. The training process runs offline and does not affect the normal operation of online business.
[0062] IV. Complete Implementation Case for a Typical Application Scenario
This embodiment is applied to the big data cluster of a leading e-commerce platform in China. The cluster contains 1,200 heterogeneous computing nodes (800 CPU nodes, 200 GPU training nodes, 150 NPU inference nodes, and 50 FPGA acceleration nodes), with a peak total computing power of 200 PFLOPS. It carries services such as real-time transaction calculation, user behavior analysis, AI recommendation model training and inference, and offline batch processing of orders. During peak promotion periods, the peak computing power demand is 6 times that of normal days. Traditional scheduling methods suffer from low resource utilization, high business response latency, and an unstable SLA compliance rate.
[0063] The complete execution process and effects after using the apparatus and method of the present invention are as follows:
Cluster access phase: Distributed acquisition agents are deployed on all computing power nodes; the unified abstraction of heterogeneous computing power resources and the construction of the initial computing power profiles are completed; compatibility with the existing Kubernetes and Flink big data platforms is ensured; and the device modules are deployed and jointly debugged.
Daily operation phase: The time-series prediction model is pre-trained on one year of historical data and generates daily multi-granularity computing power demand prediction results. Business priority classification is completed, and a redundancy pool equal to 8% of the total cluster computing power is configured for the P0-level transaction and payment businesses. The global scheduling module generates a pre-allocation scheme and divides exclusive node groups for different businesses, and the load balancing module performs real-time scheduling and anomaly correction within each group to ensure stable cluster operation.
Pre-sale phase: 7 days before the promotion, hourly computing power demand forecasts for the promotion period are generated from historical promotion data and pre-sale traffic. The global scheduling module adjusts the pre-allocation scheme in advance, expands the node group for the P0-level transaction business, temporarily raises the redundancy pool quota to 15%, and completes computing power pre-allocation and pre-warming ahead of time to avoid scheduling jitter during the promotion.
Promotion peak phase: Traffic peaks at 0:00 on the day of the promotion. The load balancing module monitors node load in real time, migrates low-priority tasks off overloaded nodes, and seamlessly switches over from faulty nodes. Redundancy pool computing power is supplied to the P0 business node group in real time, ultimately achieving a 100% SLA compliance rate for the transaction business and keeping response latency within 200 ms.
Closed-loop optimization phase: 12 hours after the end of the promotion, the full-cycle scheduling effect evaluation is completed, the prediction model and the scheduling model are incrementally optimized based on the running data, and the weight coefficients of the computing power profile are updated, completing the full-link closed-loop optimization.
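The redundancy pool quotas in this embodiment can be turned into capacity figures with simple arithmetic. The sketch below assumes the quota is applied to the stated 200 PFLOPS peak figure; the embodiment does not say which capacity figure the percentage refers to, and the names `NODE_COUNTS` and `pool_capacity_pflops` are hypothetical.

```python
# Illustrative arithmetic only; names are hypothetical and the choice of the
# 200 PFLOPS peak figure as the quota base is an assumption.
NODE_COUNTS = {"cpu": 800, "gpu": 200, "npu": 150, "fpga": 50}
PEAK_TOTAL_PFLOPS = 200.0  # peak total computing power stated in [0062]


def pool_capacity_pflops(quota: float, total_pflops: float = PEAK_TOTAL_PFLOPS) -> float:
    """Capacity reserved for the P0 redundancy pool at a given quota (0..1)."""
    if not 0.0 <= quota <= 1.0:
        raise ValueError("quota must be a fraction between 0 and 1")
    return quota * total_pflops


daily_pool = pool_capacity_pflops(0.08)  # daily operation quota (8%)
promo_pool = pool_capacity_pflops(0.15)  # temporary pre-sale quota (15%)
total_nodes = sum(NODE_COUNTS.values())  # should match the stated 1,200 nodes
```

Under that assumption the daily pool corresponds to about 16 PFLOPS and the pre-sale pool to about 30 PFLOPS.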
[0064] The above embodiments are merely preferred embodiments of the present invention and should not be construed as limiting the scope of protection of the present invention. Any non-substantial changes and substitutions made by those skilled in the art based on the present invention shall fall within the scope of protection claimed by the present invention.
Claims
1. A method for dynamic allocation and load balancing optimization of big data computing resources, characterized in that it includes the following steps:
S1. Unified abstraction of heterogeneous computing power resources and construction of a multi-dimensional dynamic computing power profile: Through distributed collection agents, connect to the heterogeneous computing power nodes in the big data cluster, synchronously collect static attribute data and millisecond-level real-time running data of each computing power node, and use a standardized resource description protocol to uniformly abstract and encapsulate heterogeneous computing power resources with different hardware architectures so as to shield the differences in underlying hardware. At the same time, based on the collected full-dimensional data, construct a dynamic computing power profile including five dimensions: computing power performance, resource load, energy efficiency, network transmission, and operation stability, and update the weight coefficient of each dimension feature in real time based on the node's historical running data and business adaptability.
S2. Multi-feature fusion time-series prediction and hierarchical priority management of business computing power demand: Collect the full historical operation data and real-time task request data of the big data business, extract the periodic, bursty, and trend characteristics of business computing power demand, generate business computing power demand prediction results at multiple time granularities based on the improved time-series fusion Transformer model, and build a business hierarchical matrix model with business type, SLA level, and task urgency as core dimensions. Generate priority scores for business tasks through weighted calculation, divide priority levels according to the scores, and configure dedicated computing power resource redundancy pools for high-priority tasks.
S3. Multi-objective optimization of global computing power pre-allocation with hard constraints: Taking the maximization of computing power resource utilization, the minimization of business response latency, the minimization of energy consumption per unit of computing power in the cluster, and the fairness of tenant resource allocation as optimization objectives, construct a multi-objective optimization function with hard constraints on the SLA of high-priority business by combining the dynamic computing power profile, the computing power demand prediction results, and the business priorities. Through a deep reinforcement learning scheduling model based on a Markov decision process, generate a global computing power resource pre-allocation scheme, and map and allocate the corresponding computing power resources to the dedicated node group corresponding to each business task. At the same time, set a dynamically adjustable resource redundancy threshold for each node group.
S4. Distributed adaptive load balancing and precise anomaly correction within node groups: Using the node group as the smallest scheduling unit, monitor the load status of each computing node in the group in real time through distributed probes. When a node's load index exceeds the preset upper limit threshold, trigger the dynamic migration of low-priority tasks and secondary resource scheduling within the group. At the same time, identify three types of load anomaly nodes, namely overloaded, idle, and faulty nodes, in real time, and generate targeted correction strategies for the different anomaly types to complete the real-time optimization of node-level load balancing.
S5. Closed-loop feedback and iterative optimization of the end-to-end scheduling effect: Collect all operational indicators of computing power resource allocation and load balancing according to a preset period, perform weighted calculation of each operational indicator based on the entropy weight method, generate a comprehensive score and evaluation report of the scheduling effect, and perform incremental iterative optimization of the time-series prediction model and the deep reinforcement learning scheduling model based on the feedback data of the evaluation report. At the same time, update the feature weights of the dynamic computing power profile to complete the end-to-end closed-loop optimization of dynamic allocation and load balancing.
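The entropy weight method named in step S5 can be sketched as follows. This is a minimal illustration of the standard textbook formulation, not the patented implementation: it assumes each operational indicator has already been scaled to [0, 1] with larger values meaning better (cost-type indicators such as latency or energy would need to be inverted first), and the function names are hypothetical.

```python
import math


def entropy_weights(rows):
    """Entropy weight method.

    rows: one list per evaluation period, each holding the same indicators,
    already scaled to [0, 1] with larger meaning better (an assumption).
    Returns one weight per indicator; the weights sum to 1.
    """
    m, n = len(rows), len(rows[0])
    if m < 2:
        raise ValueError("need at least two evaluation periods")
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(n):
        column = [row[j] for row in rows]
        total = sum(column)
        entropy = 1.0  # an all-zero column carries no information
        if total > 0:
            entropy = 0.0
            for value in column:
                p = value / total
                if p > 0:
                    entropy -= k * p * math.log(p)
        # Clamp tiny negative values caused by floating-point rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    spread = sum(divergences)
    if spread <= 1e-12:
        return [1.0 / n] * n  # no indicator discriminates: fall back to equal weights
    return [d / spread for d in divergences]


def comprehensive_score(row, weights):
    """Weighted sum of one period's indicators."""
    return sum(w * x for w, x in zip(weights, row))
```

An indicator that barely varies across periods receives a weight near zero, while an indicator that varies strongly dominates the comprehensive score; that is the property that makes the method usable without hand-tuned weights.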
2. The method for dynamic allocation and load balancing optimization of big data computing resources according to claim 1, characterized in that, in step S1, the static attribute data includes the hardware architecture of the computing node, the number of computing cores, memory capacity, storage bandwidth, network bandwidth, and peak computing performance; the real-time running data includes CPU utilization, memory usage, GPU/accelerator card utilization, video memory usage, disk I/O usage, network throughput, task queue length, real-time node power consumption, and task execution success rate. Each dimension of the dynamic computing power profile is assigned a corresponding quantitative evaluation index. The update rule of the weight coefficients is: based on the task execution success rate, response latency compliance rate, and energy consumption efficiency of the node under the corresponding business type, the weight ratio of each dimension is adjusted in real time through a sliding window weighted average algorithm.
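One possible reading of the sliding window weighted average update is sketched below. The claim does not specify how the three node indicators are mapped onto the five profile dimensions, so the sketch assumes that a per-dimension evidence score in [0, 1] has already been derived for each observation; the linear recency weighting, the window size, and all names are likewise assumptions.

```python
from collections import deque

DIMENSIONS = ("performance", "load", "energy", "network", "stability")


class ProfileWeightUpdater:
    """Keeps the last `window` observations and returns normalized weights."""

    def __init__(self, window: int = 5):
        self.history = deque(maxlen=window)  # oldest observation is dropped first

    def update(self, evidence: dict) -> dict:
        # evidence: hypothetical per-dimension score in [0, 1] for one observation
        self.history.append([float(evidence[d]) for d in DIMENSIONS])
        # Linearly increasing recency weights: oldest = 1, newest = len(history).
        recency = range(1, len(self.history) + 1)
        norm = sum(recency)
        averaged = [
            sum(r * sample[i] for r, sample in zip(recency, self.history)) / norm
            for i in range(len(DIMENSIONS))
        ]
        total = sum(averaged)
        if total <= 0:
            return {d: 1.0 / len(DIMENSIONS) for d in DIMENSIONS}
        return {d: a / total for d, a in zip(DIMENSIONS, averaged)}
```

Because the weights are renormalized on every update, they always sum to 1, and a dimension whose recent evidence improves gains weight at the expense of the others.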
3. The method for dynamic allocation and load balancing optimization of big data computing resources according to claim 1, characterized in that, in step S2, the improved time-series fusion Transformer model introduces a multi-branch attention mechanism to extract the periodic, bursty, and trend characteristics of business computing power demand and fuse them by weighting. At the same time, it learns online from real-time task request data through an incremental sliding window mechanism to generate computing power demand prediction results at multiple time granularities, such as minute, hour, and day. In the business hierarchical matrix model, the highest weight coefficient is set for the SLA level. The resource quota of the dedicated computing power resource redundancy pool corresponding to high-priority tasks is not less than 5% of the total computing power of the cluster, and the pool may only be used to supplement the bursty computing power demand of high-priority tasks and to take over the tasks of faulty nodes.
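The weighted priority scoring of the business hierarchical matrix model can be illustrated with a short sketch. The weight values, the [0, 1] scoring scale, the level thresholds, and the level names other than P0 are assumptions chosen for illustration; only the three dimensions, the rule that the SLA level carries the highest weight, and the 5% minimum pool quota come from the text.

```python
# Hypothetical weights; the text only requires that the SLA level weighs most.
WEIGHTS = {"business_type": 0.3, "sla_level": 0.5, "urgency": 0.2}
assert WEIGHTS["sla_level"] == max(WEIGHTS.values())

MIN_POOL_QUOTA = 0.05  # redundancy pool is not less than 5% of cluster capacity


def priority_score(task: dict) -> float:
    """Weighted sum of the three dimension scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * task[name] for name in WEIGHTS)


def priority_level(score: float) -> str:
    """Map a score to one of three levels; the thresholds are assumptions."""
    if score >= 0.75:
        return "P0"
    if score >= 0.45:
        return "P1"
    return "P2"


def validate_pool_quota(quota: float) -> float:
    """Reject a redundancy pool quota below the 5% floor."""
    if quota < MIN_POOL_QUOTA:
        raise ValueError("redundancy pool quota must be at least 5%")
    return quota
```

For example, a payment task scored `{"business_type": 1.0, "sla_level": 1.0, "urgency": 0.8}` lands in the top level, while an offline batch job scored `{"business_type": 0.2, "sla_level": 0.3, "urgency": 0.1}` lands in the lowest.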
4. The method for dynamic allocation and load balancing optimization of big data computing resources according to claim 1, characterized in that, in step S3, the multi-objective optimization function first converts each optimization objective into an index on a common 0-1 scale through min-max normalization, sets a configurable weight coefficient for each optimization objective, and adds a hard constraint of a 100% SLA compliance rate for high-priority business. The state space of the deep reinforcement learning scheduling model includes the global computing power resource status of the cluster, the business demand prediction results, the business priority distribution, the current cluster load distribution, and the node energy consumption data. The action space includes computing power resource quota allocation actions, actions mapping business tasks to node groups, and resource redundancy threshold adjustment actions. The reward function is the weighted result of the multi-objective optimization function, and a penalty term is applied when the SLA of high-priority business fails to meet the standard.
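The normalization and reward construction described in this claim can be sketched as below. The objective bounds, weights, and penalty magnitude are placeholder assumptions, and the sketch enforces the SLA requirement only through the penalty term; how the hard constraint is otherwise enforced during scheduling is not shown.

```python
def min_max(value: float, lo: float, hi: float) -> float:
    """Min-max normalization to [0, 1], clipped at the bounds."""
    if hi <= lo:
        raise ValueError("upper bound must exceed lower bound")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


# name: (weight, lower bound, upper bound, larger_is_better); all values assumed
OBJECTIVES = {
    "utilization": (0.3, 0.0, 1.0, True),
    "latency_ms": (0.3, 0.0, 500.0, False),
    "energy_per_pflops": (0.2, 0.0, 10.0, False),
    "fairness": (0.2, 0.0, 1.0, True),
}
SLA_PENALTY = 10.0  # assumed to be large relative to the weighted sum (at most 1)


def reward(metrics: dict, p0_sla_compliance: float) -> float:
    """Weighted multi-objective reward with a penalty for any P0 SLA miss."""
    total = 0.0
    for name, (weight, lo, hi, larger_is_better) in OBJECTIVES.items():
        scaled = min_max(metrics[name], lo, hi)
        # Cost-type objectives are inverted so that larger is always better.
        total += weight * (scaled if larger_is_better else 1.0 - scaled)
    if p0_sla_compliance < 1.0:
        total -= SLA_PENALTY
    return total
```

Choosing a penalty much larger than the maximum achievable weighted sum means that no combination of utilization, latency, energy, and fairness gains can compensate for a high-priority SLA miss, which approximates the hard constraint inside a reward signal.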
5. The method for dynamic allocation and load balancing optimization of big data computing resources according to claim 1, characterized in that, in step S4, the distributed adaptive load balancing within the node group adopts a decentralized scheduling strategy. Each node group elects a master node that is responsible for aggregating the load data within the group and issuing scheduling instructions. When the comprehensive load index of a node within the group exceeds the preset upper limit threshold, the master node preferentially migrates the stateless task with the lowest priority and shortest runtime on that node to an idle node within the group whose comprehensive load is lower than the preset lower limit threshold. The normal execution of business tasks is not interrupted during migration. The targeted correction strategies are as follows: for overloaded nodes, perform task migration and a temporary resource quota increase; for idle nodes, perform idle resource reclamation and unified task reallocation in the cluster; for faulty nodes, perform real-time task isolation and seamless switching to backup nodes in the redundancy pool.
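The migration rule in this claim (pick the stateless task with the lowest priority and shortest runtime, and move it to a node below the lower threshold) can be sketched as follows. The data structures, the threshold values, the convention that a larger `priority` number means lower priority, and the choice of the least-loaded eligible target are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

UPPER, LOWER = 0.85, 0.30  # assumed upper and lower load thresholds


@dataclass
class Task:
    name: str
    priority: int      # assumed convention: 0 = P0 (highest); larger = lower priority
    runtime_s: float   # how long the task has been running
    stateless: bool


@dataclass
class Node:
    name: str
    load: float        # comprehensive load index in [0, 1]
    tasks: List[Task] = field(default_factory=list)


def plan_migration(source: Node, group: List[Node]) -> Optional[Tuple[Task, Node]]:
    """Return (task, target) for an overloaded node, or None if nothing applies."""
    if source.load <= UPPER:
        return None
    candidates = [t for t in source.tasks if t.stateless]
    targets = [n for n in group if n is not source and n.load < LOWER]
    if not candidates or not targets:
        return None
    # Lowest priority first, then shortest runtime as the tie-breaker.
    task = min(candidates, key=lambda t: (-t.priority, t.runtime_s))
    # Among eligible idle nodes, the least-loaded one is chosen (an assumption).
    target = min(targets, key=lambda n: n.load)
    return task, target
```

Restricting candidates to stateless tasks is what allows migration without interrupting the business: the task can be restarted on the target without transferring in-memory state.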
6. A device for dynamic allocation and load balancing optimization of big data computing resources, characterized in that it includes a computing power resource management module, a business demand analysis module, a global computing power scheduling module, a load balancing optimization module, and a closed-loop iterative optimization module;
The computing power resource management module is used to connect to heterogeneous computing power nodes in the big data cluster through distributed collection agents, synchronously collect static attribute data and millisecond-level real-time running data of each computing power node, and uniformly abstract and encapsulate heterogeneous computing power resources with different hardware architectures through a standardized resource description protocol so as to shield the underlying hardware differences. At the same time, based on the collected full-dimensional data, it constructs a dynamic computing power profile including five dimensions: computing power performance, resource load, energy efficiency, network transmission, and operation stability, and updates the weight coefficient of each dimension feature in real time based on the node's historical running data and business adaptability.
The business demand analysis module is used to collect the full historical operation data and real-time task request data of the big data business, extract the periodic, bursty, and trend characteristics of business computing power demand, generate business computing power demand prediction results at multiple time granularities based on the improved time-series fusion Transformer model, and construct a business hierarchical matrix model with business type, SLA level, and task urgency as core dimensions. The module generates priority scores for business tasks through weighted calculation, divides priority levels according to the scores, and configures a dedicated computing power resource redundancy pool for high-priority tasks.
The global computing power scheduling module is used to take the maximization of computing power resource utilization, the minimization of business response latency, the minimization of energy consumption per unit of computing power in the cluster, and the fairness of tenant resource allocation as optimization objectives, and to construct a multi-objective optimization function with hard constraints on the SLA of high-priority business by combining the dynamic computing power profile, the computing power demand prediction results, and the business priorities. Through a deep reinforcement learning scheduling model based on a Markov decision process, it generates a global computing power resource pre-allocation scheme, maps and allocates the corresponding computing power resources to the dedicated node group corresponding to each business task, and sets a dynamically adjustable resource redundancy threshold for each node group.
The load balancing optimization module is used to monitor the load status of each computing node in the group in real time through distributed probes, with the node group as the smallest scheduling unit. When a node's load index exceeds the preset upper limit threshold, it triggers the dynamic migration of low-priority tasks and secondary resource scheduling within the group. At the same time, it identifies three types of load anomaly nodes, namely overloaded, idle, and faulty nodes, in real time, and generates targeted correction strategies for the different anomaly types to complete the real-time optimization of node-level load balancing.
The closed-loop iterative optimization module is used to collect all operational indicators of computing power resource allocation and load balancing according to a preset period, perform weighted calculation of each operational indicator based on the entropy weight method, generate a comprehensive score and evaluation report of the scheduling effect, and perform incremental iterative optimization of the time-series prediction model and the deep reinforcement learning scheduling model based on the feedback data of the evaluation report, while updating the feature weights of the dynamic computing power profile to complete the full-link closed-loop optimization of dynamic allocation and load balancing.
7. The device for dynamic allocation and load balancing optimization of big data computing resources according to claim 6, characterized in that the computing power resource management module includes a resource acquisition unit, a resource abstraction and encapsulation unit, and a computing power profile construction unit; The resource acquisition unit is used to synchronously acquire the static attribute data and millisecond-level real-time running data of the nodes through distributed acquisition agents deployed on each computing power node, and to upload the acquired data to the cluster management node after data cleaning and normalization preprocessing. The resource abstraction and encapsulation unit is used to uniformly abstract heterogeneous computing resources of different architectures, such as CPU, GPU, NPU, and FPGA, through a standardized resource description protocol, generate standardized computing resource call interfaces and quantified computing power values, and shield the scheduling differences of the underlying hardware. The computing power profile construction unit is used to construct a dynamic computing power profile in five dimensions, set a corresponding quantitative evaluation index for each dimension, and adjust the weight coefficient of each dimension feature in real time, through a sliding window weighted average algorithm, based on the node's task execution success rate, response latency compliance rate, and energy consumption efficiency under the corresponding business type.
8. The device for dynamic allocation and load balancing optimization of big data computing resources according to claim 6, characterized in that the business demand analysis module includes a demand feature extraction unit, a time-series prediction unit, and a business hierarchical management and control unit. The demand feature extraction unit is used to perform feature engineering on the full historical operation data and real-time task request data of the business, and to extract and separate the periodic, bursty, and trend features of the business computing power demand. The time-series prediction unit has the improved time-series fusion Transformer model built in, which is used to perform weighted fusion of the separated multi-dimensional demand features through a multi-branch attention mechanism and, at the same time, to learn online from real-time task request data through an incremental sliding window mechanism, generating computing power demand prediction results at multiple time granularities such as minute, hour, and day. The business hierarchical management and control unit has the business hierarchical matrix model built in, which is used to set weight coefficients for the three dimensions of business type, SLA level, and task urgency. It generates a priority score for each business task through weighted calculation and divides the tasks into at least three priority levels according to the score from high to low. At the same time, it configures and manages the dedicated computing power resource redundancy pool corresponding to high-priority tasks. The resource quota of the redundancy pool is not less than 5% of the total computing power of the cluster.
9. The device for dynamic allocation and load balancing optimization of big data computing resources according to claim 6, characterized in that the global computing power scheduling module includes a multi-objective optimization modeling unit, a deep reinforcement learning scheduling unit, and a pre-allocation execution unit; The multi-objective optimization modeling unit is used to construct a multi-objective optimization function with hard constraints. It first converts each optimization objective into an index on a common 0-1 scale through min-max normalization, sets a configurable weight coefficient for each optimization objective, and adds a hard constraint of a 100% SLA compliance rate for high-priority business. The deep reinforcement learning scheduling unit incorporates a deep reinforcement learning scheduling model based on a Markov decision process. It uses the cluster's global computing power resource status, the business demand prediction results, the business priority distribution, the current cluster load distribution, and the node energy consumption data as the state space; computing power resource quota allocation actions, actions mapping business tasks to node groups, and resource redundancy threshold adjustment actions as the action space; and the weighted result of the multi-objective optimization function as the reward function. When the SLA of high-priority business fails to meet the standard, a penalty term is applied, and the optimal global computing power resource pre-allocation scheme is output. The pre-allocation execution unit is used to map computing resources to the node groups of the business tasks according to the generated global computing power resource pre-allocation scheme, and to set a dynamically adjustable resource redundancy threshold for each node group to cope with sudden fluctuations in business demand.
10. The device for dynamic allocation and load balancing optimization of big data computing resources according to claim 6, characterized in that the load balancing optimization module includes a distributed load monitoring unit, an adaptive load balancing unit, and a targeted anomaly correction unit; the closed-loop iterative optimization module includes a scheduling effect evaluation unit and a model iterative optimization unit.
The distributed load monitoring unit is used to collect load data of each computing node in the group in real time through distributed probes deployed in each node group, calculate the comprehensive load index of each node, and identify and report abnormal load data in real time.
The adaptive load balancing unit is used to adopt a decentralized scheduling strategy in which a master node is elected within the node group to be responsible for scheduling and management within the group. When the comprehensive load index of a node in the group exceeds the preset upper limit threshold, the stateless task with the lowest priority and shortest runtime on that node is migrated to an idle node in the group whose comprehensive load is lower than the preset lower limit threshold, so that load balancing optimization is transparent to the business.
The targeted anomaly correction unit is used to classify and identify three types of load anomaly nodes: overloaded, idle, and faulty. For overloaded nodes, it performs task migration and a temporary resource quota increase; for idle nodes, it performs idle resource reclamation and unified task reallocation in the cluster; and for faulty nodes, it performs real-time task isolation and seamless switching to backup nodes in the redundancy pool.
The scheduling effect evaluation unit is used to collect, according to a preset period, the average utilization rate of computing resources, the average response time of business tasks, the SLA compliance rate, the energy consumption per unit of computing power of the cluster, the load balance degree, and the tenant resource allocation fairness index. It performs weighted calculation on each operational indicator based on the entropy weight method to generate a comprehensive scheduling effect score and evaluation report.
The model iterative optimization unit is used to perform incremental iterative optimization of the hyperparameters and network structure of the time-series prediction model and the deep reinforcement learning scheduling model based on the feedback data of the evaluation report, and at the same time to update the feature weights of the dynamic computing power profile to complete the closed-loop optimization of the scheduling strategy.
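The comprehensive load index and the three-way anomaly classification used by the load monitoring and correction units can be sketched as follows. The claim does not give the formula for the comprehensive load index, so the weighted sum, the weights, the thresholds, and the use of a heartbeat flag to detect faults are all assumptions; only the three anomaly types and their correction strategies come from the text.

```python
# Assumed weights for combining per-resource utilizations (each in [0, 1]).
LOAD_WEIGHTS = {"cpu": 0.35, "memory": 0.25, "accelerator": 0.2, "disk_io": 0.1, "network": 0.1}
UPPER, LOWER = 0.85, 0.15  # assumed overload and idle thresholds

CORRECTIONS = {
    "overload": ("task migration", "temporary resource quota increase"),
    "idle": ("idle resource reclamation", "unified task reallocation in the cluster"),
    "fault": ("real-time task isolation", "switch to a backup node in the redundancy pool"),
    "normal": (),
}


def composite_load(sample: dict) -> float:
    """Weighted sum of per-resource utilizations (a hypothetical formula)."""
    return sum(weight * sample[name] for name, weight in LOAD_WEIGHTS.items())


def classify(sample: dict, heartbeat_ok: bool) -> str:
    """Classify a node as fault, overload, idle, or normal."""
    if not heartbeat_ok:  # assumed fault signal: the probe lost contact with the node
        return "fault"
    load = composite_load(sample)
    if load > UPPER:
        return "overload"
    if load < LOWER:
        return "idle"
    return "normal"


def correction_for(sample: dict, heartbeat_ok: bool) -> tuple:
    """Look up the correction strategy for a node's current state."""
    return CORRECTIONS[classify(sample, heartbeat_ok)]
```

Checking the fault condition before the load thresholds matters: a failed node may report stale or zero load, and would otherwise be misclassified as idle and scheduled for more work.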