An intelligent resource allocation method and system for a railway cloud platform
Using a three-layer LSTM prediction model and a dual DQN decision model, the method predicts resource demand on the railway cloud platform and schedules resources dynamically. It addresses resource waste and system instability under high load, and improves resource utilization and business service quality.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA RAILWAY DESIGN GRP CO LTD
- Filing Date
- 2026-04-08
- Publication Date
- 2026-06-12
AI Technical Summary
Under high load conditions, railway cloud platforms struggle to achieve reasonable coordination and efficient scheduling among multiple services, leading to resource waste and system stability issues.
A three-layer LSTM prediction model and a dual DQN decision model are adopted to predict future demand through resource feature sequences, establish a comprehensive system state vector, optimize resource allocation decisions, and perform dynamic scheduling using online Q-networks and target Q-networks.
This improved the resource utilization rate of the railway cloud platform, reduced resource waste and scheduling conflict risks, and ensured the service quality and system stability of high-priority services.
Figure CN121979692B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of cloud computing resource management and intelligent scheduling technology, and in particular to an intelligent resource allocation method and system for railway cloud platforms.
Background Technology
[0002] With the continuous development of cloud computing and virtualization technologies, cloud platforms have become an important infrastructure supporting the operation of large-scale business systems. They can provide on-demand resource support for various railway businesses, thereby improving resource utilization efficiency and reducing system operating costs.
[0003] Currently, the railway cloud platform serves as a core information infrastructure, centrally supporting multiple critical business systems such as transportation scheduling, passenger ticketing, passenger services, video surveillance, and equipment maintenance. These businesses are characterized by high concurrency, stringent real-time requirements, and significant load fluctuations. Some of them place stringent requirements on system response time and continuous operation, which directly affect the safety and service quality of railway transportation. Furthermore, railway operations combine clear temporal regularities with unpredictable events; for example, peak passenger periods, concentrated holiday travel, and emergency response scenarios can all significantly impact cloud platform resources within a short period. In these complex business scenarios, the cloud platform must not only keep critical businesses running stably under high load but also achieve reasonable coordination and efficient scheduling among multiple businesses under resource constraints.
Summary of the Invention
[0004] In view of this, the purpose of this invention is to provide an intelligent resource allocation method and system for railway cloud platforms, which can improve the overall resource utilization rate of railway cloud platforms.
[0005] In a first aspect, embodiments of the present invention provide an intelligent resource allocation method for a railway cloud platform, comprising: acquiring resource indicator data, preprocessing the resource indicator data, and obtaining a resource feature sequence;
[0006] A three-layer LSTM prediction model is established, and the model is optimized and trained using resource feature sequences.
[0007] Define the resource feature sequence at the current time based on the resource feature sequence, and use the resource feature sequence at the current time in combination with a three-layer LSTM prediction model to perform forward calculation to obtain the resource feature sequence at the future time.
[0008] Define the historical time resource feature sequence based on the resource feature sequence, and establish an integrated system state vector based on the historical time resource feature sequence, the current time resource feature sequence, and the future time resource feature sequence.
[0009] A dual DQN decision model consisting of an online Q-network and a target Q-network is established. The integrated system state vector at time t is defined from the integrated system state vector, and the resource index data at time t+1 is obtained. An empirical data quadruple is established from the integrated system state vector at time t and the resource index data at time t+1, and the dual DQN decision model is optimized and trained using the empirical data quadruple.
[0010] The integrated system state vector at time t is input into the dual DQN decision model for forward computation, and the optimal resource allocation decision is output.
[0011] Perform decision verification on the optimal resource allocation decision.
[0012] Furthermore, optimizing the training of the three-layer LSTM prediction model using resource feature sequences includes:
[0013] Denote the resource feature sequence by $X$. $X$ is input into the three-layer LSTM prediction model, which outputs a sample resource demand prediction result covering 6 future time steps. The result is compared with the actual value of the sample resource demand, and the prediction error is calculated.
[0014] The prediction error is quantified by the first mean squared error loss function, and the network parameters of the three-layer LSTM prediction model are optimized by the backpropagation algorithm. After multiple rounds of iterative updates until the first mean squared error loss function converges, the three-layer LSTM prediction model optimization training is completed.
[0015] Furthermore, during the optimization training of the three-layer LSTM prediction model, a Dropout mechanism with a dropout rate of 0.1 is introduced to improve the generalization ability of the three-layer LSTM prediction model.
[0016] Furthermore, defining the resource feature sequence at the current time based on the resource feature sequence, and performing forward computation with the three-layer LSTM prediction model to obtain the resource feature sequence at the future time, includes:
[0017] Based on the resource feature sequence, the resource feature sequence at the current time is denoted $X_t$. $X_t$ is input into the three-layer LSTM prediction model for forward computation, and the output is the resource demand prediction result for the 6 time steps following the current time. The last of these time steps is selected as the predicted value one hour after $X_t$. This predicted value, denoted $\hat{X}_{t+1}$, is the resource feature sequence at the future time.
[0018] Furthermore, defining the historical time resource feature sequence based on the resource feature sequence, and establishing the integrated system state vector from the historical time, current time, and future time resource feature sequences, includes:
[0019] Based on the resource feature sequence, the historical time resource feature sequence is denoted $X_{t-1}$, the current time resource feature sequence $X_t$, and the future time resource feature sequence $\hat{X}_{t+1}$. Concatenating the three along the feature dimension gives an integrated system state vector $s_t$ of dimension 3m, expressed as:
[0020] $s_t = [X_{t-1}, X_t, \hat{X}_{t+1}] \in \mathbb{R}^{3m}$.
[0021] Furthermore, defining the integrated system state vector at time t based on the integrated system state vector, obtaining the resource index data at time t+1, establishing an empirical data quadruple from the integrated system state vector at time t and the resource index data at time t+1, and optimizing and training the dual DQN decision model using the empirical data quadruple, includes:
[0022] The integrated system state vector at time t is defined from the integrated system state vector and denoted $s_t$. $s_t$ is input into the dual DQN decision model to obtain the resource scheduling action at time t, $a_t$, and the reward function at time t, $r_t$.
[0023] The resource index data at time t+1 is acquired and preprocessed to obtain the resource feature sequence at time t+1. The resource feature sequence at time t+1 is then used with the three-layer LSTM prediction model for forward computation to obtain the predicted resource feature sequence at time t+2. From the resource feature sequence at time t, the resource feature sequence at time t+1, and the predicted resource feature sequence at time t+2, the integrated system state vector at time t+1, $s_{t+1}$, is established.
[0024] From $s_t$, $s_{t+1}$, the resource scheduling action $a_t$, and the immediate reward $r_t$, the empirical data quadruple $(s_t, a_t, r_t, s_{t+1})$ is established and stored in the empirical replay pool. A portion of the stored quadruples is randomly sampled from the pool as training data, and the online Q-network in the dual DQN decision model is iteratively updated on that training data.
[0025] $s_t$ is input into the online Q-network, which outputs the resource scheduling action at time t, $a_t$, and the first predicted Q value; $s_{t+1}$ is input into the online Q-network, which outputs the resource scheduling action at time t+1, $a_{t+1}$, and the second predicted Q value.
[0026] The online Q-network selects the resource scheduling action at time t+1 with the largest second predicted Q value, $a^{*}_{t+1}$. The target Q-network evaluates the value of $a^{*}_{t+1}$, and the result is summed with the reward function at time t, $r_t$, to obtain the target Q value.
[0027] The error between the first predicted Q value and the target Q value is minimized, and the network parameters of the online Q network are updated through the backpropagation algorithm. Then, the network parameters of the online Q network are synchronized to the target Q network according to a preset period, and the network parameters of the target Q network are updated.
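[0028] Furthermore, the online Q-network is denoted $Q(s, a; \theta)$ and the target Q-network $Q'(s, a; \theta^{-})$. The target Q value $y_t$ is calculated with the Bellman update formula:
[0029] $y_t = r_t + \gamma\, Q'\big(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta); \theta^{-}\big)$
[0030] In the formula, $r_t$ is the reward function at time t; $s_{t+1}$ is the integrated system state vector at time t+1; $\gamma$ is the discount factor used to balance immediate and long-term returns; $\theta$ and $\theta^{-}$ are the network parameters of the online Q-network and the target Q-network, respectively.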
[0031] Furthermore, inputting the integrated system state vector at time t into the dual DQN decision model for forward computation and outputting the optimal resource allocation decision includes: inputting the integrated system state vector at time t into the dual DQN decision model for forward computation to obtain the predicted Q-value distribution over the resource scheduling actions at time t; selecting the resource scheduling action at time t with the largest first predicted Q value as the optimal resource allocation decision; and outputting the optimal resource allocation decision.
[0032] In a second aspect, embodiments of the present invention provide an intelligent resource allocation system for a railway cloud platform, comprising:
[0033] The data processing unit is used to acquire resource indicator data, preprocess the resource indicator data, and obtain resource feature sequences.
[0034] The three-layer LSTM prediction model optimization training unit is used to establish a three-layer LSTM prediction model and to optimize and train it using resource feature sequences.
[0035] The predictive resource feature sequence acquisition unit is used to define the resource feature sequence at the current time based on the resource feature sequence, and to perform forward calculation using the resource feature sequence at the current time in combination with a three-layer LSTM prediction model to obtain the resource feature sequence at the future time.
[0036] The integrated system state vector acquisition unit is used to define the historical time resource feature sequence according to the resource feature sequence, and to establish the integrated system state vector based on the historical time resource feature sequence, the current time resource feature sequence, and the future time resource feature sequence.
[0037] The dual-DQN decision model optimization and training unit is used to establish a dual-DQN decision model composed of an online Q-network and a target Q-network. It defines the integrated system state vector at time t based on the integrated system state vector and obtains the resource index data at time t+1. Based on the integrated system state vector at time t and the resource index data at time t+1, it establishes an empirical data quadruple and optimizes the dual-DQN decision model through the empirical data quadruple.
[0038] The optimal resource allocation decision acquisition unit is used to input the integrated system state vector at time t into the dual DQN decision model for forward calculation and output the optimal resource allocation decision.
[0039] The decision verification unit is used to verify the optimal resource allocation decision.
[0040] The beneficial effects of the embodiments of the present invention are as follows:
[0041] This invention enables advance prediction and dynamic scheduling of resource demands for railway cloud platforms, improves the overall resource utilization rate of railway cloud platforms, reduces resource waste and scheduling conflict risks, and ensures the service quality and system operation stability of high-priority railway services. It has high engineering practical value and promotion significance.
[0042] Other features and advantages of the invention will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the invention. The objects and other advantages of the invention are realized and obtained in accordance with the structures particularly pointed out in the description, claims and drawings.
[0043] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.
Brief Description of the Drawings
[0044] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0045] Figure 1 is a flowchart of an intelligent resource allocation method for a railway cloud platform, provided as an embodiment of the present invention.
Detailed Implementation
[0046] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0047] Example 1
[0048] To facilitate understanding of this embodiment, the intelligent resource allocation method for a railway cloud platform disclosed in this embodiment is described in detail with reference to Figure 1. The method provided in this embodiment includes the following steps:
[0049] S1. Obtain resource indicator data, preprocess the resource indicator data, and obtain resource feature sequences.
[0050] Resource indicator data is collected from the railway cloud platform. Based on the actual needs of railway operations and the operation and maintenance experience of the railway cloud platform, the resource indicator data includes CPU utilization, CPU load change rate, memory utilization, available memory, disk I/O latency, and network bandwidth utilization. Table 1 provides a detailed explanation of the resource indicator data.
[0051] Table 1. Detailed explanation of the resource indicator data
[0052]
[0053] In S1, the resource indicator data is preprocessed to obtain the resource feature sequence, including: normalizing the resource indicator data to obtain the resource time series, and performing time window processing on the resource time series to obtain the resource feature sequence.
[0054] For example, the normalization process for resource indicator data is as follows: the Min-Max normalization method is used to linearly map the original data to obtain the resource time series, ensuring that the resource time series falls within the normalization interval. In this embodiment, normalization can reduce the impact of extreme values on data distribution and improve data stability and comparability.
[0055] The resource feature sequence is obtained from the resource time series by time window processing as follows: denote the resource time series by $x'$. When time window processing is applied to $x'$ with an input time window of 8 hours and a sampling interval of 10 minutes, a resource feature sequence with 48 time steps (8 h / 10 min) and m dimensions is obtained. The preferred value of m is 6 to 10.
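As a rough illustration of the time window processing described here, the sketch below slices a normalized series into 48-step inputs paired with the following 6 steps. The function name `make_windows`, the list-of-rows layout, and the sliding stride of one step are assumptions made for illustration and are not stated in the patent text.

```python
def make_windows(series, window=48, horizon=6):
    """Slice a resource time series into (input, target) pairs.

    series  : list of rows, each row holding m normalized indicator values
    window  : input steps (8 h at a 10-minute interval = 48 steps)
    horizon : predicted steps (6 steps of 10 minutes = 1 hour)
    """
    samples = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        x = series[start:start + window]
        y = series[start + window:start + window + horizon]
        samples.append((x, y))
    return samples


# One day of 10-minute samples (144 rows) with m = 6 indicators per row.
m = 6
day = [[(t % 10) / 10.0] * m for t in range(144)]
samples = make_windows(day)
first_x, first_y = samples[0]
```

Each sample pairs an input of 48 rows by m columns with a target of 6 rows, matching the 8-hour window and the one-hour prediction horizon used in this embodiment.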
[0056] Furthermore, for each resource indicator, the 2.5th percentile is taken as the smallest original data and the 97.5th percentile as the largest original data, and the 95% of values lying between them are kept as the original data. The Min-Max normalization method is then expressed as:
[0057] $x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$;
[0058] In the formula, $x'$ is the resource time series, $x$ is the original data, $x_{\max}$ is the largest original data, $x_{\min}$ is the smallest original data, and [0, 1] is the normalization interval.
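A minimal sketch of the percentile-bounded Min-Max normalization follows. It assumes linear interpolation for the percentiles and assumes that values outside the 2.5th to 97.5th percentile range are dropped rather than clipped; the patent text does not specify either choice, and the function names are hypothetical.

```python
def percentile(sorted_values, q):
    """Linear-interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def robust_min_max(values, low_q=2.5, high_q=97.5):
    """Min-Max normalize with the 2.5th/97.5th percentiles as x_min/x_max.

    Values outside the percentile bounds are dropped (an assumption; the
    source only says the remaining 95% is used as the original data).
    """
    ordered = sorted(values)
    x_min = percentile(ordered, low_q)
    x_max = percentile(ordered, high_q)
    span = x_max - x_min
    if span == 0:
        # A constant series carries no range information.
        return [0.0 for _ in values]
    kept = [x for x in values if x_min <= x <= x_max]
    return [(x - x_min) / span for x in kept]


# Example: raw CPU utilization readings from 0 to 100 percent.
cpu_raw = [float(v) for v in range(101)]
cpu_norm = robust_min_max(cpu_raw)
```

Because the bounds come from percentiles rather than the raw extremes, a single outlier reading cannot stretch the normalization interval.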
[0059] S2. Establish a three-layer LSTM prediction model and optimize and train the three-layer LSTM prediction model using resource feature sequences.
[0060] In S2, the three-layer LSTM prediction model is a three-layer Long Short-Term Memory (LSTM) prediction model, consisting of a first-layer LSTM unit, a second-layer LSTM unit, and a third-layer LSTM unit. The number of hidden layer features in the first, second, and third layers of the LSTM unit are set to 64, 128, and 64, respectively. The first-layer LSTM unit is used to capture the basic temporal patterns of resource feature sequence data. The second-layer LSTM unit is used to establish nonlinear temporal relationships based on the basic temporal patterns. The third-layer LSTM unit is used to compress and integrate the nonlinear temporal relationships to obtain high-dimensional features.
[0061] Furthermore, the network structure expressions for the three-layer LSTM prediction model are as follows:
[0062] $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$;
[0063] In the formula, $f_t$ represents the forget gate, $W_f$ the forget gate weight matrix, $b_f$ the forget gate bias vector, $h_{t-1}$ the hidden state at the previous time step, and $x_t$ the input at the current time.
[0064] $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$;
[0065] In the formula, $i_t$ represents the input gate, $W_i$ the input gate weight matrix, and $b_i$ the input gate bias vector.
[0066] $\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$;
[0067] In the formula, $\tilde{C}_t$ represents the candidate memory cell state, $W_c$ the candidate memory cell state weight matrix, and $b_c$ the candidate memory cell state bias vector.
[0068] $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$;
[0069] In the formula, $C_t$ represents the updated memory cell state, and $C_{t-1}$ the memory cell state at the previous time.
[0070] $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$;
[0071] In the formula, $o_t$ represents the output gate activation value, $\sigma$ the Sigmoid activation function, $W_o$ the output gate weight matrix, and $b_o$ the output gate bias vector.
[0072] $h_t = o_t \odot \tanh(C_t)$;
[0073] In the formula, $h_t$ represents the hidden state at the current time.
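The forget gate, input gate, candidate state, memory cell update, output gate, and hidden state expressions can be traced with a single-step sketch in plain Python. This is one LSTM cell with a toy hidden size, not the three-layer 64/128/64 model of this embodiment; the `params` dictionary layout and the function names are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    params maps a gate name ('f', 'i', 'c', 'o') to (W, b), where W acts
    on the concatenation [h_prev, x_t].
    """
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]

    def gate(name, activation):
        W, b = params[name]
        return [activation(v) for v in affine(W, z, b)]

    f_t = gate("f", sigmoid)      # forget gate
    i_t = gate("i", sigmoid)      # input gate
    c_hat = gate("c", math.tanh)  # candidate memory cell state
    o_t = gate("o", sigmoid)      # output gate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy sizes: hidden size 2, input size 3, all weights and biases zero.
hidden, inputs = 2, 3
zero_W = [[0.0] * (hidden + inputs) for _ in range(hidden)]
zero_b = [0.0] * hidden
params = {name: (zero_W, zero_b) for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step([0.3, 0.1, 0.7], [0.0, 0.0], [1.0, -1.0], params)
```

With zero weights every sigmoid gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved, which makes the gating arithmetic easy to check by hand.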
[0074] In S2, the optimization training of the three-layer LSTM prediction model using resource feature sequences includes:
[0075] S201. Denote the resource feature sequence by $X$. $X$ is input into the three-layer LSTM prediction model, which outputs a sample resource demand prediction result covering 6 future time steps. The result is compared with the actual value of the sample resource demand, and the prediction error is calculated.
[0076] The resource feature sequence and the sample resource demand prediction result together constitute a time series sample. The sample resource demand prediction result is expressed as:
[0077] $\hat{Y} = \{\hat{y}^{(i)}_{1}, \hat{y}^{(i)}_{2}, \dots, \hat{y}^{(i)}_{6}\}$;
[0078] In the formula, i indexes the different resource indicator data, and the subscripts 1 to 6 index the 6 future time steps.
[0079] S202. The prediction error is quantified by the first mean square error loss function. The network parameters of the three-layer LSTM prediction model are optimized by the backpropagation algorithm. After multiple rounds of iterative updates until the first mean square error loss function converges, the three-layer LSTM prediction model optimization training is completed.
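[0080] The first mean square error loss function used to quantify the prediction error is expressed as:
[0081] $L_1 = \dfrac{1}{N}\sum_{j=1}^{N}\lVert \hat{Y}_j - Y_j \rVert^2$;
[0082] In the formula, N is the number of time series samples, and $\hat{Y}_j$ and $Y_j$ are the sample resource demand prediction result and the actual value of the sample resource demand for the j-th sample.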
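For a concrete reading of the first mean square error loss, the sketch below averages the squared prediction error over N samples. Treating each sample as a flat list of predicted values and summing squared differences within a sample are assumptions about the exact form, which the extracted text does not preserve.

```python
def mse_loss(predictions, targets):
    """Mean over N samples of the squared error between prediction and truth."""
    n = len(predictions)
    total = 0.0
    for y_hat, y in zip(predictions, targets):
        total += sum((p - a) ** 2 for p, a in zip(y_hat, y))
    return total / n


# Two samples with two predicted values each.
predicted = [[0.5, 0.5], [1.0, 0.0]]
actual = [[0.0, 0.5], [1.0, 1.0]]
loss = mse_loss(predicted, actual)
```

Training would repeat forward computation, loss evaluation, and backpropagation until this value stops decreasing, which is the convergence criterion described in S202.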
[0083] Furthermore, during the optimization training of the three-layer LSTM prediction model, a Dropout mechanism with a dropout rate of 0.1 is introduced to improve the generalization ability of the three-layer LSTM prediction model.
[0084] S3. Define the resource feature sequence at the current time based on the resource feature sequence, and use the resource feature sequence at the current time in combination with the three-layer LSTM prediction model to perform forward calculation to obtain the resource feature sequence at the future time.
[0085] In S3, defining the resource feature sequence at the current time based on the resource feature sequence, and performing forward computation with the three-layer LSTM prediction model to obtain the resource feature sequence at the future time, includes: denoting the resource feature sequence at the current time by $X_t$; inputting $X_t$ into the three-layer LSTM prediction model for forward computation, which outputs the resource demand prediction result for the 6 time steps following the current time; and selecting the last of these time steps as the predicted value one hour after $X_t$. This predicted value, denoted $\hat{X}_{t+1}$, is the resource feature sequence at the future time.
[0086] S4. Define the historical time resource feature sequence based on the resource feature sequence, and establish an integrated system state vector based on the historical time, current time, and future time resource feature sequences.
[0087] In S4, defining the historical time resource feature sequence based on the resource feature sequence, and establishing the integrated system state vector from the historical time, current time, and future time resource feature sequences, includes: denoting the historical time resource feature sequence by $X_{t-1}$, the current time resource feature sequence by $X_t$, and the future time resource feature sequence by $\hat{X}_{t+1}$, and concatenating the three along the feature dimension to obtain an integrated system state vector $s_t$ of dimension 3m, expressed as:
[0088] $s_t \in \mathbb{R}^{3m}$;
[0089] The concatenation of the resource feature sequences of the historical time, the current time, and the future time along the feature dimension is expressed as:
[0090] $s_t = \mathrm{concat}(X_{t-1}, X_t, \hat{X}_{t+1})$;
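The concatenation in S4 can be sketched in a few lines. The sketch assumes each of the three inputs is a single m-dimensional feature vector (one time step), which is what a 3m-dimensional result implies; the function name `build_state` and the sample values are hypothetical.

```python
def build_state(x_hist, x_now, x_pred):
    """Concatenate three m-dimensional feature vectors into one 3m state."""
    if not (len(x_hist) == len(x_now) == len(x_pred)):
        raise ValueError("all three feature vectors must share dimension m")
    return list(x_hist) + list(x_now) + list(x_pred)


# m = 6 indicators: CPU utilization, CPU load change rate, memory utilization,
# available memory, disk I/O latency, network bandwidth utilization.
x_hist = [0.41, 0.02, 0.55, 0.45, 0.12, 0.30]   # one hour ago
x_now = [0.48, 0.07, 0.58, 0.42, 0.15, 0.36]    # current time
x_pred = [0.63, 0.15, 0.61, 0.39, 0.21, 0.44]   # LSTM prediction, one hour ahead
state = build_state(x_hist, x_now, x_pred)
```

With m = 6 the state has 18 entries, which matches the preferred state space dimension given for the dual DQN decision model.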
[0091] S5. Establish a dual DQN decision model consisting of an online Q-network and a target Q-network. Define the integrated system state vector at time t based on the integrated system state vector and obtain the resource index data at time t+1. Establish an empirical data quadruple based on the integrated system state vector at time t and the resource index data at time t+1. Optimize and train the dual DQN decision model using the empirical data quadruple.
[0092] The dual DQN (Double Deep Q-Network) decision model is a resource allocation decision model. In it, the online Q-network and the target Q-network have the same network structure, but their network parameters are independent of each other. The online Q-network is used for action selection and the target Q-network for action value evaluation, which reduces the training instability caused by Q-value overestimation.
[0093] Furthermore, the state space in the dual DQN decision model is an n-dimensional state space. In this embodiment, the resource index data are treated as resource dimension features, so n is preferably 18: each state consists of 6 resource dimension features for each of three times, namely time t-1, time t, and the predicted time t+1 (3 × 6 = 18). Time t-1 is one hour before the current time, time t is the current time, and the predicted time t+1 is one hour after the current time.
[0094] In S5, the integrated system state vector at time t is defined based on the integrated system state vector, and resource index data at time t+1 is obtained. Based on the integrated system state vector at time t and the resource index data at time t+1, an empirical data quadruple is established. The dual-DQN decision model is then optimized and trained using the empirical data quadruple, including:
[0095] S501. The integrated system state vector at time t is defined from the integrated system state vector and denoted $s_t$. $s_t$ is input into the dual DQN decision model to obtain the resource scheduling action at time t, $a_t$, and the reward function at time t, $r_t$.
[0096] S502. Obtain the resource index data at time t+1 and preprocess it to obtain the resource feature sequence at time t+1; use the resource feature sequence at time t+1 with the three-layer LSTM prediction model for forward computation to obtain the predicted resource feature sequence at time t+2; establish the integrated system state vector at time t+1, $s_{t+1}$, from the resource feature sequence at time t, the resource feature sequence at time t+1, and the predicted resource feature sequence at time t+2.
[0097] S503. From $s_t$, $s_{t+1}$, the resource scheduling action $a_t$, and the immediate reward $r_t$, establish the empirical data quadruple $(s_t, a_t, r_t, s_{t+1})$ and store it in the empirical replay pool. A portion of the stored quadruples is randomly sampled from the pool as training data, and the online Q-network in the dual DQN decision model is iteratively updated on that training data.
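The replay pool in S503 can be sketched with a bounded queue and uniform random sampling. The class name `ReplayPool`, the fixed capacity with oldest-first eviction, and the batch size are assumptions; the patent text only states that quadruples are stored and a portion is randomly selected.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity store of (state, action, reward, next_state) quadruples."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest quadruple once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniformly sample up to batch_size quadruples without replacement."""
        items = list(self.buffer)
        return random.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self.buffer)


pool = ReplayPool(capacity=3)
for k in range(5):
    pool.store([float(k)], k % 10, 0.1 * k, [float(k + 1)])
batch = pool.sample(2)
```

Sampling at random rather than replaying quadruples in order breaks the correlation between consecutive time steps, which is the usual motivation for a replay pool.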
[0098] S504. Input the integrated system state vector at time t, $s_t$, into the online Q-network, which outputs the resource scheduling action at time t, $a_t$, and the first predicted Q value; input the integrated system state vector at time t+1, $s_{t+1}$, into the online Q-network, which outputs the resource scheduling action at time t+1, $a_{t+1}$, and the second predicted Q value.
[0099] The first predicted Q value is the predicted Q value corresponding to $a_t$; the second predicted Q value is the predicted Q value corresponding to $a_{t+1}$.
[0100] S505. Let the online Q-network select the resource scheduling action at time t+1 with the largest second predicted Q value, $a^{*}_{t+1}$. The target Q-network evaluates the value of $a^{*}_{t+1}$, and the result is summed with the reward function at time t, $r_t$, to obtain the target Q value.
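[0101] Here, the online Q-network is denoted $Q(s, a; \theta)$ and the target Q-network $Q'(s, a; \theta^{-})$. The target Q value $y_t$ is calculated with the Bellman update formula:
[0102] $y_t = r_t + \gamma\, Q'\big(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta); \theta^{-}\big)$
[0103] In the formula, $r_t$ is the reward function at time t; $s_{t+1}$ is the integrated system state vector at time t+1; $\gamma$ is the discount factor used to balance immediate and long-term returns; $\theta$ and $\theta^{-}$ are the network parameters of the online Q-network and the target Q-network, respectively. In this embodiment, $\gamma$ is 0.8.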
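The split between action selection and action evaluation can be made concrete with a small numeric sketch. Plain lists stand in for the per-action outputs of the online and target Q-networks at time t+1; the function name and the sample Q values are hypothetical, while the discount factor of 0.8 is the value given in this embodiment.

```python
def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.8):
    """Target Q value: reward + gamma * Q_target(s', argmax_a Q_online(s', a)).

    q_online_next / q_target_next hold one Q value per action at the next
    state, as produced by the online and target networks respectively.
    """
    # The online network chooses the action...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ...and the target network supplies the value of that action.
    return reward + gamma * q_target_next[best_action], best_action


reward = 1.0
q_online_next = [0.2, 0.9, 0.4]
q_target_next = [0.5, 0.3, 0.8]
target, chosen = double_dqn_target(reward, q_online_next, q_target_next)

# For comparison: taking the maximum of the target network directly.
single_network_target = reward + 0.8 * max(q_target_next)
```

In this example the online network picks action 1, so the target uses the target network's value 0.3 for that action instead of its own maximum 0.8. Decoupling the two is what limits Q-value overestimation.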
[0104] Furthermore, the expression for the reward function at time t is:
[0105]
[0106] The terms of the formula are a weighted average of the utilization rates of the various resources, the task response time, and the change in energy consumption per unit time; all three are normalized. The remaining two symbols are the actual value and the safety threshold of each resource utilization rate. The resource utilization rate is taken from the resource index data at time t after the resource scheduling action with the largest first predicted Q value has been executed.
[0107] S506. Minimize the error between the first predicted Q value and the target Q value, and update the network parameters of the online Q network through the backpropagation algorithm. Then, synchronize the network parameters of the online Q network to the target Q network according to a preset period, and then update the network parameters of the target Q network.
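[0108] Specifically, the error between the first predicted Q value and the target Q value is calculated using the second mean square error loss function, expressed as:
[0109] $L_2(\theta) = \dfrac{1}{B}\sum_{j=1}^{B}\big(y_j - Q(s_j, a_j; \theta)\big)^2$, where $y_j$ is the target Q value and $Q(s_j, a_j; \theta)$ the first predicted Q value for the j-th sampled empirical data quadruple, and B is the number of quadruples sampled as training data.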
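The periodic synchronization in S506 can be sketched as a hard copy of the online parameters every fixed number of update steps. The function name `maybe_sync`, the dictionary of parameters, and the period of 5 steps are assumptions for illustration; the patent text specifies only a preset period.

```python
import copy


def maybe_sync(step, online_params, target_params, sync_period):
    """Return the target parameters, refreshed from the online network
    whenever step is a multiple of sync_period."""
    if step % sync_period == 0:
        return copy.deepcopy(online_params)
    return target_params


online = {"w": 0.0}
target = copy.deepcopy(online)
history = []
for step in range(1, 11):
    online["w"] += 0.1  # stand-in for one gradient update of the online network
    target = maybe_sync(step, online, target, sync_period=5)
    history.append(round(target["w"], 1))
```

Between synchronizations the target network stays frozen, so the target Q values used in the loss do not shift with every gradient step.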
[0110] S6. Input the integrated system state vector at time t into the dual DQN decision model for forward calculation and output the optimal resource allocation decision.
[0111] In S6, inputting the integrated system state vector at time t into the dual DQN decision model for forward computation and outputting the optimal resource allocation decision includes: inputting the integrated system state vector at time t into the dual DQN decision model for forward computation to obtain the predicted Q-value distribution over the resource scheduling actions at time t; selecting the resource scheduling action at time t with the largest first predicted Q value as the optimal resource allocation decision; and outputting the optimal resource allocation decision.
[0112] In this embodiment, the resource scheduling action at time t is drawn from the action space. The action space contains 10 discrete actions covering 4 types of operations: resource expansion, resource contraction, resource migration, and resource maintenance. Resource expansion, resource contraction, and resource migration are each subdivided into three discrete levels (small, medium, and large adjustment), which together with resource maintenance gives 3 × 3 + 1 = 10 actions, so that precise decisions can be made for different production environments.
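As a sketch of how the 10 actions and the greedy selection in S6 fit together, the code below enumerates the action space and picks the action with the largest predicted Q value. The English action labels, their ordering, and the sample Q values are assumptions; the patent text fixes only the operation types and adjustment levels.

```python
from itertools import product

OPERATIONS = ("expand", "contract", "migrate")
LEVELS = ("small", "medium", "large")

# 3 operations x 3 levels, plus the single "maintain" action = 10 actions.
ACTIONS = [("maintain", None)] + list(product(OPERATIONS, LEVELS))


def greedy_action(q_values):
    """Return the action whose predicted Q value is largest."""
    if len(q_values) != len(ACTIONS):
        raise ValueError("expected one Q value per action")
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return ACTIONS[best]


q_values = [0.1] * 10
q_values[5] = 0.9  # hypothetical network output favouring one action
decision = greedy_action(q_values)
```

The selected tuple is the candidate optimal resource allocation decision, which still has to pass the verification in S7 before it is applied.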
[0113] S7. Validate the optimal resource allocation decision.
[0114] Specifically, the optimal resource allocation decision is validated through SLA verification and physical resource verification. SLA verification determines whether the resource allocation decision meets the business performance indicators, which include service response time, request success rate, and SLA violation rate. Physical resource verification checks the decision against the constraints imposed by the current resource indicator data.
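One possible shape for the two checks in S7 is sketched below. The source does not give numeric thresholds or the exact form of the physical resource constraint, so the limits in `SLA_LIMITS`, the capacity comparison in `physical_ok`, and all names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SlaMetrics:
    response_time_ms: float
    success_rate: float
    violation_rate: float

# Hypothetical limits; the source does not state numeric thresholds.
SLA_LIMITS = {"response_time_ms": 200.0, "success_rate": 0.99, "violation_rate": 0.01}

def sla_ok(metrics, limits=SLA_LIMITS):
    """SLA check: response time, request success rate, and SLA violation rate."""
    return (metrics.response_time_ms <= limits["response_time_ms"]
            and metrics.success_rate >= limits["success_rate"]
            and metrics.violation_rate <= limits["violation_rate"])

def physical_ok(requested, available):
    """Assumed physical check: every requested amount must fit what is available."""
    return all(amount <= available.get(name, 0.0) for name, amount in requested.items())

def validate_decision(metrics, requested, available):
    """A decision passes only if both the SLA and the physical checks pass."""
    return sla_ok(metrics) and physical_ok(requested, available)

good = SlaMetrics(response_time_ms=120.0, success_rate=0.995, violation_rate=0.002)
accepted = validate_decision(good, {"cpu_cores": 4}, {"cpu_cores": 8})
```

A decision that fails either check would be rejected before it reaches the platform; what happens next (for example falling back to resource maintenance) is not specified in the source.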
[0115] Based on the above, to verify the performance advantages of the proposed three-layer LSTM prediction model, a comparative experiment was conducted against two traditional models, a linear regression model and a random forest model. All models were trained on the same dataset and used to predict resource demand for the next hour. As Table 2 shows, the root mean square error (RMSE) of the proposed three-layer LSTM prediction model is 0.109, approximately 23% lower than that of the random forest model (0.141) and approximately 34% lower than that of the linear regression model (0.165). The values reported for current CPU utilization and current memory utilization (0.095 and 0.084) are likewise lower than those of both comparison models, indicating that the three-layer LSTM prediction model captures the temporal characteristics and nonlinear changes in resource demand more effectively and achieves higher prediction accuracy.
[0116] Table 2. Performance Comparison of the Three-Layer LSTM Prediction Model with Other Models
[0117]
| Model | Root mean square error (RMSE) | Current CPU utilization | Current memory utilization |
| --- | --- | --- | --- |
| Three-layer LSTM prediction model | 0.109 | 0.095 | 0.084 |
| Linear regression model | 0.165 | 0.138 | 0.122 |
| Random forest model | 0.141 | 0.116 | 0.103 |
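The percentage reductions quoted for Table 2 can be checked with a few lines of arithmetic. The helper names `rmse` and `relative_reduction` are illustrative; only the three RMSE values come from the table.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100.0

# RMSE values taken from Table 2.
table2_rmse = {"lstm": 0.109, "linear_regression": 0.165, "random_forest": 0.141}

vs_random_forest = relative_reduction(table2_rmse["random_forest"], table2_rmse["lstm"])
vs_linear_regression = relative_reduction(table2_rmse["linear_regression"], table2_rmse["lstm"])
```

The two results come to roughly 22.7% and 33.9%, which round to the 23% and 34% stated in the text.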
[0118] To verify the effectiveness of the proposed dual DQN decision model, a comparative experiment was conducted against the traditional threshold-based resource allocation method under nearly identical business load conditions. As shown in Table 3, after the dual DQN decision model is introduced, resource utilization on the railway cloud platform is higher than with the traditional threshold-based method: current CPU utilization rises from 52% to 68%, current memory utilization from 57% to 75%, and current network bandwidth utilization from 44% to 61%. The utilization of the different resources also tends to be balanced, and the average disk I/O wait time falls by 34 ms, from 169 ms to 135 ms. This verifies the effectiveness of the proposed resource allocation decision method in improving resource utilization efficiency and system performance.
[0119] Table 3. Performance Comparison of the Dual DQN Decision Model with Other Models
[0120]
| Method | Current CPU utilization (%) | Current memory utilization (%) | Current network bandwidth utilization (%) | Current disk I/O wait time (ms) |
| --- | --- | --- | --- | --- |
| Dual DQN decision model | 68 | 75 | 61 | 135 |
| Threshold-based traditional resource allocation method | 52 | 57 | 44 | 169 |
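The differences between the two rows of Table 3 can be computed directly. The dictionary keys are illustrative names; the numbers are the table values.

```python
# Values taken from Table 3.
dual_dqn = {"cpu_pct": 68, "mem_pct": 75, "net_pct": 61, "disk_wait_ms": 135}
threshold_method = {"cpu_pct": 52, "mem_pct": 57, "net_pct": 44, "disk_wait_ms": 169}

# Positive values mean the dual DQN row is higher than the threshold-based row.
delta = {name: dual_dqn[name] - threshold_method[name] for name in dual_dqn}
```

Utilization is higher by 16, 18, and 17 percentage points for CPU, memory, and network bandwidth respectively, and the disk I/O wait time is 34 ms lower, matching the reduction quoted in the text.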
[0121] In summary, this invention can effectively improve the utilization efficiency of comprehensive resources of railway cloud platforms, effectively avoid resource waste and system load risks, and has good practicality, stability and promotion value. It is suitable for the complex computing environment of multi-service concurrent operation of railway cloud platforms.
[0122] Example 2
[0123] An intelligent resource allocation system for railway cloud platforms disclosed in this invention includes:
[0124] The data processing unit is used to acquire resource indicator data, preprocess the resource indicator data, and obtain resource feature sequences.
[0125] The three-layer LSTM prediction model optimization training unit is used to establish a three-layer LSTM prediction model and to optimize and train the three-layer LSTM prediction model using the resource feature sequences.
[0126] The predictive resource feature sequence acquisition unit is used to define the resource feature sequence at the current time based on the resource feature sequence, and to perform forward calculation using the resource feature sequence at the current time in combination with a three-layer LSTM prediction model to obtain the resource feature sequence at the future time.
[0127] The integrated system state vector acquisition unit is used to define the historical time resource feature sequence according to the resource feature sequence, and to establish the integrated system state vector based on the historical time resource feature sequence, the current time resource feature sequence, and the future time resource feature sequence.
[0128] The dual DQN decision model optimization training unit is used to establish a dual DQN decision model composed of an online Q-network and a target Q-network. It defines the integrated system state vector at time t based on the integrated system state vector and acquires the resource indicator data at time t+1. Based on the integrated system state vector at time t and the resource indicator data at time t+1, it establishes an empirical data quadruple, and it optimizes and trains the dual DQN decision model using the empirical data quadruple.
[0129] The optimal resource allocation decision acquisition unit is used to input the integrated system state vector at time t into the dual DQN decision model for forward calculation and output the optimal resource allocation decision.
[0130] The decision verification unit is used to verify the optimal resource allocation decision.
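The data handled by the state vector acquisition unit and the dual DQN training unit can be sketched as follows. This is an illustration under assumptions: the names `Experience`, `build_state`, and `ReplayPool`, the pool capacity, and the choice m = 6 (one feature per resource indicator named in the claims) are hypothetical and not taken from the source.

```python
import random
from collections import deque, namedtuple

import numpy as np

# One empirical data quadruple: state, action, reward, next state.
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state"])

def build_state(history, current, future):
    """Concatenate three m-dimensional feature vectors into a 3m-dimensional state."""
    return np.concatenate([history, current, future])

class ReplayPool:
    """Fixed-capacity pool of quadruples; the oldest entries are dropped first."""

    def __init__(self, capacity=10000):
        self._buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self._buffer.append(Experience(state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random batch of stored quadruples for training."""
        return random.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)

m = 6  # assumed feature count
s_t = build_state(np.zeros(m), np.ones(m), np.full(m, 0.5))
s_next = build_state(np.ones(m), np.full(m, 0.5), np.full(m, 0.25))

pool = ReplayPool(capacity=100)
pool.store(s_t, 5, 0.8, s_next)
batch = pool.sample(1)
```

Sampling at random from the pool, rather than training on consecutive transitions, is the usual reason for keeping such a pool: it reduces the correlation between successive training samples.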
[0131] It should be noted that the intelligent resource allocation system for railway cloud platforms provided in this embodiment can realize all the contents of the intelligent resource allocation method for railway cloud platforms provided in Embodiment 1, and will not be repeated here.
[0132] The above-described embodiments are merely specific implementations of the present invention, used to illustrate the technical solutions of the present invention, and are not intended to limit it. The scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still modify or easily conceive of changes to the technical solutions described in the foregoing embodiments within the technical scope disclosed in the present invention, or make equivalent substitutions for some of the technical features; and these modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. An intelligent resource allocation method for railway cloud platforms, characterized in that it comprises:
acquiring resource indicator data and preprocessing the resource indicator data to obtain a resource feature sequence, wherein the resource indicator data include CPU utilization, CPU load change rate, memory utilization, available memory, disk I/O latency, and network bandwidth utilization;
establishing a three-layer LSTM prediction model, and optimizing and training the three-layer LSTM prediction model using the resource feature sequence;
defining a current-time resource feature sequence based on the resource feature sequence, and performing forward calculation with the three-layer LSTM prediction model on the current-time resource feature sequence to obtain a future-time resource feature sequence;
defining a historical-time resource feature sequence based on the resource feature sequence, and establishing an integrated system state vector based on the historical-time resource feature sequence, the current-time resource feature sequence, and the future-time resource feature sequence;
establishing a dual DQN decision model consisting of an online Q-network and a target Q-network; defining the integrated system state vector at time t based on the integrated system state vector, and acquiring the resource indicator data at time t+1; establishing an empirical data quadruple based on the integrated system state vector at time t and the resource indicator data at time t+1; and optimizing and training the dual DQN decision model using the empirical data quadruple;
wherein the target Q value is computed from the online Q-network and the target Q-network using the Bellman update formula, in which: the discount factor balances immediate and long-term rewards; the online Q-network and the target Q-network each have their own network parameters; a is a resource scheduling action; and the state at which the networks are evaluated is the integrated system state vector at time t+1;
wherein the reward function at time t is given by an expression whose terms are the weighted average of the utilization rates of the various resources, the task response time, and the change in energy consumption per unit time, all of which are normalized, together with the actual value and the safety threshold of each resource utilization rate, the actual value being taken from the resource indicator data obtained after the resource scheduling action at time t that yields the largest first predicted Q value has been executed;
inputting the integrated system state vector at time t into the dual DQN decision model for forward calculation, and outputting the optimal resource allocation decision; and
performing decision verification on the optimal resource allocation decision.
2. The intelligent resource allocation method for railway cloud platforms according to claim 1, characterized in that optimizing and training the three-layer LSTM prediction model using the resource feature sequence comprises: defining the resource feature sequence; inputting the resource feature sequence into the three-layer LSTM prediction model, which outputs a sample resource demand prediction result covering 6 future time steps; comparing the sample resource demand prediction result with the actual sample resource demand and calculating the prediction error; quantifying the prediction error with the first mean squared error loss function, and optimizing the network parameters of the three-layer LSTM prediction model through the backpropagation algorithm; and repeating the updates over multiple rounds until the first mean squared error loss function converges, at which point the optimization training of the three-layer LSTM prediction model is complete.
3. The intelligent resource allocation method for railway cloud platforms according to claim 2, characterized in that, during the optimization training of the three-layer LSTM prediction model, a Dropout mechanism with a dropout rate of 0.1 is introduced to improve the generalization ability of the three-layer LSTM prediction model.
4. The intelligent resource allocation method for railway cloud platforms according to claim 1, characterized in that defining the current-time resource feature sequence based on the resource feature sequence, and performing forward calculation with the three-layer LSTM prediction model on the current-time resource feature sequence to obtain the future-time resource feature sequence, comprises: defining the current-time resource feature sequence based on the resource feature sequence; inputting the current-time resource feature sequence into the three-layer LSTM prediction model for forward calculation, which outputs a resource demand prediction result covering 6 future time steps; and selecting the last of these time steps as the predicted value one hour after the current-time resource feature sequence, this predicted value being the future-time resource feature sequence.
5. The intelligent resource allocation method for railway cloud platforms according to claim 1, characterized in that defining the historical-time resource feature sequence based on the resource feature sequence, and establishing the integrated system state vector based on the historical-time resource feature sequence, the current-time resource feature sequence, and the future-time resource feature sequence, comprises: defining the historical-time resource feature sequence based on the resource feature sequence; and concatenating the historical-time resource feature sequence, the current-time resource feature sequence, and the future-time resource feature sequence along the feature dimension to obtain the integrated system state vector, whose dimension is 3m.
6. The intelligent resource allocation method for railway cloud platforms according to claim 1, characterized in that defining the integrated system state vector at time t based on the integrated system state vector, acquiring the resource indicator data at time t+1, establishing the empirical data quadruple based on the integrated system state vector at time t and the resource indicator data at time t+1, and optimizing and training the dual DQN decision model using the empirical data quadruple, comprises:
defining the integrated system state vector at time t based on the integrated system state vector;
inputting the integrated system state vector at time t into the dual DQN decision model to obtain the resource scheduling action at time t and the reward function at time t;
acquiring the resource indicator data at time t+1 and preprocessing it to obtain the resource feature sequence at time t+1, then performing forward calculation with the three-layer LSTM prediction model on the resource feature sequence at time t+1 to obtain the predicted resource feature sequence at time t+2;
establishing the integrated system state vector at time t+1 based on the resource feature sequence at time t, the resource feature sequence at time t+1, and the predicted resource feature sequence at time t+2;
establishing the empirical data quadruple from the integrated system state vector at time t, the integrated system state vector at time t+1, the resource scheduling action at time t, and the reward at time t; storing the empirical data quadruple in an experience replay pool; randomly selecting a portion of the empirical data quadruples from the experience replay pool as training data; and iteratively updating the online Q-network in the dual DQN decision model based on the training data;
inputting the integrated system state vector at time t into the online Q-network, which outputs the resource scheduling action at time t and the first predicted Q value; and inputting the integrated system state vector at time t+1 into the online Q-network, which outputs the resource scheduling action at time t+1 and the second predicted Q value;
having the online Q-network select the resource scheduling action at time t+1 with the largest second predicted Q value, having the target Q-network evaluate the value of that action, and summing the evaluated value with the reward function at time t to obtain the target Q value; and
minimizing the error between the first predicted Q value and the target Q value, updating the network parameters of the online Q-network through the backpropagation algorithm, and then, at a preset period, synchronizing the network parameters of the online Q-network to the target Q-network, thereby updating the network parameters of the target Q-network.
7. The intelligent resource allocation method for railway cloud platforms according to claim 1, characterized in that inputting the integrated system state vector at time t into the dual DQN decision model for forward calculation and outputting the optimal resource allocation decision comprises: inputting the integrated system state vector at time t into the dual DQN decision model for forward calculation to obtain the predicted Q-value distribution over the resource scheduling actions at time t; selecting the resource scheduling action at time t with the largest first predicted Q value as the optimal resource allocation decision; and outputting the optimal resource allocation decision.
8. An intelligent resource allocation system for railway cloud platforms, characterized in that it comprises:
a data processing unit, used to acquire resource indicator data and preprocess the resource indicator data to obtain a resource feature sequence, wherein the resource indicator data include CPU utilization, CPU load change rate, memory utilization, available memory, disk I/O latency, and network bandwidth utilization;
a three-layer LSTM prediction model optimization training unit, used to establish a three-layer LSTM prediction model and to optimize and train the three-layer LSTM prediction model using the resource feature sequence;
a predictive resource feature sequence acquisition unit, used to define a current-time resource feature sequence based on the resource feature sequence, and to perform forward calculation with the three-layer LSTM prediction model on the current-time resource feature sequence to obtain a future-time resource feature sequence;
an integrated system state vector acquisition unit, used to define a historical-time resource feature sequence based on the resource feature sequence, and to establish an integrated system state vector based on the historical-time resource feature sequence, the current-time resource feature sequence, and the future-time resource feature sequence;
a dual DQN decision model optimization training unit, used to establish a dual DQN decision model composed of an online Q-network and a target Q-network, to define the integrated system state vector at time t based on the integrated system state vector, to acquire the resource indicator data at time t+1, to establish an empirical data quadruple based on the integrated system state vector at time t and the resource indicator data at time t+1, and to optimize and train the dual DQN decision model using the empirical data quadruple;
wherein the target Q value is computed from the online Q-network and the target Q-network using the Bellman update formula, in which: the discount factor balances immediate and long-term rewards; the online Q-network and the target Q-network each have their own network parameters; a is a resource scheduling action; and the state at which the networks are evaluated is the integrated system state vector at time t+1;
wherein the reward function at time t is given by an expression whose terms are the weighted average of the utilization rates of the various resources, the task response time, and the change in energy consumption per unit time, all of which are normalized, together with the actual value and the safety threshold of each resource utilization rate, the actual value being taken from the resource indicator data obtained after the resource scheduling action at time t that yields the largest first predicted Q value has been executed;
an optimal resource allocation decision acquisition unit, used to input the integrated system state vector at time t into the dual DQN decision model for forward calculation and to output the optimal resource allocation decision; and
a decision verification unit, used to verify the optimal resource allocation decision.