Spectral sparsity estimation algorithm based on long short-term memory network
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHONGQING UNIV OF POSTS & TELECOMM
- Filing Date
- 2023-01-03
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies struggle to accurately estimate sparsity values in broadband spectrum sensing, leading to signal reconstruction failures or increased system overhead. Furthermore, they fail to effectively address the heterogeneity of the broadband spectrum, resulting in inaccurate sparsity estimation.
A sparsity estimation algorithm based on a Long Short-Term Memory (LSTM) network is adopted. Channels are grouped into spectrum blocks by OPTICS clustering according to channel idleness and correlation, the LSTM parameters are optimized by combining particle swarm optimization and gradient descent (PSO-GD), and the LSTM predicts the sparsity time series of each block.
It improves the accuracy and robustness of sparsity estimation, reduces time and performance overhead, adapts to the heterogeneity of broadband spectrum, and achieves more efficient sparsity prediction.
Figure CN116566521B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of broadband spectrum compressed sensing and specifically relates to an algorithm for estimating sparsity values in a broadband spectrum.
Background Technology
[0002] With social development and rising living standards, the demand for wireless spectrum from wireless devices and services has grown rapidly, making efficient use of the spectrum increasingly important. Cognitive radio has therefore been proposed as a key solution to inefficient spectrum access. Its core idea is to use spectrum sensing to locate unused channels so that secondary users can opportunistically use these idle channels without affecting primary users.
[0003] Many techniques have been proposed to improve spectrum sensing, but they mainly target narrowband signals; wideband spectrum sensing has received relatively little attention. Sensing a wide band at high frequencies requires overcoming the limits that conventional sampling theory places on the sampling rate, since sampling the whole band at the Nyquist rate would demand very high sampling rates. Most wideband spectrum sensing techniques therefore exploit the sparsity of wideband signals and use compressed sensing, which only randomly sub-samples the signal and thus effectively reduces the sampling rate. In wideband compressed sensing, however, signal sparsity is a crucial parameter: it reflects spectrum occupancy and also determines the number of observations and the reconstruction algorithm. In reality, wideband signals change dynamically, which makes accurate sparsity values very difficult to obtain. The sparsity of a wideband signal is therefore typically set to a fixed value; if this value is lower than the true sparsity, signal reconstruction fails, and if it is higher, it introduces unnecessary system overhead. Estimating the sparsity of wideband signals is therefore of great significance.
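To make the consequence of a wrong fixed sparsity concrete, the Python sketch below reconstructs a synthetic K-sparse spectrum from random sub-Nyquist measurements with orthogonal matching pursuit (OMP), a common greedy recovery algorithm that takes the sparsity K as an input. The signal size, the Gaussian sensing matrix and the choice of OMP are illustrative assumptions, not part of this invention.

```python
import numpy as np

def omp(A: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Orthogonal matching pursuit: greedily pick up to k columns of A to explain y."""
    residual = y.copy()
    support: list[int] = []
    coef = np.zeros(0)
    for _ in range(k):
        # Select the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Least-squares fit on the selected support, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n, m, k_true = 128, 48, 6                       # channels, measurements, true sparsity
x = np.zeros(n)
x[rng.choice(n, k_true, replace=False)] = rng.uniform(1.0, 2.0, k_true)
A = rng.standard_normal((m, n)) / np.sqrt(m)    # random sub-Nyquist sensing matrix
y = A @ x

for k in (k_true - 2, k_true, k_true + 4):      # under-, exact and over-estimated K
    err = np.linalg.norm(omp(A, y, k) - x) / np.linalg.norm(x)
    print(f"assumed K={k:2d}  relative error={err:.3f}")
```

With an underestimated K the recovered vector has at most K nonzero entries, so it cannot contain every occupied channel and reconstruction necessarily fails; an overestimated K still runs but spends extra iterations.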
[0004] Current research on sparsity estimation has produced several noteworthy results. The paper [M. E. Lopes, "Unknown Sparsity in Compressed Sensing: Denoising and Inference," IEEE Transactions on Information Theory, vol. 62, no. 9, pp. 5145-5166, Sept. 2016] proposes a numerical estimation model for the lower bound of the sparsity. The paper [B. Khalfi, A. Zaid and B. Hamdaoui, "When machine learning meets compressive sampling for wideband spectrum sensing," IWCMC, Valencia, Spain, 2017, pp. 1120-1125] uses a supervised learning algorithm to train a sparsity prediction model for wideband compressed spectrum sensing from the observations and other features. The paper [Y. Gao, Y. Si, B. Zhu and Y. Wei, "Sparsity Order Estimation Algorithm in Compressed Sensing by Exploiting Slope Analysis," IWCMC, Limassol, Cyprus, 2018, pp. 753-756] applies principal component analysis and slope analysis to the eigenvalue distribution of the covariance matrix to estimate signal sparsity. The paper [Ma Bin, Wang Hongming, Xie Xianzhong, "Improved Broadband Compressed Spectrum Detection Scheme Based on Binomial Distribution," Journal of Electronics, 2020(02): 243-248] estimates the upper and lower bounds of the sparsity simultaneously, using the exact confidence interval of the binomial distribution and the observation vector, so that both bounds can be estimated accurately given the sensing matrix and observation vector.
[0005] Most of the above research assumes a homogeneous broadband spectrum: the entire band is treated as a single spectral block containing multiple frequency bands, and sparsity is estimated across the whole band under the assumption of a uniform sparsity level, which is inconsistent with reality. Furthermore, where broadband spectrum grouping is discussed, the literature treats the grouping information as prior knowledge, presetting or distinguishing the channel groups manually. None of these studies considers that, in reality, the occupancy characteristics of different frequency bands in a broadband spectrum may differ significantly, so the sparsity of different spectral blocks may also vary.
Summary of the Invention
[0006] This invention aims to solve the problems of the prior art. It proposes a spectral sparsity estimation algorithm based on Long Short-Term Memory (LSTM) networks. The technical solution of this invention is as follows:
[0007] An algorithm for estimating spectral sparsity based on long short-term memory networks includes the following steps:
[0008] 101. First, consider a heterogeneous broadband spectrum access system containing n channels. Channels with similar activity characteristics are clustered into a spectrum block, using channel idleness and channel correlation as the clustering features. The channel clustering and block division of the broadband spectrum proceeds in three stages: first, the channels are coarsely grouped in the frequency domain according to the idleness of each channel; second, the channels within each coarse group are further subdivided according to their correlation in the time domain, so that the channels in each resulting group have highly similar user activity characteristics; finally, channels that were not clustered are treated as noise points and randomly added to an adjacent spectrum block in the frequency domain.
[0009] 102. Based on the spectrum segmentation obtained in step 101, the time series of sparsity values of each spectrum block is first obtained and used as the input of the LSTM. The LSTM parameters are optimized by particle swarm optimization (PSO) combined with gradient descent (GD). Once training is complete and the model passes cross-validation, it is put online to predict from real-time data. If the prediction performance deteriorates, or if training fails validation, PSO-GD is run again to re-optimize the parameters, so that accurate prediction and estimation on real-time data is ultimately achieved.
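The last stage of step 101 attaches unclustered channels (noise points) to an adjacent block in the frequency domain. A minimal Python sketch, assuming channels are listed in frequency order and noise is labelled -1 (the convention of common OPTICS/DBSCAN implementations); the function name `attach_noise` is hypothetical:

```python
import random

def attach_noise(labels: list[int], seed: int = 0) -> list[int]:
    """Give each noise channel (label -1) the block of a randomly chosen
    frequency-adjacent clustered channel; channels are in frequency order."""
    rng = random.Random(seed)
    out = list(labels)
    n = len(labels)
    for i, lab in enumerate(labels):
        if lab != -1:
            continue
        # Nearest clustered channel below and above channel i in frequency.
        left = next((labels[j] for j in range(i - 1, -1, -1) if labels[j] != -1), None)
        right = next((labels[j] for j in range(i + 1, n) if labels[j] != -1), None)
        candidates = [b for b in (left, right) if b is not None]
        if candidates:
            out[i] = rng.choice(candidates)
    return out

print(attach_noise([0, 0, -1, 1, 1, -1, -1, 2]))
```

Neighbors are looked up in the original labels, so one noise channel's assignment does not influence the next.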
[0010] Furthermore, before the clustering and grouping in step 101, the acquired historical broadband spectrum data are first formatted and preprocessed: if the measured power-spectrum magnitude of a channel exceeds a threshold, the channel is considered occupied, denoted $x_i = 1$; otherwise it is idle, $x_i = 0$. Channel idleness describes how idle a channel is; the higher the idleness, the lower the probability that the channel is occupied. The channel idleness $h$ can be expressed as:
[0011] $$h_i = \frac{T_I}{T} = \frac{1}{M}\sum_{m=1}^{M} I\left(x_i^m = 0\right)$$
[0012] where $T_I$ is the idle time of the channel, $T$ is the total observation time, the channel is in the idle state in time slot $m$ when $x_i^m = 0$, and $M$ is the total number of time slots. Channel correlation measures the probability that two channels are in the same state: the more strongly two channels are correlated, the higher the probability that their states agree (or are opposite), and the more synchronously their occupancy states change. It is defined as:
[0013] $$p_{ij} = \frac{1}{M}\sum_{m=1}^{M} I\left(x_i^m = x_j^m\right)$$
[0014] where $p_{ij}$ is the channel correlation between the i-th and j-th channels, and $x_i^m$ and $x_j^m$ are the occupancy states of the i-th and j-th channels in time slot $m$. $I(A)$ is the discriminant function: $I(A) = 1$ if $A$ is true and $0$ otherwise. When $p_{ij}$ approaches 0.5, channels i and j are essentially uncorrelated; the closer $p_{ij}$ is to 1, the more consistently the two channels change state together; the closer $p_{ij}$ is to 0, the more consistently the two channels change state, but with opposite states.
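The preprocessing and both clustering features can be computed directly from a 0/1 occupancy matrix. A minimal NumPy sketch, assuming rows are channels, columns are time slots, and the power values are in dBm; the function names are hypothetical:

```python
import numpy as np

def binarize(power_dbm: np.ndarray, threshold_dbm: float) -> np.ndarray:
    """x_i^m = 1 if the measured power exceeds the threshold (occupied), else 0."""
    return (power_dbm > threshold_dbm).astype(int)

def idleness(x: np.ndarray) -> np.ndarray:
    """h_i = T_I / T: fraction of the M time slots in which channel i is idle."""
    return (x == 0).mean(axis=1)

def correlation(x: np.ndarray) -> np.ndarray:
    """p_ij = (1/M) * sum_m I(x_i^m == x_j^m), computed for all channel pairs."""
    xf = x.astype(float)
    agree = xf @ xf.T + (1.0 - xf) @ (1.0 - xf).T   # slots both busy + slots both idle
    return agree / x.shape[1]

power = np.array([[-100.0, -110.0, -100.0, -110.0],   # channel 0
                  [-101.0, -112.0, -99.0, -115.0],    # channel 1: same pattern
                  [-120.0, -100.0, -118.0, -102.0]])  # channel 2: opposite pattern
x = binarize(power, -107.0)
print(idleness(x))      # [0.5 0.5 0.5]
print(correlation(x))   # p_01 = 1.0 (always agree), p_02 = 0.0 (always opposite)
```

Counting agreements with two matrix products avoids materializing an n x n x M comparison array.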
[0015] Furthermore, the clustering scheme used in step 101 is the OPTICS algorithm, specifically:
[0016] By treating each channel as an object in the spectrum space and mapping the feature values of each channel as data points onto the corresponding broadband-spectrum frequency domain, we obtain the sample point set $X = \{x_1, x_2, \ldots, x_m\}$. The distances between these points serve as the distances between channels during clustering. By setting the initial parameters ε and MinPts (a point is a core object if its ε-neighborhood contains at least MinPts points), the channels can be clustered by density. The specific steps are as follows:
[0017] 1) Initialize the core object set Ω as empty.
[0018] 2) Iterate through the elements of X; if an element is a core object, add it to the core object set Ω.
[0019] 3) If all elements in the core object set Ω have been processed, the algorithm ends; otherwise, proceed to step 4).
[0020] 4) Randomly select an unprocessed core object o from Ω, mark o as processed, and push it into the ordered list p. Then compute the reachability distance to o of each unvisited point in o's ε-neighborhood and put these points into the seed set seeds in order of reachability distance.
[0021] 5) If the seed set seeds is empty, jump to step 3). Otherwise, select the seed with the smallest reachability distance from seeds, mark it as visited and processed, and push it into the ordered list p. Then determine whether this seed is a core object; if it is, add its unvisited neighbor points to the seed set, recalculate their reachability distances, and repeat step 5).
[0022] To obtain more clusters, the neighborhood parameter ε can be enlarged as far as possible so that more channels are clustered and more idle channels can be selected, since many channels with similar features are occupied. To improve the resolution of the clustered channels, clustering can be performed several times, narrowing the bandwidth of each group of clustered channels.
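Steps 1) to 5) describe the OPTICS ordering. The NumPy sketch below follows them, assuming Euclidean distance, the usual core-distance definition (distance to the MinPts-th closest point, the point itself included), processing in index order rather than random order, and a flat extraction that cuts the reachability plot at a threshold `cut` ≤ ε. The toy features and parameter values are illustrative assumptions; a production system would more likely rely on an existing implementation such as scikit-learn's `OPTICS`.

```python
import heapq
import numpy as np

def optics(X: np.ndarray, eps: float, min_pts: int):
    """OPTICS ordering; returns (order, reachability, core distances)."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    # Core object: at least min_pts points (itself included) within eps.
    core = np.full(n, np.inf)
    for i in range(n):
        if len(neighbors[i]) >= min_pts:
            core[i] = np.sort(dist[i, neighbors[i]])[min_pts - 1]
    reach = np.full(n, np.inf)
    processed = np.zeros(n, dtype=bool)
    order: list[int] = []

    def expand(o: int, seeds: list) -> None:
        # Push unprocessed eps-neighbors of core object o, keyed by reachability.
        for p in neighbors[o]:
            if not processed[p]:
                r = max(core[o], dist[o, p])
                if r < reach[p]:
                    reach[p] = r
                    heapq.heappush(seeds, (r, int(p)))

    for o in range(n):
        if processed[o]:
            continue
        processed[o] = True
        order.append(o)
        if not np.isfinite(core[o]):
            continue                      # not a core object: nothing to expand
        seeds: list = []
        expand(o, seeds)
        while seeds:                      # step 5): closest seed first
            r, p = heapq.heappop(seeds)
            if processed[p] or r > reach[p]:
                continue                  # stale heap entry
            processed[p] = True
            order.append(p)
            if np.isfinite(core[p]):
                expand(p, seeds)
    return order, reach, core

def extract_clusters(order, reach, core, cut: float) -> np.ndarray:
    """Flat clusters from the reachability plot at threshold cut; -1 marks noise."""
    labels = np.full(len(order), -1)
    cluster = -1
    for p in order:
        if reach[p] > cut:
            if core[p] <= cut:            # not reachable, but dense: new cluster
                cluster += 1
                labels[p] = cluster
        else:
            labels[p] = cluster
    return labels

# Toy 2-D channel features: two dense groups and one isolated channel.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [30, 30]], dtype=float)
order, reach, core = optics(X, eps=3.0, min_pts=3)
print(extract_clusters(order, reach, core, cut=2.0))  # clusters 0 and 1, last point noise
```

The noise label -1 produced here is what the noise-attachment stage of step 101 would then resolve.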
[0023] Furthermore, the LSTM-related attributes of step 102 are as follows:
[0024] Use tanh as the activation function:
[0025] $$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
[0026] The candidate variable $\tilde{c}_t$ generated by the tanh operation at the cell input is obtained from:
[0027] $$\tilde{c}_t = \tanh\left(W_c[a_{t-1}, y_t] + b_c\right)$$
[0028] where $W_c$ is the weight of the state unit and $b_c$ is the bias of the state unit.
[0029] The values generated by the input gate, forget gate, and output gate all use the sigmoid function σ as the activation function in this invention.
[0030] $$\Gamma_u = \sigma\left(W_u[a_{t-1}, y_t] + b_u\right)$$
[0031] $$\Gamma_f = \sigma\left(W_f[a_{t-1}, y_t] + b_f\right)$$
[0032] $$\Gamma_o = \sigma\left(W_o[a_{t-1}, y_t] + b_o\right)$$
[0033]
[0034] where $W_u$, $W_f$ and $W_o$ are the weights of the corresponding gates, and $b_u$, $b_f$ and $b_o$ are the corresponding biases.
[0035] Meanwhile, the cell state $c_t$ is jointly determined by the forget gate $\Gamma_f$ acting on the previous cell state $c_{t-1}$ and the input gate $\Gamma_u$ acting on the candidate variable $\tilde{c}_t$. The output $a_t$ is the product of the output gate $\Gamma_o$ and the hyperbolic tangent of the cell state $c_t$:
[0036] $$c_t = \Gamma_u \ast \tilde{c}_t + \Gamma_f \ast c_{t-1}$$
[0037] $$a_t = \Gamma_o \ast \tanh(c_t)$$
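Paragraphs [0024] to [0037] define one time step of the LSTM cell. A NumPy sketch of that forward step, assuming the concatenated input $[a_{t-1}, y_t]$, a scalar sparsity value per step (input_dim = 1), and small random weights; the parameter dictionary layout is an assumption, with keys named after the symbols above:

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(params: dict, a_prev: np.ndarray, c_prev: np.ndarray, y_t: np.ndarray):
    """One LSTM step; W_* have shape (H, H + D), b_* have shape (H,)."""
    z = np.concatenate([a_prev, y_t])                      # [a_{t-1}, y_t]
    gamma_u = sigmoid(params["W_u"] @ z + params["b_u"])    # input (update) gate
    gamma_f = sigmoid(params["W_f"] @ z + params["b_f"])    # forget gate
    gamma_o = sigmoid(params["W_o"] @ z + params["b_o"])    # output gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])    # candidate cell state
    c_t = gamma_u * c_tilde + gamma_f * c_prev               # new cell state
    a_t = gamma_o * np.tanh(c_t)                             # cell output
    return a_t, c_t

H, D = 5, 1                                   # hidden units, input dimension
rng = np.random.default_rng(1)
params = {f"W_{g}": 0.1 * rng.standard_normal((H, H + D)) for g in "ufoc"}
params.update({f"b_{g}": np.zeros(H) for g in "ufoc"})

a, c = np.zeros(H), np.zeros(H)
for k in [3.0, 4.0, 4.0, 5.0]:                # a toy sparsity-value sequence
    a, c = lstm_step(params, a, c, np.array([k]))
print(a.shape, np.all(np.abs(a) < 1))
```

A fully connected layer would then map $a_t$ to the predicted sparsity value.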
[0038] Furthermore, in step 102, the step of optimizing the LSTM parameters using PSO-GD is as follows:
[0039] GD section:
[0040] 1) Set hyperparameters, including the structure of the LSTM, the dimensions of the LSTM input and output, represented as input_dim and output_dim, the learning rate, the time step, and the number of iterations.
[0041] 2) Select the gradient descent algorithm, input the time series to be calculated and the model parameters (i.e., the weights and biases of the LSTM), and train the LSTM for T iterations, denoting the parameters after iteration t as $W(t)$, $t = 1, 2, \ldots, T$.
[0042] PSO section:
[0043] 1) Calculate the total number of parameters to be optimized. For a standard LSTM consisting of an input layer of dimension input_dim, a hidden layer of H units, and a fully connected output layer of dimension output_dim, the number of parameters is $4\left((H + \text{input\_dim})H + H\right) + (H + 1)\,\text{output\_dim}$, where the last term reduces to $H + 1$ for a single output. Based on the structure of the LSTM, the parameters can be represented as $W = (W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o)$.
[0044] 2) Initialize the PSO population using the $W(t)$ obtained in the GD section, taking into account the convergence information and local-escape requirements contained in $W(t)$. Some optimized parameters are randomly selected as part of the initial particles, or parameters are selected based on the prediction error. To cover more local regions, an alternative selection method that generates competing particles is used. The T training iterations are divided into M batches, and the parameter set … is selected as the basis of the initial particles; that is, weight parameters obtained at different iterations serve as the initial positions of the particles. One third of the particles are copied directly from these parameter sets, another third are generated by adding Gaussian perturbations to the final parameter set $W(T)$, and the last third are generated randomly within the range between $W(T)$ and $W(T - T/M)$, reflecting the convergence of the weights during GD.
[0045] 3) Calculate the particle fitness. Each particle corresponds to an LSTM network, and the root mean square error (RMSE) between the LSTM output and the true value is used as the particle fitness.
[0046] 4) Let i index the particles and s the PSO iterations, and update each particle according to:
[0047] $$V_i(s+1) = \omega V_i(s) + r_1 c_1\left(pbest_i - W_i(s)\right) + r_2 c_2\left(gbest - W_i(s)\right)$$
$$W_i(s+1) = W_i(s) + V_i(s+1)$$
[0048] where ω is the inertia weight, which controls the influence of the previous velocity on the new velocity; $c_1$ and $c_2$ are the acceleration coefficients that pull each particle $W_i$ towards its own best position $pbest_i$ and the global best position $gbest$, respectively; and $r_1$ and $r_2$ are random numbers in the range [0, 1]. The coefficient values are obtained through trial and error.
[0049] 5) The best-performing particle will serve as the starting point for the next round of GD.
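The PSO-GD steps can be exercised end to end on a toy problem. The Python sketch below computes the particle dimension with the parameter-count formula, records GD snapshots at the M batch boundaries, builds the swarm in thirds, and runs the [0047] update with RMSE fitness. To keep it short, a linear predictor stands in for the LSTM, and the snapshot set (W(T/M), ..., W(T)), the perturbation σ, and all function names are assumptions rather than details from the source. With H = 5, input_dim = 1 and output_dim = 1 the count is 4·(6·5 + 5) + 6 = 146, matching the particle dimension listed in [0067].

```python
import numpy as np

def lstm_param_count(H: int, input_dim: int, output_dim: int) -> int:
    """Particle dimension: 4 LSTM gates plus the fully connected output layer."""
    return 4 * ((H + input_dim) * H + H) + (H + 1) * output_dim

def init_population(snapshots: np.ndarray, size: int, sigma: float, rng) -> np.ndarray:
    """Initial swarm in thirds from GD snapshots (rows, last row = W(T))."""
    third = size // 3
    w_T, w_prev = snapshots[-1], snapshots[-2]          # assumed W(T) and W(T - T/M)
    copied = snapshots[rng.integers(0, len(snapshots), third)]
    perturbed = w_T + sigma * rng.standard_normal((third, w_T.size))
    lo, hi = np.minimum(w_T, w_prev), np.maximum(w_T, w_prev)
    uniform = rng.uniform(lo, hi, (size - 2 * third, w_T.size))
    return np.vstack([copied, perturbed, uniform])

def pso(fitness, pos, rng, iters=50, w=0.8, c1=2.0, c2=2.0):
    """PSO with the [0047] update; returns the global best position and fitness."""
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmin(pbest_fit)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, *pos.shape))
        vel = w * vel + r1 * c1 * (pbest - pos) + r2 * c2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([fitness(p) for p in pos])
        better = fit < pbest_fit                        # update personal bests
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        gbest = pbest[np.argmin(pbest_fit)].copy()
    return gbest, float(pbest_fit.min())

# Toy stand-in for the LSTM: a linear predictor with RMSE fitness.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

def rmse(w_vec: np.ndarray) -> float:
    return float(np.sqrt(np.mean((X @ w_vec - y) ** 2)))

T, M, lr = 40, 4, 0.01
w_gd, snapshots = np.zeros(4), []
for t in range(1, T + 1):                               # GD section
    w_gd = w_gd - lr * 2 * X.T @ (X @ w_gd - y) / len(y)
    if t % (T // M) == 0:
        snapshots.append(w_gd.copy())                   # W(T/M), ..., W(T)

swarm = init_population(np.array(snapshots), size=30, sigma=0.1, rng=rng)
best, best_fit = pso(rmse, swarm, rng)
print(lstm_param_count(5, 1, 1), rmse(w_gd), best_fit)
# `best` would seed the next GD round (step 5).
```

Because personal bests only ever improve, the returned fitness can never be worse than the best initial particle; velocity clamping is a common addition when the swarm tends to spread out.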
[0050] The advantages and beneficial effects of this invention are as follows:
[0051] 1. In step 101, the invention takes into account that the broadband spectrum is not homogeneous and that the occupancy characteristics of different frequency bands may differ significantly. Clustering is introduced to divide the broadband spectrum into multiple spectrum blocks so that the channels within each block have highly similar user activity characteristics. The sparsity values of each block then vary more regularly, with fewer random fluctuations caused by unrelated channels, which facilitates subsequent sparsity estimation.
[0052] 2. In step 102, the time series of historical sparsity values is first obtained for each clustered spectral block. The LSTM parameters are continuously optimized with PSO-GD to obtain a prediction model that accurately estimates the time series. When the accuracy meets the requirements, the model is deployed online to predict real-time data. If, over time, the prediction accuracy no longer meets the requirements, the model is taken offline and retrained on historical data until a model that meets the requirements is obtained again. Ultimately, this meets the goal of accurately estimating broadband spectral sparsity values.
Attached Figure Description
[0053] Figure 1 is a schematic diagram of the broadband heterogeneous network considered in this invention;
[0054] Figure 2 is a flowchart of the spectrum sparsity estimation algorithm based on a long short-term memory network according to the present invention;
[0055] Figure 3 is a 24-hour overlay of the power spectrum of the spectral data used in the detailed implementation;
[0056] Figure 4 compares the fitting performance of different methods for estimating and predicting sparsity values;
[0057] Figure 5 compares the algorithmic time costs of different methods on training sets of different lengths;
[0058] Figure 6 compares the estimation and prediction fitting performance without and with clustering;
[0059] Figure 7 compares the performance of the deployed models with and without clustering-based grouping before estimation and prediction.
Detailed Implementation
[0060] The technical solutions of the embodiments of the present invention will be clearly and thoroughly described below with reference to the accompanying drawings. The described embodiments are merely some embodiments of the present invention.
[0061] The technical solution of the present invention to solve the above-mentioned technical problems is:
[0062] In compressed sensing of broadband spectrum, this method addresses the challenge of directly and accurately obtaining spectral sparsity values. Considering the non-homogeneous nature of broadband spectrum in real-world scenarios, it employs the OPTICS clustering method to group the broadband spectrum into blocks based on spectral occupancy and inter-band correlation. Furthermore, it estimates the time series of the grouped spectral sparsity using a Long Short-Term Memory (LSTM) network. This approach achieves superior accuracy while minimizing time and performance overhead compared to traditional sensing schemes.
[0063] The spectral sparsity estimation algorithm proposed in this invention includes the following steps:
[0064] Step 1: Format the historical broadband spectrum data into 0-1 spectrum status data, then calculate the idleness of each channel and the correlation between channels. Using these features, perform OPTICS clustering to group the channels so that the channels within each resulting spectrum block have similar primary-user activity characteristics, i.e., highly similar spectrum occupancy, occupancy states, and state changes.
[0065] Step 2: After dividing the broadband spectrum into blocks, obtain the time series of sparsity value changes for each spectrum block. Input this data into the LSTM network, and after continuous PSO-GD parameter optimization and cross-validation, train until a model with satisfactory estimation accuracy is obtained. Then, the model can be deployed online to predict real-time data.
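The online/offline cycle in Step 2 and in step 102 can be expressed as a simple monitor: track the RMSE of recent online predictions and take the model offline for PSO-GD retraining once it exceeds a tolerance. A Python sketch, in which the window length, the tolerance, and the class name are hypothetical choices:

```python
import math
from collections import deque

class DeploymentMonitor:
    """Tracks recent online prediction errors and flags when retraining is due."""

    def __init__(self, tolerance: float, window: int = 20):
        self.tolerance = tolerance
        self.errors = deque(maxlen=window)   # squared errors of recent predictions

    def observe(self, predicted: float, actual: float) -> bool:
        """Record one prediction; return True if the model should go offline."""
        self.errors.append((predicted - actual) ** 2)
        if len(self.errors) < self.errors.maxlen:
            return False                     # not enough evidence yet
        rmse = math.sqrt(sum(self.errors) / len(self.errors))
        return rmse > self.tolerance

monitor = DeploymentMonitor(tolerance=1.0, window=3)
for pred, true in [(5, 5), (6, 5), (5, 6), (4, 7)]:
    if monitor.observe(pred, true):
        print("accuracy dropped: retrain offline with PSO-GD")
        break
```

Waiting for a full window before judging avoids retraining on a single noisy prediction.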
[0066] To verify the performance of the LSTM sparsity estimation algorithm proposed in this invention, real measured spectrum data was used. The data consisted of broadband power spectrum data measured by the EU 5G-Xcast project in 2016. The measurement point was the Turku Spectroscopic Observatory in Finland, 40 meters above the ground. This dataset contained data in the range of 1200MHz-3000MHz, with a spectral resolution of 39.0625kHz, a scan time of 3s, and intensity units in dBm. According to the IEEE 802.22 specification for cognitive networks, the energy detection threshold was set to -107dBm / 200kHz. That is, if the collected power value was greater than this threshold, the channel was considered occupied and marked as 1; otherwise, it was considered idle and marked as 0.
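With the measured data binarized by the -107 dBm threshold and the channels grouped into blocks, each block's sparsity series can be formed and cut into LSTM training windows. A NumPy sketch on synthetic values, assuming that a block's sparsity in one scan is its number of occupied channels, that the threshold is applied to each measured sample without rescaling for the 39.0625 kHz resolution, and that each window predicts the next value; the function names are hypothetical:

```python
import numpy as np

THRESHOLD_DBM = -107.0   # energy-detection threshold from [0066], applied per sample

def block_sparsity_series(power_dbm: np.ndarray, labels: np.ndarray) -> dict:
    """Map each block id to its number of occupied channels in every scan."""
    occupied = power_dbm > THRESHOLD_DBM                 # x_i^m as booleans
    return {int(b): occupied[labels == b].sum(axis=0) for b in np.unique(labels)}

def make_windows(series: np.ndarray, time_step: int):
    """LSTM samples: predict series[t] from the previous time_step values."""
    X = np.stack([series[t - time_step:t] for t in range(time_step, len(series))])
    return X[..., None].astype(float), series[time_step:].astype(float)

power = np.array([[-100.0, -110.0, -100.0, -104.0],   # rows: channels, cols: scans
                  [-105.0, -100.0, -120.0, -101.0],
                  [-130.0, -130.0, -90.0, -140.0],
                  [-115.0, -101.0, -99.0, -98.0]])
labels = np.array([0, 0, 1, 1])                      # block ids, e.g. from OPTICS
series = block_sparsity_series(power, labels)
X, y = make_windows(series[0], time_step=2)
print(series, X.shape, y)
```

The trailing axis of size 1 in X matches input_dim = 1.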
[0067] During the simulation, the OPTICS clustering algorithm was set with parameters ε = 10 and MinPts = 20. The PSO-GD-LSTM parameters were set as follows: number of cells 5, learning rate 0.0001, initial training time 400s, particle swarm size 150, particle dimension 146, learning factors c1 = c2 = 2, ω = 0.8, and PSO iterations 50. To further highlight the superiority of this invention, it will be compared with prediction algorithms using Long Short-Term Memory (LSTM) networks without parameter optimization, LSTM networks using only PSO or GD optimization, and prediction algorithms based on traditional Neural Networks (NNs), comparing their estimation accuracy, computational time cost, and training time cost.
[0068] Figure 4 shows the estimation and prediction performance of the five methods on the sparsity-value time series. The PSO-GD-LSTM method of this invention performs best, and both PSO-LSTM and GD-LSTM improve on the unoptimized LSTM, indicating that the proposed method has clear advantages. However, Figure 5 shows that this method has a higher time cost than the other methods, because it optimizes multiple parameter sets in each iteration, which incurs some overhead. Even so, the method achieves better estimation, prediction, and fitting results without a significant increase in time consumption compared with traditional methods.
[0069] As shown in Figure 5, the PSO-GD-LSTM method incurs significant time overhead during estimation and prediction. However, as the total estimation and prediction time increases, its overall time overhead gradually decreases, because the model trained with this method is more robust than those of the other methods. In a combined online and offline system, this method requires fewer offline retraining sessions caused by loss of accuracy as the estimation and prediction time increases. Consequently, as the total runtime grows, the time overhead of this method becomes lower than that of the other methods.
[0070] As shown in Figure 6, sparsity estimation of the broadband spectrum after clustering is significantly better than estimating the spectrum as a single ungrouped whole. This method accounts for the fact that real broadband spectrum is not homogeneous, and uses the clustering results to estimate and predict separately the sparsity values of spectral blocks with different activity characteristics. This prevents channels with different activity characteristics from affecting one another, which would otherwise push the sparsity values towards a regression value and degrade the LSTM's estimation and prediction.
[0071] As shown in Figure 7, even when the un-clustered and clustered models are trained to the same accuracy, the un-clustered model loses prediction accuracy within a short period and cannot sustain its predictions. Because it is trained on the entire broadband spectrum as a whole, including numerous channels with differing activity characteristics, the large number of unrelated variations makes the resulting model behave more like a regression model, which struggles to fit the constantly changing broadband signal.
[0072] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.
[0073] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
[0074] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0075] The above embodiments should be understood as illustrative only and not as limiting the scope of protection of the present invention. After reading the description of the present invention, those skilled in the art can make various alterations or modifications to the present invention, and these equivalent changes and modifications also fall within the scope defined by the claims of the present invention.
Claims
1. A spectral sparsity estimation algorithm based on a long short-term memory network, characterized in that it includes the following steps: 1) first, consider a heterogeneous broadband spectrum access system containing n channels; channels with similar activity characteristics are clustered into a spectrum block, using channel idleness and channel correlation as the clustering features; the channel clustering and block division of the broadband spectrum proceeds in three stages: first, the channels in the broadband spectrum are coarsely grouped in the frequency domain according to the idleness of each channel; second, the channels within each coarse group are further subdivided according to their correlation in the time domain, so that the channels in each resulting group have highly similar user activity characteristics; finally, channels that have not been clustered are treated as noise points and randomly added to an adjacent spectrum block in the frequency domain; 2) based on the broadband spectrum segmentation obtained in the previous step, the time series of sparsity values of each spectrum block is first obtained as the input of the LSTM; the LSTM parameters are optimized by particle swarm optimization (PSO) combined with gradient descent (GD); once training is complete and the model passes cross-validation, the model is put online to predict from real-time data; if the prediction performance deteriorates or training fails validation, PSO-GD is run again to re-optimize the parameters, so as to achieve accurate prediction and estimation of real-time data.
2. The long short-term memory network-based spectral sparsity estimation algorithm according to claim 1, wherein the historical power spectrum data are formatted following the IEEE 802.22 specification for cognitive networks: the energy detection threshold is set to -107 dBm/200 kHz, i.e., when the collected power value is greater than this threshold, the channel is considered occupied and marked as 1; otherwise it is considered idle and marked as 0. This converts the original power spectrum data into spectrum state data with x = 0 or x = 1.
3. The long short-term memory network-based spectral sparsity estimation algorithm according to claim 1, wherein the density-based OPTICS clustering algorithm is used to divide the spectrum into blocks in the frequency domain based on channel occupancy and on the correlation between channels in the time domain.
4. The spectral sparsity estimation algorithm based on long short-term memory networks according to claim 3, characterized in that the basis for clustering and grouping the channels is the spectral occupancy of each channel and the correlation between channels, wherein the channel idleness $h$ can be expressed as $h = T_I / T = \frac{1}{M}\sum_{m=1}^{M} I(x^m = 0)$, where $T_I$ is the idle time of the channel, $T$ is the total time, the channel is in the idle state in time slot $m$ when $x^m = 0$, and $M$ is the total number of time slots; the channel correlation is defined as $p_{ij} = \frac{1}{M}\sum_{m=1}^{M} I(x_i^m = x_j^m)$, where $p_{ij}$ is the channel correlation between the i-th channel and the j-th channel, $x_i^m$ and $x_j^m$ are the occupancy states of the i-th and j-th channels in time slot $m$, and $I(A)$ is the discriminant function, with $I(A) = 1$ if $A$ is true; when $p_{ij}$ approaches 0.5, channels i and j are less correlated; the closer $p_{ij}$ is to 1, the more consistent the state transitions of the two channels; and the closer $p_{ij}$ is to 0, the more consistent the transitions of the two channels, but with opposite states.
5. The spectral sparsity estimation algorithm for Long Short-Term Memory networks as described in claim 4, characterized in that, For the grouped spectral blocks, the time series of their sparsity value changes is obtained as input to the Long Short-Term Memory network, in order to achieve an accurate estimation of the sparsity value in the time series.
6. The spectral sparsity estimation algorithm for long short-term memory networks as described in claim 5, characterized in that particle swarm optimization (PSO) and gradient descent (GD) are used to optimize the parameters of the LSTM in order to obtain a better prediction and estimation model, the PSO-GD parameter optimization being as follows: GD section: 1) set the hyperparameters, including the LSTM structure, the dimensions of the LSTM input and output, expressed as input_dim and output_dim, the learning rate, the time step, and the number of iterations; 2) select the gradient descent algorithm, input the time series data to be calculated and the model parameters (i.e., the weights and biases of the LSTM), and train the LSTM for T iterations, the parameters being denoted $W(t)$, $t = 1, 2, \ldots, T$; PSO section: 1) calculate the total number of parameters to be optimized: for a standard LSTM with an input layer of dimension input_dim, a hidden layer of H units, and an output fully connected layer of dimension output_dim, the number of parameters is $4\left((H + \text{input\_dim})H + H\right) + (H + 1)\,\text{output\_dim}$, and based on the LSTM structure the parameters can be represented as $W = (W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o)$; 2) initialize the PSO population from the $W(t)$ obtained in the GD section, taking into account the convergence information and local-escape requirements contained in $W(t)$: some optimized parameters are randomly selected as part of the initial particles, or parameters are selected based on the prediction error; to cover as many local regions as possible, an alternative selection method that generates competing particles is used; the T training iterations are divided into M equal batches, and the parameter set … is chosen as the basis of the initial particles, i.e., the weight parameters obtained at different iterations are selected as the initial positions of the particles.
One third of the particles are copied directly from these parameter sets, another third are generated by adding Gaussian perturbations to $W(T)$, and the final third are, considering the weight convergence process in GD, randomly generated within the range between $W(T)$ and $W(T - T/M)$; 3) calculate the fitness of the particles: each particle corresponds to an LSTM network, and the root mean square error (RMSE) between the LSTM output and the true value is used as the fitness of the particles; 4) with i indexing the particles and s the iterations, operate the PSO algorithm according to $V_i(s+1) = \omega V_i(s) + r_1 c_1\left(pbest_i - W_i(s)\right) + r_2 c_2\left(gbest - W_i(s)\right)$ and $W_i(s+1) = W_i(s) + V_i(s+1)$, where ω is the inertia weight, which controls the influence of the previous velocity on the new velocity, $c_1$ and $c_2$ are the acceleration coefficients pulling each particle $W_i$ towards its own best position and the overall best position, and $r_1$ and $r_2$ are random numbers in the range [0, 1], the coefficient values being obtained through trial and error; the best-performing particle serves as the starting point for the next round of the GD section.