A prediction method and system for optimal decomposition of small sample data of ocean waves

By using an improved energy valley optimization algorithm and a multi-layer temporal convolutional network with a self-attention mechanism, combined with a bidirectional long short-term memory network, the wave prediction problem under small sample data conditions was solved, achieving high-precision and stable multi-step wave prediction to meet the high-frequency needs of marine development activities.

CN122241084A · Pending · Publication Date: 2026-06-19 · SHANDONG JINGHAI INSTR EQUIP CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SHANDONG JINGHAI INSTR EQUIP CO LTD
Filing Date: 2026-03-18
Publication Date: 2026-06-19

IPC: G06F18/20; G06F18/213; G06N3/096; G06N3/045; G06N3/0442; G06F18/22; G06F18/25; G06N3/049; G06N3/0464; G06F18/2433; G06F18/15; G06N3/082; G06N7/08; G06F18/23; G06F18/214; G06N3/006; G06F123/02

AI Technical Summary

⚠Technical Problem

Existing hybrid forecasting methods struggle to achieve high-precision wave prediction under small sample data conditions. Optimized algorithms suffer from low execution efficiency, uneven initial population distribution, premature convergence, and data decomposition quality that relies on subjective judgment. Furthermore, the stability of modal components is difficult to guarantee, failing to meet the high-frequency demands of marine development activities.

⚗Method used

An improved energy valley optimization algorithm adaptively optimizes the parameters of variational mode decomposition. The decomposed components are fed to a multi-layer temporal convolutional network with an embedded self-attention mechanism, and a bidirectional long short-term memory network produces the prediction from the optimally decomposed ocean wave data. The initial population is generated by combining a Halton sequence with Latin hypercube sampling, and an augmented Lagrangian model is constructed to solve the decomposition, so that long-distance temporal dependencies are captured efficiently and key features are extracted.

🎯Benefits of technology

It significantly improves the decomposition quality and prediction accuracy of small sample ocean wave data, overcomes the shortcomings of traditional optimization algorithms, and achieves efficient and stable multi-step wave prediction, providing reliable decision support for marine disaster early warning and marine engineering.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention relates to the field of marine engineering technology, and in particular to a prediction method and system for optimal decomposition of small sample ocean wave data. The method includes acquiring hourly small sample wave feature data of a target sea area; adaptively optimizing the number of decomposition layers and penalty factor of a variational mode decomposition algorithm based on an improved energy valley optimization algorithm; performing variational mode decomposition on standardized time-series data according to the optimal decomposition parameters; constructing a multi-layer temporal convolutional network that matches the optimal number of decomposition layers in the optimal decomposition parameters; embedding a self-attention mechanism in each layer of the temporal convolutional network; inputting several intrinsic mode function components into the multi-layer temporal convolutional network embedded with the self-attention mechanism; and using a bidirectional long short-term memory network to perform bidirectional temporal dependency modeling on the high-dimensional temporal feature representation. This invention provides reliable decision support for marine disaster early warning, fishery production, and marine engineering safety.

Description

Technical Field

[0001] This invention relates to the field of marine engineering technology, and in particular to a prediction method and system for optimal decomposition of small sample data of ocean waves.

Background Technology

[0002] Ocean waves, as a core dynamic element of the marine environment, have an irreplaceable impact on marine disaster early warning, fishery production, and marine engineering construction. Accurate wave prediction is key to ensuring the safety of offshore operations, improving operational efficiency, and reducing risk losses. With the increasing frequency of global marine development activities, traditional wave prediction methods based on physical laws, statistics, and numerical simulation have gradually evolved into hybrid intelligent forecasting systems that integrate physical models, numerical calculations, and artificial intelligence technologies. Among these, hybrid models that combine physical statistical methods and numerical models with deep learning architecture at their core have become the mainstream research direction in the field.

[0003] However, existing hybrid forecasting methods still have significant limitations. Most of them rely on correlations among multivariate spatiotemporal elements and on long-term observation series to achieve good prediction results. In practical scenarios such as remote offshore stations, temporary emergency observations, or the initial stage of new construction projects, the data sample is often small, comes from a single source, and lacks key auxiliary variables. Specialized forecasting methods for such scenarios are severely lacking: there are no feature enhancement strategies for small sample data, and model structures have not been optimized for single data sources and scarce variables, making it difficult to meet high-frequency practical needs. At the same time, existing integrated forecasting models based on information decomposition and artificial intelligence have improved their generalization ability by combining networks such as LSTM and GRU with optimization algorithms such as GWO and SSA and decomposition algorithms such as EMD and VMD, but they still suffer from low execution efficiency of the optimization algorithms, uneven initial population distribution, and premature convergence. Moreover, the quality of data decomposition depends on subjective judgment, and the stability of the modal components is difficult to guarantee, so these models cannot meet the high-precision, all-time requirements that marine development activities place on wave prediction. A forecasting method and system for optimal decomposition of small sample ocean wave data is therefore needed.

Summary of the Invention

[0004] To address the problem of insufficient prediction accuracy across all time periods caused by weak optimization algorithms and incomplete data decomposition in existing hybrid forecasting methods, this invention provides a prediction method and system for optimal decomposition of small sample ocean wave data.

[0005] In a first aspect, the present invention provides a prediction method for optimal decomposition of small sample ocean wave data, which adopts the following technical solution. A prediction method for optimal decomposition of small sample ocean wave data includes:
obtaining hourly small sample wave characteristic data of the target sea area and standardizing it to obtain standardized time series data;
adaptively optimizing the number of decomposition layers and the penalty factor of the variational mode decomposition algorithm based on the improved energy valley optimization algorithm to obtain the optimal decomposition parameters;
performing variational mode decomposition on the standardized time series data according to the optimal decomposition parameters to obtain several intrinsic mode function components;
constructing a multi-layer temporal convolutional network that matches the optimal number of decomposition layers in the optimal decomposition parameters, and embedding a self-attention mechanism in each layer of the temporal convolutional network to calculate the correlation weights between time steps in the temporal sequence and extract key temporal features;
inputting the intrinsic mode function components into the multi-layer temporal convolutional network embedded with the self-attention mechanism to obtain a high-dimensional temporal feature representation;
using a bidirectional long short-term memory network to perform bidirectional temporal dependency modeling on the high-dimensional temporal feature representation, capturing forward historical dependency features and backward trend features, and outputting the predicted significant wave height for multiple future steps.

[0006] Furthermore, the adaptive optimization of the number of decomposition layers and the penalty factor of the variational mode decomposition algorithm includes generating an initial population for the energy valley optimization algorithm using an initialization strategy that combines a Halton sequence with Latin hypercube sampling. Each particle in the initial population represents one combination of variational mode decomposition parameters, consisting of the number of decomposition layers and the penalty factor. Halton low-discrepancy sequences are generated from a different prime base in each dimension, and randomness is enhanced by adding Gaussian random perturbations. The Halton low-discrepancy sequences are mapped to the solution space of the variational mode decomposition parameters to obtain the initial particle positions. The solution space is divided into several equal intervals in each dimension, and the particle density in each interval is checked. Particles in intervals whose density exceeds a preset threshold are moved to sparse intervals. The value of the j-th particle in the d-th dimension of the Halton sequence is calculated as:

$$H_{j,d} = \sum_{k=0}^{m} a_k(j)\, p_d^{-(k+1)}, \qquad a_k(j) = \left\lfloor \frac{j}{p_d^{k}} \right\rfloor \bmod p_d$$

where $j$ is the population size index, $d$ is the population dimension index, $p_d$ is the d-th prime base (primes taken in ascending order), $m$ is the largest integer satisfying $p_d^{m} \le j$, $a_k(j)$ is the k-th digit of $j$ in base $p_d$, and $\bmod$ is the modulo operation.
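The Halton construction described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the applicant's implementation: the function names `radical_inverse` and `halton_points` are hypothetical, and the bases 2 and 3 and the population size 30 are taken from the embodiment described later in the text.

```python
def radical_inverse(j: int, base: int) -> float:
    """Base-`base` radical inverse of a positive integer j (one Halton coordinate)."""
    value, scale = 0.0, 1.0 / base
    while j > 0:
        j, digit = divmod(j, base)  # digit is the next base-`base` digit of j
        value += digit * scale      # contributes digit * base^-(k+1)
        scale /= base
    return value


def halton_points(n: int, bases=(2, 3)) -> list[list[float]]:
    """First n Halton points, one prime base per dimension (indices start at 1)."""
    return [[radical_inverse(j, b) for b in bases] for j in range(1, n + 1)]


points = halton_points(30)  # 30 particles, 2 dimensions
print(points[:3])
```

Every coordinate lies in [0, 1), so the points still have to be scaled to the actual parameter ranges before they can serve as particle positions.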

[0007] Furthermore, the adaptive optimization of the number of decomposition layers and the penalty factor of the variational mode decomposition algorithm further includes constructing a fitness function based on energy entropy, which quantifies the mode separation quality of the variational mode decomposition. Variational mode decomposition is performed on the standardized time-series data to obtain K intrinsic mode function components, and the energy proportion of each component and the energy entropy value are calculated. A fitness function containing the energy entropy value, a decomposition-layer penalty term, and a low-energy mode penalty term is constructed, and the fitness value of each particle is calculated from it. The particle positions are iteratively updated based on an elite retention strategy, a neighbor-particle average position update mechanism, and a normal distribution perturbation mechanism until the convergence condition is met, giving the optimal decomposition parameters. The quantities that appear in the fitness function are the energy entropy, the number of decomposition layers K, the decomposition-layer penalty coefficient, the preset low-energy threshold, the low-energy penalty coefficient, and the energy of the j-th intrinsic mode function component.
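To make the energy-entropy idea concrete, the sketch below computes the energy share of each component and the Shannon entropy of those shares, then combines them with two penalty terms. The exact fitness expression is not available in this text, so the additive form, the count-based low-energy penalty, the use of energy shares for the threshold test, and all coefficient values are assumptions chosen only for illustration; whether the optimizer minimizes or maximizes the value is likewise not stated here.

```python
import math


def energy_entropy(imfs):
    """Return (entropy, shares): Shannon entropy of the per-component energy shares."""
    energies = [sum(v * v for v in imf) for imf in imfs]
    total = sum(energies)
    shares = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return entropy, shares


def fitness(imfs, layer_penalty=0.01, low_energy_threshold=0.01, low_energy_penalty=0.1):
    """Hypothetical fitness: entropy + penalty on K + penalty per low-energy mode."""
    entropy, shares = energy_entropy(imfs)
    n_low = sum(1 for p in shares if p < low_energy_threshold)  # assumed penalty form
    return entropy + layer_penalty * len(imfs) + low_energy_penalty * n_low


balanced = [[1.0, 0.0], [0.0, 1.0]]      # two components with equal energy
lopsided = [[1.0, 1.0], [0.01, 0.0]]     # second component carries almost no energy
print(fitness(balanced), fitness(lopsided))
```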

[0008] Furthermore, the step of performing variational mode decomposition on the standardized time series data according to the optimal decomposition parameters includes establishing a constrained variational model for variational mode decomposition based on the optimal number of decomposition layers K and the penalty factor in the optimal decomposition parameters, using the standardized time series data as the input signal $f(t)$. Each intrinsic mode function $u_k(t)$ is assumed to be an amplitude-modulated and frequency-modulated signal, and the center frequency of each intrinsic mode function is determined by solving the constrained variational model. The model minimizes the sum of the estimated bandwidths of all intrinsic mode functions, subject to the constraint that the sum of all intrinsic mode functions equals the input signal:

$$\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t\!\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)$$

where $K$ is the optimal number of decomposition layers, $u_k$ is the k-th intrinsic mode function, $\omega_k$ is the center frequency of the k-th intrinsic mode function, $f(t)$ is the input signal, $\delta(t)$ is the unit impulse function, $*$ is the convolution operation, $\partial_t$ is the partial derivative with respect to time, $\text{s.t.}$ is the constraint operator, $e^{-j\omega_k t}$ is the complex exponential modulation term, and $j$ is the imaginary unit.

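[0009] Furthermore, the variational mode decomposition of the standardized time-series data based on the optimal decomposition parameters also includes introducing a Lagrange multiplier $\lambda$ and a quadratic penalty factor $\alpha$ to transform the constrained variational model into an unconstrained augmented Lagrangian model, where the quadratic penalty factor $\alpha$ is the penalty factor in the optimal decomposition parameters. The intrinsic mode functions $u_k$, the center frequencies $\omega_k$, and the Lagrange multiplier $\lambda$ are iteratively updated using the alternating direction method of multipliers until the optimal solution for each intrinsic mode function is obtained, which gives the several intrinsic mode function components. The expression of the augmented Lagrangian model is:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t\!\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\; f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$

where $L$ is the augmented Lagrangian function, $\alpha$ is the quadratic penalty factor, $\lambda$ is the Lagrange multiplier, $\langle \cdot,\cdot \rangle$ is the inner product operation, and $f(t)$ is the input signal.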
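For orientation, the iterative updates that the alternating direction method of multipliers yields for this kind of augmented Lagrangian are usually written in the frequency domain. The block below gives the form commonly used in the variational mode decomposition literature; the patent text itself does not list these updates, and the dual step size $\tau$ and the hat notation for Fourier transforms are symbols introduced here for illustration.

```latex
\begin{aligned}
\hat{u}_k^{\,n+1}(\omega) &= \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \tfrac{1}{2}\hat{\lambda}^{\,n}(\omega)}{1 + 2\alpha\,(\omega - \omega_k^{\,n})^2}, \\[4pt]
\omega_k^{\,n+1} &= \frac{\int_0^{\infty} \omega\,\lvert \hat{u}_k^{\,n+1}(\omega) \rvert^2 \, d\omega}{\int_0^{\infty} \lvert \hat{u}_k^{\,n+1}(\omega) \rvert^2 \, d\omega}, \\[4pt]
\hat{\lambda}^{\,n+1}(\omega) &= \hat{\lambda}^{\,n}(\omega) + \tau \Bigl( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{\,n+1}(\omega) \Bigr).
\end{aligned}
```

Read this way, each mode update is a Wiener-type filter centered on the current $\omega_k$, and a larger $\alpha$ narrows that filter, which is why the penalty factor and K together control how cleanly the bands separate.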

[0010] Furthermore, the construction of a multi-layer temporal convolutional network that matches the optimal number of decomposition layers in the optimal decomposition parameters includes configuring the number of layers of the temporal convolutional network according to the optimal number of decomposition layers, and setting an exponentially increasing dilation rate for each layer of the temporal convolutional network. Based on the dilation rate, a dilated causal convolution operation is performed on the input time series, so that the receptive field of each layer expands exponentially with network depth and long-range temporal dependencies are captured without increasing the convolution kernel parameters. The output of the dilated causal convolution operation at sequence element s is calculated according to the following formula:

$$F(s) = (x *_d f)(s) = \sum_{i=0}^{k-1} f(i)\, x_{s - d \cdot i}$$

where $d$ is the dilation rate, $k$ is the filter size, $x$ is the input time series, $f$ is the one-dimensional convolution kernel of the temporal convolutional network, $s - d \cdot i$ is the position index into the input sequence, $i$ is the convolution kernel index, and $s$ is the current output position. The number of layers is set by $K$, the optimal number of decomposition layers.
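The dilated causal convolution can be illustrated directly from its definition. The following plain-Python sketch is an assumption-laden illustration: the helper names are hypothetical, positions before the start of the sequence are treated as zeros, and the dilation schedule 1, 2, 4 in the usage lines is a common power-of-two choice rather than a value confirmed by the text.

```python
def dilated_causal_conv(x, kernel, dilation):
    """F(s) = sum_i kernel[i] * x[s - dilation*i], treating x before index 0 as zero."""
    out = []
    for s in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = s - dilation * i
            if idx >= 0:  # causal: only the current and past samples are read
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack with one dilated convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = dilated_causal_conv(x, kernel=[1.0, 1.0], dilation=2)
print(y)                              # each output adds x[s] and x[s-2]
print(receptive_field(2, [1, 2, 4]))  # grows with the sum of the dilations
```

With a fixed kernel size, doubling the dilation at each layer roughly doubles the reach of the stack while the number of kernel weights per layer stays the same.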

[0011] Furthermore, the extraction of key temporal features includes embedding a self-attention layer after the dilated causal convolutional layer, performing global dependency modeling on the local temporal features output by the dilated causal convolutional layer, adding the output of the self-attention layer to the input of the dilated causal convolutional layer using residual connections, generating a query matrix Q, a key matrix K, and a value matrix V through linear transformation, calculating association weights based on a scaled dot product attention mechanism, and performing a weighted summation of the value matrix according to the association weights to obtain a key temporal feature representation that integrates global association information. This key temporal feature representation enhances the sensitivity to abnormal nodes and core trend segments in the time series.
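A scaled dot-product self-attention step of the kind described above can be written without any framework. The sketch below is illustrative only: the helper names and the tiny identity and zero projection matrices are hypothetical, and the residual addition to the convolution input is left out so that the attention weights themselves are easy to inspect.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Return (attended, weights) for x of shape (time steps, features)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))                    # sqrt of the key dimension
    k_t = [list(col) for col in zip(*k)]            # K transposed
    scores = matmul(q, k_t)                         # Q K^T, one row per query step
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights              # weighted sum of the values


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]            # 3 time steps, 2 features
attended, weights = self_attention(x, zeros, identity, identity)
print(weights[0], attended[0])
```

With a zero query projection every score is equal, so each time step attends uniformly to all steps; learned projections would instead concentrate the weights on the steps most related to each query.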

[0012] Furthermore, obtaining the high-dimensional temporal feature representation includes inputting several intrinsic mode function components as multi-channel input data into a multi-layer temporal convolutional network, and stacking all intrinsic mode function components along the time dimension to form an input feature tensor; Temporal features are extracted layer by layer through the residual blocks of the multi-layer temporal convolutional network. Each residual block includes two dilated causal convolutional layers. After each dilated causal convolutional layer, a normalization layer, a spatial dropout layer, and a ReLU activation layer are connected in sequence. The output of the ReLU activation layer is connected to the self-attention layer. In the residual block, when the input and output dimensions do not match, the input features are linearly projected to achieve dimension matching. Then, the output of the self-attention layer is added element-wise to the dimension-matched input features. The residual blocks are stacked layer by layer until the last layer, and the high-dimensional temporal feature representation is output.

[0013] Furthermore, the output of the future multi-step effective wave height prediction value includes constructing a bidirectional long short-term memory network composed of a forward long short-term memory sub-network and a backward long short-term memory sub-network, and simultaneously inputting the high-dimensional temporal feature representation into the forward and backward long short-term memory sub-networks; The forward long short-term memory subnetwork traverses the high-dimensional temporal feature representation from left to right, captures the forward historical temporal dependency features of ocean wave data, and outputs the forward hidden state sequence. The backward long short-term memory subnetwork traverses the high-dimensional temporal feature representation from right to left, captures the backward future trend dependency features of ocean wave data, and outputs the backward hidden state sequence. The forward and backward hidden states at the corresponding time steps are concatenated and fused to obtain a fused bidirectional temporally dependent feature sequence, which is then input into a fully connected layer. Through linear transformation and activation mapping, the effective wave height prediction values for future multiple steps are obtained.

[0014] In a second aspect, a prediction system for optimal decomposition of small sample ocean wave data includes:
a data acquisition module configured to acquire hourly small sample wave feature data of the target sea area, and obtain standardized time series data through missing value processing, outlier removal, and standardization operations;
an optimization module configured to adaptively optimize the number of decomposition layers and the penalty factor of the variational mode decomposition algorithm based on the improved energy valley optimization algorithm to obtain the optimal decomposition parameters;
a transformation module configured to perform variational mode decomposition on the standardized time series data according to the optimal decomposition parameters to obtain several intrinsic mode function components;
a model module configured to construct a multi-layer temporal convolutional network that matches the optimal number of decomposition layers in the optimal decomposition parameters, embed a self-attention mechanism in each layer of the temporal convolutional network, calculate the correlation weights between time steps in the temporal sequence, and extract key temporal features;
a feature extraction module configured to input the intrinsic mode function components into the multi-layer temporal convolutional network embedded with the self-attention mechanism to obtain a high-dimensional temporal feature representation;
an output module configured to use a bidirectional long short-term memory network to perform bidirectional temporal dependency modeling on the high-dimensional temporal feature representation, capture forward historical dependency features and backward trend features, and output the predicted significant wave height for multiple future steps.

[0015] In summary, the present invention has the following beneficial technical effects: 1. This invention generates the initial population for the energy valley optimization algorithm using an initialization strategy that combines a Halton sequence with Latin hypercube sampling. It also uses a fitness function based on energy entropy, a decomposition-layer penalty term, and a low-energy mode penalty term to adaptively optimize the number of decomposition layers and the penalty factor of variational mode decomposition. This achieves a globally optimal configuration of decomposition parameters for small sample ocean wave data and overcomes the shortcomings of traditional optimization algorithms, such as uneven initial population distribution, insufficient population diversity, and premature convergence. It significantly improves the decomposition quality and the convergence speed of the algorithm, and avoids the computational redundancy and overfitting risk caused by blindly increasing the number of decomposition layers.

[0016] 2. This invention establishes a constrained variational model for variational mode decomposition, introduces Lagrange multipliers and a quadratic penalty factor to construct an augmented Lagrange model, and uses the alternating direction multiplier method for iterative solution to achieve optimal decomposition of small sample wave data. It obtains intrinsic mode function components with low mode aliasing, clear frequency band division, and high stability, effectively capturing the inherent modal characteristics of the signal, transforming the quality of data decomposition into quantifiable numerical indicators, realizing automatic optimization of decomposition parameters, and providing high-quality feature input for prediction.

[0017] 3. This invention constructs a multi-layer temporal convolutional network that matches the optimal number of decomposition layers, configures each layer with an exponentially increasing dilation rate, and uses dilated causal convolution operations to exponentially expand the receptive field without increasing the convolution kernel parameters. This achieves efficient capture of long-distance temporal dependencies, effectively avoids information blind spots, and alleviates the gradient vanishing problem of deep networks through residual connection mechanisms, significantly improving the training stability and feature extraction capability of the model.

[0018] 4. This invention embeds a self-attention mechanism in each layer of a multi-layer temporal convolutional network and uses scaled dot product attention to calculate the correlation weights between time steps in the temporal sequence. This enables collaborative modeling of local temporal features and global dependencies, enhances the sensitivity to key information such as data anomalies and core trend segments, effectively filters redundant noise, generates more discriminative high-level feature representations, and significantly improves the model's generalization ability and robustness to complex temporal patterns.

[0019] 5. This invention achieves high-precision multi-step prediction of ocean wave data under conditions of small sample size, a single variable, and a lack of auxiliary information by deeply integrating and coordinating the improved energy valley optimization algorithm, variational mode decomposition, the temporal convolutional network, and the bidirectional long short-term memory network. It effectively solves the technical problems of existing technologies in limited-sample scenarios, namely low efficiency in configuring prediction models, rapid decay of long-term prediction accuracy, weak capture of detail, and insufficient prediction of extreme values, and provides reliable decision support for marine disaster early warning, fishery production, and marine engineering safety.

Attached Figure Description

[0020] Figure 1 is a schematic diagram of the overall process of a prediction method for optimal decomposition of small sample data of ocean waves according to an embodiment of the present invention.

[0021] Figure 2 is an overall model structure diagram of a prediction method for optimal decomposition of small sample ocean wave data according to an embodiment of the present invention.

[0022] Figure 3 is a schematic diagram of the residual block in the overall model structure of a prediction method for optimal decomposition of small sample ocean wave data according to an embodiment of the present invention.

[0023] Figure 4 is a schematic diagram of the structure of a bidirectional long short-term memory network according to an embodiment of the present invention.

[0024] Figure 5 shows a comparison of kernel densities for different population initialization strategies in this invention, where (a) shows a population initialized randomly, (b) shows a population initialized by improved chaotic mapping, (c) shows a population initialized by Latin hypercube sampling, and (d) shows a population initialized by the method described herein.

[0025] Figure 6 shows the spectrum distribution of the components obtained with the optimal number of decomposition layers in the variational mode decomposition according to an embodiment of the present invention, where (a) to (h) show the spectrum distributions of IMF1 to IMF8 respectively, and (i) shows the spectrum distribution of the residual component res.

[0026] Figure 7 shows comparison diagrams of wave prediction based on traditional hybrid models in embodiments of the present invention, where (a) shows the LSTM-GRU model and (b) shows the CNN-LSTM model.

[0027] Figure 8 shows the wave prediction results based on the TCN series networks in an embodiment of the present invention, where (a) shows the TCN-Attention model and (b) shows the TCN-LSTM model.

[0028] Figure 9 shows a comparison of wave prediction based on the TCN series network and the method described herein in the embodiments of the present invention, where (a) shows the TCN-BiLSTM-Attention model and (b) shows the model of this embodiment.

Detailed Implementation

[0029] The present invention will be further described in detail below with reference to the accompanying drawings.

[0030] Example 1

Referring to Figure 1, this embodiment of a prediction method for optimal decomposition of small sample ocean wave data includes:
obtaining hourly small sample wave characteristic data of the target sea area and standardizing it to obtain standardized time series data;
adaptively optimizing the number of decomposition layers and the penalty factor of the variational mode decomposition algorithm based on the improved energy valley optimization algorithm to obtain the optimal decomposition parameters;
performing variational mode decomposition on the standardized time series data according to the optimal decomposition parameters to obtain several intrinsic mode function components;
constructing a multi-layer temporal convolutional network that matches the optimal number of decomposition layers in the optimal decomposition parameters, and embedding a self-attention mechanism in each layer of the temporal convolutional network to calculate the correlation weights between time steps in the temporal sequence and extract key temporal features;
inputting the intrinsic mode function components into the multi-layer temporal convolutional network embedded with the self-attention mechanism to obtain a high-dimensional temporal feature representation;
using a bidirectional long short-term memory network to perform bidirectional temporal dependency modeling on the high-dimensional temporal feature representation, capturing forward historical dependency features and backward trend features, and outputting the predicted significant wave height for multiple future steps.

[0031] Specifically, the prediction method for optimal decomposition of small sample ocean wave data described in this embodiment is deployed in a marine environmental monitoring and forecasting system. The system includes wave buoys or bottom-mounted acoustic wave measurement devices deployed in the target sea area, as well as shore-based data processing servers and edge computing devices. The specific steps are as follows: S1. Continuous observation of the target sea area is carried out using a marine observation instrument equipped with a high-precision wave height sensor, collecting hourly wave characteristic data. The core data collection indicator is significant wave height. The timestamp of each collection and auxiliary information such as the latitude and longitude of the observation station are recorded simultaneously. In this embodiment, the hourly significant wave height data has a total sample size of 711 data points. To verify the model's predictive performance, the first 615 sets of data are used as the training set for model parameter learning; the last 96 sets of data are used as the test set to verify the multi-step prediction effect for the next 96 hours (4 days). The collection frequency of significant wave height data is set to once per hour in this embodiment to ensure the temporal continuity of the data. Finally, the min-max normalization method is used to map the data to the [0,1] interval to obtain standardized time-series data.
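The preprocessing in step S1 amounts to a min-max scaling followed by a chronological split. The sketch below reproduces the 615/96 split on a synthetic stand-in series; the helper names and the synthetic values are hypothetical, since the actual buoy data is not part of the text.

```python
def min_max_normalize(values):
    """Map values to [0, 1]; also return the bounds needed to undo the scaling."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(scaled, lo, hi):
    """Map normalized predictions back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


def split_series(values, n_test=96):
    """Chronological split: everything before the last n_test points is training data."""
    return values[:-n_test], values[-n_test:]


# Synthetic placeholder for 711 hourly significant wave heights (metres)
series = [0.5 + 0.01 * (i % 50) for i in range(711)]
scaled, lo, hi = min_max_normalize(series)
train, test = split_series(scaled)
print(len(train), len(test))
```

Keeping `lo` and `hi` matters in practice because the network predicts in the normalized range, and the forecasts have to be mapped back to metres before they are compared with observations.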

[0032] S2. Based on the improved energy valley optimization (EVO) algorithm, the number of decomposition layers and the penalty factor of the variational mode decomposition (VMD) algorithm are adaptively optimized to obtain the optimal decomposition parameters. First, the random initialization of the population is improved by combining a Halton sequence with Latin hypercube sampling (LHS), which enhances the uniformity, ergodicity and low discrepancy of the population in the solution space and reduces clustering in the initial population. A Halton sequence is introduced to generate a low-discrepancy point set; a uniformly distributed point set is obtained by using a different prime base in each dimension. In this optimization the population has two dimensions, where d=1 corresponds to the number of decomposition layers $K$ and d=2 corresponds to the penalty factor $\alpha$. A different prime base $b_d$ is configured for each dimension: when d=1, the prime number 2 is chosen as the base; when d=2, the prime number 3 is chosen as the base. Configuring different prime bases ensures the independence and low correlation of the sequences in each dimension. The low-discrepancy set is generated according to the following Halton sequence formula: $$h_{j,d} = \sum_{k=0}^{L} a_k(j)\, b_d^{-(k+1)}, \qquad a_k(j) = \left\lfloor \frac{j}{b_d^{\,k}} \right\rfloor \bmod b_d,$$ where $j$ is the population index, with values ranging from 1 to 30 (consistent with the population size), used to distinguish individual particles; $d$ is the population dimension index, with values ranging from 1 to 2, corresponding to the two optimization parameters; $b_d$ is the prime base of dimension $d$, with the bases taken in ascending prime order; $L$ is the largest integer satisfying $b_d^{L} \le j$, which sets the termination condition of the summation and ensures that the value computed for each particle in the corresponding dimension retains the characteristics of a low-discrepancy sequence; $a_k(j)$ is the k-th digit of $j$ in base $b_d$, obtained by decomposing $j$ in that base, which is the core component of the Halton sequence value and takes values in $\{0, 1, \dots, b_d - 1\}$; $\bmod$ is the modulo operation; and $h_{j,d}$ is the original sequence value of the j-th particle in the d-th dimension.
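
As a hedged illustration of the Halton construction described above, the sketch below computes the base-b digit expansion of the particle index and mirrors it about the radix point. The function names are hypothetical; the bases 2 and 3 and the population size 30 come from the text.

```python
def halton_value(j, base):
    """Radical inverse of the integer j >= 1 in the given prime base."""
    value, scale = 0.0, 1.0 / base
    while j > 0:
        j, digit = divmod(j, base)   # peel off the lowest base-b digit
        value += digit * scale       # place it after the radix point
        scale /= base
    return value

def halton_points(n=30, bases=(2, 3)):
    """n points in the unit square, one prime base per dimension."""
    return [[halton_value(j, b) for b in bases] for j in range(1, n + 1)]

points = halton_points()
```

With base 2 the first values are 0.5, 0.25, 0.75 and with base 3 they are 1/3, 2/3, 1/9, which shows how successive points fill the gaps left by earlier ones.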

[0033] To enhance randomness while maintaining low discrepancy, a Gaussian random perturbation is added, giving the perturbed low-discrepancy sequence $\tilde{h}_{j,d} = h_{j,d} + \varepsilon_{j,d}$, where $h_{j,d}$ is the original sequence value of the j-th particle in the d-th dimension and $\varepsilon_{j,d}$ are independent and identically distributed Gaussian random variables, following the... The perturbed low-discrepancy sequence is then mapped to the solution space of the optimization variables. The initial particle positions are generated using the following formula: $$x_{j,d} = lb_d + \tilde{h}_{j,d}\,(ub_d - lb_d),$$ where $lb_d$ is the lower bound of the d-th dimension variable and $ub_d$ is the upper bound of the d-th dimension variable. This mapping converts the values of the low-discrepancy sequence into initial particle positions that conform to the physical meaning of the optimization variables.
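
The perturbation and mapping step might look like the sketch below. Several details are assumptions made only for illustration: the noise standard deviation of 0.01, the clipping back to [0, 1], and the search bounds (K in [2, 12], penalty factor in [100, 5000]) are not given in the source.

```python
import numpy as np

def perturb_and_map(h, lower, upper, sigma=0.01, seed=0):
    """Add Gaussian jitter to unit-cube points and scale them to the bounds.

    sigma and the clipping to [0, 1] are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    h = np.asarray(h, dtype=float)
    noisy = np.clip(h + rng.normal(0.0, sigma, size=h.shape), 0.0, 1.0)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return lower + noisy * (upper - lower)

# First three Halton points for bases 2 and 3.
h = np.array([[0.5, 1 / 3], [0.25, 2 / 3], [0.75, 1 / 9]])
lower, upper = [2.0, 100.0], [12.0, 5000.0]   # hypothetical bounds for K and the penalty factor
positions = perturb_and_map(h, lower, upper)
```

Because the number of decomposition layers must be an integer when VMD is actually run, an implementation would presumably round the first coordinate at evaluation time; the source does not say how this is handled.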

[0034] Next, LHS stratification correction is performed. For each dimension, the value range is divided into several intervals and the number of points in each interval is checked. The calculation formula is as follows: …, where N=30 is the total population size and $w_d$ is the normalized weight of dimension d, calculated as the ratio of the solution space range of this dimension to the maximum solution space range over all dimensions, $w_d = (ub_d - lb_d) / \max_{d'} (ub_{d'} - lb_{d'})$, with $lb_d$ and $ub_d$ the lower and upper bounds of the d-th dimension variable; it is used to balance the interval partitioning density of different dimensions. The number of interval divisions takes …, which ensures that each dimension is divided into at least 4 intervals. The solution space of the dimension is divided uniformly into that number of equal intervals. Let $n_{d,k}$ be the number of particles in the k-th interval of the d-th dimension. Counting the particles in each interval shows whether the population distribution is uniform. If there exists an interval with $n_{d,k} = 0$ (no particles, so an information blind spot exists) or an interval satisfying … (excessive particle aggregation and insufficient diversity), a sample from the interval with the most particles is moved to the target interval (an interval with no particles or too few particles). The position of the moved particle is calculated using the following formula: …, where the quantities involved are the index of the sample to be moved, the center of the target interval, a random variable that follows a continuous uniform distribution on the interval [0,1], and the index of the interval with the most samples. Introducing this random perturbation makes the position of the moved particle more random. Multi-stage optimization then gradually improves the population quality through three stages: large-amplitude perturbation, medium-amplitude perturbation, and small-amplitude perturbation.
Multi-stage perturbation strengthens the population's maximin property (the minimum distance between any two individuals is made as large as possible), improving uniformity. If the minimum distance of the population increases after a perturbation, the new position is retained; otherwise, another individual is moved. The multi-stage optimization formula is: …, where the direction vector is a unit vector obtained by normalizing a vector $v$ that follows a d-dimensional standard normal distribution $\mathcal{N}(0, I_d)$; $I_d$ is the d-dimensional identity matrix; the maximum disturbance amplitude bounds the step; and a power of the uniformly distributed random variable U(0,1) implements perturbations of different magnitudes. When U(0,1) takes a small value it corresponds to a small perturbation, and when it takes a large value it corresponds to a large perturbation. Through the three-stage iterative perturbation process, the uniformity of the population is gradually improved, and finally an initial population with uniform distribution and sufficient diversity is obtained.
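
The interval check and sample relocation can be illustrated for a single dimension as follows. This is a sketch under stated assumptions, not the patented formula: it only repairs empty intervals (the over-crowding threshold is not legible in the source), and the new position is assumed to be the target interval's center plus a uniform offset shrunk by a factor of 0.9 to stay clear of the interval edges. All names are hypothetical.

```python
import numpy as np

def interval_counts(x, lower, upper, n_bins=4):
    """Number of samples falling in each of n_bins equal-width intervals."""
    width = (upper - lower) / n_bins
    bins = np.minimum(((np.asarray(x) - lower) / width).astype(int), n_bins - 1)
    return bins, np.bincount(bins, minlength=n_bins)

def fill_empty_intervals(x, lower, upper, n_bins=4, seed=0):
    """Move samples from the fullest interval into empty ones (one dimension)."""
    rng = np.random.default_rng(seed)
    x = np.array(x, dtype=float)
    width = (upper - lower) / n_bins
    for _ in range(len(x)):                      # bounded number of repairs
        bins, counts = interval_counts(x, lower, upper, n_bins)
        empty = np.flatnonzero(counts == 0)
        if empty.size == 0:
            break
        fullest = int(np.argmax(counts))
        mover = int(np.flatnonzero(bins == fullest)[0])
        center = lower + (empty[0] + 0.5) * width
        # Assumed relocation rule: interval center plus a small random offset.
        x[mover] = center + (rng.random() - 0.5) * 0.9 * width
    return x

clustered = [0.05, 0.10, 0.12, 0.15, 0.20, 0.60, 0.90, 0.95]
repaired = fill_empty_intervals(clustered, 0.0, 1.0)
_, counts_after = interval_counts(repaired, 0.0, 1.0)
```

In the example, the second interval [0.25, 0.5) starts empty while the first holds five samples, so one sample is moved across and every interval ends up occupied.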

[0035] Then, based on the initial population, the particle update mechanism of the EVO algorithm is improved through neighbor particle optimization, normal distribution perturbation, and an elite preservation strategy. This addresses the unstable local search, the tendency to become trapped in local optima, and the easy loss of the optimal solution in the traditional update, achieving a balance between local search and global exploration and improving convergence speed and optimization accuracy. The specific update rules are as follows: instead of the traditional update toward a single nearest neighbor, the update uses the average position of three or more nearest neighbor particles. The neighbors of each particle are sorted by Euclidean distance, and m ≥ 4 nearest neighbor particles are selected, where m is randomly selected from the uniform distribution U{1,n}. The average position of the nearest neighbor particles is calculated using the following formula: $$\bar{x}_i = \frac{1}{m} \sum_{j=1}^{m} x_{\pi_i(j)},$$ where $\pi_i$ is the sequence of neighbor indices sorted by Euclidean distance, satisfying $\|x_i - x_{\pi_i(1)}\| \le \|x_i - x_{\pi_i(2)}\| \le \cdots$, and $x_{\pi_i(j)}$ is the position vector of the j-th nearest neighbor particle after sorting.
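
A minimal sketch of the neighbor-averaging rule is given below. The function name is hypothetical, and excluding the particle itself from its own neighbor list is an assumption; the source does not state this explicitly.

```python
import numpy as np

def neighbor_mean(positions, i, m):
    """Average position of the m particles closest to particle i."""
    positions = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(positions - positions[i], axis=1)
    dist[i] = np.inf                      # assumed: a particle is not its own neighbor
    nearest = np.argsort(dist)[:m]        # indices sorted by Euclidean distance
    return positions[nearest].mean(axis=0)

swarm = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [1.0, 1.0]]
center = neighbor_mean(swarm, i=0, m=3)
```

Averaging over several neighbors makes the reference point less sensitive to a single outlying particle, which is the stability benefit the text attributes to this change.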

[0036] Then, a normal distribution is used instead of a uniform distribution to generate the perturbation. With the average position of multiple nearest-neighbor particles as the update reference, a perturbation vector is generated whose components follow the normal distribution $N(0, 0.1^2)$, that is, a distribution with a mean of 0 and a standard deviation of 0.1. The positions of new candidate particles are generated according to the following formula: …, in which $x_i$ is the current position vector of the i-th particle and the random perturbation vector is drawn as described above. Because normally distributed perturbations are concentrated around the mean, the particle performs a more refined search around its current position.

[0037] Next, an elite retention strategy is used to update the population so that the optimal solution is not lost. The specific process is as follows: the new particles generated in this iteration are merged with the old particles from the previous iteration to form a temporary population of size n+m=60 (n=m=30); the fitness value of each particle in the temporary population is calculated and the particles are sorted in ascending order of fitness value (lower fitness values are better); the top 30 particles after sorting are selected as the population for the next iteration, ensuring that the population quality does not decrease because of random perturbations. At the same time, retaining some high-quality old particles prevents the optimal solution from being lost through an error in a single update, which accelerates the algorithm's convergence. The formulas for merging and filtering are as follows: $$X_{\text{merge}} = \begin{bmatrix} X_{\text{old}}^{t} \\ X_{\text{new}}^{t} \end{bmatrix}, \quad F_{\text{merge}} = \begin{bmatrix} F_{\text{old}}^{t} \\ F_{\text{new}}^{t} \end{bmatrix}, \quad I = \operatorname{argsort}(F_{\text{merge}}), \quad X^{t+1} = X_{\text{merge}}(I_{1:n}), \quad F^{t+1} = F_{\text{merge}}(I_{1:n}),$$ where $X_{\text{old}}^{t}$ is the old particle position matrix after the t-th iteration; $F_{\text{old}}^{t}$ is the fitness value vector corresponding to the old particles; $X_{\text{new}}^{t}$ is the newly generated particle position matrix; $F_{\text{new}}^{t}$ is the fitness value vector corresponding to the new particles; $X_{\text{merge}}$ is the position matrix after the old and new particles are merged; $F_{\text{merge}}$ is the fitness value vector after the old and new particles are merged; $I$ is the index vector obtained by sorting $F_{\text{merge}}$ in ascending order; and $X^{t+1}$ and $F^{t+1}$ are the population position matrix and fitness value vector for the (t+1)-th iteration, obtained by selecting the particles corresponding to the first n indices of $I$.
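
The merge-sort-truncate step of the elite retention strategy can be sketched as below; the function name and the toy fitness values are hypothetical.

```python
import numpy as np

def elite_select(old_pos, old_fit, new_pos, new_fit, n_keep=None):
    """Merge old and new particles and keep the n_keep with the lowest fitness."""
    merged_pos = np.vstack([old_pos, new_pos])
    merged_fit = np.concatenate([old_fit, new_fit])
    if n_keep is None:
        n_keep = len(old_pos)                       # population size stays constant
    order = np.argsort(merged_fit, kind="stable")[:n_keep]
    return merged_pos[order], merged_fit[order]

old_pos = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
old_fit = np.array([0.9, 0.2, 0.7])
new_pos = np.array([[4.0, 4.0], [5.0, 5.0], [6.0, 6.0]])
new_fit = np.array([0.1, 0.8, 0.3])
next_pos, next_fit = elite_select(old_pos, old_fit, new_pos, new_fit)
```

Because the best old particle (fitness 0.2) survives alongside the two best new ones, the best fitness in the population can never get worse from one iteration to the next.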

[0038] Finally, a fitness function is constructed to transform the VMD decomposition effect into a quantifiable numerical index, which serves as an evaluation standard for particle optimization. The optimal solution is gradually approached through iterative particle updates.

[0039] A fitness function is constructed based on energy entropy to quantify the VMD decomposition effect and provide an evaluation standard for particle optimization. First, the standardized time series data obtained in step S1 is decomposed by VMD using the parameter combination $(K, \alpha)$ corresponding to the current particle, yielding K intrinsic mode function (IMF) components. Each IMF component corresponds to the fluctuation characteristics of a different frequency band in the original data. The energy of each IMF component is then calculated as $E_j = \sum_{i=1}^{N} |u_j(i)|^2$, where N is the length of the time series data and $u_j(i)$ is the magnitude of the j-th IMF component at time i. The total energy of all IMF components is $E = \sum_{j=1}^{K} E_j$, and the energy percentage of each IMF component is $p_j = E_j / E$. The energy percentage characterizes the importance of each modal component in the original data; the energy entropy $H = -\sum_{j=1}^{K} p_j \ln p_j$ quantifies the quality of modal separation. The smaller the energy entropy, the more concentrated the energy distribution of the IMF components, the more thorough the modal separation, and the higher the purity. Conversely, a larger energy entropy indicates the presence of modal aliasing or redundancy.
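
The energy entropy computation can be sketched as follows. The natural logarithm is an assumption (the source does not show the base), and the function name is hypothetical; the penalty terms of the full fitness function are not included because their exact form is not recoverable from the text.

```python
import numpy as np

def energy_entropy(imfs):
    """Energy share of each IMF (rows) and the entropy of those shares."""
    imfs = np.asarray(imfs, dtype=float)
    energy = np.sum(imfs ** 2, axis=1)        # E_j: sum of squared samples
    share = energy / energy.sum()             # p_j = E_j / E
    nonzero = share[share > 0]                # skip empty modes to avoid log(0)
    entropy = float(-np.sum(nonzero * np.log(nonzero)))
    return entropy, share

t = np.arange(200)
balanced = [np.sin(0.3 * t), np.sin(0.3 * t)]          # equal energy in both modes
dominant = [np.sin(0.3 * t), 0.01 * np.sin(0.3 * t)]   # one mode carries nearly everything
h_balanced, _ = energy_entropy(balanced)
h_dominant, shares = energy_entropy(dominant)
```

Two modes with equal energy give the maximum entropy ln 2, while a decomposition in which one mode dominates gives a value close to zero, matching the interpretation in the text.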

[0040] To balance decomposition effectiveness and computational efficiency, a decomposition layer number penalty term and a low-energy mode penalty term are added to the energy entropy. The final fitness function is as follows: …, in which the quantities involved are: the energy entropy of each decomposition; the number of decomposition layers $K$; the decomposition layer number penalty coefficient; the preset low energy threshold (0.1%); the low energy penalty coefficient; the energy of the j-th intrinsic mode function component; and the allowable deviation range. The fitness function transforms the quality of the VMD decomposition into a quantifiable numerical indicator. Following the principle of minimizing the fitness function value, the optimal decomposition is reached with the fastest speed and the fewest layers. The 0.1% threshold is chosen because, when the energy percentage of a single IMF component is below 0.1%, that component typically represents noise or a spurious mode. The smaller the fitness value, the better the corresponding $(K, \alpha)$ parameter combination, the higher the modal separation quality, and the lower the computational cost.

[0041] S3. Variational mode decomposition is performed on the standardized time series data according to the optimal decomposition parameters to obtain several intrinsic mode function components. The standardized time series data is used as the input signal $f(t)$ and fed into the variational mode decomposition model. First, a constrained variational model is established, which adaptively decomposes the input signal $f(t)$ into $K$ intrinsic mode functions $u_k(t)$ with finite bandwidth. Each intrinsic mode function takes the form of an amplitude-modulated frequency-modulated signal, $u_k(t) = A_k(t)\cos(\phi_k(t))$, where $A_k(t)$ is the instantaneous amplitude envelope and $\phi_k(t)$ is the instantaneous phase. The center frequency and bandwidth of each intrinsic mode function are determined by solving this constrained variational model, which minimizes the sum of the estimated bandwidths of all intrinsic mode functions while requiring the sum of the intrinsic mode functions to be strictly equal to the input signal. The mathematical expression of the constrained variational model is: $$\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2,$$ subject to the constraint $$\sum_{k=1}^{K} u_k(t) = f(t),$$ where $K$ is the optimal number of decomposition layers; $u_k(t)$ is the k-th intrinsic mode function; $\omega_k$ is the center frequency of the k-th intrinsic mode function; $f(t)$ is the input signal; $\delta(t)$ is the unit impulse function; $*$ denotes convolution; $\partial_t$ is the partial derivative with respect to time; $e^{-j\omega_k t}$ is the complex exponential modulation term, with $j$ the imaginary unit, which shifts the spectrum of the k-th modal component from its center frequency to baseband so that the bandwidth of every mode is measured against a unified benchmark; and $\|\cdot\|_2^2$ is the square of the L2 norm.

[0042] Since the constrained variational model is difficult to solve directly, the Lagrange multiplier $\lambda(t)$ and the quadratic penalty factor $\alpha$ (i.e., the optimal penalty factor obtained in step S2) are introduced to transform it into an unconstrained augmented Lagrangian model. The optimal solution is then obtained iteratively using the alternating direction method of multipliers (ADMM). The expression of the augmented Lagrangian model is as follows: $$\mathcal{L}(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle,$$ where $\mathcal{L}$ is the augmented Lagrangian function, the core objective function of the unconstrained optimization; $\alpha$ is the quadratic penalty factor; $\lambda(t)$ is the Lagrange multiplier; $\| f(t) - \sum_k u_k(t) \|_2^2$ is the squared L2 norm of the constraint deviation term, which quantifies the error between the decomposition result and the input signal; and $\langle \cdot, \cdot \rangle$ is the inner product operation through which the Lagrange multiplier incorporates the constraint into the objective function, guiding the iteration to converge in the direction that satisfies the constraint.

[0043] The iterative solution process is then executed, with the following specific steps. Step 1: initialize the parameters by setting the initial values of the intrinsic mode functions $\hat{u}_k^{1}$, the initial values of the center frequencies $\omega_k^{1}$, and the initial value of the Lagrange multiplier $\hat{\lambda}^{1}$; set the iteration counter n=0; set the stopping conditions to a relative tolerance and an absolute tolerance; and set the maximum number of iterations to 1000. Step 2: update the intrinsic mode functions. With $\omega_k$ and $\lambda$ fixed, the augmented Lagrangian function is minimized with respect to $u_k$. The update formula is obtained by transforming the time-domain problem into the frequency domain using the Fourier transform: both sides of the augmented Lagrangian function are Fourier transformed and the result is rearranged to obtain the frequency-domain expression of $\hat{u}_k^{n+1}(\omega)$, which is then converted back to the time domain using the inverse Fourier transform, completing the update of $u_k$. Step 3: update the center frequencies. Based on the updated intrinsic mode function $\hat{u}_k^{n+1}$, the new center frequency is obtained as the average instantaneous frequency of its analytic signal, using the following formula: $$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega\, |\hat{u}_k^{n+1}(\omega)|^2\, d\omega}{\int_0^{\infty} |\hat{u}_k^{n+1}(\omega)|^2\, d\omega},$$ where $\hat{u}_k^{n+1}(\omega)$ is the Fourier transform of $u_k^{n+1}(t)$; this ensures that the center frequency always corresponds to the main frequency band of the modal component. Step 4: update the Lagrange multiplier by gradient ascent, using the formula $$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right),$$ where $\tau$ is the update step size, set to 0.01. This update strengthens the penalty on the constraint and accelerates convergence. Step 5: determine whether the iteration has converged by calculating two convergence criteria, the relative error index and the absolute error index.
If both indices are below the relative tolerance and the absolute tolerance respectively, or if the iteration count reaches the upper limit, the iteration stops; otherwise, set n=n+1 and return to Step 2 to repeat the update process.
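
Two of the per-iteration updates can be illustrated in a few lines. This is a sketch, not the applicant's solver: the mode update shown is the standard Wiener-filter form from the VMD literature, which the source text describes only in words, and frequencies are expressed in cycles per sample. Function names are hypothetical.

```python
import numpy as np

def update_mode_hat(f_hat, others_hat, lam_hat, freqs, omega_k, alpha):
    """Frequency-domain mode update (standard VMD form, assumed here)."""
    return (f_hat - others_hat + lam_hat / 2.0) / (1.0 + 2.0 * alpha * (freqs - omega_k) ** 2)

def center_frequency(mode):
    """Power-weighted mean frequency over the non-negative half of the spectrum."""
    spectrum = np.fft.rfft(mode)
    freqs = np.fft.rfftfreq(len(mode))        # 0 .. 0.5 cycles per sample
    power = np.abs(spectrum) ** 2
    return float(np.sum(freqs * power) / np.sum(power))

# A pure tone at 0.1 cycles per sample has its spectral centroid at 0.1.
n = 400
tone = np.cos(2 * np.pi * 0.1 * np.arange(n))
omega = center_frequency(tone)

# The filter passes the bin at omega_k and halves bins 0.1 away when alpha = 50.
gains = update_mode_hat(np.ones(3), np.zeros(3), np.zeros(3),
                        np.array([0.0, 0.1, 0.2]), omega_k=0.1, alpha=50.0)
```

The example shows why the penalty factor matters: a larger value narrows the pass band around each center frequency, so each mode becomes more band-limited.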

[0044] After the iteration stops, the final updated intrinsic mode functions $u_k(t)$ are output. These are the intrinsic mode function components obtained by applying the optimal variational mode decomposition to the standardized time series data.

[0045] S4. Construct a multi-layer temporal convolutional network that matches the optimal number of decomposition layers in the optimal decomposition parameters, and embed a self-attention mechanism in each layer of the temporal convolutional network to calculate the correlation weights between the time steps of the sequence and extract key temporal features. Based on the optimal number of decomposition layers $K$ obtained above, the depth of the temporal convolutional network is configured to $K$ layers, so that the network structure is adapted to the inherent modal characteristics of the data and neither redundant layers nor insufficient feature extraction occur. An exponentially increasing dilation rate is set for each layer, so that the receptive field of layer $l$ grows exponentially with network depth for a given kernel size, capturing long-distance temporal dependencies effectively without increasing the number of kernel parameters. With this dilation rate, dilated causal convolution is performed on the input sequence so that the output at the current time depends only on the inputs at the current and earlier times, avoiding interference from future information.
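
To make the receptive field growth concrete, the sketch below counts how many past time steps the top layer can see. It assumes a dilation rate of 2^(l-1) in layer l and two dilated convolutions per residual block; the base of 2 and the kernel size of 3 are illustrative assumptions, not values stated in the source.

```python
def receptive_field(n_blocks, kernel_size, convs_per_block=2, base=2):
    """Time steps visible to the last layer of a stack of dilated causal blocks."""
    field = 1
    for level in range(n_blocks):
        dilation = base ** level                       # assumed schedule: 1, 2, 4, ...
        field += convs_per_block * (kernel_size - 1) * dilation
    return field

# Eight blocks, matching the eight IMF components reported in the embodiment.
span = receptive_field(n_blocks=8, kernel_size=3)
```

Under these assumptions eight blocks already cover 1021 time steps, more than the 711-sample record, which illustrates how depth tied to the number of decomposition layers can give a receptive field spanning the whole series with few parameters.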

[0046] The output of the dilated causal convolution operation at sequence element $s$ is calculated according to the following formula: $$F(s) = (x *_d f)(s) = \sum_{i=0}^{k-1} f(i)\, x_{s - d \cdot i},$$ where $d$ is the dilation rate; $k$ is the filter size; $x$ is the input time series; $f$ is the one-dimensional convolution kernel of the temporal convolutional network; $s - d \cdot i$ is the position index into the input sequence; $i$ is the convolution kernel index; and $s$ is the current output position. The network has as many layers as the optimal number of decomposition layers. A self-attention mechanism is embedded after the dilated causal convolutional layers to construct residual blocks. Each residual block contains two dilated causal convolutional layers, each followed in sequence by a weight normalization layer, a spatial dropout layer, and a ReLU activation function. The output of the second ReLU activation function is fed into the self-attention layer. The self-attention layer generates the query matrix $\mathbf{Q}$, the key matrix $\mathbf{K}$ and the value matrix $\mathbf{V}$ through linear transformations, i.e. $\mathbf{Q} = X W_Q$, $\mathbf{K} = X W_K$, $\mathbf{V} = X W_V$, where $X$ is the input feature and $W_Q$, $W_K$, $W_V$ are learnable weight matrices. The correlation weights between time steps are calculated with scaled dot-product attention, and the output is: $$\operatorname{Attention}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \operatorname{softmax}\!\left( \frac{\mathbf{Q}\mathbf{K}^{\top}}{\sqrt{d_k}} \right) \mathbf{V},$$ where $\sqrt{d_k}$ is the scaling factor and the softmax function normalizes the attention scores into a probability distribution. By computing attention weights dynamically, this mechanism increases sensitivity to anomalous time steps and core trend segments. When the input and output dimensions of the residual block do not match, a 1×1 convolution performs a linear projection on the input features to match the dimensions. The output of the self-attention layer is then added element-wise to the dimension-matched input features to form a residual connection, i.e. $$H^{(l)} = \operatorname{SA}\!\left( \mathcal{F}\!\left( H^{(l-1)} \right) \right) + W_p\, H^{(l-1)},$$ where $H^{(l)}$ is the key temporal feature output by the l-th residual block; $\operatorname{SA}(\cdot)$ is the operation function of the self-attention mechanism; and $\mathcal{F}(\cdot)$ is the operation function of the temporal convolutional network.
The function $\mathcal{F}(\cdot)$ comprises the dilated causal convolution, weight normalization, spatial dropout, and ReLU activation operations. When the dimensions are the same, the projection $W_p$ is the identity mapping. This residual connection effectively alleviates the vanishing gradient problem in deep networks and improves training stability. The residual blocks are stacked layer by layer up to layer $K$, which finally outputs a key temporal feature representation integrating local temporal features and global dependencies. This representation captures long-distance non-local dependencies accurately and weights key information adaptively, providing a high-quality coarse-grained feature foundation for the subsequent bidirectional temporal modeling by the bidirectional long short-term memory network.
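
The two core operations of a residual block, dilated causal convolution and scaled dot-product self-attention, can be sketched in plain NumPy as below. This is a single-channel, untrained illustration with hypothetical names; weight normalization, dropout, ReLU and the 1×1 projection are omitted, and no causal mask is applied to the attention since the source does not mention one.

```python
import numpy as np

def dilated_causal_conv(x, kernel, dilation):
    """out[s] = sum_i kernel[i] * x[s - dilation * i], using only past inputs."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for s in range(len(x)):
        for i, w in enumerate(kernel):
            idx = s - dilation * i
            if idx >= 0:                     # positions before the start count as zero
                out[s] += w * x[idx]
    return out

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product attention over the time steps (rows) of X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)          # softmax per row
    return weights @ V, weights

conv_out = dilated_causal_conv([1, 2, 3, 4, 5], kernel=[1.0, 1.0], dilation=2)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))                            # 5 time steps, 4 features
Wq, Wk, Wv = (rng.standard_normal((4, 3)) for _ in range(3))
attended, attn_weights = self_attention(X, Wq, Wk, Wv)
```

With kernel [1, 1] and dilation 2, each output is the current input plus the input two steps earlier, so the example returns [1, 2, 4, 6, 8].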

[0047] S5. Input the intrinsic mode function components into the multi-layer temporal convolutional network with the self-attention mechanism to obtain a high-dimensional temporal feature representation. As shown in Figure 3, the intrinsic mode function components obtained in step S3 are fed as multi-channel input data into the multi-layer temporal convolutional network constructed in step S4. Specifically, all intrinsic mode function components are aligned and stacked along the time dimension to form a three-dimensional input feature tensor. This tensor first enters the first residual block, which contains two dilated causal convolutional layers. The first dilated causal convolutional layer convolves the input features using the first-layer dilation rate and a kernel of size $k$. The result passes through a weight normalization layer to stabilize the data distribution, a spatial dropout layer, and a ReLU activation function to introduce non-linearity. It then enters the second dilated causal convolutional layer, which extracts further features with the same dilation rate and is likewise followed by normalization, dropout, and ReLU activation. The output of the second ReLU activation function is fed into the self-attention layer, which computes the query matrix $\mathbf{Q}$, the key matrix $\mathbf{K}$ and the value matrix $\mathbf{V}$ and outputs attention-weighted features using scaled dot-product attention. The input and output dimensions are then compared. If they do not match, the projected input is obtained by applying a 1×1 convolution to the original input; if they match, the projected input is simply the original input. The output of the self-attention layer and the projected input are added element-wise to give the output of the first residual block. This output is used as the input to the second residual block and the above process is repeated, with the second layer using its own, larger dilation rate.
Likewise, the $l$-th layer uses the dilation rate assigned to layer $l$, so the receptive field expands layer by layer. The blocks are stacked up to layer $K$, and each layer maintains gradient flow through its residual connection. The network finally outputs a high-dimensional temporal feature representation that integrates the multi-scale local features of each intrinsic mode function component, long-distance global dependencies, and the weighted information of key time steps. It provides deep abstraction of complex temporal patterns and high sensitivity to anomalous fluctuations, and serves as the input for refined bidirectional temporal modeling by the subsequent bidirectional long short-term memory network.

[0048] S6. Use a bidirectional long short-term memory network to perform bidirectional temporal dependency modeling on the high-dimensional temporal feature representation, capture forward historical dependency features and backward trend features, and output the predicted significant wave height for multiple future steps.

[0049] As shown in Figure 4, the high-dimensional temporal feature representation obtained in step S5 is input into a bidirectional long short-term memory network for bidirectional temporal dependency modeling. This network consists of a forward long short-term memory sub-network and a backward long short-term memory sub-network connected in parallel. The two sub-networks have symmetrical structures but process the sequence in opposite directions. First, the high-dimensional temporal feature representation is input into both sub-networks simultaneously. The forward sub-network processes the feature vector of each time step in time order, from left to right. It selectively remembers and forgets information through the gating mechanisms of the input gate, forget gate, and output gate, extracting forward historical dependency features step by step and outputting the forward hidden state sequence $\{\overrightarrow{h}_t\}$, where $t$ is the time step. The computations are: the forget gate $f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$ controls how much historical information is retained; the input gate $i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$ and the candidate state $\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$ together determine what new information is stored; the cell state $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$ updates the memory content; and the output gate $o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$ and the final hidden state $h_t = o_t \odot \tanh(c_t)$ output the feature representation at the current time step. Here $W_f$, $W_i$, $W_c$, $W_o$ are the weight matrices of the forget gate, input gate, candidate state, and output gate, respectively, which apply a linear transformation to the input features and the hidden state from the previous time step; $b_f$, $b_i$, $b_c$, $b_o$ are the bias terms of the corresponding gates, which adjust the offset of the activation function; $f_t$ is the forget gate; $i_t$ is the input gate; $c_t$ is the cell state; and $\tilde{c}_t$ is the candidate state.

[0050] Meanwhile, the backward long short-term memory sub-network processes the same feature vectors in reverse time order, from right to left, extracts backward trend-dependent features step by step through the same gating mechanism, and outputs the backward hidden state sequence $\{\overleftarrow{h}_t\}$. Starting from the end of the sequence, this sub-network captures the influence of later context on the current moment. At each time step $t$, the forward hidden state $\overrightarrow{h}_t$ and the backward hidden state $\overleftarrow{h}_t$ are concatenated to obtain the bidirectional fused feature $h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$, where $[\cdot\,;\cdot]$ denotes vector concatenation. The bidirectional fused feature contains both historical dependency information and future trend information. The bidirectional fused feature sequence is input into the fully connected layer, where it is linearly transformed by the weight matrix $W_y$ and bias term $b_y$ and then mapped by an activation function. The final output is the predicted significant wave height for the next 96 hours (i.e., 4 days). The prediction can use either a recursive multi-step strategy or a direct sequence-to-sequence strategy. Building on the coarse-grained key features provided by the temporal convolutional network, the bidirectional long short-term memory network further mines bidirectional short-term dynamics and fine-grained temporal fluctuations, achieving feature complementarity and significantly improving prediction accuracy at extreme points and abrupt wave height changes.
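
The gate equations and the bidirectional concatenation can be sketched as follows. This is an untrained NumPy illustration with hypothetical names and random weights; the fully connected output layer and training loop are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step using the forget, input, candidate and output gates."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])          # forget gate
    i = sigmoid(W["i"] @ z + b["i"])          # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate state
    o = sigmoid(W["o"] @ z + b["o"])          # output gate
    c = f * c_prev + i * c_tilde              # cell state update
    return o * np.tanh(c), c

def run_lstm(xs, W, b, hidden, reverse=False):
    """Hidden states for every time step, scanning forward or backward."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    out = np.zeros((len(xs), hidden))
    steps = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    for t in steps:
        h, c = lstm_step(xs[t], h, c, W, b)
        out[t] = h
    return out

def bilstm(xs, fwd, bwd, hidden):
    """Concatenate forward and backward hidden states at each time step."""
    return np.concatenate([run_lstm(xs, *fwd, hidden),
                           run_lstm(xs, *bwd, hidden, reverse=True)], axis=1)

def init_params(n_in, hidden, rng):
    W = {g: 0.1 * rng.standard_normal((hidden, hidden + n_in)) for g in "fico"}
    b = {g: np.zeros(hidden) for g in "fico"}
    return W, b

rng = np.random.default_rng(0)
params = init_params(n_in=4, hidden=3, rng=rng)
xs = rng.standard_normal((6, 4))              # 6 time steps, 4 features
fused = bilstm(xs, params, params, hidden=3)
```

Each row of the result holds the forward state (summarizing earlier steps) next to the backward state (summarizing later steps), which is the fused feature passed to the output layer.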

[0051] Example 2 This embodiment provides a comparative experiment for the prediction method for optimal decomposition of small sample ocean wave data. The embodiment achieves small-sample significant wave height prediction through an improved EVO algorithm, a fitness function for VMD decomposition, and a deep fusion of the TCN-Attention network and BiLSTM. Its core advantages are: 1. Combining the improved EVO algorithm with VMD achieves the optimal decomposition of small-sample historical wave height data and improves the model's ability to process the data at a fine level. 2. The TCN depth is configured adaptively from the optimal number of decomposition layers, a self-attention mechanism is embedded to focus on key features, and exponentially increasing dilation rates capture long-term temporal dependencies efficiently. 3. BiLSTM, as the final refinement module, further mines bidirectional short-term dynamics to achieve feature complementarity. This architecture combines adaptive decomposition, accurate feature extraction, and efficient long-range dependency capture. When processing complex time-series data with multiple periods, multiple modes, and noise, it can take macro trends, key events, and micro fluctuations into account at the same time, improving the accuracy and robustness of predictions. It provides a high-precision solution with strong generalization for small-sample wave height time-series prediction.

[0052] As shown in Figure 5, to avoid clustering of the initial population during the decomposition of significant wave height data, a method combining Halton sequences and Latin hypercube sampling (LHS) is used to optimize the initial population, and the particle update strategy is improved, thereby improving the performance of the energy valley optimization (EVO) algorithm. Compared with random initialization, improved chaotic mapping, and LHS population initialization, this method performs best in terms of uniformity, ergodicity, and low discrepancy in the solution space. Compared with the unimproved EVO algorithm, experimental results on four classic unconstrained optimization test functions, including the Sphere function and the Schwefel function, show that the improved algorithm achieves the lowest mean and standard deviation of the final fitness and significantly reduces the number of iterations needed to converge. The improved EVO algorithm therefore has a significantly stronger search capability. Through uniformly sampled population initialization and the optimized particle update strategy, it copes effectively with a variety of complex conditions, such as unimodal or multimodal, linear or nonlinear, and noise-free or noisy problems, demonstrating strong robustness and adaptability. Figure 5(a) shows the randomly initialized population, Figure 5(b) the population initialized by improved chaotic mapping, Figure 5(c) the population initialized by Latin hypercube sampling, and Figure 5(d) the population initialized using the method described herein.

[0053] To achieve optimal decomposition of significant wave height data, an improved energy valley optimization algorithm is proposed that uses the energy entropy of the components of each decomposition layer in variational mode decomposition (VMD) to construct the corresponding fitness function, thereby realizing decomposition and extraction of the wave height data at optimal center frequencies. Analysis of the frequency distribution of each component's spectrum shows that the optimal decomposition achieves a reasonable energy distribution for each mode, clear frequency band division, and no mode overlap, ensuring that the mode components are accurately distinguished. The method shows clear advantages in both decomposition accuracy and computational efficiency: the decomposition takes only 9.04 seconds, which improves timeliness and balances decomposition quality against computational cost. While maintaining decomposition quality, it selects the optimal number of VMD decomposition layers, avoiding the computational redundancy and overfitting risk caused by blindly increasing the number of decomposition layers.

[0054] As shown in Figure 6, variational mode decomposition was performed on the significant wave height data using the optimal parameters, yielding eight intrinsic mode function components (IMF1-IMF8) and one residual component (res). Spectral analysis of each component shows that: IMF1 has a center frequency of approximately 0.45 Hz, covering high-frequency fluctuations and reflecting short-period random fluctuations of the waves; IMF2-IMF3 have center frequencies of approximately 0.28 Hz and 0.15 Hz, respectively, covering mid-to-high frequency components and reflecting swell-like characteristics of the waves; IMF4-IMF6 have center frequencies of approximately 0.08 Hz, 0.04 Hz, and 0.02 Hz, respectively, covering mid-frequency components and reflecting the medium-term evolution of the waves; IMF7-IMF8 have center frequencies of approximately 0.01 Hz and 0.005 Hz, respectively, covering low-frequency components and reflecting long-term trend changes of the waves; the residual component (res) has an energy ratio of less than 1% and an approximately DC spectrum, indicating that the effective information in the original signal has been fully extracted.

[0055] In the frequency band map of the components extracted by the optimal VMD decomposition, Figure 6(a) to (h) correspond to the spectral distributions of IMF1 to IMF8, respectively, and Figure 6(i) corresponds to the spectral distribution of the residual component res. It can be seen directly that the spectral peaks of IMF1 to IMF8 move to lower frequencies in sequence, the spectrum of each component is clear, and the bands are limited in width and separated from each other, with no obvious spectral overlap. The spectral energy of the residual component res is extremely low and its distribution is flat, indicating that very little information remains after decomposition, which verifies that the decomposition is sufficient.

[0056] As shown in Figure 7, Figure 8 and Figure 9, to verify the superiority of the prediction method of this invention for small-sample wave time series data, it was compared with five mainstream hybrid model structures: Figure 7(a) LSTM-GRU, Figure 7(b) CNN-LSTM, Figure 8(a) TCN-Attention, Figure 8(b) TCN-LSTM and Figure 9(a) TCN-BiLSTM-Attention. The evaluation metrics are root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE), and coefficient of determination (R²).
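
For reference, the five evaluation metrics named above can be computed as in the sketch below, using their usual definitions. The function name and the sample values are hypothetical and are not results from the experiment.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (percent), MSE and R² for a forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100.0),  # undefined if y_true has zeros
        "MSE": mse,
        "R2": 1.0 - float(np.sum(err ** 2)) / ss_tot,
    }

scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

The metrics should be computed on wave heights in physical units, after the min-max scaling has been inverted, so that RMSE and MAE are expressed in meters.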

[0057] Figure 9(b) shows the results of this embodiment. Most current models suffer from rapid decay of long-term prediction accuracy, weaker detail capture than short-term models, and insufficient ability to predict extreme points. To address these problems, this embodiment constructs a multi-layer TCN-Attention network architecture adapted to the optimal number of decomposition layers. This rapidly expands the receptive field over small-sample wave data and strengthens the correlation with key information such as data anomalies and confidence fragments. BiLSTM is then used to further mine bidirectional short-term dynamics, achieving feature complementarity and effectively improving the accuracy and robustness of wave prediction, especially at extreme points. Experimental results show that, among the five comparative models, the TCN-BiLSTM-Attention network architecture exhibits the highest goodness of fit, with an R² value of 0.7028, and its MAE, MAPE, MSE, and RMSE are all better than those of the LSTM-GRU, CNN-LSTM, TCN-Attention, and TCN-LSTM models.

[0058] The comparison charts of the predicted data show the following. In Figure 7(a) and Figure 7(b), the predicted curves reflect the overall trend of the real waves well, but at the peaks and troughs their fluctuation range is significantly smaller than that of the real curve, showing a smoothing effect: the trend is accurate but the details are weak. In Figure 8(a) and Figure 8(b), the predicted curves show strong trend consistency, are basically synchronized with the fluctuation rhythm of the actual data, and capture the main peak and trough changes of the waves, but the fitting accuracy at key peaks is still insufficient. In Figure 9(a), the predicted curve is basically consistent with the actual curve in its rising and falling trends and periodic changes, but its ability to capture key peaks is relatively weak. In Figure 9(b), the overall trend of the data predicted by the method of this application is highly consistent with the real data: it accurately captures the fluctuation pattern of the main peaks and of the low wave height range in the long-period fluctuations, and the two curves almost completely overlap. Among all the methods, the prediction curve of this application matches the peaks of the real data most closely. The effectiveness of this embodiment is due to the following. This application configures the depth of the TCN-Self-Attention network according to the optimal number of decomposition layers, so that the network structure is compatible with the inherent modal features of the data, avoiding both the redundant computation caused by too many layers and the insufficient feature extraction caused by too few. Each layer of the network embeds a self-attention mechanism, which dynamically calculates the correlation weights of the time steps in the sequence, thereby strengthening key information such as abnormal time-series nodes and core trend segments while filtering redundant noise and making the feature representation more targeted and effective.

[0059] In addition, the TCN module adopts a dilation rate that grows exponentially with network depth. The receptive field can therefore be expanded quickly simply by increasing the number of network layers, without increasing parameters such as the convolution kernel size or the number of channels. This avoids information blind spots and, without consuming additional computing resources, addresses the weak detail performance and insufficient extreme value prediction ability of existing technologies in small sample scenarios.
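The growth of the receptive field with depth can be checked with a few lines of arithmetic. The sketch below assumes one causal convolution per layer, a kernel size of 3, and dilation 2^l at layer l; the base 2 and the kernel size are assumptions, since the text only states that the dilation rate grows exponentially.

```python
def receptive_field(num_layers, kernel_size, base=2):
    """Receptive field of stacked causal convolutions with dilation base**l at layer l."""
    dilations = [base ** layer for layer in range(num_layers)]
    # A layer with kernel size k and dilation d looks back (k - 1) * d additional steps.
    return 1 + sum((kernel_size - 1) * d for d in dilations)

for depth in (1, 2, 4, 8):
    print(depth, receptive_field(depth, kernel_size=3))
```

Under these assumptions, eight layers already cover 511 time steps while every layer keeps the same three-tap kernel, which is the effect the paragraph describes.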

[0060] Compared to the TCN-BiLSTM-Attention model, the method of this invention achieves significant improvements in all evaluation metrics: MAE is improved by 30.4%, MAPE by 33.8%, MSE by 44.2%, RMSE by 25.3%, and R² by 18.6%. This indicates that the present invention effectively improves prediction accuracy by improving the energy valley optimization algorithm and the optimal decomposition of VMD, and by deeply fusing the TCN-Self-Attention network with BiLSTM.

[0061] Embodiment 3. This embodiment differs from Embodiment 1 in that it provides a prediction system for optimal decomposition of small sample ocean wave data, including: a data acquisition module configured to acquire hourly small sample wave feature data of the target sea area, and to obtain standardized time series data through missing value processing, outlier removal, and standardization operations; an optimization module configured to adaptively optimize the number of decomposition layers and the penalty factor of the variational mode decomposition algorithm based on the improved energy valley optimization algorithm to obtain the optimal decomposition parameters; a transformation module configured to perform variational mode decomposition on the standardized time series data according to the optimal decomposition parameters to obtain several intrinsic mode function components; a model module configured to construct a multi-layer temporal convolutional network that matches the optimal number of decomposition layers in the optimal decomposition parameters, embed a self-attention mechanism in each layer of the temporal convolutional network, calculate the correlation weights between time steps in the temporal sequence, and extract key temporal features; a feature extraction module configured to input the several intrinsic mode function components into the multi-layer temporal convolutional network embedded with the self-attention mechanism to obtain a high-dimensional temporal feature representation; and an output module configured to use a bidirectional long short-term memory network to perform bidirectional temporal dependency modeling on the high-dimensional temporal feature representation, capture forward historical dependency features and backward trend features, and output predicted values of the effective wave height for multiple future steps.
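The three preprocessing operations of the data acquisition module could be sketched as follows. The concrete choices (linear interpolation for gaps, a 3-sigma rule for outliers, z-score standardization) and the name `preprocess` are assumptions made for illustration; the text does not specify which rules are used.

```python
import numpy as np

def preprocess(series, z_limit=3.0):
    """Fill gaps, replace outliers, and z-score a 1-D hourly series (illustrative choices)."""
    x = np.asarray(series, dtype=float).copy()
    idx = np.arange(x.size)

    # Missing value processing: linear interpolation over NaN entries (assumed strategy).
    missing = np.isnan(x)
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])

    # Outlier removal: points beyond z_limit standard deviations are re-interpolated
    # from their neighbours (assumed rule).
    z = (x - x.mean()) / x.std()
    outlier = np.abs(z) > z_limit
    if outlier.any() and (~outlier).any():
        x[outlier] = np.interp(idx[outlier], idx[~outlier], x[~outlier])

    # Standardization: zero mean and unit variance.
    mean, std = x.mean(), x.std()
    return (x - mean) / std, mean, std

# Synthetic hourly series with one gap and one spike (not data from the application).
raw = 1.0 + 0.1 * np.sin(np.arange(24.0))
raw[5] = np.nan
raw[12] = 50.0
standardized, mean, std = preprocess(raw)
```

Returning `mean` and `std` alongside the standardized series allows predictions to be mapped back to physical wave heights afterwards.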

[0062] The above are all preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Therefore, all equivalent changes made in accordance with the structure, shape and principle of the present invention should be covered within the scope of protection of the present invention.

Claims

1. A prediction method for optimal decomposition of small sample data of ocean waves, characterized in that it comprises: acquiring hourly small sample wave characteristic data of the target sea area and standardizing them to obtain standardized time series data; adaptively optimizing the number of decomposition layers and the penalty factor of the variational mode decomposition algorithm based on the improved energy valley optimization algorithm to obtain the optimal decomposition parameters; performing variational mode decomposition on the standardized time series data based on the optimal decomposition parameters to obtain several intrinsic mode function components; constructing a multi-layer temporal convolutional network that matches the optimal number of decomposition layers in the optimal decomposition parameters, and embedding a self-attention mechanism in each layer of the temporal convolutional network to calculate the correlation weights between time steps in the temporal sequence and extract key temporal features; inputting the several intrinsic mode function components into the multi-layer temporal convolutional network embedded with the self-attention mechanism to obtain a high-dimensional temporal feature representation; and using a bidirectional long short-term memory network to perform bidirectional temporal dependency modeling on the high-dimensional temporal feature representation, capturing forward historical dependency features and backward trend features, and outputting predicted values of the effective wave height for multiple future steps.

2. The prediction method for optimal decomposition of small sample ocean wave data according to claim 1, characterized in that the adaptive optimization of the number of decomposition layers and the penalty factor of the variational mode decomposition algorithm includes: generating an initial population for the energy valley optimization algorithm using an initialization strategy that combines Halton sequences and Latin hypercube sampling, where each particle in the initial population represents one combination of variational mode decomposition parameters comprising the number of decomposition layers and the penalty factor; generating Halton low-discrepancy sequences based on different prime bases, and enhancing randomness by adding Gaussian random perturbations; mapping the Halton low-discrepancy sequences to the solution space of the variational mode decomposition parameters to obtain the initial particle positions; and dividing each dimension of the solution space into several equal intervals, detecting the particle distribution density in each interval, and transferring particles from intervals whose particle density exceeds a preset threshold to sparse intervals. The value of the j-th particle in the d-th dimension of the Halton sequence is calculated as:

$$H_{j,d} = \sum_{k=0}^{m} a_k(j)\, p_d^{-(k+1)}, \qquad a_k(j) = \left\lfloor \frac{j}{p_d^{\,k}} \right\rfloor \bmod p_d$$

where $j$ is the population size index, $d$ is the population dimension index, $p_d$ is the $d$-th prime base in prime order, $m$ is the largest integer satisfying $p_d^{\,m} \le j$, $a_k(j)$ is the $k$-th digit of $j$ in base $p_d$, and $\bmod$ is the modulo operation.
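The low-discrepancy part of the initialization in claim 2 (a Halton-type radical-inverse sequence with Gaussian jitter, mapped into the parameter ranges) can be sketched as below. The search ranges, the jitter scale `sigma`, and the function names are hypothetical, and the density-based redistribution step is not included.

```python
import numpy as np

def halton_value(j, base):
    """Radical inverse of the positive integer j in the given prime base."""
    value, scale = 0.0, 1.0 / base
    while j > 0:
        j, digit = divmod(j, base)   # digit is the next base-`base` digit of j
        value += digit * scale
        scale /= base
    return value

def init_population(n_particles, bounds, primes=(2, 3), sigma=0.01, seed=0):
    """Halton points with small Gaussian jitter, mapped into per-dimension bounds."""
    rng = np.random.default_rng(seed)
    unit = np.array([[halton_value(j, p) for p in primes]
                     for j in range(1, n_particles + 1)])
    unit = np.clip(unit + rng.normal(0.0, sigma, unit.shape), 0.0, 1.0)
    low = np.array([b[0] for b in bounds], dtype=float)
    high = np.array([b[1] for b in bounds], dtype=float)
    return low + unit * (high - low)

# Hypothetical search ranges: number of layers in [2, 12], penalty factor in [100, 5000].
population = init_population(20, bounds=[(2, 12), (100, 5000)])
population[:, 0] = np.rint(population[:, 0])   # the number of layers must be an integer
```

Using a different prime base per dimension (here 2 and 3) is what keeps the two coordinates from lining up along diagonals.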

3. The prediction method for optimal decomposition of small sample ocean wave data according to claim 2, characterized in that the adaptive optimization of the number of decomposition layers and the penalty factor of the variational mode decomposition algorithm further includes: constructing a fitness function based on energy entropy, where the fitness function quantifies the mode separation quality of the variational mode decomposition; performing variational mode decomposition on the standardized time series data to obtain K intrinsic mode function components; calculating the energy proportion of each intrinsic mode function component and the energy entropy value; constructing a fitness function containing the energy entropy value, a decomposition layer penalty term, and a low-energy mode penalty term; calculating the fitness value of each particle based on the fitness function; and iteratively updating the particle positions based on an elite retention strategy, a neighbor particle average position update mechanism, and a normal distribution perturbation mechanism until the convergence condition is met, yielding the optimal decomposition parameters. The expression for the fitness function [formula not reproduced in the source text] is defined over the following quantities: the energy entropy; the number of decomposition layers; the decomposition layer penalty coefficient; the preset low-energy threshold; the low-energy penalty coefficient; and the energy of the k-th intrinsic mode function component.
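The energy proportion and energy entropy of a set of components can be computed as in the sketch below. The `fitness` function shows only one plausible way to combine the three terms named in claim 3; its additive form, its default coefficients, and the rule of counting modes below the threshold are assumptions, not the expression of the application.

```python
import numpy as np

def energy_entropy(imfs):
    """Energy proportions and Shannon energy entropy of K components (rows of imfs)."""
    energies = np.sum(np.asarray(imfs, dtype=float) ** 2, axis=1)
    p = energies / energies.sum()
    nonzero = p[p > 0]                       # 0 * log(0) is taken as 0
    return p, float(-np.sum(nonzero * np.log(nonzero)))

def fitness(imfs, layer_coef=0.01, low_energy_threshold=0.01, low_energy_coef=0.1):
    """Hypothetical fitness: entropy + layer penalty + penalty per low-energy mode."""
    p, entropy = energy_entropy(imfs)
    n_low = int(np.sum(p < low_energy_threshold))
    return entropy + layer_coef * len(p) + low_energy_coef * n_low

# Two synthetic components with equal energy (not data from the application).
t = np.linspace(0.0, 1.0, 256, endpoint=False)
components = np.vstack([np.sin(2 * np.pi * 4 * t), np.cos(2 * np.pi * 16 * t)])
proportions, entropy = energy_entropy(components)
score = fitness(components)
```

With two equal-energy components the proportions are 0.5 each and the entropy equals ln 2, which is a convenient sanity check for the implementation.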

4. The prediction method for optimal decomposition of small sample ocean wave data according to claim 3, characterized in that the step of performing variational mode decomposition on the standardized time series data based on the optimal decomposition parameters includes: establishing a constrained variational model for variational mode decomposition based on the optimal number of decomposition layers K and the penalty factor in the optimal decomposition parameters, with the standardized time series data as the input signal; setting each intrinsic mode function to be an amplitude-modulated and frequency-modulated signal; and determining the center frequency of each intrinsic mode function by solving the constrained variational model. The constrained variational model minimizes the sum of the estimated bandwidths of all intrinsic mode functions, subject to the constraint that the sum of all intrinsic mode functions equals the input signal:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)$$

where $K$ is the optimal number of decomposition layers, $u_k(t)$ is the $k$-th intrinsic mode function, $\omega_k$ is the center frequency of the $k$-th intrinsic mode function, $f(t)$ is the input signal, $\delta(t)$ is the unit impulse function, $*$ is the convolution operation, $\partial_t$ is the partial derivative with respect to time, $\text{s.t.}$ is the constraint operator, $e^{-j\omega_k t}$ is the complex exponential modulation term, and $j$ is the imaginary unit.

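5. The prediction method for optimal decomposition of small sample ocean wave data according to claim 4, characterized in that the variational mode decomposition of the standardized time series data based on the optimal decomposition parameters further includes: introducing a Lagrange multiplier $\lambda$ and a quadratic penalty factor $\alpha$ to transform the constrained variational model into an unconstrained augmented Lagrangian model, where the quadratic penalty factor $\alpha$ is the penalty factor in the optimal decomposition parameters; and iteratively updating the intrinsic mode functions $u_k$, the center frequencies $\omega_k$, and the Lagrange multiplier $\lambda$ using the alternating direction method of multipliers to obtain the optimal solution of each intrinsic mode function, and thereby the several intrinsic mode function components. The expression of the augmented Lagrangian model is:

$$L\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$

where $L$ is the augmented Lagrange function, $\alpha$ is the quadratic penalty factor, $\lambda$ is the Lagrange multiplier, $\langle \cdot, \cdot \rangle$ is the inner product operation, and $u_k(t)$, $\omega_k$, and $f(t)$ are the $k$-th intrinsic mode function, its center frequency, and the input signal as in claim 4.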
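In the standard formulation of variational mode decomposition, the alternating direction method of multipliers is carried out in the frequency domain, and each iteration applies the three updates below. They are given as a reference sketch under that assumption: the text does not print its own update rules, hats denote Fourier transforms, $n$ is the iteration index, and the step size $\tau$ is a symbol introduced here.

```latex
\begin{aligned}
\hat{u}_k^{\,n+1}(\omega) &= \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\,(\omega - \omega_k^{\,n})^2}, \\[4pt]
\omega_k^{\,n+1} &= \frac{\int_0^{\infty} \omega\, \lvert \hat{u}_k^{\,n+1}(\omega) \rvert^2 \, d\omega}{\int_0^{\infty} \lvert \hat{u}_k^{\,n+1}(\omega) \rvert^2 \, d\omega}, \\[4pt]
\hat{\lambda}^{\,n+1}(\omega) &= \hat{\lambda}^{n}(\omega) + \tau \Big( \hat{f}(\omega) - \sum_{k} \hat{u}_k^{\,n+1}(\omega) \Big).
\end{aligned}
```

The first update acts as a Wiener-type filter centered on $\omega_k$, so a larger penalty factor $\alpha$ yields narrower-band components, which is why $\alpha$ is optimized together with the number of layers.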

6. The prediction method for optimal decomposition of small sample ocean wave data according to claim 1, characterized in that the construction of a multi-layer temporal convolutional network that matches the optimal number of decomposition layers in the optimal decomposition parameters includes: configuring the number of layers in the temporal convolutional network according to the optimal number of decomposition layers, and setting for the $l$-th layer ($l = 1, \dots, K$) of the temporal convolutional network a dilation rate $d_l$ that increases exponentially with $l$; and performing, based on the dilation rate, a dilated causal convolution operation on the input time series, so that the receptive field of the $l$-th layer expands exponentially with network depth to capture long-range temporal dependencies without increasing the convolution kernel parameters. The output of the dilated causal convolution operation at sequence element $s$ is calculated according to the following formula:

$$F(s) = (x *_{d} f)(s) = \sum_{i=0}^{k-1} f(i)\, x_{s - d \cdot i}$$

where $d$ is the dilation rate, $k$ is the filter size, $x$ is the input time series, $f$ is the one-dimensional convolution kernel in the temporal convolutional network, $s - d \cdot i$ is the position index of the input sequence, $i$ is the convolution kernel index, $s$ is the current output position, and $K$ is the optimal number of decomposition layers.
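A direct, unoptimized reading of the dilated causal convolution in claim 6 is sketched below: the output at position s combines the input at s, s - d, s - 2d, and so on, and never reads later samples. Treating positions before the start of the series as zero (left zero-padding) is an assumption, as is the function name.

```python
import numpy as np

def dilated_causal_conv(x, kernel, dilation):
    """out[s] = sum_i kernel[i] * x[s - dilation * i], with x taken as 0 before s = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for s in range(x.size):
        for i, weight in enumerate(kernel):
            src = s - dilation * i
            if src >= 0:                 # causal: future samples are never read
                out[s] += weight * x[src]
    return out

signal = np.arange(1.0, 9.0)             # 1, 2, ..., 8
result = dilated_causal_conv(signal, kernel=[1.0, 1.0], dilation=2)
print(result)                            # [ 1.  2.  4.  6.  8. 10. 12. 14.]
```

With a two-tap kernel of ones and a dilation of 2, each output is the current sample plus the sample two steps earlier, which makes the skipped positions easy to see.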

7. The prediction method for optimal decomposition of small sample ocean wave data according to claim 6, characterized in that the extraction of key temporal features includes: embedding a self-attention layer after the dilated causal convolutional layer to perform global dependency modeling on the local temporal features output by the dilated causal convolutional layer; adding the output of the self-attention layer to the input of the dilated causal convolutional layer through a residual connection; and generating a query matrix, a key matrix, and a value matrix through linear transformation, calculating association weights based on a scaled dot-product attention mechanism, and performing a weighted summation of the value matrix according to the association weights to obtain a key temporal feature representation that integrates global association information, where the key temporal feature representation enhances the sensitivity to abnormal nodes and core trend segments in the time series.
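The attention step of claim 7 can be sketched with NumPy as follows: the convolution output is projected to queries, keys, and values, a softmax over scaled dot products gives the association weights, and the weighted values are added to the block input through the residual connection. The weight matrices are random placeholders, and the shapes and names are assumptions for illustration.

```python
import numpy as np

def self_attention(conv_out, block_in, w_q, w_k, w_v):
    """Scaled dot-product self-attention over time steps, added to the block input."""
    q, k, v = conv_out @ w_q, conv_out @ w_k, conv_out @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (T, T) pairwise relevance
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # each row sums to 1
    return block_in + weights @ v, weights

rng = np.random.default_rng(0)
T, C = 6, 4                                            # time steps, channels (illustrative)
block_input = rng.normal(size=(T, C))                  # stand-in for the layer input
conv_output = rng.normal(size=(T, C))                  # stand-in for the convolution output
w_q, w_k, w_v = (rng.normal(size=(C, C)) * 0.1 for _ in range(3))
out, attn = self_attention(conv_output, block_input, w_q, w_k, w_v)
```

Row t of `attn` shows how strongly time step t draws on every other step, which is where an abnormal node or a core trend segment would receive a large weight after training.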

8. The prediction method for optimal decomposition of small sample ocean wave data according to claim 7, characterized in that obtaining the high-dimensional temporal feature representation includes: inputting the several intrinsic mode function components into the multi-layer temporal convolutional network as multi-channel input data, with all intrinsic mode function components stacked according to the time dimension to form an input feature tensor; and extracting temporal features layer by layer through the residual blocks of the multi-layer temporal convolutional network. Each residual block includes two dilated causal convolutional layers; each dilated causal convolutional layer is followed in sequence by a normalization layer, a spatial dropout layer, and a ReLU activation layer, and the output of the ReLU activation layer is connected to the self-attention layer. In a residual block whose input and output dimensions do not match, the input features are linearly projected to achieve dimension matching, and the output of the self-attention layer is then added element-wise to the dimension-matched input features. The residual blocks are stacked layer by layer up to the last layer, which outputs the high-dimensional temporal feature representation.

9. The prediction method for optimal decomposition of small sample ocean wave data according to claim 1, characterized in that outputting the predicted values of the effective wave height for multiple future steps includes: constructing a bidirectional long short-term memory network composed of a forward long short-term memory subnetwork and a backward long short-term memory subnetwork, and inputting the high-dimensional temporal feature representation into both subnetworks simultaneously; the forward long short-term memory subnetwork traverses the high-dimensional temporal feature representation from left to right, captures the forward historical temporal dependency features of the ocean wave data, and outputs the forward hidden state sequence; the backward long short-term memory subnetwork traverses the high-dimensional temporal feature representation from right to left, captures the backward future trend dependency features of the ocean wave data, and outputs the backward hidden state sequence; and the forward and backward hidden states at corresponding time steps are concatenated and fused to obtain a fused bidirectional temporal dependency feature sequence, which is input into a fully connected layer, where linear transformation and activation mapping yield the predicted values of the effective wave height for multiple future steps.
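The bidirectional traversal and concatenation in claim 9 can be sketched with a plain NumPy LSTM cell. All weights are random and untrained, the sizes are illustrative, and two simplifications are assumptions: the fully connected layer reads only the fused state at the last time step, and the output activation is the identity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(seq, w, u, b, hidden):
    """Run one LSTM over seq of shape (T, C); return hidden states of shape (T, hidden)."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    states = []
    for x_t in seq:
        gates = w @ x_t + u @ h + b                 # stacked [input, forget, cell, output]
        i, f, g, o = np.split(gates, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.stack(states)

def bilstm_forecast(seq, params_fwd, params_bwd, w_out, b_out, hidden):
    """Concatenate forward and backward states, then map the last step to the forecast."""
    fwd = lstm_pass(seq, *params_fwd, hidden)
    bwd = lstm_pass(seq[::-1], *params_bwd, hidden)[::-1]   # re-align to forward time order
    fused = np.concatenate([fwd, bwd], axis=1)              # (T, 2 * hidden)
    return w_out @ fused[-1] + b_out                        # (horizon,)

rng = np.random.default_rng(0)
T, C, H, horizon = 24, 8, 16, 3                             # illustrative sizes

def random_lstm_params():
    return (rng.normal(size=(4 * H, C)) * 0.1,              # input weights
            rng.normal(size=(4 * H, H)) * 0.1,              # recurrent weights
            np.zeros(4 * H))                                # biases

features = rng.normal(size=(T, C))                          # stand-in for the TCN output
forecast = bilstm_forecast(features, random_lstm_params(), random_lstm_params(),
                           rng.normal(size=(horizon, 2 * H)) * 0.1, np.zeros(horizon), H)
```

Reversing the sequence for the backward pass and reversing its states again afterwards is what keeps the two hidden state sequences aligned at corresponding time steps before concatenation.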

10. A prediction system for optimal decomposition of small sample ocean wave data, which executes the method of claim 1, characterized in that it comprises: a data acquisition module configured to acquire hourly small sample wave feature data of the target sea area, and to obtain standardized time series data through missing value processing, outlier removal, and standardization operations; an optimization module configured to adaptively optimize the number of decomposition layers and the penalty factor of the variational mode decomposition algorithm based on the improved energy valley optimization algorithm to obtain the optimal decomposition parameters; a transformation module configured to perform variational mode decomposition on the standardized time series data according to the optimal decomposition parameters to obtain several intrinsic mode function components; a model module configured to construct a multi-layer temporal convolutional network that matches the optimal number of decomposition layers in the optimal decomposition parameters, embed a self-attention mechanism in each layer of the temporal convolutional network, calculate the correlation weights between time steps in the temporal sequence, and extract key temporal features; a feature extraction module configured to input the several intrinsic mode function components into the multi-layer temporal convolutional network embedded with the self-attention mechanism to obtain a high-dimensional temporal feature representation; and an output module configured to use a bidirectional long short-term memory network to perform bidirectional temporal dependency modeling on the high-dimensional temporal feature representation, capture forward historical dependency features and backward trend features, and output predicted values of the effective wave height for multiple future steps.