Renewable energy power prediction method and device based on time sequence discrete marking

By discretizing the multivariate time series in the renewable energy power prediction model into discrete time series labels and combining them with a pre-trained language model for autoregressive generation, the prediction error problem of existing models under extreme conditions is solved, and a more stable long-term prediction effect is achieved.

CN122292309A · Pending · Publication Date: 2026-06-26 · SOUTHEAST UNIV +2


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SOUTHEAST UNIV
Filing Date: 2026-03-25
Publication Date: 2026-06-26

Application Information


IPC: H02J3/00; G06F18/213; G06F18/214; G06F18/15; G06F18/27; G06N3/0455; G06N3/08; H02J103/50; G06F123/02

AI Tagging

Technology Topics

Semantic context; Conditional autoregressive

AI Technical Summary

Technical Problem

Existing deep learning models lack the ability to explicitly model the potential state structure of power sequences under different operating stages in renewable energy power prediction, resulting in severe accumulation of prediction errors and trajectory drift under extreme weather and abrupt operating conditions.

Method used

Multivariate time series are discretized into discrete time-series labels. A conditional autoregressive generative model is constructed by combining the time-series labeling mapping with a pre-trained language model. While the mapping is trained, an exponential moving average maintains the long-term frequency state of each codeword, a modulation coefficient is adaptively adjusted from that state, and low-frequency codewords are updated by interpolation toward latent-space anchors. Predictions are then generated autoregressively.

Benefits of technology

It significantly reduces error accumulation in multi-step rolling prediction, improves the stability and goodness of fit of 24-hour equal-length time domain prediction, reduces training computing costs, and maintains the language generation prior and statistical stability of the model.


Abstract

This invention discloses a method and apparatus for renewable energy power prediction based on time-series discrete labeling, belonging to the field of renewable energy power prediction; it includes: acquiring historical observation multivariate time series of target power plants; constructing and training a time-series labeling mapping, dividing the multivariate time series into blocks and encoding them to obtain a continuous latent representation; normalizing the continuous latent representation and the introduced learnable codebook respectively, and allocating discrete indices using nearest neighbor search to obtain a discrete time-series label matrix; expanding the discrete time-series labels output by the trained time-series labeling mapping and incorporating them into the unified vocabulary of a pre-trained language model to construct a conditional autoregressive generative model, and performing fine-tuning training while freezing the backbone network parameters of the pre-trained language model; given the environmental semantic context and the historical time-series label sequence obtained through the time-series labeling mapping, generating a future discrete label sequence based on the trained model in an autoregressive manner, and inputting the generated future discrete label sequence into the decoder of the time-series labeling mapping to reconstruct a prediction sequence in the continuous domain.


Description

Technical Field

[0001] This invention belongs to the field of renewable energy power prediction technology, and specifically relates to a renewable energy power prediction method and apparatus based on time-series discrete labeling.

Background Technology

[0002] With the increasing penetration rate of wind and solar power, electricity output driven by meteorological conditions exhibits strong randomness and rapid fluctuations, posing a severe challenge to power balance and the safe and economical operation of modern power systems. Short-term and ultra-short-term forecasts directly serve reserve allocation, rolling dispatch, market clearing, and energy storage control, and their reliability during critical periods such as ramp-up and sudden changes has a significant impact on operational risks and dispatch costs.

[0003] Renewable energy power forecasting methods have therefore gradually evolved from a statistically driven paradigm to a data-driven framework. Statistical models such as the Autoregressive Integrated Moving Average (ARIMA) and Seasonal Autoregressive Integrated Moving Average (SARIMA) models have clear structures and are computationally efficient. They are often combined with wavelet decomposition or variational mode decomposition to alleviate nonstationarity, but they rest on linearity and weak-stationarity assumptions, which makes it difficult for them to cope with strong nonlinearity and sudden disturbances under complex meteorological conditions. Machine learning methods such as Support Vector Regression (SVR), Random Forest, and Extreme Gradient Boosting (XGBoost) enhance multivariate modeling through nonlinear function approximation, but their generalization remains limited under distribution shift, for example across seasons, across sites, and in extreme weather. In recent years, deep learning methods have been widely used because of their end-to-end representation learning: Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs) are good at modeling sequence dependencies, Temporal Convolutional Networks (TCNs) capture short-term fluctuations well and train stably, and Transformers strengthen long-range dependency and multivariate interaction modeling through self-attention, which suits multi-step prediction and long-sequence scenarios. However, most existing deep learning models frame the problem as continuous numerical regression and feed exogenous factors into the network by channel concatenation. As a result, the model learns only an implicit point-to-point mapping and cannot explicitly model the latent state structure of the power sequence across different operating stages.

[0004] Large pre-trained language models (LLMs) acquire strong sequence generation and conditional reasoning capabilities through large-scale pre-training. Their next-token autoregressive mechanism is naturally suited to the structured modeling of symbolic sequences, so mapping power sequences to discrete symbols offers, in principle, a promising way to solve the problems described above. This potential has driven the transfer of LLMs to time series forecasting tasks. However, most existing research still follows the path of continuous value prediction: continuous power sequences are mapped into the representation space of the language model and optimized under continuous-value supervision objectives such as mean absolute error and mean squared error. For example, a GPT-based ultra-short-term photovoltaic power forecasting method verifies, through linear adaptation and regression output, that large models can be applied to minute-level forecasts; a cross-modal semantic alignment mechanism has been introduced to reduce the gap between numerical time series and language representations; and Time-LLM maps time series to the input of the LLM through a reprogramming layer and generates continuous predictions through a projection layer. In summary, these methods use the LLM as a feature extractor combined with a regression head. They bypass its core autoregressive modeling mechanism and do not form structured symbolic representations that correspond to different operating states. The fundamental reason is that continuous power series lack a discrete representation that can be stably aligned with the unified vocabulary of the LLM, which makes it difficult to apply the next-token autoregressive mechanism directly to power forecasting tasks.

Summary of the Invention

[0005] To address the shortcomings of existing technologies, the present invention aims to provide a method and apparatus for predicting renewable energy power based on time-series discrete labels, thereby solving the problems in the prior art.

[0006] The objective of this invention can be achieved through the following technical solution. A renewable energy power prediction method based on time-series discrete labeling includes the following steps:
Acquiring the historical observation multivariate time series of the target station, wherein the multivariate time series is formed by concatenating, at each time step, the power output and the meteorological driving variables along the variable dimension;
Constructing and training a time-series labeling mapping, including: dividing the multivariate time series into blocks and encoding them to obtain a continuous latent representation; normalizing the continuous latent representation and an introduced learnable codebook separately; and allocating discrete indices by nearest neighbor search to obtain a discrete time-series label matrix;
Expanding the discrete time-series labels output by the trained time-series labeling mapping and incorporating them into the unified vocabulary of a pre-trained language model to construct a conditional autoregressive generative model, and fine-tuning the conditional autoregressive generative model while the backbone network parameters of the pre-trained language model are frozen;
Given the environmental semantic context and the historical time-series label sequence obtained through the time-series labeling mapping, generating a future discrete label sequence autoregressively with the trained conditional autoregressive generative model, and inputting the generated future discrete label sequence into the decoder of the time-series labeling mapping to reconstruct the prediction sequence in the continuous domain.

[0007] Furthermore, during the training of the temporal tokenization mapping, utilization-aware dynamic codebook maintenance is performed, including: computing the empirical frequency of each codeword and maintaining the long-term frequency state of the codeword with an exponential moving average; calculating the similarity scores between the codeword and the normalized continuous latent representations, constructing a sampling distribution from those scores, and obtaining an anchor point in the latent space based on the sampling distribution; and adaptively adjusting a modulation coefficient according to the long-term frequency state, and using the modulation coefficient together with the anchor point to perform interpolation updates on low-frequency codewords.

[0008] Furthermore, the long-term frequency state of each codeword is maintained by an exponential moving average defined by a smoothing coefficient and the empirical frequency of that codeword. The adaptively adjusted modulation coefficient is defined in terms of the long-term frequency state, an intensity constant, a numerical-stability term, and the size of the codebook. The interpolation update of a low-frequency codeword acts on the embedding vector that corresponds to that codeword in the codebook.

[0009] Furthermore, the process of dividing the multivariate time series into blocks and encoding them to obtain the continuous latent representation includes: applying reversible instance normalization to the multivariate time series to obtain a normalized sequence; dividing the normalized sequence along the time dimension into non-overlapping time segments of length P; applying a linear projection to each time segment to obtain the segment embedding tensor; and inputting the segment embedding tensor into a Transformer encoder containing a multi-layer multi-head self-attention network, in which attention is calculated along the segment dimension, to output the continuous latent representation.

[0010] Furthermore, the temporal tokenization mapping is trained with an objective function that includes a reconstruction term, a codebook loss term, and a commitment loss term. The reconstruction term compares the original sequence with the reconstructed sequence obtained from the decoder of the temporal tokenization mapping after inverse normalization. The codebook loss term and the commitment loss term are defined on the continuous latent vectors and the quantized codeword vectors using the stop-gradient operator, and the commitment loss term is weighted by a commitment coefficient. M represents the variable dimension corresponding to each time segment, and N represents the number of time segments after partitioning.

[0011] Furthermore, the steps of expanding the discrete time-series labels and incorporating them into the unified vocabulary include: appending boundary markers and the time-series symbol set after the base vocabulary of the pre-trained model to obtain the expanded vocabulary; and initializing the parameters of the newly added time-series symbols in the input embedding matrix and the output projection matrix at the statistical center of the original word embedding parameters, with perturbations added.

[0012] Furthermore, the fine-tuning training process includes: applying row-level gradient mask matrices to the input embedding matrix and the output projection matrix of the pre-trained language model, so that the effective gradient used in fine-tuning is the Hadamard product of the mask matrix and the original gradient, where the mask matrix has the value 1 only at the row positions corresponding to the newly added time-series symbol set and the value 0 at all other positions; constructing a heterogeneous causal input sequence U by concatenating the environmental semantic context (composed of the historical window, the forecast window, the day / night cycle, the weather forecast, and constraints), the historical time-series label sequence, and the future target label sequence; and employing a label masking mechanism that takes the heterogeneous causal input sequence as input and calculates the generation loss only over the set of target positions corresponding to the future target label sequence, where each symbol of U is predicted from its prefix and the loss is taken in expectation over the training data distribution D.

[0013] Furthermore, when the future discrete label sequence is generated autoregressively, constrained decoding is applied to limit the set of symbols that can be generated: a constraint operator is applied to the unnormalized score vector output at each decoding step, and Softmax normalization is then performed. The constraint operator depends on the length of the label sequence corresponding to the prediction segment, and it sets the score of every symbol that does not belong to the candidate set to negative infinity (−∞).

[0014] A renewable energy power prediction device based on time-series discrete labeling performs the above method and includes:
A multivariate sequence construction module, used to obtain the historical observation multivariate time series of the target station, wherein the multivariate time series is formed by concatenating, at each time step, the power output and the meteorological driving variables along the variable dimension;
A time-series labeler construction module, used to construct and train the time-series labeling mapping, including: dividing the multivariate time series into blocks and encoding them to obtain a continuous latent representation; normalizing the continuous latent representation and the introduced learnable codebook separately; and allocating discrete indices by nearest neighbor search to obtain a discrete time-series label matrix;
A unified-vocabulary autoregressive training module, used to expand the discrete time-series labels output by the trained time-series labeling mapping and incorporate them into the unified vocabulary of the pre-trained language model to construct a conditional autoregressive generative model, and to fine-tune the conditional autoregressive generative model while the backbone network parameters of the pre-trained language model are frozen;
A constrained inference and reconstruction module, used to generate the future discrete label sequence with the trained conditional autoregressive generative model, given the environmental semantic context and the historical time-series label sequence obtained through the time-series labeling mapping, and to input the generated future discrete label sequence into the decoder of the time-series labeling mapping to reconstruct the prediction sequence in the continuous domain.

[0015] A computer storage medium storing a readable program that, when executed, instructs a computing device to perform renewable energy power prediction based on time-series discrete tags as described above.

[0016] The beneficial effects of this invention are as follows. 1. In the training process that discretizes continuous sequences, this invention uses an exponential moving average to maintain the long-term frequency state of each codeword, adaptively adjusts the modulation coefficient according to that state, and combines latent-space anchors to relocate low-frequency or inactive codewords by interpolation. By explicitly tracking codeword usage and forcibly reactivating dead codewords, it allows the nominal codebook size to be fully converted into effective symbol capacity. This ensures balanced coverage of diverse operating modes, especially rare low-frequency modes such as extreme weather and sudden changes in operating conditions, and significantly reduces reconstruction error during discretization.

[0017] 2. This invention abandons traditional continuous numerical regression (mean squared error loss) and appends discretized time series labels to the expanded vocabulary of a pre-trained Large Language Model (LLM). Within a unified symbolic space, and conditioned by prefix context, a label mask supervision mechanism is used to perform autoregressive generation training for the "next label." By making step-by-step autoregressive predictions within the discrete symbolic space, the model learns the underlying state flow patterns of the power sequence, rather than a simple point-to-point numerical mapping. Experiments show that this generative output effectively suppresses the error accumulation and trajectory drift phenomena that are prone to occur in multi-step rolling prediction, and significantly improves the stability and goodness of fit of 24-hour (96-step) time-domain predictions.

[0018] 3. This invention freezes the parameters of the LLM backbone network during the training phase and injects low-rank increments (LoRA) only into the key projection layers. At the same time, a mask matrix M is constructed for the input embedding matrix and the output projection matrix, so that the effective fine-tuning gradient acts only on the rows of the newly added temporal symbols. This significantly reduces the computational cost of adapting a very large model, with parameters on the order of hundreds of billions, to power time-series prediction. The strict row-level gradient masking also prevents the time-series task from disturbing the model's original natural-language symbol representation space. Combined with the centroid-plus-perturbation initialization of the newly added symbols, the temporal symbols quickly align with the pre-trained representation space in scale and distribution, preserving the model's language generation prior and statistical stability.

[0019] 4. In the inference stage of this invention, a constraint operator is applied to the unnormalized scores output at each decoding step before they enter the Softmax function. Within the prediction steps, the scores of symbols that are not in the extended time-series symbol set are forcibly set to negative infinity (−∞); at the end of the prediction, only the end marker is allowed to be output. This hard mask constraint limits both the effective set of generated symbols and the sequence length, ensuring that the generated label sequence can be stably reconstructed by the decoder into a continuous-domain power waveform and supporting the robustness of the prediction system in industrial applications.

5. When the heterogeneous causal input sequence is constructed, the input includes not only the historical time-series labels but also a concatenated environmental semantic context written in natural language, covering the historical window, the prediction window, day and night periods, weather forecasts, and constraints. Using the text understanding capability of large models, the time-series waveforms are aligned across modalities with external meteorological / scheduling semantics. The textual condition information provides strong external prior constraints for the generation process, significantly enhancing the model's ability to perceive complex environmental changes and further reducing prediction errors.

Attached Figure Description

[0020] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0021] Figure 1 is a flowchart of the renewable energy power prediction method based on time-series discrete labeling of the present invention; Figure 2 is a framework diagram of renewable energy power prediction in this invention; Figure 3 shows the structure of the natural-language prompt template; Figure 4 shows a prompt instance generated from the template.

Detailed Implementation

[0022] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0023] Example 1
As shown in Figure 1 and Figure 2, the renewable energy power prediction method based on time-series discrete labeling consists of a time-series labeler construction stage, a unified-vocabulary autoregressive generation stage, and an inference prediction stage, and specifically includes the following steps.
Step (1): Obtain the historical observation multivariate time series of the target station, where L represents the length of the historical observation window and the observation vector at the t-th historical time step consists of M real-valued variables. The number of prediction steps is set to O, and the prediction target is the future sequence whose value at the t-th prediction step is a real-valued scalar, the target power.
The historical observation multivariate time series obtained in step (1) includes the historical power output data of the target station and the exogenous meteorological driving variables related to power, where the meteorological driving variables include, but are not limited to, temperature, humidity, wind speed, wind direction, air pressure, precipitation, and irradiance. The series is formed by concatenating, at each time step, the power output and the exogenous meteorological driving variables along the variable dimension.

[0024] The overall prediction process is as follows. The method consists of a time-series labeler construction stage and a unified-vocabulary autoregressive generation stage. In the time-series labeler construction stage, the historical multivariate sequence (together with the future sequence, which is introduced during the training phase) is encoded at the patch level, and the continuous latent representation is discretized by vector quantization to obtain the discrete historical and future label sequences. In the unified-vocabulary autoregressive generation stage, the temporal tokens are incorporated into the expanded vocabulary of the pre-trained language model, and, with the backbone network parameters frozen, parameter-efficient fine-tuning is used to learn the conditional distribution of the future label sequence given the context and the historical label sequence. In the inference stage, the future discrete sequence is generated autoregressively under the given context and historical label sequence, and the time-series labeler decodes and reconstructs it to yield the continuous-domain prediction result.

[0025] Step (2): Construct a temporal tokenization mapping Q, the mapping function that discretizes a continuous multivariate time series into a label sequence. Its input is a real matrix of R time steps with M variables each, and its output is a discrete label sequence of length T in which each label takes a value from 0 to K-1, where K is the codebook size and T is the label sequence length. The mapping is used to map the historical sequence and the future sequence to discrete labels, giving the discrete label sequence corresponding to the historical input and the discrete label sequence corresponding to the future target. Step (2) specifically includes the following steps.
Step (21): Given a historical multivariate observation window, apply reversible instance normalization (RevIN) to each variable to obtain a normalized sequence. The normalized sequence is divided along the time dimension into non-overlapping time segments (patches) of length P, giving a block tensor. A linear projection with a learnable projection matrix is applied to each patch to obtain the patch embedding tensor. The embedding tensor is input into a Transformer encoder to obtain the patch-level continuous latent representation. The encoder consists of a stacked multi-layer multi-head self-attention network and feedforward network, and self-attention is calculated along the patch dimension to model long-range dependencies across patches.
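The normalization, patching, and projection in Step (21) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the `eps` constant, and the toy projection weights are hypothetical, the sketch omits any learnable affine parameters that a full RevIN layer may include, and the Transformer encoder is left out.

```python
from statistics import fmean, pstdev

def revin_normalize(window, eps=1e-5):
    """Normalize each variable (column) of an L x M window by its own mean and std.

    Returns the normalized window and the per-variable statistics needed to
    invert the transform after decoding.
    """
    columns = list(zip(*window))                       # M tuples of length L
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) + eps for col in columns]      # eps guards constant columns
    normed = [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in window]
    return normed, means, stds

def revin_denormalize(normed, means, stds):
    """Inverse transform: map normalized values back to the physical scale."""
    return [[x * s + m for x, m, s in zip(row, means, stds)] for row in normed]

def split_into_patches(window, patch_len):
    """Split an L x M window into L // patch_len non-overlapping patches."""
    if len(window) % patch_len != 0:
        raise ValueError("window length must be a multiple of patch_len")
    return [window[i:i + patch_len] for i in range(0, len(window), patch_len)]

def embed_patch(patch, weight):
    """Flatten a P x M patch and project it with a (P*M) x d weight matrix."""
    flat = [x for row in patch for x in row]
    d = len(weight[0])
    return [sum(flat[i] * weight[i][j] for i in range(len(flat))) for j in range(d)]

# Toy window: L = 8 time steps, M = 2 variables (e.g. power and one weather variable).
window = [[float(t), 10.0 + 2.0 * t] for t in range(8)]
normed, means, stds = revin_normalize(window)
patches = split_into_patches(normed, patch_len=4)      # 2 patches of shape 4 x 2
weight = [[0.1] * 3 for _ in range(4 * 2)]             # hypothetical (P*M) x d, d = 3
embeddings = [embed_patch(p, weight) for p in patches]
restored = revin_denormalize(normed, means, stds)
```

The statistics returned by `revin_normalize` are kept so that the decoder output can later be mapped back to physical units, which is the role RevIN inverse normalization plays in Step (24).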

[0026] Step (22): Introduce a learnable codebook. Normalization is applied separately to the continuous latent representation and to the codewords, giving a normalized latent representation and a normalized codebook. For each position, a discrete index is allocated by nearest neighbor search over the normalized codewords, and the corresponding quantized vector is obtained by table lookup. This yields the discrete time-series label matrix.
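The index allocation in Step (22) can be sketched as below. The sketch assumes that the normalization is L2 normalization and that "nearest" means smallest squared Euclidean distance; neither detail survives in the text above, and the function names and toy values are hypothetical.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]

def quantize(latents, codebook):
    """Assign each latent vector the index of its nearest normalized codeword.

    With both sides L2-normalized, minimizing squared Euclidean distance is
    equivalent to maximizing cosine similarity.
    """
    norm_codebook = [l2_normalize(c) for c in codebook]
    indices = []
    for h in latents:
        h_n = l2_normalize(h)
        dists = [sum((a - b) ** 2 for a, b in zip(h_n, c)) for c in norm_codebook]
        indices.append(min(range(len(dists)), key=dists.__getitem__))
    quantized = [norm_codebook[k] for k in indices]     # table lookup
    return indices, quantized

codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]        # K = 3 codewords, d = 2
latents = [[0.9, 0.1], [0.2, 2.0], [-3.0, 0.5]]         # one latent per patch
indices, quantized = quantize(latents, codebook)        # indices == [0, 1, 2]
```

Because of the normalization, the magnitude of a latent vector does not affect which codeword it is assigned to; only its direction does.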

[0027] Step (23): Given the discrete index matrix Z, calculate the empirical frequency of each codeword k. To suppress intra-batch randomness and characterize long-term usage trends, a smoothing coefficient is introduced and the long-term frequency state of each codeword is maintained by an exponential moving average. To perform anchor-guided relocation for low-frequency or deactivated codewords, the similarity scores between codeword k and the set of normalized latent representations are calculated, the scores are centered, and a sampling distribution is constructed from the centered scores. The update step size is adjusted adaptively based on the long-term frequency state of the codeword: a modulation coefficient is defined from that state together with an intensity constant and a numerical-stability term, and the codeword is updated by interpolation. When the long-term frequency is larger, the modulation coefficient approaches 1 and the update is suppressed; when it is smaller, the modulation coefficient decreases so that the codeword is relocated.
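The codebook maintenance in Step (23) can be illustrated with the sketch below. The exact formulas are not recoverable from the text, so several parts are assumptions: the softmax used to turn centered scores into a sampling distribution, the specific form of `modulation` (chosen only so that it approaches 1 for frequently used codewords and shrinks for rarely used ones, as described), the default constants, and the interpolation convention `a * codeword + (1 - a) * anchor`.

```python
import math
import random

def empirical_frequency(indices, K):
    """Fraction of positions in the batch assigned to each of the K codewords."""
    counts = [0] * K
    for k in indices:
        counts[k] += 1
    return [c / len(indices) for c in counts]

def ema_update(freq_state, batch_freq, gamma=0.99):
    """Exponential moving average of codeword usage; gamma is the smoothing coefficient."""
    return [gamma * s + (1.0 - gamma) * f for s, f in zip(freq_state, batch_freq)]

def anchor_distribution(codeword, latents):
    """Centered dot-product scores turned into probabilities (softmax is an assumption)."""
    scores = [sum(a * b for a, b in zip(codeword, h)) for h in latents]
    mean = sum(scores) / len(scores)
    centered = [s - mean for s in scores]
    top = max(centered)
    exps = [math.exp(c - top) for c in centered]
    total = sum(exps)
    return [e / total for e in exps]

def modulation(freq, K, strength=10.0, eps=1e-3):
    """HYPOTHETICAL modulation coefficient: near 1 when freq is large, small when freq is near 0."""
    return 1.0 - math.exp(-strength * K * freq - eps)

def relocate(codebook, freq_state, latents, rng):
    """Interpolate each codeword toward a sampled anchor, weighted by its modulation coefficient."""
    K = len(codebook)
    updated = []
    for e_k, f_k in zip(codebook, freq_state):
        probs = anchor_distribution(e_k, latents)
        anchor = rng.choices(latents, weights=probs, k=1)[0]
        a = modulation(f_k, K)
        updated.append([a * e + (1.0 - a) * z for e, z in zip(e_k, anchor)])
    return updated

rng = random.Random(0)
codebook = [[1.0, 0.0], [0.0, 1.0]]
latents = [[0.8, 0.6], [0.6, 0.8], [1.0, 0.0]]
state = [0.5, 0.5]
batch_freq = empirical_frequency([0, 0, 0, 0], K=2)     # codeword 1 is never selected
for _ in range(500):
    state = ema_update(state, batch_freq)
new_codebook = relocate(codebook, state, latents, rng)
```

In this toy run the frequently used codeword 0 stays essentially where it was, while the unused codeword 1 is pulled most of the way toward a sampled latent, which is the "dead code" reactivation the step describes.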

[0028] Step (24): To make the discrete representation recoverable in the continuous domain, a decoder is introduced that reconstructs the quantized representation into continuous segments. The decoder consists of a stacked multi-layer multi-head self-attention network and feedforward network, and it maps the latent representation back to local segments of the original numerical space through a linear regression head. The reconstructed patches are concatenated in chronological order to obtain the reconstructed sequence in the normalized space. RevIN inverse normalization is then applied to that sequence to recover the physical dimensions, yielding a reconstructed sequence at the physical scale, so that each discrete token corresponds to a reconstructible local dynamic structure.

[0029] Step (25): Train the time-series labeler with a vector quantization training objective that includes a reconstruction term, a codebook loss term, and a commitment loss term. The codebook and commitment terms are defined on the continuous latent vector output by the encoder and the quantized codeword vector, where sg(⋅) is the stop-gradient operator and the commitment term is weighted by a commitment coefficient. A straight-through estimator (STE) is used to approximate gradient propagation through the nearest neighbor quantization step, so that training can be optimized end to end.
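For orientation, an objective with these three terms is commonly written in the standard VQ-VAE form shown below. The symbols are assumptions introduced here, not the patent's notation: $X$ and $\hat{X}$ are the original and reconstructed sequences, $z_e$ is the encoder output, $z_q$ is the quantized codeword vector, and $\beta$ is the commitment coefficient. Any normalization by the number of segments or variables is not recoverable from the text and is omitted.

```latex
\mathcal{L}_{\mathrm{VQ}}
  = \underbrace{\lVert X - \hat{X} \rVert_2^2}_{\text{reconstruction}}
  + \underbrace{\lVert \operatorname{sg}(z_e) - z_q \rVert_2^2}_{\text{codebook}}
  + \beta \, \underbrace{\lVert z_e - \operatorname{sg}(z_q) \rVert_2^2}_{\text{commitment}},
\qquad
z_q^{\mathrm{STE}} = z_e + \operatorname{sg}(z_q - z_e)
```

The second expression is the usual straight-through form: the forward pass uses $z_q$, while the backward pass treats quantization as the identity so that gradients from the decoder reach the encoder.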

[0030] Step (3): Expand the discrete temporal labels and incorporate them into the unified vocabulary of the pre-trained language model to construct a conditional autoregressive generative model, and, given the conditional context C and the historical labels, generate the future discrete label sequence autoregressively by next-label prediction. During the training phase, the backbone parameters of the pre-trained language model are frozen, parameter-efficient adaptation is adopted, and row-level gradient masks are applied so that only the parameters corresponding to the newly added temporal symbols are trained. Step (3) includes the following steps.
Step (31): To bring the discrete time-series labels of the continuous power series into the unified symbol domain of the pre-trained language model, the base vocabulary of the pre-trained model is extended in a structured way: boundary markers and a time-series symbol set of size K (where K equals the codebook size) are appended after the base vocabulary, giving the extended vocabulary. The codebook indices generated by the time-series labeler are then offset-mapped to their indices in the extended vocabulary. After the vocabulary expansion, the parameters of the newly added temporal symbols in the input embedding matrix and the output projection matrix are initialized in a numerically stable way: they are set to the statistical center (centroid) of the original word embedding parameters, and a small perturbation is superimposed, so that the newly added symbols align with the pre-trained representation space in scale and distribution.
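The offset mapping and centroid initialization in Step (31) can be sketched as follows. The number of boundary markers (two here), the noise scale, and all function names are hypothetical choices for illustration; the text does not specify them.

```python
import random

def expand_vocabulary(base_size, num_boundary, K):
    """Append boundary markers and K temporal symbols after a base vocabulary.

    Returns the ids of the boundary markers, a function mapping a codebook
    index k to its id in the extended vocabulary, and the new vocabulary size.
    """
    boundary_ids = list(range(base_size, base_size + num_boundary))
    offset = base_size + num_boundary

    def to_vocab_id(k):
        if not 0 <= k < K:
            raise ValueError("codebook index out of range")
        return offset + k

    return boundary_ids, to_vocab_id, offset + K

def init_new_rows(embedding, num_new, noise_std=0.01, seed=0):
    """Initialize new rows at the centroid of the existing rows plus small Gaussian noise."""
    rng = random.Random(seed)
    dim = len(embedding[0])
    centroid = [sum(row[j] for row in embedding) / len(embedding) for j in range(dim)]
    return [[c + rng.gauss(0.0, noise_std) for c in centroid] for _ in range(num_new)]

# Toy setup: 5 base tokens, 2 boundary markers (assumed), K = 4 temporal symbols.
boundary_ids, to_vocab_id, new_size = expand_vocabulary(base_size=5, num_boundary=2, K=4)
base_embedding = [[0.1 * i, 0.2 * i, -0.1 * i] for i in range(5)]
new_rows = init_new_rows(base_embedding, num_new=new_size - 5)
extended_embedding = base_embedding + new_rows
```

Starting the new rows near the centroid keeps their scale comparable to existing embeddings, which is the stated purpose of the initialization.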

[0031] Step (32): To preserve the generative prior of the pre-trained language model, the backbone network parameters are frozen during the training phase, and low-rank increments are injected into the key linear projection layers of the Transformer using a parameter-efficient adaptation method. Let the original linear layer weight be $W_0$ and the input vector be $x$; the transformed representation is $h = W_0 x + \frac{\alpha}{r} B A x$, where $\alpha$ is the scaling factor, $r$ is the low rank, and $A$ and $B$ are the low-rank matrices. Only $A$ and $B$ are optimized during training.
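A minimal NumPy sketch of the low-rank increment in Step (32), assuming the commonly used LoRA form in which the update is scaled by the ratio of the scaling factor to the rank. The class name, the zero initialization of one factor, and the default values are illustrative assumptions.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W0 plus a trainable low-rank increment (alpha / r) * B @ A."""

    def __init__(self, W0, r=8, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W0.shape
        self.W0 = W0                                      # frozen backbone weight
        self.A = 0.01 * rng.standard_normal((r, d_in))    # trainable
        self.B = np.zeros((d_out, r))                     # trainable, zero at start
        self.scale = alpha / r

    def __call__(self, x):
        # h = W0 x + (alpha / r) * B (A x)
        return self.W0 @ x + self.scale * (self.B @ (self.A @ x))

    def n_trainable(self):
        return self.A.size + self.B.size

# Toy usage: a 64 x 32 projection adapted with rank 4
W0 = np.random.default_rng(2).standard_normal((64, 32))
layer = LoRALinear(W0, r=4, alpha=8.0)
x = np.ones(32)
h = layer(x)
```

Because one factor starts at zero, the adapted layer initially reproduces the frozen layer exactly, and the number of trainable parameters grows with r times the sum of the input and output widths rather than with their product.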

[0032] Step (33): For the newly added temporal symbol set, a row-level gradient mask is applied to the input embedding matrix E and the output projection matrix of the pre-trained language model so that only the rows corresponding to the newly added symbols are updated. Let the original gradient be G; the effective gradient then satisfies $\tilde{G} = M \odot G$, where $\odot$ denotes the Hadamard product and the mask matrix M satisfies $M_{ij} = 1$ if row $i$ corresponds to a newly added temporal symbol and $M_{ij} = 0$ otherwise. This restricts the fine-tuning gradient to the input embedding rows and output projection rows of the newly added temporal symbols.
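The row-level gradient mask of Step (33) amounts to an element-wise product between the gradient and a 0/1 matrix whose nonzero rows are the newly added symbols. The sketch below uses a plain gradient-descent step for illustration; the optimizer, learning rate, and function names are assumptions.

```python
import numpy as np

def row_mask(vocab_size, dim, new_ids):
    """Mask with ones on the rows of newly added symbols and zeros elsewhere."""
    M = np.zeros((vocab_size, dim))
    M[np.asarray(list(new_ids)), :] = 1.0
    return M

def masked_sgd_step(E, G, M, lr=0.1):
    """Apply the effective gradient M * G (Hadamard product) to E."""
    return E - lr * (M * G)

# Toy usage: 6-token vocabulary where tokens 4 and 5 are newly added
E = np.ones((6, 3))
G = np.full((6, 3), 2.0)
M = row_mask(6, 3, new_ids=[4, 5])
E_new = masked_sgd_step(E, G, M)
```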

[0033] Step (34): Under the conditional autoregressive generation paradigm with a unified vocabulary, the training objective is supervised with label masking. A heterogeneous causal input sequence $U = [C; S^{\mathrm{hist}}; S^{\mathrm{fut}}]$ is constructed, where C is the context semantics, $S^{\mathrm{hist}}$ is the historical time-series label sequence, and $S^{\mathrm{fut}}$ is the future target time-series label sequence. Because C and $S^{\mathrm{hist}}$ are given conditions, the generation loss is computed only over the position set $\Omega$ of the target segment, satisfying: $\mathcal{L}_{\mathrm{gen}} = -\mathbb{E}_{U \sim \mathcal{D}} \sum_{i \in \Omega} \log p_\theta(u_i \mid u_{<i})$, where $u_i$ is the i-th symbol of sequence U, $u_{<i}$ is its prefix, and $\mathcal{D}$ is the distribution of the training data.
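The label-masked objective of Step (34) can be sketched as a negative log-likelihood evaluated only at target positions. In this illustration, row i of the logits is assumed to already score token i given its prefix, and the loss is averaged over the target positions; both conventions, and the function names, are assumptions.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def masked_nll(logits, tokens, target_mask):
    """Mean negative log-likelihood over positions where target_mask is 1.

    logits: (L, V) scores, row i conditioned on the prefix tokens[:i]
    tokens: (L,) token ids of the full sequence (context, history, future)
    target_mask: (L,) 1 for future-target positions, 0 for condition positions
    """
    logp = log_softmax(np.asarray(logits, dtype=float))
    tokens = np.asarray(tokens)
    tok_logp = logp[np.arange(len(tokens)), tokens]
    mask = np.asarray(target_mask, dtype=float)
    return float(-(tok_logp * mask).sum() / mask.sum())

# Toy usage: 5 positions, vocabulary of 4, last two positions are targets
logits = np.zeros((5, 4))
tokens = [0, 1, 2, 3, 0]
mask = [0, 0, 0, 1, 1]
loss = masked_nll(logits, tokens, mask)
```

Changing the scores at condition positions leaves the loss unchanged, which is the practical meaning of supervising only the target segment.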

[0034] Step (4): In the inference stage, a generation prefix is constructed from C and the historical label sequence, and constrained decoding is applied to the generation process to limit the set of generated symbols and the generation length, yielding the future discrete label sequence. The future discrete label sequence is then passed through the decoder of the temporal tokenization mapping to reconstruct the continuous-domain prediction sequence, which gives the final prediction result.

[0035] Step (4) specifically includes the following steps: Step (41): In the inference phase, given the environmental semantic context C and the historical temporal tag sequence, a guiding prefix is constructed; a boundary marker at the end of the prefix explicitly marks the start of the prediction segment, separating the condition segment from the generation segment.

[0036] Step (42): To limit the set of generated symbols and the generation length, a constraint operator is applied to the unnormalized score vector $\ell_t$ output at each decoding step $t = 1, \dots, T_f$, satisfying: $\tilde{\ell}_t[v] = \ell_t[v]$ if $v \in \mathcal{V}_{\mathrm{time}}$, and $\tilde{\ell}_t[v] = -\infty$ otherwise, where $\mathcal{V}_{\mathrm{time}}$ is the index set of all temporal symbols in the extended vocabulary and $T_f$ is the number of temporal tokens in the prediction segment. That is, scores outside the candidate set are set to $-\infty$ before normalization.
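The constrained decoding of Steps (41) and (42) can be sketched as follows. The language model is replaced by a hypothetical callable `step_fn` that returns one score vector for the current sequence, and greedy selection is used for simplicity; the source does not specify the sampling rule, so both are assumptions.

```python
import numpy as np

def constrain(logits, allowed_ids):
    """Keep scores of allowed ids; set every other score to -inf."""
    logits = np.asarray(logits, dtype=float)
    out = np.full(logits.shape, -np.inf)
    out[allowed_ids] = logits[allowed_ids]
    return out

def softmax(x):
    """Normalize scores; entries at -inf receive probability zero."""
    e = np.exp(x - x.max())
    return e / e.sum()

def constrained_generate(step_fn, prefix, allowed_ids, n_steps):
    """Generate exactly n_steps tokens, each restricted to allowed_ids.

    step_fn(sequence) -> score vector over the extended vocabulary; it is a
    hypothetical stand-in for the fine-tuned language model.
    """
    seq = list(prefix)
    for _ in range(n_steps):
        probs = softmax(constrain(step_fn(seq), allowed_ids))
        seq.append(int(probs.argmax()))   # greedy choice (assumption)
    return seq[len(prefix):]

# Toy usage: vocabulary of 10 ids, temporal symbols occupy ids 6..9.
# The stand-in model prefers id 0, which the constraint forbids.
allowed = np.arange(6, 10)
toy_model = lambda seq: np.arange(10, 0, -1).astype(float)
future = constrained_generate(toy_model, prefix=[1, 2, 3], allowed_ids=allowed, n_steps=3)
```

Fixing the number of decoding steps bounds the generation length, and masking before normalization guarantees that every generated id can be mapped back to a codebook index.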

[0037] Step (43): After constrained autoregressive generation is complete, the offset is removed from the generated future time-series label sequence to recover the corresponding codebook indices, and the recovered codebook indices are input into the decoder of the temporal tokenization mapping for continuous-domain reconstruction, yielding the continuous-domain prediction sequence.

[0038] Based on a similar inventive concept, embodiments of the present invention also provide a computer storage medium storing a readable program that, when run by a processor, can execute the above-described renewable energy power prediction method.

[0039] Based on a similar inventive concept, this invention provides an electronic device, including: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other through the communication bus; the memory is used to store at least one executable instruction, which causes the processor to perform the operations corresponding to the above-described renewable energy power prediction method.

[0040] Based on a similar inventive concept, embodiments of the present invention also provide a computer program product, including computer instructions, which instruct a computing device to perform operations corresponding to the above-described renewable energy power prediction method.

[0041] Example 2. In this embodiment, the method of the present invention is illustrated through a specific example. A renewable energy power prediction method based on time-series discrete labeling and unified vocabulary autoregression includes the following steps: (1) The historical observation multivariate time series of the target power stations are taken from the publicly available China Renewable Energy Power Prediction Dataset (15-minute sampling interval), which includes power observations and corresponding meteorological observations of multiple photovoltaic and wind power stations. Specifically, the photovoltaic prediction task uses stations Solar_1, Solar_4, Solar_5, and Solar_8, and the wind power prediction task uses stations Wind_1, Wind_2, Wind_4, and Wind_5. For each station, the historical observation sequence is organized in chronological order, and the observation vector at any given time consists of the power value at that moment and the corresponding meteorological driving variables (i.e., multivariate input). A fixed-length historical window is used as the model input, a fixed number of future time steps is set as the prediction horizon, and the corresponding power prediction sequence is output.

[0042] The historical observation multivariate time series obtained in step (1) includes the historical power output data of the target station and exogenous meteorological driving variables related to power; the meteorological driving variables include, but are not limited to, temperature, humidity, wind speed, wind direction, air pressure, precipitation, and irradiance. The multivariate time series is formed by concatenating the power output and the exogenous meteorological driving variables along the variable dimension at each time step.
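The construction of model inputs from a station record can be sketched as follows. The window of 96 steps and the horizon of 96 steps match the long-horizon task reported later in this document (24 hours at 15-minute sampling); the stride of one step and the function name are assumptions.

```python
import numpy as np

def build_windows(power, weather, hist_len=96, horizon=96):
    """Slice a station record into (history, future power) training pairs.

    power:   (T,) power output per time step
    weather: (T, D) meteorological driving variables per time step
    Returns X of shape (n, hist_len, 1 + D) and Y of shape (n, horizon).
    """
    power = np.asarray(power, dtype=float)
    weather = np.asarray(weather, dtype=float)
    # Concatenate power and weather along the variable dimension
    series = np.concatenate([power[:, None], weather], axis=1)
    n = len(power) - hist_len - horizon + 1
    if n <= 0:
        raise ValueError("record is shorter than hist_len + horizon")
    X = np.stack([series[i:i + hist_len] for i in range(n)])
    Y = np.stack([power[i + hist_len:i + hist_len + horizon] for i in range(n)])
    return X, Y

# Toy usage: 200 time steps with 7 meteorological variables
T, D = 200, 7
power = np.arange(T, dtype=float)
weather = np.zeros((T, D))
X, Y = build_windows(power, weather)
```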

[0043] (2) Construct a time-series tokenization mapping $Q: \mathbb{R}^{L \times M} \to \{0, 1, \dots, K-1\}^{T}$, where Q is the mapping function that discretizes a continuous multivariate time series into a label sequence. The input is a real matrix of L time steps, each with M-dimensional variables, and the output is a discrete tag sequence of length T in which each tag takes a value from 0 to K-1, where K is the codebook size and T is the tag sequence length. The mapping is applied to the historical sequence and to the future sequence to obtain the discrete label sequence corresponding to the historical input and the discrete label sequence corresponding to the future target, respectively;

(3) Expand the discrete temporal tags and incorporate them into the unified vocabulary of the pre-trained language model to construct a conditional autoregressive generative model. Given the conditional context C and the historical tags, the model generates the future discrete label sequence autoregressively by next-label prediction. During the training phase, the backbone parameters of the pre-trained language model are frozen, parameter-efficient adaptation is adopted, and row-level gradient masks are applied so that the parameters corresponding to the newly added temporal symbols are trained;

(4) In the inference stage, a generation prefix is constructed from C and the historical label sequence, and constrained decoding is applied to the generation process to limit the set of generated symbols and the generation length, yielding the future discrete label sequence. The future discrete label sequence is then passed through the decoder of the temporal tokenization mapping to reconstruct the continuous-domain prediction sequence, which gives the final prediction result.

[0044] Example 3 In the power prediction framework based on tokenized autoregressive generation described in this invention, continuous time series are first quantized into discrete time series labels to support conditional generation based on a pre-trained language model. The discretization effect mainly depends on two aspects: firstly, reconstruction fidelity, i.e., whether the discrete representation can maintain the numerical amplitude and morphological characteristics of the original time series after decoding and reconstruction; secondly, codebook utilization, i.e., whether the nominal codebook size can be effectively converted into the actual usable symbol capacity. If a large number of codewords become inactive for a long time, the discrete representation space will degenerate into a low-dimensional symbol subspace, thereby limiting the generative model's ability to represent diverse operating conditions.

[0045] To verify whether the codebook size can be converted into effective capacity, three methods were compared under a unified training setting: the vector quantization method of this invention (including the utilization-aware codebook maintenance mechanism), a conventional vector quantization method, and a vector quantization method based on optimal transport distance constraints. The time-series tokenizer uses a 2-layer Transformer encoder and a 1-layer Transformer decoder with a hidden dimension of 512. The experiments were conducted on photovoltaic stations 4 and 8 and wind power stations 1 and 2.

[0046] As shown in Table 1, the vector quantization discretization method of this invention achieves better reconstruction accuracy across codebook sizes and test sites. When the codebook size is 256, the mean absolute error on photovoltaic site 4 is 0.006, a reduction of 57.1% and 68.4% compared to the conventional vector quantization method and the vector quantization method based on optimal transport distance constraints, respectively; on wind farm site 1, the corresponding reductions are 35.3% and 59.3%. When the codebook size increases from 128 to 2048, the mean absolute error of the method of this invention on wind farm site 1 decreases by a cumulative 46.2%, whereas the comparative methods show no significant improvement once the codebook size exceeds 512, indicating that they struggle to translate a larger codebook into higher quantization resolution and reconstruction accuracy.

[0047] To further explain the above differences, the codebook activation during training was analyzed statistically. The results show that, for codebook sizes of 128 and above, the codeword utilization rate of the method of this invention is higher than 98.83%, indicating that the nominal capacity of the codebook is almost fully converted into an effective symbol set. In contrast, the comparative methods exhibit a significant capacity collapse as the codebook size increases. When the codebook size is 2048, the codeword activation rates of the conventional vector quantization method and the vector quantization method based on optimal transport distance constraints are only about 1% and 3.5%, respectively. Taking photovoltaic power station 4 as an example, when the codebook size increases 16-fold, the number of codewords actually used in encoding by the conventional vector quantization method only increases from about 15 to 22, an effective capacity increase of less than 50%. This mismatch between nominal and effective capacity confines the discrete representation of the comparative methods to a low-dimensional symbol subspace that barely grows with the codebook size, which limits the improvement in reconstruction performance under large codebooks.
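Codeword utilization as discussed above can be measured by counting how many codebook entries are selected at least once. The sketch below assumes the labels are collected over the evaluation data as integer indices; the function name is illustrative.

```python
import numpy as np

def codebook_utilization(indices, K):
    """Return (number of active codewords, utilization in percent)."""
    counts = np.bincount(np.asarray(indices).ravel(), minlength=K)
    active = int((counts > 0).sum())
    return active, 100.0 * active / K

# Toy usage: four labels drawn from a codebook of size 4
active, pct = codebook_utilization([0, 0, 1, 3], K=4)
```

As a cross-check of the figures quoted above, 22 active codewords out of a nominal 2048 correspond to roughly 1.07%, in line with the reported activation rate of about 1% for the conventional method.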

[0048] Table 1. Reconstruction accuracy and codebook utilization (%) under different codebook sizes K.

[0049] Furthermore, in this embodiment, the portability of the proposed prediction method across different large language model backbone networks is analyzed. To evaluate how well the method generalizes across large language model architectures, comparative experiments were conducted in which only the backbone network was replaced, while the discretization strategy, generative modeling process, and parameter-efficient fine-tuning scheme were held fixed. This embodiment selects the Qwen3 series (0.6B, 1.7B, 4B) and the Llama-3.2 series (1B, 3B) as representatives, covering different model sizes and architecture types. As shown in Table 2, despite significant differences in the scale and structure of the backbone networks, the proposed prediction method achieved stable MAE, RMSE, and goodness-of-fit ($R^2$) performance in both photovoltaic and wind power scenarios. This indicates that the performance improvement mainly stems from reformulating power prediction as a conditional autoregressive generation task over discrete time-series tokens, rather than from a specific large language model architecture. Furthermore, performance does not increase monotonically with model size; in some scenarios, medium-sized backbone networks achieve results comparable to or even better than larger models. This is consistent with the inherent randomness of renewable energy power sequences and suggests that carefully designed time-series discretization and constrained decoding strategies matter more than simply increasing model size. Considering prediction performance, computational cost, and parameter count, all subsequent experiments in this invention use Qwen3-0.6B-Base as the backbone network.

[0050] Table 2. Portability across different large language model backbone networks

In this embodiment, a LoRA configuration sensitivity analysis is performed on the proposed prediction method. To analyze the impact of the intrinsic dimension in parameter-efficient fine-tuning, this embodiment evaluates sensitivity to the LoRA rank $r$ (with the scaling factor set to [value missing]). The results show that performance improves rapidly as $r$ increases and then gradually saturates. On the Solar_1 dataset, when $r$ increased from 8 to 32, the MAE decreased by 20.5% and the $R^2$ improved from 0.920 to 0.943. On the Wind_4 dataset, performance also improved significantly, reaching its peak at $r$ = [value missing]. When $r$ was further increased to 128, Wind_4's $R^2$ decreased to 0.622, indicating that over-parameterization may introduce computational redundancy and increase the risk of overfitting to noise. In summary, a small number of trainable parameters is sufficient to adapt the pre-trained prior effectively, and excessively large ranks do not provide sustained performance improvements. Considering both prediction performance and computational efficiency, unless otherwise specified, the LoRA rank is fixed at [value missing] in all subsequent experiments of this invention.

[0051] Table 3. Sensitivity analysis of LoRA hyperparameters

In this embodiment, the impact of vocabulary size on the proposed prediction method is analyzed. The codebook size $K$ determines the nominal capacity of the discrete representation space. To analyze the relationship between representation resolution and prediction performance, comparative experiments over different values of $K$ were conducted with the block partitioning strategy, backbone network architecture, and training process held fixed. As shown in Table 4, small-to-medium codebooks achieved the best performance: Solar_1 performs best at $K$ = [value missing], while Wind_4 performs best at $K$ = [value missing], indicating that a suitably sized codebook is sufficient to cover the main operating modes and their nonlinear evolution. Performance does not improve monotonically as $K$ increases; when $K$ reaches [value missing], the prediction error actually increases. This can be attributed to the activation sparsity and reduced utilization caused by large codebooks under limited data, and to the semantic instability of long-tail tokens, which expands the generation state space and exacerbates long-horizon error accumulation. Nevertheless, overall performance changes relatively smoothly. For example, Solar_1's $R^2$ stays consistently above 0.93, indicating that the model has a degree of robustness to this hyperparameter. Considering both accuracy and efficiency, $K$ = [value missing] provides a better tradeoff between performance and overhead for the current task.

[0052] Table 4. Sensitivity analysis of codebook size

In this embodiment, a block length sensitivity analysis is performed on the proposed prediction method. To evaluate the impact of the block length on performance, comparative experiments were conducted with all other configurations held constant. The results show that smaller blocks generally yield higher prediction accuracy, and that performance is essentially saturated at a block length of [value missing], although a block length of [value missing] gave slightly better results on some metrics. Smaller blocks provide higher temporal resolution, which helps model non-stationary fluctuations such as photovoltaic ramps and sudden wind speed changes and effectively reduces quantization error. In contrast, at larger block lengths ([value missing]), the smoothing effect within each block weakens high-frequency information, leading to error accumulation and performance degradation. Because a very small block length significantly increases the sequence length and decoding overhead, unless otherwise specified, all subsequent experiments in this invention use a block length of [value missing] as the default to balance accuracy and computational efficiency.

[0053] Table 5. Sensitivity analysis of block length

In this embodiment, the proposed prediction method is compared against other methods. Table 6 presents the long-horizon prediction results of each method for a 24-hour (96-step) prediction task with a fixed input length of 96. The method of this invention achieves the best mean absolute error, root mean square error, and goodness of fit on all eight test sites, indicating a stable and consistent performance advantage in both photovoltaic and wind power scenarios. Using the best-performing comparative model at each site as a reference, the method of this invention reduces the mean absolute error from 0.111 to 0.082 (a reduction of 26.3%) and the root mean square error from 0.175 to 0.131 (a reduction of 24.9%), and raises the goodness of fit from 0.560 to 0.734 (an absolute increase of 0.174). Furthermore, compared with the large-model prediction comparison method that performed best at each site, the method of this invention still maintains a significant advantage: the mean absolute error decreased from 0.113 to 0.082 (a reduction of 27.1%), the root mean square error decreased from 0.182 to 0.131 (a reduction of 27.7%), and the goodness of fit increased from 0.497 to 0.734 (an absolute increase of 0.237). These results indicate that the performance improvement is not solely due to the large model backbone itself, but is closely related to the consistent alignment between the discrete time-series label representation and the generative prediction mechanism adopted in this invention. In the 24-hour multi-step rolling prediction scenario, this alignment effectively suppresses the prediction trajectory drift caused by multi-step error accumulation, yielding consistent and significant improvements in both error-related and fit-related indicators.
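The three indicators used throughout the comparison (mean absolute error, root mean square error, and goodness of fit) can be computed as follows. The goodness of fit is assumed here to be the standard coefficient of determination; the source does not spell out its definition.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(y_hat))))

def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy usage: one of four predictions is off by 1
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 5.0])
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```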

[0054] Table 6. Long-horizon prediction results for the next 96 steps (24 hours) with the input length fixed at 96 steps

To verify the impact of discretization quality, the generative decoding mechanism, pre-training initialization, and textual conditional information on prediction performance in the method of this invention, multiple sets of comparative experiments were set up with the same dataset partitioning, the same input length and prediction horizon, the same number of training epochs, and the same optimization configuration, and the prediction error and goodness of fit were compared in each case.

[0055] 1) Comparison of the relationship between codebook utilization and prediction performance: With the end-to-end prediction framework held fixed, the utilization-aware codebook maintenance quantization strategy of this invention was compared with Comparison Scheme 1 (conventional vector quantization strategy) and Comparison Scheme 2 (vector quantization strategy based on optimal transport distance constraints). The experimental results are shown in Table 7. The quantization strategy of this invention achieved lower mean absolute error and root mean square error at all test sites. Compared with Comparison Scheme 1, at photovoltaic sites 4 / 8, the mean absolute error decreased by 15.6% / 21.9%, and the root mean square error decreased by 16.3% / 24.2%, respectively; at wind farm sites 1 / 2, the mean absolute error decreased by 13.5% / 13.7%, and the root mean square error decreased by 13.8% / 11.5%, respectively; a consistent error reduction was also observed relative to Comparison Scheme 2. Therefore, this invention improves prediction performance by increasing the effective activation ratio of codewords, allowing the nominal codebook capacity to be more fully converted into effective symbol capacity.

[0056] Table 7. Impact of Discretization Strategy on Predictive Performance of Representative Solar and Wind Power Sites

2) Comparison between generative decoding and continuous regression output: With the same discretized input, two output methods were compared. Comparison Scheme 1 is a continuous regression output method, in which the model output is regressed directly to the power sequence through a linear mapping and trained with a mean squared error loss. The scheme of the present invention is a generative output method, in which future time-series symbols are generated autoregressively in the unified vocabulary space and continuous predicted values are obtained through inverse quantization and decoding reconstruction. The experimental results are shown in Table 8. The generative output method of the present invention is significantly better than Comparison Scheme 1 in long-horizon multi-step prediction: the mean absolute error / root mean square error is reduced by 58.2% / 47.8% on photovoltaic station 4 and by 54.3% / 45.1% on photovoltaic station 5, and by 31.6% / 24.8% on wind farm station 1 and by 32.9% / 24.1% on wind farm station 2. Expressing the prediction process as autoregressive generation in a unified vocabulary, in combination with the discrete representation, therefore helps suppress trajectory drift caused by multi-step error accumulation and improves the stability of long-horizon prediction.

[0057] Table 8. Impact of Output Head Architecture on Prediction Performance

3) Comparison of the impact of pre-training initialization: Under the same network structure and training configuration, the prediction performance of pre-trained parameter initialization (the default setting of this invention) and random initialization (Comparison Scheme 1) were compared. The results are shown in Table 9. Pre-trained initialization achieved better performance at all sites: in the photovoltaic scenario, the mean absolute error and root mean square error were reduced by an average of about 26.5%, and the goodness of fit exceeded 0.90; the advantage remained clear in the wind power scenario, where, for example, the root mean square error was reduced by 11.1% and the goodness of fit was improved by 30.5% at wind farm 5. The pre-trained prior therefore improves the statistical stability of the autoregressive generation process and alleviates error accumulation in long-horizon prediction.

[0058] Table 9. Impact of Pre-training Initialization on Prediction Performance

4) Comparison of the contributions of textual conditional information: This invention provides a template for generating natural language prompts, as shown in Figure 3. The template organizes information such as the forecasting task, historical time windows, historical data patterns, future weather scenarios, and forecasting instructions into structured natural language text. Combining the template shown in Figure 3 with specific input data yields a complete example prompt, as shown in Figure 4. In practical applications, the large model takes the prompt shown in Figure 4 as contextual input, thereby enabling conditional prediction of future time series.
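A prompt of this kind could be assembled along the following lines. Figures 3 and 4 are not reproduced in this text, so the wording, field order, and function name below are entirely hypothetical; only the five categories of information (task, history window, history pattern, future weather, instruction) come from the description above.

```python
def build_prompt(site, hist_len, horizon, power_hist, weather_forecast):
    """Assemble a structured natural-language context for one forecast.

    All phrasing here is illustrative; it is not the template of Figure 3.
    """
    lo, hi = min(power_hist), max(power_hist)
    mean = sum(power_hist) / len(power_hist)
    lines = [
        f"Task: forecast the power output of station {site}.",
        f"History window: the last {hist_len} steps at 15-minute resolution.",
        f"History pattern: min {lo:.3f}, max {hi:.3f}, mean {mean:.3f}.",
        f"Future weather: {weather_forecast}",
        f"Instruction: generate the next {horizon} time-series tokens.",
    ]
    return "\n".join(lines)

# Toy usage with made-up inputs
prompt = build_prompt("Solar_1", 96, 96, [0.0, 0.5, 1.0], "clear sky, light wind")
```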

[0059] Based on the above-mentioned prompt word construction method, to evaluate the effect of text conditional information, a configuration including text conditional information (the present invention's scheme) and a configuration removing text conditional information (comparison scheme 1) were compared. The results are shown in Table 10. Introducing text conditional information brings consistent improvements in both photovoltaic and wind power scenarios: on photovoltaic site 8, the mean absolute error and root mean square error decreased by 24.7% and 32.0%, respectively, with a 5.7% improvement in goodness of fit; on wind power site 5, the mean absolute error and root mean square error decreased by 22.0% and 22.5%, respectively, with an approximately 13.7% improvement in goodness of fit. This demonstrates that text conditional information can enhance conditional constraints and stabilize the generation process, thereby further improving prediction accuracy and fitting ability.

[0060] Table 10. Impact of Text Prompts on Prediction Performance

Example 4. In this embodiment, a renewable energy power prediction device based on time-series discrete labels is proposed, specifically including:

A multivariate sequence construction module, used to obtain the historical observation multivariate time series of the target station, wherein the multivariate time series is formed by concatenating power output and meteorological driving variables along the variable dimension at each time step;

A temporal tagger construction module, used to construct and train the temporal tagging mapping, including: dividing the multivariate time series into blocks and encoding them to obtain a continuous latent representation; normalizing the continuous latent representation and the introduced learnable codebook respectively; and using nearest neighbor search to assign discrete indices to obtain a discrete temporal tag matrix;

A unified vocabulary autoregressive training module, used to expand the discrete temporal tags output by the trained temporal tagging mapping and incorporate them into the unified vocabulary of the pre-trained language model to construct a conditional autoregressive generative model, and to fine-tune the conditional autoregressive generative model with the backbone network parameters of the pre-trained language model frozen;

A constrained reasoning and reconstruction module, used to generate future discrete label sequences autoregressively with the trained conditional autoregressive generative model, given the environmental semantic context and the historical temporal label sequence obtained through the temporal tagging mapping, and to input the generated future discrete label sequence into the decoder of the temporal tagging mapping to reconstruct the prediction sequence in the continuous domain.

[0061] The methods of the present invention can be implemented in hardware or firmware, as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or as computer code that is originally stored on a remote recording medium or a non-transitory machine-readable medium, downloaded via a network, and subsequently stored on a local recording medium. Thus, the methods described herein can be processed by software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It is understood that the computer, processor, microprocessor controller, or programmable hardware includes storage components (e.g., RAM, ROM, flash memory, etc.) capable of storing or receiving software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the methods described herein. Furthermore, when a general-purpose computer accesses the code used to implement the methods shown herein, the execution of the code transforms the general-purpose computer into a dedicated computer for performing the methods shown herein.

[0062] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely illustrative of the principles of the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the claimed invention.

Claims

1. A renewable energy power prediction method based on time-series discrete labeling, characterized in that it includes the following steps: obtaining a historical observation multivariate time series of the target station, wherein the multivariate time series is formed by concatenating power output and meteorological driving variables along the variable dimension at each time step; constructing and training a temporal labeling mapping, including: dividing the multivariate time series into blocks and encoding them to obtain a continuous latent representation; normalizing the continuous latent representation and an introduced learnable codebook respectively; and using nearest neighbor search to assign discrete indices to obtain a discrete temporal label matrix; expanding the discrete temporal tags output by the trained temporal labeling mapping and incorporating them into the unified vocabulary of a pre-trained language model to construct a conditional autoregressive generative model, and fine-tuning the conditional autoregressive generative model with the backbone network parameters of the pre-trained language model frozen; and, given the environmental semantic context and the historical time-series label sequence obtained through the temporal labeling mapping, generating a future discrete label sequence autoregressively with the trained conditional autoregressive generative model, and inputting the generated future discrete label sequence into the decoder of the temporal labeling mapping to reconstruct the prediction sequence in the continuous domain.

2. The renewable energy power prediction method based on time-series discrete labeling as described in claim 1, characterized in that, during the training of the temporal tokenization mapping, utilization-aware dynamic codebook maintenance is performed, including: computing the empirical frequency of the codewords and maintaining the long-term frequency state of the codewords with an exponential moving average; calculating similarity scores between the codewords and the normalized continuous latent representation, constructing a sampling distribution from them, and obtaining anchor points in the latent space based on the sampling distribution; and adaptively adjusting a modulation coefficient according to the long-term frequency state, and using the modulation coefficient together with the anchor points to perform interpolation updates on low-frequency codewords.

3. The renewable energy power prediction method based on time-series discrete labeling according to claim 2, characterized in that:
the long-term frequency state of each codeword is maintained by an exponential moving average according to the formula [missing information], which is expressed in terms of a smoothing coefficient and the empirical frequency of the codeword;
the adaptively adjusted modulation coefficient satisfies the formula [missing information], which is expressed in terms of an intensity constant, a numerical-stability term, and the size of the codebook; and
the interpolation update of a low-frequency codeword follows the formula [missing information], which is applied to the embedding vector corresponding to that codeword in the codebook.
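
By way of non-limiting illustration, the codebook maintenance of claims 2 and 3 may be sketched as follows. Because the formulas are not reproduced in this text, every expression below is an assumption: the moving-average form, the exponential-decay form of the modulation coefficient, the softmax sampling distribution over dot-product scores, and all names and default values (`gamma`, `strength`, `eps`) are hypothetical.

```python
import math
import random

def ema_update(freq_state, empirical_freq, gamma=0.99):
    """Assumed moving average: new = gamma * old + (1 - gamma) * observed."""
    return [gamma * f + (1.0 - gamma) * p for f, p in zip(freq_state, empirical_freq)]

def modulation(freq_state, gamma=0.99, strength=10.0, eps=1e-3):
    """Assumed coefficient: near 1 for rarely used codewords, near 0 for busy ones."""
    size = len(freq_state)
    return [math.exp(-f * size * strength / (1.0 - gamma) - eps) for f in freq_state]

def sample_anchor(codeword, latents, rng):
    """Sample one latent vector with probability given by a softmax of dot-product scores."""
    scores = [sum(a * b for a, b in zip(codeword, z)) for z in latents]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(latents, weights=weights, k=1)[0]

def maintain_codebook(codebook, latents, assignments, freq_state, rng, gamma=0.99):
    """One maintenance step: update usage statistics, then interpolate toward anchors."""
    counts = [0] * len(codebook)
    for idx in assignments:
        counts[idx] += 1
    empirical = [c / len(assignments) for c in counts]
    freq_state = ema_update(freq_state, empirical, gamma)
    alphas = modulation(freq_state, gamma)
    new_codebook = []
    for codeword, alpha in zip(codebook, alphas):
        anchor = sample_anchor(codeword, latents, rng)
        new_codebook.append([(1.0 - alpha) * e + alpha * a for e, a in zip(codeword, anchor)])
    return new_codebook, freq_state

rng = random.Random(0)
codebook = [[1.0, 0.0], [0.0, 1.0]]
latents = [[0.9, 0.1], [0.8, 0.2]]   # already-normalized latents are assumed
assignments = [0, 0]                 # codeword 1 is never selected
new_codebook, freq_state = maintain_codebook(codebook, latents, assignments, [0.5, 0.0], rng)
```

Under these assumptions the frequently used codeword is left essentially unchanged, while the unused codeword is pulled almost entirely onto a sampled anchor, which is the intended effect of updating only low-frequency codewords.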

4. The renewable energy power prediction method based on time-series discrete labeling according to claim 1, characterized in that dividing the multivariate time series into blocks and encoding them to obtain a continuous latent representation includes:
applying reversible instance normalization to the multivariate time series to obtain a normalized sequence;
dividing the normalized sequence along the time dimension into non-overlapping time segments of length [missing information];
applying a linear projection to each time segment to obtain a segment embedding tensor; and
inputting the segment embedding tensor into a Transformer encoder comprising a multi-layer multi-head self-attention network, computing attention along the segment dimension, and outputting the continuous latent representation.
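
By way of non-limiting illustration, the first two steps of claim 4 (reversible instance normalization and non-overlapping segmentation) may be sketched as follows. The sketch assumes per-variable mean and standard deviation statistics without learnable affine parameters, and assumes the series length is a multiple of the segment length; the linear projection and Transformer encoder are omitted. All function names are hypothetical.

```python
import math

def revin_normalize(series, eps=1e-5):
    """Normalize each variable of one series instance and keep the statistics.

    series: list of T time steps, each a list of C variable values.
    """
    steps, channels = len(series), len(series[0])
    means = [sum(row[c] for row in series) / steps for c in range(channels)]
    stds = [
        math.sqrt(sum((row[c] - means[c]) ** 2 for row in series) / steps + eps)
        for c in range(channels)
    ]
    normed = [[(row[c] - means[c]) / stds[c] for c in range(channels)] for row in series]
    return normed, (means, stds)

def revin_denormalize(normed, stats):
    """Invert revin_normalize using the stored statistics."""
    means, stds = stats
    return [[v * stds[c] + means[c] for c, v in enumerate(row)] for row in normed]

def split_segments(series, segment_len):
    """Cut the series into non-overlapping segments along the time dimension."""
    if len(series) % segment_len != 0:
        raise ValueError("series length must be a multiple of segment_len")
    return [series[i:i + segment_len] for i in range(0, len(series), segment_len)]

# Toy series: 8 time steps, 2 variables (for example power and one weather variable).
series = [[float(t), 10.0 + 2.0 * t] for t in range(8)]
normed, stats = revin_normalize(series)
segments = split_segments(normed, segment_len=4)
restored = revin_denormalize(normed, stats)
```

Keeping the statistics is what makes the normalization reversible: the decoder output can be mapped back to physical units with `revin_denormalize`.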

5. The renewable energy power prediction method based on time-series discrete labeling according to claim 1, characterized in that the temporal tokenization mapping is trained with an objective function comprising a reconstruction term, a codebook loss term, and a commitment loss term: [missing information], wherein the quantities appearing in the objective function are: the original sequence; the reconstructed sequence obtained from the decoder of the temporal tokenization mapping followed by inverse normalization; the continuous latent vector; the quantized codeword vector; the stop-gradient operator; the commitment coefficient; M, the variable dimension corresponding to each time segment; and N, the number of time segments after partitioning.
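
By way of non-limiting illustration, one commonly used form of a vector-quantization objective with the three terms named in claim 5 is given below. The symbols $X$, $\hat{X}$, $z_e$, $z_q$, $\mathrm{sg}[\cdot]$ and $\beta$ are assumed names for the original sequence, the reconstructed sequence, the continuous latent vector, the quantized codeword vector, the stop-gradient operator and the commitment coefficient; the normalization by $NM$ and the relative weighting of the terms are likewise assumptions, since the claimed formula is not reproduced in this text.

```latex
\mathcal{L}
  = \underbrace{\frac{1}{NM}\,\bigl\lVert X - \hat{X} \bigr\rVert_2^2}_{\text{reconstruction}}
  + \underbrace{\bigl\lVert \mathrm{sg}[z_e] - z_q \bigr\rVert_2^2}_{\text{codebook loss}}
  + \underbrace{\beta\,\bigl\lVert z_e - \mathrm{sg}[z_q] \bigr\rVert_2^2}_{\text{commitment loss}}
```

In this form the codebook term moves codewords toward the encoder outputs, while the commitment term keeps the encoder outputs close to their assigned codewords.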

6. The renewable energy power prediction method based on time-series discrete labeling according to claim 1, characterized in that extending the unified vocabulary with the discrete temporal tokens includes:
appending boundary markers and the temporal symbol set to the base vocabulary of the pre-trained language model to obtain an extended vocabulary; and
initializing the parameters of the newly added temporal symbols in the input embedding matrix and in the output projection matrix from the statistical center of the original word embedding parameters, with an added perturbation.
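
By way of non-limiting illustration, the vocabulary extension and initialization of claim 6 may be sketched as follows. The sketch assumes that the statistical center is the mean of the original embedding rows and that the perturbation is small Gaussian noise; the symbol spellings `<ts_bos>`, `<ts_eos>` and `<ts_k>` and the function names are hypothetical.

```python
import random

def extend_vocab(base_vocab, codebook_size):
    """Append boundary markers and one symbol per codebook index to the vocabulary."""
    new_symbols = ["<ts_bos>", "<ts_eos>"] + [f"<ts_{k}>" for k in range(codebook_size)]
    return base_vocab + new_symbols, new_symbols

def extend_embeddings(embeddings, num_new, noise_std=0.02, seed=0):
    """Add num_new rows initialized at the mean of the existing rows plus Gaussian noise."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    center = [sum(row[d] for row in embeddings) / len(embeddings) for d in range(dim)]
    new_rows = [
        [center[d] + rng.gauss(0.0, noise_std) for d in range(dim)]
        for _ in range(num_new)
    ]
    return embeddings + new_rows

base_vocab = ["hello", "world", "sun"]
embeddings = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]   # toy input embedding matrix

vocab, new_symbols = extend_vocab(base_vocab, codebook_size=4)
extended = extend_embeddings(embeddings, num_new=len(new_symbols))
```

The same initialization would be applied to the output projection matrix. Starting the new rows near the center of the existing embeddings keeps their scale comparable to the pre-trained rows, and the perturbation prevents all new symbols from starting identical.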

7. The renewable energy power prediction method based on time-series discrete labeling according to claim 6, characterized in that the fine-tuning includes:
applying row-level gradient mask matrices to the input embedding matrix and the output projection matrix of the pre-trained language model, so that the effective gradient used in fine-tuning is the Hadamard product of the original gradient and the mask matrix, wherein the elements of the mask matrix equal 1 only in the rows corresponding to the newly added temporal symbol set and equal 0 at all other positions;
constructing a heterogeneous causal input sequence U according to [missing information], from: the environmental semantic context, composed of the history window, the forecast window, the day/night cycle, the weather forecast, and the constraints; the historical temporal token sequence; and the future target temporal token sequence; and
applying a label masking mechanism that takes the heterogeneous causal input sequence as input and computes the generation loss only over the set of positions of the future target temporal token segment, according to [missing information], which is expressed in terms of the i-th symbol of the heterogeneous causal input sequence U, the prefix of that symbol, and the training data distribution D.
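
By way of non-limiting illustration, the row-level gradient masking and the label masking of claim 7 may be sketched as follows. The sketch assumes a mean negative log-likelihood as the generation loss, and assumes that `logits_per_position[i]` holds the model scores for the symbol at position i given its prefix; all names are hypothetical.

```python
import math

def mask_gradient(grad, trainable_rows):
    """Hadamard product of the gradient with a row-level 0/1 mask.

    Rows listed in trainable_rows (the newly added temporal symbols) keep
    their gradient; every other row is zeroed.
    """
    return [
        list(row) if i in trainable_rows else [0.0] * len(row)
        for i, row in enumerate(grad)
    ]

def masked_generation_loss(logits_per_position, sequence, target_positions):
    """Mean negative log-likelihood computed only at the target positions."""
    total = 0.0
    for pos in target_positions:
        logits = logits_per_position[pos]
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(v - top) for v in logits))
        total += log_norm - logits[sequence[pos]]
    return total / len(target_positions)

# Gradient of a 3-row embedding matrix; only row 2 is a newly added symbol.
grad = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
effective_grad = mask_gradient(grad, trainable_rows={2})

# Sequence of 4 symbols over a 3-symbol vocabulary: context and history occupy
# positions 0 and 1, the future target segment occupies positions 2 and 3.
sequence = [0, 1, 2, 2]
logits_per_position = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss = masked_generation_loss(logits_per_position, sequence, target_positions=[2, 3])
```

With uniform scores at the two target positions, the loss equals the logarithm of the vocabulary size, and the scores at the context and history positions have no influence on it.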

8. The renewable energy power prediction method based on time-series discrete labeling according to claim 7, characterized in that, when the future discrete token sequence is generated autoregressively, constrained decoding is applied to restrict the set of symbols that can be generated: at each decoding step, a constraint operator is applied to the unnormalized score vector output by the model, and Softmax normalization is then performed; the constraint operator satisfies [missing information], which is expressed in terms of the temporal token length corresponding to the prediction segment, and sets the score of every symbol that does not belong to the candidate set to [missing information].
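
By way of non-limiting illustration, the constrained decoding of claim 8 may be sketched as follows. The sketch assumes that excluded symbols receive a score of negative infinity, that the candidate set consists of the temporal symbols for the first `target_len` steps, and that only an end boundary marker is allowed afterwards; these choices and all names are hypothetical, since the claimed operator is not reproduced in this text.

```python
import math

def candidate_set(step, target_len, temporal_ids, end_id):
    """Assumed rule: temporal symbols until target_len tokens exist, then the end marker."""
    return set(temporal_ids) if step < target_len else {end_id}

def constrained_softmax(logits, allowed_ids):
    """Set scores outside allowed_ids to -inf, then apply Softmax."""
    if not allowed_ids:
        raise ValueError("the candidate set must not be empty")
    masked = [v if i in allowed_ids else -math.inf for i, v in enumerate(logits)]
    top = max(masked)
    exps = [math.exp(v - top) for v in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary: id 0 is an ordinary word, ids 1 and 2 are temporal symbols,
# id 3 is the end boundary marker. The prediction segment has 2 temporal tokens.
logits = [2.0, 1.0, 0.5, 0.0]
probs_step0 = constrained_softmax(logits, candidate_set(0, 2, [1, 2], 3))
probs_step2 = constrained_softmax(logits, candidate_set(2, 2, [1, 2], 3))
```

Even though the ordinary word has the highest raw score, it receives zero probability at every step, so the generated sequence can always be passed to the decoder of the temporal tokenization mapping.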

9. A renewable energy power prediction device based on time-series discrete labeling, for implementing the method described in any one of claims 1-8, characterized in that it comprises:
a multivariate sequence construction module, configured to obtain a historical observed multivariate time series of a target station, wherein the multivariate time series is formed by concatenating, at each time step, the power output and the meteorological driving variables along the variable dimension;
a temporal tokenizer construction module, configured to construct and train a temporal tokenization mapping, including: dividing the multivariate time series into blocks and encoding them to obtain a continuous latent representation; normalizing the continuous latent representation and an introduced learnable codebook respectively; and assigning discrete indices by nearest-neighbor search to obtain a discrete temporal token matrix;
a unified-vocabulary autoregressive training module, configured to extend the unified vocabulary of a pre-trained language model with the discrete temporal tokens output by the trained temporal tokenization mapping to construct a conditional autoregressive generative model, and to fine-tune the conditional autoregressive generative model while the backbone network parameters of the pre-trained language model are frozen; and
a constrained inference and reconstruction module, configured to generate a future discrete token sequence autoregressively with the trained conditional autoregressive generative model, given an environmental semantic context and the historical temporal token sequence obtained through the temporal tokenization mapping, and to input the generated future discrete token sequence into the decoder of the temporal tokenization mapping to reconstruct the prediction sequence in the continuous domain.

10. A computer storage medium storing a readable program, characterized in that, when the program is run, it instructs a computing device to perform the renewable energy power prediction method based on time-series discrete labeling as described in any one of claims 1-8.