Short-term temperature prediction method, device and medium based on a quantum-enhanced Transformer

By using a quantum-enhanced Transformer model combined with a spatiotemporal encoder based on 3D convolution and periodically initialized position encoding, the problems of high computational resource consumption and slow prediction speed in short-term temperature prediction are solved, enabling efficient and accurate feature extraction and real-time prediction of temperature data.

CN122309975A · Pending · Publication Date: 2026-06-30 · 上海量感智能科技有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
上海量感智能科技有限公司
Filing Date
2026-03-26
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies consume large amounts of computational resources and have slow prediction speeds in short-term temperature forecasting, making it difficult to meet real-time requirements. Furthermore, they are not accurate enough in capturing specific change patterns in temperature data and cannot effectively handle complex nonlinear relationships.

Method used

A lightweight model based on a quantum-enhanced Transformer is adopted, with a spatiotemporal encoder that integrates three-dimensional convolution and periodically initialized position encoding. A quantum-enhanced feedforward network with residual connections is introduced, and features are extracted through a quantum convolutional attention module to optimize the spatial and temporal features of temperature data.

Benefits of technology

It improves the ability to accurately extract the spatiotemporal characteristics and complex nonlinear relationships of temperature fields, reduces computational complexity and memory usage, meets the real-time requirements of short-term temperature forecasting, and is suitable for scenarios such as power dispatching, agricultural disaster prevention, and urban management.

✦ Generated by Eureka AI based on patent content.

Figure CN122309975A_ABST

Abstract

This invention relates to a method, device, and medium for short-term temperature prediction based on a quantum-enhanced Transformer. The method involves acquiring multi-source historical temperature data, constructing a spatiotemporal dataset for a single temperature mode, and inputting it into a quantum-enhanced lightweight Transformer model. A quantum convolutional attention module is used to extract features from the input spatiotemporal dataset, yielding quantum-enhanced attention features. These attention features are then input into a spatiotemporal encoder to extract spatial and temporal features of the temperature data, obtaining an encoded spatiotemporal feature representation. This encoded spatiotemporal feature representation is then input into a quantum-enhanced feedforward network, where parameterized quantum gates are used for feature optimization, resulting in optimized quantum-enhanced features. Finally, the output layer performs decoding and mapping to output the predicted temperature for the near future. Compared to existing technologies, this invention offers advantages such as high accuracy, strong adaptability, and high efficiency.

Description

Technical Field

[0001] This invention relates to the field of meteorological forecasting technology, and in particular to a short-term temperature forecasting method, device and medium based on a quantum-enhanced Transformer.

Background Technology

[0002] Accurate short-term temperature forecasts are crucial in all aspects of modern society, including agricultural production, energy management, transportation, and disaster early warning. Traditional temperature forecasting methods primarily rely on numerical weather prediction models, which make predictions by solving complex physical equations. However, these methods are computationally expensive and slow, making it difficult to meet the real-time requirements of short-term temperature forecasts.

[0003] In recent years, with the development of artificial intelligence technology, temperature prediction methods based on deep learning have gradually become a research hotspot. Among existing technologies, researchers have proposed temperature prediction models based on convolutional neural networks and recurrent neural networks, such as ConvLSTM and TrajGRU, which capture temperature change patterns by processing time-series data. Other studies have attempted to apply the Transformer architecture to weather forecasting, utilizing attention mechanisms to capture long-range dependencies in meteorological data.

[0004] However, existing deep learning methods still have the following limitations in short-term temperature prediction: first, the models have a large number of parameters and high computational complexity, making it difficult to meet the needs of real-time prediction; second, they are not accurate enough in capturing specific change patterns in temperature data, and their feature extraction capabilities are limited; third, traditional neural networks cannot effectively handle the complex nonlinear relationships in temperature data, which limits prediction accuracy. Therefore, how to construct a temperature prediction method that can efficiently handle the complex nonlinear relationships in temperature data and meet the real-time requirements of short-term prediction is a technical problem that needs to be solved.

Summary of the Invention

[0005] The purpose of this invention is to overcome the shortcomings of the existing technology and provide a short-term temperature prediction method, device and medium based on a quantum-enhanced Transformer. Targeting the distinctive spatiotemporal correlation characteristics and periodic physical laws of temperature data, a lightweight spatiotemporal encoder integrating three-dimensional convolution and periodically initialized position encoding is designed within the quantum-enhanced Transformer model. Furthermore, a quantum-enhanced feedforward network is introduced whose quantum gate rotation angles are initialized according to the daily temperature cycle and which incorporates residual connections. This solves the technical problems of destroying the data structure, blind parameter initialization, and unstable training that general Transformer architectures and quantum circuits face when processing temperature data, achieving accurate extraction of the spatiotemporal characteristics and complex nonlinear relationships of the temperature field.

[0006] The objective of this invention can be achieved through the following technical solutions. According to one aspect of the present invention, a short-term temperature prediction method based on a quantum-enhanced Transformer is provided, the specific steps of which include:
S1. Obtain multi-source historical temperature data, perform data cleaning and normalization on it, and construct a spatiotemporal dataset for a single temperature mode.
S2. Input the temperature data in the spatiotemporal dataset into the quantum-enhanced lightweight Transformer model, and extract features from the input spatiotemporal dataset through the quantum convolutional attention module to obtain quantum-enhanced attention features.
S3. Input the quantum-enhanced attention features into the spatiotemporal encoder of the quantum-enhanced lightweight Transformer model to extract the spatial and temporal features of the temperature data and obtain the encoded spatiotemporal feature representation.
S4. Input the encoded spatiotemporal feature representation into the quantum-enhanced feedforward network of the quantum-enhanced lightweight Transformer model, and perform feature optimization through its parameterized quantum gate circuits to obtain optimized quantum-enhanced features.
S5. Based on the optimized quantum-enhanced features, the output layer performs decoding and mapping to output the temperature prediction for the near future.

[0007] Furthermore, in S1, the specific steps for constructing the spatiotemporal dataset of a single temperature mode include: Multi-source temperature data are collected, including temperature observations from ground meteorological stations, satellite remote-sensing temperature retrievals, and temperature field data from numerical weather prediction. The collected data are cleaned, and missing data are filled using a hybrid interpolation method that combines spatial inverse distance weighted interpolation with temporal cubic spline interpolation. The completed temperature data are then normalized. A fixed-length sliding window samples the continuous temperature time series to construct sample pairs of input sequences and prediction targets, which form the spatiotemporal dataset of a single temperature mode.

[0008] Furthermore, in step S2, before the spatiotemporal dataset is input into the quantum-enhanced lightweight Transformer model, a position information injection operation is performed on it, specifically including: Sample data are obtained from the spatiotemporal dataset, with dimensions of batch size, sequence length, and feature dimension. The sample data are input into a linear projection layer, which maps their feature dimension to the hidden dimension of the quantum-enhanced lightweight Transformer model to obtain the projected data. A learnable position encoding vector is generated for each position in the sequence, with a dimension matching the feature dimension of the projected data. The projected data and the learnable position encoding vectors are added element-wise to obtain the model input tensor with injected position information, which is fed into the quantum-enhanced lightweight Transformer model for subsequent feature extraction.
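The position-injection steps above can be illustrated with a shape-level NumPy sketch. It assumes a single linear layer `x @ W + b` for the projection and a small random initialization of the position table; the names `inject_position`, `W`, `b` and `pos_table` and the sizes are hypothetical, and training of the parameters is not shown.

```python
import numpy as np

def inject_position(x, W, b, pos_table):
    """x: (batch, m, n) -> (batch, m, d_model) with position information added."""
    projected = x @ W + b                    # linear projection of the feature dimension n -> d_model
    return projected + pos_table[None]       # the same (m, d_model) table is added to every sample

batch, m, n, d_model = 4, 12, 9, 16          # hypothetical sizes
rng = np.random.default_rng(0)
W = rng.normal(scale=n ** -0.5, size=(n, d_model))
b = np.zeros(d_model)
pos_table = rng.normal(scale=0.02, size=(m, d_model))   # one learnable vector per position 0..m-1
x = rng.normal(size=(batch, m, n))
model_input = inject_position(x, W, b, pos_table)
print(model_input.shape)                     # (4, 12, 16)
```

Because the table is broadcast over the batch axis, every sample receives the same encoding for a given position, which is what lets the model tell position 0 from position m-1.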

[0009] Furthermore, the specific steps by which the quantum convolutional attention module in S2 extracts quantum-enhanced attention features from the spatiotemporal dataset include: Temperature data are obtained from the spatiotemporal dataset, with dimensions of batch size, spatial length, spatial width, spatial height, and time step. The temperature data are input into the convolutional layer, which extracts spatial local features within each time step to obtain convolutional feature maps. The convolutional feature maps are stacked along the time dimension so that each time step remains independent. The stacked feature maps are input into the quantum-enhanced channel attention submodule, where parameterized quantum circuits model the dependencies between feature channels to generate channel attention weights, which are applied to the original feature maps along the channel dimension. The channel-weighted feature maps are then input into the quantum-enhanced temporal attention submodule, where parameterized quantum circuits model the importance of different time steps to generate temporal attention weights, which are applied to the original feature maps along the temporal dimension. Finally, the feature maps weighted by channel attention and by temporal attention are fused to obtain the quantum-enhanced attention features.
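The channel- and temporal-attention steps can be sketched with a tiny state-vector simulator written in NumPy. None of the following comes from the text; all are assumptions: each channel (or time step) is pooled to one scalar and angle-encoded on its own qubit as RY(π·tanh(·)); the trainable part is layers of RY rotations followed by a ring of CNOTs; the Pauli-Z expectation ⟨Z⟩ is mapped to a weight via (1 + ⟨Z⟩)/2; the temporal weights are applied to the channel-weighted maps; and the two weighted maps are fused by summation. The function names are hypothetical, the circuit needs at least two qubits (channels or time steps), and the convolution that produces the stacked maps is not shown.

```python
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_ry(psi, q, theta):
    # Contract the gate with qubit axis q, then move the new axis back to position q.
    return np.moveaxis(np.tensordot(ry(theta), psi, axes=([1], [q])), 0, q)

def apply_cnot(psi, c, t):
    psi = psi.copy()
    idx = [slice(None)] * psi.ndim
    idx[c] = 1                                          # amplitudes with the control qubit in |1>
    sub = psi[tuple(idx)].copy()
    psi[tuple(idx)] = np.flip(sub, axis=t - (t > c))    # swap target |0>/|1> inside that slice
    return psi

def pqc_attention(descriptors, thetas):
    """One qubit per descriptor (needs >= 2); returns weights in [0, 1] from <Z>."""
    n = len(descriptors)
    psi = np.zeros([2] * n)
    psi[(0,) * n] = 1.0                                 # start in |0...0>
    for q, angle in enumerate(np.pi * np.tanh(descriptors)):
        psi = apply_ry(psi, q, angle)                   # angle encoding of the data
    for layer in thetas:                                # trainable layers: RY per qubit + CNOT ring
        for q in range(n):
            psi = apply_ry(psi, q, layer[q])
        for q in range(n):
            psi = apply_cnot(psi, q, (q + 1) % n)
    probs = psi ** 2                                    # amplitudes stay real with RY and CNOT only
    z = np.array([np.moveaxis(probs, q, 0).reshape(2, -1).sum(axis=1) @ np.array([1.0, -1.0])
                  for q in range(n)])
    return (1.0 + z) / 2.0

def quantum_conv_attention(feat, theta_c, theta_t):
    """feat: (C, T, *spatial) stacked convolutional feature maps of one sample."""
    C, T = feat.shape[:2]
    spatial = tuple(range(2, feat.ndim))
    a_c = pqc_attention(feat.mean(axis=(1,) + spatial), theta_c)     # one weight per channel
    f_c = feat * a_c.reshape((C,) + (1,) * (feat.ndim - 1))
    a_t = pqc_attention(f_c.mean(axis=(0,) + spatial), theta_t)      # one weight per time step
    f_t = f_c * a_t.reshape((1, T) + (1,) * (feat.ndim - 2))
    return f_c + f_t                                                 # fusion by summation (assumed)

rng = np.random.default_rng(1)
feat = rng.normal(size=(3, 4, 5, 5))                  # C=3 channels, T=4 steps, 5x5 grid (synthetic)
fused = quantum_conv_attention(feat, rng.normal(size=(1, 3)), rng.normal(size=(1, 4)))
```

The CNOT ring is what lets one channel's descriptor influence another channel's weight; without it, each qubit would be measured independently and the circuit could not model inter-channel dependencies.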

[0010] Furthermore, the spatiotemporal encoder in S3 adopts a stacked encoder-layer structure: it uses three-dimensional convolution kernels to extract the spatial features and temporal correlations of the temperature field simultaneously, and an adaptive pooling layer to unify temperature data from different sources to a fixed size. The position encoding used in the spatiotemporal encoder is initialized as a function of the daily and annual temperature cycles. The specific steps by which a single encoder layer of the spatiotemporal encoder processes the quantum-enhanced attention features include: The quantum-enhanced attention features are input into the multi-head attention module and linearly projected through parameter-shared query, key, and value weight matrices to obtain shared projected attention features. The initial attention output is calculated from the shared projected attention features using a multi-head attention mechanism. During training, the contribution of each attention head to the prediction loss is evaluated; the top 70% of attention heads are retained and the remaining heads are masked, yielding the pruned attention output. The pruned attention output is residually connected with the quantum-enhanced attention features to obtain the first residual output, which is layer-normalized to obtain the normalized attention features. The normalized attention features are input into the quantum-enhanced feedforward network to obtain the quantum-enhanced feature output, which is residually connected with the normalized attention features to obtain the intermediate feature representation output by the current encoder layer. Multiple encoder layers are stacked, and the intermediate feature representation output by the last encoder layer is the encoded spatiotemporal feature representation in S3.
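The encoder-layer steps can be sketched in NumPy as follows. The sketch assumes that "parameter-shared" means a single matrix `W` serves as the query, key and value projection, that a head's contribution to the prediction loss is measured by ablation (the loss increase when that head alone is masked), and that "top 70%" is rounded to the nearest whole head. The feedforward network is passed in as a plain callable placeholder, and all function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def shared_mha(x, W, n_heads, head_mask):
    """x: (m, d_model); one matrix W is reused for the Q, K and V projections."""
    m, d = x.shape
    dh = d // n_heads
    proj = (x @ W).reshape(m, n_heads, dh).transpose(1, 0, 2)       # (heads, m, dh)
    q = k = v = proj                                                # parameter sharing
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))          # (heads, m, m)
    heads = (attn @ v) * head_mask[:, None, None]                   # masked heads output zero
    return heads.transpose(1, 0, 2).reshape(m, d)

def encoder_layer(x, W, n_heads, head_mask, ffn):
    h = layer_norm(x + shared_mha(x, W, n_heads, head_mask))        # first residual + LayerNorm
    return h + ffn(h)                                               # residual around the FFN

def head_importance(loss_fn, n_heads):
    """Contribution of each head = loss increase when only that head is masked."""
    base = loss_fn(np.ones(n_heads))
    scores = np.empty(n_heads)
    for h in range(n_heads):
        mask = np.ones(n_heads)
        mask[h] = 0.0
        scores[h] = loss_fn(mask) - base
    return scores

def prune_mask(scores, keep_ratio=0.7):
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    mask = np.zeros(len(scores))
    mask[np.argsort(scores)[::-1][:n_keep]] = 1.0                   # keep the most important heads
    return mask

rng = np.random.default_rng(0)
m, d_model, n_heads = 6, 8, 4
x = rng.normal(size=(m, d_model))
W = rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
target = rng.normal(size=(m, d_model))

def ffn(h):
    return 0.1 * h                                                  # placeholder for the quantum FFN

def loss_fn(mask):
    return np.mean((encoder_layer(x, W, n_heads, mask, ffn) - target) ** 2)

mask = prune_mask(head_importance(loss_fn, n_heads))
out = encoder_layer(x, W, n_heads, mask, ffn)
```

Sharing `W` across Q, K and V cuts the projection parameters to one third, and the fixed mask computed during training can simply be reused at inference.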
Further, the quantum-enhanced feedforward network includes a classical-to-quantum encoding layer, multiple quantum layers, and a quantum-to-classical decoding layer; the classical-to-quantum encoding layer maps classical temperature features to a low-dimensional quantum space through linear layers; the multiple quantum layers include RX, RY, and RZ parameterized quantum gates and CNOT entanglement gates; the quantum-to-classical decoding layer maps quantum measurement results back to the classical feature space.

[0011] Furthermore, the processing of the multiple quantum layers includes: The low-dimensional quantum-space features output by the classical-to-quantum encoding layer are obtained, and their numerical range is limited by the tanh function. The quantum device is initialized and the quantum state is reset. The range-limited data are encoded into the quantum state as the rotation angles of RY gates. RX, RY, and RZ parameterized gates and CNOT entanglement gates are applied alternately across the multiple quantum layers, with the rotation angles of the quantum gates initialized according to the typical period of temperature change. The expectation values of all qubits are measured and combined to obtain the quantum output features. A residual connection is introduced between the quantum circuit and the quantum-to-classical decoding layer.
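The following self-contained NumPy state-vector sketch illustrates the quantum-enhanced feedforward network described in the two paragraphs above. Illustrative assumptions: 4 qubits and 2 quantum layers; a tanh-limited value x encoded as RY(π·x); a CNOT ring for entanglement; random linear encoding and decoding matrices; and a hypothetical `periodic_angle_init` that spreads the initial rotation angles over the hourly phases of a 24-hour cycle (one possible reading of "initialized according to the typical period"). The residual connection adds the circuit input to the measured ⟨Z⟩ vector before decoding. This is not the patent's implementation, and it uses exact expectation values, so measurement shot noise is not modeled.

```python
import numpy as np

def rx(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    return np.array([[np.exp(-0.5j * t), 0], [0, np.exp(0.5j * t)]])

def apply_1q(psi, gate, q):
    return np.moveaxis(np.tensordot(gate, psi, axes=([1], [q])), 0, q)

def apply_cnot(psi, c, t):
    psi = psi.copy()
    idx = [slice(None)] * psi.ndim
    idx[c] = 1                                            # control qubit in |1>
    sub = psi[tuple(idx)].copy()
    psi[tuple(idx)] = np.flip(sub, axis=t - (t > c))      # flip the target qubit there
    return psi

def expect_z(psi):
    """<Z> of every qubit, from the measurement probabilities."""
    p = np.abs(psi) ** 2
    return np.array([np.moveaxis(p, q, 0).reshape(2, -1).sum(axis=1) @ np.array([1.0, -1.0])
                     for q in range(psi.ndim)])

def periodic_angle_init(n_layers, n_qubits, period_hours=24.0):
    """Hypothetical: spread initial angles over the hourly phases of a 24 h cycle."""
    hours = np.arange(n_layers * n_qubits * 3) % period_hours
    return (2.0 * np.pi * hours / period_hours).reshape(n_layers, n_qubits, 3)

def quantum_circuit(x, angles):
    n = len(x)
    psi = np.zeros([2] * n, dtype=complex)
    psi[(0,) * n] = 1.0                                   # reset to |0...0>
    for q in range(n):
        psi = apply_1q(psi, ry(np.pi * x[q]), q)          # RY angle encoding
    for layer in angles:                                  # RX, RY, RZ per qubit, then CNOT ring
        for q in range(n):
            for gate, theta in zip((rx, ry, rz), layer[q]):
                psi = apply_1q(psi, gate(theta), q)
        for q in range(n):
            psi = apply_cnot(psi, q, (q + 1) % n)
    return expect_z(psi)

class QuantumFFN:
    def __init__(self, d_model, n_qubits=4, n_layers=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=d_model ** -0.5, size=(d_model, n_qubits))    # classical -> quantum
        self.W_out = rng.normal(scale=n_qubits ** -0.5, size=(n_qubits, d_model))  # quantum -> classical
        self.angles = periodic_angle_init(n_layers, n_qubits)

    def __call__(self, h):
        """h: (m, d_model) -> (m, d_model)."""
        z = np.tanh(h @ self.W_in)                        # low-dimensional features limited to (-1, 1)
        q = np.stack([quantum_circuit(row, self.angles) for row in z])
        return (q + z) @ self.W_out                       # residual around the circuit, then decode
```

The state vector has 2^n entries, so this simulation is practical only for a handful of qubits, which matches the "low-dimensional quantum space" the text describes.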

[0012] Furthermore, the quantum-enhanced lightweight Transformer model is trained with a joint loss function combining mean squared error and mean absolute error, and gradient accumulation and mixed-precision training are applied to accelerate convergence; the prediction horizon ranges from 0.5 to 2 hours.

According to a second aspect of the invention, an electronic device is provided, including a memory and a processor, wherein the memory stores a computer program and the processor executes the program to implement the method described above.
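Here is a minimal sketch of the joint MSE + MAE objective and of gradient accumulation, using a linear model in NumPy as a stand-in for the Transformer. The weights `alpha` and `beta`, the micro-batch size and the accumulation count are assumptions, since the text does not give them. Mixed-precision training is framework-specific (for example `torch.autocast` in PyTorch) and is not shown.

```python
import numpy as np

def joint_loss(pred, y, alpha=0.5, beta=0.5):
    """alpha * MSE + beta * MAE and its (sub)gradient with respect to pred."""
    err = pred - y
    loss = alpha * np.mean(err ** 2) + beta * np.mean(np.abs(err))
    grad = (2.0 * alpha * err + beta * np.sign(err)) / err.size
    return loss, grad

def train_with_accumulation(X, y, lr=0.05, micro_batch=8, accum_steps=4, epochs=50):
    """Linear stand-in model: gradients of accum_steps micro-batches are averaged per update."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        g_acc, pending = np.zeros_like(w), 0
        for s in range(0, len(X), micro_batch):
            xb, yb = X[s:s + micro_batch], y[s:s + micro_batch]
            _, dpred = joint_loss(xb @ w, yb)
            g_acc += xb.T @ dpred                         # accumulate instead of stepping
            pending += 1
            if pending == accum_steps:
                w -= lr * g_acc / accum_steps             # one update per accum_steps micro-batches
                g_acc, pending = np.zeros_like(w), 0
        if pending:                                       # flush a partial accumulation at epoch end
            w -= lr * g_acc / pending
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = X @ np.array([2.0, -1.0])                             # synthetic linear target
w = train_with_accumulation(X, y)
```

Accumulation gives the update the statistics of a larger effective batch (here 4 × 8 = 32 samples) while holding only one micro-batch in memory at a time.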

[0013] According to a third aspect of the present invention, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the method described above.

[0014] Compared with the prior art, the present invention has the following beneficial effects: (1) Improvement effect on the mismatch between model architecture and temperature spatiotemporal data: In view of the unique spatiotemporal correlation characteristics of temperature data, this invention designs a dedicated lightweight spatiotemporal encoder in the quantum-enhanced Transformer model. It extracts the spatial distribution features and temporal evolution of the temperature field simultaneously through three-dimensional convolution kernels, and unifies the multi-source heterogeneous temperature data to a fixed size through adaptive pooling layers. Compared with the scheme of directly flattening input or using a general encoder, this invention can better preserve the spatial topology and temporal continuity of the temperature field and avoid information loss caused by destroying the original structure of the data. On this basis, the position encoding is initialized as a function related to the daily and annual cycles of temperature, so that the model has a priori knowledge of the periodic change of temperature in the early stage of training. This solves the structural destruction and computational redundancy problems of the general Transformer architecture when processing three-dimensional spatiotemporal temperature data, and improves the model's ability to extract the spatiotemporal features of the temperature field.

[0015] (2) Improvement effect on the disconnect between quantum circuits and meteorological data characteristics: This invention customizes the design of the quantum-enhanced feedforward network to the physical characteristics of temperature data. When the quantum circuit is constructed, the rotation angles of the quantum gates are initialized according to the typical cycle of temperature change (such as the 24-hour daily cycle), so that the quantum circuit can capture the intrinsic patterns of temperature evolution more quickly. At the same time, a residual connection is introduced between the quantum circuit and the classical decoding layer to counteract the random perturbations introduced by quantum measurement and to stabilize the gradient flow during training. This overcomes the technical obstacles of blind initialization of quantum circuit parameters and poor integration with temperature data characteristics, so that the quantum enhancement mechanism acts precisely on nonlinear feature extraction from temperature data and improves the model's ability to fit complex temperature change patterns.

[0016] (3) Improvement effect on the contradiction between model computational efficiency and real-time prediction: This invention integrates multiple lightweight designs into the quantum-enhanced Transformer model and adopts parameter sharing in the multi-head attention mechanism to reduce the number of learnable parameters. It introduces a dynamic pruning strategy for attention heads based on their importance to the temperature prediction task: the contribution of each attention head to the prediction loss is evaluated during training, and heads with low contribution are masked during inference. At the same time, the quantum-enhanced feedforward network uses the parallelism of quantum computing to perform feature optimization in a low-dimensional quantum space, avoiding the large number of parameter computations that traditional fully connected layers perform in high-dimensional space. Together, these designs significantly reduce the computational complexity and memory footprint of the model while maintaining prediction accuracy, solving the technical problems of large parameter counts, slow inference, and difficulty in meeting the real-time requirements of short-term temperature prediction in existing deep learning methods, so that this invention can be applied to temperature prediction scenarios with strict timeliness requirements, such as power dispatching, agricultural disaster prevention, and urban management.

Attached Figure Description

[0017] Figure 1 is a flowchart of a short-term temperature prediction method based on a quantum-enhanced Transformer.

Detailed Implementation

[0018] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0019] Current temperature forecasting technologies primarily rely on numerical weather prediction models. However, these models suffer from drawbacks such as high computational resource consumption, slow forecasting speed, and difficulty in meeting real-time temperature forecasting requirements for short periods, such as two hours. Temperature forecasting using deep learning algorithms also faces limitations, including a large number of model parameters, inaccurate capture of specific patterns in temperature data, and difficulty in effectively handling complex and unique nonlinear relationships within temperature data.

[0020] This embodiment optimizes a quantum-enhanced lightweight Transformer model for single-temperature data, achieving a balance between prediction accuracy and efficiency in short-term temperature prediction through quantum enhancement mechanisms and lightweight design. This embodiment constructs a spatiotemporal dataset of a single temperature mode by acquiring multi-source historical temperature data, and sequentially processes it through quantum convolutional attention module feature extraction, spatiotemporal encoder spatiotemporal feature extraction, quantum-enhanced feedforward network feature optimization, and output layer decoding mapping, ultimately outputting the temperature prediction result for the near future. Addressing the unique spatiotemporal correlation characteristics of temperature data, a complete quantum enhancement processing flow is constructed. The quantum convolutional attention module enhances the perception of local features of the temperature field, the spatiotemporal encoder preserves the spatial topology and temporal continuity of the temperature data, and the quantum-enhanced feedforward network utilizes parameterized quantum gate circuits to improve the fitting ability to complex nonlinear relationships. This solves the structural destruction and information loss problems existing in traditional methods when processing three-dimensional spatiotemporal temperature data, achieving accurate extraction of the spatiotemporal features and complex change patterns of the temperature field.

[0021] As shown in Figure 1, this embodiment provides a short-term temperature prediction method based on a quantum-enhanced Transformer, and the specific steps include:
S1. Obtain multi-source historical temperature data, perform data cleaning and normalization on it, and construct a spatiotemporal dataset for a single temperature mode.
S2. Input the temperature data in the spatiotemporal dataset into the quantum-enhanced lightweight Transformer model, and extract features from the input spatiotemporal dataset through the quantum convolutional attention module to obtain quantum-enhanced attention features.
S3. Input the quantum-enhanced attention features into the spatiotemporal encoder of the quantum-enhanced lightweight Transformer model to extract the spatial and temporal features of the temperature data and obtain the encoded spatiotemporal feature representation.
S4. Input the encoded spatiotemporal feature representation into the quantum-enhanced feedforward network of the quantum-enhanced lightweight Transformer model, and perform feature optimization through its parameterized quantum gate circuits to obtain optimized quantum-enhanced features.
S5. Based on the optimized quantum-enhanced features, the output layer performs decoding and mapping to output the temperature prediction for the near future.

[0022] In S1, the specific steps for constructing a spatiotemporal dataset for a single temperature mode include: Multi-source temperature data are collected, including temperature observations from ground meteorological stations, satellite remote-sensing temperature retrievals, and temperature field data from numerical weather prediction. The collected data are cleaned, and missing data are filled using a hybrid interpolation method that combines spatial inverse distance weighted interpolation with temporal cubic spline interpolation. The completed temperature data are then normalized. A fixed-length sliding window samples the continuous temperature time series to construct sample pairs of input sequences and prediction targets, which form the spatiotemporal dataset of a single temperature mode.

[0023] Specifically, if data from a station are missing, the missing meteorological element values can be estimated from the values at other known stations within a certain radius (50 kilometers in this example). The estimation weight is inversely proportional to the square of the distance between the station with missing data and each known station:

$$w_i = \frac{1/d_i^{2}}{\sum_{j=1}^{k} 1/d_j^{2}}, \qquad \hat{T}_s = \sum_{i=1}^{k} w_i T_i$$

where w_i is the estimation weight of known station i, d_i is the distance between the station with missing data and known station i, k is the number of known stations within the radius, T_i is the value observed at known station i, and \hat{T}_s is the spatial interpolation result, i.e., the weighted sum of the known stations' values. The final completion value is a weighted sum of the spatial and temporal interpolation results, where the weights are determined from the distribution density of the surrounding stations and the stability of the data.
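The hybrid completion step can be sketched with NumPy alone. `idw_estimate` implements the inverse-square-distance weighting above with the 50 km radius. `natural_cubic_spline` is a textbook natural cubic spline (zero second derivative at both ends) used for the temporal estimate. `hybrid_fill` blends the two with a fixed weight `lam`, which is a simplifying assumption: the text derives this weight from station density and data stability. Function names and example values are hypothetical.

```python
import numpy as np

def idw_estimate(distances_km, values, radius_km=50.0, power=2.0):
    """Spatial estimate: weights proportional to 1/d^2 over known stations within radius_km."""
    d = np.asarray(distances_km, dtype=float)
    v = np.asarray(values, dtype=float)
    use = (d > 0) & (d <= radius_km)
    if not use.any():
        return np.nan                                  # no neighbour in range: rely on temporal fill
    w = 1.0 / d[use] ** power
    return float(np.sum(w * v[use]) / np.sum(w))       # normalized weighted sum

def natural_cubic_spline(t_known, y_known, t_query):
    """Temporal estimate: natural cubic spline (M_0 = M_n = 0) evaluated at t_query."""
    x = np.asarray(t_known, dtype=float)
    y = np.asarray(y_known, dtype=float)
    n = len(x) - 1
    h = np.diff(x)
    A = np.eye(n + 1)                                  # rows 0 and n enforce zero end curvature
    b = np.zeros(n + 1)
    for i in range(1, n):
        A[i, i - 1:i + 2] = [h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]]
        b[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, b)                          # second derivatives at the knots
    tq = np.atleast_1d(np.asarray(t_query, dtype=float))
    i = np.clip(np.searchsorted(x, tq) - 1, 0, n - 1)  # interval index for each query point
    a, c, hi = tq - x[i], x[i + 1] - tq, h[i]
    return ((M[i] * c ** 3 + M[i + 1] * a ** 3) / (6.0 * hi)
            + (y[i] - M[i] * hi ** 2 / 6.0) * c / hi
            + (y[i + 1] - M[i + 1] * hi ** 2 / 6.0) * a / hi)

def hybrid_fill(spatial, temporal, lam=0.5):
    """Weighted sum of both estimates; lam is a fixed, assumed weight."""
    if np.isnan(spatial):
        return temporal
    return lam * spatial + (1.0 - lam) * temporal

spatial = idw_estimate([10.0, 20.0, 80.0], [20.0, 30.0, 5.0])   # the 80 km station is outside the radius
temporal = natural_cubic_spline([0, 1, 3, 4], [18.0, 19.0, 21.0, 22.0], 2.0)[0]
filled = hybrid_fill(spatial, temporal, lam=0.5)
```

With stations at 10 km and 20 km, the inverse-square weights are 0.8 and 0.2, so the nearer station dominates the spatial estimate.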

[0024] In addition, outlier detection and correction processing are required for the completed temperature data. Reasonable ranges for each meteorological element in the multi-source temperature data are set based on physical meaning. Obvious outliers that exceed the range are removed, and the data is completed using a hybrid interpolation method consistent with the missing data. Potential outliers within the range are identified using the Z-score method and smoothed using the mean or interpolation results of adjacent time points.
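Here is a sketch of the two-stage outlier handling, assuming a plausible-temperature range of -60 to 60 °C and a Z-score threshold of 3 (neither value is given in the text). Out-of-range values are set to NaN so that the same hybrid interpolation used for missing data can refill them. In-range values with |z| > 3 are replaced by the mean of their valid neighbouring time points. The function name is hypothetical.

```python
import numpy as np

def detect_and_correct(series, valid_range=(-60.0, 60.0), z_thresh=3.0):
    """Returns (corrected series, hard-outlier mask, soft-outlier mask)."""
    x = np.asarray(series, dtype=float).copy()
    lo, hi = valid_range
    hard = (x < lo) | (x > hi)                 # physically implausible values
    x[hard] = np.nan                           # left for the hybrid interpolation step
    valid = ~np.isnan(x)
    mu, sigma = x[valid].mean(), x[valid].std()
    z = np.zeros_like(x)
    if sigma > 0:
        z[valid] = np.abs(x[valid] - mu) / sigma
    soft = z > z_thresh                        # in range but statistically unusual
    for k in np.flatnonzero(soft):
        nbrs = [x[j] for j in (k - 1, k + 1)
                if 0 <= j < len(x) and valid[j] and not soft[j]]
        if nbrs:
            x[k] = np.mean(nbrs)               # smooth with adjacent time points
    return x, hard, soft

series = np.full(21, 20.0)                     # synthetic hourly temperatures
series[5], series[15] = 35.0, 99.0             # a spike inside the range and an impossible value
cleaned, hard, soft = detect_and_correct(series)
```

Removing the hard outliers before computing the mean and standard deviation keeps an impossible value such as 99 °C from inflating σ and masking the in-range spike.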

[0025] First, categorical time elements in the completed temperature data, such as the day of the week, are one-hot encoded into binary vectors. This preserves their discrete nature and avoids implying a spurious order or magnitude among the categories. Second, difference features are constructed by calculating the first difference of the target variable, i.e., the temperature value, yielding instantaneous temperature change features that capture trends and abrupt changes in temperature between adjacent time points. Finally, sliding-window statistical features are extracted: for key meteorological elements such as temperature, statistics within a fixed-length historical window are calculated, including the sliding-window mean, maximum, and minimum. For example, the average temperature over the past 3 hours is calculated as a moving-average feature to smooth short-term random noise and reflect local temperature trends. All the features constructed above form the original feature set used for subsequent normalization.
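The feature-construction steps (one-hot weekday, first difference, trailing 3-step window statistics) can be sketched as follows, assuming hourly data so that a 3-hour window covers three time steps. Two choices are assumptions not stated in the text: the first difference at the first time step is set to 0, and windows at the start of the series are truncated rather than dropped. Function names are hypothetical.

```python
import numpy as np

def one_hot_weekday(weekday):
    w = np.asarray(weekday, dtype=int)
    out = np.zeros((len(w), 7))
    out[np.arange(len(w)), w] = 1.0            # one binary column per day of the week
    return out

def first_difference(temp):
    t = np.asarray(temp, dtype=float)
    return np.concatenate([[0.0], np.diff(t)])  # no predecessor at step 0: set to 0 (assumed)

def rolling_stats(temp, window=3):
    t = np.asarray(temp, dtype=float)
    mean, mx, mn = np.empty_like(t), np.empty_like(t), np.empty_like(t)
    for k in range(len(t)):
        win = t[max(0, k - window + 1):k + 1]  # trailing window, truncated at the start
        mean[k], mx[k], mn[k] = win.mean(), win.max(), win.min()
    return mean, mx, mn

def build_features(temp, weekday, window=3):
    mean, mx, mn = rolling_stats(temp, window)
    return np.column_stack([temp, first_difference(temp), mean, mx, mn, one_hot_weekday(weekday)])

temp = np.array([10.0, 12.0, 15.0, 13.0])      # hourly temperatures (synthetic)
weekday = np.array([0, 0, 1, 1])               # 0 = Monday (assumed convention)
features = build_features(temp, weekday)       # columns: temp, diff, mean, max, min, 7 one-hot
```

Only past values enter each window, so no feature at time t uses information from after t.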

[0026] To eliminate the effect of differing units and numerical ranges among features on model training and to accelerate convergence, the original feature set is normalized. Meteorological element data, including the originally collected temperature, humidity, and air pressure as well as the constructed difference and sliding-window statistical features, are Z-score normalized: the mean and standard deviation of each feature are calculated on the training set, and the feature is transformed to have a mean of 0 and a variance of 1. Time elements (year, month, and day) and geospatial elements (longitude, latitude, and elevation) are Min-Max normalized to the [0, 1] interval:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where x_min and x_max are the lower and upper bounds of the normalization range. The normalization range is preset to 1950 to 2050 for the year, 0 to 12 for the month, and 0 to 30 for the day; the longitude range is fixed at -180 to 180 and the latitude range at -90 to 90; elevation is normalized using the minimum and maximum values determined from the actual data. All normalization parameters are calculated from the training set and applied to the validation and test sets.
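Both normalizations can be sketched in a few lines of NumPy. The fixed ranges are the ones stated above, and elevation bounds and Z-score statistics are fitted on training data only. The synthetic data, the guard against zero standard deviation, and the function names are assumptions for illustration.

```python
import numpy as np

# Fixed Min-Max ranges stated in the text (elevation is fitted from data instead).
FIXED_RANGES = {"year": (1950.0, 2050.0), "month": (0.0, 12.0), "day": (0.0, 30.0),
                "longitude": (-180.0, 180.0), "latitude": (-90.0, 90.0)}

def fit_zscore(train):
    """Per-feature mean and standard deviation from the training split only."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0)
    return mu, np.where(sd == 0, 1.0, sd)     # guard constant features (assumption)

def zscore(x, mu, sd):
    return (x - mu) / sd

def min_max(x, lo, hi):
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

rng = np.random.default_rng(0)
loc, scale = [15.0, 60.0, 1013.0], [8.0, 15.0, 6.0]        # temp, humidity, pressure (synthetic)
met_train = rng.normal(loc, scale, size=(500, 3))
met_val = rng.normal(loc, scale, size=(100, 3))
mu, sd = fit_zscore(met_train)
met_train_n, met_val_n = zscore(met_train, mu, sd), zscore(met_val, mu, sd)

elev_train = np.array([5.0, 120.0, 800.0])                  # synthetic elevations (m)
elev_n = min_max(elev_train, elev_train.min(), elev_train.max())
```

Reusing the training-set statistics on the validation split is what keeps information from the held-out data out of the preprocessing.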

[0027] The normalized continuous time-series data are transformed into supervised learning samples for training the quantum-enhanced lightweight Transformer model, using a fixed-length sliding-window sampling strategy. The historical sequence length is set to m, the number of historical time steps used for prediction, and the prediction horizon is set to t_2, the number of future time steps to be predicted. For each sampling window, the input sequence X and the prediction target Y are constructed as follows. X consists of all feature data at m consecutive time steps and forms an (m, n) matrix, where n is the total number of features, including the original meteorological elements, the encoded temporal features, the constructed difference features, and the sliding-window statistical features. The prediction target Y is the temperature value at the t_2-th time step after the end of the input sequence; for multi-step prediction tasks, the temperature values at each of the next t_2 time steps form the target sequence. A sliding window with a set stride traverses the entire time series, generating a large number of (X, Y) sample pairs. These pairs are divided into training, validation, and test sets for the subsequent training, validation, and performance evaluation of the quantum-enhanced lightweight Transformer model. The resulting dataset is the single-temperature-modality spatiotemporal dataset, which contains the spatial distribution information of the temperature field, its temporal evolution patterns, and the constructed enhancement features.
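The windowing can be sketched as follows. `m`, `t2` and the stride follow the paragraph above. The ordered 70/15/15 split and the function names are assumptions, since the text does not state how the pairs are divided.

```python
import numpy as np

def make_windows(features, target, m, t2, stride=1):
    """features: (T, n), target: (T,) -> X: (N, m, n), Y: (N, t2)."""
    X, Y = [], []
    for s in range(0, len(features) - m - t2 + 1, stride):
        X.append(features[s:s + m])             # m consecutive input steps
        Y.append(target[s + m:s + m + t2])      # the t2 temperatures that follow
    return np.stack(X), np.stack(Y)

def chronological_split(X, Y, train=0.7, val=0.15):
    """Ordered split; the ratios are assumed, not taken from the text."""
    i, j = int(len(X) * train), int(len(X) * (train + val))
    return (X[:i], Y[:i]), (X[i:j], Y[i:j]), (X[j:], Y[j:])

features = np.arange(20.0).reshape(10, 2)      # 10 time steps, n = 2 features (synthetic)
target = np.arange(10.0)                        # temperature column (synthetic)
X, Y = make_windows(features, target, m=3, t2=2)
(train_X, train_Y), (val_X, val_Y), (test_X, test_Y) = chronological_split(X, Y)
```

With stride 1, neighbouring windows share most of their time steps, so an ordered split avoids the leakage that a random shuffle of overlapping samples would introduce.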

[0028] The single-temperature-modality spatiotemporal dataset X has dimensions (batch_size, m, n), where batch_size is the batch size, m is the historical sequence length, and n is the total number of features. The input sequence X is fed into a linear projection layer, which maps the feature dimension n to the model's hidden dimension d_model, so that the projected features have dimensions (batch_size, m, d_model). This projection unifies multi-source heterogeneous features into a hidden representation space that the model can process.

[0029] Because the Transformer architecture of the quantum-enhanced lightweight Transformer model has no inherent awareness of sequence order, temporal order information must be injected explicitly. This embodiment employs a learnable positional encoding whose initialization gives the model prior knowledge of the periodic variation of temperature from the early stages of training. This addresses the technical problem that generic positional encodings struggle to capture the periodic physical laws of temperature data, and it enhances the model's perception of temporal sequence information. For each position index in the sequence, i.e., 0 to m-1, a learnable positional encoding vector is generated whose dimension matches the hidden dimension d_model of the projected features. The positional encoding vectors are added element-wise to the projected features to obtain the model input tensor with injected position information, whose dimensions remain (batch_size, m, d_model). The positional encoding is learned end-to-end during model training and is initialized as a function of the daily and annual temperature cycles, enabling it to better capture the inherent periodicity of temperature data.
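One way to realize a positional table "initialized as a function of the daily and annual cycles" is to fill its columns with sine and cosine harmonics of a 24-hour and a 365.25-day period. The sketch below assumes hourly time steps and this particular column layout; both are illustrative assumptions, and the resulting table would only be the starting value of the learnable encoding.

```python
import numpy as np

def periodic_position_init(m, d_model, step_hours=1.0):
    """Initial (m, d_model) position table built from daily and annual harmonics."""
    hours = step_hours * np.arange(m)                 # elapsed hours at positions 0..m-1
    periods = (24.0, 24.0 * 365.25)                   # daily and annual cycles, in hours
    table = np.zeros((m, d_model))
    col, k = 0, 1
    while col < d_model:                              # harmonic k = 1, 2, ... until columns run out
        for period in periods:
            for fn in (np.sin, np.cos):
                if col < d_model:
                    table[:, col] = fn(2.0 * np.pi * k * hours / period)
                    col += 1
        k += 1
    return table

pos_init = periodic_position_init(m=48, d_model=16)
```

Column 1, for example, starts as cos(2πh/24), so positions 24 hours apart begin with identical values in the daily columns, while the annual columns change only slowly across a short window.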

[0030] Specifically, in S2, before the spatiotemporal dataset is input into the quantum-enhanced lightweight Transformer model, a position-information injection operation is performed on it, specifically including: obtaining sample data from the spatiotemporal dataset, the dimensions of which are sample batch size, sequence length, and feature dimension; inputting the sample data into the linear projection layer, which maps the feature dimension of the sample data to the hidden dimension of the quantum-enhanced lightweight Transformer model, to obtain the projected data; generating a learnable positional encoding vector for each position in the sequence of the sample data, the dimension of which matches the feature dimension of the projected data; adding the projected data and the learnable positional encoding vectors element by element to obtain the model input tensor with position information injected; and feeding the model input tensor into the quantum-enhanced lightweight Transformer model for subsequent feature extraction.

[0031] In S2, the quantum convolutional attention module extracts quantum-enhanced attention features from the spatiotemporal dataset through the following steps: temperature data are acquired from the spatiotemporal dataset, with dimensions of batch size, spatial length, spatial width, spatial height, and time step; the temperature data are input into a convolutional layer, which extracts the local spatial features within each time step to obtain convolutional feature maps; the convolutional feature maps are stacked along the time dimension so that each time step remains independent; the stacked feature maps are input into the quantum-enhanced channel attention submodule, which models the dependencies between feature channels with parameterized quantum circuits to generate channel attention weights and applies them to the original feature maps along the channel dimension; the channel-weighted feature maps are then input into the quantum-enhanced temporal attention submodule, which models the importance of the different time steps with parameterized quantum circuits to generate temporal attention weights and applies them to the original feature maps along the temporal dimension; finally, the feature maps weighted by channel attention and by temporal attention are fused to obtain the quantum-enhanced attention features.
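The channel and temporal attention steps above can be illustrated with a deliberately simple simulated circuit: each channel (or time step) is one qubit, its mean activation is angle-encoded with an RY gate, a trainable RY rotation follows, and the weight is derived from the Pauli-Z expectation, which for RY(x + θ)|0⟩ equals cos(x + θ). This circuit has no entangling gates, the mean-pooling descriptors and the fusion by element-wise sum are assumptions, and the spatial convolution is assumed to have already produced the stacked feature maps.

```python
import math
from typing import List

FeatureMaps = List[List[List[float]]]  # (T time steps, C channels, S flattened spatial cells)


def ry_z_expectation(x: float, theta: float) -> float:
    """<Z> for RY(theta) RY(x)|0>; the two rotations compose to RY(x + theta)."""
    return math.cos(x + theta)


def quantum_weights(descriptors: List[float], thetas: List[float]) -> List[float]:
    """One qubit per descriptor; map <Z> in [-1, 1] to a weight in [0, 1]."""
    return [(1.0 + ry_z_expectation(d, t)) / 2.0 for d, t in zip(descriptors, thetas)]


def quantum_conv_attention(f: FeatureMaps, ch_theta: List[float],
                           t_theta: List[float]) -> FeatureMaps:
    n_t, n_c, n_s = len(f), len(f[0]), len(f[0][0])
    # Channel submodule: mean over time and space gives one descriptor per channel.
    ch_desc = [sum(f[t][c][s] for t in range(n_t) for s in range(n_s)) / (n_t * n_s)
               for c in range(n_c)]
    w_c = quantum_weights(ch_desc, ch_theta)
    f_c = [[[w_c[c] * v for v in f[t][c]] for c in range(n_c)] for t in range(n_t)]
    # Temporal submodule: descriptors come from the channel-weighted maps,
    # while the weights are applied to the original maps.
    t_desc = [sum(f_c[t][c][s] for c in range(n_c) for s in range(n_s)) / (n_c * n_s)
              for t in range(n_t)]
    w_t = quantum_weights(t_desc, t_theta)
    f_t = [[[w_t[t] * v for v in f[t][c]] for c in range(n_c)] for t in range(n_t)]
    # Fusion (assumed): element-wise sum of the two weighted maps.
    return [[[a + b for a, b in zip(f_c[t][c], f_t[t][c])] for c in range(n_c)]
            for t in range(n_t)]
```

Setting a trainable angle to minus the descriptor gives a weight of 1, and shifting it by π drives the weight to 0, so the rotations act as learnable gates.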

[0032] The spatiotemporal encoder in S3 uses a stacked encoder-layer structure. It uses three-dimensional convolution kernels to extract the spatial features and temporal correlations of the temperature field simultaneously, and an adaptive pooling layer to unify temperature data from different sources to a fixed size. The positional encoding used in the spatiotemporal encoder is initialized as a function of the daily and annual temperature cycles. A single encoder layer of the spatiotemporal encoder processes the quantum-enhanced attention features as follows. The quantum-enhanced attention features are input into the multi-head attention module, where a linear projection with parameter-shared query, key, and value weight matrices produces the shared-projection attention features. The initial attention output is computed from the shared-projection attention features by the multi-head attention mechanism. During training, the contribution of each attention head to the prediction loss is evaluated; the top 70% of attention heads are retained and the remaining heads are masked, giving the pruned attention output. The pruned attention output is added to the quantum-enhanced attention features through a residual connection to obtain the first residual output, which is layer-normalized to obtain the normalized attention features. The normalized attention features are processed by a quantum-enhanced feedforward network to obtain the quantum-enhanced feature output.
The quantum-enhanced feature output is added to the normalized attention features through a residual connection to obtain the intermediate feature representation output by the current encoder layer. Before the model input tensor enters the multi-head attention module, the lightweight spatiotemporal encoder first processes it with a 3D convolution kernel that simultaneously extracts the spatial distribution features and the temporal evolution correlations of the temperature field. The kernel size is 3×3×3, corresponding to the spatial height, spatial width, and temporal depth dimensions, respectively, and the 3D convolution yields convolutional feature maps that incorporate spatiotemporal information. An adaptive pooling layer then resizes temperature data from different sources or with different resolutions to a fixed size, giving a dimensionally unified spatiotemporal feature representation. Because temperature data are three-dimensional spatiotemporal data with spatial height, spatial width, and time-step axes, the 3D kernel captures spatial topology and temporal evolution together and avoids flattening the input, which would destroy the original data structure. The adaptive pooling layer accommodates size differences among multi-source heterogeneous data, and the periodic initialization of the positional encoding injects physical prior knowledge. Together, these solve the technical problem of incompatibility between high-dimensional meteorological data and the model input, and provide more effective input features for the subsequent quantum attention mechanism.

[0033] The encoder's multi-head attention module uses a parameter-sharing mechanism in which the query, key, and value matrices share the same projection weights. The input spatiotemporal features are linearly projected with these shared weights to obtain the shared-projection attention features, on which multi-head attention is computed to obtain the initial attention output. During model training, the contribution of each attention head to the temperature prediction loss is evaluated dynamically; the top 70% of heads ranked by contribution are retained, and the remaining lower-contribution heads are disabled, yielding the pruned attention output. The pruned attention output is then added to the module input through a residual connection and layer-normalized to obtain the normalized attention features.
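A compact sketch of the shared-projection multi-head attention and the 70% head pruning described above. A single matrix W produces Q = K = V, no output projection is applied, and the per-head contribution scores are taken as given; how they are measured (for example, as the loss increase when a head is masked) and rounding the retained count up are assumptions not fixed by the text.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    mx = max(row)
    e = [math.exp(v - mx) for v in row]
    s = sum(e)
    return [v / s for v in e]


def shared_mha(x: Matrix, w_shared: Matrix, n_heads: int, head_mask: List[int]) -> Matrix:
    """Multi-head self-attention where Q, K and V all use the same projection."""
    p = matmul(x, w_shared)          # (m, d_model); Q = K = V = p
    d_model = len(p[0])
    dh = d_model // n_heads
    out = [[0.0] * d_model for _ in p]
    for h in range(n_heads):
        if not head_mask[h]:
            continue                 # pruned head contributes zeros
        cols = slice(h * dh, (h + 1) * dh)
        ph = [row[cols] for row in p]
        for i, q in enumerate(ph):
            att = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dh) for k in ph])
            for j in range(dh):
                out[i][h * dh + j] = sum(att[t] * ph[t][j] for t in range(len(ph)))
    return out


def prune_mask(contrib: List[float], keep_ratio: float = 0.7) -> List[int]:
    """Keep the heads with the largest contribution scores (top 70% by default)."""
    # The small tolerance guards against float error such as 0.7 * 10 slightly above 7.
    k = max(1, math.ceil(keep_ratio * len(contrib) - 1e-9))
    keep = sorted(range(len(contrib)), key=lambda h: contrib[h], reverse=True)[:k]
    return [1 if h in keep else 0 for h in range(len(contrib))]


def residual_layernorm(x: Matrix, y: Matrix, eps: float = 1e-5) -> Matrix:
    """LayerNorm(x + y) without learnable scale and shift."""
    out = []
    for a, b in zip(x, y):
        z = [u + v for u, v in zip(a, b)]
        mu = sum(z) / len(z)
        var = sum((v - mu) ** 2 for v in z) / len(z)
        out.append([(v - mu) / math.sqrt(var + eps) for v in z])
    return out
```

Sharing one projection cuts the attention projection parameters to a third, and masking heads removes their computation at inference, which is consistent with the lightweight goal stated in the text.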

[0034] The quantum-enhanced feedforward network consists of a classical-to-quantum encoding layer, multiple quantum layers, and a quantum-to-classical decoding layer. The classical-to-quantum encoding layer maps classical temperature features to a low-dimensional quantum space through a linear layer. The quantum layers contain RX, RY, and RZ parameterized quantum gates and CNOT entangling gates. The quantum-to-classical decoding layer maps the quantum measurement results back to the classical feature space.

[0035] The multiple quantum layers operate as follows: the low-dimensional quantum-space features output by the classical-to-quantum encoding layer are obtained, and their numerical range is limited with the tanh function to give angle-encoded data; the quantum device is initialized with $n_{qubits}$ qubits and the quantum state is reset, with one reset state prepared for each sample in the batch of angle-encoded data; the restricted data are encoded into the quantum state through RY gates, with the angle-encoded data used as the rotation angles and an RY gate applied to each qubit, giving the encoded quantum state; RX, RY, and RZ parameterized gates and CNOT entangling gates are applied alternately across the quantum layers, with the gate rotation angles initialized according to the typical period of temperature variation; the expectation values of all qubits are measured and combined to obtain the quantum output features; and a residual connection is introduced between the quantum circuit and the quantum-to-classical decoding layer.
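The angle restriction and the period-aware initialization described above might look like the following. The scaled tanh maps features into the interval $(-\pi/2, \pi/2)$; the initialization rule, in which qubit q starts at the phase advance of the (q+1)-th harmonic of the 24-hour cycle over one time step and the RZ angle starts at zero, is a hypothetical choice made only to illustrate the idea.

```python
import math
from typing import List


def angle_encode(features: List[float]) -> List[float]:
    """Squash features into (-pi/2, pi/2) with a scaled tanh, one angle per qubit."""
    return [math.pi / 2 * math.tanh(v) for v in features]


def periodic_angle_init(n_layers: int, n_qubits: int, dt_hours: float = 1.0,
                        period_hours: float = 24.0) -> List[List[List[float]]]:
    """Hypothetical initial [RX, RY, RZ] angles for each layer and qubit."""
    base = 2 * math.pi * dt_hours / period_hours  # daily-cycle phase advance per time step
    layers = []
    for _ in range(n_layers):
        layer = []
        for q in range(n_qubits):
            harmonic = ((q + 1) * base) % (2 * math.pi)
            layer.append([harmonic, harmonic, 0.0])  # RX and RY follow the cycle; RZ starts at 0
        layers.append(layer)
    return layers
```

Compared with random initialization, starting from angles tied to the diurnal cycle is what the text calls injecting domain knowledge into the circuit.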

[0036] Specifically, the normalized attention features are input into the quantum-enhanced feedforward network for feature optimization. The quantum-enhanced feedforward network consists of a classical-to-quantum encoding layer, multiple quantum layers, and a quantum-to-classical decoding layer. The classical-to-quantum encoding layer uses a linear layer to map the input features from the hidden dimension $d_{model}$ to a low-dimensional quantum space, giving low-dimensional quantum-space features of dimension $(\text{batch size} \times m, n_{qubits})$, where $n_{qubits}$ is the number of qubits. The quantum layers process these features as follows. First, the feature values are restricted to the interval $[-\pi/2, \pi/2]$ with a scaled tanh function to obtain angle-encoded data; the quantum device is initialized and the quantum state is reset; and the angle-encoded data are encoded into the quantum state with RY gates. The encoded quantum state is then processed by the stacked quantum layers, each of which performs the following operations in order: it obtains the preset learnable weight parameters of the current quantum layer, namely the RX, RY, and RZ rotation angles of each qubit; it computes the actual rotation angle of each qubit from the learnable weight parameters and the angle-encoded data; it applies RX, RY, and RZ gates to each qubit in turn to obtain the rotated quantum state; and it applies CNOT entangling gates between adjacent qubits in turn to create quantum entanglement, obtaining the entangled quantum state. The entangled quantum state output by the last quantum layer is input into the measurement layer, which measures the expectation value of each qubit to obtain a quantum measurement result of dimension $(\text{batch size} \times m, n_{qubits})$.
The quantum measurement result is passed, as the output of the multi-layer quantum structure, to the quantum-to-classical decoding layer, which maps it back to the hidden dimension $d_{model}$ through a linear layer and restores the classical feature-space representation. A residual connection is introduced between the quantum circuit and the quantum-to-classical decoding layer: the decoding-layer output is added to the input of the quantum layers to obtain the final output of the quantum-enhanced feedforward network. This output is then residually connected to the module input and layer-normalized to obtain the output tensor of the current encoder layer.
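The quantum-enhanced feedforward network described in [0034] to [0036] can be simulated exactly for a few qubits with a small state-vector routine. In this sketch the actual rotation angle is the learnable weight plus the encoded datum (a data re-uploading pattern), CNOT gates link qubit q to qubit q + 1, and the residual connection adds the decoder output to the feedforward input in the $d_{model}$ space so that the dimensions agree; all three choices are assumptions, as are the function names. A practical implementation would more likely use a quantum software framework than hand-written gates.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Gate = List[List[complex]]


def rx(t: float) -> Gate:
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]


def ry(t: float) -> Gate:
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]


def rz(t: float) -> Gate:
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]


def apply_1q(state: List[complex], u: Gate, q: int) -> None:
    """Apply the 2x2 unitary u to qubit q (bit q of the basis index), in place."""
    for i in range(len(state)):
        if not (i >> q) & 1:
            j = i | (1 << q)
            a0, a1 = state[i], state[j]
            state[i] = u[0][0] * a0 + u[0][1] * a1
            state[j] = u[1][0] * a0 + u[1][1] * a1


def cnot(state: List[complex], control: int, target: int) -> None:
    """Flip the target bit on every basis state whose control bit is 1."""
    for i in range(len(state)):
        if (i >> control) & 1 and not (i >> target) & 1:
            j = i | (1 << target)
            state[i], state[j] = state[j], state[i]


def quantum_layers(angles: Sequence[float],
                   weights: Sequence[Sequence[Tuple[float, float, float]]]) -> List[float]:
    """angles: one encoded value per qubit; weights[l][q] = (wx, wy, wz) for layer l."""
    n = len(angles)
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j                      # reset to |0...0>
    for q, a in enumerate(angles):
        apply_1q(state, ry(a), q)          # RY angle encoding
    for layer in weights:
        for q, (wx, wy, wz) in enumerate(layer):
            a = angles[q]                  # assumed: actual angle = weight + encoded value
            apply_1q(state, rx(wx + a), q)
            apply_1q(state, ry(wy + a), q)
            apply_1q(state, rz(wz + a), q)
        for q in range(n - 1):
            cnot(state, q, q + 1)          # entangle adjacent qubits
    # <Z_q> = P(bit q is 0) - P(bit q is 1)
    return [sum(abs(amp) ** 2 * (1 - 2 * ((i >> q) & 1)) for i, amp in enumerate(state))
            for q in range(n)]


def linear(x: Sequence[float], w: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    return [bj + sum(xi * w[i][j] for i, xi in enumerate(x)) for j, bj in enumerate(b)]


def quantum_ffn(x, w_enc, b_enc, q_weights, w_dec, b_dec) -> List[float]:
    """Classical -> quantum -> classical path for one d_model-dimensional token."""
    z = linear(x, w_enc, b_enc)                       # d_model -> n_qubits
    angles = [math.pi / 2 * math.tanh(v) for v in z]  # restrict to (-pi/2, pi/2)
    meas = quantum_layers(angles, q_weights)          # expectation values in [-1, 1]
    y = linear(meas, w_dec, b_dec)                    # n_qubits -> d_model
    return [yi + xi for yi, xi in zip(y, x)]          # residual (assumed placement)
```

For instance, one layer with weights (0, π/2, 0) on qubit 0 and zeros on qubit 1, applied to zero angles, prepares a Bell state whose single-qubit expectations are both 0.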

[0037] The classical-to-quantum encoding layer compresses high-dimensional temperature features into a low-dimensional quantum space; the high-dimensional representation capacity of quantum states is used to enrich feature expression; parameterized rotation gates and entangling gates implement complex nonlinear transformations; and the quantum-to-classical decoding layer restores the quantum features to the classical space for subsequent processing. Together these form a complete classical-quantum-classical feature optimization path, which addresses the limited ability of conventional neural networks to model the complex nonlinear relationships in temperature data. Because temperature data exhibit a typical 24-hour diurnal cycle, this domain knowledge is built into the initialization of the quantum circuit, guiding it to learn the intrinsic laws of temperature evolution more quickly. The residual connections counteract the random fluctuations introduced by quantum measurement and stabilize the gradient flow during training. This addresses the technical problems that quantum circuit parameters are initialized blindly and are difficult to align with the characteristics of temperature data, and it improves the model's ability to fit complex temperature variation patterns, especially periodic ones.

[0038] After stacking multiple encoder layers, the encoded spatiotemporal feature representation is obtained, which integrates the spatiotemporal structure information of the original temperature data, quantum-enhanced nonlinear features, and long-range dependencies captured by the attention mechanism.

[0039] The encoded spatiotemporal feature representation is input into the output decoding layer, which uses a multilayer perceptron to perform the final nonlinear mapping of the encoded features. The number of output nodes of the multilayer perceptron is set according to the prediction task: for a single-step prediction task it is 1, corresponding to the temperature at a single future time point; for a multi-step prediction task it is $t_2$, corresponding to the temperature sequence over the next $t_2$ consecutive time steps. The output of the multilayer perceptron is the temperature prediction for the near future, with the prediction horizon set between 0.5 and 2 hours.
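A sketch of the output decoding layer as a two-layer perceptron. Flattening the encoded $(m, d_{model})$ representation, the ReLU activation, the hidden width, and the 15-minute time step used to convert the 0.5 to 2 hour horizon into $t_2$ are assumptions; the text fixes only the number of output nodes.

```python
import math
import random
from typing import List


def horizon_steps(hours: float, dt_minutes: float = 15.0) -> int:
    """Number of future time steps t2 covering the horizon (the time step is assumed)."""
    return round(hours * 60 / dt_minutes)


def n_output_nodes(multi_step: bool, t2: int) -> int:
    """1 node for single-step prediction, t2 nodes for multi-step prediction."""
    return t2 if multi_step else 1


class MLPHead:
    """Two-layer perceptron mapping a flattened (m, d_model) representation to predictions."""

    def __init__(self, in_dim: int, hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 1 / math.sqrt(in_dim)) for _ in range(hidden)]
                   for _ in range(in_dim)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 1 / math.sqrt(hidden)) for _ in range(n_out)]
                   for _ in range(hidden)]
        self.b2 = [0.0] * n_out

    @staticmethod
    def _dense(x: List[float], w: List[List[float]], b: List[float]) -> List[float]:
        return [bj + sum(xi * w[i][j] for i, xi in enumerate(x)) for j, bj in enumerate(b)]

    def __call__(self, encoded: List[List[float]]) -> List[float]:
        flat = [v for step in encoded for v in step]  # (m, d_model) -> m * d_model
        hidden = [max(0.0, v) for v in self._dense(flat, self.w1, self.b1)]  # ReLU
        return self._dense(hidden, self.w2, self.b2)  # one value per output node
```

Under the assumed 15-minute step, the 0.5 to 2 hour horizon corresponds to $t_2$ between 2 and 8, so a multi-step head for a 2-hour forecast would have 8 output nodes.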

[0040] The quantum-enhanced lightweight Transformer model uses mean squared error and mean absolute error as joint loss functions for model training, and applies gradient accumulation and mixed precision training to accelerate model convergence.
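The joint loss and gradient accumulation above are illustrated below on a toy linear model. The weighting alpha between MSE and MAE is an assumption (the text does not give one), the MAE gradient uses the subgradient sign(d), and mixed-precision training is omitted because plain Python floats have a single precision; in PyTorch it is usually added with the `torch.autocast` context manager and a gradient scaler.

```python
from typing import List, Sequence, Tuple


def sign(d: float) -> float:
    """Subgradient of |d|, taken as 0 at d == 0."""
    return float((d > 0) - (d < 0))


def joint_loss(pred: Sequence[float], target: Sequence[float], alpha: float = 0.5):
    """Return (alpha * MSE + (1 - alpha) * MAE, gradient w.r.t. each prediction)."""
    n = len(pred)
    diffs = [p - t for p, t in zip(pred, target)]
    mse = sum(d * d for d in diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    grad = [(alpha * 2 * d + (1 - alpha) * sign(d)) / n for d in diffs]
    return alpha * mse + (1 - alpha) * mae, grad


def fit_linear(data: List[Tuple[float, float]], lr: float = 0.05,
               accum_steps: int = 4, epochs: int = 1000, alpha: float = 0.5):
    """Fit y = w * x + b, accumulating per-sample gradients over accum_steps samples."""
    w = b = 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        pending = 0
        for x, y in data:
            _, g = joint_loss([w * x + b], [y], alpha)
            gw += g[0] * x               # chain rule: dL/dw = dL/dpred * x
            gb += g[0]
            pending += 1
            if pending == accum_steps:   # one optimizer step per accum_steps samples
                w -= lr * gw / pending
                b -= lr * gb / pending
                gw = gb = 0.0
                pending = 0
        if pending:                      # flush a final partial accumulation
            w -= lr * gw / pending
            b -= lr * gb / pending
    return w, b
```

Accumulating gradients over several small batches before each update gives the effect of a larger batch without holding it in memory at once, which is the usual motivation for this technique.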

[0041] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the described module can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0042] The electronic device of this invention includes a central processing unit (CPU), which can perform various appropriate actions and processes according to computer program instructions stored in read-only memory (ROM) or loaded from a storage unit into random access memory (RAM). The RAM may also store various programs and data required for device operation. The CPU, ROM, and RAM are interconnected via a bus. An input/output (I/O) interface is also connected to the bus.

[0043] Multiple components of the device are connected to the I/O interface, including: input units such as a keyboard, mouse, etc.; output units such as various types of displays, speakers, etc.; storage units such as magnetic disks, optical disks, etc.; and communication units such as network interface cards, modems, wireless transceivers, etc. The communication unit allows the device to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks. The processing unit performs the various methods and processes described above, such as the method of the present invention. For example, in some embodiments, the method of the present invention may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed on the device via the ROM and/or the communication unit. When the computer program is loaded into the RAM and executed by the CPU, one or more steps of the method of the present invention described above may be performed. Alternatively, in other embodiments, the CPU may be configured to execute the method of the present invention by any other suitable means (e.g., by means of firmware).

[0044] The functions described above in this document can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

[0045] The program code used to implement the methods of the present invention can be written in any combination of one or more programming languages. This program code can be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code can be executed entirely on the machine, partly on the machine, as a standalone software package partly on the machine and partly on a remote machine, or entirely on a remote machine or server.

[0046] In the context of this invention, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media can include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0047] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present invention, and these modifications or substitutions should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A short-term temperature prediction method based on quantum-enhanced Transformer, characterized in that the specific steps include: S1. Obtain multi-source historical temperature data, perform data cleaning and normalization on the multi-source historical temperature data, and construct a spatiotemporal dataset of a single temperature modality; S2. Input the temperature data in the spatiotemporal dataset into the quantum-enhanced lightweight Transformer model, and extract features from the input spatiotemporal dataset through the quantum convolutional attention module to obtain the quantum-enhanced attention features; S3. Input the quantum-enhanced attention features into the spatiotemporal encoder of the quantum-enhanced lightweight Transformer model to extract the spatial and temporal features of the temperature data and obtain the encoded spatiotemporal feature representation; S4. Input the encoded spatiotemporal feature representation into the quantum-enhanced feedforward network of the quantum-enhanced lightweight Transformer model, and perform feature optimization through the parameterized quantum gate circuits of the quantum-enhanced feedforward network to obtain the optimized quantum-enhanced features; S5. Based on the optimized quantum-enhanced features, perform decoding mapping in the output layer to output the temperature prediction results for the near future.

2. The short-term temperature prediction method based on quantum-enhanced Transformer according to claim 1, characterized in that, in step S1, the specific steps for constructing the spatiotemporal dataset of a single temperature modality include: collecting multi-source temperature data, including temperature observations from ground meteorological stations, temperature retrievals from satellite remote sensing, and temperature field data from numerical weather prediction; cleaning the collected multi-source temperature data and filling in missing data with a hybrid interpolation method that combines spatial inverse distance weighted interpolation with temporal cubic spline interpolation; normalizing the completed temperature data; and sampling the continuous temperature time series with a fixed-length sliding window to construct sample pairs of input sequences and prediction targets, i.e., the spatiotemporal dataset of a single temperature modality.

3. The short-term temperature prediction method based on quantum-enhanced Transformer according to claim 2, characterized in that, In step S2, before inputting the spatiotemporal dataset into the quantum-enhanced lightweight Transformer model, a position information injection operation is performed on the spatiotemporal dataset, specifically including: Obtain sample data from the spatiotemporal dataset, wherein the dimensions of the sample data include sample batch size, sequence length, and feature dimension; The sample data is input into a linear projection layer, and the feature dimensions of the sample data are mapped to the hidden layer of a quantum-enhanced lightweight Transformer model to obtain the projected data. Learnable position encoding vectors are generated for each position in the sequence of sample data, and the dimension of the position encoding vectors matches the feature dimension of the projected data; The projected data and the learnable position encoding vector are added element by element to obtain the model input tensor after injecting position information; The model input tensor is fed into the quantum-enhanced lightweight Transformer model for subsequent feature extraction.

4. The short-term temperature prediction method based on quantum-enhanced Transformer according to claim 1, characterized in that, the specific steps by which the quantum convolutional attention module in S2 obtains quantum-enhanced attention features from the spatiotemporal dataset include: obtaining temperature data from the spatiotemporal dataset, with dimensions of batch size, spatial length, spatial width, spatial height, and time step; inputting the temperature data into the convolutional layer and extracting the local spatial features within each time step to obtain convolutional feature maps; stacking the convolutional feature maps along the time dimension so that each time step remains independent; inputting the stacked feature maps into the quantum-enhanced channel attention submodule, modeling the dependencies between feature channels with parameterized quantum circuits to generate channel attention weights, and weighting the original feature maps with them along the channel dimension; inputting the channel-weighted feature maps into the quantum-enhanced temporal attention submodule, modeling the importance of different time steps with parameterized quantum circuits to generate temporal attention weights, and weighting the original feature maps with them along the temporal dimension; and finally fusing the feature maps weighted by channel attention and by temporal attention to obtain the quantum-enhanced attention features.

5. The short-term temperature prediction method based on quantum-enhanced Transformer according to claim 1, characterized in that, the spatiotemporal encoder in S3 employs a stacked encoder-layer structure, uses three-dimensional convolution kernels to simultaneously extract the spatial features and temporal correlations of the temperature field, and uses an adaptive pooling layer to unify temperature data from different sources to a fixed size. The positional encoding used in the spatiotemporal encoder is initialized as a function of the daily and annual temperature cycles. The specific steps by which a single encoder layer of the spatiotemporal encoder processes the quantum-enhanced attention features include: The quantum-enhanced attention features are input into the multi-head attention module, and a linear projection is performed with parameter-shared query, key, and value weight matrices to obtain the shared-projection attention features. The initial attention output is computed from the shared-projection attention features using a multi-head attention mechanism. During training, the contribution of each attention head to the prediction loss is evaluated, the top 70% of attention heads are retained, and the remaining attention heads are masked to obtain the pruned attention output. The pruned attention output is residually connected to the quantum-enhanced attention features to obtain the first residual output; the first residual output is layer-normalized to obtain the normalized attention features; the normalized attention features are input into a quantum-enhanced feedforward network for processing to obtain the quantum-enhanced feature output.
The quantum-enhanced feature output is residually connected to the normalized attention feature to obtain the intermediate feature representation of the current encoder layer output; By stacking multiple encoder layers, the intermediate feature representation output by the last encoder layer is the encoded spatiotemporal feature representation in S3.

6. The short-term temperature prediction method based on quantum-enhanced Transformer according to claim 1, characterized in that, The quantum-enhanced feedforward network includes a classical-to-quantum encoding layer, multiple quantum layers, and a quantum-to-classical decoding layer. The classical-to-quantum encoding layer maps classical temperature features to a low-dimensional quantum space through a linear layer. The multiple quantum layers include RX, RY, and RZ parameterized quantum gates and a CNOT entanglement gate. The quantum-to-classical decoding layer maps quantum measurement results back to the classical feature space.

7. The short-term temperature prediction method based on quantum-enhanced Transformer according to claim 6, characterized in that, the processing of the multiple quantum layers includes: obtaining the low-dimensional quantum-space features output by the classical-to-quantum encoding layer and limiting their numerical range with the tanh function; initializing the quantum device and resetting the quantum state; encoding the restricted data into the quantum state through RY gates; applying RX, RY, and RZ parameterized gates and CNOT entangling gates alternately in the multiple quantum layers, with the rotation angles of the quantum gates initialized according to the typical period of temperature variation; measuring the expectation values of all qubits and combining them to obtain the quantum output features; and introducing a residual connection between the quantum circuit and the quantum-to-classical decoding layer.

8. The short-term temperature prediction method based on quantum-enhanced Transformer according to claim 1, characterized in that, The quantum-enhanced lightweight Transformer model uses mean squared error and mean absolute error as joint loss functions for model training, and applies gradient accumulation and mixed precision training to accelerate model convergence, with a prediction time range of 0.5-2 hours.

9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the program, it implements the method as described in any one of claims 1 to 8.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the program is executed by the processor, it implements the method as described in any one of claims 1 to 8.