Attention mechanism meteorological prediction system based on geographical prior augmentation and differential constraints

By employing a geographic prior enhancement and differential constraint attention mechanism, the physical consistency of the meteorological forecasting model and the prediction accuracy of extreme weather systems are improved. This addresses the shortcomings of existing models in geographic perception and gradient consistency, resulting in more accurate weather forecasts.

CN122242579A · Pending · Publication Date: 2026-06-19 · XIAMEN UNIV MALAYSIA BRANCH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XIAMEN UNIV MALAYSIA BRANCH
Filing Date
2026-03-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing meteorological forecasting models based on attention mechanisms lack geographical prior perception, resulting in insufficient physical consistency of the forecast field. Furthermore, the mean square error loss function leads to underestimation of the intensity of extreme weather systems such as severe convection and heavy precipitation, and blurring of their boundaries.

Method used

We employ an attention mechanism with geographic prior enhancement and differential constraints, injecting latitude weights and anisotropic position biases into the attention computation. Combined with a multi-scale feature preservation unit and a differential constraint unit, this optimizes the model's attention weights and its gradient loss function, improving the model's geographic perception and the spatial gradient consistency of the predicted meteorological fields.

Benefits of technology

It significantly improves the accuracy of intensity prediction and boundary clarity of severe weather systems, and the generated forecast field is more in line with the physical laws of atmospheric motion, thus solving the shortcomings of traditional models in extreme weather forecasting.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to the field of meteorological forecasting technology, and more particularly to a meteorological forecasting system based on a geographic prior enhancement and differential constraint attention mechanism. The system includes: a data acquisition module to construct a training dataset; a model construction module to construct a meteorological forecasting model containing geographic prior enhancement attention units, multi-scale feature preservation units, and differential constraint units; a geographic prior attention calculation module to obtain geographically sensed encoded features by injecting latitudinal weights and latitude-longitude anisotropy biases; a multi-scale feature fusion module to perform multi-scale downsampling of historical data and gating fusion with the encoded features to obtain decoded features; a prediction field generation module to generate a predicted meteorological field based on the decoded features; a model training module to construct a total loss function and iteratively train the model using content loss and spatial gradient loss; and a real-time prediction module to input real-time data into the trained model to generate meteorological forecast results. This invention improves the intensity prediction accuracy and physical consistency of extreme weather systems through geographic prior injection and gradient constraints.

Description

Technical Field

[0001] This invention relates to the field of weather forecasting technology, and in particular to a weather forecasting system based on an attention mechanism with geographical prior enhancement and differential constraints.

Background Technology

[0002] With the rapid development of deep learning technology, attention-based neural network models have shown great potential in the field of weather forecasting. These models, by learning the spatiotemporal evolution patterns in massive amounts of historical meteorological data, can achieve more efficient computation and competitive forecast accuracy than traditional numerical forecasting. In particular, Transformer models and their variants, with their powerful long-range dependency modeling capabilities, have been widely applied to tasks such as global medium-range weather forecasting and short-term precipitation forecasting, becoming an important research direction in the field of meteorological artificial intelligence.

[0003] However, existing attention-based meteorological forecasting models still face key technical challenges in practical applications. These models typically use mean squared error as the loss function during training, which tends to learn the "average" of the training samples. This leads to underestimated intensity and blurred boundaries in the forecast fields of extreme weather systems such as severe convection and heavy precipitation. At the same time, standard attention mechanisms cannot perceive the geographical characteristics of meteorological data: they do not distinguish the physical differences between latitudinal zones or the anisotropy of information propagation in the meridional and zonal directions. As a result, the forecast fields generated by these models lack physical consistency, making it difficult to meet the operational requirements for accurate forecasting of high-impact weather.

Summary of the Invention

[0004] To overcome the above shortcomings, this invention provides a meteorological forecasting system based on an attention mechanism with geographic prior enhancement and differential constraints. It aims to address two problems of existing attention-based meteorological forecasting models: the insufficient physical consistency of forecast fields caused by the lack of geographic prior perception, and the underestimated intensity and blurred boundaries of extreme weather systems such as strong convection and heavy precipitation caused by the inherent smoothing tendency of the mean square error loss function.

[0005] This invention provides the following technical solution: a meteorological forecasting system based on an attention mechanism with geographic prior enhancement and differential constraints, comprising the following modules:
The data acquisition module is used to acquire historical meteorological grid data and corresponding ground truth data to build a training dataset;
The model building module is used to build a meteorological prediction model, which includes a geographic prior-enhanced attention unit, a multi-scale feature preservation unit, and a differential constraint unit;
The geographic prior attention calculation module is used to input historical meteorological grid data from the training dataset into the meteorological prediction model, calculate attention weights carrying geographic prior information through the geographic prior-enhanced attention unit, and perform weighted aggregation of input features based on the attention weights to obtain geographic-aware encoded features;
The multi-scale feature fusion module is used to perform multi-scale downsampling on the historical meteorological grid data through the multi-scale feature preservation unit to obtain auxiliary path features at different resolutions, and to perform gated fusion of the auxiliary path features with the geographic-aware encoded features at different scales to obtain multi-scale enhanced decoding features;
The prediction field generation module is used to generate the predicted meteorological field for the current training cycle based on the decoding features;
The model training module is used to calculate the content loss and spatial gradient loss between the predicted meteorological field and the ground truth data through the differential constraint unit, to construct a total loss function based on the content loss and the spatial gradient loss, and to iteratively update the parameters of the meteorological prediction model according to the total loss function until the model converges, obtaining a trained meteorological prediction model;
The real-time prediction module is used to acquire real-time meteorological grid data, input it into the trained meteorological prediction model, and generate meteorological prediction results for the target forecast period through forward propagation of the geographic prior-enhanced attention unit, the multi-scale feature preservation unit and the differential constraint unit.

[0006] Preferably, the data acquisition module is specifically configured to: acquire meteorological observation data from multiple sources within a historical time period; apply quality control to the meteorological observation data, removing outliers and imputing missing values; uniformly interpolate the quality-controlled meteorological observation data onto a preset target spatial grid and align it in time to obtain standardized historical meteorological grid data; obtain historical ground truth data within the same spatiotemporal range as the historical meteorological grid data; and pair the standardized historical meteorological grid data with the historical ground truth data according to their spatiotemporal correspondence to construct a training dataset consisting of sample pairs.

[0007] Preferably, the model building module is specifically configured as follows: A geographic prior-enhanced attention unit is constructed, comprising a latitude weight generation subunit, an anisotropic position bias subunit, and an attention calculation subunit. The latitude weight generation subunit generates a latitude weight field based on grid latitudes. The anisotropic position bias subunit generates a learnable position bias carrying meridional and zonal propagation coefficients. The attention calculation subunit calculates attention weights based on the latitude weight field and the learnable position bias. A multi-scale feature preservation unit is constructed, comprising a main path encoding subunit, an auxiliary path downsampling subunit, and a gated fusion subunit. The main path encoding subunit performs deep feature extraction on the input data. The auxiliary path downsampling subunit performs multi-scale pooling on the input data to generate auxiliary path features at different resolutions. The gated fusion subunit performs gated fusion of the auxiliary path features with the features output by the main path encoding subunit at different scales. A differential constraint unit is constructed, comprising a gradient calculation subunit and a loss construction subunit; the gradient calculation subunit calculates the gradient fields of the predicted meteorological field and the ground truth data in the meridional and zonal directions, and the loss construction subunit constructs a spatial gradient loss function based on the gradient fields. The geographic prior-enhanced attention unit, the multi-scale feature preservation unit, and the differential constraint unit are connected to form a complete meteorological prediction model architecture.

[0008] Preferably, the geographic prior attention calculation module is specifically configured as follows: The historical meteorological grid data is input into the geographic prior-enhanced attention unit to extract the initial feature tensor; A latitude weight field is generated through the latitude weight generation subunit, and a learnable position bias tensor is generated through the anisotropic position bias subunit; The initial feature tensor is projected into a query tensor, a key tensor, and a value tensor; The query tensor and the key tensor are matrix-multiplied and the result is divided by a scaling factor to obtain the initial attention score tensor; The latitude weight field is broadcast and multiplied with the learnable position bias tensor to obtain the geographic prior modulation bias, which is then added to the initial attention score tensor; The summed attention score tensor is normalized by softmax to obtain the geographic prior-enhanced attention weights; The value tensor is weighted by the attention weights and summed to obtain the geographic-aware encoded features.
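The steps above can be sketched in NumPy on a toy grid. This is a minimal single-head illustration under stated assumptions, not the patented implementation: the function name `geo_prior_attention`, the random projection matrices, and the toy sizes are hypothetical, and the bias (which the text gives as one value per grid point) is assumed here to be indexed by key position and added to every query row.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # stabilise before exponentiating
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def geo_prior_attention(z, w_q, w_k, w_v, w_lat, b_pos):
    """z: (H, W, D) features; w_lat: (H, 1); b_pos: (H, W) bias for one head."""
    H, W, D = z.shape
    tokens = z.reshape(H * W, D)               # one token per grid point
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])    # initial attention scores
    # Latitude weights broadcast along longitude, then multiplied with the bias.
    geo_bias = (w_lat * b_pos).reshape(1, H * W)
    # Assumption: the bias is indexed by key position and shared by all queries.
    weights = softmax(scores + geo_bias, axis=-1)
    return (weights @ v).reshape(H, W, -1), weights

rng = np.random.default_rng(0)
H, W, D = 4, 6, 8                              # toy sizes, not the 240 x 560 grid
z = rng.normal(size=(H, W, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
w_lat = np.full((H, 1), 0.5)                   # sigmoid(0) at initialisation
b_pos = rng.normal(size=(H, W)) * 0.1
out, weights = geo_prior_attention(z, w_q, w_k, w_v, w_lat, b_pos)
print(out.shape, weights.shape)
```

Full attention over all 240×560 grid points would need a far larger score matrix than this toy case, so a practical model would presumably restrict or factorize the attention; the source does not say how.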

[0009] Preferably, the multi-scale feature fusion module is specifically configured as follows: The historical meteorological grid data is used as both the main path input and the auxiliary path input. Deep feature extraction is performed through the main path encoding subunit to generate main path encoding features at different scales; multi-scale pooling is performed through the auxiliary path downsampling subunit to generate auxiliary path features with the same resolution as the main path encoding features at each scale. For the deepest scale, the auxiliary path features of that scale are aligned with the geographic-aware encoded features and then concatenated; the gated fusion subunit generates gating weights and performs weighted fusion to obtain the fused features of that scale. For the second-deepest scale, the fused features of the deepest scale are upsampled and added to the main path encoding features of the second-deepest scale, and the result is concatenated with the aligned auxiliary path features of the second-deepest scale; the gated fusion subunit generates gating weights and performs weighted fusion to obtain the fused features of the second-deepest scale. Proceeding from deep to shallow, the fused features of the previous scale are upsampled and added to the main path encoding features of the current scale, then concatenated with the aligned auxiliary path features of the current scale and fused through the gated fusion subunit, until feature fusion is complete at all scales. The final fused features are output as the multi-scale enhanced decoding features.
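The deep-to-shallow fusion described above can be illustrated with a two-scale toy example. The source does not specify how features are aligned or how the gate is parameterised, so everything concrete here is an assumption: `conv1x1`, `gated_fuse` and `upsample2x` are hypothetical helpers, alignment and gating use pointwise (1×1) convolutions, the fusion is a convex mix `g * main + (1 - g) * aux`, weights are shared across scales, and both scales use the same channel count so that the upsampled features can be added directly.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, w):
    """Pointwise convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def gated_fuse(main, aux, w_align, w_gate):
    aux_aligned = conv1x1(aux, w_align)                    # match main's channel count
    stacked = np.concatenate([main, aux_aligned], axis=0)  # channel concatenation
    gate = sigmoid(conv1x1(stacked, w_gate))               # gate values in (0, 1)
    return gate * main + (1.0 - gate) * aux_aligned        # per-location weighted mix

def upsample2x(x):
    return x.repeat(2, axis=1).repeat(2, axis=2)           # nearest-neighbour upsampling

rng = np.random.default_rng(0)
# Toy pyramid as (channels, height, width); index 1 is the deepest scale.
main = [rng.normal(size=(8, 8, 8)), rng.normal(size=(8, 4, 4))]
aux = [rng.normal(size=(5, 8, 8)), rng.normal(size=(5, 4, 4))]
w_align = rng.normal(size=(8, 5)) * 0.1
w_gate = rng.normal(size=(8, 16)) * 0.1

fused = gated_fuse(main[1], aux[1], w_align, w_gate)                      # deepest scale
fused = gated_fuse(upsample2x(fused) + main[0], aux[0], w_align, w_gate)  # next scale up
print(fused.shape)
```

In the embodiment the channel count doubles at each scale, so a real implementation would also need a channel projection before the addition; that step is omitted here.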

[0010] Preferably, the prediction field generation module is specifically configured as follows: Obtain the multi-scale enhanced decoding features output by the multi-scale feature fusion module; Input the decoding features into the prediction output layer, where one or more convolutional layers map the features to the channel space of the target meteorological elements to obtain the initial prediction field; Post-process the initial prediction field to constrain the output values to a physically reasonable range for each meteorological element; Output the post-processed prediction field as the predicted meteorological field for the current training cycle.

[0011] Preferably, the model training module is specifically configured as follows: Obtain the predicted meteorological field and the corresponding ground truth data for the current training cycle; Calculate the content loss between the predicted meteorological field and the ground truth data using the loss construction subunit; Calculate, using the gradient calculation subunit, the gradient fields of the predicted meteorological field and the ground truth data in the meridional and zonal directions, respectively; After normalizing the dimensions of each gradient field, calculate the meridional gradient loss and the zonal gradient loss and sum them to obtain the spatial gradient loss; Weight and sum the content loss and the spatial gradient loss according to a preset weight ratio to construct the total loss function; Iteratively update the parameters of the meteorological prediction model by backpropagation based on the total loss function until the model converges, obtaining a trained meteorological prediction model.
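A minimal sketch of how such a total loss could be assembled is given below. The source does not state the form of the content loss, the norm used for the gradient terms, or the weight ratio, so this example assumes mean squared error for both and illustrative weights `w_content=1.0` and `w_grad=0.5`; `total_loss` is a hypothetical name. Gradients are taken in grid-point units with `np.gradient`.

```python
import numpy as np

def total_loss(pred, truth, w_content=1.0, w_grad=0.5):
    """pred, truth: (..., H, W) fields. Loss forms and weights are assumptions."""
    content = np.mean((pred - truth) ** 2)
    # np.gradient: central differences in the interior, one-sided at the edges.
    gp_mer, gp_zon = np.gradient(pred, axis=(-2, -1))
    gt_mer, gt_zon = np.gradient(truth, axis=(-2, -1))
    grad = np.mean((gp_mer - gt_mer) ** 2) + np.mean((gp_zon - gt_zon) ** 2)
    return w_content * content + w_grad * grad, content, grad

truth = np.zeros((6, 6))
truth[:, 3:] = 10.0            # a sharp front between columns 2 and 3
blurred = truth.copy()
blurred[:, 2] = 2.5            # the same front, smeared over neighbouring columns
blurred[:, 3] = 7.5

total, content, grad = total_loss(blurred, truth)
print(total, content, grad)
```

The gradient term adds a penalty for the smeared front on top of the pointwise error, which is the behaviour the differential constraint is meant to encourage.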

[0012] Preferably, the real-time prediction module is specifically configured as follows: Obtain real-time meteorological observation data at the current moment; Apply quality control to the real-time meteorological observation data, removing outliers and imputing missing values; Uniformly interpolate the quality-controlled real-time observation data onto the preset target spatial grid and align it in time to obtain standardized real-time meteorological grid data; Input the standardized real-time meteorological grid data into the trained meteorological prediction model; Process the real-time meteorological grid data with the geographic prior-enhanced attention unit to obtain real-time geographic-aware encoded features; Perform multi-scale downsampling of the real-time meteorological grid data through the multi-scale feature preservation unit to obtain real-time auxiliary path features, and perform gated fusion of these with the real-time geographic-aware encoded features at different scales to obtain real-time multi-scale enhanced decoding features; Map the real-time multi-scale enhanced decoding features to the channel space of the target meteorological elements by convolution in the prediction output layer to obtain the initial real-time prediction field; Post-process the initial real-time prediction field to constrain the output values to a physically reasonable range for each meteorological element, and use the result as the meteorological forecast for the target forecast period.

[0013] The present invention has the following beneficial effects: 1. In this invention, the geographic prior-enhanced attention unit injects the latitude weight field and the anisotropic position bias into the attention calculation, enabling the model to perceive the physical differences between latitude zones and the anisotropy of information propagation in the meridional and zonal directions. Compared with standard attention mechanisms, the attention weights of this invention carry explicit geographic prior information, which addresses the physical inconsistency caused by the lack of geographic awareness in traditional models and makes the generated forecast field more consistent with the physical laws of atmospheric motion.

[0014] 2. In this invention, a spatial gradient loss is introduced through the differential constraint unit, which constrains both the numerical consistency and the spatial gradient consistency between the predicted and true values during training. Compared with existing techniques that use only the mean squared error loss, the gradient loss of this invention directly optimizes the intensity gradients that characterize weather systems, suppressing the model's tendency to average out extreme weather systems such as strong convection and heavy precipitation, and improving the intensity prediction accuracy and boundary clarity of severe weather systems.

Attached Figure Description

[0015] Figure 1 is a schematic diagram of the architecture of the meteorological forecasting system based on an attention mechanism with geographical prior enhancement and differential constraints proposed in this invention.

Detailed Implementation

[0016] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0017] This invention provides a weather forecasting system based on an attention mechanism with geographical prior enhancement and differential constraints. As shown in Figure 1, it includes the following modules: The data acquisition module is used to acquire historical meteorological grid data and corresponding ground truth data to build a training dataset.

[0018] Specifically, firstly, meteorological observation data from multiple sources within a historical time period is acquired, including data from ground-based meteorological stations, weather radar observations, meteorological satellite remote sensing data, and numerical model reanalysis data. Ground-based observations provide hourly temperature, air pressure, humidity, wind speed, wind direction, and precipitation; radar data provides reflectivity factors every 6 minutes; satellite data provides cloud-image brightness temperature every 15 minutes; and reanalysis data such as ERA5 provides hourly gridded meteorological element fields as the background field.

[0019] Secondly, quality control was implemented for various types of observation data. Climate extreme value checks were used to remove outliers exceeding reasonable ranges, such as records with temperatures below -90 degrees Celsius or above 60 degrees Celsius. Consistency checks were used to remove isolated echoes that differed significantly from surrounding spatiotemporal points. For missing values, spatiotemporal interpolation methods, such as linear interpolation or interpolation from nearby stations, were used for imputation.
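The range check and gap filling described above can be sketched for a single station's temperature series. This is an illustrative example only: `quality_control` is a hypothetical helper, the −90 °C and 60 °C bounds are the ones quoted in the text, and missing values are filled by linear interpolation in time (one of the options the text mentions). The spatial consistency check is not shown.

```python
import numpy as np

def quality_control(temps_c, low=-90.0, high=60.0):
    """Drop out-of-range temperatures, then fill gaps by linear interpolation."""
    x = np.asarray(temps_c, dtype=float).copy()
    x[(x < low) | (x > high)] = np.nan          # climate extreme value check
    missing = np.isnan(x)
    idx = np.arange(x.size)
    # Interpolate missing entries from the valid neighbours on either side.
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x

series = [21.0, 22.0, np.nan, 24.0, 150.0, 26.0]   # one gap and one impossible value
clean = quality_control(series)
print(clean)
```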

[0020] Then, the quality-controlled data are uniformly interpolated onto a preset target spatial grid and time-aligned. The target grid is a regular latitude-longitude grid with a resolution of 0.25 degrees × 0.25 degrees, covering 70 degrees east longitude to 140 degrees east longitude and 0 degrees north latitude to 60 degrees north latitude. Bilinear interpolation is used for continuous variables, and nearest neighbor interpolation is used for discrete variables. All data are unified to an hourly frequency: data with a finer time resolution are aggregated in time, and data with a coarser time resolution are interpolated in time. After the above processing, standardized historical meteorological grid data X is obtained, with dimensions T × C × H × W, where T is the number of time steps, C is the number of meteorological element channels, H = 240, and W = 560.

[0021] In this embodiment, the general parameters are set as follows: input time step L=12, forecast lead time K=24, number of meteorological element channels C=5, and feature dimension D=64. The following modules are all described based on these parameters.

[0022] Next, historical ground truth data Y, which is in the same spatiotemporal range as historical meteorological grid data, is obtained, with the same dimension as X. Ground truth data sources include measured data from ground meteorological observation stations, radar quantitative precipitation estimation products, satellite inversion products, and high-resolution reanalysis data. For elements such as temperature, air pressure, and wind speed, spatially interpolated station observation data are used as ground truth; for precipitation elements, radar-rain gauge fusion quantitative precipitation estimation products are used as ground truth.

[0023] Finally, standardized historical meteorological grid data and historical ground truth data are paired according to their spatiotemporal correspondence to construct a training dataset consisting of sample pairs. Each sample pair consists of model input data and supervision label data. For meteorological forecasting tasks, the typical approach is to "predict the meteorological field for the next K times given the meteorological field at L past times". Therefore, sample pairs are extracted from the historical data: for each time index t from L to T−K, the input sample $X_t = (X[t-L+1], \dots, X[t])$ is constructed, with dimensions L×C×H×W, and the output label $Y_t = (Y[t+1], \dots, Y[t+K])$ is constructed, with dimensions K×C×H×W. The input sample and the output label constitute a sample pair $(X_t, Y_t)$.

[0024] Taking this embodiment as an example, L=12 represents using the weather field of the past 12 hours as input, and K=24 represents predicting the weather field of the next 24 hours. Therefore, the dimension of each input sample is 12×5×240×560, and the dimension of each output label is 24×5×240×560. Using a sliding window approach, a total of 87600-12-24+1=87565 sample pairs can be obtained from 87600 hours of historical data. These sample pairs are divided into training, validation, and test sets. For example, the first 80% is used as the training set, the middle 10% as the validation set, and the last 10% as the test set, in chronological order, for subsequent model training, optimization, and performance evaluation.
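The sliding-window count and the chronological split can be checked with a few lines of Python. The helper names `window_ends` and `make_pair` are hypothetical, indices are 0-based (so the last input step runs from L−1 to T−K−1), and the 80/10/10 split uses integer arithmetic with the remainder going to the test set; the toy arrays stand in for the real T×C×H×W data.

```python
import numpy as np

def window_ends(T, L, K):
    """0-based index of the last input step for every valid sample pair."""
    return np.arange(L - 1, T - K)

def make_pair(x, y, t, L, K):
    """Input: the L steps ending at t. Label: the K steps following t."""
    return x[t - L + 1 : t + 1], y[t + 1 : t + 1 + K]

T, L, K = 87600, 12, 24
ends = window_ends(T, L, K)
n = ends.size                                  # T - L - K + 1 sample pairs

# Chronological 80 / 10 / 10 split.
n_train, n_val = n * 8 // 10, n // 10
train, val, test = ends[:n_train], ends[n_train:n_train + n_val], ends[n_train + n_val:]

# Toy data with small spatial size to show the shapes of one pair.
x_toy = np.zeros((40, 5, 2, 3))
y_toy = np.zeros((40, 5, 2, 3))
x_in, y_out = make_pair(x_toy, y_toy, t=11, L=L, K=K)
print(n, len(train), len(val), len(test), x_in.shape, y_out.shape)
```

Splitting by time rather than at random keeps overlapping windows from leaking between the training and evaluation sets.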

[0025] The model building module is used to build a weather forecasting model, which includes a geographic prior-enhanced attention unit, a multi-scale feature preservation unit, and a differential constraint unit.

[0026] Specifically, firstly, a geographic prior-enhanced attention unit is constructed. This unit includes a latitude weight generation subunit, an anisotropic position bias subunit, and an attention calculation subunit. The latitude weight generation subunit generates a latitude weight field based on the grid latitude. Let H be the number of grid points in the height direction of the target spatial grid, and let lat[i], i=1,2,...,H, be the corresponding array of latitude values. The latitude weight field $W_{lat}$ is generated as follows. Based on prior meteorological knowledge, the mid-latitude region (30° to 60°) is where baroclinic instability energy is most concentrated and should be assigned a higher weight. The latitude weight field is calculated as: $W_{lat}[i] = \sigma(\alpha_i)\,\bigl(1 + \beta \cdot I(30^\circ \le lat[i] \le 60^\circ)\bigr)$; where $\alpha_i$ is a learnable parameter, initially set to 0; $\sigma$ is the sigmoid activation function, which constrains its output to between 0 and 1; $\beta$ is the preset mid-latitude enhancement factor, which ranges from 0.5 to 2.0 and is set to $\beta$=1.0 in this embodiment; and I(·) is an indicator function, taking a value of 1 when the latitude meets the mid-latitude condition, and 0 otherwise. Through this design, the latitude weight field incorporates a fixed physical prior while allowing the model to make minor adjustments through the learnable parameters $\alpha_i$.
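As a rough illustration, the latitude weight field can be computed as below. The sketch assumes the multiplicative form $W_{lat}[i] = \sigma(\alpha_i)(1 + \beta \cdot I(\cdot))$, which is one reading consistent with the statement elsewhere in the embodiment that mid-latitude rows receive twice the base weight when β = 1.0; the function name `latitude_weights` and the five sample latitudes are hypothetical.

```python
import numpy as np

def latitude_weights(lat_deg, alpha, beta=1.0, band=(30.0, 60.0)):
    """Return an (H, 1) weight field: sigmoid(alpha) boosted inside the band."""
    lat_deg = np.asarray(lat_deg, dtype=float)
    base = 1.0 / (1.0 + np.exp(-np.asarray(alpha, dtype=float)))  # sigmoid(alpha_i)
    in_band = (lat_deg >= band[0]) & (lat_deg <= band[1])         # indicator I(.)
    return (base * (1.0 + beta * in_band)).reshape(-1, 1)

lat = np.array([0.0, 15.0, 30.0, 45.0, 60.0])   # sample rows, not the full 240
alpha = np.zeros_like(lat)                       # learnable parameters at initialisation
w_lat = latitude_weights(lat, alpha)
print(w_lat.ravel())
```

With all α at zero the base weight is 0.5, so rows inside the band start at 1.0 and rows outside at 0.5; training can then move each row away from these values.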

[0027] The anisotropic position bias subunit generates a learnable position bias carrying meridional and zonal propagation coefficients. It produces a learnable position bias tensor $B_{pos}$ whose dimension matches the attention score tensor. Let the spatial size of the feature map used for attention computation be H×W and the number of attention heads be $N_{head}$; then the dimension of $B_{pos}$ is $N_{head}$×H×W. The generation method is as follows. First, a basic bias tensor $B_0$ with dimension $N_{head}$×H×W is initialized with all elements set to 0. At the same time, a learnable meridional propagation coefficient $k^{mer}_h$ and a learnable zonal propagation coefficient $k^{zon}_h$ are set for each attention head h=1,2,...,$N_{head}$. Then, the anisotropic bias is generated from the grid coordinates: $B_{pos}[h,i,j] = B_0[h,i,j] + k^{mer}_h \cdot i + k^{zon}_h \cdot j$; where i is the height index and j is the width index. The meridional propagation coefficient $k^{mer}_h$ controls the intensity of information transmission along the meridian direction, and the zonal propagation coefficient $k^{zon}_h$ controls the intensity of information transmission along the latitude circle. Both are learnable parameters, initialized to 0. Through this design, the model can learn the differences in information propagation between the meridional and zonal directions.
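The bias described above varies linearly with the row and column index, with a separate slope per head in each direction. The sketch below assumes the form "base bias plus meridional coefficient times row index plus zonal coefficient times column index"; the function name `anisotropic_bias`, the coefficient names `k_mer` and `k_zon`, and the use of raw 0-based indices (rather than, say, normalized coordinates) are assumptions made for illustration.

```python
import numpy as np

def anisotropic_bias(base, k_mer, k_zon):
    """base: (N_head, H, W); k_mer, k_zon: (N_head,) per-head coefficients."""
    n_head, H, W = base.shape
    i = np.arange(H).reshape(1, H, 1)      # height (north-south) index
    j = np.arange(W).reshape(1, 1, W)      # width (east-west) index
    k_mer = np.asarray(k_mer, dtype=float).reshape(n_head, 1, 1)
    k_zon = np.asarray(k_zon, dtype=float).reshape(n_head, 1, 1)
    return base + k_mer * i + k_zon * j

base = np.zeros((2, 3, 4))                  # two heads on a 3 x 4 toy grid
at_init = anisotropic_bias(base, [0.0, 0.0], [0.0, 0.0])
learned = anisotropic_bias(base, [0.1, 0.0], [0.0, -0.2])
print(learned[0, 2, 3], learned[1, 2, 3])
```

In the second call, head 0 varies only along the meridional direction and head 1 only along the zonal direction, which is the kind of directional asymmetry the two coefficients are meant to capture.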

[0028] The attention calculation subunit is used to calculate attention weights based on the latitude weight field and learnable location bias. The specific calculation process is described in the geographic prior attention calculation module.

[0029] Secondly, a multi-scale feature preservation unit is constructed, including a main path encoding subunit, an auxiliary path downsampling subunit, and a gated fusion subunit. The main path encoding subunit performs deep feature extraction on the input data and is built as a convolutional neural network containing N=4 encoding blocks. Each encoding block consists of two 3×3 convolutional layers, a batch normalization layer, a ReLU activation function, and a 2×2 max-pooling layer with a stride of 2. After the nth encoding block, the feature map resolution is reduced to $1/2^n$ of the input and the number of channels is $2^{n-1}$ times the initial number of 64. This yields main path encoding features at four scales: F1 (64×120×280), F2 (128×60×140), F3 (256×30×70), and F4 (512×15×35). The auxiliary path downsampling subunit applies average pooling to the original input data (C=5), with both the pooling kernel size and the stride equal to $2^n$, generating auxiliary path features S1~S4 with the same resolution as the main path encoding features at each scale. Their number of channels is kept at 5 to preserve the structural skeleton of the original data. The specific fusion process of the gated fusion subunit is described in the multi-scale feature fusion module.
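The feature-map sizes listed above can be reproduced with a short NumPy check. The sketch assumes that the auxiliary feature S_n is produced by average pooling with kernel and stride 2^n, which is what the listed resolutions imply; `avg_pool` is a hypothetical helper and the random input merely stands in for one 5-channel field on the 240×560 grid.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool a (C, H, W) array with kernel = stride = k (k must divide H and W)."""
    C, H, W = x.shape
    return x.reshape(C, H // k, k, W // k, k).mean(axis=(2, 4))

C, H, W, D = 5, 240, 560, 64
x = np.random.default_rng(0).normal(size=(C, H, W))

# Main path: resolution halves and channel count doubles with each encoding block.
main_shapes = [(D * 2 ** (n - 1), H // 2 ** n, W // 2 ** n) for n in range(1, 5)]
# Auxiliary path: the raw 5-channel input pooled straight to each scale.
aux = [avg_pool(x, 2 ** n) for n in range(1, 5)]

for n, (m, s) in enumerate(zip(main_shapes, aux), start=1):
    print(f"F{n} {m}  S{n} {s.shape}")
```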

[0030] The gated fusion subunit generates gate weights using the sigmoid activation function, enabling learnable weighted fusion of auxiliary and main path features. It dynamically determines how much to trust each of the two types of features at the current spatial location. The specific fusion process is described in the multi-scale feature fusion module. Thirdly, a differential constraint unit is constructed, which includes a gradient calculation subunit and a loss construction subunit. The gradient calculation subunit uses the central difference method to calculate the gradient fields of the predicted meteorological field and the ground truth data in the meridional and zonal directions, which are used for subsequent gradient loss construction: $G_{mer}[i,j] = \frac{F[i+1,j] - F[i-1,j]}{2\Delta x}$; $G_{zon}[i,j] = \frac{F[i,j+1] - F[i,j-1]}{2\Delta y}$; where F denotes either the predicted meteorological field or the ground truth data, i is the height index, j is the width index, and Δx and Δy are the grid spacing in the meridional and zonal directions, respectively, determined according to the resolution of the target spatial grid. In this embodiment, the grid resolution is 0.25 degrees, corresponding to a meridional distance of approximately 27.8 kilometers. The zonal distance varies with latitude, but since gradient calculations are usually performed in units of grid points, Δx = Δy = 1 can be taken. For boundary grid points, forward or backward differences are used.
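The gradient calculation can be written out explicitly as follows. This sketch follows the description (central differences in the interior, forward or backward differences on the boundary, grid-point units by default); the function name `spatial_gradients` is hypothetical, and the meridional direction is taken to be the height axis of the array.

```python
import numpy as np

def spatial_gradients(field, dx=1.0, dy=1.0):
    """Return (meridional, zonal) gradients of an (H, W) field."""
    field = np.asarray(field, dtype=float)
    g_mer = np.empty_like(field)
    g_zon = np.empty_like(field)
    g_mer[1:-1] = (field[2:] - field[:-2]) / (2 * dx)           # central, interior rows
    g_mer[0] = (field[1] - field[0]) / dx                       # forward at the first row
    g_mer[-1] = (field[-1] - field[-2]) / dx                    # backward at the last row
    g_zon[:, 1:-1] = (field[:, 2:] - field[:, :-2]) / (2 * dy)  # central, interior columns
    g_zon[:, 0] = (field[:, 1] - field[:, 0]) / dy
    g_zon[:, -1] = (field[:, -1] - field[:, -2]) / dy
    return g_mer, g_zon

i, j = np.meshgrid(np.arange(5), np.arange(7), indexing="ij")
field = 2.0 * i + 3.0 * j              # linear field with known slopes 2 and 3
g_mer, g_zon = spatial_gradients(field)
print(g_mer[0, 0], g_zon[0, 0])
```

With unit spacing this matches what `np.gradient` computes by default, so in practice the library call could be used instead of the hand-written slices.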

[0031] The specific calculation process of the loss construction sub-unit is described in the model training module. The geographic prior-enhanced attention unit, multi-scale feature preservation unit, and differential constraint unit are connected to form a complete meteorological prediction model architecture. In this embodiment, the overall model architecture is as follows: The input layer receives historical meteorological grid data with dimensions L×C×H×W (L=12). The input data is processed by an initial convolutional layer to map the number of channels from C to the feature dimension D=64, resulting in an initial feature tensor with dimensions D×H×W. This tensor is input to the geographic prior-enhanced attention unit, outputting geographic-aware encoded features with dimensions preserved at D×H×W. The geographic-aware encoded features are input to the main path encoding sub-unit, while the original input data is input to the auxiliary path downsampling sub-unit. Both are gated and fused in the multi-scale feature fusion module, outputting multi-scale-enhanced decoded features. The decoded features are input to the prediction output layer, convolved and mapped to the target meteorological element channel space to generate a predicted meteorological field with dimensions K×C×H×W (K=24). During the training phase, the differential constraint unit calculates the content loss and spatial gradient loss based on the predicted weather field and ground truth data, and constructs the total loss function for updating model parameters.

[0032] The geographic prior attention calculation module is used to input historical meteorological grid data from the training dataset into the meteorological prediction model. It calculates attention weights carrying geographic prior information through geographic prior-enhanced attention units, and performs weighted aggregation of input features based on attention weights to obtain geographic-aware encoded features.

[0033] Specifically, first, historical meteorological grid data with dimensions L×C×H×W is input into the geographic prior-enhanced attention unit. In this embodiment, L=12, C=5, H=240, and W=560. The data is first passed through an initial convolutional layer for feature embedding, which maps the number of channels from C to the feature dimension D=64 and yields an initial feature tensor Z with dimensions D×H×W. Second, a latitude weight field is generated by the latitude weight generation subunit. Let the latitude array be lat[i], i=1,2,...,H, corresponding to 240 grid points from 0°N to 60°N. The latitude weight field $W_{lat}[i]$ is calculated as: $W_{lat}[i] = \sigma(\alpha_i)\cdot\left(1 + \beta\cdot I(\mathrm{lat}[i])\right)$; where $\alpha_i$ is a learnable parameter, with one learnable parameter per latitude grid point and an initial value of 0; σ is the sigmoid activation function, $\sigma(x) = 1/(1+e^{-x})$, whose output is constrained to lie between 0 and 1; β is a preset mid-latitude enhancement coefficient, taken as β=1.0 in this embodiment; and I(·) is an indicator function that takes the value 1 when the latitude meets the mid-latitude condition and 0 otherwise. With this formula, the weight of a mid-latitude row in the latitude weight field is twice the base weight $\sigma(\alpha_i)$, and it can be fine-tuned through the learnable parameter α_i. The dimension of the latitude weight field W_lat is H×1, indicating that all grid points in a latitude row share the same weight value.
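As a rough illustration of the latitude weight field described above, the following NumPy sketch evaluates σ(α_i)·(1 + β·I(·)) row by row. The function name `latitude_weights`, the 30°N to 60°N mid-latitude band, and the placement of grid points at 0°, 0.25°, ..., 59.75° are assumptions made for this example; the text does not state the mid-latitude condition. The multiplicative form is inferred from the statement that mid-latitude rows receive twice the base weight when β=1.

```python
import numpy as np

def latitude_weights(lat_deg, alpha, beta=1.0, mid_band=(30.0, 60.0)):
    """Return W_lat of shape (H, 1): sigmoid(alpha) * (1 + beta * I(mid-latitude))."""
    base = 1.0 / (1.0 + np.exp(-alpha))        # sigmoid, values in (0, 1)
    lo, hi = mid_band                          # hypothetical band, not given in the text
    in_band = ((lat_deg >= lo) & (lat_deg <= hi)).astype(float)
    return (base * (1.0 + beta * in_band))[:, None]

H = 240
lat = np.arange(H) * 0.25      # assumed grid: 0.00, 0.25, ..., 59.75 degrees north
alpha = np.zeros(H)            # learnable in the model; initial value 0
w_lat = latitude_weights(lat, alpha)
print(w_lat.shape, w_lat[0, 0], w_lat[-1, 0])
```

With α initialized to 0, rows outside the assumed band start at the base weight 0.5 and rows inside it start at 1.0.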

[0034] The anisotropic position bias subunit generates a learnable position bias tensor $B_{pos}$. Let the number of attention heads be $N_{head}$; in this embodiment, $N_{head}$=8. For each attention head h (h = 1, 2, ..., 8), a learnable meridional propagation coefficient and a learnable zonal propagation coefficient (written here as $a_h$ and $b_h$) are set, all initialized to 0. A base bias tensor $B_0$ is also initialized, with dimension N_head×H×W and all elements set to 0. The position bias tensor $B_{pos}$ is then calculated as: $B_{pos}[h,i,j] = B_0[h,i,j] + a_h\cdot i + b_h\cdot j$; where i is the height index and j is the width index. This design allows the position bias to vary linearly with spatial location, with the rates of change along the two grid directions controlled independently by learnable parameters, enabling the model to learn the difference between meridional and zonal information propagation. The dimension of $B_{pos}$ is N_head×H×W.
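The linear position bias can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the names `position_bias`, `a`, and `b` are hypothetical, the pairing of one coefficient with the height index and the other with the width index is assumed, and the indices are 0-based here whereas the text counts from 1. Non-zero coefficients are used so that the effect is visible; in the model they start at 0.

```python
import numpy as np

def position_bias(base, a, b):
    """B_pos[h, i, j] = base[h, i, j] + a[h] * i + b[h] * j."""
    n_head, H, W = base.shape
    i = np.arange(H)[None, :, None]    # height index, shape (1, H, 1)
    j = np.arange(W)[None, None, :]    # width index,  shape (1, 1, W)
    return base + a[:, None, None] * i + b[:, None, None] * j

n_head, H, W = 8, 4, 6                 # small grid for illustration
base = np.zeros((n_head, H, W))        # base bias tensor, initialized to 0
a = np.full(n_head, 0.1)               # per-head coefficient along the height index
b = np.full(n_head, -0.05)             # per-head coefficient along the width index
bias = position_bias(base, a, b)
print(bias.shape)
```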

[0035] Then, the initial feature tensor is projected into a query tensor, a key tensor, and a value tensor. The initial feature tensor Z is projected through three independent linear projection layers. Let the feature dimension of each attention head be $d_k$, with $d_k = D / N_{head}$; in this embodiment, $d_k$ = 64 / 8 = 8. The query projection layer uses a 1×1 convolution with D input channels and D output channels to obtain the query feature Q', with dimensions D×H×W. Q' is then reshaped to $N_{head}\times d_k\times H\times W$ and flattened over the spatial dimensions to $N_{head}\times d_k\times N$, where N=H×W is the total number of spatial locations, giving the query tensor Q with dimensions $N_{head}\times d_k\times N$. Similarly, the key tensor K and the value tensor V are obtained through the key projection layer and the value projection layer, respectively, both with dimensions $N_{head}\times d_k\times N$.

[0036] Next, the query tensor and the key tensor are matrix-multiplied and the result is divided by a scaling factor to obtain the initial attention score tensor. For each attention head h, the transpose of the query tensor Q_h is multiplied by the key tensor K_h to obtain the attention score matrix $S_h$: $S_h = Q_h^{T} K_h$; where Q_h has dimension d_k×N, so its transpose has dimension N×d_k, and K_h has dimension d_k×N. After multiplication, $S_h$ has dimension N×N, with rows indexed by query positions and columns by key positions. $S_h$ is then divided by the scaling factor $\sqrt{d_k}$, which prevents the softmax gradient from vanishing when the inner products become excessively large, to obtain the initial attention score tensor $A_h^{init}$: $A_h^{init} = S_h / \sqrt{d_k}$; the initial attention score tensors of all attention heads are stacked to obtain $A^{init}$, with dimension $N_{head}$×N×N. To facilitate the subsequent addition of the geographic prior modulation bias, $A^{init}$ is reshaped to $N_{head}$×H×W×H×W, representing the attention score of each query position (i,j) for every key position (p,q).
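The head splitting and scaled score computation can be illustrated as follows. This is a minimal sketch on a small grid: the function name `initial_scores` is hypothetical, random arrays stand in for the outputs of the 1×1 query and key projections, and the index convention (rows are queries, columns are keys) is the one under which the later weighted sum over key positions works out.

```python
import numpy as np

def initial_scores(z_q, z_k, n_head):
    """z_q, z_k: projected features of shape (D, H, W). Returns (n_head, N, N)."""
    D, H, W = z_q.shape
    d_k = D // n_head
    q = z_q.reshape(n_head, d_k, H * W)        # (n_head, d_k, N)
    k = z_k.reshape(n_head, d_k, H * W)        # (n_head, d_k, N)
    # scores[h, n, m] = dot(q[h, :, n], k[h, :, m]) / sqrt(d_k)
    return np.einsum("hdn,hdm->hnm", q, k) / np.sqrt(d_k)

rng = np.random.default_rng(0)
D, H, W, n_head = 64, 3, 5, 8                  # small H, W for illustration
z_q = rng.normal(size=(D, H, W))               # stand-in for the query projection
z_k = rng.normal(size=(D, H, W))               # stand-in for the key projection
scores = initial_scores(z_q, z_k, n_head)      # (8, 15, 15)
scores_5d = scores.reshape(n_head, H, W, H, W) # query (i, j) by key (p, q)
print(scores.shape, scores_5d.shape)
```

At the embodiment's full resolution N = 240×560, the N×N score matrix per head is very large, so the sketch deliberately uses a tiny grid.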

[0037] Then, the latitude weight field is broadcast and multiplied by the learnable position bias tensor to obtain the geographic prior modulation bias, which is added to the initial attention score tensor. The latitude weight field $W_{lat}$ has dimension H×1 and must be broadcast to the dimension of the attention score tensor. Specifically, $W_{lat}$ is copied W times along the width direction to obtain a field of dimension H×W, which is then broadcast to $N_{head}$×H×W×H×W, so that for each query position (i,j) the modulation bias with respect to all key positions (p,q) is multiplied by the same latitude weight $W_{lat}[i]$. The position bias tensor $B_{pos}$ has dimension $N_{head}$×H×W and represents the bias value of each attention head at each query position. To apply it to the attention scores, it is expanded so that every key position receives the same bias: $B_{pos}$ is expanded to $N_{head}$×H×W×1×1 and then broadcast to $N_{head}$×H×W×H×W, meaning that for each query position (i,j) the same bias $B_{pos}[h,i,j]$ is added for all key positions (p,q). The geographic prior modulation bias M is calculated as: $M[h,i,j,p,q] = W_{lat}[i]\cdot B_{pos}[h,i,j]$; that is, for each attention head h and each query position (i,j), the latitude weight $W_{lat}[i]$ and the position bias $B_{pos}[h,i,j]$ are multiplied to give a scalar, which is added to the attention scores of this query position for all key positions. The geographic prior modulation bias M is added element-wise to the initial attention score tensor $A^{init}$ to obtain the geographically enhanced attention score tensor $A^{geo}$: $A^{geo} = A^{init} + M$; through this additive operation, the latitude weights and position biases are injected into the attention scores as a modulation bias, incorporating explicit geographic prior information into the attention calculation. Next, the attention score tensor is softmax-normalized to obtain the geographic prior-enhanced attention weights. $A^{geo}$ is reshaped to $N_{head}$×N×N, and for each attention head h and each query position n, softmax normalization is applied to its attention scores across all key positions m: $A^{attn}[h,n,m] = \mathrm{softmax}_m\left(A^{geo}[h,n,m]\right)$; where the softmax function is defined as $\mathrm{softmax}_m(x_m) = e^{x_m} / \sum_{m'=1}^{N} e^{x_{m'}}$. The normalized tensor $A^{attn}$ has dimension $N_{head}$×N×N and represents the geographic prior-enhanced attention weights. These weights incorporate both data-driven attention relationships and physical priors related to the latitude weights and to meridional-zonal anisotropy.
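The broadcasting of the modulation bias and the softmax over key positions can be sketched as below. The function name `geo_attention_weights` and the small example shapes are assumptions for illustration; the latitude weights and position biases are random or hand-picked stand-ins for the learned quantities.

```python
import numpy as np

def geo_attention_weights(scores, w_lat, b_pos):
    """scores: (n_head, H, W, H, W); w_lat: (H, 1); b_pos: (n_head, H, W)."""
    n_head, H, W = b_pos.shape
    # M[h, i, j, p, q] = w_lat[i] * b_pos[h, i, j], broadcast over key positions
    m = (w_lat[None, :, :] * b_pos)[:, :, :, None, None]   # (n_head, H, W, 1, 1)
    biased = (scores + m).reshape(n_head, H * W, H * W)
    biased = biased - biased.max(axis=-1, keepdims=True)    # numerical stability
    e = np.exp(biased)
    return e / e.sum(axis=-1, keepdims=True)                # softmax over keys

rng = np.random.default_rng(0)
n_head, H, W = 2, 3, 4
scores = rng.normal(size=(n_head, H, W, H, W))
w_lat = np.array([[0.5], [1.0], [1.0]])                     # one weight per latitude row
b_pos = rng.normal(size=(n_head, H, W))
attn = geo_attention_weights(scores, w_lat, b_pos)          # (n_head, N, N)
print(attn.shape, attn.sum(axis=-1).round(6).min())
```

One property worth keeping in mind when experimenting with this sketch: softmax over key positions is invariant to a shift that is constant across keys, so a bias that depends only on the query position, as M does here, leaves the normalized weights numerically unchanged.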

[0038] Finally, the attention weights are used to form a weighted sum of the value tensor to obtain the geographic-aware encoded features. For each attention head h, the attention weights $A^{attn}_h$ and the value tensor $V_h$ are matrix-multiplied: $O_h = V_h \left(A^{attn}_h\right)^{T}$; where $V_h$ has dimension $d_k$×N, $A^{attn}_h$ has dimension N×N, its transpose is also N×N, and the product $O_h$ has dimension $d_k$×N. The outputs of all attention heads are concatenated along the feature dimension to obtain a tensor of dimension D×N, which is reshaped to D×H×W to yield the geographic-aware encoded features $F_{geo}$. Each spatial location in these encoded features aggregates information from other locations, and the aggregation weights are modulated by geographic priors, so the modeling of spatial dependencies in the meteorological data is more consistent with physical laws.
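The per-head aggregation and head concatenation amount to one batched matrix product followed by a reshape. The sketch below assumes attention weights whose rows index query positions and whose columns index key positions; the name `aggregate` and the toy shapes are hypothetical.

```python
import numpy as np

def aggregate(weights, z_v, n_head):
    """weights: (n_head, N, N), rows = queries; z_v: (D, H, W). Returns (D, H, W)."""
    D, H, W = z_v.shape
    v = z_v.reshape(n_head, D // n_head, H * W)       # (n_head, d_k, N)
    out = np.einsum("hdm,hnm->hdn", v, weights)       # O_h = V_h @ A_h^T
    return out.reshape(D, H, W)                       # concatenate heads, restore grid

rng = np.random.default_rng(0)
z_v = rng.normal(size=(16, 2, 3))                     # D=16, H=2, W=3, so N=6
identity = np.broadcast_to(np.eye(6), (4, 6, 6))      # each query attends only to itself
same = aggregate(identity, z_v, n_head=4)             # reproduces z_v
uniform = np.full((4, 6, 6), 1.0 / 6.0)               # each query averages all keys
mean_out = aggregate(uniform, z_v, n_head=4)          # every location holds the spatial mean
```

The two limiting cases (identity and uniform weights) are a convenient sanity check for the index order of the product.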

[0039] The multi-scale feature fusion module performs multi-scale downsampling on the historical meteorological grid data through the multi-scale feature preservation unit to obtain auxiliary path features at different resolutions. The auxiliary path features are then gated and fused with the geographic-aware encoded features at each scale to obtain multi-scale enhanced decoding features.

[0040] Specifically, first, the historical meteorological grid data serves as both the main path input and the auxiliary path input. The initial feature tensor produced by the initial convolutional layer, with dimensions D×H×W (D=64, H=240, W=560 in this embodiment), serves as the input to the main path coding subunit; the original multi-channel data, with dimensions C×H×W (C=5), serves as the input to the auxiliary path downsampling subunit. The main path coding subunit contains N=4 coding blocks, each consisting of a convolutional layer, a batch normalization layer, a ReLU activation function, and a max-pooling layer with a stride of 2. After the n-th coding block, the feature map resolution is reduced to $H/2^{n}\times W/2^{n}$ and the number of channels becomes the initial number of channels D multiplied by $2^{n-1}$. This yields the main path coding features at four scales: F1: D×H / 2×W / 2, i.e., 64×120×280; F2: 2D×H / 4×W / 4, i.e., 128×60×140; F3: 4D×H / 8×W / 8, i.e., 256×30×70; F4: 8D×H / 16×W / 16, i.e., 512×15×35. The auxiliary path downsampling subunit employs average pooling with both the kernel size and the stride equal to $2^{n}$, generating auxiliary path features S1 to S4 with the same resolution as the main path coding features at each scale, while the number of channels is kept at C=5.

[0041] The auxiliary path downsampling subunit takes the raw input data X, with dimensions C×H×W, and applies average pooling to generate auxiliary path features at different resolutions. The pooling kernel size and stride match the downsampling factor of the main path coding subunit. For scale n, average pooling with a kernel size of $2^{n}$ and a stride of $2^{n}$ is used to obtain the auxiliary path feature $S_n$, whose resolution is the same as that of the main path coding feature $F_n$, with the number of channels remaining constant at C. In this embodiment: the first-scale auxiliary path feature S1 is obtained through average pooling with a kernel size of 2 and a stride of 2, with dimensions C×H / 2×W / 2, i.e., 5×120×280; the second-scale auxiliary path feature S2 is obtained through average pooling with a kernel size of 4 and a stride of 4, with dimensions 5×60×140; the third-scale auxiliary path feature S3 is obtained through average pooling with a kernel size of 8 and a stride of 8, with dimensions 5×30×70; and the fourth-scale auxiliary path feature S4 is obtained through average pooling with a kernel size of 16 and a stride of 16, with dimensions 5×15×35.
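The auxiliary-path shapes listed above can be checked with a short non-overlapping average pooling sketch. The helper name `avg_pool` is hypothetical, random data stands in for a real input field, and the reshape trick assumes H and W are divisible by the kernel size, which holds for 240×560 and kernels 2, 4, 8, and 16.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool x of shape (C, H, W) with kernel size and stride k."""
    C, H, W = x.shape
    return x.reshape(C, H // k, k, W // k, k).mean(axis=(2, 4))

x = np.random.default_rng(0).normal(size=(5, 240, 560))   # stand-in for C x H x W input
aux = {n: avg_pool(x, 2 ** n) for n in range(1, 5)}       # S1 .. S4
for n, s in aux.items():
    print(f"S{n}: kernel={2 ** n}, shape={s.shape}")
```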

[0042] Next, for the deepest scale, i.e., the fourth scale, the auxiliary path feature of that scale is aligned with the geographic-aware encoded features and the two are concatenated; gate weights are generated by the gated fusion subunit and weighted fusion is performed to obtain the fused feature at the current scale. The geographic-aware encoded features $F_{geo}$ output by the geographic prior attention calculation module have dimensions D×H×W, i.e., 64×240×560. The deepest-scale auxiliary path feature S4 has dimensions 5×15×35. First, the geographic-aware encoded features must be downsampled to the same resolution as S4. Using the same downsampling method as the main path coding subunit, four consecutive max-pooling operations with a stride of 2 are applied to $F_{geo}$ to obtain the downsampled geographic-aware feature $F_{geo,4}$, with dimensions 64×15×35.

[0043] Then the downsampled geographic-aware feature $F_{geo,4}$ and S4 are concatenated along the channel dimension. Since S4 has 5 channels and $F_{geo,4}$ has 64 channels, the concatenated feature C4 has dimensions (64+5)×15×35, i.e., 69×15×35. C4 is then input into the gated fusion subunit, which first compresses the number of channels from 69 to 64 using a 3×3 convolutional layer and then generates the gate weights G4 using a sigmoid activation function: G4=σ(Conv3×3(C4)). Here, σ is the sigmoid function and Conv3×3 denotes a 3×3 convolution with 64 output channels. G4 has dimensions 64×15×35, and each of its values lies between 0 and 1, indicating the degree to which the main path feature should be trusted at that channel and spatial location.

[0044] Based on the gate weights G4, the downsampled geographic-aware feature $F_{geo,4}$ and S4 are fused by weighting. Since S4 has 5 channels, it is first mapped from 5 to 64 channels using a 1×1 convolution to obtain S4_proj, with dimensions 64×15×35. Weighted fusion is then performed: $\mathrm{Fusion4} = G4 \odot F_{geo,4} + (1 - G4) \odot \mathrm{S4\_proj}$; where ⊙ denotes element-wise multiplication. Fusion4 is the fourth-scale fused feature, with dimensions 64×15×35.
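The weighted fusion step is an element-wise convex combination controlled by the sigmoid gate. In the sketch below the convolutions are omitted: `gate_logits` is a hypothetical stand-in for the output of the 3×3 convolution on the concatenated feature, and `aux_proj` for the 1×1-projected auxiliary feature. The gate multiplying the main-path feature follows the statement that the gate expresses trust in the main path.

```python
import numpy as np

def gated_fusion(main, aux_proj, gate_logits):
    """Blend two (C, H, W) feature maps with a sigmoid gate of the same shape."""
    gate = 1.0 / (1.0 + np.exp(-gate_logits))       # values in (0, 1)
    return gate * main + (1.0 - gate) * aux_proj    # gate weights the main path

rng = np.random.default_rng(0)
main = rng.normal(size=(64, 15, 35))        # stand-in for the downsampled geographic-aware feature
aux_proj = rng.normal(size=(64, 15, 35))    # stand-in for S4_proj
logits = rng.normal(size=(64, 15, 35))      # stand-in for Conv3x3(C4)
fusion4 = gated_fusion(main, aux_proj, logits)
print(fusion4.shape)
```

A gate logit of 0 gives an equal blend of both paths, while large positive logits reproduce the main-path feature.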

[0045] Then, for the other scales, the fused feature from the previous (deeper) scale is upsampled and added to the main path coding feature of the current scale, concatenated with the aligned auxiliary path feature of the current scale, and fused by weighting through the gated fusion subunit. Taking the third scale as an example, the fourth-scale fused feature Fusion4 is first upsampled to the third-scale resolution using bilinear interpolation with an upsampling factor of 2, giving Fusion4_up with dimensions 64×30×70. Adding it to the third-scale main path feature F3 yields the third-scale enhanced feature E3: E3 = F3 + Fusion4_up; F3 has dimensions 256×30×70 while Fusion4_up has dimensions 64×30×70, so their channel counts do not match. In an actual implementation, Fusion4_up therefore undergoes channel projection, mapping the number of channels from 64 to 256 with a 1×1 convolution, before the two are added. For simplicity, E3 denotes the channel-aligned result.

[0046] Simultaneously, the third-scale auxiliary path feature S3 is mapped from 5 to 256 channels using a 1×1 convolution, giving S3_proj with dimensions 256×30×70. E3 and S3_proj are then concatenated along the channel dimension to obtain the concatenated feature C3, with dimensions 512×30×70. C3 is input into the gated fusion subunit, where a 3×3 convolution compresses the number of channels from 512 to 256 and a sigmoid function generates the gate weights G3, with dimensions 256×30×70. Weighted fusion is then performed: $\mathrm{Fusion3} = G3 \odot E3 + (1 - G3) \odot \mathrm{S3\_proj}$; this gives the third-scale fused feature Fusion3, with dimensions 256×30×70. The second and first scales are processed in the same way. For the second scale: Fusion3 is upsampled by a factor of 2 and added to the second-scale main path feature F2 to obtain E2; the second-scale auxiliary path feature S2 is projected to obtain S2_proj; E2 and S2_proj are concatenated, and Fusion2 is obtained through gated fusion, with dimensions 128×60×140. For the first scale: Fusion2 is upsampled by a factor of 2 and added to the first-scale main path feature F1 to obtain E1; the first-scale auxiliary path feature S1 is projected to obtain S1_proj; E1 and S1_proj are concatenated, and Fusion1 is obtained through gated fusion, with dimensions 64×120×280.

[0047] Finally, the first-scale fused feature is output as the multi-scale enhanced decoding feature. This decoding feature, Fusion1, has dimensions 64×120×280. Compared with the original geographic-aware encoded features, its spatial resolution is reduced from 240×560 to 120×280, but it integrates information from the multi-scale main path features and auxiliary path features, so it both carries deep semantics and preserves the structural skeleton of the original data.

[0048] The prediction field generation module is used to generate the predicted weather field for the current training period based on the decoded features.

[0049] Specifically, first, the multi-scale enhanced decoding feature output by the multi-scale feature fusion module is obtained, namely the first-scale fused feature Fusion1, whose dimensions are D×H / 2×W / 2. In this embodiment, D=64, H=240, and W=560, so the decoding feature F_dec has dimensions 64×120×280. Second, the decoding feature F_dec is input to the prediction output layer. The prediction output layer adopts a three-layer convolutional structure: the first layer is a 3×3 convolution that increases the number of channels from 64 to 128; the second layer is a 3×3 convolution that increases the number of channels from 128 to 256; the third layer is a 3×3 convolution with K×C=24×5=120 output channels, mapping the features to the target meteorological element channel space to obtain the initial prediction tensor.

[0050] After these three convolutional layers, the initial prediction tensor is obtained. Its dimensions are (K×C)×H / 2×W / 2, i.e., 120×120×280. Since the resolution of the original meteorological grid is H×W while the resolution of the decoded features is H / 2×W / 2, the initial prediction tensor must be upsampled to restore the original resolution. Bilinear interpolation with an upsampling factor of 2 is used to obtain the upsampled prediction tensor, whose dimensions are (K×C)×H×W, i.e., 120×240×560. This tensor is then reshaped to K×C×H×W, i.e., 24×5×240×560, to yield the initial prediction field. The initial prediction field contains the hourly forecasts of five meteorological elements over the next 24 hours on a 240×560 grid.

[0051] Next, the initial prediction field is post-processed to constrain the output values to a physically reasonable range for each meteorological element. The purpose of post-processing is to ensure that the model output conforms to basic physical laws of meteorology and to avoid unrealistic forecasts. A different physical range is set for each meteorological element, and the `clip` function truncates the predicted values $\hat{Y}$ to the specified range. For 2-meter air temperature (T2M), the physically reasonable range is 200 K to 330 K: $\hat{Y}_{T2M} = \mathrm{clip}(\hat{Y}_{T2M}, 200, 330)$. For sea level pressure (SLP), the range is 870 hPa to 1100 hPa: $\hat{Y}_{SLP} = \mathrm{clip}(\hat{Y}_{SLP}, 870, 1100)$. For 10-meter wind speed (W10), the range is 0 m/s to 50 m/s: $\hat{Y}_{W10} = \mathrm{clip}(\hat{Y}_{W10}, 0, 50)$. For relative humidity (RH), the range is 0% to 100%: $\hat{Y}_{RH} = \mathrm{clip}(\hat{Y}_{RH}, 0, 100)$. For precipitation (PRE), values must be non-negative and an upper limit is usually applied, such as 0 mm to 500 mm: $\hat{Y}_{PRE} = \mathrm{clip}(\hat{Y}_{PRE}, 0, 500)$. In addition, further physical consistency constraints can be imposed for some meteorological elements, for example requiring relative humidity to satisfy basic physical relationships with temperature and air pressure; such constraints are usually expressed in the loss function, and only univariate range constraints are applied in the post-processing stage.
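The range constraints translate directly into per-channel clipping. The sketch below uses `numpy.clip` with the bounds stated above; the function name `clip_forecast` and the channel order in `CHANNELS` are assumptions, since the text does not say how the five elements are ordered along the channel axis.

```python
import numpy as np

# (lower, upper) bounds per element, in the units used in the text
BOUNDS = {"T2M": (200.0, 330.0), "SLP": (870.0, 1100.0), "W10": (0.0, 50.0),
          "RH": (0.0, 100.0), "PRE": (0.0, 500.0)}
CHANNELS = ["T2M", "SLP", "W10", "RH", "PRE"]   # assumed channel order

def clip_forecast(pred):
    """pred: (K, C, H, W) forecast field. Returns a clipped copy."""
    out = pred.copy()
    for c, name in enumerate(CHANNELS):
        lo, hi = BOUNDS[name]
        out[:, c] = np.clip(out[:, c], lo, hi)
    return out

pred = np.zeros((24, 5, 1, 2))
pred[:, :, 0, 0] = -10.0        # below every lower bound
pred[:, :, 0, 1] = 2000.0       # above every upper bound
clipped = clip_forecast(pred)
print(clipped[0, :, 0, 0], clipped[0, :, 0, 1])
```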

[0052] Finally, the post-processed initial prediction field is output as the predicted meteorological field for the current training period. Its dimensions are K×C×H×W, i.e., 24×5×240×560, and it contains the hourly predicted values of five meteorological elements at each grid point in the target area for the next 24 hours. This predicted meteorological field serves as input to the model training module, where it is used to calculate the loss between the predicted and ground truth data and to guide the updating of model parameters.

[0053] The model training module is used to calculate the content loss and spatial gradient loss between the predicted meteorological field and the ground truth data through differential constraint units, and to construct a total loss function based on the content loss and spatial gradient loss. The parameters of the meteorological prediction model are iteratively updated according to the total loss function until the model converges, and the trained meteorological prediction model is obtained.

[0054] Specifically, first, the predicted meteorological field and the corresponding ground truth data for the current training period are obtained. The predicted meteorological field $\hat{Y}$ output by the prediction field generation module has dimensions K×C×H×W, where in this embodiment K=24, C=5, H=240, and W=560. The ground truth data Y comes from the training dataset constructed by the data acquisition module and has the same dimensions K×C×H×W as the predicted meteorological field, representing the actual hourly observations of five meteorological elements at each grid point in the target area for the next 24 hours. Next, the loss construction subunit calculates the content loss between the predicted meteorological field and the ground truth data. The content loss measures the numerical difference between the predicted and ground truth values using the mean absolute error (L1 loss). For each meteorological element c and each forecast lead time k, the mean absolute error over all spatial locations is calculated and then averaged over all elements and lead times: $L_{content} = \frac{1}{K\,C\,H\,W}\sum_{k=1}^{K}\sum_{c=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\left|\hat{Y}[k,c,i,j] - Y[k,c,i,j]\right|$; where $\hat{Y}[k,c,i,j]$ is the value of the predicted meteorological field at forecast lead time k, meteorological element c, and grid position (i,j), and Y[k,c,i,j] is the corresponding ground truth value. The L1 loss is robust to outliers and guides model training stably.

[0055] Then, the gradient calculation subunit calculates the spatial gradient fields of the predicted meteorological field and the ground truth data in the two spatial directions. For a two-dimensional meteorological field F, the spatial gradient has two components: the gradient along the x-direction (east-west, width index j), which this document calls the meridional gradient, and the gradient along the y-direction (north-south, height index i), which it calls the zonal gradient. The gradient calculation uses the central difference method for internal grid points (2 ≤ j ≤ W−1 for the x-direction and 2 ≤ i ≤ H−1 for the y-direction). Meridional gradient: $G_x[i,j] = (F[i,j+1] - F[i,j-1]) / (2\Delta x)$; zonal gradient: $G_y[i,j] = (F[i+1,j] - F[i-1,j]) / (2\Delta y)$. For boundary grid points, forward or backward differences are used. Left boundary j=1: $G_x[i,1] = (F[i,2] - F[i,1]) / \Delta x$; right boundary j=W: $G_x[i,W] = (F[i,W] - F[i,W-1]) / \Delta x$; upper boundary i=1: $G_y[1,j] = (F[2,j] - F[1,j]) / \Delta y$; lower boundary i=H: $G_y[H,j] = (F[H,j] - F[H-1,j]) / \Delta y$. The meridional and zonal gradient fields are calculated separately for the predicted meteorological field $\hat{Y}$ and the ground truth Y, giving: $G_x^{pred}$, the meridional gradient field of the predicted field, with dimensions K×C×H×W; $G_y^{pred}$, the zonal gradient field of the predicted field, with dimensions K×C×H×W; $G_x^{true}$, the meridional gradient field of the ground truth, with dimensions K×C×H×W; and $G_y^{true}$, the zonal gradient field of the ground truth, with dimensions K×C×H×W.
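The difference scheme described above (central differences inside the grid, one-sided differences at the edges) can be written compactly with array slicing. The function name `spatial_gradients` is hypothetical, and the grid spacings default to 1 grid point as in this embodiment.

```python
import numpy as np

def spatial_gradients(f, dx=1.0, dy=1.0):
    """f: array of shape (..., H, W). Returns (gx, gy) with the same shape."""
    gx = np.empty_like(f, dtype=float)
    gy = np.empty_like(f, dtype=float)
    # x-direction: differences along the width index j
    gx[..., :, 1:-1] = (f[..., :, 2:] - f[..., :, :-2]) / (2.0 * dx)   # central
    gx[..., :, 0] = (f[..., :, 1] - f[..., :, 0]) / dx                 # forward
    gx[..., :, -1] = (f[..., :, -1] - f[..., :, -2]) / dx              # backward
    # y-direction: differences along the height index i
    gy[..., 1:-1, :] = (f[..., 2:, :] - f[..., :-2, :]) / (2.0 * dy)   # central
    gy[..., 0, :] = (f[..., 1, :] - f[..., 0, :]) / dy                 # forward
    gy[..., -1, :] = (f[..., -1, :] - f[..., -2, :]) / dy              # backward
    return gx, gy

i, j = np.mgrid[0:5, 0:7]
plane = 3.0 * i + 2.0 * j                  # linear field: slope 2 along j, 3 along i
gx_plane, gy_plane = spatial_gradients(plane)
field = np.random.default_rng(1).normal(size=(2, 3, 5, 7))   # (K, C, H, W) stand-in
gx, gy = spatial_gradients(field)
```

NumPy's `np.gradient` with its default `edge_order=1` applies the same scheme, so it can serve as a cross-check or a drop-in replacement.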

[0056] Next, each gradient field is normalized to remove the effect of physical units. The units and magnitudes of the gradient fields differ considerably between meteorological elements (for example, the temperature gradient is in K per grid point and the pressure gradient is in hPa per grid point), so calculating the gradient loss directly would let certain elements dominate the loss function. Each gradient field is therefore normalized by its standard deviation. For each forecast lead time k and each meteorological element c, the standard deviation of the predicted meridional gradient field over all grid points is calculated: $\sigma_x^{pred}[k,c] = \sqrt{\frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(G_x^{pred}[k,c,i,j] - \mu_x^{pred}[k,c]\right)^2}$, where $\mu_x^{pred}[k,c]$ is the spatial mean of $G_x^{pred}[k,c,\cdot,\cdot]$; dividing the predicted meridional gradient field by this standard deviation yields the normalized predicted meridional gradient field: $\tilde{G}_x^{pred}[k,c,i,j] = G_x^{pred}[k,c,i,j] / \left(\sigma_x^{pred}[k,c] + \varepsilon\right)$; where ε is a very small constant, taken as ε = 1e-8, to prevent division by zero. The same normalization is applied to the predicted zonal gradient field, the true meridional gradient field, and the true zonal gradient field, giving $\tilde{G}_y^{pred}$, $\tilde{G}_x^{true}$, and $\tilde{G}_y^{true}$, respectively.

[0057] Then, the meridional gradient loss and the zonal gradient loss are calculated. The meridional gradient loss is the mean absolute error between the normalized predicted meridional gradient field and the normalized true meridional gradient field: $L_{grad,x} = \frac{1}{K\,C\,H\,W}\sum_{k,c,i,j}\left|\tilde{G}_x^{pred}[k,c,i,j] - \tilde{G}_x^{true}[k,c,i,j]\right|$; the zonal gradient loss is defined in the same way: $L_{grad,y} = \frac{1}{K\,C\,H\,W}\sum_{k,c,i,j}\left|\tilde{G}_y^{pred}[k,c,i,j] - \tilde{G}_y^{true}[k,c,i,j]\right|$; summing the meridional gradient loss and the zonal gradient loss yields the spatial gradient loss $L_{grad}$: $L_{grad} = L_{grad,x} + L_{grad,y}$; next, the content loss and the spatial gradient loss are combined in a weighted sum according to a preset weight to construct the total loss function L_total: $L_{total} = L_{content} + \lambda\, L_{grad}$; here, λ is the weighting coefficient of the spatial gradient loss, used to balance the contributions of the content loss and the gradient loss. The value of λ typically ranges from 0.01 to 0.5; in this embodiment, λ = 0.1 is used. This coefficient can be adjusted based on performance on the validation set: if the model does not adequately preserve the intensity of severe weather systems, λ can be increased; if the model's numerical accuracy decreases, λ can be decreased.
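The loss construction can be sketched end to end in NumPy. This is an illustration under assumptions rather than the embodiment's training code: the names `normalize` and `total_loss` are hypothetical, `np.gradient` (central differences inside, one-sided at the edges, unit spacing) stands in for the gradient calculation subunit, the standard deviation is the population form over each H×W slice, and random arrays replace real forecast and observation fields.

```python
import numpy as np

def normalize(g, eps=1e-8):
    """Divide each (k, c) slice of g (K, C, H, W) by its spatial standard deviation."""
    return g / (g.std(axis=(-2, -1), keepdims=True) + eps)

def total_loss(pred, truth, lam=0.1):
    """Return (total, content, gradient) losses for fields of shape (K, C, H, W)."""
    content = np.abs(pred - truth).mean()                 # L1 content loss
    grad = 0.0
    for axis in (-1, -2):                                 # width index, then height index
        gp = normalize(np.gradient(pred, axis=axis))
        gt = normalize(np.gradient(truth, axis=axis))
        grad += np.abs(gp - gt).mean()                    # L1 on normalized gradients
    return content + lam * grad, content, grad

rng = np.random.default_rng(0)
truth = rng.normal(size=(2, 3, 6, 8))                     # small (K, C, H, W) stand-in
pred = truth + rng.normal(scale=0.5, size=truth.shape)
total, content, grad = total_loss(pred, truth, lam=0.1)
print(float(total), float(content), float(grad))
```

In a real training loop the same arithmetic would be expressed in an automatic-differentiation framework so that the loss can be backpropagated.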

[0058] To further improve training effectiveness, a progressive training strategy can be adopted: in the first stage, λ=0 is set so that only the basic morphology of the meteorological field is learned; in the second stage, λ is increased linearly from 0 to 0.05 to learn the gradient structure; in the third stage, λ is increased to 0.1 to strengthen gradient consistency. This embodiment uses a fixed λ=0.1 for training, but the progressive strategy can be selected according to actual needs. The model parameters are updated iteratively using the backpropagation algorithm based on the total loss function. The Adam optimizer is used with an initial learning rate of 1e-4, momentum parameters β1=0.9 and β2=0.999, and a batch size of 32. The above steps are repeated until the model converges or the preset number of training epochs is reached; in this embodiment, the maximum number of training epochs is 100. After each epoch, the content loss, the gradient loss, and the threat score (TS) of the precipitation forecast are evaluated on the validation set. Training is stopped early when the validation loss has not decreased for 10 consecutive epochs, and the best-performing model parameters are saved as the final trained meteorological prediction model.
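The optional progressive strategy can be expressed as a small schedule function. The stage boundaries `stage1_end` and `stage2_end` are hypothetical, since the text does not say how many epochs each stage lasts; only the λ values 0, 0.05, and 0.1 come from the description.

```python
def lambda_schedule(epoch, stage1_end=10, stage2_end=30,
                    lam_mid=0.05, lam_final=0.1):
    """Return the gradient-loss weight for a 0-based epoch index."""
    if epoch < stage1_end:                 # stage 1: content loss only
        return 0.0
    if epoch < stage2_end:                 # stage 2: linear ramp from 0 towards lam_mid
        frac = (epoch - stage1_end) / (stage2_end - stage1_end)
        return lam_mid * frac
    return lam_final                       # stage 3: fixed final weight

schedule = [lambda_schedule(e) for e in range(100)]
```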

[0059] The real-time forecasting module is used to acquire real-time meteorological grid data, input the real-time meteorological grid data into the trained meteorological forecasting model, and generate meteorological forecast results for the target forecast period through forward propagation of geographically enhanced attention units, multi-scale feature preservation units and differential constraint units.

[0060] Specifically, firstly, real-time meteorological observation data at the current moment is acquired, including data from ground meteorological observation stations, weather radar observation data, meteorological satellite remote sensing data, and initial field data from numerical weather prediction models, covering temperature, air pressure, humidity, wind speed, precipitation, radar reflectivity, cloud image brightness temperature, and gridded meteorological element fields.

[0061] Second, the real-time observation data undergoes quality control and standardization using the same methods as the data acquisition module: outliers are removed using climatological extreme value checks and consistency checks, and missing values are filled by spatiotemporal interpolation. All quality-controlled data is interpolated onto the same target spatial grid as in the training phase, i.e., a regular 0.25° × 0.25° latitude-longitude grid covering 70°E to 140°E and 0°N to 60°N, with grid sizes H=240 and W=560. Continuous variables are interpolated using bilinear interpolation, and discrete variables using nearest-neighbor interpolation. At the same time, all data is aligned to the current forecast start time t0 by temporal interpolation or by selecting the nearest time, resulting in standardized real-time meteorological grid data with dimensions L×C×H×W, where L=12 and C=5.

[0062] After the above processing, standardized real-time meteorological grid data is obtained. Its dimensions are L×C×H×W, where L=12, C=5, H=240, and W=560, and it contains the meteorological element fields for the 12 time steps from time t0-11 to time t0. This data is input into the trained meteorological prediction model, whose parameters are fixed, and only forward-propagation inference is performed. First, feature embedding is performed through the initial convolutional layer to obtain the initial feature tensor, with dimensions D×H×W, where D=64 in this embodiment. The initial feature tensor is then input into the geographic prior-enhanced attention unit: the latitude weight field is generated by the latitude weight generation subunit, the learnable position bias tensor is generated by the anisotropic position bias subunit, and the features are projected into the query tensor, key tensor, and value tensor. The initial attention scores are calculated, the geographic prior modulation bias is added, and the attention weights are obtained after softmax normalization. Finally, the weighted sum of the value tensor with these weights yields the real-time geographic-aware encoded features, with dimensions 64×240×560.

[0063] The real-time geographic-aware encoded features are input into the multi-scale feature preservation unit, while the original real-time input is used as the auxiliary path input. Main path coding features at different scales are generated by the main path coding subunit, and auxiliary path features at the corresponding resolutions are generated by the auxiliary path downsampling subunit. For the deepest scale, the auxiliary path feature is concatenated with the downsampled geographic-aware feature, and weighted fusion is performed through the gated fusion subunit. For the other scales, the fused feature from the previous scale is upsampled and added to the main path coding feature of the current scale, then concatenated with the aligned auxiliary path feature of the current scale before gated fusion. This process is repeated until feature fusion at all scales is completed, resulting in the real-time multi-scale enhanced decoding feature, with dimensions 64×120×280. This decoding feature is input to the prediction output layer, where three convolutional layers map the number of channels from 64 to 128, then to 256, and finally to 120. After bilinear interpolation upsampling, the original resolution H×W is restored, and the result is reshaped to K×C×H×W to obtain the initial real-time prediction field, with dimensions 24×5×240×560.

[0064] It should be noted that in the real-time prediction phase the differential constraint unit does not participate in the forward-propagation calculation, because this unit is used only for loss function construction during the training phase and not for prediction generation during the inference phase. Finally, the initial real-time prediction field is post-processed to constrain the output values to a physically reasonable range for each meteorological element, and the result is output as the meteorological prediction result for the target forecast period. The post-processing is the same as in the prediction field generation module, with the clip function truncating the predicted values $\hat{Y}$ to the specified range. For 2-meter air temperature (T2M), the predicted values are truncated to the range 200 K to 330 K: $\hat{Y}_{T2M} = \mathrm{clip}(\hat{Y}_{T2M}, 200, 330)$. For sea level pressure (SLP), the predicted values are truncated to the range 870 hPa to 1100 hPa: $\hat{Y}_{SLP} = \mathrm{clip}(\hat{Y}_{SLP}, 870, 1100)$. For 10-meter wind speed (W10), the predicted values are truncated to the range 0 m/s to 50 m/s: $\hat{Y}_{W10} = \mathrm{clip}(\hat{Y}_{W10}, 0, 50)$. For relative humidity (RH), the predicted values are truncated to the range 0% to 100%: $\hat{Y}_{RH} = \mathrm{clip}(\hat{Y}_{RH}, 0, 100)$. For precipitation (PRE), the predicted values are truncated to the range 0 mm to 500 mm: $\hat{Y}_{PRE} = \mathrm{clip}(\hat{Y}_{PRE}, 0, 500)$.

[0065] After post-processing, the final real-time meteorological prediction result is obtained, with dimensions 24×5×240×560. The prediction result contains the hourly predicted values of five meteorological elements at each grid point in the target area for the next 24 hours, starting from the current time t0. It can be used directly in operational meteorological services such as weather forecasting, disaster warning, and energy dispatch.

[0066] Finally, it should be noted that the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A meteorological forecasting system based on an attention mechanism using geographical prior enhancement and differential constraints, characterized in that it includes the following modules: a data acquisition module, used to acquire historical meteorological grid data and corresponding ground truth data to build a training dataset; a model building module, used to build a meteorological prediction model that includes a geographic prior-enhanced attention unit, a multi-scale feature preservation unit, and a differential constraint unit; a geographic prior attention calculation module, used to input the historical meteorological grid data from the training dataset into the meteorological prediction model, calculate attention weights carrying geographic prior information through the geographic prior-enhanced attention unit, and perform weighted aggregation of the input features based on the attention weights to obtain geographic-aware encoded features; a multi-scale feature fusion module, used to perform multi-scale downsampling on the historical meteorological grid data through the multi-scale feature preservation unit to obtain auxiliary path features at different resolutions, and to perform gated fusion of the auxiliary path features with the geographic-aware encoded features at different scales to obtain multi-scale enhanced decoding features; a prediction field generation module, used to generate the predicted meteorological field for the current training period based on the decoding features; a model training module, used to calculate the content loss and the spatial gradient loss between the predicted meteorological field and the ground truth data through the differential constraint unit, to construct a total loss function based on the content loss and the spatial gradient loss, and to iteratively update the parameters of the meteorological prediction model according to the total loss function until the model converges, thereby obtaining a trained meteorological prediction model; and a real-time prediction module, used to acquire real-time meteorological grid data, input the real-time meteorological grid data into the trained meteorological prediction model, and generate meteorological prediction results for the target forecast period through forward propagation of the geographic prior-enhanced attention unit, the multi-scale feature preservation unit, and the differential constraint unit.

2. The meteorological forecasting system based on an attention mechanism using geographical prior enhancement and differential constraints according to claim 1, characterized in that the data acquisition module is specifically configured to: acquire meteorological observation data from multiple sources within a historical time period; perform quality control on the meteorological observation data, removing outliers and imputing missing values; uniformly interpolate the quality-controlled meteorological observation data onto a preset target spatial grid and align it in time to obtain standardized historical meteorological grid data; obtain historical ground truth data within the same spatiotemporal range as the historical meteorological grid data; and pair the standardized historical meteorological grid data with the historical ground truth data according to their spatiotemporal correspondence to construct a training dataset consisting of sample pairs.

3. The meteorological forecasting system based on an attention mechanism using geographical prior enhancement and differential constraints according to claim 1, characterized in that the model building module is specifically configured to: construct a geographic prior-enhanced attention unit comprising a latitude weight generation subunit, an anisotropic position bias subunit, and an attention calculation subunit, wherein the latitude weight generation subunit generates a latitude weight field based on the grid latitude, the anisotropic position bias subunit generates a learnable position bias carrying meridional and latitudinal propagation coefficients, and the attention calculation subunit calculates attention weights based on the latitude weight field and the learnable position bias; construct a multi-scale feature preservation unit comprising a main path encoding subunit, an auxiliary path downsampling subunit, and a gated fusion subunit, wherein the main path encoding subunit performs deep feature extraction on the input data, the auxiliary path downsampling subunit performs multi-scale pooling on the input data to generate auxiliary path features at different resolutions, and the gated fusion subunit performs gated fusion of the auxiliary path features with the features output by the main path encoding subunit at different scales; construct a differential constraint unit comprising a gradient calculation subunit and a loss construction subunit, wherein the gradient calculation subunit calculates the gradient fields of the predicted meteorological field and the ground truth data in the spatial meridional and latitudinal directions, and the loss construction subunit constructs a spatial gradient loss function based on the gradient fields; and connect the geographic prior-enhanced attention unit, the multi-scale feature preservation unit, and the differential constraint unit to form the complete meteorological prediction model architecture.

4. The meteorological forecasting system based on an attention mechanism using geographical prior enhancement and differential constraints according to claim 3, characterized in that the geographic prior attention calculation module is specifically configured to: input the historical meteorological grid data into the geographic prior-enhanced attention unit to extract an initial feature tensor; generate a latitude weight field through the latitude weight generation subunit, and generate a learnable position bias tensor through the anisotropic position bias subunit; project the initial feature tensor into a query tensor, a key tensor, and a value tensor; matrix-multiply the query tensor and the key tensor and divide the product by a scaling factor to obtain an initial attention score tensor; broadcast the latitude weight field and multiply it with the learnable position bias tensor to obtain a geographic prior modulation bias, which is added to the initial attention score tensor; normalize the summed attention score tensor by softmax to obtain the geographic prior-enhanced attention weights; and compute the weighted sum of the value tensor with the attention weights to obtain the geographic-aware encoded features.
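As an illustration only, the attention computation recited in claim 4 can be sketched in NumPy for a single head over flattened grid points. The function names, the per-query-point form of the latitude weight (a cosine of latitude is used in the comment merely as an example), and the dense position bias matrix are assumptions for this sketch; the claim itself does not fix these details.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def geo_prior_attention(x, w_q, w_k, w_v, lat_weight, pos_bias):
    """Single-head attention with a geographic prior modulation bias.

    x          : (N, D) initial features, one row per grid point
    w_q/w_k/w_v: (D, d) projection matrices
    lat_weight : (N,) latitude weight per query grid point (assumed form,
                 e.g. cos(latitude))
    pos_bias   : (N, N) learnable position bias (here just a given array)
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Initial attention scores: QK^T divided by the scaling factor sqrt(d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Broadcast the latitude weights over the key axis and multiply them
    # with the position bias to get the geographic prior modulation bias.
    geo_bias = lat_weight[:, None] * pos_bias
    weights = softmax(scores + geo_bias, axis=-1)
    return weights @ v, weights
```

Because the bias is added before the softmax, it reshapes the attention distribution without changing the fact that each row of weights sums to one.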

5. The meteorological forecasting system based on an attention mechanism using geographical prior enhancement and differential constraints according to claim 3, characterized in that the multi-scale feature fusion module is specifically configured as follows: the historical meteorological grid data is used as both the main path input and the auxiliary path input; deep feature extraction is performed through the main path encoding subunit to generate main path encoded features at different scales, and multi-scale pooling is performed through the auxiliary path downsampling subunit to generate multi-scale auxiliary path features with the same resolution as the main path encoded features at each scale; for the deepest scale, the auxiliary path features of the corresponding scale are aligned with the geographic-aware encoded features and then concatenated, and the gated fusion subunit generates gating weights and performs weighted fusion to obtain the fused features of the deepest scale; for the second-deepest scale, the fused features of the deepest scale are upsampled and added to the main path encoded features of the second-deepest scale, then concatenated with the aligned auxiliary path features of the second-deepest scale, and the gated fusion subunit generates gating weights and performs weighted fusion to obtain the fused features of the second-deepest scale; proceeding from deep to shallow, the fused features of the previous scale are upsampled and added to the main path encoded features of the current scale, then concatenated with the aligned auxiliary path features of the current scale, and weighted fusion is performed through the gated fusion subunit until feature fusion at all scales is completed; and the final fused features are output as the multi-scale enhanced decoding features.
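The deep-to-shallow fusion order of claim 5 can be sketched as below. This is a hedged illustration under several assumptions that the claim does not specify: all scales share the same channel count C, the auxiliary features are already channel-aligned, upsampling is nearest-neighbour by a factor of 2, and the gate is a sigmoid of a 1×1 linear map over the concatenated features that blends the two branches as gate·main + (1 − gate)·aux. All function names are hypothetical.

```python
import numpy as np


def avg_pool(x, factor):
    """Average-pool a (C, H, W) array by an integer factor (auxiliary path)."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def upsample2(x):
    """Nearest-neighbour upsampling of a (C, H, W) array by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def gated_fuse(main, aux, w_gate, b_gate):
    """Concatenate main and auxiliary features, derive gating weights, blend.

    w_gate: (C, 2C) and b_gate: (C,) act like a 1x1 convolution (assumed form).
    """
    cat = np.concatenate([main, aux], axis=0)                 # (2C, H, W)
    logits = np.einsum("oc,chw->ohw", w_gate, cat) + b_gate[:, None, None]
    gate = 1.0 / (1.0 + np.exp(-logits))                      # sigmoid gate
    return gate * main + (1.0 - gate) * aux


def fuse_deep_to_shallow(main_feats, aux_feats, geo_feat, gates):
    """Fuse scales from deepest to shallowest.

    main_feats, aux_feats, gates are lists ordered shallow -> deep;
    geo_feat is the geographic-aware encoded feature at the deepest scale.
    """
    w, b = gates[-1]
    fused = gated_fuse(geo_feat, aux_feats[-1], w, b)         # deepest scale
    for s in range(len(main_feats) - 2, -1, -1):
        w, b = gates[s]
        merged = upsample2(fused) + main_feats[s]             # add main path
        fused = gated_fuse(merged, aux_feats[s], w, b)
    return fused
```

With zero gate parameters the sigmoid evaluates to 0.5, so each fusion step reduces to a plain average of the two branches, which is a convenient sanity check for the wiring.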

6. The meteorological forecasting system based on an attention mechanism using geographical prior enhancement and differential constraints according to claim 3, characterized in that the prediction field generation module is specifically configured to: obtain the multi-scale enhanced decoding features output by the multi-scale feature fusion module; input the decoding features into the prediction output layer, where one or more convolutional layers map the features to the channel space of the target meteorological elements to obtain an initial prediction field; post-process the initial prediction field to constrain the output values to a physically reasonable range for each meteorological element; and output the post-processed prediction field as the predicted meteorological field for the current training period.

7. The meteorological forecasting system based on an attention mechanism using geographical prior enhancement and differential constraints according to claim 3, characterized in that the model training module is specifically configured as follows: the predicted meteorological field and the corresponding ground truth data for the current training period are obtained; the content loss between the predicted meteorological field and the ground truth data is calculated by the loss construction subunit; the gradient calculation subunit calculates the gradient fields of the predicted meteorological field and of the ground truth data in the spatial meridional and latitudinal directions, respectively; after the dimensions of each gradient field are normalized, the meridional gradient loss and the latitudinal gradient loss are calculated and summed to obtain the spatial gradient loss; the content loss and the spatial gradient loss are weighted and summed according to a preset weight ratio to construct the total loss function; and the parameters of the meteorological prediction model are iteratively updated by the backpropagation algorithm based on the total loss function until the model converges, thereby obtaining a trained meteorological prediction model.
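The loss construction recited in claim 7 can be illustrated with a short NumPy sketch. Several choices here are assumptions rather than details confirmed by the claim: the content loss is taken as mean squared error, gradients are forward differences along the latitude and longitude axes, each gradient loss is a mean absolute difference normalized by the standard deviation of the ground truth gradient, and `grad_weight` stands in for the preset weight ratio.

```python
import numpy as np


def spatial_gradients(field):
    """Forward differences along the latitude (axis -2) and longitude (axis -1) axes."""
    d_lat = np.diff(field, axis=-2)
    d_lon = np.diff(field, axis=-1)
    return d_lat, d_lon


def total_loss(pred, truth, grad_weight=0.1, eps=1e-8):
    """Return (total, content, gradient) losses for (..., lat, lon) arrays."""
    # Content loss: mean squared error (assumed form).
    content = np.mean((pred - truth) ** 2)
    grad_loss = 0.0
    for g_pred, g_true in zip(spatial_gradients(pred), spatial_gradients(truth)):
        # Normalize by the spread of the true gradient so the two directions
        # contribute on a comparable scale (assumed normalization).
        scale = np.std(g_true) + eps
        grad_loss += np.mean(np.abs(g_pred - g_true)) / scale
    total = content + grad_weight * grad_loss
    return total, content, grad_loss
```

A prediction that differs from the ground truth by a constant offset has a nonzero content loss but a zero gradient loss, whereas an over-smoothed prediction is penalized by the gradient term, which is the behaviour the differential constraint is meant to encourage.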

8. The meteorological forecasting system based on an attention mechanism using geographical prior enhancement and differential constraints according to claim 1, characterized in that the real-time prediction module is specifically configured to: obtain real-time meteorological observation data at the current moment; perform quality control on the real-time meteorological observation data, removing outliers and imputing missing values; uniformly interpolate the quality-controlled real-time meteorological observation data onto a preset target spatial grid and align it in time to obtain standardized real-time meteorological grid data; input the standardized real-time meteorological grid data into the trained meteorological prediction model; process the real-time meteorological grid data through the geographic prior-enhanced attention unit to obtain real-time geographic-aware encoded features; perform multi-scale downsampling on the real-time meteorological grid data through the multi-scale feature preservation unit to obtain real-time auxiliary path features, and perform gated fusion of the real-time auxiliary path features with the real-time geographic-aware encoded features at different scales to obtain real-time multi-scale enhanced decoding features; perform convolutional mapping on the real-time multi-scale enhanced decoding features through the prediction output layer, mapping the features to the channel space of the target meteorological elements to obtain an initial real-time prediction field; and post-process the initial real-time prediction field to constrain the output values to a physically reasonable range for each meteorological element, the result being output as the meteorological prediction result for the target forecast period.